[ 
https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577123#action_12577123
 ] 

Benjamin Reed commented on PIG-55:
----------------------------------

Great work, Charlie! I like it. A couple of details:

1) PigContext isn't part of the public API, so it would probably be best if the 
LoadFunc were passed to the Chunk rather than relying on the Chunk class to 
construct it if needed.

2) It would be nice if the compression handling could be done outside the Chunk 
class so that programmers don't have to write boilerplate for it. (I'm not sure 
there is a nice way to do it, so I'm fine with blowing this off for now.)

3) Javadoc is needed for the Chunk and Chunker classes. The interaction between 
the LoadFunc and the Chunk/Chunker classes needs to be well documented.

4) You should put in a test case for a user-defined Chunker and Chunk class. 
(When InputSplits were first put into Hadoop, they worked for the built-in 
classes but failed for user-defined Splits.)
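
To illustrate (1), here is a minimal sketch of passing the loader in rather 
than having the Chunk build it. The names SimpleLoadFunc, LineLoader, and this 
Chunk constructor are hypothetical stand-ins for illustration only; they are 
not the classes or signatures from the patch or from Pig.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ChunkSketch {
    /** Hypothetical stand-in for LoadFunc; the real Pig interface differs. */
    interface SimpleLoadFunc {
        void bindTo(InputStream in);
        String getNext(); // returns null at end of input
    }

    /** Hypothetical Chunk: the loader is handed in, so no PigContext is needed. */
    static class Chunk {
        private final SimpleLoadFunc loader;

        Chunk(SimpleLoadFunc loader, InputStream in) {
            this.loader = loader;
            loader.bindTo(in);
        }

        /** Drains the loader and returns every record it produced. */
        List<String> readAll() {
            List<String> records = new ArrayList<>();
            for (String r = loader.getNext(); r != null; r = loader.getNext()) {
                records.add(r);
            }
            return records;
        }
    }

    /** A user-defined loader that treats each line as one record. */
    static class LineLoader implements SimpleLoadFunc {
        private Scanner scanner;

        public void bindTo(InputStream in) {
            scanner = new Scanner(in, "UTF-8");
        }

        public String getNext() {
            return scanner.hasNextLine() ? scanner.nextLine() : null;
        }
    }

    /** Builds a Chunk over an in-memory stream with an injected loader. */
    static List<String> demo() {
        InputStream in =
            new ByteArrayInputStream("a\nb\n".getBytes(StandardCharsets.UTF_8));
        return new Chunk(new LineLoader(), in).readAll();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this shape the Chunk only depends on the loader interface, so a test for 
(4) can plug in a user-defined loader the same way demo() does.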

Alan, can you check this out? I'd like to commit this soon. I don't think it 
should affect your pipeline work too much.

> Allow user control over split creation
> --------------------------------------
>
>                 Key: PIG-55
>                 URL: https://issues.apache.org/jira/browse/PIG-55
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.0.0
>            Reporter: Charlie Groves
>             Fix For: 0.1.0
>
>         Attachments: pig_chunker_split.patch, replaceable_PigSplit.diff, 
> replaceable_PigSplit_v2.diff
>
>
> I have a dataset in HDFS that's stored in a file per column that I'd like to 
> access from pig.  This means I can't use LoadFunc to get at the data as it 
> only allows the loader access to a single input stream at a time.  To handle 
> this usage, I've broken the existing split creation code out into a few 
> classes and interfaces, and allowed user specified load functions to be used 
> in place of the existing code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
