[jira] Commented: (PIG-55) Allow user control over split creation

Charlie Groves (JIRA) Mon, 10 Mar 2008 16:50:14 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577253#action_12577253
 ]


Charlie Groves commented on PIG-55:
-----------------------------------

bq. 1) PigContext isn't part of the public API, so it would probably be best if 
the LoadFunc was passed to the Chunk rather than relying on the Chunk class to 
construct it if needed.

This is probably clearer in the version of the patch with Javadoc, but there is 
no requirement that a Chunk use LoadFuncs internally.  The whole reason I'm 
writing this is to get around the requirement of using a single stream to read 
tuples in LoadFunc.  I expect user implementations of Chunker to create their 
own Chunks as well, and those Chunks will do Tuple creation in next.  They won't 
touch LoadFunc at all, and in any case, the class name string usually used to 
create a LoadFunc is being used to create a Chunker, so there won't be a 
LoadFunc available automatically.  This isn't to say that Chunks can't use 
LoadFuncs, just that they aren't expected to.
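
To make the shape of this concrete, here is a rough sketch of a Chunk that builds tuples from several column streams without going near LoadFunc. Everything in it is an assumption for illustration: the Chunk and Chunker interfaces below are hypothetical stand-ins, not the signatures from the patch, a List<String> stands in for Pig's Tuple, and in-memory strings stand in for the per-column files in HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnChunkSketch {

    // Hypothetical stand-in for the patch's Chunk; the real signature may differ.
    interface Chunk {
        // Returns the next tuple, or null once the chunk is exhausted.
        List<String> next();
    }

    // Hypothetical stand-in for the patch's Chunker: turns an input into Chunks.
    interface Chunker {
        List<Chunk> chunk(List<String> columnData);
    }

    // Reads one value from each column stream per tuple; no LoadFunc involved.
    static class ColumnChunk implements Chunk {
        private final List<BufferedReader> columns = new ArrayList<>();

        ColumnChunk(List<String> columnData) {
            // Assumption: each string holds one column, one value per line.
            for (String data : columnData) {
                columns.add(new BufferedReader(new StringReader(data)));
            }
        }

        @Override
        public List<String> next() {
            if (columns.isEmpty()) {
                return null; // nothing to read
            }
            List<String> tuple = new ArrayList<>();
            try {
                for (BufferedReader column : columns) {
                    String value = column.readLine();
                    if (value == null) {
                        return null; // any column running out ends the chunk
                    }
                    tuple.add(value);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return tuple;
        }
    }

    // Creates its own Chunks directly, as a user-written Chunker would.
    static class ColumnChunker implements Chunker {
        @Override
        public List<Chunk> chunk(List<String> columnData) {
            List<Chunk> chunks = new ArrayList<>();
            chunks.add(new ColumnChunk(columnData));
            return chunks;
        }
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("1\n2\n", "alice\nbob\n");
        for (Chunk chunk : new ColumnChunker().chunk(columns)) {
            for (List<String> t = chunk.next(); t != null; t = chunk.next()) {
                System.out.println(t);
            }
        }
    }
}
```

The point of the sketch is only that next owns all of the stream handling, so a Chunk can open as many streams as it likes.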

> Allow user control over split creation
> --------------------------------------
>
>                 Key: PIG-55
>                 URL: https://issues.apache.org/jira/browse/PIG-55
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.0.0
>            Reporter: Charlie Groves
>             Fix For: 0.1.0
>
>         Attachments: pig_chunker_split.patch, pig_chunker_split_v2.patch, 
> replaceable_PigSplit.diff, replaceable_PigSplit_v2.diff
>
>
> I have a dataset in HDFS that's stored in a file per column that I'd like to 
> access from pig.  This means I can't use LoadFunc to get at the data as it 
> only allows the loader access to a single input stream at a time.  To handle 
> this usage, I've broken the existing split creation code out into a few 
> classes and interfaces, and allowed user-specified load functions to be used 
> in place of the existing code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
