[
https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577253#action_12577253
]
Charlie Groves commented on PIG-55:
-----------------------------------
bq. 1) PigContext isn't part of the public API, so it would probably be best if
the LoadFunc was passed to the Chunk rather than relying on the Chunk class to
construct if needed.
This is probably clearer in the version of the patch with Javadoc, but there is
no requirement that a Chunk use LoadFuncs internally. The whole reason I'm
writing this is to get around the requirement of using a single stream to read
tuples in LoadFunc. I expect user implementations of Chunker to create their
own Chunks as well, with those Chunks building Tuples directly in next(). They
won't touch LoadFunc at all, and in any case, the class name string usually
used to create a LoadFunc is being used to create a Chunker instead, so there
won't be a LoadFunc available automatically. This isn't to say that Chunks
can't use LoadFuncs, just that they aren't expected to.
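
As a rough illustration of that point, here is a minimal sketch of a Chunk that
builds tuples in next() from several streams at once, with no LoadFunc
involved. The Chunk and Chunker interfaces below are hypothetical stand-ins and
not the signatures from the patch; tuples are plain lists of strings, and
in-memory readers stand in for the per-column files in HDFS.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnChunkSketch {

    /** Hypothetical stand-in for the patch's Chunk: hands out one tuple per call. */
    interface Chunk {
        /** Returns the next tuple, or null once the chunk is exhausted. */
        List<String> next();
    }

    /** Hypothetical stand-in for the patch's Chunker: decides how input is split. */
    interface Chunker {
        /** Each string holds the newline-separated contents of one column. */
        List<Chunk> chunk(List<String> columnContents);
    }

    /** Reads one line from every column reader and zips the values into a tuple. */
    static class ColumnChunk implements Chunk {
        private final List<BufferedReader> columns;

        ColumnChunk(List<BufferedReader> columns) {
            this.columns = columns;
        }

        @Override
        public List<String> next() {
            List<String> tuple = new ArrayList<>();
            for (BufferedReader column : columns) {
                String value;
                try {
                    value = column.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                if (value == null) {
                    return null; // any column running out ends the chunk
                }
                tuple.add(value);
            }
            return tuple;
        }
    }

    /** Produces a single chunk spanning all columns; no LoadFunc is constructed. */
    static class ColumnChunker implements Chunker {
        @Override
        public List<Chunk> chunk(List<String> columnContents) {
            List<BufferedReader> readers = new ArrayList<>();
            for (String contents : columnContents) {
                readers.add(new BufferedReader(new StringReader(contents)));
            }
            List<Chunk> chunks = new ArrayList<>();
            chunks.add(new ColumnChunk(readers));
            return chunks;
        }
    }

    public static void main(String[] args) {
        Chunker chunker = new ColumnChunker();
        Chunk chunk = chunker.chunk(Arrays.asList("alice\nbob\n", "30\n41\n")).get(0);
        for (List<String> t = chunk.next(); t != null; t = chunk.next()) {
            System.out.println(t);
        }
    }
}
```

The Chunk owns every stream it needs, which is the piece a LoadFunc bound to a
single input stream can't express. A real implementation would presumably open
the column files through the Hadoop FileSystem API rather than StringReader.
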
> Allow user control over split creation
> --------------------------------------
>
> Key: PIG-55
> URL: https://issues.apache.org/jira/browse/PIG-55
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.0.0
> Reporter: Charlie Groves
> Fix For: 0.1.0
>
> Attachments: pig_chunker_split.patch, pig_chunker_split_v2.patch,
> replaceable_PigSplit.diff, replaceable_PigSplit_v2.diff
>
>
> I have a dataset in HDFS that's stored in a file per column that I'd like to
> access from pig. This means I can't use LoadFunc to get at the data as it
> only allows the loader access to a single input stream at a time. To handle
> this usage, I've broken the existing split creation code out into a few
> classes and interfaces, and allowed user specified load functions to be used
> in place of the existing code.