[
https://issues.apache.org/jira/browse/PIG-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charlie Groves updated PIG-55:
------------------------------
Attachment: pig_chunker_split_v4.patch
pig_chunker_split_v4.patch adds a validate method to Slicers. It is called
when the Slicer is parsed, so the Slicer can throw an exception if a location
isn't parsable or doesn't exist on the DFS. The patch also adds a new test
case, TestParser, that checks that the default slicer raises an exception at
parse time if the location it's given doesn't exist. TestParser looks a little
slim with a single test method in it, but the test didn't seem to fit in with
any of the existing test cases.
All the tests pass for me with the patch in place.
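
For anyone who hasn't opened the patch, here is a rough sketch of the parse-time
validation idea. It is not the code from the patch: the Slicer interface below
is a simplified, hypothetical stand-in, ExistingPathSlicer is an invented name,
and a local filesystem check stands in for the DFS lookup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SlicerValidationSketch {

    // Hypothetical stand-in for the Slicer interface the patch extends;
    // the real signatures in the patch may differ.
    interface Slicer {
        // Called at parse time; throws if the location cannot be used.
        void validate(String location) throws IOException;
    }

    // Hypothetical default slicer: rejects blank or nonexistent locations.
    // A local filesystem check stands in for the DFS lookup (assumption).
    static class ExistingPathSlicer implements Slicer {
        @Override
        public void validate(String location) throws IOException {
            if (location == null || location.trim().isEmpty()) {
                throw new IOException("Unparsable location: " + location);
            }
            Path path = Paths.get(location);
            if (!Files.exists(path)) {
                throw new IOException("Location does not exist: " + location);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Slicer slicer = new ExistingPathSlicer();
        Path existing = Files.createTempFile("slicer", ".txt");
        try {
            // An existing location validates without an exception.
            slicer.validate(existing.toString());
            try {
                // A missing location should be rejected before any job runs.
                slicer.validate(existing.toString() + ".missing");
                System.out.println("unexpected: validation passed");
            } catch (IOException expected) {
                System.out.println("rejected at parse time");
            }
        } finally {
            Files.deleteIfExists(existing);
        }
    }
}
```

The point of failing in validate rather than later is that a bad location is
reported when the script is parsed instead of after a job has been submitted.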
> Allow user control over split creation
> --------------------------------------
>
> Key: PIG-55
> URL: https://issues.apache.org/jira/browse/PIG-55
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.0.0
> Reporter: Charlie Groves
> Fix For: 0.1.0
>
> Attachments: pig_chunker_split.patch, pig_chunker_split_v2.patch,
> pig_chunker_split_v3.patch, pig_chunker_split_v4.patch,
> replaceable_PigSplit.diff, replaceable_PigSplit_v2.diff
>
>
> I have a dataset in HDFS that's stored as one file per column, and I'd like
> to access it from Pig. This means I can't use LoadFunc to get at the data, as
> it only allows the loader access to a single input stream at a time. To handle
> this usage, I've broken the existing split creation code out into a few
> classes and interfaces, and allowed user-specified load functions to be used
> in place of the existing code.