Gaurav Jain commented on PIG-1111:
There was code review feedback outside of JIRA from Yan Zhou:
1) Why does build.xml need any changes?
2) BasicTableOutputFormat.IS_MULTI should be of package scope instead of
3) In RecordWriter::write() method, the check of
"if(jobConf.getBoolean(BasicTableOutputFormat.IS_MULTI, false) == true)" should
be replaced with a simple "if (op != null)". As a consequence, "jobConf"
variable is not needed;
4) A lot of RuntimeExceptions have been thrown, which should be replaced
5) getRecordWriter: why remove the check for Path's nullness? The patch
seems to be inconsistent with what's on trunk. Patch says the check is
completely removed; while the trunk has an empty check;
6) TableRecordWriter: commaSeparatedLocs is never used;
7) In getOutputPartition, why are setConf/getConf necessary? Just curious.
In the latest patch, all of the above issues have been addressed.
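For readers following item 3, here is a minimal sketch of the suggested simplification. It assumes "op" is the writer's output partitioner and that it is only set in multiple-output mode; the thread does not show the actual field, so WriterSketch, targetIndex and the use of ToIntFunction are hypothetical stand-ins and not the Zebra code.

```java
import java.util.function.ToIntFunction;

// Hypothetical writer: decides which output table a row goes to.
class WriterSketch {
    // Assumption: null in single-output mode, non-null in multi-output mode.
    private final ToIntFunction<String> op;

    WriterSketch(ToIntFunction<String> op) {
        this.op = op;
    }

    // Returns the index of the table the row should be written to.
    int targetIndex(String row) {
        // The null check replaces a lookup of the IS_MULTI flag in the job
        // configuration, so the writer no longer needs to hold a jobConf.
        if (op != null) {
            return op.applyAsInt(row);
        }
        return 0; // single output: there is only one table
    }

    public static void main(String[] args) {
        WriterSketch single = new WriterSketch(null);
        WriterSketch multi = new WriterSketch(row -> row.length() % 2);
        System.out.println(single.targetIndex("abc"));
        System.out.println(multi.targetIndex("abc"));
    }
}
```

The point of the suggestion is that the partitioner's presence already encodes the mode, so a second source of truth in the configuration is unnecessary.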
> [Zebra] multiple outputs support
> Key: PIG-1111
> URL: https://issues.apache.org/jira/browse/PIG-1111
> Project: Pig
> Issue Type: New Feature
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Gaurav Jain
> Assignee: Gaurav Jain
> Fix For: 0.6.0, 0.7.0
> Attachments: PIG-1111.patch, PIG-1111.patch
> Zebra enables applications to stream data into different Zebra table instances.
> New Interface added:
> setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, Class<?
> extends ZebraOutputPartitioner> theClass)
> Zebra maintains a list of table instances based on the comma-separated
> locations (in that order).
> The ZebraOutputPartitioner interface has a getOutputPartition method, which is
> implemented by the application. It returns an index into the list, and Zebra
> writes to the table instance at that index.
> We also introduce a new mapred property for setting multiple outputs.
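The routing described above can be sketched roughly as follows. This is an illustration only: the issue does not give the real signature of getOutputPartition, so OutputPartitionerSketch, MultiOutputSketch, parseLocations and route are hypothetical names, and the (key, row) parameters are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ZebraOutputPartitioner; the parameters are assumed.
interface OutputPartitionerSketch {
    int getOutputPartition(String key, String row);
}

class MultiOutputSketch {
    // Split the comma-separated location string, keeping the given order.
    static List<String> parseLocations(String commaSeparatedLocations) {
        List<String> locations = new ArrayList<>();
        for (String part : commaSeparatedLocations.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                locations.add(trimmed);
            }
        }
        return locations;
    }

    // Ask the application's partitioner for an index and pick that location.
    static String route(List<String> locations, OutputPartitionerSketch partitioner,
                        String key, String row) {
        int index = partitioner.getOutputPartition(key, row);
        if (index < 0 || index >= locations.size()) {
            throw new IllegalArgumentException("partition index out of range: " + index);
        }
        return locations.get(index);
    }

    public static void main(String[] args) {
        List<String> locations = parseLocations("/out/us, /out/other");
        // Application-supplied rule: US rows go to the first table.
        OutputPartitionerSketch byCountry = (key, row) -> row.startsWith("us") ? 0 : 1;
        System.out.println(route(locations, byCountry, "k1", "us,alice")); // prints /out/us
        System.out.println(route(locations, byCountry, "k2", "de,bob"));   // prints /out/other
    }
}
```

Because the index refers to a position in the list, the order of the locations passed to setMultipleOutputs matters and the partitioner must agree with it.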