[
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758952#action_12758952
]
Pradeep Kamath commented on PIG-958:
------------------------------------
In general the patch looks good - do address the hudson QA issues. Some code
review comments:
1) I don't think pig.properties changes should be part of this patch - was this
accidental? Likewise for Main.java.
2)
{noformat}
72 /**
73 * storeUID is needed to get the around the case of multi-query
optimisation where
74 * multiple instances of MultiStorage could be running in a single
mapper/reducer
75 * and run the risk of overwriting each other's output.
76 */
{noformat}
Wouldn't the multiple instances be writing to different locations in which case
there should be no race condition right?
3)
{noformat}
172 private void initJobSpecificParams() throws IOException {
173 if (partition == null || outputPath == null) {
{noformat}
Wouldn't the outputPath never be null since it is initialized in the
constructor?
4) Consider removing the log.debug() since these are all in the inner loop and
would possibly impact performance.
5)
{noformat}
291 String fieldValueBasedPathStr =
fieldValueBasedPath.toUri().getPath();
{noformat}
This variable is only really used in a log.debug(), I think removing this and
using fieldValueBasedPath in all the fs.create() will make the code shorter and
cleaner.
6) Some comments on why the move of the output dirs is needed and in the
moveResults() and removePart() methods would be helpful. Additional comments on
the code flow would also help
> Splitting output data on key field
> ----------------------------------
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Ankur
> Attachments: 958.v2.patch
>
>
> Pig users often face the need to split the output records into a bunch of
> files and directories depending on the type of record. Pig's SPLIT operator
> is useful when record types are few and known in advance. In cases where type
> is not directly known but is derived dynamically from values of a key field
> in the output tuple, a custom store function is a better solution.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.