[ https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788447#action_12788447 ]
Daniel Dai commented on PIG-1144:
---------------------------------

I found the root cause of the problem. For every sort job, we hard-code the parallelism to 1 if the user does not use the PARALLEL keyword. We should instead leave the parallelism as -1 in this case; the later code will then detect it and use the default_parallel value.

> set default_parallelism construct does not set the number of reducers correctly
> -------------------------------------------------------------------------------
>
>                 Key: PIG-1144
>                 URL: https://issues.apache.org/jira/browse/PIG-1144
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0
>        Environment: Hadoop 20 cluster with multi-node installation
>           Reporter: Viraj Bhat
>            Fix For: 0.7.0
>
>         Attachments: brokenparallel.out, genericscript_broken_parallel.pig
>
>
> Hi all,
> I have a Pig script where I set the parallelism using the following set construct: "set default_parallel 100". I modified "MRPrinter.java" to print out the parallelism:
> {code}
> ...
> public void visitMROp(MapReduceOper mr) {
>     mStream.println("MapReduce node " + mr.getOperatorKey().toString()
>             + " Parallelism " + mr.getRequestedParallelism());
> }
> ...
> {code}
> When I run an explain on the script, I see that the last job, which does the actual sort, runs as a single-reducer job. This can be corrected by adding the PARALLEL keyword to the ORDER BY statement.
> Attaching the script and the explain output.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
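[Editor's note] For reference, a minimal Pig script along the lines of the attached genericscript_broken_parallel.pig illustrates the scenario; the input path, schema, and output path here are assumptions for illustration, not the actual attachment contents:

{code}
-- assumed input path and schema, for illustration only
set default_parallel 100;
A = LOAD 'input' AS (name:chararray, cnt:int);
B = ORDER A BY cnt;   -- before the fix, this sort job ran with a single reducer
STORE B INTO 'sorted_output';
{code}

With the fix described in the comment, the sort job's requested parallelism stays at -1, so the later code substitutes the default_parallel value (100 here). The workaround noted in the issue is to write the sort as "ORDER A BY cnt PARALLEL 100".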