[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-17 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Status: Open  (was: Patch Available)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch, PIG-1144-4.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-17 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Status: Patch Available  (was: Open)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch, PIG-1144-4.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-15 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1144:


Fix Version/s: 0.6.0  (was: 0.7.0)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1144:


Status: Open  (was: Patch Available)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Attachment: PIG-1144-3.patch

Changed the patch to take mapred.reduce.tasks into account. The hierarchy for
determining the parallelism, sketched in the example below, is:
1. the PARALLEL keyword
2. default_parallel
3. the mapred.reduce.tasks system property
4. the default value: 1
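
For illustration, a minimal sketch of that precedence. The class, method, and
parameter names here are assumptions for the example, not the actual Pig source:
{code}
// Hypothetical helper showing the precedence above; -1 means "not set".
public class ParallelismPrecedence {
    static int resolveReducerCount(int requestedParallelism,  // from the PARALLEL keyword
                                   int defaultParallel,       // from "set default_parallel"
                                   int mapredReduceTasks) {   // from mapred.reduce.tasks
        if (requestedParallelism > 0) return requestedParallelism; // 1. PARALLEL keyword
        if (defaultParallel > 0)      return defaultParallel;      // 2. default_parallel
        if (mapredReduceTasks > 0)    return mapredReduceTasks;    // 3. mapred.reduce.tasks
        return 1;                                                  // 4. fall back to one reducer
    }

    public static void main(String[] args) {
        System.out.println(resolveReducerCount(-1, 100, 7)); // prints 100: default_parallel wins
    }
}
{code}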

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Status: Open  (was: Patch Available)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Status: Patch Available  (was: Open)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch, PIG-1144-3.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Attachment: PIG-1144-2.patch

I think the reason is that the quantile job needs to know how many reducers we are
going to use in order to decide which tuples to write into the quantiles file. The
number of reducers is a constant field of the plan; we cannot leave it at -1 and let
Hadoop decide the parallelism later. The fix takes default_parallel as that constant
if the user does not use the PARALLEL keyword. It applies to both order by and skew
join. Merge join and FRJoin are map-only, and the regular join was already handled
in the original code. Attaching the patch again; nothing changed except a new test
case for skew join.
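
As a rough illustration of why the reducer count must be fixed when the quantiles
file is written (a hedged sketch, not the actual sampler code; the names are
assumptions): with k reducers the sample job has to emit k-1 split points for the
range partitioner, which is impossible while k is still -1.
{code}
import java.util.ArrayList;
import java.util.List;

public class QuantileSketch {
    // Pick k-1 boundaries from a sorted sample so each reducer gets one key range.
    static List<Integer> quantileBoundaries(List<Integer> sortedSample, int numReducers) {
        List<Integer> boundaries = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            boundaries.add(sortedSample.get(i * sortedSample.size() / numReducers));
        }
        return boundaries; // conceptually what gets written to the quantiles file
    }

    public static void main(String[] args) {
        List<Integer> sample = new ArrayList<>();
        for (int v = 0; v < 100; v++) sample.add(v);
        System.out.println(quantileBoundaries(sample, 4)); // [25, 50, 75] for 4 reducers
    }
}
{code}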

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Status: Open  (was: Patch Available)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-10 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Status: Patch Available  (was: Open)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch, PIG-1144-2.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-1144:


Attachment: brokenparallel.out
            genericscript_broken_parallel.pig

Script and explain output

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Attachment: PIG-1144-1.patch

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1144) set default_parallelism construct does not set the number of reducers correctly

2009-12-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1144:


Affects Version/s: 0.6.0  (was: 0.7.0)
           Status: Patch Available  (was: Open)

 set default_parallelism construct does not set the number of reducers 
 correctly
 ---

 Key: PIG-1144
 URL: https://issues.apache.org/jira/browse/PIG-1144
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: Hadoop 20 cluster with multi-node installation
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: brokenparallel.out, genericscript_broken_parallel.pig, 
 PIG-1144-1.patch


 Hi all,
 I have a Pig script where I set the parallelism using the following set
 construct: set default_parallel 100. I modified MRPrinter.java to print out
 the parallelism:
 {code}
 ...
 public void visitMROp(MapReduceOper mr) {
     mStream.println("MapReduce node " + mr.getOperatorKey().toString() +
         " Parallelism " + mr.getRequestedParallelism());
     ...
 }
 {code}
 When I run an explain on the script, I see that the last job, which does the
 actual sort, runs as a single-reducer job. This can be corrected by adding
 the PARALLEL keyword to the ORDER BY.
 Attaching the script and the explain output.
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.