[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-921:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
      Status: Resolved  (was: Patch Available)

Patch committed.

 Strange use case for Join which produces different results in local and map reduce mode
 ---

              Key: PIG-921
              URL: https://issues.apache.org/jira/browse/PIG-921
          Project: Pig
       Issue Type: Bug
       Components: impl
 Affects Versions: 0.4.0
      Environment: Hadoop 18 and Hadoop 20
         Reporter: Viraj Bhat
         Assignee: Daniel Dai
          Fix For: 0.6.0

      Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch


 I have a script of the following form, which loads from two files, A.txt and B.txt:
 {code}
 A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
 B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
 C = JOIN A by a.a1, B by b.b1;
 DESCRIBE C;
 DUMP C;
 {code}
 A.txt contains the following lines:
 {code}
 (1,a)
 (2,aa)
 {code}
 B.txt contains the following lines:
 {code}
 (1,b)
 (2,bb)
 {code}
 Running the above script in local and map reduce mode on Hadoop 18 and 
 Hadoop 20 produces the following:
 Hadoop 18
 =
 (1,1)
 (2,2)
 =
 Hadoop 20
 =
 (1,1)
 (2,2)
 =
 Local Mode: Pig with Hadoop 18 jar release 
 =
 2009-08-13 17:15:13,473 [main] INFO  org.apache.pig.Main - Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 09/08/13 17:15:13 INFO pig.Main: Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
 2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias C
 09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
 Details at logfile: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 =
 Caused by: java.lang.NullPointerException
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
     at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
     ... 9 more
 =
 Local Mode: Pig with Hadoop 20 jar release
 =
 ((1,a),(1,b))
 ((2,aa),(2,bb))
 =
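
 For comparison, the rows that the DESCRIBE schema implies can be worked out outside Pig. The following is a minimal Python sketch, not Pig code: the helper name inner_join and the in-memory row layout are assumptions made for illustration. Each loaded record is modelled as a one-field row whose single field is itself a tuple, and the join key is the first element of that inner tuple (a.a1 and b.b1).

```python
# Illustrative model only: rows as Pig would load them, one tuple-typed field each.
A = [((1, "a"),), ((2, "aa"),)]
B = [((1, "b"),), ((2, "bb"),)]


def inner_join(left, right, left_key, right_key):
    """Hash join: index the right side by key, then probe with the left side."""
    index = {}
    for row in right:
        index.setdefault(right_key(row), []).append(row)
    # A joined row is the left row's fields followed by the right row's fields.
    return [l + r for l in left for r in index.get(left_key(l), [])]


# JOIN A by a.a1, B by b.b1: the key is element 0 of the tuple in column 0.
C = inner_join(A, B, lambda r: r[0][0], lambda r: r[0][0])
for row in C:
    print(row)
```

 Under these assumptions every joined row carries both complete inner tuples, which matches the schema printed by DESCRIBE and the output shown for local mode with the Hadoop 20 jar, rather than the (1,1) and (2,2) rows reported for map reduce mode.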

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-10-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-921:
---

Attachment: PIG-921-1.patch

The problem is in POLocalRearrange: when one field of a tuple is used as the 
join key, we skip the entire tuple in the value instead of only that field.
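
The effect described above can be illustrated with a small Python model. This is a hedged sketch, not the code in PIG-921-1.patch and not Pig's implementation: the function names local_rearrange, package and join_side are hypothetical, and the model assumes that the rearrange step splits each row into a key and a value, drops from the value whatever it believes the key already covers, and that the package step later puts the key back in that position.

```python
# Hypothetical model only; none of these names are Pig APIs.

def local_rearrange(row, col, sub, buggy):
    """Split a row into (key, value); the key is field `sub` of the tuple in column `col`."""
    key = row[col][sub]
    if buggy:
        # Treats the whole column as covered by the key and drops it from the value.
        value = row[:col] + row[col + 1:]
    else:
        # The key covers only part of the column, so the column stays in the value.
        value = row
    return key, value


def package(key, value, col, buggy):
    """Rebuild the output row from the key and the value."""
    if buggy:
        # Puts the key back where the dropped column used to be.
        return value[:col] + (key,) + value[col:]
    return value


def join_side(row, buggy):
    key, value = local_rearrange(row, col=0, sub=0, buggy=buggy)
    return package(key, value, col=0, buggy=buggy)


a_row = ((1, "a"),)
b_row = ((1, "b"),)

buggy_result = join_side(a_row, True) + join_side(b_row, True)
fixed_result = join_side(a_row, False) + join_side(b_row, False)
print(buggy_result)  # (1, 1)
print(fixed_result)  # ((1, 'a'), (1, 'b'))
```

In this model the buggy path reproduces the (1,1) rows reported for map reduce mode, while keeping the full tuple in the value yields the rows that the schema of C describes.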




[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-10-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-921:
---

    Fix Version/s: 0.6.0
Affects Version/s: 0.4.0  (was: 0.3.0)
           Status: Patch Available  (was: Open)
