[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-921:
---------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Patch committed.

Strange use case for Join which produces different results in local and map reduce mode
---------------------------------------------------------------------------------------

Key: PIG-921
URL: https://issues.apache.org/jira/browse/PIG-921
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Environment: Hadoop 18 and Hadoop 20
Reporter: Viraj Bhat
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch

I have a script like this, which loads from two files, A.txt and B.txt:
{code}
A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
C = JOIN A by a.a1, B by b.b1;
DESCRIBE C;
DUMP C;
{code}
A.txt contains the following lines:
{code}
(1,a)
(2,aa)
{code}
B.txt contains the following lines:
{code}
(1,b)
(2,bb)
{code}
Running the above script in local and map reduce mode on Hadoop 18 and Hadoop 20 produces the following:

Hadoop 18 (map reduce mode)
=========
(1,1)
(2,2)
=========

Hadoop 20 (map reduce mode)
=========
(1,1)
(2,2)
=========

Local Mode: Pig with Hadoop 18 jar release
=========
2009-08-13 17:15:13,473 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
09/08/13 17:15:13 INFO pig.Main: Logging error messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias C
09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
Details at logfile: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
=========
Caused by: java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
        ... 9 more
=========

Local Mode: Pig with Hadoop 20 jar release
=========
((1,a),(1,b))
((2,aa),(2,bb))
=========

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-921:
---------------------------

Attachment: PIG-921-1.patch

The problem is in POLocalRearrange: when only one field of a tuple is used as the join key, it drops the entire tuple from the value.
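The comment above can be illustrated with a deliberately simplified model of a shuffle-based join. This is a sketch, not Pig's actual POLocalRearrange/POPackage code: the functions `local_rearrange`, `package` and `join` and the `key_col`/`buggy` parameters are hypothetical. It assumes that the map side may drop a key column from the value to save space and that the reduce side restores that column from the key. In the correct path the nested tuple is kept, because the key is only one field inside it; in the buggy path the whole column `a` (or `b`) is dropped even though the key is just `a.a1` (or `b.b1`), so the reduce side can rebuild nothing but the bare key. That reproduces the `(1,1)` `(2,2)` map reduce output in the report.

```python
from collections import defaultdict


def local_rearrange(records, key_fn, key_col, buggy):
    """Map side (hypothetical model): emit (key, value) pairs.

    key_col is the top-level column the key is derived from. If buggy is
    True, that whole column is removed from the value even though the key
    is only one field nested inside it.
    """
    pairs = []
    for rec in records:
        key = key_fn(rec)
        if buggy:
            value = tuple(f for i, f in enumerate(rec) if i != key_col)
        else:
            # The key is a sub-field, so the nested tuple must stay intact.
            value = rec
        pairs.append((key, value))
    return pairs


def package(pairs, key_col, buggy):
    """Reduce side (hypothetical model): rebuild full records."""
    rebuilt = []
    for key, value in pairs:
        if buggy:
            # The dropped column is "restored" from the key alone, so the
            # nested tuple (1,'a') comes back as the bare value 1.
            rec = value[:key_col] + (key,) + value[key_col:]
        else:
            rec = value
        rebuilt.append((key, rec))
    return rebuilt


def join(left, right, buggy):
    """Inner join on field 0 of the nested tuple in column 0."""
    groups = defaultdict(lambda: ([], []))
    for side, recs in ((0, left), (1, right)):
        pairs = local_rearrange(recs, lambda r: r[0][0], 0, buggy)
        for key, rec in package(pairs, 0, buggy):
            groups[key][side].append(rec)
    result = []
    for key in sorted(groups):
        lefts, rights = groups[key]
        for l in lefts:
            for r in rights:
                result.append(l + r)
    return result


# Each record has a single column holding a nested tuple, as in A.txt/B.txt.
A = [((1, "a"),), ((2, "aa"),)]
B = [((1, "b"),), ((2, "bb"),)]
print(join(A, B, buggy=False))  # [((1, 'a'), (1, 'b')), ((2, 'aa'), (2, 'bb'))]
print(join(A, B, buggy=True))   # [(1, 1), (2, 2)]
```

The `buggy` switch only models which part of the record is dropped before the shuffle; a real fix would strip a column from the value only when the key is that entire column.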
[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode
[ https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-921:
---------------------------

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.3.0) 0.4.0
Status: Patch Available (was: Open)