[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-11 Thread Chao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789425#action_12789425
 ] 

Chao Wang commented on PIG-1145:


The patch looks good +1.

> [zebra] merge join on large table ( 100,000.000 rows zebra table) failed
> 
>
> Key: PIG-1145
> URL: https://issues.apache.org/jira/browse/PIG-1145
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Jing Huang
> Assignee: Yan Zhou
> Fix For: 0.6.0, 0.7.0
>
> Attachments: PIG-1145.patch, PIG-1145.patch, PIG-1145.patch
>
>
> Pig script :
> register $zebraJar;
> --fs -rmr $outputDir
> a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
> a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2');
> sort1 = order a1 by str2;
> sort2 = order a2 by str2;
> --store sort1 into '$outputDir/sorted11' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
> --store sort2 into '$outputDir/sorted21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2]');
> rec1 = load '$outputDir/sorted11' using org.apache.hadoop.zebra.pig.TableLoader();
> rec2 = load '$outputDir/sorted21' using org.apache.hadoop.zebra.pig.TableLoader();
> joina = join rec1 by str2, rec2 by str2 using "merge" ;
> --E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as str2;
> store joina into '$outputDir/join1' using org.apache.hadoop.zebra.pig.TableStorer('');
> ==
> stacktrace:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
>  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>  at org.apache.hadoop.mapred.Child.main(Child.java:159)
> Caused by: java.io.EOFException: No key-value to read
>  at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
>  at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
>  at org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
>  at org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
>  at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1082)
>  at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
>  at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
>  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
>  ... 7 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-11 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789411#action_12789411
 ] 

Yan Zhou commented on PIG-1145:
---

All failed test cases are Pig tests and look environmental. I reran the 
first failed test, TestJoin, on my local cluster, and it passes cleanly.




[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789376#action_12789376
 ] 

Hadoop QA commented on PIG-1145:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427696/PIG-1145.patch
  against trunk revision 889346.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/116/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/116/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/116/console

This message is automatically generated.


[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-11 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789196#action_12789196
 ] 

Yan Zhou commented on PIG-1145:
---

Actually with pruning enabled the exception stack is:

Backend error message
-
org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:186)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.IOException: seekTo() failed: Column Groups are not evenly positioned.
at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.seekTo(BasicTable.java:1148)
at org.apache.hadoop.zebra.mapred.TableRecordReader.seekTo(TableRecordReader.java:120)
at org.apache.hadoop.zebra.pig.TableLoader.seekNear(TableLoader.java:190)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:406)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184)
... 9 more



[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-11 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789188#action_12789188
 ] 

Jing Huang commented on PIG-1145:
-

I found another failure on merge join.
This merge join script failed:
register $zebraJar;
--fs -rmr $outputDir


--a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
--a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');

--sort1 = order a1 by byte2;
--sort2 = order a2 by byte2;

--store sort1 into '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2];[byte2]');
--store sort2 into '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2];[byte2]');

rec1 = load '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');
rec2 = load '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');

joina = join rec1 by byte2, rec2 by byte2 using "merge" ;

E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as str2, $4 as byte2;

store E into '$outputDir/bad1' using org.apache.hadoop.zebra.pig.TableStorer('');
=
By contrast, this similar script works with the previous patch:
register $zebraJar;
--fs -rmr $outputDir


a1 = LOAD '$inputDir/unsorted1' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');
a2 = LOAD '$inputDir/unsorted2' USING org.apache.hadoop.zebra.pig.TableLoader('count,seed,int1,str2,byte2');

sort1 = order a1 by byte2;
sort2 = order a2 by byte2;

store sort1 into '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2,byte2]');
store sort2 into '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableStorer('[count,seed,int1,str2,byte2]');

rec1 = load '$outputDir/100Msortedbyte21' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');
rec2 = load '$outputDir/100Msortedbyte22' using org.apache.hadoop.zebra.pig.TableLoader('','sorted');

joina = join rec1 by byte2, rec2 by byte2 using "merge" ;

E = foreach joina  generate $0 as count,  $1 as seed,  $2 as int1,  $3 as str2, $4 as byte2;

store E into '$outputDir/join3' using org.apache.hadoop.zebra.pig.TableStorer('');

Here is the stack trace:
Backend error message
-
org.apache.pig.backend.executionengine.ExecException: ERROR 2176: Error processing right input during merge join
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.throwProcessingException(POMergeJoin.java:453)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:443)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:337)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.close(PigMapBase.java:107)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.io.EOFException: No key-value to read
at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.checkKey(TFile.java:1590)
at org.apache.hadoop.zebra.tfile.TFile$Reader$Scanner.entry(TFile.java:1611)
at org.apache.hadoop.zebra.io.ColumnGroup$Reader$TFileScanner.getKey(ColumnGroup.java:854)
at org.apache.hadoop.zebra.io.ColumnGroup$Reader$CGScanner.getCGKey(ColumnGroup.java:1035)
at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getKey(BasicTable.java:1083)
at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableRecordReader.java:105)
at org.apache.hadoop.zebra.pig.TableLoader.getNext(TableLoader.java:414)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextRightInp(POMergeJoin.java:415)
... 9 more
=
This is how I run it (I disabled pruning to simplify the possible problem):
java -cp 
/grid/0/dev/hadoopqa/jing1234/conf:/grid/0/dev/hadoopqa/jars/pig.jar:/grid/0/dev/hadoopqa/jars/tfile.jar:/grid/0/dev/hadoopqa/jars/zebra.jar
 org.apache.pig.Main -m config -M -t PruneColumns bad_join.pig 




[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-10 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788952#action_12788952
 ] 

Jing Huang commented on PIG-1145:
-

I verified the fix. 
It works.




[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788607#action_12788607
 ] 

Hadoop QA commented on PIG-1145:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427556/PIG-1145.patch
  against trunk revision 52.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/111/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/111/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/111/console

This message is automatically generated.


[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-09 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788524#action_12788524
 ] 

Yan Zhou commented on PIG-1145:
---

The problem is that when the seek call on the index table targets a key that 
is past the last key of a data file, the scanner is positioned past the EOF of 
that data file, when it should instead be positioned at the beginning of the 
next data file. The CGScanner.atEnd method only checks whether the current 
file index is within the valid range; it leaves the responsibility of setting 
the proper file index to the position movers, such as the scanner's advance 
and seekTo methods. As a result, positioning the scanner past the EOF of any 
data file causes an EOFException to be thrown.

The fix is to add a check in the scanner's seekTo method so that if, after the 
seek, the position is past the end of a data file, the scanner is repositioned 
to the start of the next data file, just as the advance method already does.
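
To illustrate the idea only, here is a minimal Java sketch. It is not the Zebra 
code and not the attached patch: MultiFileScanner, its fields, the rollOver 
flag and the int keys are all hypothetical stand-ins, and keys are assumed to 
be distinct and sorted within each file. It shows an atEnd that looks only at 
the file index, and a seekTo that applies the same roll-over as advance.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for a scanner over several sorted data files.
 * Not the Zebra CGScanner; names and layout are illustrative only.
 */
class MultiFileScanner {
    private final List<int[]> files; // each array holds one file's sorted keys
    private int fileIndex = 0;       // which data file the scanner is in
    private int pos = 0;             // position inside that file

    MultiFileScanner(List<int[]> files) {
        this.files = files;
    }

    /** Like the atEnd described above: checks only the file index. */
    boolean atEnd() {
        return fileIndex >= files.size();
    }

    /** Reads the current key; fails if positioned past the end of a file. */
    int getKey() {
        int[] file = files.get(fileIndex);
        if (pos >= file.length) {
            throw new IllegalStateException("No key-value to read");
        }
        return file[pos];
    }

    /** Moves to the next key, rolling over to the next file at a file's end. */
    void advance() {
        pos++;
        skipExhaustedFiles();
    }

    /**
     * Positions the scanner in the given file at the first key >= target.
     * rollOver == false reproduces the faulty positioning; true applies
     * the same roll-over that advance() performs.
     */
    void seekTo(int targetFile, int target, boolean rollOver) {
        fileIndex = targetFile;
        int[] file = files.get(fileIndex);
        int idx = Arrays.binarySearch(file, target);
        pos = idx >= 0 ? idx : -idx - 1; // insertion point when key is absent
        if (rollOver) {
            skipExhaustedFiles();
        }
    }

    /** Steps past any file whose keys are used up, resetting the position. */
    private void skipExhaustedFiles() {
        while (fileIndex < files.size() && pos >= files.get(fileIndex).length) {
            fileIndex++;
            pos = 0;
        }
    }
}

class SeekDemo {
    public static void main(String[] args) {
        List<int[]> files = Arrays.asList(new int[] {1, 3, 5}, new int[] {8, 9});
        MultiFileScanner scanner = new MultiFileScanner(files);
        scanner.seekTo(0, 6, true); // 6 is past the last key of the first file
        System.out.println(scanner.atEnd() + " " + scanner.getKey()); // false 8
    }
}
```

With rollOver set to false, the same seek leaves atEnd() returning false while 
getKey() throws, which mirrors the shape of the failure in the stack trace.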





[jira] Commented: (PIG-1145) [zebra] merge join on large table ( 100,000.000 rows zebra table) failed

2009-12-09 Thread Chao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788496#action_12788496
 ] 

Chao Wang commented on PIG-1145:


Patch reviewed +1.

