[jira] Updated: (PIG-1159) merge join right side table does not support comma separated paths

2009-12-21 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1159:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. Thanks, Richard.

 merge join right side table does not support comma separated paths
 --

 Key: PIG-1159
 URL: https://issues.apache.org/jira/browse/PIG-1159
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1159.patch


 For example this is my script:(join_jira1.pig)
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 --a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 --a2 = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 --sort1 = order a1 by a parallel 6;
 --sort2 = order a2 by a parallel 5;
 --store sort1 into 'asort1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
 --store sort2 into 'asort2' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
 --store sort1 into 'asort3' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
 --store sort2 into 'asort4' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]');
 joinl = LOAD 'asort1,asort2' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted');
 joinr = LOAD 'asort3,asort4' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted');
 joina = join joinl by a, joinr by a using merge ;
 dump joina;
 ==
 here is the log:
 Backend error message
 -
 java.lang.IllegalArgumentException: Pathname /user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 from hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 is not a valid DFS filename.
 at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
 at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
 at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131)
 at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147)
 at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534)
 at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Pig Stack Trace
 ---
 ERROR 6015: During execution, encountered a Hadoop error.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina
 at org.apache.pig.PigServer.openIterator(PigServer.java:482)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:386)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
 at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158)
 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
 at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
 at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
 

[jira] Updated: (PIG-1159) merge join right side table does not support comma separated paths

2009-12-18 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1159:
--

Attachment: PIG-1159.patch

With this patch, the Pig runtime no longer passes an InputStream to an 
IndexableLoader through the bindTo method. An IndexableLoader is responsible 
for creating its own InputStream for reading data. 

This actually isn't a new requirement: all existing IndexableLoaders already 
create their own InputStreams. And in the future, with the load-store 
redesign, the Pig runtime will no longer create InputStreams for the loaders.
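
To illustrate why moving stream creation into the loader helps with the comma-separated case, here is a minimal sketch. It is not the patch and does not use Pig or Hadoop classes: Storage, InMemoryStorage, Loader, splitLocation, and this bindTo signature are all hypothetical stand-ins. The idea it shows is that a caller who treats 'asort3,asort4' as one path fails to open it, while a loader that receives only the location string can split it and open one stream per path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Pig or Hadoop classes.
class SelfOpeningLoaderSketch {

    /** Stand-in for a file system: resolves exactly one path to one stream. */
    interface Storage {
        InputStream open(String path) throws IOException;
    }

    /** In-memory Storage that rejects any path it does not know. */
    static class InMemoryStorage implements Storage {
        private final Map<String, String> files = new HashMap<>();

        void put(String path, String contents) {
            files.put(path, contents);
        }

        @Override
        public InputStream open(String path) throws IOException {
            String contents = files.get(path);
            if (contents == null) {
                throw new IOException("No such path: " + path);
            }
            return new ByteArrayInputStream(contents.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Splits a comma-separated location string into trimmed, non-empty paths. */
    static List<String> splitLocation(String location) {
        List<String> paths = new ArrayList<>();
        for (String part : location.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    /** Loader that is handed only the location string and opens its own streams. */
    static class Loader {
        private final List<InputStream> streams = new ArrayList<>();

        void bindTo(String location, Storage storage) throws IOException {
            for (String path : splitLocation(location)) {
                streams.add(storage.open(path));
            }
        }

        int streamCount() {
            return streams.size();
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryStorage storage = new InMemoryStorage();
        storage.put("asort3", "1\n2\n");
        storage.put("asort4", "3\n4\n");

        // A caller that treats the whole string as a single path fails to open it.
        try {
            storage.open("asort3,asort4");
        } catch (IOException e) {
            System.out.println("single-path open failed: " + e.getMessage());
        }

        // A loader that interprets its own location string opens each path.
        Loader loader = new Loader();
        loader.bindTo("asort3,asort4", storage);
        System.out.println("loader opened " + loader.streamCount() + " streams");
    }
}
```

In this sketch the knowledge of how to interpret the location string lives in one place, the loader, so the caller never needs to resolve the path itself.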


[jira] Updated: (PIG-1159) merge join right side table does not support comma separated paths

2009-12-18 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1159:
--

Status: Patch Available  (was: Open)
