[jira] Commented: (PIG-1076) Make PigOutputCommitter conform with new FileOutputCommitter in hadoop trunk
[ https://issues.apache.org/jira/browse/PIG-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913242#action_12913242 ] Pradeep Kamath commented on PIG-1076: - The patch will need new hadoop sources which have not yet been released on Apache - so until then the patch can be used against hadoop trunk, but since the pig build picks up released hadoop this would not be seamless. Make PigOutputCommitter conform with new FileOutputCommitter in hadoop trunk --- Key: PIG-1076 URL: https://issues.apache.org/jira/browse/PIG-1076 Project: Pig Issue Type: Improvement Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1076.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [VOTE] Pig to become a top level Apache project
+1 -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Wednesday, August 18, 2010 10:34 AM To: pig-dev@hadoop.apache.org Subject: [VOTE] Pig to become a top level Apache project Earlier this week I began a discussion on Pig becoming a TLP (http://bit.ly/byD7L8 ). All of the received feedback was positive. So, let's have a formal vote. I propose we move Pig to a top level Apache project. I propose that the initial PMC of this project be the list of all currently active Pig committers (http://hadoop.apache.org/pig/whoweare.html ) as of 18 August 2010. I nominate Olga Natkovich as the chair of the PMC. (PMC chairs have no more power than other PMC members, but they are responsible for writing regular reports for the Apache board, assigning rights to new committers, etc.) I propose that as part of the resolution that will be forwarded to the Apache board we include that one of the first tasks of the new Pig PMC will be to adopt bylaws for the governance of the project. Alan. P.S. If this vote passes, the next step is that the proposal will be forwarded to the Hadoop PMC for discussion and vote. If the Hadoop PMC vote passes, a formal resolution is then drafted (see http://bit.ly/bvOTRq for an example resolution) and sent to the Apache board. The Apache board will then vote on whether to make Pig a TLP.
[jira] Commented: (PIG-1546) Incorrect assert statements in operator evaluation
[ https://issues.apache.org/jira/browse/PIG-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899566#action_12899566 ] Pradeep Kamath commented on PIG-1546: - Results from running the test-patch ant target [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Incorrect assert statements in operator evaluation -- Key: PIG-1546 URL: https://issues.apache.org/jira/browse/PIG-1546 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ajay Kidave Assignee: Ajay Kidave Priority: Minor Fix For: 0.8.0 Attachments: pig_1546.patch The physical operator evaluation code paths for <, <=, > and >= have incorrect assert statements. These asserts fail if the JVM has asserts enabled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1546) Incorrect assert statements in operator evaluation
[ https://issues.apache.org/jira/browse/PIG-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1546: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed - thanks Ajay! Incorrect assert statements in operator evaluation -- Key: PIG-1546 URL: https://issues.apache.org/jira/browse/PIG-1546 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ajay Kidave Assignee: Ajay Kidave Priority: Minor Fix For: 0.8.0 Attachments: pig_1546.patch The physical operator evaluation code paths for <, <=, > and >= have incorrect assert statements. These asserts fail if the JVM has asserts enabled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
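The failure mode behind PIG-1546 - asserts that only bite when the JVM runs with -ea - can be illustrated with a minimal sketch. The class and its null handling below are hypothetical and are not the actual Pig operator code:

```java
// Hypothetical sketch: Java asserts are disabled by default and only
// execute when the JVM is started with -ea, so an incorrect assert can
// lurk unnoticed until someone enables assertions.
public class AssertDemo {
    static int compare(Integer left, Integer right) {
        // An overly strict assert such as
        //   assert left != null && right != null;
        // would pass in normal runs but abort under -ea whenever a
        // null operand is actually legal - the class of bug fixed here.
        if (left == null || right == null) {
            return Integer.MIN_VALUE; // sentinel for "incomparable"
        }
        return Integer.compare(left, right);
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 2));    // negative
        System.out.println(compare(null, 2)); // sentinel, no crash
    }
}
```

Running `java AssertDemo` and `java -ea AssertDemo` would behave identically here precisely because the bad assert is removed; with the assert left in, only the -ea run would fail.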
[jira] Commented: (PIG-1520) Remove Owl from Pig contrib
[ https://issues.apache.org/jira/browse/PIG-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896977#action_12896977 ] Pradeep Kamath commented on PIG-1520: - Some of the files in the patch seem to retain the Apache header after deletion (or it might just be vim showing me different colors for the header and throwing me off) - either way, since eventually this will just be a svn rm contrib/owl followed by svn commit, it should be fine. +1 for commit. Remove Owl from Pig contrib --- Key: PIG-1520 URL: https://issues.apache.org/jira/browse/PIG-1520 Project: Pig Issue Type: Task Components: impl Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1520.patch Yahoo has transitioned work on Owl to Howl (which will not be a Pig contrib project). Since no one else is working on Owl and there will be no one to support it, we should remove it from our contrib before releasing 0.8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895769#action_12895769 ] Pradeep Kamath commented on PIG-1534: - Thanks for the review Daniel, patch committed to trunk. Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1534: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1534: Status: Patch Available (was: Open) Assignee: Pradeep Kamath Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1534: Attachment: PIG-1534.patch Patch fixes SampleOptimizer to add the loadFunc funcspecs into the Mapreduce operators after optimization - this fixes the above order by error. Here are results from running the test-patch target locally [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] The javadoc warning is present on trunk and not related to this patch: {noformat} ... [javadoc] Standard Doclet version 1.6.0_01 [javadoc] Building tree for all the packages and classes... [javadoc] /tmp/svncheckout/src/org/apache/pig/newplan/logical/expression/ProjectExpression.java:192: warning - @param argument currentOp is not a parameter name. [javadoc] Building index for all the packages and classes... ... {noformat} Will run unit tests locally and update with results. 
Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case
[ https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895522#action_12895522 ] Pradeep Kamath commented on PIG-1534: - Ran all unit tests - TestScriptUDF fails but the failure is unrelated to the change in this patch and the failure occurs even with a fresh svn checkout. Patch is ready for review. Code discovering UDFs in the script has a bug in a order by case Key: PIG-1534 URL: https://issues.apache.org/jira/browse/PIG-1534 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1534.patch Consider the following commandline: {noformat} java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b; {noformat} Notice there is no register udf.jar, instead udf.jar (which contains udf.MyPigStorage) is in the classpath. Pig handles this case by shipping udf.jar to the backend. However the above script with order by triggers the bug with the following error message: ERROR 2997: Unable to recreate exception from backed error: java.lang.RuntimeException: could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]' -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
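The "could not instantiate 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments '[udf.MyPigStorage, 100]'" error boils down to reflective construction from a funcspec string on the backend, where the wrapped loader class must be on the classpath. A rough, hypothetical sketch of that mechanism, using a JDK class in place of a real loader:

```java
import java.lang.reflect.Constructor;

public class FuncSpecDemo {
    // Hypothetical sketch of funcspec-style instantiation: resolve a
    // class by name and invoke its (String) constructor. Class.forName
    // throws ClassNotFoundException when the named class (e.g.
    // udf.MyPigStorage on the backend) is missing from the classpath,
    // which surfaces as a "could not instantiate" failure like the one
    // quoted above.
    static Object instantiate(String className, String arg) throws Exception {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getConstructor(String.class);
        return ctor.newInstance(arg);
    }

    public static void main(String[] args) throws Exception {
        // StringBuilder stands in for a loader class here.
        Object obj = instantiate("java.lang.StringBuilder", "hello");
        System.out.println(obj);
    }
}
```

This is why the fix has to carry the loadFunc funcspecs into the MapReduce operators: the string alone is useless unless the class it names ships with the job.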
[jira] Commented: (PIG-1457) Pig will run complete zebra test even we give -Dtestcase=xxx
[ https://issues.apache.org/jira/browse/PIG-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879922#action_12879922 ] Pradeep Kamath commented on PIG-1457: - +1 Pig will run complete zebra test even we give -Dtestcase=xxx Key: PIG-1457 URL: https://issues.apache.org/jira/browse/PIG-1457 Project: Pig Issue Type: Test Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1457.patch After [PIG-1302|https://issues.apache.org/jira/browse/PIG-1302], even if we want to run an individual test using -Dtestcase=, pig will still invoke the complete zebra tests. We shall pass -Dtestcase to zebra pigtest to suppress running unwanted tests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target
[ https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877229#action_12877229 ] Pradeep Kamath commented on PIG-1302: - +1 Include zebra's pigtest ant target as a part of pig's ant test target --- Key: PIG-1302 URL: https://issues.apache.org/jira/browse/PIG-1302 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Giridharan Kesavan Attachments: PIG-1302.patch There are changes made in Pig interfaces which break zebra loaders/storers. It would be good to run the pig tests in the zebra unit tests as part of running pig's core-test for each patch submission. So essentially in the test ant target in pig, we would need to invoke zebra's pigtest target. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Attachment: PIG-1433-for-branch-0.7.patch The original patch was committed to trunk. It did not apply for branch-0.7 - so I have attached a new patch with minor modifications for branch-0.7. This latter patch was committed to branch-0.7 pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0, 0.8.0 Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Fix Version/s: 0.7.0 Resolution: Fixed pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0, 0.7.0 Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875292#action_12875292 ] Pradeep Kamath commented on PIG-1433: - Hudson seems to be unresponsive - I ran unit tests locally and they completed successfully. The test-patch ant target also came back successfully except for a html page change in the release audit warnings which can be ignored. Patch is ready for review. pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Status: Patch Available (was: Open) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1433: Attachment: PIG-1433.patch Attached patch addresses the issue in MapReduceLauncher by creating a _SUCCESS file for stores which are part of successful jobs if the property is set in the job. pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- Key: PIG-1433 URL: https://issues.apache.org/jira/browse/PIG-1433 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1433.patch pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
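As a rough sketch of the behavior the patch adds - not the actual MapReduceLauncher code, and using java.io plus a plain Properties object in place of Hadoop's FileSystem and JobConf APIs - marking a successful store might look like:

```java
import java.io.File;
import java.io.IOException;
import java.util.Properties;

public class SuccessMarker {
    static final String MARK_PROP =
        "mapreduce.fileoutputcommitter.marksuccessfuljobs";

    // Create an empty _SUCCESS marker in a store's output directory
    // when the property is set to true in the job configuration.
    // Returns true if the marker file was newly created.
    static boolean markIfConfigured(Properties conf, File outputDir)
            throws IOException {
        if (!Boolean.parseBoolean(conf.getProperty(MARK_PROP, "false"))) {
            return false; // property unset or false: no marker
        }
        return new File(outputDir, "_SUCCESS").createNewFile();
    }
}
```

Downstream jobs can then poll for the _SUCCESS marker instead of guessing whether an output directory is complete.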
[jira] Commented: (PIG-1419) Remove user.name from JobConf
[ https://issues.apache.org/jira/browse/PIG-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871845#action_12871845 ] Pradeep Kamath commented on PIG-1419: - +1 Remove user.name from JobConf --- Key: PIG-1419 URL: https://issues.apache.org/jira/browse/PIG-1419 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1419-1.patch, PIG-1419-2.patch In hadoop security, hadoop will use kerberos id instead of unix id. Pig should not set user.name entry in jobconf. This should be decided by hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1419) Remove user.name from JobConf
[ https://issues.apache.org/jira/browse/PIG-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871389#action_12871389 ] Pradeep Kamath commented on PIG-1419: - +1 Minor observation in GruntParser.java: {noformat} 565 if (path == null) { 566 if (mDfs instanceof HDataStorage) { 567 container = mDfs.asContainer(((HDataStorage)mDfs). 568 getHFS().getHomeDirectory().toString()); 569 } else 570 container = mDfs.asContainer("/user/" + System.getProperty("user.name")); {noformat} Would the else ever get executed? (I think currently mDfs is always an instance of HDataStorage right?) If this is just to make it future proof, then I am fine keeping it. Minor style comment - would be good to enclose the else in {} even though it is a single statement - there is another statement right below the container = ... statement - so it would be more readable with a {} block. Remove user.name from JobConf --- Key: PIG-1419 URL: https://issues.apache.org/jira/browse/PIG-1419 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1419-1.patch In hadoop security, hadoop will use kerberos id instead of unix id. Pig should not set user.name entry in jobconf. This should be decided by hadoop. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
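The reviewer's two points - the /user/&lt;user.name&gt; fallback branch and bracing the single-statement else - can be sketched in isolation. HDataStorage and mDfs from the quoted snippet are replaced by a plain boolean here, since this is only an illustration:

```java
public class HomeDirSketch {
    // Mirrors the quoted GruntParser logic: use the filesystem's own
    // home directory when available, else fall back to
    // /user/<user.name>. The instanceof HDataStorage check from the
    // snippet is reduced to a boolean parameter for this sketch.
    static String homeDir(boolean isHdfs, String hdfsHome) {
        if (isHdfs) {
            return hdfsHome;
        } else {
            // braced even though it is a single statement, per the
            // style comment above
            return "/user/" + System.getProperty("user.name");
        }
    }
}
```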
[jira] Commented: (PIG-1403) Make Pig work with remote HDFS in secure mode
[ https://issues.apache.org/jira/browse/PIG-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867610#action_12867610 ] Pradeep Kamath commented on PIG-1403: - In QueryParser.jjt: the if (uri.getHost() != null) check seems redundant since there is already a check before in the code. Why is port not considered? Is it not needed to be added into the hadoop conf? If I have a url of the form hdfs://differentnamenode:<non standard port>, will it work with hadoop security? Why is the following change in HExecutionEngine required? -jc.addResource("pig-cluster-hadoop-site.xml"); Am wondering if this will be a backward incompatible change if users have been using pig-cluster-hadoop-site.xml for site specific properties. Otherwise +1 Make Pig work with remote HDFS in secure mode - Key: PIG-1403 URL: https://issues.apache.org/jira/browse/PIG-1403 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Daniel Dai Fix For: 0.7.0, 0.8.0 Attachments: PIG-1403-1.patch Access to remote HDFS is currently broken. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1403) Make Pig work with remote HDFS in secure mode
[ https://issues.apache.org/jira/browse/PIG-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867626#action_12867626 ] Pradeep Kamath commented on PIG-1403: - +1 Make Pig work with remote HDFS in secure mode - Key: PIG-1403 URL: https://issues.apache.org/jira/browse/PIG-1403 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Daniel Dai Fix For: 0.7.0, 0.8.0 Attachments: PIG-1403-1.patch, PIG-1403-2.patch Access to remote HDFS is currently broken. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
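The port question raised in the review can be made concrete with java.net.URI: a non-standard port only reaches the configuration if it is carried along with the host. This is an illustrative sketch, not the QueryParser.jjt code:

```java
import java.net.URI;

public class RemoteHdfsSketch {
    // Extract the namenode authority from an hdfs:// location so it
    // could be registered with a job configuration. Returns null for
    // locations like hdfs:///path that carry no host at all.
    static String namenodeOf(String location) {
        URI uri = URI.create(location);
        if (uri.getHost() == null) {
            return null;
        }
        // Keep a non-default port attached to the host; silently
        // dropping it is exactly the concern the review raises for
        // urls like hdfs://differentnamenode:<non standard port>.
        return uri.getPort() == -1
                ? uri.getHost()
                : uri.getHost() + ":" + uri.getPort();
    }
}
```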
[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864386#action_12864386 ] Pradeep Kamath commented on PIG-1211: - core unit tests pass on my local machine - the errors reported above seem to be related to the environment. The release audit warning is due to a html file change and can be ignored - the patch is ready for review. Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar; dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2; b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option; it runs successfully till the first store but later fails with a syntax error. The usage of the HDFS option rmf causes the first store to execute. The only option that I have is to run an explain before running this script: grunt> explain -script myscript.pig -out explain.out or moving the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? 
In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables Thanks Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Summary: har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) (was: har url not usable in Pig scripts) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) --- Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URLs in grunt yields {noformat} grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 
13 more {noformat} According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits
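The difference between the two URL forms comes down to how the har authority encodes the underlying filesystem: in har://hdfs-namenode:port/path the authority names both the wrapped scheme (hdfs) and its namenode, while har:///path leaves the authority empty - hence the "No FileSystem for scheme" failure above when the authority is malformed. A hypothetical parsing sketch with java.net.URI (not the HarFileSystem code itself):

```java
import java.net.URI;

public class HarUriSketch {
    // Pull the underlying filesystem scheme out of a har URL's
    // authority, e.g. "hdfs" from har://hdfs-namenode:8020/path.
    // Returns null when no underlying fs is encoded, which is the
    // har:///path case reported above.
    static String underlyingScheme(String harLocation) {
        URI uri = URI.create(harLocation);
        String authority = uri.getAuthority();
        if (authority == null || authority.indexOf('-') < 0) {
            return null;
        }
        // The wrapped scheme is everything before the first '-'.
        return authority.substring(0, authority.indexOf('-'));
    }
}
```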
[jira] Resolved: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1378. - Release Note: The fix described in this jira depends on an issue with Hadoop code which was fixed on the hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that goes into a hadoop release which is used by pig, this will remain an issue Resolution: Fixed Am closing this bug since the pig changes are in and hadoop changes are in trunk - this should work once we use the appropriate hadoop release. har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) --- Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URL's in grunt yields {noformat} grunt a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
        at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
        ...
13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description
{noformat}
grunt a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
        ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175
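The two failures in PIG-1378 hinge on the shape of the har URI: as the subject line notes, har:///path fails while har://hdfs-namenode:port/path works, because Hadoop's HarFileSystem expects the underlying filesystem to be encoded in the URI's authority. The following Python sketch is only an illustration of what a generic URI parser sees for the two forms from the bug report; it is not Pig or Hadoop code, and the host/port are made up.

```python
from urllib.parse import urlparse

def describe(uri):
    """Break a URI into the pieces relevant to the har:// discussion."""
    p = urlparse(uri)
    return {"scheme": p.scheme, "authority": p.netloc, "path": p.path}

# Authority-less form that failed in PIG-1378: there is nothing in the
# authority to tell HarFileSystem which underlying filesystem to use.
no_authority = describe("har:///user/viraj/project/subproject/files/size/data")

# Form with the underlying fs encoded in the authority, which worked
# (hdfs-namenode:8020 is a hypothetical host:port for illustration).
with_authority = describe("har://hdfs-namenode:8020/user/viraj/project/subproject/files/size/data")
```

Running this shows `no_authority["authority"]` is an empty string while `with_authority["authority"]` carries the `scheme-host:port` triple that HarFileSystem needs to resolve the underlying filesystem.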
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Status: Resolved (was: Patch Available) Hadoop Flags: [Incompatible change, Reviewed] Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check syntax of the pig script Resolution: Fixed Patch committed to trunk Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2 b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option, it runs successfully till the first store but later fails with a syntax error. The usage of HDFS option, rmf causes the first store to execute. 
The only option I have is to run an explain before running this script (grunt explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401-3.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863859#action_12863859 ] Pradeep Kamath commented on PIG-1401: - The release audit warning is due to the new test script file added in the patch and can be ignored - the patch is ready for review. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401-3.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863997#action_12863997 ] Pradeep Kamath commented on PIG-1378: - Spoke with a developer on the hadoop team to confirm that this is an issue with Hadoop code fixed on the hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that goes into a hadoop release which is used by pig, this will remain an issue - not sure if we should keep this jira open until that point - am fine if we should. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URL's in grunt yields {noformat} grunt a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
        at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
        at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
        ...
13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description
{noformat}
grunt a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
        ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Status: Patch Available (was: Open) Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2 b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option, it runs successfully till the first store but later fails with a syntax error. The usage of HDFS option, rmf causes the first store to execute. The only option I have is to run an explain before running this script (grunt explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Attachment: PIG-1211.patch Attached patch addresses the issue by adding support for a check script option. For this purpose, the -c command line option is reused thus fixing https://issues.apache.org/jira/browse/PIG-1382 (Command line option -c doesn't work ...Currently this option is not used...). The implementation of this check option piggybacks on explain -script and just modifies the GruntParser code to not output the explain output. Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way {code} register cp.jar dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5); filtered_dataset = filter dataset by (col1 == 1); proj_filtered_dataset = foreach filtered_dataset generate col2, col3; rmf $output1; store proj_filtered_dataset into '$output1' using PigStorage(); second_stream = foreach filtered_dataset generate col2, col4, col5; group_second_stream = group second_stream by col4; output2 = foreach group_second_stream { a = second_stream.col2 b = distinct second_stream.col5; c = order b by $0; generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc; } rmf $output2; --syntax error here store output2 to '$output2' using PigStorage(); {code} I run this script using the Multi-query option, it runs successfully till the first store but later fails with a syntax error. The usage of HDFS option, rmf causes the first store to execute. 
The only option I have is to run an explain before running this script (grunt explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1382) Command line option -c doesn't work
[ https://issues.apache.org/jira/browse/PIG-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1382. - Hadoop Flags: [Incompatible change] Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check syntax of the pig script Assignee: Pradeep Kamath Resolution: Fixed Fixed through https://issues.apache.org/jira/browse/PIG-1211?focusedCommentId=12864002page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12864002 Command line option -c doesn't work --- Key: PIG-1382 URL: https://issues.apache.org/jira/browse/PIG-1382 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Pradeep Kamath Fix For: 0.8.0 Currently this option is not used, but it's documented: -c, -cluster clustername, kryptonite is default We should either remove it from documentation or find someway to use it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed The javac warning is due to generated javacc code and cannot be avoided. I ran all unit tests on my local machine and they passed - patch committed to trunk. Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Priority: Minor Fix For: 0.8.0 Attachments: PIG-740.patch Consider the Pig script with the error that a String with double quotes {code}www\\.{code} is used instead of a single quote {code}'www\\.'{code} in the UDF string.REPLACEALL() {code} register string-2.0.jar; A = load 'inputdata' using PigStorage() as ( curr_searchQuery ); B = foreach A { domain = string.REPLACEALL(curr_searchQuery,^www\\.,''); generate domain; }; dump B; {code} I get the following error message where Line 11 points to the end of file. The error message should point to Line 5. === 2009-03-31 01:33:38,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-03-31 01:33:39,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-03-31 01:33:39,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: EOF after : Details at logfile: /home/viraj/pig-svn/trunk/pig_1238463218046.log === The log file contains the following contents === ERROR 1000: Error during parsing. Lexical error at line 11, column 0. 
Encountered: EOF after :
org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 11, column 0. Encountered: EOF after :
        at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2739)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:778)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:89)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
        at org.apache.pig.Main.main(Main.java:352)
=== -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Attachment: PIG-1401.patch Attached patch addresses the issue by checking internal state in GruntParser to check if the current execution is in explain -script mode and if so, ignores grunt commands like run, copy etc. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1398) Marking Pig interfaces for org.apache.pig.data package
[ https://issues.apache.org/jira/browse/PIG-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863502#action_12863502 ] Pradeep Kamath commented on PIG-1398: - +1 Marking Pig interfaces for org.apache.pig.data package -- Key: PIG-1398 URL: https://issues.apache.org/jira/browse/PIG-1398 Project: Pig Issue Type: Sub-task Components: documentation Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Fix For: 0.8.0 Attachments: PIG-1398.patch Marking Pig interfaces for stability and audience, as well as javadoc cleanup, for the data package. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Open (was: Patch Available) The patch did not contain the test script file- will attach new patch shortly explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Attachment: PIG-1401-2.patch New patch includes test script file needed for the unit test. It also has some changes in code to not call executeBatch() in explain -script mode. Also fs .. commands also invoke executeBatch() now - this was missing but is required since the fs command could be a delete/move/copy command which should result in an execution of the current batch just like the rm, mv and cp grunt statements do. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
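The mechanism discussed across PIG-1211 and PIG-1401 - grunt commands like rm, mv, cp and fs flushing the pending multi-query batch via executeBatch(), while explain -script must not execute anything - can be modeled with a small sketch. This is hypothetical illustrative Python, not Pig's actual GruntParser; the class and method names are invented for the illustration.

```python
# Commands that mutate the filesystem and therefore must flush the batch:
# a delete/move/copy could touch a directory that a pending store writes to.
FS_MUTATING = {"rm", "rmf", "mv", "cp", "fs"}

class BatchModel:
    """Toy model of multi-query batching in a Grunt-like parser."""

    def __init__(self, explain_only=False):
        self.explain_only = explain_only  # explain -script mode: never execute
        self.pending = []                 # deferred Pig statements (multi-query)
        self.executed = []                # batches that actually ran

    def statement(self, stmt):
        # Pig Latin statements are accumulated, not run immediately.
        self.pending.append(stmt)

    def grunt_command(self, cmd):
        # Filesystem-mutating grunt commands force the current batch to run
        # first -- unless we are only explaining the script.
        if cmd in FS_MUTATING:
            self.execute_batch()

    def execute_batch(self):
        if self.explain_only or not self.pending:
            return
        self.executed.append(list(self.pending))
        self.pending.clear()

# Normal run: rmf flushes the batch, so the pending store executes early --
# this is why the PIG-1211 script ran "half way" before the syntax error.
run = BatchModel()
run.statement("store proj_filtered_dataset into '$output1'")
run.grunt_command("rmf")

# explain -script mode: the same rmf is ignored and nothing executes.
explain = BatchModel(explain_only=True)
explain.statement("store proj_filtered_dataset into '$output1'")
explain.grunt_command("rmf")
```

The sketch shows why PIG-1401's fix needed both pieces: fs commands must call the batch-flush path in normal runs, and the same path must become a no-op in explain -script mode.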
[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401.patch explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: explain alias statement in the script will still cause all grunt commands upto the explain to be executed. This issue only fixes the behavior of explain -script script file wherein any grunt commands like run, dump, copy, fs .. present in the supplied script file will need to be ignored. This should be documented in the release in which this jira will be resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed The Hudson test failures seem to be due to a temporary environment issue - I ran all unit tests locally and the run was successful. Patch committed to trunk. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell: {noformat} $ hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw------- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URLs in grunt yields: {noformat} grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 13 more {noformat} According to http://issues.apache.org/jira/browse/PIG-1234 I tried the following, as stated in that issue's original description: {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) {noformat} Viraj
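The second failure above ("No FileSystem for scheme: namenode-location") follows from where the hostname lands during URI parsing: in a har:// URL the authority slot is expected to describe the underlying filesystem, so a bare "namenode-location" in that slot ends up being resolved as if it named a filesystem scheme. A minimal illustration with Python's urllib.parse - this shows standard URI parsing only and is an assumption-laden sketch, not Hadoop's HarFileSystem code:

```python
# Where "namenode-location" lands when a har:// URL is parsed: in the
# authority (netloc) component, which Hadoop's har scheme interprets as a
# description of the underlying filesystem - hence the attempt to resolve
# "namenode-location" as a filesystem scheme. Standard URI parsing only.
from urllib.parse import urlparse

def har_parts(url):
    """Split a har:// URL into (scheme, authority, path)."""
    p = urlparse(url)
    return p.scheme, p.netloc, p.path

scheme, authority, path = har_parts(
    "har://namenode-location/user/viraj/project/subproject/files/size/data")
# authority carries "namenode-location"; the path component no longer does
```

With the `har:///...` form (empty authority), the path survives intact but the underlying filesystem is left unspecified, which matches the first error ("Incompatible file URI scheme: har : hdfs") rather than the second.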
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Assignee: Pradeep Kamath Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Priority: Minor Fix For: 0.8.0 Attachments: PIG-740.patch Consider a Pig script with the error that a string in double quotes {code}"www\\."{code} is used instead of one in single quotes {code}'www\\.'{code} in the UDF string.REPLACEALL(): {code} register string-2.0.jar; A = load 'inputdata' using PigStorage() as ( curr_searchQuery ); B = foreach A { domain = string.REPLACEALL(curr_searchQuery,"^www\\.",''); generate domain; }; dump B; {code} I get the following error message, where line 11 points to the end of the file. The error message should point to line 5. === 2009-03-31 01:33:38,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-03-31 01:33:39,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-03-31 01:33:39,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: <EOF> after : Details at logfile: /home/viraj/pig-svn/trunk/pig_1238463218046.log === The log file contains the following contents: === ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: <EOF> after : org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 11, column 0. Encountered: <EOF> after : at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2739) at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:778) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:89) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) ===
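The misleading "line 11" in the error above is characteristic of an unterminated quoted string: the tokenizer opens a literal (here on line 5) but, never recognizing its terminator, scans to end of input and reports the EOF position instead of where the literal began. A sketch of the effect - illustrative only, not PigScriptParser's actual token manager:

```python
# Sketch of the line-number bug: a tokenizer that opens a quoted string but
# never sees a terminator it recognizes runs to end of input and reports the
# EOF line, losing the line where the literal started. Tracking the start
# line (as a fix would) preserves the useful location.
def find_unterminated_quote(text, quote="'"):
    line, start_line, in_string = 1, None, False
    for ch in text:
        if ch == "\n":
            line += 1
        elif ch == quote:
            if in_string:
                in_string = False          # literal closed normally
                start_line = None
            else:
                in_string = True           # literal opened here
                start_line = line
    # At EOF: `line` is what the buggy parser reports;
    # `start_line` is the line a helpful message should point to.
    return (line, start_line) if in_string else (line, None)

script = "a = load 'in';\nb = foreach a generate 'oops;\n\ndump b;\n"
eof_line, open_line = find_unterminated_quote(script)
# eof_line is the last line of input; open_line is where the quote opened
```

In the toy script the quote opens on line 2 but the scan only stops at line 5 (EOF), mirroring how PIG-740's error points at line 11 instead of line 5.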
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Status: Patch Available (was: Open) Fix Version/s: 0.8.0 Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Priority: Minor Fix For: 0.8.0 Attachments: PIG-740.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Patch Available (was: Open) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378-3.patch The golden-file change in the last patch was not correct - this updated patch contains just that fix. All unit tests ran successfully locally with the new patch, so it is ready for review. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Open (was: Patch Available) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Open (was: Patch Available) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378-4.patch Realized that a stray change in TestMRCompiler got into the previous patch - attaching a new patch with just that change removed. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch
[jira] Updated: (PIG-740) Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF
[ https://issues.apache.org/jira/browse/PIG-740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-740: --- Attachment: PIG-740.patch GruntParser was mishandling double quotes within foreach blocks: it treated an opening double quote the same way as a single quote and never handled the closing double quote. The patch addresses the bug by handling double quotes correctly. Incorrect line number is generated when a string with double quotes is used instead of single quotes and is passed to UDF -- Key: PIG-740 URL: https://issues.apache.org/jira/browse/PIG-740 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Priority: Minor Attachments: PIG-740.patch Consider a Pig script with the error that a string with double quotes {code}"www\\."{code} is used instead of a single-quoted one {code}'www\\.'{code} in the UDF string.REPLACEALL() {code} register string-2.0.jar; A = load 'inputdata' using PigStorage() as ( curr_searchQuery ); B = foreach A { domain = string.REPLACEALL(curr_searchQuery,"^www\\.",''); generate domain; }; dump B; {code} I get the following error message, where Line 11 points to the end of the file. The error message should point to Line 5. === 2009-03-31 01:33:38,403 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000 2009-03-31 01:33:39,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001 2009-03-31 01:33:39,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 11, column 0. Encountered: EOF after : Details at logfile: /home/viraj/pig-svn/trunk/pig_1238463218046.log === The log file contains the following contents === ERROR 1000: Error during parsing. Lexical error at line 11, column 0. 
Encountered: EOF after : org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 11, column 0. Encountered: EOF after : at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2739) at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:778) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:89) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88) at org.apache.pig.Main.main(Main.java:352) === -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
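For reference, the pattern the script intends to pass to string.REPLACEALL behaves like Java's String.replaceAll below. That the UDF jar delegates to replaceAll is an assumption; the quoting fix itself belongs in the Pig script.

```java
// The Pig script's string.REPLACEALL(curr_searchQuery, '^www\\.', '') is,
// at bottom, Java's String.replaceAll. Shown here in plain Java to make the
// correctly quoted pattern concrete; that string-2.0.jar delegates to
// replaceAll is an assumption, not confirmed by the issue.
public class ReplaceAllDemo {
    public static void main(String[] args) {
        String query = "www.example.com";
        // "^www\\." in Java source is the regex ^www\. : a literal "www."
        // anchored at the start of the string.
        String domain = query.replaceAll("^www\\.", "");
        System.out.println(domain); // example.com
    }
}
```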
[jira] Created: (PIG-1397) GruntParser should invoke executeBatch() first in processFsCommand()
GruntParser should invoke executeBatch() first in processFsCommand() Key: PIG-1397 URL: https://issues.apache.org/jira/browse/PIG-1397 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Fix For: 0.8.0 If a script has multiple stores that can be combined by multiquery optimization, and the script also has file-system-modifying commands like cp, mv, or rm, GruntParser currently executes the pending plan up to the file system command, so that the multiquery-optimized portion runs against the file system only after it has been modified (for example, some portion of the multiquery-optimized script might depend on the cp/mv/rm command having run first). This is not done for "fs ..." commands - GruntParser should do the same for "fs ..." commands in processFsCommand()
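The requested behavior amounts to flushing the batched multiquery plan before any command that mutates the filesystem. A toy simulation of that ordering (class and method bodies here are hypothetical; only the executeBatch()-before-fs ordering mirrors the issue):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch: a parser that batches store jobs (multiquery optimization)
// and must flush them before any filesystem-mutating command, mirroring
// the fix requested for processFsCommand(). Not the real GruntParser.
public class BatchingParser {
    private final List<String> pendingStores = new ArrayList<>();
    final List<String> executionLog = new ArrayList<>();

    void registerStore(String alias) {
        pendingStores.add(alias); // multiquery: defer execution
    }

    void executeBatch() {
        if (!pendingStores.isEmpty()) {
            executionLog.add("run-batch:" + String.join(",", pendingStores));
            pendingStores.clear();
        }
    }

    // The bug: "fs ..." commands ran without flushing the batch first.
    // The fix: call executeBatch() before touching the filesystem.
    void processFsCommand(String cmd) {
        executeBatch();
        executionLog.add("fs:" + cmd);
    }

    public static void main(String[] args) {
        BatchingParser p = new BatchingParser();
        p.registerStore("B");
        p.processFsCommand("mv /tmp/out /tmp/final");
        System.out.println(p.executionLog);
    }
}
```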
[jira] Commented: (PIG-1394) POCombinerPackage hold too much memory for InternalCachedBag
[ https://issues.apache.org/jira/browse/PIG-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862430#action_12862430 ] Pradeep Kamath commented on PIG-1394: - +1 POCombinerPackage hold too much memory for InternalCachedBag Key: PIG-1394 URL: https://issues.apache.org/jira/browse/PIG-1394 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1394-1.patch, PIG-1394-2.patch In POCombinerPackage, we create a bunch of InternalCachedBags, one per algebraic UDF in use. However, when we create an InternalCachedBag, we use the default constructor, which assumes only one InternalCachedBag exists in the system. It turns out we reserve way too much memory for each InternalCachedBag.
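The underlying fix idea is proportional sizing: when N cached bags share one memory budget, each should claim budget/N rather than the whole budget. A toy sketch of that arithmetic (names are illustrative, not Pig's actual InternalCachedBag API):

```java
// Sketch of the idea behind the fix: when N in-memory bags share a single
// cache budget, each bag must size itself to budget/N instead of assuming
// it is the only bag in the system. Names are illustrative, not Pig's API.
public class CachedBagBudget {
    static long perBagLimitBytes(long totalCacheBytes, int bagCount) {
        if (bagCount < 1) {
            throw new IllegalArgumentException("bagCount must be >= 1");
        }
        return totalCacheBytes / bagCount;
    }

    public static void main(String[] args) {
        // e.g. a 100 MB cache shared by 4 combiner bags -> 25 MB each
        System.out.println(perBagLimitBytes(100L * 1024 * 1024, 4));
    }
}
```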
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Open (was: Patch Available) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URLs in grunt yields {noformat} grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 
13 more {noformat} According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) {noformat} Viraj
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Patch Available (was: Open) har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378-2.patch Attached new patch addressing unit test failures - mostly due to the fact that the new patch no longer converts locations which are already absolute like '/foo/bar' har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378.patch
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Attachment: PIG-1378.patch Attached patch addresses the issue in the description by changing the LoadFunc.relativeToAbsolutePath() implementation to only convert input locations if the location does not have a scheme or the path in the location is not absolute. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1378.patch
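The relativeToAbsolutePath() change described in the attachment comment reduces to a guard: leave a load location untouched when it already carries a URI scheme (like har://) or an absolute path. A rough sketch of that guard (illustrative names, not the actual Pig implementation):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the guard described in the patch comment: only rewrite a load
// location relative to the current working directory when it has no URI
// scheme and is not already absolute. Illustrative, not Pig's actual code.
public class LocationGuard {
    static String toAbsolute(String location, String cwd) {
        URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            return location; // unparseable: leave it for later validation
        }
        if (uri.getScheme() != null || location.startsWith("/")) {
            // har://..., hdfs://..., or /absolute/path: leave untouched
            return location;
        }
        return cwd.endsWith("/") ? cwd + location : cwd + "/" + location;
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("har:///user/viraj/data", "/user/me"));
        System.out.println(toAbsolute("inputdata", "/user/me"));
    }
}
```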
[jira] Updated: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1378: Status: Patch Available (was: Open) Assignee: Pradeep Kamath har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378.patch
[jira] Commented: (PIG-1395) Mapside cogroup runs out of memory
[ https://issues.apache.org/jira/browse/PIG-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12861611#action_12861611 ] Pradeep Kamath commented on PIG-1395: - +1, the comment can be updated to reflect the nature of the comparison in the code - currently the comment and code seem to differ. Otherwise the change looks good. Mapside cogroup runs out of memory -- Key: PIG-1395 URL: https://issues.apache.org/jira/browse/PIG-1395 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: cogrp_mem.patch In the particular scenario where a relation does not have many tuples with the same key (i.e. there are few repeating keys), map tasks doing cogroup fail with a GC overhead exception.
[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types
[ https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1371: Attachment: PIG-1371-partial.patch Partial patch - attaching here for future reference Pig should handle deep casting of complex types Key: PIG-1371 URL: https://issues.apache.org/jira/browse/PIG-1371 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1371-partial.patch Consider input data in BinStorage format which has a field of bag type - bg:{t:(i:int)}. If the schema specified in the load statement gives this field the type bg:{t:(c:chararray)}, the current behavior is that Pig treats the field as having the type specified in the load statement (bg:{t:(c:chararray)}), but no deep cast from bag of int (the real data) to bag of chararray (the user-specified schema) is made. There are two issues currently: 1) The TypeCastInserter only considers the byte 'type' between the loader-presented schema and the user-specified schema to decide whether to introduce a cast. In the above case, since both schemas have the type bag, no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed. 2) POCast should be changed to handle casting a complex type to the type specified in the user-supplied FieldSchema. There is one issue to be considered here - if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be: * A bag with two fields - the int field and a null? In this approach Pig assumes the lone field in the data is the first field, which might be incorrect if it is in fact the second field. * A null bag, to indicate that the bag is of unknown value - this is the one I personally prefer * The cast throws an IncompatibleCastException
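Issue (1) above - checking the full FieldSchema rather than just the outer byte type - can be sketched as a recursive comparison. The FS class and type constants below are simplified stand-ins, not Pig's real Schema/FieldSchema API:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the deeper check issue (1) calls for: compare full field
// schemas, including inner sub-schemas, instead of only the outer byte
// type. FS and the type constants are simplified stand-ins for Pig's
// Schema.FieldSchema and DataType; values are illustrative.
public class DeepSchemaCheck {
    static final byte INT = 10, CHARARRAY = 55, TUPLE = 110, BAG = 120;

    static class FS {
        final byte type;
        final List<FS> inner; // null for atomic types like int/chararray
        FS(byte type, FS... inner) {
            this.type = type;
            this.inner = inner.length == 0 ? null : Arrays.asList(inner);
        }
    }

    // A cast is needed whenever the declared schema differs anywhere,
    // even if the outer types (e.g. BAG vs BAG) match.
    static boolean needsCast(FS data, FS declared) {
        if (data.type != declared.type) return true;
        if (data.inner == null && declared.inner == null) return false;
        if (data.inner == null || declared.inner == null) return true;
        if (data.inner.size() != declared.inner.size()) return true;
        for (int i = 0; i < data.inner.size(); i++) {
            if (needsCast(data.inner.get(i), declared.inner.get(i))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FS data = new FS(BAG, new FS(TUPLE, new FS(INT)));              // bg:{t:(i:int)}
        FS declared = new FS(BAG, new FS(TUPLE, new FS(CHARARRAY)));    // bg:{t:(c:chararray)}
        System.out.println(needsCast(data, declared)); // deep mismatch
        System.out.println(needsCast(data, data));     // identical schemas
    }
}
```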
[jira] Updated: (PIG-1392) Parser fails to recognize valid field
[ https://issues.apache.org/jira/browse/PIG-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1392: Fix Version/s: 0.8.0 (was: 0.7.0) Unlinking this from the 0.7 release and moving it to 0.8 since there is a workaround. Parser fails to recognize valid field - Key: PIG-1392 URL: https://issues.apache.org/jira/browse/PIG-1392 Project: Pig Issue Type: Bug Reporter: Ankur Fix For: 0.8.0 With the script below, the parser fails to recognize a valid field in the relation and throws an error A = LOAD '/tmp' as (a:int, b:chararray, c:int); B = GROUP A BY (a, b); C = FOREACH B { bg = A.(b,c); GENERATE group, bg; } ; The error thrown is 2010-04-23 10:16:20,610 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: c in {group: (a: int,b: chararray),A: {a: int,b: chararray,c: int}}
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859449#action_12859449 ] Pradeep Kamath commented on PIG-1378: - Adding to previous comment the har url has to be of the form (note the hdfs- prefix in the authority part): har://hdfs-namenodehost:namenodeport/datalocation har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Fix For: 0.8.0 I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell {noformat} $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data' Found 1 items -rw--- 5 viraj users1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1 {noformat} Using similar URL's in grunt yields {noformat} grunt a = load 'har:///user/viraj/project/subproject/files/size/data'; grunt dump a; {noformat} {noformat} 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs 2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to. 
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114) at org.apache.pig.PigServer.registerQuery(PigServer.java:425) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75) at org.apache.pig.Main.main(Main.java:357) Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249) at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62) at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472) ... 
13 more {noformat} According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description {noformat} grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data'; grunt> dump a; {noformat} {noformat} Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more Caused by: java.io.IOException: No FileSystem for scheme: namenode-location at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196) at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:246) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:245) {noformat} Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
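Pradeep's comment above pins down the authority format that makes a har URL resolvable: the underlying filesystem scheme is prefixed to the namenode authority. A minimal Java sketch of that rule follows; the helper name and the example host/port are hypothetical, not Pig or Hadoop API:

```java
// Sketch only: building a har:// URL of the form described above,
// har://hdfs-namenodehost:namenodeport/datalocation. The "hdfs-" prefix on
// the authority is what lets HarFileSystem locate the real namenode instead
// of treating the host as an unknown filesystem scheme.
public class HarUrl {
    // Hypothetical helper: format a har URL for data archived on HDFS.
    public static String toHarUrl(String namenodeHost, int namenodePort, String dataLocation) {
        return "har://hdfs-" + namenodeHost + ":" + namenodePort + dataLocation;
    }
}
```

The failing case in the stack trace above (scheme "namenode-location") corresponds to omitting exactly this prefix.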
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed All tests pass on my local machine - patch committed to 0.7 and trunk Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1363) Unnecessary loadFunc instantiations
[ https://issues.apache.org/jira/browse/PIG-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857139#action_12857139 ] Pradeep Kamath commented on PIG-1363: - +1 Unnecessary loadFunc instantiations --- Key: PIG-1363 URL: https://issues.apache.org/jira/browse/PIG-1363 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.8.0 Attachments: pig-1363.patch In MRCompiler loadfuncs are instantiated at multiple locations in different visit methods. This is inconsistent and confusing. LoadFunc should be instantiated at only one place, ideally in LogToPhyTanslation#visit(LOLoad). A getter should be added to POLoad to retrieve this instantiated loadFunc wherever it is needed in later stages of compilation. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Attachment: PIG-1372-2.patch Regenerated patch against latest trunk (same changes). Here are the results of running test-patch ant target: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Patch Available (was: Open) Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Open (was: Patch Available) Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372-2.patch, PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (PIG-1370) Marking Pig interfaces for org.apache.pig package
[ https://issues.apache.org/jira/browse/PIG-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856496#action_12856496 ] Pradeep Kamath commented on PIG-1370: - bq. But it is accepted as an arg to one of ResourceSchema's constructors. I think that makes it public, unless we want to say that constructor isn't intended for public use (in which case, why is it public?). This constructor is called from internal Pig code and we should not expose this to users - if we don't make the constructor public we cannot call this constructor since the callers are in different packages - I really think we need an annotation to say internal-use so we can annotate some of the public methods which we don't want users to use. bq. I did mark ComparisonFunc as deprecated. Are you saying we should just remove it instead of deprecate it? I think for now deprecated is fine. Marking Pig interfaces for org.apache.pig package - Key: PIG-1370 URL: https://issues.apache.org/jira/browse/PIG-1370 Project: Pig Issue Type: Sub-task Components: documentation Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1370.patch, PIG-1370_2.patch Done as a separate JIRA from PIG-1311 since this alone contains quite a lot of changes. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Fix Version/s: 0.7.0 Resolution: Fixed Patch committed to trunk and branch-0.7 POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (PIG-1371) Pig should handle deep casting of complex types
Pig should handle deep casting of complex types Key: PIG-1371 URL: https://issues.apache.org/jira/browse/PIG-1371 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath Fix For: 0.8.0 Consider input data in BinStorage format which has a field of bag type - bg:{t:(i:int)}. In the load statement, if the schema specified has the type for this field specified as bg:{t:(c:chararray)}, the current behavior is that Pig thinks of the field to be of the type specified in the load statement (bg:{t:(c:chararray)}) but no deep cast from bag of int (the real data) to bag of chararray (the user specified schema) is made. There are two issues currently: 1) The TypeCastInserter only considers the byte 'type' between the loader presented schema and user specified schema to decide whether to introduce a cast or not. In the above case, since both schemas have the type bag, no cast is inserted. This check has to be extended to consider the full FieldSchema (with inner subschema) in order to decide whether a cast is needed. 2) POCast should be changed to handle casting a complex type to the type specified in the user supplied FieldSchema. There is one issue to be considered here - if the user specified the cast type to be bg:{t:(i:int, j:int)} and the real data had only one field, what should the result of the cast be: * A bag with two fields - the int field and a null? - In this approach pig is assuming the lone field in the data is the first field, which might be incorrect if it in fact is the second field. * A null bag to indicate that the bag is of unknown value - this is the one I personally prefer * The cast throws an IncompatibleCastException -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Patch Available (was: Open) POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Open (was: Patch Available) The unit tests all run successfully on my local machine - the Hudson QA failure was due to a transient port conflict issue - will resubmit - in the meantime the patch is ready for review. POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1323) Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend
[ https://issues.apache.org/jira/browse/PIG-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1323: Status: Resolved (was: Patch Available) Resolution: Invalid There is already a hadoop property mapred.task.id which is set to the map/reduce task id in the backend and is not set in the front end which can be used to figure this out. Hence it is best not to introduce new properties in the configuration for this purpose. Communicate whether the call to LoadFunc.setLocation is being made in hadoop's front end or backend --- Key: PIG-1323 URL: https://issues.apache.org/jira/browse/PIG-1323 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1323.patch Loaders which interact with external systems like a metadata server may need to know if the LoadFunc.setLocation call happens from the frontend (on the client machine) or in the backend (on each map task). The Configuration in the Job argument to setLocation() can contain this information. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Attachment: PIG-1372.patch Attached patch restores PigInputFormat.sJob - however it is deprecated (and so also PigMapReduce.sJobConf for user code) and the javadoc comment indicates to use UDFContext.getUDFContext().getJobConf() instead. No tests are included since this simply restores a static variable for backward compatibility and is not used in pig code. Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
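The backward-compatibility arrangement described above (keep the old static field, deprecate it, and point users at the accessor) can be sketched with stand-in classes. This is illustrative only, with hypothetical names; it is not the real Pig PigInputFormat/UDFContext code:

```java
// Sketch of the deprecate-but-retain pattern from PIG-1372, using stand-ins.
public class CompatExample {
    public static class Conf { }          // stand-in for the job Configuration

    public static class Context {         // stand-in for UDFContext
        private static final Context INSTANCE = new Context();
        private final Conf conf = new Conf();
        public static Context getUDFContext() { return INSTANCE; }
        public Conf getJobConf() { return this.conf; }   // preferred accessor
    }

    public static class InputFormat {     // stand-in for PigInputFormat
        /** @deprecated use Context.getUDFContext().getJobConf() instead. */
        @Deprecated
        public static Conf sJob = Context.getUDFContext().getJobConf();
    }
}
```

Old user code keeps compiling against the deprecated field (with a warning steering it toward the accessor), which is the point of restoring it for one more release.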
[jira] Updated: (PIG-1372) Restore PigInputFormat.sJob for backward compatibility
[ https://issues.apache.org/jira/browse/PIG-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1372: Status: Patch Available (was: Open) Restore PigInputFormat.sJob for backward compatibility -- Key: PIG-1372 URL: https://issues.apache.org/jira/browse/PIG-1372 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1372.patch The preferred method to get the job's Configuration object would be to use UDFContext.getJobConf(). This jira is to restore PigInputFormat.sJob (but we will be marking it deprecated and indicating to use UDFContext.getJobConf() instead) to be backward compatible - we can remove it from pig in a future release. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1366: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1366.patch Under the following conditions, a NullPointerException is caused when PigStorage is used: If in the script, only the 2nd and 3rd column of the data (say) are used, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data contains a row with only one column (malformed data due to missing cols in certain rows), PigStorage returns a Tuple backed by a null ArrayList. Subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854970#action_12854970 ] Pradeep Kamath commented on PIG-1365: - No unit tests have been added since this is just restoring an old class for backward compatibility for users and is no longer used in the pig code. The release audit warning is about an HTML file and can be ignored. WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk and branch-0.7 WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1368) Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases
Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases Key: PIG-1368 URL: https://issues.apache.org/jira/browse/PIG-1368 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Consider the following data: 1\t ( hello , bye ) \n 1\t( hello , bye )a\n 2 \t (good , bye)\n The following script gives the results below: a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a; (1,( hello , bye )) (1,( hello , bye )) (2,(good , bye)) The current bytesToTuple implementation discards leading and trailing characters before the tuple delimiters and parses the tuple out - I think instead it should treat any leading and trailing characters (including space) near the delimiters as an indication of a malformed tuple and return null. Also in the code, consumeBag() should handle the special case of {} and not delegate the handling to consumeTuple(). In consumeBag() null tuples should not be skipped. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
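The stricter parsing behavior proposed above can be sketched in plain Java. This is a hypothetical stand-in (method and class names are made up), not the real Utf8StorageConverter; it only demonstrates the proposed contract of returning null when characters appear outside the tuple delimiters:

```java
// Sketch of the proposed strict bytesToTuple contract from PIG-1368:
// any leading or trailing characters (including spaces) around the
// '(' and ')' delimiters mark the tuple as malformed, yielding null.
import java.util.Arrays;
import java.util.List;

public class StrictTupleParser {
    // Returns the tuple's fields, or null for malformed input.
    public static List<String> bytesToTuple(String s) {
        if (s == null || !s.startsWith("(") || !s.endsWith(")")) {
            return null;                       // junk outside delimiters => malformed
        }
        String body = s.substring(1, s.length() - 1);
        String[] fields = body.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();      // whitespace inside the tuple is tolerated
        }
        return Arrays.asList(fields);
    }
}
```

Under this contract the second and third sample rows above ("1\t( hello , bye )a" and "2 \t (good , bye)") would load the tuple field as null rather than silently discarding the stray characters.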
[jira] Created: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855157#action_12855157 ] Pradeep Kamath commented on PIG-1299: - +1 Implement Pig counter to track number of output rows for each output files Key: PIG-1299 URL: https://issues.apache.org/jira/browse/PIG-1299 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1299.patch, PIG-1299.patch When running a multi-store query, the Hadoop job tracker often displays only 0 for the Reduce output records or Map output records counters. This is incorrect and misleading. Pig should implement an output records counter for each output file in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Attachment: PIG-1369.patch Attached patch addresses the issues mentioned in the description by catching NullPointerException and IndexOutOfBoundsException at appropriate places. POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
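The return-null-on-missing-field contract that the PIG-1369 patch extends can be sketched with a small stand-in helper (hypothetical class, not the real POProject), modeling a tuple as a List:

```java
// Sketch of the PIG-1369 behavior: projecting from a null tuple or past the
// end of a short tuple yields null instead of an uncaught exception, matching
// the existing handling for a non-existent field in the input tuple.
import java.util.List;

public class SafeProject {
    // Project column i from a tuple (modeled as a List), or null if absent.
    public static Object project(List<Object> tuple, int i) {
        try {
            return tuple.get(i);   // tuple may be null, or i may be out of range
        } catch (NullPointerException | IndexOutOfBoundsException e) {
            return null;           // degrade to null, per POProject's contract
        }
    }
}
```

In the real operator the catch sites are spread across the tuple and bag projection paths; the contract is the same in each.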
[jira] Updated: (PIG-1369) POProject does not handle null tuples and non existent fields in some cases
[ https://issues.apache.org/jira/browse/PIG-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1369: Status: Patch Available (was: Open) POProject does not handle null tuples and non existent fields in some cases --- Key: PIG-1369 URL: https://issues.apache.org/jira/browse/PIG-1369 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1369.patch If a field (which is of type Tuple) in the data is null, POProject throws a NullPointerException. Also, while projecting fields from a bag, if a certain tuple in the bag does not contain a field being projected, an IndexOutOfBoundsException is thrown. Since in a similar situation (accessing a non-existent field in the input tuple), POProject catches the IndexOutOfBoundsException and returns null, it should do the same for the above two cases and other cases where similar situations occur. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1362) Provide udf context signature in ensureAllKeysInSameSplit() method of loader
[ https://issues.apache.org/jira/browse/PIG-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1362: Resolution: Fixed Status: Resolved (was: Patch Available) +1 Provide udf context signature in ensureAllKeysInSameSplit() method of loader Key: PIG-1362 URL: https://issues.apache.org/jira/browse/PIG-1362 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Priority: Critical Fix For: 0.7.0 Attachments: backport.patch As a part of PIG-1292 a check was introduced to make sure loader used in collected group-by implements CollectableLoader (new interface in that patch). In its method, loader may use udf context to store some info. We need to make sure that udf context signature is setup correctly in such cases. This is already the case in trunk, need to backport it to 0.7 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Under the following conditions, a NullPointerException occurs when PigStorage is used: if only, say, the 2nd and 3rd columns of the data are used in the script, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data then contains a row with only one column (malformed data due to missing columns in certain rows), PigStorage returns a Tuple backed by a null ArrayList, and subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1366: Attachment: PIG-1366.patch Currently in PigStorage, the ArrayList backing the Tuple returned in getNext() is created in readField(). Under the data conditions explained in the description, readField() never gets called and the ArrayList (mProtoTuple) remains null, causing the eventual NPE. The patch fixes the issue by initializing mProtoTuple to a new ArrayList at the beginning of getNext(). PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1366.patch Under the following conditions, a NullPointerException occurs when PigStorage is used: if only, say, the 2nd and 3rd columns of the data are used in the script, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data then contains a row with only one column (malformed data due to missing columns in certain rows), PigStorage returns a Tuple backed by a null ArrayList, and subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
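The fix described in the comment above can be sketched in isolation. The class below is a hypothetical, heavily simplified stand-in for PigStorage (the names mirror the real mProtoTuple/getNext()/readField(), but this is not actual Pig code): allocating the backing list at the start of getNext() means a row with fewer columns than the projected indices yields an empty tuple instead of one backed by null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the PigStorage fix: mProtoTuple is
// allocated eagerly in getNext() rather than lazily in readField().
class ProtoTupleSketch {
    private List<String> mProtoTuple;        // backs the "tuple" we return
    private final int[] requiredColumns;     // indices pushed via pushProjection()

    ProtoTupleSketch(int[] requiredColumns) {
        this.requiredColumns = requiredColumns;
    }

    // Before the fix, the allocation below lived in the readField()
    // equivalent; a short row never reached a projected column, so the
    // list stayed null and downstream projections hit an NPE.
    List<String> getNext(String line) {
        mProtoTuple = new ArrayList<String>(); // the fix: eager initialization
        String[] fields = line.split("\t");
        for (int col : requiredColumns) {
            if (col < fields.length) {
                mProtoTuple.add(fields[col]);  // stands in for readField()
            }
        }
        return mProtoTuple;                    // never null, possibly empty
    }
}
```

With columns 1 and 2 projected, a malformed one-column row now produces an empty (non-null) tuple rather than triggering the NPE.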
[jira] Updated: (PIG-1366) PigStorage's pushProjection implementation results in NPE under certain data conditions
[ https://issues.apache.org/jira/browse/PIG-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1366: Status: Patch Available (was: Open) PigStorage's pushProjection implementation results in NPE under certain data conditions --- Key: PIG-1366 URL: https://issues.apache.org/jira/browse/PIG-1366 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1366.patch Under the following conditions, a NullPointerException is caused when PigStorage is used: If in the script, only the 2nd and 3rd column of the data (say) are used, the PruneColumns optimization passes this information to PigStorage through the pushProjection() method. If the data contains a row with only one column (malformed data due to missing cols in certain rows), PigStorage returns a Tuple backed by a null ArrayList. Subsequent projection operations on this tuple result in the NPE. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Attachment: PIG-1365.patch The attached patch restores WrappedIOException - the class is not used in Pig code and is provided only for use by UDFs, to maintain backward compatibility. I have marked the class as deprecated so that it can be removed from the Pig code base in a later release. No unit tests have been added since this just restores an old class which is no longer used in the Pig code. WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1365) WrappedIOException is missing from Pig.jar
[ https://issues.apache.org/jira/browse/PIG-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1365: Status: Patch Available (was: Open) WrappedIOException is missing from Pig.jar -- Key: PIG-1365 URL: https://issues.apache.org/jira/browse/PIG-1365 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Pradeep Kamath Priority: Critical Fix For: 0.7.0 Attachments: PIG-1365.patch We need to put it back since UDFs rely on it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854166#action_12854166 ] Pradeep Kamath commented on PIG-1338: - +1 - changes look good. A minor comment: can the following error message be changed from:
{noformat}
Cannot find hadoop configurations in classpath.
{noformat}
to
{noformat}
Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).
{noformat}
Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch, PIG-1338-6.patch Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will break
We should change this to a more intuitive behavior:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1299) Implement Pig counter to track number of output rows for each output files
[ https://issues.apache.org/jira/browse/PIG-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854188#action_12854188 ] Pradeep Kamath commented on PIG-1299: - Changes are mostly good - a few comments:
1) Instead of creating a wrapper RecordWriter in MapReducePOStoreImpl, the incrementing of the counter should be done in POStore.getNext() - POStore holds a reference to MapReducePOStoreImpl, so the counter is available for incrementing. This way, we will still keep our contract to StoreFunc that the RecordWriter instance provided in prepareToWrite() is the same as the one given by StoreFunc.getOutputFormat().getRecordWriter(). With this change, the change to BinStorage should be reverted.
2) Is the check for store.isMultiStore() required in MapReducePOStoreImpl - I think MapReducePOStoreImpl is used only with multi-store POStore(s) - so the check seems redundant.
3) If javac warnings can be addressed, please address them - also unit tests along the lines of those in TestCounters would be good.
Implement Pig counter to track number of output rows for each output files Key: PIG-1299 URL: https://issues.apache.org/jira/browse/PIG-1299 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1299.patch When running a multi-store query, the Hadoop job tracker often displays only 0 for the Reduce output records or Map output records counters. This is incorrect and misleading. Pig should implement an output records counter for each output file in the query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
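Suggestion 1) above can be illustrated with a minimal, hypothetical sketch (not the real POStore/MapReducePOStoreImpl classes): the store operator itself bumps the output-record counter as tuples flow through getNext(), so the RecordWriter handed to StoreFunc.prepareToWrite() remains exactly the one produced by StoreFunc.getOutputFormat().

```java
// Hypothetical sketch of counting output records inside the store operator
// instead of wrapping the RecordWriter.
class StoreSketch {
    private long outputRecords = 0;   // stands in for the Hadoop counter

    // Each non-null tuple forwarded by getNext() increments the counter;
    // no wrapper RecordWriter is needed, preserving the StoreFunc contract.
    Object getNext(Object tuple) {
        if (tuple != null) {
            outputRecords++;
        }
        return tuple;
    }

    long getOutputRecords() { return outputRecords; }
}
```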
RE: Begin a discussion about Pig as a top level project
I agree with Ashutosh and Santhosh. Just based on the current direction of the project I think we are more closely tied with Hadoop now (with Pig 0.7, our load/store interfaces are very closely tied with Hadoop) - hence for now my vote would be a -1 to become a TLP - if there is a change in that direction/philosophy to be really backend agnostic I think we should revisit this question. Pradeep -Original Message- From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com] Sent: Sunday, April 04, 2010 11:11 PM To: pig-dev@hadoop.apache.org Subject: Re: Begin a discussion about Pig as a top level project I concur with Santhosh here. I think the main question we need to answer is how close our ties with Hadoop are currently and how close they will be in the future. When Pig was originally designed the intent was to keep it backend neutral, so much so that there was a reference backend implementation (also known as the local engine) which had nothing to do with Hadoop. But things have changed since then. Hadoop's local mode was adopted in favor of Pig's own local mode. We have moved from being backend agnostic to Hadoop favoring. And while this was happening, it seems we tried to keep the Pig Latin language independent of the hadoop backend while the Pig runtime started to make use of hadoop concepts. Apart from design decisions, this move also has a practical impact on our codebase. Since we adopted Hadoop more closely, we got rid of an extra layer of abstraction and instead started using similar abstractions already existing in Hadoop. This had the positive impact of simplifying the codebase and providing tighter integration with Hadoop. So, if we are continuing in a direction where Hadoop is our only backend (or at least a favored one), close ties to Hadoop are useful for the reasons Alan and Dmitriy pointed out; if not, then I think moving out to a TLP makes sense. 
Since there is no effort I am aware of to plug in a different backend for Pig, I think maintaining close ties with Hadoop is useful for Pig. In the future, if a different distributed computing platform comes up that we want to use as a backend, we can revisit our decision. So, as things stand today, I am -1 to move out of Hadoop. And I would also like to reiterate my point that though the Pig runtime may continue to get closer to Hadoop, we should keep Pig Latin completely backend agnostic. Ashutosh On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan s...@yahoo-inc.com wrote: I see this as a multi-part question. Looking back at some of the significant roadmap/existential questions asked in the last 12 months, I see the following: 1. With the introduction of SQL, what is the philosophy of Pig (I sent an email about this approximately 9 months ago) 2. What is the approach to support backward compatibility in Pig (Alan sent an email about this 3 months ago) 3. Should Pig be a TLP (the current email thread). Here is my take on answering the aforementioned questions. The initial philosophy of Pig was to be backend agnostic. It was designed as a data flow language. Whenever a new language is designed, the syntax and semantics of the language have to be laid out. The syntax is usually captured in the form of a BNF grammar. The semantics are defined by the language creators. Backward compatibility is then a question of holding true to the syntax and semantics. With Pig, in addition to the language, the Java APIs were exposed to customers to implement UDFs (load/store/filter/grouping/row transformation etc), provision looping since the language does not support looping constructs, and also support a programmatic mode of access. Backward compatibility in this context is to support API versioning. Do we still intend to position Pig as a data flow language that is backend agnostic? If the answer is yes, then there is a strong case for making Pig a TLP. 
Are we influenced by Hadoop? A big YES! The reason Pig chose to become a Hadoop sub-project was to ride the Hadoop popularity wave. As a consequence, we chose to be heavily influenced by the Hadoop roadmap. Like a good lawyer, I also have rebuttals to Alan's questions :) 1. Search engine popularity - We can discuss this with the Hadoop team and still retain links to TLPs that are coupled (loosely or tightly). 2. Explicit connection to Hadoop - I see this as a logical connection v/s a physical connection. Today, we are physically connected as a sub-project. Becoming a TLP will not increase/decrease our influence on the Hadoop community (think Logical, Physical and MR Layers :) 3. Philosophy - I have already talked about this. The tight coupling is by choice. If Pig continues to be a data flow language with clear syntax and semantics then someone can implement Pig on top of a different backend. Do we intend to take this approach? I just wanted to offer a different opinion to this thread. I strongly believe that we should think about the original
[jira] Commented: (PIG-1330) Move pruned schema tracking logic from LoadFunc to core code
[ https://issues.apache.org/jira/browse/PIG-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12853600#action_12853600 ] Pradeep Kamath commented on PIG-1330: - +1 Move pruned schema tracking logic from LoadFunc to core code Key: PIG-1330 URL: https://issues.apache.org/jira/browse/PIG-1330 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1330-1.patch Currently, LoadFunc.getSchema requires a schema after column pruning. The good side of this is that LoadFunc.getSchema matches the data it actually loads, which gives a sense of consistency. However, by doing this, every LoadFunc needs to keep track of the columns pruned. This is an unnecessary burden on the LoadFunc writer and is very error prone. This issue is to move this logic from LoadFunc to the Pig core. LoadFunc.getSchema then only needs to return the original schema, even after pruning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852485#action_12852485 ] Pradeep Kamath commented on PIG-1337: - We may need to add a new method - addToDistributedCache() - on LoadFunc. Notice this is an adder, not a setter, since there is only one key for the distributed cache in hadoop's Job (the Configuration in the Job). So implementations of LoadFunc will have to use the DistributedCache.add*() methods. Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc -- Key: PIG-1337 URL: https://issues.apache.org/jira/browse/PIG-1337 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Chao Wang Fix For: 0.8.0 The Zebra storage layer needs to use the distributed cache to reduce name node load during job runs. To do this, Zebra needs to set up distributed cache related configuration information in TableLoader (which extends Pig's LoadFunc). It is doing this within getSchema(conf). The problem is that the conf object here is not the one that is serialized to the map/reduce backend. As such, the distributed cache is not set up properly. To work around this problem, Pig needs to provide a way in its LoadFunc to set up distributed cache information in a conf object that is actually used by the map/reduce backend. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
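The adder-vs-setter distinction in the comment above comes down to Hadoop keeping all cache files under a single comma-separated configuration key. A minimal model (plain Java, not Hadoop's actual DistributedCache or Configuration classes; the key name is illustrative) shows why each loader must append rather than overwrite:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the single-key constraint: every cache file shares
// one configuration entry, so an adder appends while a setter would clobber
// files registered by other loaders.
class CacheConfSketch {
    static final String KEY = "mapred.cache.files"; // single shared key
    private final Map<String, String> conf = new HashMap<String, String>();

    void addCacheFile(String uri) {
        String existing = conf.get(KEY);
        conf.put(KEY, existing == null ? uri : existing + "," + uri);
    }

    String cacheFiles() { return conf.get(KEY); }
}
```

Two loaders each calling addCacheFile() end up with both files registered; a setter-style API would leave only the last caller's file in place.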
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Open (was: Patch Available) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Attachment: PIG-1346-2.patch The earlier patch was using System.getProperty("java.home") - apparently ant sometimes appends jre to $JAVA_HOME as the value of the java.home property - this causes failures since $JAVA_HOME/jre/bin/ does not contain javac. I have changed this code to use System.getenv("JAVA_HOME") instead. In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java-related binaries like java, javac and jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try to execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
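The resolution logic described in the update above might look like the following sketch (the class and method names are hypothetical, not the actual Util code): prefer the JAVA_HOME environment variable, since ant can point the java.home system property at $JAVA_HOME/jre, which has no javac, and fall back to a bare command looked up on the PATH when JAVA_HOME is unset.

```java
// Hypothetical helper mirroring the fix: resolve java/javac/jar against
// $JAVA_HOME/bin when JAVA_HOME is set, else rely on the PATH.
class JavaCmdSketch {
    static String resolve(String cmd, String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return cmd;                    // fall back to PATH lookup
        }
        return javaHome + "/bin/" + cmd;   // e.g. $JAVA_HOME/bin/javac
    }
}
```

In the real method the second argument would come from System.getenv("JAVA_HOME"); it is a parameter here only to keep the sketch testable.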
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Patch Available (was: Open) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Attachment: PIG-1346.patch In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Patch Available (was: Open) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851479#action_12851479 ] Pradeep Kamath commented on PIG-1337: - My worry in doing these kinds of job-related updates in the Job in getSchema() is that currently getSchema has been designed to be a pure getter without any indirect set side effects - this is noted in the javadoc:
{noformat}
/**
 * Get a schema for the data to be loaded.
 * @param location Location as returned by
 * {@link LoadFunc#relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)}
 * @param job The {@link Job} object - this should be used only to obtain
 * cluster properties through {@link Job#getConfiguration()} and not to set/query
 * any runtime job information.
...
{noformat}
We should be careful in opening this up to allow set capability - something to consider before designing a fix for this issue. Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc -- Key: PIG-1337 URL: https://issues.apache.org/jira/browse/PIG-1337 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Chao Wang Fix For: 0.8.0 The Zebra storage layer needs to use the distributed cache to reduce name node load during job runs. To do this, Zebra needs to set up distributed cache related configuration information in TableLoader (which extends Pig's LoadFunc). It is doing this within getSchema(conf). The problem is that the conf object here is not the one that is serialized to the map/reduce backend. As such, the distributed cache is not set up properly. To work around this problem, Pig needs to provide a way in its LoadFunc to set up distributed cache information in a conf object that is actually used by the map/reduce backend. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.