[jira] Commented: (PIG-1288) EvalFunc returnType is wrong for generic subclasses

2010-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864254#action_12864254
 ] 

Hadoop QA commented on PIG-1288:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12443659/PIG-1288-3.patch
  against trunk revision 941005.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 17 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/315/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/315/console

This message is automatically generated.

 EvalFunc returnType is wrong for generic subclasses
 ---

 Key: PIG-1288
 URL: https://issues.apache.org/jira/browse/PIG-1288
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch


 From Garrett Buster Kaminaga:
 The EvalFunc constructor has code to determine the return type of the 
 function.
 This walks up the object hierarchy until it encounters EvalFunc, then calls 
 getActualTypeArguments and extracts type
 param 0.
 However, if the user class is itself a generic extension of EvalFunc, then 
 the returned object is not the correct type,
 but a TypeVariable.
 Example:
   class MyAbstractEvalFunc<T> extends EvalFunc<T> ...
   class MyEvalFunc extends MyAbstractEvalFunc<String> ...
 When MyEvalFunc() is called, inside the EvalFunc constructor the return type is 
 set to a TypeVariable rather than
 String.class.
 The workaround we've implemented is for MyAbstractEvalFunc<T> to 
 determine *its* type parameters using code
 similar to that in the EvalFunc constructor, and then reset the protected data 
 member returnType manually in the
 MyAbstractEvalFunc constructor (though this has the same drawback of not 
 working if someone then extends
 MyAbstractEvalFunc).
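
 The behaviour described above can be reproduced outside Pig. The following is a minimal sketch, not the code in the attached patches: BaseFunc, MyAbstractFunc, MyFunc and ReturnTypeDemo are hypothetical stand-ins for EvalFunc and its subclasses. The naive lookup stops at the class that directly extends the base class and reads type argument 0, which yields a TypeVariable. The resolving lookup records what each type variable was bound to while walking up the hierarchy.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are not the real org.apache.pig classes.
abstract class BaseFunc<T> {}
abstract class MyAbstractFunc<T> extends BaseFunc<T> {}
class MyFunc extends MyAbstractFunc<String> {}

final class ReturnTypeDemo {
    // Naive lookup: walk up to the class whose direct superclass is BaseFunc,
    // then read type argument 0. Assumes c is a proper subclass of BaseFunc.
    static Type naive(Class<?> c) {
        while (c.getSuperclass() != BaseFunc.class) {
            c = c.getSuperclass();
        }
        ParameterizedType pt = (ParameterizedType) c.getGenericSuperclass();
        return pt.getActualTypeArguments()[0];
    }

    // Resolving lookup: track type-variable bindings on the way up, so that
    // BaseFunc's T can be traced back to the concrete argument.
    static Type resolved(Class<?> c) {
        Map<TypeVariable<?>, Type> bindings = new HashMap<>();
        while (c != null && c != Object.class) {
            Type sup = c.getGenericSuperclass();
            if (sup instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) sup;
                Class<?> raw = (Class<?>) pt.getRawType();
                TypeVariable<?>[] vars = raw.getTypeParameters();
                Type[] args = pt.getActualTypeArguments();
                for (int i = 0; i < vars.length; i++) {
                    Type arg = args[i];
                    // Replace a variable by what a subclass bound it to.
                    while (arg instanceof TypeVariable && bindings.containsKey(arg)) {
                        arg = bindings.get(arg);
                    }
                    bindings.put(vars[i], arg);
                }
                if (raw == BaseFunc.class) {
                    return bindings.get(vars[0]);
                }
            }
            c = c.getSuperclass();
        }
        return null; // BaseFunc not found, or extended as a raw type
    }

    public static void main(String[] args) {
        System.out.println("naive:    " + naive(MyFunc.class));
        System.out.println("resolved: " + resolved(MyFunc.class));
    }
}
```

 In this sketch the resolving lookup keeps working when someone extends MyAbstractFunc again, which is the drawback noted for the workaround.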

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-05 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864336#action_12864336
 ] 

Gianmarco De Francisci Morales commented on PIG-566:


What should the default format be? With or without the L/F suffix at the end?

The loader function already checks for the presence of a letter at the end, so 
we can accept both.

I think that without is better anyway, since it complies with normal Java 
behaviour. The L/F suffix is used only in source code.
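
To illustrate the point about Java behaviour: Long.parseLong rejects a trailing L, so a reader that wants to accept both formats has to strip the suffix first. The sketch below is a hypothetical helper (LenientNumbers.parseLongLenient is not Pig's loader code) and only shows the idea.

```java
final class LenientNumbers {
    // Accepts "123" as well as "123L" or "123l".
    static long parseLongLenient(String text) {
        String s = text.trim();
        if (!s.isEmpty()) {
            char last = s.charAt(s.length() - 1);
            if (last == 'L' || last == 'l') {
                s = s.substring(0, s.length() - 1);
            }
        }
        return Long.parseLong(s);
    }

    // True when plain Long.parseLong refuses the given text.
    static boolean strictParseFails(String text) {
        try {
            Long.parseLong(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLongLenient("2985671202194220139L"));
        System.out.println(parseLongLenient("2985671202194220139"));
        System.out.println(strictParseFails("2985671202194220139L"));
    }
}
```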

 Dump and store outputs do not match for PigStorage
 --

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor

 The dump and store formats for PigStorage do not match for longs and floats.
 {code}
 grunt> y = foreach x generate {(2985671202194220139L)};
 grunt> describe y;
 y: {{(long)}}
 grunt> dump y;
 ({(2985671202194220139L)})
 grunt> store y into 'y';
 grunt> cat y
 {(2985671202194220139)}
 {code}




[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-05-05 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864386#action_12864386
 ] 

Pradeep Kamath commented on PIG-1211:
-

Core unit tests pass on my local machine; the errors reported above seem 
to be related to the environment. The release audit warning is due to an HTML 
file change and can be ignored. The patch is ready for review.

 Pig script runs half way after which it reports syntax error
 

 Key: PIG-1211
 URL: https://issues.apache.org/jira/browse/PIG-1211
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.8.0

 Attachments: PIG-1211.patch


 I have a Pig script which is structured in the following way
 {code}
 register cp.jar
 dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, 
 col3, col4, col5);
 filtered_dataset = filter dataset by (col1 == 1);
 proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
 rmf $output1;
 store proj_filtered_dataset into '$output1' using PigStorage();
 second_stream = foreach filtered_dataset  generate col2, col4, col5;
 group_second_stream = group second_stream by col4;
 output2 = foreach group_second_stream {
  a =  second_stream.col2
  b =   distinct second_stream.col5;
  c = order b by $0;
  generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
 }
 rmf  $output2;
 --syntax error here
 store output2 to '$output2' using PigStorage();
 {code}
 I run this script using the Multi-query option. It runs successfully until the 
 first store but later fails with a syntax error. 
 The use of the HDFS command rmf causes the first store to execute. 
 The only option I have is to run an explain before running this script 
 grunt> explain -script myscript.pig -out explain.out
 or to move the rmf statements to the top of the script.
 Here are some questions:
 a) Can we have an option to do something like checkscript instead of 
 explain to get the same syntax error?  In this way I can ensure that I do not 
 run for 3-4 hours before encountering a syntax error
 b) Can pig not figure out a way to re-order the rmf statements since all the 
 store directories are variables
 Thanks
 Viraj
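
 The "checkscript" idea in question a) amounts to validating every statement before executing any of them. A minimal sketch under loud assumptions: CheckThenRun is hypothetical, and its check() is a toy rule (a store statement must contain "into") standing in for a real parser. It is not how Pig parses scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CheckThenRun {
    // Toy stand-in for a parser: reports store statements that lack "into".
    static List<String> check(List<String> statements) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < statements.size(); i++) {
            String s = statements.get(i).trim().toLowerCase();
            if (s.startsWith("store ") && !s.contains(" into ")) {
                errors.add("statement " + (i + 1) + ": expected 'into' in store");
            }
        }
        return errors;
    }

    // Executes nothing unless the whole script passes the check.
    // Returns the number of statements that were "executed".
    static int run(List<String> statements) {
        if (!check(statements).isEmpty()) {
            return 0;
        }
        int executed = 0;
        for (String ignored : statements) {
            executed++; // placeholder for real execution
        }
        return executed;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
            "store proj_filtered_dataset into '$output1' using PigStorage();",
            "store output2 to '$output2' using PigStorage();");
        System.out.println(check(script));
        System.out.println(run(script));
    }
}
```

 With this ordering the error in the last statement is reported before the first store runs, rather than hours into the job.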




[jira] Updated: (PIG-823) Hadoop Metadata Service

2010-05-05 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-823:
---

Status: Resolved  (was: Patch Available)
Resolution: Duplicate

 Hadoop Metadata Service
 ---

 Key: PIG-823
 URL: https://issues.apache.org/jira/browse/PIG-823
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
 Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, 
 owl_otherdeps.tgz


 This JIRA is created to track development of a metadata system for Hadoop. 
 The goal of the system is to allow users and applications to register data 
 stored on HDFS, search for the data available on HDFS, and associate metadata 
 such as schema, statistics, etc. with a particular data unit or a data set 
 stored on HDFS. The initial goal is to provide a fairly generic, low level 
 abstraction that any user or application on HDFS can use to store and retrieve 
 metadata. Over time, higher level abstractions closely tied to particular 
 applications or tools can be developed.
 Over time, it would make sense for the metadata service to become a 
 subproject within Hadoop. For now, the proposal is to make it a contrib to 
 Pig since Pig SQL is likely to be the first user of the system.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-05-05 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864402#action_12864402
 ] 

Julien Le Dem commented on PIG-928:
---

The attentive reader will have noticed that it should be tar xzvf 
pig-greek.tgz in my previous comment.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
 Fix For: 0.8.0

 Attachments: package.zip, pig-greek.tgz, pyg.tgz, scripting.tgz, 
 scripting.tgz


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)

2010-05-05 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1378:


Summary: har url of the form har:///path not usable in Pig scripts 
(har://hdfs-namenode:port/path works)  (was: har url not usable in Pig 
scripts)

 har url of the form har:///path not usable in Pig scripts 
 (har://hdfs-namenode:port/path works)
 ---

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, 
 PIG-1378.patch


 I am trying to use har (Hadoop Archives) in my Pig script.
 I can use them through the HDFS shell
 {noformat}
 $hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
 Found 1 items
 -rw---   5 viraj users1537234 2010-04-14 09:49 
 user/viraj/project/subproject/files/size/data/part-1
 {noformat}
 Using similar URLs in grunt yields
 {noformat}
 grunt> a = load 'har:///user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible 
 file URI scheme: har : hdfs
 2010-04-14 22:08:48,814 [main] WARN  org.apache.pig.tools.grunt.Grunt - There 
 is no log file to write to.
 2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:357)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: 
 Incompatible file URI scheme: har : hdfs
 at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
 at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
 ... 13 more
 {noformat}
 According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the 
 following as stated in the original description
 {noformat}
 grunt> a = load 
 'har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 grunt> dump a;
 {noformat}
 {noformat}
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: 
 Unable to create input splits for: 
 har://namenode-location/user/viraj/project/subproject/files/size/data'; 
 ... 8 more
 Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
 at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
 at 
 org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
 at 
 

[jira] Resolved: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)

2010-05-05 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath resolved PIG-1378.
-

Release Note: The fix for this issue depends on an issue in the Hadoop code 
which was fixed on the Hadoop trunk 
(https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that fix goes 
into a Hadoop release used by Pig, this will remain an issue.
  Resolution: Fixed

I am closing this bug since the Pig changes are in and the Hadoop changes are in 
trunk. This should work once we use the appropriate Hadoop release.
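
The two URL forms in the summary differ only in whether an authority is present. As a rough illustration using java.net.URI (HarUriParts is a hypothetical helper, the port 8020 and the shortened path are assumptions, and this is not how Pig or Hadoop resolves the location):

```java
import java.net.URI;

final class HarUriParts {
    // Returns "scheme|authority|path"; the authority is "(none)" when absent.
    static String describe(String location) {
        URI uri = URI.create(location);
        String authority = uri.getAuthority() == null ? "(none)" : uri.getAuthority();
        return uri.getScheme() + "|" + authority + "|" + uri.getPath();
    }

    public static void main(String[] args) {
        // har:///path form: no authority, so the underlying file system
        // has to come from somewhere else, for example a configured default.
        System.out.println(describe("har:///user/viraj/data"));
        // har://hdfs-namenode:port/path form: the authority names it explicitly.
        System.out.println(describe("har://hdfs-namenode:8020/user/viraj/data"));
    }
}
```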

 har url of the form har:///path not usable in Pig scripts 
 (har://hdfs-namenode:port/path works)
 ---

 Key: PIG-1378
 URL: https://issues.apache.org/jira/browse/PIG-1378
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, 
 PIG-1378.patch



[VOTE] Release Pig 0.7.0 (candidate 0)

2010-05-05 Thread Daniel Dai

Hi,

I have created a candidate build for Pig 0.7.0. A description of what is 
new and different is included in the release notes: 
http://people.apache.org/~daijy/pig-0.7.0-candidate-0/RELEASE_NOTES.txt


Keys used to sign the release are available at 
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup


Please download, test, try it out and vote. The download link is:

http://people.apache.org/~daijy/pig-0.7.0-candidate-0

Thanks
Daniel


[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1280:
--

Fix Version/s: 0.8.0

 Add a pig-script-id to the JobConf of all jobs run in a pig-script
 --

 Key: PIG-1280
 URL: https://issues.apache.org/jira/browse/PIG-1280
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Arun C Murthy
Assignee: Richard Ding
 Fix For: 0.8.0


 It would be very useful for tools like gridmix if pig could add a 
 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. 
 Potentially we could use this to re-construct the DAG of jobs in gridmix and 
 so on.




[jira] Commented: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-05 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864482#action_12864482
 ] 

Richard Ding commented on PIG-1280:
---

There have been several similar requests (to add new Pig properties to the MR 
job) since this Jira was filed. Here is a compilation of those properties:

* _pig.script.id_
* _pig.script_ (the Pig script that generates this job)
* _pig.launcher.host_ (the host/IP of the machine on which the Pig script is 
executed)
* _pig.command.line_ (the Pig command line arguments of this script)
* _pig.input.dirs_ (comma separated input directory list of this job)
* _pig.output.dirs_ (comma separated output directory list of this job)
* _pig.version_ 
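
A sketch of what setting these properties could look like, using java.util.Properties as a stand-in for the job configuration. PigJobProperties, the use of a random UUID for the script id, and all sample values are assumptions for illustration, not the committed implementation.

```java
import java.util.Properties;
import java.util.UUID;

final class PigJobProperties {
    // Collects the properties listed above into one Properties object.
    static Properties build(String script, String host, String commandLine,
                            String inputDirs, String outputDirs, String version) {
        Properties p = new Properties();
        // Assumption: one random id per script run, shared by all of its jobs.
        p.setProperty("pig.script.id", UUID.randomUUID().toString());
        p.setProperty("pig.script", script);
        p.setProperty("pig.launcher.host", host);
        p.setProperty("pig.command.line", commandLine);
        p.setProperty("pig.input.dirs", inputDirs);   // comma separated
        p.setProperty("pig.output.dirs", outputDirs); // comma separated
        p.setProperty("pig.version", version);
        return p;
    }

    public static void main(String[] args) {
        Properties p = build("myscript.pig", "launcher.example.org", "-f myscript.pig",
                             "/data/a,/data/b", "/out/x", "0.8.0");
        p.list(System.out);
    }
}
```

A tool such as gridmix could then group jobs by pig.script.id to reconstruct the DAG mentioned in the issue description.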
  


 Add a pig-script-id to the JobConf of all jobs run in a pig-script
 --

 Key: PIG-1280
 URL: https://issues.apache.org/jira/browse/PIG-1280
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Arun C Murthy
Assignee: Richard Ding




[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig

2010-05-05 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864499#action_12864499
 ] 

Dmitriy V. Ryaboy commented on PIG-1405:


I think Top (TOP) is a common enough thing to do to put into builtin.

Regarding naming -- for readability I propose LAST_INDEX_OF, REGEX_EXTRACT 
and REGEX_EXTRACT_ALL.

 Need to move many standard functions from piggybank into Pig
 

 Key: PIG-1405
 URL: https://issues.apache.org/jira/browse/PIG-1405
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
 Fix For: 0.8.0


 There are currently a number of functions in Piggybank that represent 
 features commonly supported by languages and database engines.  We need to 
 decide which of these Pig should support as built in functions and put them 
 in org.apache.pig.builtin.  This will also mean adding unit tests and 
 javadocs for some UDFs.  The existing classes will be left in Piggybank for 
 some time for backward compatibility.




[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-05 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864567#action_12864567
 ] 

Daniel Dai commented on PIG-566:


Agreed, I vote for the format without L/F.

 Dump and store outputs do not match for PigStorage
 --

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor




[jira] Assigned: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-566:
--

Assignee: Gianmarco De Francisci Morales

 Dump and store outputs do not match for PigStorage
 --

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor




[jira] Commented: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted

2010-05-05 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864577#action_12864577
 ] 

Daniel Dai commented on PIG-1391:
-

I recommend changing two things:
1. Remove ${junit.tmp.dir} after the unit test targets in build.xml.
2. Remove the fixes for the zebra test case from this patch; they seem to be an 
irrelevant change.

The other parts are good. Please commit after the above two changes.

 pig unit tests leave behind files in temp directory because MiniCluster files 
 don't get deleted
 ---

 Key: PIG-1391
 URL: https://issues.apache.org/jira/browse/PIG-1391
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.6.0, 0.7.0, 0.8.0

 Attachments: minicluster.patch, PIG-1391.06.2.patch, 
 PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch


 Pig unit test runs leave behind files in the temp dir (/tmp), and too many 
 files accumulate in the directory over time.
 Most of the files are left behind by MiniCluster. It shuts down the 
 MiniDFSCluster, the MiniMRCluster and the FileSystem that it creates in its 
 constructor only in finalize(), and Java does not guarantee that 
 finalize() will be called. 
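
 One common alternative to relying on finalize() is an explicit shutdown method that the tests call themselves. The sketch below is only an illustration of that pattern: ScratchCluster is a hypothetical stand-in, not Pig's MiniCluster, and it manages nothing but a temp directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a test cluster that owns a temp directory.
final class ScratchCluster implements AutoCloseable {
    private final Path workDir;

    ScratchCluster() throws IOException {
        workDir = Files.createTempDirectory("scratch-cluster");
        Files.createFile(workDir.resolve("block-0.tmp"));
    }

    Path workDir() {
        return workDir;
    }

    // Explicit cleanup: delete children before their parent directory.
    @Override
    public void close() throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(workDir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    // Returns true when the directory existed while the cluster was open
    // and is gone after close().
    static boolean cleansUp() {
        try {
            ScratchCluster cluster = new ScratchCluster();
            Path dir = cluster.workDir();
            boolean existed = Files.isDirectory(dir);
            cluster.close();
            return existed && Files.notExists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cleansUp());
    }
}
```

 Because the class implements AutoCloseable, a test can also open it in a try-with-resources statement so that close() runs even when the test body throws.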




[jira] Commented: (PIG-1345) Link casting errors in POCast to actual lines numbers in Pig script

2010-05-05 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864579#action_12864579
 ] 

Richard Ding commented on PIG-1345:
---


By default, the Pig property aggregate.warning is set to true. So in the 
above example, you only get aggregated warning messages, not the original 
detailed warning messages. You can turn off aggregate.warning with the command 
line switch -w. The detailed warning messages contain more information. Here is 
an example:

{code}
[main] WARN  org.apache.pig.PigServer - int is implicitly cast to float under 
LOAdd Operator
[main] WARN  org.apache.pig.PigServer - long is implicitly cast to float under 
LOAdd Operator 
{code}

instead of 

{code}
[main] WARN  org.apache.pig.PigServer - Encountered Warning 
IMPLICIT_CAST_TO_FLOAT 2 time(s).
{code}
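
The difference between the two outputs is a count per warning kind versus one line per occurrence. A minimal sketch of the aggregated form, assuming a hypothetical WarningAggregator and enum (these are not Pig's classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WarningAggregator {
    // Hypothetical warning kinds, named after the ones quoted in this thread.
    enum Kind { IMPLICIT_CAST_TO_FLOAT, IMPLICIT_CAST_TO_LONG }

    // Produces one summary line per kind, in first-seen order.
    static List<String> aggregate(List<Kind> warnings) {
        Map<Kind, Integer> counts = new LinkedHashMap<>();
        for (Kind k : warnings) {
            counts.merge(k, 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Kind, Integer> e : counts.entrySet()) {
            lines.add("Encountered Warning " + e.getKey() + " " + e.getValue() + " time(s).");
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Kind> seen = Arrays.asList(Kind.IMPLICIT_CAST_TO_FLOAT, Kind.IMPLICIT_CAST_TO_FLOAT);
        for (String line : aggregate(seen)) {
            System.out.println(line);
        }
    }
}
```

The aggregated form drops the per-occurrence detail (which operator, which source type), which is why the detailed messages are more useful for locating a cast.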

 Link casting errors in POCast to actual lines numbers in Pig script
 ---

 Key: PIG-1345
 URL: https://issues.apache.org/jira/browse/PIG-1345
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat

 For the purpose of easy debugging, it would be nice to find out where in the 
 Pig script my warnings are coming from. 
 The only known process is to comment out lines in the Pig script and see if 
 these warnings go away.
 2010-01-13 21:34:13,697 [main] WARN  org.apache.pig.PigServer - Encountered 
 Warning IMPLICIT_CAST_TO_MAP 2 time(s) line 22 
 2010-01-13 21:34:13,698 [main] WARN  org.apache.pig.PigServer - Encountered 
 Warning IMPLICIT_CAST_TO_LONG 2 time(s) line 23
 2010-01-13 21:34:13,698 [main] WARN  org.apache.pig.PigServer - Encountered 
 Warning IMPLICIT_CAST_TO_BAG 1 time(s). line 26
 I think this may need us to keep track of the line numbers of the Pig script 
 (via our javacc parser) and maintain them in the logical and physical plans.
 It would help users in debugging simple errors/warnings related to casting.
 Is this enhancement listed in http://wiki.apache.org/pig/PigJournal?
 Do we need to change the parser to something other than javacc to make this 
 task simpler?
 Standardize on Parser and Scanner Technology
 Viraj




[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-05-05 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864589#action_12864589
 ] 

Thejas M Nair commented on PIG-1211:


+1

 Pig script runs half way after which it reports syntax error
 

 Key: PIG-1211
 URL: https://issues.apache.org/jira/browse/PIG-1211
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.8.0

 Attachments: PIG-1211.patch





[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error

2010-05-05 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1211:


      Status: Resolved  (was: Patch Available)
Hadoop Flags: [Incompatible change, Reviewed]
Release Note: -c (-cluster) was earlier documented as the option to provide 
cluster information, but it was not being used in the Pig code. With 
PIG-1211, -c is reused as the option to check the syntax of a Pig script.
  Resolution: Fixed

Patch committed to trunk

 Pig script runs half way after which it reports syntax error
 

 Key: PIG-1211
 URL: https://issues.apache.org/jira/browse/PIG-1211
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.8.0

 Attachments: PIG-1211.patch

