[jira] Commented: (PIG-1288) EvalFunc returnType is wrong for generic subclasses
[ https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864254#action_12864254 ]

Hadoop QA commented on PIG-1288:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443659/PIG-1288-3.patch against trunk revision 941005.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 17 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/315/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/315/console

This message is automatically generated.

EvalFunc returnType is wrong for generic subclasses
---
Key: PIG-1288
URL: https://issues.apache.org/jira/browse/PIG-1288
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.0
Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch

From Garrett Buster Kaminaga:

The EvalFunc constructor has code to determine the return type of the function. This walks up the object hierarchy until it encounters EvalFunc, then calls getActualTypeArguments and extracts type param 0.

However, if the user class is itself a generic extension of EvalFunc, then the returned object is not the correct type, but a TypeVariable. Example:

class MyAbstractEvalFunc<T> extends EvalFunc<T> ...
class MyEvalFunc extends MyAbstractEvalFunc<String> ...

When MyEvalFunc() is called, inside the EvalFunc constructor the return type is set to a TypeVariable rather than String.class.

The workaround we've implemented is for MyAbstractEvalFunc<T> to determine *its* type parameters using code similar to that in the EvalFunc constructor, and then reset the protected data member returnType manually in the MyAbstractEvalFunc constructor. (Though this has the same drawback of not working if someone then extends MyAbstractEvalFunc.)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
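The resolution problem described in PIG-1288 can be illustrated with a small, self-contained sketch. This is not the PIG-1288 patch: EvalFunc below is an empty stand-in class so the example compiles without Pig, and ReturnTypeResolver and resolveReturnType are hypothetical names. The idea is to record how each superclass binds its type variables while walking up the hierarchy, and then follow those bindings once EvalFunc is reached.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;

class ReturnTypeResolver {
    // Returns the type bound to EvalFunc's first type parameter for the given class,
    // or null if EvalFunc is only reached through a raw (non-parameterized) superclass.
    static Type resolveReturnType(Class<?> concrete) {
        Map<TypeVariable<?>, Type> bindings = new HashMap<>();
        Class<?> current = concrete;
        while (current != null && current != Object.class) {
            Type superType = current.getGenericSuperclass();
            if (superType instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) superType;
                Class<?> raw = (Class<?>) pt.getRawType();
                TypeVariable<?>[] vars = raw.getTypeParameters();
                Type[] args = pt.getActualTypeArguments();
                // Remember what each type variable of the superclass is bound to.
                for (int i = 0; i < vars.length; i++) {
                    bindings.put(vars[i], args[i]);
                }
                if (raw == EvalFunc.class) {
                    // Follow the chain of bindings back down towards the concrete class.
                    Type t = args[0];
                    while (t instanceof TypeVariable && bindings.containsKey(t)) {
                        t = bindings.get(t);
                    }
                    return t;
                }
                current = raw;
            } else {
                current = current.getSuperclass();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolveReturnType(MyEvalFunc.class)); // class java.lang.String
    }
}

// Stand-in for org.apache.pig.EvalFunc (assumption: only the type parameter matters here).
abstract class EvalFunc<T> { }

// The two classes from the issue description.
abstract class MyAbstractEvalFunc<T> extends EvalFunc<T> { }

class MyEvalFunc extends MyAbstractEvalFunc<String> { }
```

Reading only type param 0 of EvalFunc, as the issue describes, would stop at the TypeVariable T of MyAbstractEvalFunc; the binding map is what lets the sketch continue to String.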
[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage
[ https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864336#action_12864336 ]

Gianmarco De Francisci Morales commented on PIG-566:

What should the default format be? With or without L/F at the end? The loader function already checks for the presence of a letter at the end, so we can accept both. I think that without is better anyway, since it complies with normal Java behaviour. The L/F suffix is used only in source code.

Dump and store outputs do not match for PigStorage
--
Key: PIG-566
URL: https://issues.apache.org/jira/browse/PIG-566
Project: Pig
Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor

The dump and store formats for PigStorage do not match for longs and floats.

{code}
grunt> y = foreach x generate {(2985671202194220139L)};
grunt> describe y;
y: {{(long)}}
grunt> dump y;
({(2985671202194220139L)})
grunt> store y into 'y';
grunt> cat y
{(2985671202194220139)}
{code}
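A rough sketch of the "accept both on load, write without the suffix" position discussed above, in plain Java. LongFieldParser and parseLongField are hypothetical names and this is not PigStorage code; it only shows that Long.parseLong needs the trailing L stripped first, and that Long.toString never emits one.

```java
class LongFieldParser {
    // Parses a long written either as "123" or with a trailing L/l suffix ("123L").
    static long parseLongField(String text) {
        String s = text.trim();
        if (!s.isEmpty()) {
            char last = s.charAt(s.length() - 1);
            if (last == 'L' || last == 'l') {
                s = s.substring(0, s.length() - 1);
            }
        }
        return Long.parseLong(s);
    }

    public static void main(String[] args) {
        long fromDump = parseLongField("2985671202194220139L");
        long fromStore = parseLongField("2985671202194220139");
        System.out.println(fromDump == fromStore);    // true
        System.out.println(Long.toString(fromDump));  // 2985671202194220139
    }
}
```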
[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864386#action_12864386 ]

Pradeep Kamath commented on PIG-1211:

Core unit tests pass on my local machine; the errors reported above seem to be related to the environment. The release audit warning is due to an HTML file change and can be ignored. The patch is ready for review.

Pig script runs half way after which it reports syntax error

Key: PIG-1211
URL: https://issues.apache.org/jira/browse/PIG-1211
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Fix For: 0.8.0
Attachments: PIG-1211.patch

I have a Pig script which is structured in the following way:

{code}
register cp.jar
dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5);
filtered_dataset = filter dataset by (col1 == 1);
proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
rmf $output1;
store proj_filtered_dataset into '$output1' using PigStorage();
second_stream = foreach filtered_dataset generate col2, col4, col5;
group_second_stream = group second_stream by col4;
output2 = foreach group_second_stream {
 a = second_stream.col2
 b = distinct second_stream.col5;
 c = order b by $0;
 generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
}
rmf $output2;
--syntax error here
store output2 to '$output2' using PigStorage();
{code}

I run this script using the Multi-query option; it runs successfully until the first store but later fails with a syntax error. The use of the HDFS command rmf causes the first store to execute. The only options that I have are to run an explain before running this script:

grunt> explain -script myscript.pig -out explain.out

or to move the rmf statements to the top of the script.

Here are some questions:
a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error.
b) Can Pig not figure out a way to re-order the rmf statements, since all the store directories are variables?

Thanks
Viraj
[jira] Updated: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-823:
---
Status: Resolved (was: Patch Available)
Resolution: Duplicate

Hadoop Metadata Service
---
Key: PIG-823
URL: https://issues.apache.org/jira/browse/PIG-823
Project: Pig
Issue Type: New Feature
Reporter: Olga Natkovich
Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, owl_otherdeps.tgz

This JIRA is created to track development of a metadata system for Hadoop. The goal of the system is to allow users and applications to register data stored on HDFS, search for the data available on HDFS, and associate metadata such as schema, statistics, etc. with a particular data unit or a data set stored on HDFS. The initial goal is to provide a fairly generic, low-level abstraction that any user or application on HDFS can use to store and retrieve metadata. Over time, higher-level abstractions closely tied to particular applications or tools can be developed. Over time, it would make sense for the metadata service to become a subproject within Hadoop. For now, the proposal is to make it a contrib to Pig, since Pig SQL is likely to be the first user of the system.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864402#action_12864402 ] Julien Le Dem commented on PIG-928: --- The attentive reader will have noticed that it should be tar xzvf pig-greek.tgz in my previous comment. UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Fix For: 0.8.0 Attachments: package.zip, pig-greek.tgz, pyg.tgz, scripting.tgz, scripting.tgz It should be possible to write UDFs in scripting languages such as python, ruby, etc. This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1378:

Summary: har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) (was: har url not usable in Pig scripts)

har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
---
Key: PIG-1378
URL: https://issues.apache.org/jira/browse/PIG-1378
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
Fix For: 0.8.0
Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch

I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell

{noformat}
$hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}

Using similar URL's in grunt yields

{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
    at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
    at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
    ... 13 more
{noformat}

According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description

{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}

{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data';
    ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextInputFormat.listStatus(PigTextInputFormat.java:36)
    at
[jira] Resolved: (PIG-1378) har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works)
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1378. - Release Note: The fix for this issue described in this jira depends on a issue with Hadoop code which was fixed on the hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522). Until that goes into a hadoop release which is used by pig, this will remain an issue Resolution: Fixed Am closing this bug since the pig changes are in and hadoop changes are in trunk - this should work once we use the appropriate hadoop release. har url of the form har:///path not usable in Pig scripts (har://hdfs-namenode:port/path works) --- Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch
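To make the difference between the two URL shapes in PIG-1378 concrete, here is a small sketch using only java.net.URI. It does not touch Pig or Hadoop code; HarUriShapes and describe are hypothetical names, and port 8020 is an assumed value standing in for the unspecified "port" in the issue title. It shows that the har:///path form carries no authority at all, whereas the har://hdfs-namenode:port/path form does.

```java
import java.net.URI;

class HarUriShapes {
    // Breaks a location string into the URI components a file system lookup would see.
    static String describe(String location) {
        URI uri = URI.create(location);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Form reported as failing in grunt: empty authority.
        System.out.println(describe("har:///user/viraj/project/subproject/files/size/data"));
        // Form reported as working: authority names the underlying namenode (port is assumed).
        System.out.println(describe("har://hdfs-namenode:8020/user/viraj/project/subproject/files/size/data"));
    }
}
```

Any code that compares the scheme or authority of a load location against the default file system has to handle the first shape explicitly, because getAuthority() returns null for it.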
[VOTE] Release Pig 0.7.0 (candidate 0)
Hi, I have created a candidate build for Pig 0.7.0. A description of what is new and different is included in the release notes: http://people.apache.org/~daijy/pig-0.7.0-candidate-0/RELEASE_NOTES.txt Keys used to sign the release are available at http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup Please download, test, try it out and vote. The download link is: http://people.apache.org/~daijy/pig-0.7.0-candidate-0 Thanks Daniel
[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1280: -- Fix Version/s: 0.8.0 Add a pig-script-id to the JobConf of all jobs run in a pig-script -- Key: PIG-1280 URL: https://issues.apache.org/jira/browse/PIG-1280 Project: Pig Issue Type: Improvement Components: impl Reporter: Arun C Murthy Assignee: Richard Ding Fix For: 0.8.0 It would be very useful for tools like gridmix if pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. Potentially we could use this to re-construct the DAG of jobs in gridmix and so on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864482#action_12864482 ]

Richard Ding commented on PIG-1280:
---
There have been several similar requests (on adding new Pig properties to the MR job) since this Jira was filed. Here is a compilation of those properties:

* _pig.script.id_
* _pig.script_ (the Pig script that generates this job)
* _pig.launcher.host_ (the host/IP of the machine on which the Pig script is executed)
* _pig.command.line_ (the Pig command line arguments of this script)
* _pig.input.dirs_ (comma-separated input directory list of this job)
* _pig.output.dirs_ (comma-separated output directory list of this job)
* _pig.version_

Add a pig-script-id to the JobConf of all jobs run in a pig-script
--
Key: PIG-1280
URL: https://issues.apache.org/jira/browse/PIG-1280
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Arun C Murthy
Assignee: Richard Ding

It would be very useful for tools like gridmix if pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. Potentially we could use this to re-construct the DAG of jobs in gridmix and so on.
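As a rough illustration of the property list above, the following sketch collects the seven keys into a plain map. It deliberately avoids the Hadoop JobConf API: PigJobProperties and build are hypothetical names, the use of a random UUID for pig.script.id is an assumption, and all values in main are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

class PigJobProperties {
    // Builds the per-job properties listed in the comment; a real launcher would
    // copy these into the configuration of every Map-Reduce job it submits.
    static Map<String, String> build(String scriptId, String scriptText, String launcherHost,
                                     String commandLine, String inputDirs, String outputDirs,
                                     String pigVersion) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("pig.script.id", scriptId);
        props.put("pig.script", scriptText);
        props.put("pig.launcher.host", launcherHost);
        props.put("pig.command.line", commandLine);
        props.put("pig.input.dirs", inputDirs);    // comma-separated
        props.put("pig.output.dirs", outputDirs);  // comma-separated
        props.put("pig.version", pigVersion);
        return props;
    }

    public static void main(String[] args) {
        // Generated once per script run so every spawned job shares the same id (assumption).
        String scriptId = UUID.randomUUID().toString();
        Map<String, String> props = build(scriptId, "a = load 'in'; store a into 'out';",
                "launcher.example.org", "-f myscript.pig", "in", "out", "0.8.0");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Passing the id in, rather than generating it inside build, is what would let a tool such as gridmix group all jobs of one script back together.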
[jira] Commented: (PIG-1405) Need to move many standard functions from piggybank into Pig
[ https://issues.apache.org/jira/browse/PIG-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864499#action_12864499 ] Dmitriy V. Ryaboy commented on PIG-1405: I think Top (TOP) is a common enough thing to do to put into builtin. Regarding naming -- for readability I propose, LAST_INDEX_OF, REGEX_EXTRACT and REGEX_EXTRACT_ALL Need to move many standard functions from piggybank into Pig Key: PIG-1405 URL: https://issues.apache.org/jira/browse/PIG-1405 Project: Pig Issue Type: Improvement Reporter: Alan Gates Fix For: 0.8.0 There are currently a number of functions in Piggybank that represent features commonly supported by languages and database engines. We need to decide which of these Pig should support as built in functions and put them in org.apache.pig.builtin. This will also mean adding unit tests and javadocs for some UDFs. The existing classes will be left in Piggybank for some time for backward compatibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage
[ https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864567#action_12864567 ] Daniel Dai commented on PIG-566: Agree, I vote for without L/F. Dump and store outputs do not match for PigStorage -- Key: PIG-566 URL: https://issues.apache.org/jira/browse/PIG-566 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Santhosh Srinivasan Priority: Minor
[jira] Assigned: (PIG-566) Dump and store outputs do not match for PigStorage
[ https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-566: -- Assignee: Gianmarco De Francisci Morales Dump and store outputs do not match for PigStorage -- Key: PIG-566 URL: https://issues.apache.org/jira/browse/PIG-566 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Santhosh Srinivasan Assignee: Gianmarco De Francisci Morales Priority: Minor
[jira] Commented: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
[ https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864577#action_12864577 ]

Daniel Dai commented on PIG-1391:
-
I recommend changing two things:
1. Remove ${junit.tmp.dir} after the unit test targets in build.xml.
2. Remove the fixes for the zebra test case from this patch; they seem to be an unrelated change.
The other parts are good. Please commit after the above two changes.

pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
---
Key: PIG-1391
URL: https://issues.apache.org/jira/browse/PIG-1391
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.6.0, 0.7.0, 0.8.0
Attachments: minicluster.patch, PIG-1391.06.2.patch, PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch

Pig unit test runs leave files behind in the temp dir (/tmp), and over time too many files accumulate in the directory. Most of the files are left behind by MiniCluster. It shuts down the MiniDFSCluster, MiniMRCluster and FileSystem that it created in its constructor only in finalize(), and Java does not guarantee that finalize() will be called.
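The finalize() problem described above is a general Java one, and the usual remedy is an explicit shutdown call. The sketch below is not the PIG-1391 patch and does not use MiniCluster; ScratchCluster is a hypothetical stand-in that creates a temp directory in its constructor and deletes it in close(), so that try-with-resources guarantees the cleanup runs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class ScratchCluster implements AutoCloseable {
    private final Path workDir;

    // Creates scratch files, standing in for what a test cluster writes under /tmp.
    ScratchCluster() throws IOException {
        workDir = Files.createTempDirectory("minicluster-sketch");
        Files.createFile(workDir.resolve("block-0.tmp"));
    }

    Path workDir() {
        return workDir;
    }

    // Explicit, deterministic cleanup instead of relying on finalize().
    @Override
    public void close() throws IOException {
        try (Stream<Path> paths = Files.walk(workDir)) {
            // Reverse order deletes children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Runs one create/close cycle and reports whether anything was left behind.
    static boolean filesRemainAfterClose() {
        Path dir;
        try (ScratchCluster cluster = new ScratchCluster()) {
            dir = cluster.workDir();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Files.exists(dir);
    }

    public static void main(String[] args) {
        System.out.println("files remain: " + filesRemainAfterClose());
    }
}
```

In a JUnit-based suite the same idea would typically be expressed as a shutdown method called from the test teardown, rather than try-with-resources.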
[jira] Commented: (PIG-1345) Link casting errors in POCast to actual lines numbers in Pig script
[ https://issues.apache.org/jira/browse/PIG-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864579#action_12864579 ]

Richard Ding commented on PIG-1345:
---
By default, the Pig property aggregate.warning is set to true. So in the above example, you only get aggregated warning messages, not the original detailed warning messages. You can turn off aggregate.warning with the command line switch -w. The detailed warning messages contain more information. Here is an example:

{code}
[main] WARN org.apache.pig.PigServer - int is implicitly cast to float under LOAdd Operator
[main] WARN org.apache.pig.PigServer - long is implicitly cast to float under LOAdd Operator
{code}

instead of

{code}
[main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_FLOAT 2 time(s).
{code}

Link casting errors in POCast to actual lines numbers in Pig script
---
Key: PIG-1345
URL: https://issues.apache.org/jira/browse/PIG-1345
Project: Pig
Issue Type: Sub-task
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat

For the purpose of easy debugging, it would be nice to find out where in the Pig script my warnings are coming from. The only known process is to comment out lines in the Pig script and see if these warnings go away.

2010-01-13 21:34:13,697 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_MAP 2 time(s) line 22
2010-01-13 21:34:13,698 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_LONG 2 time(s) line 23
2010-01-13 21:34:13,698 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_BAG 1 time(s). line 26

I think this may need us to keep track of the line numbers of the Pig script (via our javacc parser) and maintain them in the logical and physical plan. It would help users in debugging simple errors/warnings related to casting.

Is this enhancement listed in the http://wiki.apache.org/pig/PigJournal? Do we need to change the parser to something other than javacc to make this task simpler? Standardize on Parser and Scanner Technology

Viraj
[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864589#action_12864589 ] Thejas M Nair commented on PIG-1211: +1 Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Status: Resolved (was: Patch Available) Hadoop Flags: [Incompatible change, Reviewed] Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check syntax of the pig script Resolution: Fixed Patch committed to trunk Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch