[jira] Updated: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1401: Status: Patch Available (was: Open) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. --- Key: PIG-1401 URL: https://issues.apache.org/jira/browse/PIG-1401 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1401-2.patch, PIG-1401-3.patch, PIG-1401.patch explain -script <script file> executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans. Note: an explain <alias> statement in the script will still cause all grunt commands up to the explain to be executed. This issue only fixes the behavior of explain -script <script file>: any grunt commands like run, dump, copy, fs etc. present in the supplied script file will be ignored. This should be documented in the release in which this JIRA is resolved. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
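For illustration, with this fix a script like the one below (file and alias names are hypothetical, not taken from the patch's test case) can be passed to explain -script without side effects:
{code}
copy input.txt input_copy.txt; -- grunt command: now skipped by explain -script
a = load 'input.txt' as (f1:int, f2:chararray);
b = filter a by f1 > 0;
dump b; -- grunt command: now skipped by explain -script
{code}
Running grunt> explain -script myscript.pig would then only print the query plans for the Pig Latin statements; the copy and dump are ignored, whereas a plain explain <alias> typed at the grunt prompt still executes everything up to it.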
Need help again: what causes "Cannot cast to Unknown"?
Hey guys, I managed to generate another horrendous error message (before the plan completes). What typically causes this error? The script survives all describes (I can describe after every assignment to an alias), but it still produces this error. (Running Pig 0.5 on Hadoop 0.20.)
2010-05-03 22:54:22,054 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1051: Cannot cast to Unknown
2010-05-03 22:54:22,054 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected exception caused the validation to stop
at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83)
at org.apache.pig.PigServer.compileLp(PigServer.java:818)
at org.apache.pig.PigServer.compileLp(PigServer.java:789)
at org.apache.pig.PigServer.execute(PigServer.java:758)
at org.apache.pig.PigServer.access$100(PigServer.java:89)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:947)
at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:320)
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1060: Cannot resolve Join output schema
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2360)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:201)
at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
... 14 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1051: Cannot cast to Unknown
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForJoinInnerPlan(TypeCheckingVisitor.java:2544)
at org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2348)
... 19 more
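One common trigger for ERROR 1051 (sketched below with hypothetical aliases and file names, not taken from the failing script) is a join key whose type the type checker cannot infer, such as a map lookup or the result of a UDF with no declared output schema; insertAtomicCastForJoinInnerPlan then has no type to cast the keys to:
{code}
a = load 'logs' as (m:map[]);
b = load 'users' as (id:chararray, name:chararray);
-- the type of m#'uid' is unknown, so the join keys cannot be cast to a common type:
j = join a by m#'uid', b by id;

-- an explicit cast on the key usually resolves the error:
a2 = foreach a generate (chararray) m#'uid' as uid;
j2 = join a2 by uid, b by id;
{code}
Describing each alias will not catch this, because the check only happens during whole-plan validation at execution time, which matches the behavior you are seeing.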
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863750#action_12863750 ] Jeff Hammerbacher commented on PIG-1331: Hey, Does this issue make PIG-823 a duplicate? Thanks, Jeff Owl Hadoop Table Management Service --- Key: PIG-1331 URL: https://issues.apache.org/jira/browse/PIG-1331 Project: Pig Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Jay Tang Assignee: Ajay Kidave Fix For: 0.8.0 Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address these issues and abstracts away the complexities of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra with a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863760#action_12863760 ] Hadoop QA commented on PIG-1401: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443548/PIG-1401-3.patch against trunk revision 940601.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 536 release audit warnings (more than the trunk's current 535 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/313/console
This message is automatically generated. explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863835#action_12863835 ] Jay Tang commented on PIG-1331: --- Yes, Jeff. Owl, as a table management service, has a metadata module. Please see http://wiki.apache.org/pig/owl for more information.
[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863859#action_12863859 ] Pradeep Kamath commented on PIG-1401: - The release audit warning is due to the new test script file added in the patch and can be ignored - the patch is ready for review.
Prepare to release Pig 0.7.0
Pig Developers, It has been a few weeks since we branched for 0.7. We have fixed a couple of bugs since then, and we now believe the 0.7 branch is stable and ready to release. I will work on rolling the release. Please let me know if you have any objections. Thanks, Daniel
[jira] Commented: (PIG-1401) explain -script script file executes grunt commands like run/dump/copy etc - explain -script should not execute any grunt command and only explain the query plans.
[ https://issues.apache.org/jira/browse/PIG-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863921#action_12863921 ] Olga Natkovich commented on PIG-1401: - +1
[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
[ https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1391: --- Status: Patch Available (was: Open) Affects Version/s: 0.8.0 Fix Version/s: 0.8.0 0.6.0 pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted --- Key: PIG-1391 URL: https://issues.apache.org/jira/browse/PIG-1391 Project: Pig Issue Type: Bug Affects Versions: 0.7.0, 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.7.0, 0.8.0, 0.6.0 Attachments: minicluster.patch, PIG-1391.06.2.patch, PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch Pig unit test runs leave behind files in the temp dir (/tmp), and too many files accumulate in the directory over time. Most of the files are left behind by MiniCluster: it shuts down the MiniDFSCluster, MiniMRCluster, and the FileSystem created in its constructor only in finalize(), and Java does not guarantee that finalize() will ever be called. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
[ https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1391: --- Attachment: PIG-1391.trunk.patch
[jira] Updated: (PIG-1398) Marking Pig interfaces for org.apache.pig.data package
[ https://issues.apache.org/jira/browse/PIG-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1398: Status: Resolved (was: Patch Available) Resolution: Fixed Patch checked in. Marking Pig interfaces for org.apache.pig.data package -- Key: PIG-1398 URL: https://issues.apache.org/jira/browse/PIG-1398 Project: Pig Issue Type: Sub-task Components: documentation Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Priority: Minor Fix For: 0.8.0 Attachments: PIG-1398.patch Marking Pig interfaces for stability and audience, as well as javadoc cleanup, for the data package. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1378) har url not usable in Pig scripts
[ https://issues.apache.org/jira/browse/PIG-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863997#action_12863997 ] Pradeep Kamath commented on PIG-1378: - Spoke with a developer on the Hadoop team to confirm that this is an issue with Hadoop code, fixed on the Hadoop trunk ( https://issues.apache.org/jira/browse/MAPREDUCE-1522 ). Until that fix goes into a Hadoop release used by Pig, this will remain an issue - not sure if we should keep this jira open until that point - am fine if we do. har url not usable in Pig scripts - Key: PIG-1378 URL: https://issues.apache.org/jira/browse/PIG-1378 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.8.0 Attachments: PIG-1378-2.patch, PIG-1378-3.patch, PIG-1378-4.patch, PIG-1378.patch I am trying to use har (Hadoop Archives) in my Pig script. I can use them through the HDFS shell
{noformat}
$ hadoop fs -ls 'har:///user/viraj/project/subproject/files/size/data'
Found 1 items
-rw--- 5 viraj users 1537234 2010-04-14 09:49 user/viraj/project/subproject/files/size/data/part-1
{noformat}
Using similar URLs in grunt yields
{noformat}
grunt> a = load 'har:///user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
{noformat}
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
2010-04-14 22:08:48,814 [main] WARN org.apache.pig.tools.grunt.Grunt - There is no log file to write to.
2010-04-14 22:08:48,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.lang.Error: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1483)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Incompatible file URI scheme: har : hdfs
at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:249)
at org.apache.pig.LoadFunc.relativeToAbsolutePath(LoadFunc.java:62)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1472)
... 13 more
{noformat}
According to Jira http://issues.apache.org/jira/browse/PIG-1234 I try the following as stated in the original description
{noformat}
grunt> a = load 'har://namenode-location/user/viraj/project/subproject/files/size/data';
grunt> dump a;
{noformat}
{noformat}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: har://namenode-location/user/viraj/project/subproject/files/size/data'; ... 8 more
Caused by: java.io.IOException: No FileSystem for scheme: namenode-location
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:104)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:193)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:208)
...
{noformat}
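For reference, the har scheme expects the URI of the underlying filesystem to be embedded after har://, i.e. har://<underlying-scheme>-<host>:<port>/<path>; the host and port below are hypothetical placeholders, not the actual cluster:
{noformat}
grunt> a = load 'har://hdfs-namenode.example.com:8020/user/viraj/project/subproject/files/size/data';
{noformat}
The "No FileSystem for scheme: namenode-location" error is what you get when the underlying scheme is omitted, but as noted in the comment, even a well-formed URI remains affected until the MAPREDUCE-1522 fix is in the Hadoop release Pig runs on.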
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Status: Patch Available (was: Open) Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch I have a Pig script which is structured in the following way
{code}
register cp.jar;
dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5);
filtered_dataset = filter dataset by (col1 == 1);
proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
rmf $output1;
store proj_filtered_dataset into '$output1' using PigStorage();
second_stream = foreach filtered_dataset generate col2, col4, col5;
group_second_stream = group second_stream by col4;
output2 = foreach group_second_stream {
  a = second_stream.col2
  b = distinct second_stream.col5;
  c = order b by $0;
  generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
}
rmf $output2;
--syntax error here
store output2 to '$output2' using PigStorage();
{code}
I run this script using the multi-query option; it runs successfully till the first store but later fails with a syntax error. The usage of the HDFS command rmf causes the first store to execute. The only option I have is to run an explain before running the script (grunt> explain -script myscript.pig -out explain.out) or to move the rmf statements to the top of the script. Here are some questions: a) Can we have an option to do something like checkscript instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error. b) Can pig not figure out a way to re-order the rmf statements since all the store directories are variables? Thanks Viraj -- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1211: Attachment: PIG-1211.patch Attached patch addresses the issue by adding support for a check script option. For this purpose, the -c command line option is reused, thus fixing https://issues.apache.org/jira/browse/PIG-1382 (Command line option -c doesn't work ... Currently this option is not used ...). The implementation of this check option piggybacks on explain -script and just modifies the GruntParser code to not output the explain output.
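With this patch the check can be run from the command line (the script name is a hypothetical example):
{noformat}
$ pig -c myscript.pig
{noformat}
This parses and type-checks the whole script through the explain -script code path, so the syntax error is reported up front, without printing plans and without launching any jobs.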
[jira] Resolved: (PIG-1382) Command line option -c doesn't work
[ https://issues.apache.org/jira/browse/PIG-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1382. - Hadoop Flags: [Incompatible change] Release Note: -c (-cluster) was earlier documented as the option to provide cluster information - this was not being used in the Pig code though - with PIG-1211, -c is being reused as the option to check the syntax of the pig script Assignee: Pradeep Kamath Resolution: Fixed Fixed through https://issues.apache.org/jira/browse/PIG-1211?focusedCommentId=12864002&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12864002 Command line option -c doesn't work --- Key: PIG-1382 URL: https://issues.apache.org/jira/browse/PIG-1382 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Pradeep Kamath Fix For: 0.8.0 Currently this option is not used, but it's documented: -c, -cluster clustername, kryptonite is default We should either remove it from the documentation or find some way to use it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-824: -- Attachment: pigsql.patch pig_sql_beta.pdf java-cup-11a-runtime.jar SQL patch (pigsql.patch) based on the version of Owl in svn, plus documentation (pig_sql_beta.pdf). The patch is against trunk revision 941018. SQL interface for Pig - Key: PIG-824 URL: https://issues.apache.org/jira/browse/PIG-824 Project: Pig Issue Type: New Feature Reporter: Olga Natkovich Attachments: java-cup-11a-runtime.jar, PIG-824.1.patch, PIG-824.binfiles.tar.gz, pig_sql_beta.pdf, pigsql.patch, SQL_IN_PIG.html In the last 18 months PigLatin has gained significant popularity within the open source community. Many users like its data flow model, its rich type system, and its ability to work with any data available on HDFS or outside. We have also heard from many users that having Pig speak SQL would bring many more users. Having a single system that exports multiple interfaces is a big advantage, as it guarantees consistent semantics and custom code reuse, and reduces the amount of maintenance. This is especially relevant for projects where different parts of the system call for different interfaces. For instance, in a data warehousing system, you would have an ETL component that brings data into the warehouse and a component that analyzes the data and produces reports. PigLatin is uniquely suited for ETL processing, while SQL might be a better fit for report generation. To start, it would make sense to implement a subset of the SQL92 standard and to be as standard-compliant as possible. This would include all the standard constructs: select, from, where, group-by + having, order by, limit, join (inner + outer). Several extensions such as support for pig's UDFs and possibly streaming, multiquery, and support for pig's complex types would be helpful. This work is dependent on the metadata support outlined in https://issues.apache.org/jira/browse/PIG-823 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair reassigned PIG-824: - Assignee: Thejas M Nair
[jira] Updated: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-824: -- Attachment: java-cup-11a.jar students2.bin students_attr.bin Copy the attached jar files to the lib/ dir to build the patch. Copy the bin storage format test files to the following dirs: students2.bin - test/org/apache/pig/test/data/SQL/students2.bin and contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL/students2.bin; students_attr.bin - test/org/apache/pig/test/data/SQL/students_attr.bin and contrib/owl/contrib/pig/test/java/org/apache/hadoop/owl/pig/data/SQL/students_attr.bin
[jira] Updated: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-824: -- Attachment: pigsql_tutorial.txt
Attaching SQL tutorial (pigsql_tutorial.txt) - This Pig SQL tutorial shows you how to run SQL scripts in local mode and mapreduce mode. The metadata is stored using Owl. In this tutorial a jetty/derby based Owl setup is used so that only minimal setup needs to be done to get started.
SQL interface for Pig - Key: PIG-824 URL: https://issues.apache.org/jira/browse/PIG-824 Project: Pig Issue Type: New Feature Reporter: Olga Natkovich Assignee: Thejas M Nair Attachments: java-cup-11a-runtime.jar, java-cup-11a.jar, PIG-824.1.patch, PIG-824.binfiles.tar.gz, pig_sql_beta.pdf, pigsql.patch, pigsql_tutorial.txt, SQL_IN_PIG.html, students2.bin, students_attr.bin
[jira] Commented: (PIG-1391) pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted
[ https://issues.apache.org/jira/browse/PIG-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864070#action_12864070 ] Hadoop QA commented on PIG-1391: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443607/PIG-1391.trunk.patch against trunk revision 940601. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 251 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/314/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/314/console This message is automatically generated. pig unit tests leave behind files in temp directory because MiniCluster files don't get deleted --- Key: PIG-1391 URL: https://issues.apache.org/jira/browse/PIG-1391 Project: Pig Issue Type: Bug Affects Versions: 0.7.0, 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.6.0, 0.7.0, 0.8.0 Attachments: minicluster.patch, PIG-1391.06.2.patch, PIG-1391.06.patch, PIG-1391.07.patch, PIG-1391.trunk.patch Pig unit test runs leave behind files in temp dir (/tmp) and there are too many files in the directory over time. Most of the files are left behind by MiniCluster . 
It shuts down the MiniDFSCluster, MiniMRCluster and the FileSystem created in its constructor only in finalize(), and Java does not guarantee that finalize() will ever be called.
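The fix pattern the issue points at can be sketched in plain Java: cleanup should happen on an explicit close()/shutdown path rather than in finalize(). The sketch below is illustrative only; MiniClusterLike is a hypothetical stand-in, not Pig's actual MiniCluster class.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for a test-cluster wrapper that creates files
// under the temp directory. Implementing AutoCloseable gives callers a
// deterministic cleanup point, unlike finalize(), which the JVM may
// never invoke before exit (the root cause described in PIG-1391).
public class MiniClusterLike implements AutoCloseable {
    private final File tmpDir;

    public MiniClusterLike() throws IOException {
        // Stands in for the DFS/MR data directories the real cluster creates.
        tmpDir = Files.createTempDirectory("minicluster").toFile();
    }

    public File getTmpDir() {
        return tmpDir;
    }

    @Override
    public void close() {
        // Explicit, deterministic cleanup of everything we created.
        deleteRecursively(tmpDir);
    }

    private static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();
    }

    public static void main(String[] args) throws IOException {
        MiniClusterLike cluster = new MiniClusterLike();
        File dir = cluster.getTmpDir();
        System.out.println("exists before close: " + dir.exists());
        cluster.close(); // or use try-with-resources
        System.out.println("exists after close: " + dir.exists());
    }
}
```

With this shape, a test harness can call close() in a tearDown method (or use try-with-resources) instead of relying on garbage collection.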
[jira] Updated: (PIG-1288) EvalFunc returnType is wrong for generic subclasses
[ https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1288: Status: Open (was: Patch Available) EvalFunc returnType is wrong for generic subclasses --- Key: PIG-1288 URL: https://issues.apache.org/jira/browse/PIG-1288 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch
From Garrett Buster Kaminaga: The EvalFunc constructor has code to determine the return type of the function. This walks up the object hierarchy until it encounters EvalFunc, then calls getActualTypeArguments and extracts type param 0. However, if the user class is itself a generic extension of EvalFunc, then the returned object is not the correct type, but a TypeVariable. Example:
class MyAbstractEvalFunc<T> extends EvalFunc<T> ...
class MyEvalFunc extends MyAbstractEvalFunc<String> ...
When MyEvalFunc() is called, inside the EvalFunc constructor the return type is set to a TypeVariable rather than String.class. The workaround we've implemented is for MyAbstractEvalFunc<T> to determine *its* type parameters using code similar to that in the EvalFunc constructor, and then reset the protected data member returnType manually in the MyAbstractEvalFunc constructor. (Though this has the same drawback of not working if someone then extends MyAbstractEvalFunc.)
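The reflection behavior described above can be reproduced outside Pig. The sketch below uses EvalFuncLike as a stand-in (it is not Pig's EvalFunc) to show that reading type argument 0 from the generic abstract class yields a TypeVariable, while the concrete subclass's generic superclass carries the actual String argument.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Stand-ins mirroring the hierarchy from the issue description:
//   MyAbstractEvalFunc<T> extends EvalFunc<T>
//   MyEvalFunc extends MyAbstractEvalFunc<String>
abstract class EvalFuncLike<T> { }
abstract class MyAbstractEvalFunc<T> extends EvalFuncLike<T> { }
class MyEvalFunc extends MyAbstractEvalFunc<String> { }

public class ReturnTypeDemo {
    public static void main(String[] args) {
        // Reading type param 0 at the EvalFuncLike boundary (what the
        // EvalFunc constructor effectively does) yields a TypeVariable "T":
        Type superOfAbstract = MyAbstractEvalFunc.class.getGenericSuperclass();
        Type arg0 = ((ParameterizedType) superOfAbstract).getActualTypeArguments()[0];
        System.out.println("direct walk sees: " + arg0);

        // The actual binding lives one level down, on the concrete
        // subclass's generic superclass, where it resolves to String:
        Type superOfConcrete = MyEvalFunc.class.getGenericSuperclass();
        Type real = ((ParameterizedType) superOfConcrete).getActualTypeArguments()[0];
        System.out.println("concrete walk sees: " + real);
    }
}
```

This is why the workaround re-resolves the type parameters in the intermediate abstract class: the TypeVariable found at the EvalFunc boundary has to be mapped back through each generic subclass to recover the concrete type.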
[jira] Updated: (PIG-1288) EvalFunc returnType is wrong for generic subclasses
[ https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1288: Attachment: PIG-1288-3.patch Fix unit test failures.
[jira] Updated: (PIG-1288) EvalFunc returnType is wrong for generic subclasses
[ https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1288: Status: Patch Available (was: Open)
[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service
[ https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864088#action_12864088 ] Jeff Hammerbacher commented on PIG-1331: Okay, seems like PIG-823 should be closed then. I don't have the ability to do that. Owl Hadoop Table Management Service --- Key: PIG-1331 URL: https://issues.apache.org/jira/browse/PIG-1331 Project: Pig Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Jay Tang Assignee: Ajay Kidave Fix For: 0.8.0 Attachments: anttestoutput.tgz, build.log, ivy_version.patch, owl.contrib.3.tgz, owl.contrib.4.tar.gz This JIRA is a proposal to create a Hadoop table management service: Owl. Today, MapReduce and Pig applications interact directly with HDFS directories and files and must deal with low-level data management issues such as storage format, serialization/compression schemes, data layout, and efficient data access, often with different solutions. Owl aims to provide a standard way to address this issue and abstracts away the complexity of reading/writing huge amounts of data from/to HDFS. Owl has a data access API that is modeled after the traditional Hadoop InputFormat and a management API to manipulate Owl objects. This JIRA is related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata store. Owl integrates with different storage modules like Zebra through a pluggable architecture. Initially, the proposal is to submit Owl as a Pig contrib project. Over time, it makes sense to move it to a Hadoop subproject.
[jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error
[ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864120#action_12864120 ] Hadoop QA commented on PIG-1211: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12443635/PIG-1211.patch against trunk revision 941005. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 530 release audit warnings (more than the trunk's current 529 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/308/console This message is automatically generated. 
Pig script runs half way after which it reports syntax error Key: PIG-1211 URL: https://issues.apache.org/jira/browse/PIG-1211 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Fix For: 0.8.0 Attachments: PIG-1211.patch
I have a Pig script which is structured in the following way:
{code}
register cp.jar;
dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4, col5);
filtered_dataset = filter dataset by (col1 == 1);
proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
rmf $output1;
store proj_filtered_dataset into '$output1' using PigStorage();
second_stream = foreach filtered_dataset generate col2, col4, col5;
group_second_stream = group second_stream by col4;
output2 = foreach group_second_stream {
    a = second_stream.col2
    b = distinct second_stream.col5;
    c = order b by $0;
    generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
}
rmf $output2;
-- syntax error here
store output2 to '$output2' using PigStorage();
{code}
I run this script using the multi-query option; it runs successfully till the first store but later fails with a syntax error. The use of the HDFS command rmf causes the first store to execute. The only options I have are to run an explain before running the script:
grunt> explain -script myscript.pig -out explain.out
or to move the rmf statements to the top of the script. Here are some questions:
a) Can we have an option to do something like "checkscript" instead of explain to get the same syntax error? In this way I can ensure that I do not run for 3-4 hours before encountering a syntax error.
b) Can Pig figure out a way to re-order the rmf statements, since all the store directories are variables?
Thanks Viraj