[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892658#action_12892658 ] Hadoop QA commented on PIG-1178: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450250/PIG-1178-4.patch against trunk revision 979362. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 48 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 446 release audit warnings (more than the trunk's current 398 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/355/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/355/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/355/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/355/console This message is automatically generated. 
LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Daniel Dai Fix For: 0.8.0 Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, pig_1178_3.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892663#action_12892663 ] Hadoop QA commented on PIG-348: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450362/PIG-348.path against trunk revision 979503. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/357/console This message is automatically generated. -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1249: Attachment: PIG-1249_5.patch Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG-1249_5.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892697#action_12892697 ] Jeff Zhang commented on PIG-1249: - Olga, I generated the patch for the latest trunk and added documentation for the method estimateNumberOfReducers in JobControlCompiler. If you need anything else, feel free to tell me. Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG-1249_5.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
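For readers following PIG-1249: the comment above names an estimateNumberOfReducers method in JobControlCompiler but does not show its logic. Below is a minimal Java sketch of one plausible safeguard, assuming the reducer count is derived from the total input size when no PARALLEL clause is given. The class name, the method signature, and the figures in the note after the code are illustrative assumptions, not taken from the patch.

```java
// Hypothetical sketch only: not the code from PIG-1249_5.patch.
final class ReducerEstimateSketch {

    private ReducerEstimateSketch() {}

    /**
     * Estimates a reducer count from the input size.
     *
     * @param totalInputBytes total size of the job's input, in bytes
     * @param bytesPerReducer assumed amount of input one reducer should handle
     * @param maxReducers     assumed upper bound on the number of reducers
     * @return a value between 1 and maxReducers
     */
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        if (totalInputBytes <= 0) {
            return 1; // always run at least one reducer
        }
        // Ceiling division written to avoid overflow for very large inputs.
        long needed = totalInputBytes / bytesPerReducer
                + (totalInputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.min(needed, (long) maxReducers);
    }
}
```

With assumed settings of 1,000,000,000 bytes per reducer and a cap of 999, a 10 TB input would be given 999 reducers instead of the single reducer described in the issue.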
[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated PIG-1229: --- Attachment: jira-1229-final.patch Hope this one finally goes in . allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated PIG-1229: --- Status: Patch Available (was: In Progress) Regenerated the patch as per Ashutosh's suggestion. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1512) PlanPrinter does not print LOJoin operator in the new logical optimization framework
[ https://issues.apache.org/jira/browse/PIG-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892740#action_12892740 ] Hadoop QA commented on PIG-1512: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450145/printJoin.patch against trunk revision 979503. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 407 release audit warnings (more than the trunk's current 405 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/380/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/380/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/380/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/380/console This message is automatically generated. PlanPrinter does not print LOJoin operator in the new logical optimization framework Key: PIG-1512 URL: https://issues.apache.org/jira/browse/PIG-1512 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Swati Jain Assignee: Swati Jain Fix For: 0.8.0 Attachments: printJoin.patch PlanPrinter does not print LOJoin relational operator. As such, the LOJoin operator would not get printed when we do an explain. 
[jira] Commented: (PIG-1343) pig_log file missing even though Main tells it is creating one and an M/R job fails
[ https://issues.apache.org/jira/browse/PIG-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892800#action_12892800 ] Ashitosh Darbarwar commented on PIG-1343: - Hi .. I'm really sorry for such a long delay .. I'll resume my work on this issue.. thanks pig_log file missing even though Main tells it is creating one and an M/R job fails Key: PIG-1343 URL: https://issues.apache.org/jira/browse/PIG-1343 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Ashitosh Darbarwar Fix For: 0.8.0 Attachments: PIG-1343-1.patch There is a particular case where I was running with the latest trunk of Pig. {code} $java -cp pig.jar:/home/path/hadoop20cluster org.apache.pig.Main testcase.pig [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig_1263420012601.log $ls -l pig_1263420012601.log ls: pig_1263420012601.log: No such file or directory {code} The job failed and the log file did not contain anything, the only way to debug was to look into the Jobtracker logs. Here are some reasons which would have caused this behavior: 1) The underlying filer/NFS had some issues. In that case do we not error on stdout? 2) There are some errors from the backend which are not being captured Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1034) Pig does not support ORDER ... BY group alias
[ https://issues.apache.org/jira/browse/PIG-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892821#action_12892821 ] Thejas M Nair commented on PIG-1034: +1 please commit. I also ran the test-patch and unit tests on my machine, they pass. Pig does not support ORDER ... BY group alias - Key: PIG-1034 URL: https://issues.apache.org/jira/browse/PIG-1034 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: David Ciemiewicz Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: PIG_1034.patch GROUP ... ALL and GROUP ... BY produce an alias group. Pig produces a syntax error if you attempt to ORDER ... BY group. This does seem like a perfectly reasonable thing to do. The workaround is to create an alias for group using an AS clause. But I think this workaround should be unnecessary. Here's sample code which elicits the syntax error: {code} A = load 'one.txt' using PigStorage as (one: int); B = group A all; C = foreach B generate group, COUNT(A) as count; D = order C by group parallel 1; -- group is one of the aliases in C, why does this throw a syntax error? dump D; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-348: - Attachment: PIG-348_1.patch Resync with trunk -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path, PIG-348_1.patch According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892844#action_12892844 ] Richard Ding commented on PIG-348: -- committed the patch to trunk. -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path, PIG-348_1.patch According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-348: - Assignee: Corinne Chandel (was: Richard Ding) -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Corinne Chandel Fix For: 0.8.0 Attachments: PIG-348.path, PIG-348_1.patch According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-348) -j command line option doesn't work
[ https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892842#action_12892842 ] Richard Ding commented on PIG-348: -- Manually ran and passed core tests. -j command line option doesn't work --- Key: PIG-348 URL: https://issues.apache.org/jira/browse/PIG-348 Project: Pig Issue Type: Improvement Components: documentation Reporter: Amir Youssefi Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-348.path, PIG-348_1.patch According to: $ pig --help ... -j, -jar jarfile load jarfile ... yet $pig -j my.jar doesn't work in place of: register my.jar in Pig script. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1496) Mandatory rule ImplicitSplitInserter
[ https://issues.apache.org/jira/browse/PIG-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892846#action_12892846 ] Daniel Dai commented on PIG-1496: - +1, patch looks good. My only comment: can we also include comments like the old rule does? Mandatory rule ImplicitSplitInserter Key: PIG-1496 URL: https://issues.apache.org/jira/browse/PIG-1496 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.8.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1496.patch Need to migrate ImplicitSplitInserter to new logical optimizer. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1445) Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented
[ https://issues.apache.org/jira/browse/PIG-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1445: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Ran unit tests manually and all pass. Patch committed. Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented -- Key: PIG-1445 URL: https://issues.apache.org/jira/browse/PIG-1445 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1445-1.patch The following script fails due to ERROR 2013: Moving LOLimit in front of LOStream is not implemented. {code} A = LOAD 'data'; B = STREAM A THROUGH `stream.pl`; C = LIMIT B 10; explain C; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-259) allow store to overwrite existing directroy
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-259: --- Fix Version/s: 0.9.0 Affects Version/s: (was: 0.9.0) allow store to overwrite existing directroy --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Sub-task Reporter: Olga Natkovich Assignee: Jeff Zhang Fix For: 0.9.0 Attachments: Pig_259.patch, Pig_259_2.patch, Pig_259_3.patch, Pig_259_4.patch we have users who are asking for a flag to overwrite existing directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1500: --- Attachment: guava.jar.06.afterjython.patch Merged with Daniel's Jython code. guava.jar should be removed from the lib folder --- Key: PIG-1500 URL: https://issues.apache.org/jira/browse/PIG-1500 Project: Pig Issue Type: Bug Components: build Reporter: Giridharan Kesavan Assignee: niraj rai Fix For: 0.8.0 Attachments: guava.jar.06.afterjython.patch, guava.jar.r06.patch, removeGuavaJar.patch The guava jar is available in the Maven repository, but it is still checked into the Pig trunk's lib folder. I've checked the availability of the guava jar in the Maven repository: http://mvnrepository.com/artifact/com.google.guava/guava -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1500: --- Status: Patch Available (was: Open) guava.jar should be removed from the lib folder --- Key: PIG-1500 URL: https://issues.apache.org/jira/browse/PIG-1500 Project: Pig Issue Type: Bug Components: build Reporter: Giridharan Kesavan Assignee: niraj rai Fix For: 0.8.0 Attachments: guava.jar.06.afterjython.patch, guava.jar.r06.patch, removeGuavaJar.patch The guava jar is available in the Maven repository, but it is still checked into the Pig trunk's lib folder. I've checked the availability of the guava jar in the Maven repository: http://mvnrepository.com/artifact/com.google.guava/guava -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1500) guava.jar should be removed from the lib folder
[ https://issues.apache.org/jira/browse/PIG-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1500: --- Status: Open (was: Patch Available) guava.jar should be removed from the lib folder --- Key: PIG-1500 URL: https://issues.apache.org/jira/browse/PIG-1500 Project: Pig Issue Type: Bug Components: build Reporter: Giridharan Kesavan Assignee: niraj rai Fix For: 0.8.0 Attachments: guava.jar.06.afterjython.patch, guava.jar.r06.patch, removeGuavaJar.patch The guava jar is available in the Maven repository, but it is still checked into the Pig trunk's lib folder. I've checked the availability of the guava jar in the Maven repository: http://mvnrepository.com/artifact/com.google.guava/guava -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-498) Pig does not error out while trying to use a input file to which the user does not have access permissions
[ https://issues.apache.org/jira/browse/PIG-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-498. Resolution: Fixed Pig does not error out while trying to use a input file to which the user does not have access permissions -- Key: PIG-498 URL: https://issues.apache.org/jira/browse/PIG-498 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Assignee: niraj rai Fix For: 0.8.0 Session illustrating the issue.
{code}
bash-3.00$ hadoop fs -ls /data/statistics.txt
ls: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=username, access=READ_EXECUTE, inode=inodepermissions-
bash-3.00$ pig -latest
2008-10-16 23:31:25,134 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to HOD...
...
2008-10-16 23:34:45,810 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
grunt> a = load '/data/statistics.txt';
grunt> dump a;
2008-10-16 23:39:05,624 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2008-10-16 23:39:05,624 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt>
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1517) Pig needs to support keywords in the package name
[ https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892863#action_12892863 ] Aniket Mokashi commented on PIG-1517: - This bug is an extension of https://issues.apache.org/jira/browse/PIG-656, does not need extra test cases. Other tests pass manually. Pig needs to support keywords in the package name - Key: PIG-1517 URL: https://issues.apache.org/jira/browse/PIG-1517 Project: Pig Issue Type: Bug Components: grunt Reporter: Aniket Mokashi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: pigusergroup656.patch Pig needs to support keywords in the package name. Pig supports most of the keywords as this was fixed in https://issues.apache.org/jira/browse/PIG-656. There are a few missing tokens like eq,gt,lt,gte,lte,neq that need to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1288) EvalFunc returnType is wrong for generic subclasses
[ https://issues.apache.org/jira/browse/PIG-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892864#action_12892864 ] Richard Ding commented on PIG-1288: --- +1 EvalFunc returnType is wrong for generic subclasses --- Key: PIG-1288 URL: https://issues.apache.org/jira/browse/PIG-1288 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1288-1.patch, PIG-1288-2.patch, PIG-1288-3.patch, PIG-1288-4.patch From Garrett Buster Kaminaga: The EvalFunc constructor has code to determine the return type of the function. This walks up the object hierarchy until it encounters EvalFunc, then calls getActualTypeArguments and extracts type param 0. However, if the user class is itself a generic extension of EvalFunc, then the returned object is not the correct type, but a TypeVariable. Example: class MyAbstractEvalFunc<T> extends EvalFunc<T> ... class MyEvalFunc extends MyAbstractEvalFunc<String> ... when MyEvalFunc() is called, inside EvalFunc constructor the return type is set to a TypeVariable rather than String.class. The workaround we've implemented is for the MyAbstractEvalFunc<T> to determine *its* type parameters using code similar to that in the EvalFunc constructor, and then reset protected data member returnType manually in the MyAbstractEvalFunc constructor. (though this has the same drawback of not working if someone then extends MyAbstractEvalFunc) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
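The behaviour described in PIG-1288 can be reproduced without Pig on the classpath. The Java sketch below uses a hypothetical stand-in class, Func, whose constructor follows the steps given in the description (walk up to the class directly below the base, then take type argument 0). It is an assumption-laden illustration of the problem, not a copy of the EvalFunc source, and all class names in it are invented for the example.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

// Hypothetical stand-in for EvalFunc; mirrors only the lookup described in the issue.
abstract class Func<T> {
    protected final Type returnType;

    protected Func() {
        Class<?> c = getClass();
        // Walk up the hierarchy until the direct superclass is Func itself.
        while (c.getSuperclass() != Func.class) {
            c = c.getSuperclass();
        }
        // Read the first type argument of "extends Func<...>" on that class.
        ParameterizedType parent = (ParameterizedType) c.getGenericSuperclass();
        returnType = parent.getActualTypeArguments()[0];
    }

    Type getReturnType() {
        return returnType;
    }

    /** True when the lookup produced an unresolved type variable such as T. */
    static boolean isUnresolved(Type t) {
        return t instanceof TypeVariable;
    }
}

// Extends the base directly: the lookup sees Func<Integer> and yields Integer.class.
class DirectFunc extends Func<Integer> {}

// Generic intermediate class, as in the issue: the lookup stops here and sees Func<T>.
abstract class AbstractFunc<T> extends Func<T> {}

// Concrete subclass of the generic intermediate class.
class StringFunc extends AbstractFunc<String> {}
```

Here new DirectFunc().getReturnType() is Integer.class, while new StringFunc().getReturnType() is the type variable T declared on AbstractFunc, which matches the report that a TypeVariable is stored rather than String.class.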
[jira] Commented: (PIG-1517) Pig needs to support keywords in the package name
[ https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892867#action_12892867 ] Olga Natkovich commented on PIG-1517: - Thanks, Aniket. I will review and commit the patch Pig needs to support keywords in the package name - Key: PIG-1517 URL: https://issues.apache.org/jira/browse/PIG-1517 Project: Pig Issue Type: Bug Components: grunt Reporter: Aniket Mokashi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: pigusergroup656.patch Pig needs to support keywords in the package name. Pig supports most of the keywords as this was fixed in https://issues.apache.org/jira/browse/PIG-656. There are a few missing tokens like eq,gt,lt,gte,lte,neq that need to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1034) Pig does not support ORDER ... BY group alias
[ https://issues.apache.org/jira/browse/PIG-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1034: --- Status: Resolved (was: Patch Available) Resolution: Fixed I have committed the patch. Jeff, thanks for the contribution. Also, marking the jira as fixed. Pig does not support ORDER ... BY group alias - Key: PIG-1034 URL: https://issues.apache.org/jira/browse/PIG-1034 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: David Ciemiewicz Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: PIG_1034.patch GROUP ... ALL and GROUP ... BY produce an alias group. Pig produces a syntax error if you attempt to ORDER ... BY group. This does seem like a perfectly reasonable thing to do. The workaround is to create an alias for group using an AS clause. But I think this workaround should be unnecessary. Here's sample code which elicits the syntax error: {code} A = load 'one.txt' using PigStorage as (one: int); B = group A all; C = foreach B generate group, COUNT(A) as count; D = order C by group parallel 1; -- group is one of the aliases in C, why does this throw a syntax error? dump D; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892871#action_12892871 ] Olga Natkovich commented on PIG-1249: - Hi Jeff, Thanks for the quick response. I will review and commit the patch. I am going to add a log statement for the reduce value that has been computed. I will also copy your doc comment from the src to the JIRA to assist our doc writer Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG-1249_5.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892873#action_12892873 ] Hadoop QA commented on PIG-1249: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450579/PIG-1249_5.patch against trunk revision 979503. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/359/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/359/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/359/console This message is automatically generated. Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG-1249_5.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce. 
[jira] Updated: (PIG-1520) Remove Owl from Pig contrib
[ https://issues.apache.org/jira/browse/PIG-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1520: Attachment: PIG-1520.patch Remove Owl from Pig contrib --- Key: PIG-1520 URL: https://issues.apache.org/jira/browse/PIG-1520 Project: Pig Issue Type: Task Components: impl Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1520.patch Yahoo has transitioned work on Owl to Howl (which will not be a Pig contrib project). Since no one else is working on Owl and there will be no one to support it we should remove it from our contrib before releasing 0.8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1520) Remove Owl from Pig contrib
[ https://issues.apache.org/jira/browse/PIG-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1520: Status: Patch Available (was: Open) Remove Owl from Pig contrib --- Key: PIG-1520 URL: https://issues.apache.org/jira/browse/PIG-1520 Project: Pig Issue Type: Task Components: impl Affects Versions: 0.8.0 Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.8.0 Attachments: PIG-1520.patch Yahoo has transitioned work on Owl to Howl (which will not be a Pig contrib project). Since no one else is working on Owl and there will be no one to support it we should remove it from our contrib before releasing 0.8. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1461) support union operation that merges based on column names
[ https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892892#action_12892892 ] Thejas M Nair commented on PIG-1461: The syntax - union ... using 'merge' introduces a new use of the using '...' clause. So far this clause has been used to indicate the implementation algorithm and it did not have any impact on the semantics. Instead, a keyword might be better if we are trying to avoid introducing another top-level operator, similar to the case of outer joins - e.g. - union onschema L1, L2; More suggestions/opinions on the syntax for this feature are welcome. support union operation that merges based on column names - Key: PIG-1461 URL: https://issues.apache.org/jira/browse/PIG-1461 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 When the data has a schema, it often makes sense to union on column names in the schema rather than the position of the columns. The behavior of the existing union operator should remain backward compatible. This feature can be supported using either a new operator or extending union to support a 'using' clause. I am thinking of having a new operator called either unionschema or merge. Does anybody have any other suggestions for the syntax? example - L1 = load 'x' as (a,b); L2 = load 'y' as (b,c); U = unionschema L1, L2; describe U; U: {a:bytearray, b:bytearray, c:bytearray} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
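The semantics proposed in PIG-1461 can be sketched outside Pig as follows. This is only an illustration of merging by column name, assuming that columns missing from an input are filled with nulls (None here); the function name union_onschema and the (schema, rows) representation are hypothetical and say nothing about how Pig would implement the operator.

```python
def union_onschema(relations):
    """Union relations by column name instead of by column position.

    relations: list of (schema, rows) pairs, where schema is a list of
    column names and rows is a list of tuples in that column order.
    Returns (merged_schema, merged_rows).
    """
    # Merged schema: every column name, in order of first appearance.
    merged_schema = []
    for schema, _rows in relations:
        for name in schema:
            if name not in merged_schema:
                merged_schema.append(name)

    merged_rows = []
    for schema, rows in relations:
        position = {name: i for i, name in enumerate(schema)}
        for row in rows:
            # Assumption: a column absent from this input becomes None.
            merged_rows.append(tuple(
                row[position[name]] if name in position else None
                for name in merged_schema
            ))
    return merged_schema, merged_rows


# Mirrors the example in the issue: L1 has (a, b), L2 has (b, c).
L1 = (["a", "b"], [(1, 2)])
L2 = (["b", "c"], [(3, 4)])
schema, rows = union_onschema([L1, L2])
```

With the inputs above, the merged schema is a, b, c, matching the describe output in the issue, and column b from both inputs lands in the same position.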
[jira] Commented: (PIG-1461) support union operation that merges based on column names
[ https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892903#action_12892903 ] Alan Gates commented on PIG-1461: - +1 on Thejas's comment that so far using indicates implementation, not a semantic change, and it's best to keep it that way. union onschema seems fine, as this seems equivalent to join outer. support union operation that merges based on column names - Key: PIG-1461 URL: https://issues.apache.org/jira/browse/PIG-1461 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.8.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.8.0 When the data has a schema, it often makes sense to union on column names in the schema rather than the position of the columns. The behavior of the existing union operator should remain backward compatible. This feature can be supported using either a new operator or extending union to support a 'using' clause. I am thinking of having a new operator called either unionschema or merge. Does anybody have any other suggestions for the syntax? example - L1 = load 'x' as (a,b); L2 = load 'y' as (b,c); U = unionschema L1, L2; describe U; U: {a:bytearray, b:bytearray, c:bytearray} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1334) Make pig artifacts available through maven
[ https://issues.apache.org/jira/browse/PIG-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] niraj rai updated PIG-1334: --- Attachment: mvn_pig_3.patch Added mvn-deploy task to load the jar in the apache repos. Giri, can you test this patch as I don't have permission to run this test. Make pig artifacts available through maven -- Key: PIG-1334 URL: https://issues.apache.org/jira/browse/PIG-1334 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: niraj rai Fix For: 0.8.0 Attachments: mvn-pig.patch, mvn_pig_2.patch, mvn_pig_3.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892960#action_12892960 ] Alan Gates commented on PIG-1178: - 12K lines of code, wow! In ProjectExpression, what does the new attachedRelationalOp do? javadoc comments on that in the constructor would be good. Is the purpose of this patch to also make this the default optimizer or leave it in experimental mode? If it is to make it the default, I think we should move it from the experimental package, though probably in a separate patch. If it isn't, any thoughts on when it would be ready to become the default optimizer? LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Daniel Dai Fix For: 0.8.0 Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, PIG-1178-4.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, pig_1178_3.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer. See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details. 
[jira] Commented: (PIG-1517) Pig needs to support keywords in the package name
[ https://issues.apache.org/jira/browse/PIG-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892982#action_12892982 ] Olga Natkovich commented on PIG-1517: - I asked Aniket to add a test to make sure that we don't regress when we switch parsers. Once the new patch is submitted, I will review and commit it. Pig needs to support keywords in the package name - Key: PIG-1517 URL: https://issues.apache.org/jira/browse/PIG-1517 Project: Pig Issue Type: Bug Components: grunt Reporter: Aniket Mokashi Assignee: Aniket Mokashi Priority: Minor Fix For: 0.8.0 Attachments: pigusergroup656.patch Pig needs to support keywords in the package name. Pig supports most of the keywords as this was fixed in https://issues.apache.org/jira/browse/PIG-656. There are a few missing tokens like eq, gt, lt, gte, lte, neq that need to be supported. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1423) Suggest to add clear command in grunt to clear the relation variable
[ https://issues.apache.org/jira/browse/PIG-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892992#action_12892992 ] Olga Natkovich commented on PIG-1423: - I am not sure I like the idea of unsetting variables as a grunt command. I think it can cause more confusion and generate more problems than it would help to solve. I think the right way to solve this problem is to define scope for variables as part of our Turing-complete effort. If we have an agreement on this, I would like to unlink it from the 0.8.0 release. Suggest to add clear command in grunt to clear the relation variable Key: PIG-1423 URL: https://issues.apache.org/jira/browse/PIG-1423 Project: Pig Issue Type: New Feature Components: grunt Affects Versions: 0.8.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: PIG-1423.patch A relation name in a Pig script can be considered a variable of a programming language. One risk of the current grunt is that the user may, by typo, use a previously defined relation name after a long period of work in grunt, and it is difficult for users to track this problem down. E.g. the following red students is not what the user intended to use, but grunt won't throw any error message here. students = load 'a.txt'; student = load 'b.txt'; result = foreach {color:red}students{color} generate $0; The clear command clears the variables defined before, so that if users then use a relation name defined before, grunt will throw an error message. This command will also be useful to let users reuse relation names, especially when doing lots of experiments for one specific task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
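The behavior requested in PIG-1423 can be illustrated with a toy alias table. This is a sketch of the idea only: AliasTable, UndefinedAliasError, and their methods are hypothetical names, and nothing here reflects grunt's actual internals or the attached patch.

```python
class UndefinedAliasError(Exception):
    """Raised when a statement refers to an alias that is not defined."""


class AliasTable:
    """Toy model of the relation names known to an interactive session."""

    def __init__(self):
        self._plans = {}

    def define(self, alias, plan):
        # Later definitions silently replace earlier ones.
        self._plans[alias] = plan

    def lookup(self, alias):
        try:
            return self._plans[alias]
        except KeyError:
            raise UndefinedAliasError("Undefined alias: " + alias) from None

    def clear(self):
        # Forget every alias, so stale names fail loudly afterwards.
        self._plans.clear()


# The scenario from the issue: 'students' is left over from earlier work.
table = AliasTable()
table.define("students", "load 'a.txt'")
table.clear()
table.define("student", "load 'b.txt'")
```

After the clear, a lookup of student succeeds, while the mistyped students raises UndefinedAliasError instead of silently resolving to the stale relation.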
[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12892999#action_12892999 ] Hadoop QA commented on PIG-1229: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450586/jira-1229-final.patch against trunk revision 979781. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/360/console This message is automatically generated. allow pig to write output into a JDBC db Key: PIG-1229 URL: https://issues.apache.org/jira/browse/PIG-1229 Project: Pig Issue Type: New Feature Components: impl Reporter: Ian Holsman Assignee: Ankur Priority: Minor Fix For: 0.8.0 Attachments: jira-1229-final.patch, jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch UDF to store data into a DB -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.