[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875551#action_12875551 ]

Hadoop QA commented on PIG-1249:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12446173/PIG-1249-4.patch
against trunk revision 951229.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/329/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/329/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/329/console

This message is automatically generated.

Safe-guards against misconfigured Pig scripts without PARALLEL keyword
----------------------------------------------------------------------

Key: PIG-1249
URL: https://issues.apache.org/jira/browse/PIG-1249
Project: Pig
Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Arun C Murthy
Assignee: Jeff Zhang
Priority: Critical
Fix For: 0.8.0
Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, PIG_1249_3.patch

It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with badly mis-configured #reduces e.g. 1 reduce.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
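One way to picture the kind of safeguard PIG-1249 asks for is a reducer-count estimate derived from input size whenever a script omits PARALLEL. The sketch below is illustrative Python only and does not reproduce the logic in PIG-1249-4.patch; the function name, the 1 GB-per-reducer figure, and the cap of 999 reducers are assumptions chosen for the example.

```python
def estimate_reducers(input_bytes, parallel=None,
                      bytes_per_reducer=10**9, max_reducers=999):
    """Pick a reducer count for a job (hypothetical helper).

    An explicit PARALLEL value always wins. Otherwise the count is
    ceil(input_bytes / bytes_per_reducer), clamped to [1, max_reducers].
    Both defaults are assumptions for illustration.
    """
    if parallel is not None:
        return parallel
    if input_bytes <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding on large inputs.
    needed = -(-input_bytes // bytes_per_reducer)
    return max(1, min(max_reducers, needed))
```

With these assumed defaults, the 10TB case from the issue description would be capped at 999 reducers instead of running with a single reduce, while a script that sets PARALLEL explicitly is left untouched.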
[jira] Commented: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875554#action_12875554 ]

Hadoop QA commented on PIG-282:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12446172/CustomPartitionerFinale.patch
against trunk revision 951229.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 380 release audit warnings (more than the trunk's current 379 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/console

This message is automatically generated.

Custom Partitioner
------------------

Key: PIG-282
URL: https://issues.apache.org/jira/browse/PIG-282
Project: Pig
Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
Fix For: 0.8.0
Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch, CustomPartitionerTest.patch

By adding a custom partitioner, we can give control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
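The contract described in PIG-282 (a function that maps a key to a partition number between 0 and n-1) can be sketched in a few lines of Python. This is only an illustration of the idea, not the interface added by the attached patches; `hash_partition`, `first_letter_partition`, and `assign` are hypothetical names.

```python
import zlib


def hash_partition(key, n):
    """Default-style behaviour: spread keys by a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % n


def first_letter_partition(key, n):
    """Hypothetical custom rule: route keys by their first character."""
    if not key:
        return 0
    return ord(key[0].lower()) % n


def assign(keys, partitioner, n):
    """Place each key in the bucket chosen by the partitioner."""
    buckets = [[] for _ in range(n)]
    for key in keys:
        p = partitioner(key, n)
        # Enforce the contract from the issue: 0 <= p <= n-1.
        if not 0 <= p < n:
            raise ValueError("partition %d out of range for n=%d" % (p, n))
        buckets[p].append(key)
    return buckets
```

Swapping `hash_partition` for `first_letter_partition` changes which bucket each key lands in without touching the rest of the pipeline, which is the kind of control the issue asks for.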
[jira] Commented: (PIG-1436) Print number of records outputted at each step of a Pig script
[ https://issues.apache.org/jira/browse/PIG-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875638#action_12875638 ]

Alan Gates commented on PIG-1436:
---------------------------------

Russell, Richard's already doing a lot of work in this area. Check out PIG-1389, PIG-908, PIG-864, PIG-809 to see if those will meet your needs. If not, please discuss with him as his current project is to add script usage statistics.

Print number of records outputted at each step of a Pig script
--------------------------------------------------------------

Key: PIG-1436
URL: https://issues.apache.org/jira/browse/PIG-1436
Project: Pig
Issue Type: New Feature
Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Priority: Minor
Fix For: 0.8.0

I often run a script multiple times, or have to go and look through Hadoop task logs, to figure out where I broke a long script in such a way that I get 0 records out of it. I think this is a common problem. If someone can point me in the right direction, I can make a pass at this.
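The request in PIG-1436 amounts to recording a record count after every step so that the first step producing 0 records is easy to spot. A minimal Python sketch of that idea follows; it models steps as plain functions over lists and says nothing about how Pig collects statistics. `run_with_counts` and `first_empty_step` are hypothetical names.

```python
def run_with_counts(rows, steps):
    """Apply named steps in order and record how many rows each one emits.

    `steps` is a list of (name, function) pairs; each function takes a
    list of rows and returns an iterable of rows.
    """
    counts = []
    current = list(rows)
    for name, step in steps:
        current = list(step(current))
        counts.append((name, len(current)))
    return current, counts


def first_empty_step(counts):
    """Return the name of the first step that produced 0 rows, if any."""
    for name, n in counts:
        if n == 0:
            return name
    return None
```

Given the per-step counts, the step that "broke" the script is the first one whose count is zero, with no need to rerun the script or read task logs.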
[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875639#action_12875639 ]

Hadoop QA commented on PIG-1433:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12446222/PIG-1433.patch
against trunk revision 951229.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/330/console

This message is automatically generated.

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
------------------------------------------------------------------------------------------

Key: PIG-1433
URL: https://issues.apache.org/jira/browse/PIG-1433
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.8.0
Attachments: PIG-1433.patch

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1433:
-------------------------------

Attachment: PIG-1433-for-branch-0.7.patch

The original patch was committed to trunk. It did not apply for branch-0.7, so I have attached a new patch with minor modifications for branch-0.7. This latter patch was committed to branch-0.7.

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
------------------------------------------------------------------------------------------

Key: PIG-1433
URL: https://issues.apache.org/jira/browse/PIG-1433
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.7.0, 0.8.0
Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
[ https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1433:
-------------------------------

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
Resolution: Fixed

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
------------------------------------------------------------------------------------------

Key: PIG-1433
URL: https://issues.apache.org/jira/browse/PIG-1433
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.8.0, 0.7.0
Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch

pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true
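The behaviour requested in PIG-1433 can be pictured as: after a job finishes, drop an empty marker file into the output directory when the flag is set. The Python sketch below models that on a local filesystem only; it is not the committed patch. The marker name `_SUCCESS` is the one Hadoop's file output committer uses, while `mark_success` is a hypothetical helper, and treating an unset flag as false is a simplification made for this example.

```python
import tempfile
from pathlib import Path

SUCCESS_MARKER = "_SUCCESS"
FLAG = "mapreduce.fileoutputcommitter.marksuccessfuljobs"


def mark_success(output_dir, conf):
    """Create an empty marker file in output_dir if the flag is 'true'.

    `conf` is a plain dict standing in for the job configuration.
    Returns the marker path, or None when the flag is not enabled.
    """
    if str(conf.get(FLAG, "false")).lower() != "true":
        return None
    marker = Path(output_dir) / SUCCESS_MARKER
    marker.touch()
    return marker


if __name__ == "__main__":
    # Local demonstration using a throwaway directory.
    with tempfile.TemporaryDirectory() as out_dir:
        print(mark_success(out_dir, {FLAG: "true"}))
```

Downstream tools can then test for the marker's presence to decide whether an output directory is complete.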
[jira] Created: (PIG-1438) [Performance] MultiQueryOptimizer should also merge DISTINCT jobs
[Performance] MultiQueryOptimizer should also merge DISTINCT jobs
-----------------------------------------------------------------

Key: PIG-1438
URL: https://issues.apache.org/jira/browse/PIG-1438
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.8.0

Current implementation doesn't merge jobs derived from DISTINCT statements. The reason is that DISTINCT jobs are implemented using a special combiner (DistinctCombiner). But we should be able to merge jobs that have the same type of combiner (e.g. merge multiple DISTINCT jobs into one).
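The idea of merging several DISTINCT jobs into one can be illustrated by a single pass over the input that tags each projected value with the index of the query it belongs to, then de-duplicates on the tagged value. The Python sketch below is a conceptual model only; it makes no claim about how MultiQueryOptimizer or DistinctCombiner work internally, and `merged_distinct` is a hypothetical name.

```python
def merged_distinct(rows, projections):
    """Compute several DISTINCT results in one scan of `rows`.

    `projections` is a list of functions; projection i defines the
    values that the i-th DISTINCT query would see. Values are tagged
    with the query index so that the queries do not interfere.
    """
    seen = set()
    outputs = [[] for _ in projections]
    for row in rows:
        for index, project in enumerate(projections):
            value = project(row)
            tagged = (index, value)
            if tagged not in seen:
                seen.add(tagged)
                outputs[index].append(value)
    return outputs
```

Because every query applies the same de-duplication step, the input is read once instead of once per DISTINCT statement, which is the saving the issue is after.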
[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
[ https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1437:
---------------------------

Parent: PIG-1319
Issue Type: Sub-task (was: Bug)

[Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
-----------------------------------------------------------------

Key: PIG-1437
URL: https://issues.apache.org/jira/browse/PIG-1437
Project: Pig
Issue Type: Sub-task
Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Priority: Minor

It's possible to rewrite queries like this
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate group.name, group.age;
dump C;
{code}
or
{code}
A = load 'data' as (name,age);
B = group A by (name,age);
C = foreach B generate flatten(group);
dump C;
{code}
to
{code}
A = load 'data' as (name,age);
B = distinct A;
dump B;
{code}
This can only be done if no columns within the bags are referenced subsequently in the script. Since in the Pig-Hadoop world DISTINCT will be executed more efficiently than group-by, this will be a huge win.
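The equivalence behind the rewrite in PIG-1437 can be checked on a small in-memory relation: grouping by every column and emitting only the group key yields the same set of tuples as DISTINCT. The Python sketch below models both plans with ordinary collections; the function names are hypothetical, and output order is compared as a set because neither plan promises an ordering.

```python
def group_then_flatten(rows):
    """Model of: B = group A by (name,age); C = foreach B generate flatten(group)."""
    groups = {}
    for row in rows:
        # The group key is the whole tuple; the bag collects matching rows.
        groups.setdefault(row, []).append(row)
    # Emit only the key of each group; the bags are never read.
    return list(groups.keys())


def distinct(rows):
    """Model of: B = distinct A."""
    return list(dict.fromkeys(rows))
```

The bags built by `group_then_flatten` are discarded unread, which is exactly the condition under which the issue says the rewrite is safe.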
[jira] Updated: (PIG-283) Allow to set arbitrary jobconf key-value pairs inside pig program
[ https://issues.apache.org/jira/browse/PIG-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-283:
--------------------------------

Status: Resolved (was: Patch Available)
Release Note: For documentation: After this patch, it becomes possible to set key-value pairs in the script as follows.
{code}
set mapred.map.tasks.speculative.execution false
set pig.logfile mylogfile.log
set my.arbitrary.key my.arbitrary.value
{code}
These key-value pairs will be put in the job-conf by Pig. This is a script-wide setting: if a value is defined multiple times for a key in the script, the last one takes effect, and that value is set for all the jobs generated by the script.
Resolution: Fixed

Re-ran all the tests reported by Hudson as failures. All of them passed. Patch committed.

Allow to set arbitrary jobconf key-value pairs inside pig program
-----------------------------------------------------------------

Key: PIG-283
URL: https://issues.apache.org/jira/browse/PIG-283
Project: Pig
Issue Type: New Feature
Components: grunt
Affects Versions: 0.7.0
Reporter: Christian Kunz
Assignee: Ashutosh Chauhan
Fix For: 0.8.0
Attachments: pig-282.patch

It would be useful to be able to set arbitrary JobConf key-value pairs inside a pig program (e.g. in front of a COGROUP statement). I wonder whether the simplest way to add this feature is by expanding the 'set' command functionality.
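The script-wide, last-one-wins semantics described in the PIG-283 release note can be modelled by scanning the whole script for `set` statements before any job is configured. The Python sketch below is a simplified illustration and not Pig's parser; `collect_settings` is a hypothetical helper that only understands lines of the form `set <key> <value>`.

```python
def collect_settings(script_lines):
    """Gather `set key value` statements from a script, last one wins.

    Lines that are not `set` statements are ignored. The returned dict
    stands in for the job configuration applied to every generated job.
    """
    conf = {}
    for line in script_lines:
        parts = line.strip().rstrip(";").split(None, 2)
        if len(parts) == 3 and parts[0].lower() == "set":
            key, value = parts[1], parts[2]
            conf[key] = value  # a later definition overwrites an earlier one
    return conf
```

Because the dict is built from the entire script first, a `set` placed in front of one statement still affects every job, matching the release note's point that the setting is script-wide and not scoped to the statement that follows it.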
[jira] Updated: (PIG-972) Make describe work with nested foreach
[ https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aniket Mokashi updated PIG-972:
------------------------------

Attachment: NestedDescribeProp2Initial.patch

Attaching initial patch for prop2

Make describe work with nested foreach
--------------------------------------

Key: PIG-972
URL: https://issues.apache.org/jira/browse/PIG-972
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
Fix For: 0.8.0
Attachments: NestedDescribeProp1.patch, NestedDescribeProp2Initial.patch

Currently Parser can't deal with that. This is because describe is part of Grunt parser while the rest of nested foreach is handled by the QueryParser
[jira] Commented: (PIG-1334) Make pig artifacts available through maven
[ https://issues.apache.org/jira/browse/PIG-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875766#action_12875766 ]

Jeremy Hanna commented on PIG-1334:
-----------------------------------

To clarify our need - the Cassandra project would like to use pig 0.7.0 using ivy as a build dependency.

Make pig artifacts available through maven
------------------------------------------

Key: PIG-1334
URL: https://issues.apache.org/jira/browse/PIG-1334
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Fix For: 0.8.0