[jira] Commented: (PIG-1520) Remove Owl from Pig contrib
[ https://issues.apache.org/jira/browse/PIG-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893149#action_12893149 ]

Hadoop QA commented on PIG-1520:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450615/PIG-1520.patch against trunk revision 979918.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 345 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/382/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/382/console

This message is automatically generated.

Remove Owl from Pig contrib
---
Key: PIG-1520
URL: https://issues.apache.org/jira/browse/PIG-1520
Project: Pig
Issue Type: Task
Components: impl
Affects Versions: 0.8.0
Reporter: Alan Gates
Assignee: Alan Gates
Fix For: 0.8.0
Attachments: PIG-1520.patch

Yahoo has transitioned work on Owl to Howl (which will not be a Pig contrib project). Since no one else is working on Owl and there will be no one to support it, we should remove it from our contrib before releasing 0.8.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1513) Pig doesn't handle empty input directory
[ https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893433#action_12893433 ]

Hadoop QA commented on PIG-1513:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450727/PIG-1513.patch against trunk revision 979918.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/383/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/383/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/383/console

This message is automatically generated.

Pig doesn't handle empty input directory
---
Key: PIG-1513
URL: https://issues.apache.org/jira/browse/PIG-1513
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.8.0
Attachments: PIG-1513.patch

The following script

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'skewed';
store C into 'output';
{code}

fails with

ERROR: java.lang.RuntimeException: Empty samples file

In this case, the sample job has 0 maps. Pig doesn't expect this and fails.

For merge join, with the script

{code}
A = load 'input';
B = load 'emptydir';
C = join A by $0, B by $0 using 'merge';
store C into 'output';
{code}

the sample job again has 0 maps and the script fails with

ERROR 2176: Error processing right input during merge join.

But if we change the join order:

{code}
A = load 'input';
B = load 'emptydir';
C = join B by $0, A by $0 using 'merge';
store C into 'output';
{code}

the second job (merge) now has 0 maps and 0 reduces, and it generates an empty 'output' directory.

Order by on an empty directory works fine and generates empty part files.
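To illustrate the failure mode described in PIG-1513, here is a minimal sketch of sample-based partitioning that tolerates an empty sample instead of raising an error. It is not taken from PIG-1513.patch: the class SampleBoundaries, its methods, and the use of integer keys are all hypothetical, chosen only to show that "no samples" can be treated as "no cut points, everything goes to partition 0".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Pig's sampler or partitioner code.
final class SampleBoundaries {

    // Picks (partitions - 1) cut points from the sampled keys.
    // An empty sample (e.g. the sample job ran 0 maps) yields no cut points
    // rather than an "Empty samples file" style failure.
    static List<Integer> boundaries(List<Integer> samples, int partitions) {
        List<Integer> cuts = new ArrayList<Integer>();
        if (samples.isEmpty() || partitions <= 1) {
            return cuts;
        }
        List<Integer> sorted = new ArrayList<Integer>(samples);
        Collections.sort(sorted);
        for (int i = 1; i < partitions; i++) {
            int index = (int) ((long) i * sorted.size() / partitions);
            cuts.add(sorted.get(index));
        }
        return cuts;
    }

    // Maps a key to a partition; with no cut points every key lands in partition 0.
    static int partitionFor(int key, List<Integer> cuts) {
        int p = 0;
        while (p < cuts.size() && key >= cuts.get(p)) {
            p++;
        }
        return p;
    }
}
```

Whether the actual patch handles the empty case in the sampler, the partitioner, or the job compiler is not stated in the issue text.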
[jira] Updated: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce
[ https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1516:
---
Attachment: PIG-1516.2.patch

New patch with fix for findbugs warnings. I have also run large queries that spill to disk to test the changes in handling of mSpillFiles.

finalize in bag implementations causes pig to run out of memory in reduce
--
Key: PIG-1516
URL: https://issues.apache.org/jira/browse/PIG-1516
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0
Attachments: PIG-1516.2.patch, PIG-1516.patch

*Problem:*
Pig bag implementations that are subclasses of DefaultAbstractBag have finalize methods implemented. As a result, the garbage collector moves them to a finalization queue, and the memory they use is freed only after finalization has run on them. If the bags are not finalized fast enough, a lot of memory is consumed by the finalization queue, and Pig runs out of memory. This can happen if a large number of small bags are being created.

*Solution:*
The finalize function exists for the purpose of deleting the spill files that are created when the bag is too large. But if the bags are small enough, no spill files are created, and the finalize function serves no purpose. A new class that holds a list of files will be introduced (FileList). This class will have a finalize method that deletes the files. The bags will no longer have finalize methods, and the bags will use FileList instead of ArrayList<File>.

*Possible workaround for earlier releases:*
Since the fix is going into 0.8, here is a workaround: disabling the combiner will reduce the number of bags getting created, as there will not be the stage of combining intermediate merge results. But I would recommend disabling it only if you have this problem, as it is likely to slow down the query.

To disable the combiner, set the property:
-Dpig.exec.nocombiner=true
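The solution described in PIG-1516 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from PIG-1516.2.patch: only the name FileList comes from the issue text, while SketchBag and the methods add, size, deleteAll, spill, spillFileCount and clear are hypothetical. The point it shows is that the finalizable object is the small file holder, which is created only when a bag actually spills, so small bags never reach the finalization queue.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the code from PIG-1516.2.patch.
// Only this small holder is finalizable, so bags that never spill
// never enter the garbage collector's finalization queue.
class FileList {
    private final List<File> files = new ArrayList<File>();

    void add(File f) {
        files.add(f);
    }

    int size() {
        return files.size();
    }

    // Deletes every tracked file and returns how many deletions succeeded.
    int deleteAll() {
        int deleted = 0;
        for (File f : files) {
            if (f.delete()) {
                deleted++;
            }
        }
        files.clear();
        return deleted;
    }

    // Safety net for spill files that were never cleaned up explicitly.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            deleteAll();
        } finally {
            super.finalize();
        }
    }
}

// Stand-in for a bag: no finalize method, and no FileList until the first spill.
class SketchBag {
    private FileList spillFiles; // null while the bag fits in memory

    void spill() throws IOException {
        if (spillFiles == null) {
            spillFiles = new FileList();
        }
        File f = File.createTempFile("pigbag", ".tmp");
        f.deleteOnExit();
        spillFiles.add(f);
    }

    int spillFileCount() {
        return spillFiles == null ? 0 : spillFiles.size();
    }

    // Explicit cleanup; returns the number of spill files deleted.
    int clear() {
        int deleted = spillFiles == null ? 0 : spillFiles.deleteAll();
        spillFiles = null;
        return deleted;
    }
}
```

Note that finalize is deprecated in recent Java releases; it is used here only because the issue text describes the fix in those terms.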
[jira] Updated: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1249:

Status: Resolved (was: Patch Available)
Resolution: Fixed

Patch committed. Thanks Jeff!

Safe-guards against misconfigured Pig scripts without PARALLEL keyword
--
Key: PIG-1249
URL: https://issues.apache.org/jira/browse/PIG-1249
Project: Pig
Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Arun C Murthy
Assignee: Jeff Zhang
Priority: Critical
Fix For: 0.8.0
Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG-1249_5.patch, PIG_1249_2.patch, PIG_1249_3.patch

It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without using the PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with a badly misconfigured number of reduces, e.g. 1 reduce.
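The issue text for PIG-1249 does not say how the committed safeguard works. One plausible shape, shown here purely as a sketch, is to derive a reducer count from the input size whenever the script gives no PARALLEL clause. The class ReducerEstimator, the method estimate, and both constants are assumptions made for illustration and are not taken from the patch.

```java
// Hypothetical sketch of a PARALLEL safeguard; not the code committed for PIG-1249.
final class ReducerEstimator {
    // Assumed values for illustration only.
    static final long BYTES_PER_REDUCER = 1000L * 1000L * 1000L;
    static final int MAX_REDUCERS = 999;

    // requestedParallel <= 0 means the script did not use PARALLEL.
    static int estimate(long totalInputBytes, int requestedParallel) {
        if (requestedParallel > 0) {
            return requestedParallel; // an explicit PARALLEL always wins
        }
        // Ceiling division: one reducer per BYTES_PER_REDUCER of input.
        long needed = (totalInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.max(1L, Math.min((long) MAX_REDUCERS, needed));
    }
}
```

With these assumed constants, a 10TB input with no PARALLEL clause would be capped at 999 reducers instead of falling back to a single reduce.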
[jira] Updated: (PIG-1521) explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used
[ https://issues.apache.org/jira/browse/PIG-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1521:
---
Status: Patch Available (was: Open)

explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used
-
Key: PIG-1521
URL: https://issues.apache.org/jira/browse/PIG-1521
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Minor
Fix For: 0.8.0
Attachments: PIG-1521.patch

MR plan in explain shows PODistinct and Package (POPackage), when the operators POSortedDistinct and PackageLite (POPackageLite) are actually being used.