[jira] Commented: (PIG-1520) Remove Owl from Pig contrib

2010-07-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893149#action_12893149
 ] 

Hadoop QA commented on PIG-1520:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12450615/PIG-1520.patch
  against trunk revision 979918.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 345 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/382/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/382/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/382/console

This message is automatically generated.

 Remove Owl from Pig contrib
 ---

 Key: PIG-1520
 URL: https://issues.apache.org/jira/browse/PIG-1520
 Project: Pig
  Issue Type: Task
  Components: impl
Affects Versions: 0.8.0
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.8.0

 Attachments: PIG-1520.patch


 Yahoo has transitioned work on Owl to Howl (which will not be a Pig contrib 
 project).  Since no one else is working on Owl and there will be no one to 
 support it we should remove it from our contrib before releasing 0.8.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1513) Pig doesn't handle empty input directory

2010-07-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893433#action_12893433
 ] 

Hadoop QA commented on PIG-1513:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12450727/PIG-1513.patch
  against trunk revision 979918.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/383/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/383/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/383/console


 Pig doesn't handle empty input directory
 

 Key: PIG-1513
 URL: https://issues.apache.org/jira/browse/PIG-1513
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1513.patch


 The following script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'skewed';
 store C into 'output';
 {code}
 fails with ERROR: java.lang.RuntimeException: Empty samples file.
 In this case, the sample job has 0 maps. Pig doesn't expect this and fails.
 For merge join, with the script
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join A by $0, B by $0 using 'merge';
 store C into 'output';
 {code}
 the sample job again has 0 maps and the script fails with ERROR 2176: 
 Error processing right input during merge join.
 But if we change the join order: 
 {code}
 A = load 'input';
 B = load 'emptydir';
 C = join B by $0, A by $0 using 'merge';
 store C into 'output';
 {code}
 The second job (merge) now has 0 maps and 0 reduces, and it generates an 
 empty 'output' directory.
 Order by on an empty directory works fine and generates empty part files.




[jira] Updated: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce

2010-07-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1516:
---

Attachment: PIG-1516.2.patch

New patch with fix for findbugs warnings.
I have also run large queries that spill to disk to test the changes in 
handling of mSpillFiles. 


 finalize in bag implementations causes pig to run out of memory in reduce 
 --

 Key: PIG-1516
 URL: https://issues.apache.org/jira/browse/PIG-1516
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1516.2.patch, PIG-1516.patch


 *Problem:*
 Pig bag implementations that are subclasses of DefaultAbstractBag have 
 finalize methods. As a result, the garbage collector moves them 
 to a finalization queue, and the memory they use is freed only after 
 finalization has run.
 If the bags are not finalized fast enough, a lot of memory is held by the 
 finalization queue, and Pig runs out of memory. This can happen when a large 
 number of small bags are created.
 *Solution:*
 The finalize method exists to delete the spill files that 
 are created when a bag is too large. But if the bags are small enough, no 
 spill files are created, and the finalize method serves no purpose.
 A new class that holds a list of files will be introduced (FileList). This 
 class will have a finalize method that deletes the files. The bags will no 
 longer have finalize methods, and the bags will use FileList instead of 
 ArrayList<File>.
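 The idea described under Solution can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: only the name FileList comes from the issue text, while SketchBag, spill(), hasSpilled(), deleteAll() and size() are hypothetical names introduced for the example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a spill-file holder. The class name is taken from
 * the issue text; every method here is an assumption made for illustration.
 */
final class FileList {
    private final List<File> files = new ArrayList<>();

    void add(File f) {
        files.add(f);
    }

    int size() {
        return files.size();
    }

    /** Deletes every tracked file and returns how many deletions succeeded. */
    int deleteAll() {
        int deleted = 0;
        for (File f : files) {
            if (f.delete()) {
                deleted++;
            }
        }
        files.clear();
        return deleted;
    }

    // Only this holder is finalizable, so a bag that never spills never
    // places a finalizable object on the finalization queue.
    @SuppressWarnings({"deprecation", "removal"})
    @Override
    protected void finalize() throws Throwable {
        try {
            deleteAll();
        } finally {
            super.finalize();
        }
    }
}

/** Hypothetical bag with no finalize method; FileList is created on first spill. */
final class SketchBag {
    private FileList spillFiles; // stays null for bags that never spill

    File spill() throws IOException {
        if (spillFiles == null) {
            spillFiles = new FileList();
        }
        File f = File.createTempFile("pigbag", ".tmp");
        spillFiles.add(f);
        return f;
    }

    boolean hasSpilled() {
        return spillFiles != null;
    }
}
```

 In this sketch a small bag costs the garbage collector nothing extra, because the only finalizable object is allocated when the first spill file is created. Note that finalize is deprecated in current Java releases; it is kept here only to mirror the approach the issue describes.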
 *Possible workaround for earlier releases:*
 Since the fix is going into 0.8, here is a workaround:
 Disabling the combiner will reduce the number of bags created, as 
 there will be no stage that combines intermediate merge results. But I 
 would recommend disabling it only if you have this problem, as it is likely to 
 slow down the query.
 To disable the combiner, set the property: -Dpig.exec.nocombiner=true




[jira] Updated: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword

2010-07-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1249:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch committed. Thanks Jeff!

 Safe-guards against misconfigured Pig scripts without PARALLEL keyword
 --

 Key: PIG-1249
 URL: https://issues.apache.org/jira/browse/PIG-1249
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Arun C Murthy
Assignee: Jeff Zhang
Priority: Critical
 Fix For: 0.8.0

 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG-1249_5.patch, 
 PIG_1249_2.patch, PIG_1249_3.patch


 It would be *very* useful for Pig to have safe-guards against naive scripts 
 that process a *lot* of data without using the PARALLEL keyword.
 We've seen a fair number of instances where naive users process huge 
 data-sets (10TB) with a badly misconfigured number of reduces, e.g. 1 reduce.




[jira] Updated: (PIG-1521) explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used

2010-07-28 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1521:
---

Status: Patch Available  (was: Open)

 explain plan does not show correct Physical operator in MR plan when 
 POSortedDistinct, POPackageLite are used
 -

 Key: PIG-1521
 URL: https://issues.apache.org/jira/browse/PIG-1521
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1521.patch


 MR plan in explain shows PODistinct and Package (POPackage), when the 
 operators POSortedDistinct and PackageLite (POPackageLite) are actually being 
 used.
