[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword

2010-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875551#action_12875551
 ] 

Hadoop QA commented on PIG-1249:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446173/PIG-1249-4.patch
  against trunk revision 951229.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/329/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/329/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/329/console

This message is automatically generated.

 Safe-guards against misconfigured Pig scripts without PARALLEL keyword
 --

 Key: PIG-1249
 URL: https://issues.apache.org/jira/browse/PIG-1249
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Arun C Murthy
Assignee: Jeff Zhang
Priority: Critical
 Fix For: 0.8.0

 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, 
 PIG_1249_3.patch


 It would be *very* useful for Pig to have safe-guards against naive scripts 
 which process a *lot* of data without the PARALLEL keyword.
 We've seen a fair number of instances where naive users process huge 
 data-sets (10TB) with a badly misconfigured number of reducers, e.g. a single reducer. 
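One possible safeguard is to derive a default reducer count from the input size whenever a script omits PARALLEL. The Java sketch below assumes a fixed bytes-per-reducer budget and an upper cap; ReducerEstimator, estimateReducers and both parameters are hypothetical names, not taken from the attached patches.

```java
/**
 * Hypothetical sketch: pick a reducer count from the total input size when a
 * script does not specify PARALLEL. Names and parameters are illustrative only.
 */
final class ReducerEstimator {
    private ReducerEstimator() {}

    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        if (bytesPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("bytesPerReducer and maxReducers must be positive");
        }
        // Ceiling division: one reducer per bytesPerReducer of input, never fewer than one.
        long needed = Math.max(1L, (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer);
        // Cap the result so a very large input cannot request an unbounded number of reducers.
        return (int) Math.min(needed, (long) maxReducers);
    }
}
```

With a budget of 1 GB per reducer and a cap of 999 (both illustrative values, decimal units), a 10 TB input would get 999 reducers instead of 1, and a 5 GB input would get 5.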

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-282) Custom Partitioner

2010-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875554#action_12875554
 ] 

Hadoop QA commented on PIG-282:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12446172/CustomPartitionerFinale.patch
  against trunk revision 951229.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 380 release audit warnings 
(more than the trunk's current 379 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/320/console


 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Amir Youssefi
Assignee: Aniket Mokashi
Priority: Minor
 Fix For: 0.8.0

 Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch, 
 CustomPartitionerTest.patch


 By adding a custom partitioner, we can give users control over which output 
 partition a key (/value) goes to. We can add keywords to the language, e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. The UDF returns a number between 0 and n-1, where n is the 
 number of output partitions.
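The contract described above (return an index between 0 and n-1) can be sketched as a plain Java function. The hash-based body is only a stand-in, using the same sign-bit masking idea as Hadoop's default HashPartitioner; PartitionSketch and partitionFor are hypothetical names, and a user-supplied partitioner would replace the body with its own logic while keeping the same range guarantee.

```java
/** Hypothetical sketch of a partition function honouring the 0..n-1 contract. */
final class PartitionSketch {
    private PartitionSketch() {}

    /** Maps a key to a partition index in [0, numPartitions - 1]. */
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clearing the sign bit keeps the index non-negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```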




[jira] Commented: (PIG-1436) Print number of records outputted at each step of a Pig script

2010-06-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875638#action_12875638
 ] 

Alan Gates commented on PIG-1436:
-

Russell,

Richard's already doing a lot of work in this area. Check out PIG-1389, 
PIG-908, PIG-864 and PIG-809 to see whether those meet your needs. If not, 
please discuss with him, as his current project is to add script usage 
statistics.

 Print number of records outputted at each step of a Pig script
 --

 Key: PIG-1436
 URL: https://issues.apache.org/jira/browse/PIG-1436
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Priority: Minor
 Fix For: 0.8.0


 I often run a script multiple times, or have to go and look through Hadoop 
 task logs, to figure out where I broke a long script in such a way that I get 
 0 records out of it.  I think this is a common problem.
 If someone can point me in the right direction, I can make a pass at this.
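One way to picture the requested feature is a counter attached to the output of every named step (alias), so a step that produces 0 records is visible at once instead of being hunted down in task logs. The Java sketch below is a hypothetical in-memory illustration only (StepCounterSketch, wrap and counts are invented names); it is not how Pig or Hadoop counters are actually wired.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: count the records emitted by each named step (alias). */
final class StepCounterSketch {
    private final Map<String, Long> counts = new LinkedHashMap<>();

    /** Wraps a step's output so every record pulled through it is counted. */
    <T> Iterator<T> wrap(String alias, Iterator<T> source) {
        counts.putIfAbsent(alias, 0L); // a step that emits nothing still reports 0
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public T next() {
                T value = source.next();
                counts.merge(alias, 1L, Long::sum);
                return value;
            }
        };
    }

    /** Read-only view of the per-step counts, in registration order. */
    Map<String, Long> counts() {
        return Collections.unmodifiableMap(counts);
    }

    /** Prints one line per step, e.g. "B: 0 records" for a step that lost everything. */
    void report() {
        counts.forEach((alias, n) -> System.out.println(alias + ": " + n + " records"));
    }
}
```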




[jira] Commented: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875639#action_12875639
 ] 

Hadoop QA commented on PIG-1433:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446222/PIG-1433.patch
  against trunk revision 951229.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/330/console


 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true
 --

 Key: PIG-1433
 URL: https://issues.apache.org/jira/browse/PIG-1433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1433.patch


 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true
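The requested behaviour amounts to: if the flag is true, leave an empty marker file in the job's output directory once the job succeeds. The Java sketch below uses the marker name _SUCCESS (the name Hadoop's FileOutputCommitter uses) and a plain Map in place of a real job configuration; treating a missing key as false, and the names SuccessMarkerSketch and markIfEnabled, are assumptions of this sketch rather than details of the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical sketch: write a _SUCCESS marker when the configuration asks for it. */
final class SuccessMarkerSketch {
    static final String MARK_SUCCESS_KEY = "mapreduce.fileoutputcommitter.marksuccessfuljobs";
    static final String MARKER_NAME = "_SUCCESS";

    private SuccessMarkerSketch() {}

    /** Returns true if a marker was requested and is present in outputDir after the call. */
    static boolean markIfEnabled(Map<String, String> conf, Path outputDir) throws IOException {
        // Assumption for this sketch: a missing key is treated as "false".
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(MARK_SUCCESS_KEY, "false"));
        if (!enabled) {
            return false;
        }
        Files.createDirectories(outputDir);
        Path marker = outputDir.resolve(MARKER_NAME);
        if (Files.notExists(marker)) {
            Files.createFile(marker); // zero-length file; only its existence matters here
        }
        return true;
    }
}
```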




[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1433:


Attachment: PIG-1433-for-branch-0.7.patch

The original patch was committed to trunk. It did not apply to branch-0.7, so 
I have attached a new patch with minor modifications for branch-0.7. This 
latter patch was committed to branch-0.7.

 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true
 --

 Key: PIG-1433
 URL: https://issues.apache.org/jira/browse/PIG-1433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch


 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true




[jira] Updated: (PIG-1433) pig should create success file if mapreduce.fileoutputcommitter.marksuccessfuljobs is true

2010-06-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1433:


   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.7.0
   Resolution: Fixed

 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true
 --

 Key: PIG-1433
 URL: https://issues.apache.org/jira/browse/PIG-1433
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.8.0, 0.7.0

 Attachments: PIG-1433-for-branch-0.7.patch, PIG-1433.patch


 pig should create success file if 
 mapreduce.fileoutputcommitter.marksuccessfuljobs is true




[jira] Created: (PIG-1438) [Performance] MultiQueryOptimizer should also merge DISTINCT jobs

2010-06-04 Thread Richard Ding (JIRA)
[Performance] MultiQueryOptimizer should also merge DISTINCT jobs
-

 Key: PIG-1438
 URL: https://issues.apache.org/jira/browse/PIG-1438
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0


The current implementation doesn't merge jobs derived from DISTINCT statements, 
because DISTINCT jobs are implemented using a special combiner 
(DistinctCombiner). But we should be able to merge jobs that have the same type 
of combiner (e.g. merge multiple DISTINCT jobs into one).
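The compatibility check proposed above (only jobs with the same combiner type may be merged) can be sketched as bucketing jobs by combiner type. The Java below covers only that check, not the actual plan merging done by MultiQueryOptimizer; the Job class, mergeCandidates and the combiner name OtherCombiner are hypothetical, while DistinctCombiner is the name mentioned in the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: find groups of jobs that share a combiner type. */
final class MergeByCombinerSketch {
    /** Hypothetical job descriptor: a job name plus the name of its combiner type. */
    static final class Job {
        final String name;
        final String combinerType;

        Job(String name, String combinerType) {
            this.name = name;
            this.combinerType = combinerType;
        }
    }

    private MergeByCombinerSketch() {}

    /** Buckets jobs by combiner type; each remaining bucket holds 2+ mergeable jobs. */
    static Map<String, List<String>> mergeCandidates(List<Job> jobs) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (Job job : jobs) {
            buckets.computeIfAbsent(job.combinerType, k -> new ArrayList<>()).add(job.name);
        }
        buckets.values().removeIf(names -> names.size() < 2); // a lone job has nothing to merge with
        return buckets;
    }
}
```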




[jira] Updated: (PIG-1437) [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct

2010-06-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1437:


Parent: PIG-1319
Issue Type: Sub-task  (was: Bug)

 [Optimization] Rewrite GroupBy-Foreach-flatten(group) to Distinct
 -

 Key: PIG-1437
 URL: https://issues.apache.org/jira/browse/PIG-1437
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Priority: Minor

 It's possible to rewrite queries like this
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate group.name, group.age;
 dump C;
 {code}
 or
 {code}
 A = load 'data' as (name,age);
 B = group A by (name,age);
 C = foreach B generate flatten(group);
 dump C;
 {code}
 to
 {code}
 A = load 'data' as (name,age);
 B = distinct A;
 dump B;
 {code}
 This can only be done if no columns within the bags are referenced later in 
 the script. Since in the Pig/Hadoop world DISTINCT is executed more efficiently 
 than GROUP BY, this would be a huge win. 
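The equivalence behind this rewrite can be checked on in-memory data: grouping by every column and keeping only the group keys yields the same set of tuples as DISTINCT, because the bags are never read. The Java sketch below models tuples as lists; it illustrates the semantics only, not the optimizer rule itself, and GroupVsDistinctSketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: GROUP BY all columns + FLATTEN(group) versus DISTINCT. */
final class GroupVsDistinctSketch {
    private GroupVsDistinctSketch() {}

    /** Builds one tuple, e.g. row("alice", 30) for (name, age). */
    static List<Object> row(Object... fields) {
        return Arrays.asList(fields);
    }

    /** B = group A by (name,age); C = foreach B generate flatten(group); */
    static Set<List<Object>> groupThenFlatten(List<List<Object>> rows) {
        Map<List<Object>, List<List<Object>>> groups = rows.stream()
                .collect(Collectors.groupingBy(r -> r, LinkedHashMap::new, Collectors.toList()));
        return new LinkedHashSet<>(groups.keySet()); // the bags (map values) are never used
    }

    /** B = distinct A; removes duplicate tuples without building any bags. */
    static Set<List<Object>> distinct(List<List<Object>> rows) {
        return new LinkedHashSet<>(rows);
    }
}
```

If a later statement did reference columns inside the bags, the map values would be needed and the two forms would no longer be interchangeable, which is the restriction stated above.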




[jira] Updated: (PIG-283) Allow to set arbitrary jobconf key-value pairs inside pig program

2010-06-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-283:
-

  Status: Resolved  (was: Patch Available)
Release Note: 
For documentation:

After this patch, it becomes possible to set key-value pairs in the script as 
follows:
{code}
set mapred.map.tasks.speculative.execution false
set pig.logfile mylogfile.log
set my.arbitrary.key my.arbitrary.value
{code}
Pig puts these key-value pairs into the JobConf. The setting is script-wide: if 
a key is set more than once in the script, the last value takes effect, and 
that value is used for all the jobs generated by the script. 
  Resolution: Fixed

Re-ran all the tests reported by Hudson as failures. All of them passed. Patch 
committed.
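The script-wide, last-value-wins rule from the release note above can be sketched as collecting every set statement into one map before any job is configured. The Java below is a hypothetical illustration (SetCommandSketch and collectSettings are invented names); it ignores quoting and any other syntax the real Grunt parser accepts.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of script-wide "set <key> <value>" handling. */
final class SetCommandSketch {
    private SetCommandSketch() {}

    /**
     * Collects "set <key> <value>" lines into one map for the whole script.
     * A later line for the same key overwrites the earlier one, so the last value wins,
     * and the same map would then be applied to every job the script generates.
     */
    static Map<String, String> collectSettings(List<String> scriptLines) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String line : scriptLines) {
            String[] parts = line.trim().split("\\s+", 3);
            if (parts.length == 3 && parts[0].equalsIgnoreCase("set")) {
                settings.put(parts[1], parts[2]);
            }
        }
        return settings;
    }
}
```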



 Allow to set arbitrary jobconf key-value pairs inside pig program
 -

 Key: PIG-283
 URL: https://issues.apache.org/jira/browse/PIG-283
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.7.0
Reporter: Christian Kunz
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: pig-282.patch


 It would be useful to be able to set arbitrary JobConf key-value pairs inside 
 a pig program (e.g. in front of a COGROUP statement).
 I wonder whether the simplest way to add this feature is by expanding the 
 'set' command functionality.




[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeProp2Initial.patch

Attaching initial patch for prop2

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently the parser can't handle this, because describe is part of the 
 Grunt parser while the rest of a nested foreach is handled by the QueryParser.




[jira] Commented: (PIG-1334) Make pig artifacts available through maven

2010-06-04 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875766#action_12875766
 ] 

Jeremy Hanna commented on PIG-1334:
---

To clarify our need: the Cassandra project would like to use Pig 0.7.0 as a 
build dependency via Ivy.

 Make pig artifacts available through maven
 --

 Key: PIG-1334
 URL: https://issues.apache.org/jira/browse/PIG-1334
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.8.0



