[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Status: Patch Available  (was: Reopened)

 Column pruner causes wrong results
 --

 Key: PIG-1272
 URL: https://issues.apache.org/jira/browse/PIG-1272
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1272-1.patch, PIG-1272-2.patch


 For a simple script, the column pruner optimization removes certain columns 
 from the original relation, which produces wrong results.
 The input file kv contains the following tab-separated columns:
 {code}
 a   1
 a   2
 a   3
 b   4
 c   5
 c   6
 b   7
 d   8
 {code}
 Running this script in Pig 0.6:
 {code}
 kv = load 'kv' as (k,v);
 keys = foreach kv generate k;
 keys = distinct keys;
 keys = limit keys 2;
 rejoin = join keys by k, kv by k;
 dump rejoin;
 {code}
 produces output in which column v is missing:
 (a,a)
 (a,a)
 (a,a)
 (b,b)
 (b,b)
 Running this in Pig 0.5, without the column pruner, results in:
 (a,a,1)
 (a,a,2)
 (a,a,3)
 (b,b,4)
 (b,b,7)
 When we disable the ColumnPruner optimization, Pig 0.6 gives the correct results.
 Viraj
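
 For reference, the join semantics the script expects can be modelled in a
 few lines of plain Python. This is only an illustrative sketch, not Pig code
 and not part of the attached patches. The helper names distinct_limit and
 rejoin are hypothetical, and the sketch assumes that limit keeps the first
 two distinct keys in order of first appearance (Pig itself does not guarantee
 which rows limit returns), which matches the keys a and b seen in the output
 above.

```python
# Plain-Python model of the intended result of the Pig script (not Pig code).
# Input rows mirror the tab-separated file 'kv' from the report.
kv = [("a", 1), ("a", 2), ("a", 3), ("b", 4),
      ("c", 5), ("c", 6), ("b", 7), ("d", 8)]


def distinct_limit(keys, n):
    """Return the first n distinct keys, in order of first appearance.

    Assumption: this ordering stands in for Pig's 'distinct' + 'limit',
    which do not promise any particular choice of rows.
    """
    seen = []
    for k in keys:
        if k not in seen:
            seen.append(k)
    return seen[:n]


def rejoin(rows, n=2):
    """Join the limited key set back to the full relation on k.

    Every output tuple carries three fields (keys::k, kv::k, kv::v),
    so the v column must survive the join.
    """
    keys = distinct_limit([k for k, _ in rows], n)
    return [(k, k2, v) for k in keys for (k2, v) in rows if k == k2]


if __name__ == "__main__":
    for row in rejoin(kv):
        print(row)
```

 Each tuple in the model has three fields, as in the Pig 0.5 output. The
 two-field tuples reported for Pig 0.6 are what one would get if the v column
 were dropped from kv before the join.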

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845083#action_12845083
 ] 

Hadoop QA commented on PIG-1178:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438738/pig_1178_3.3.patch
  against trunk revision 922664.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 28 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/251/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/251/console

This message is automatically generated.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
 pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven hard to extend. Developer feedback indicates that adding new 
 rules to the optimizer is quite burdensome. In addition, the logical plan has 
 been a source of numerous bugs, many of which have been difficult to fix. 
 Developers also find the logical plan difficult to understand and maintain. 
 The root cause of these issues is that a number of design decisions made as 
 part of the 0.2 rewrite of the front end have proven to be sub-optimal. The 
 heart of this proposal is to revisit those decisions and rebuild the logical 
 plan with a simpler design that makes it much easier to maintain the logical 
 plan and to extend the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Commented: (PIG-1272) Column pruner causes wrong results

2010-03-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845162#action_12845162
 ] 

Hadoop QA commented on PIG-1272:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438743/PIG-1272-2.patch
  against trunk revision 922664.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/252/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/252/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/252/console

This message is automatically generated.




[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Manual unit tests pass.




[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845173#action_12845173
 ] 

Daniel Dai commented on PIG-1178:
-

pig_1178_3.3.patch committed. Manual unit tests pass.




[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-14 Thread duncan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845174#action_12845174
 ] 

duncan commented on PIG-200:


Hi Daniel,

How do I run perf.patch? It contains a lot of different things.
I want to generate the data set and use the 14 Pig queries for benchmarking.

Would you mind telling me more about how to use perf.patch?

Thanks

Duncan

 Pig Performance Benchmarks
 --

 Key: PIG-200
 URL: https://issues.apache.org/jira/browse/PIG-200
 Project: Pig
  Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
 Attachments: generate_data.pl, perf.hadoop.patch, perf.patch


 To benchmark Pig performance, we need a TPC-H-like large data set plus a 
 script collection. These are used to compare different Pig releases and to 
 compare Pig with other systems (e.g. Pig + Hadoop vs. Hadoop only).
 Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance
 I am currently running long-running Pig scripts over data sets on the order 
 of tens of TBs. The next step is hundreds of TBs.
 We need an open large data set (open-source scripts that generate the 
 data set) and detailed scripts for important operations such as ORDER, 
 AGGREGATION, etc.
 We can call those the Pig Workouts: Cardio (short processing), Marathon 
 (long-running scripts) and Triathlon (mix). 
 I will update this JIRA with more details of current activities soon.
