date:20100314

[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-14 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Status: Patch Available  (was: Reopened)

 Column pruner causes wrong results
 --

 Key: PIG-1272
 URL: https://issues.apache.org/jira/browse/PIG-1272
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1272-1.patch, PIG-1272-2.patch


 For a simple script, the column pruner optimization removes certain columns 
 from the original relation, which produces wrong results.
 The input file kv contains the following tab-separated columns:
 {code}
 a   1
 a   2
 a   3
 b   4
 c   5
 c   6
 b   7
 d   8
 {code}
 Running the following script in Pig 0.6 produces the output shown after it:
 {code}
 kv = load 'kv' as (k,v);
 keys= foreach kv generate k;
 keys = distinct keys; 
 keys = limit keys 2;
 rejoin = join keys by k, kv by k;
 dump rejoin;
 {code}
 (a,a)
 (a,a)
 (a,a)
 (b,b)
 (b,b)
 Running the same script in Pig 0.5, without the column pruner, produces:
 (a,a,1)
 (a,a,2)
 (a,a,3)
 (b,b,4)
 (b,b,7)
 With the ColumnPruner optimization disabled, the script gives the correct results.
 Viraj
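
The report above can be restated as a small in-memory model. The sketch below is not Pig code and says nothing about how the ColumnPruner is implemented; it only reproduces the two outputs quoted in the report. The names `KV`, `distinct_keys`, `rejoin`, and the `prune_v` flag are hypothetical, and it assumes that `distinct` followed by `limit 2` keeps the keys `a` and `b`, as in the reported output (Pig does not guarantee which keys `limit` keeps).

```python
# Hypothetical in-memory model of the script from the report (not Pig code).
KV = [("a", 1), ("a", 2), ("a", 3), ("b", 4),
      ("c", 5), ("c", 6), ("b", 7), ("d", 8)]


def distinct_keys(rows):
    """Return the distinct k values in first-seen order."""
    seen = []
    for k, _ in rows:
        if k not in seen:
            seen.append(k)
    return seen


def rejoin(rows, limit=2, prune_v=False):
    """Join the first `limit` distinct keys back against the full relation.

    prune_v=True models the reported bug: column v is dropped from kv
    even though the join output still needs it.
    """
    keys = distinct_keys(rows)[:limit]
    out = []
    for key in keys:
        for k, v in rows:
            if k == key:
                out.append((key, k) if prune_v else (key, k, v))
    return out


expected = rejoin(KV)               # three-column rows, as reported for Pig 0.5
buggy = rejoin(KV, prune_v=True)    # two-column rows, as reported for Pig 0.6

for row in expected:
    print(row)
```

The difference between `expected` and `buggy` is exactly the missing `v` column: the row count is the same in both cases, so a check on row counts alone would not catch the problem.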

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-14 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845083#action_12845083
]

Hadoop QA commented on PIG-1178:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12438738/pig_1178_3.3.patch
against trunk revision 922664.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 28 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/251/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/251/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/251/console

This message is automatically generated.

LogicalPlan and Optimizer are too complex and hard to work with
---

Key: PIG-1178
URL: https://issues.apache.org/jira/browse/PIG-1178
Project: Pig
Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
Attachments: expressions-2.patch, expressions.patch, lp.patch,
lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch,
pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.patch

The current implementation of the logical plan and the logical optimizer in
Pig has proven not to be easily extensible. Developer feedback has indicated
that adding new rules to the optimizer is quite burdensome. In addition, the
logical plan has been an area of numerous bugs, many of which have been
difficult to fix. Developers also feel that the logical plan is difficult to
understand and maintain. The root cause of these issues is that a number of
design decisions made as part of the 0.2 rewrite of the front end
have now proven to be sub-optimal. The heart of this proposal is to revisit a
number of those decisions and rebuild the logical plan with a simpler design
that will make it much easier to maintain the logical plan as well as extend
the logical optimizer.
See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full
details.


[jira] Commented: (PIG-1272) Column pruner causes wrong results

2010-03-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845162#action_12845162
 ] 

Hadoop QA commented on PIG-1272:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438743/PIG-1272-2.patch
  against trunk revision 922664.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/252/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/252/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/252/console

This message is automatically generated.


[jira] Updated: (PIG-1272) Column pruner causes wrong results

2010-03-14 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1272:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Manual unit tests pass.


[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-03-14 Thread Daniel Dai (JIRA)

[
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845173#action_12845173
]

Daniel Dai commented on PIG-1178:
-

pig_1178_3.3.patch committed. Manual unit tests pass.

LogicalPlan and Optimizer are too complex and hard to work with
---


[jira] Commented: (PIG-200) Pig Performance Benchmarks

2010-03-14 Thread duncan (JIRA)

[
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845174#action_12845174
]

duncan commented on PIG-200:

Hi Daniel,

How can I run perf.patch? I saw a lot of different things in it.
I want to generate a data set and use the 14 Pig queries for benchmarking.

Would you mind telling me more about how to use perf.patch?

Thanks

Duncan

Pig Performance Benchmarks
--

Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
Attachments: generate_data.pl, perf.hadoop.patch, perf.patch

To benchmark Pig performance, we need a TPC-H-like large data set
plus a script collection. These are used to compare different Pig releases
and Pig against other systems (e.g. Pig + Hadoop vs. Hadoop only).
Here is the wiki page for small tests: http://wiki.apache.org/pig/PigPerformance
I am currently running long-running Pig scripts over data sets on the order
of tens of TBs. The next step is hundreds of TBs.
We need an open large data set (open source scripts that generate
the data set) and detailed scripts for important operations such as ORDER,
AGGREGATION, etc.
We can call those the Pig Workouts: Cardio (short processing), Marathon
(long-running scripts) and Triathlon (mix).
I will update this JIRA with more details of current activities soon.

