Build failed in Hudson: Pig-trunk #633

2009-12-02 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/633/changes

Changes:

[gates] PIG-1098 Zebra Performance Optimizations.

--
[...truncated 2691 lines...]
ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: org.apache.pig#Pig;2009-12-02_10-05-59
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 70ms :: artifacts dl 4ms
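---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|     buildJar     |   4   |   0   |   0   |   0   ||   4   |   0   |
---------------------------------------------------------------------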
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/4ms)

buildJar:
 [echo] svnString 886097
  [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/pig-2009-12-02_10-05-59.jar
 [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk

jarWithOutSvn:

findbugs:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] The following classes needed for analysis were missing:
 [findbugs]   com.jcraft.jsch.SocketFactory
 [findbugs]   com.jcraft.jsch.Logger
 [findbugs]   jline.Completor
 [findbugs]   com.jcraft.jsch.Session
 [findbugs]   com.jcraft.jsch.HostKeyRepository
 [findbugs]   com.jcraft.jsch.JSch
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   jline.ConsoleReaderInputStream
 [findbugs]   com.jcraft.jsch.HostKey
 [findbugs]   jline.ConsoleReader
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   jline.History
 [findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
 [findbugs]   com.jcraft.jsch.JSchException
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs] Warnings generated: 20
 [findbugs] Missing classes: 16
 [findbugs] Calculating exit code...
 [findbugs] Setting 'missing class' flag (2)
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 3
 [findbugs] Java Result: 3
 [findbugs] Classes needed for analysis were missing
 [findbugs] Output saved to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 [xslt] Processing http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml to http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.html
 [xslt] Loading stylesheet /homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 55 seconds
+ mv build/pig-2009-12-02_10-05-59.tar.gz http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/test/findbugs http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/docs/api http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src-gen
   [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src/docs/build
   [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build
   [delete] Deleting directory http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/test/org/apache/pig/test/utils/dotGraph/parser

BUILD SUCCESSFUL
Total time: 0 seconds
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant -Dtest.junit.output.format=xml -Dtest.output=yes -Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true -Dclover.home=/homes/hudson/tools/clover/latest clover test generate-clover-reports
Buildfile: build.xml

clover.setup:
[mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db
[clover-setup] Clover Version 2.4.3, built on March 09 2009 (build-756)
[clover-setup] Loaded from: /homes/hudson/tools/clover/latest/lib/clover.jar
[clover-setup] Clover: Open Source License registered to Apache.
[clover-setup] Clover is enabled with initstring 'http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db/pig_coverage.db'

clover.info:

clover:

test:

ivy-download:
  [get] Getting: 

[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-12-02 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784849#action_12784849
 ] 

Thejas M Nair commented on PIG-965:
---

In the performance numbers above, I assume optimization 2 (custom string 
comparison) is used only for the regex .*ABCD.*, while optimization 1 
(re-using the compiled pattern) is used with dk.brics.automaton as well. Can 
you please confirm?

From the performance numbers, it looks like we don't need optimization 2. 
We can just use dk.brics.automaton for the common regexes as well and keep 
the Pig code simpler.



 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair
Assignee: Ankit Modi

 Some frequently seen use cases of the 'matches' comparison operator have the 
 following properties -
 1. The rhs is a constant string, e.g. c1 matches 'abc%' 
 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
 e.g. 'abc%', '%abc', '%abc%' 
 To optimize for these common cases, PORegex.java can be changed to -
 1. Compile the pattern (rhs of matches) and re-use it if the pattern string 
 has not changed. 
 2. Use string comparisons for simple common regexes (as in 2 above).
 The implementation of the Hive LIKE clause uses similar optimizations.
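
 A minimal sketch of the two optimizations listed above, in plain Java. The
 class MatchesSketch and its methods are hypothetical and are not taken from
 PORegex.java. The sketch assumes Java regex syntax, so it uses '.*' as the
 wildcard (as in the .*ABCD.* example in the comment) rather than '%', and it
 only treats letters and digits as "simple" literal text.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration only; this is not the PORegex implementation. */
final class MatchesSketch {
    private String lastRegex;     // pattern string seen on the previous call
    private Pattern lastPattern;  // compiled form of lastRegex

    /** Same result as input.matches(regex), with two shortcuts for common cases. */
    boolean matches(String input, String regex) {
        // Optimization 2: literal prefix/suffix/substring patterns become string calls.
        String core = regex;
        boolean leading = core.startsWith(".*");
        if (leading) {
            core = core.substring(2);
        }
        boolean trailing = core.endsWith(".*");
        if (trailing) {
            core = core.substring(0, core.length() - 2);
        }
        // '.' does not match line terminators, so such inputs take the regex path.
        if (isLiteral(core) && !hasLineTerminator(input)) {
            if (leading && trailing) return input.contains(core);
            if (leading) return input.endsWith(core);
            if (trailing) return input.startsWith(core);
            return input.equals(core);
        }
        // Optimization 1: recompile only when the pattern string has changed.
        if (!regex.equals(lastRegex)) {
            lastPattern = Pattern.compile(regex);
            lastRegex = regex;
        }
        return lastPattern.matcher(input).matches();
    }

    /** True if s contains only letters and digits, i.e. no regex metacharacters. */
    private static boolean isLiteral(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) return false;
        }
        return true;
    }

    /** True if s contains a character that '.' refuses to match by default. */
    private static boolean hasLineTerminator(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r' || c == 0x85 || c == 0x2028 || c == 0x2029) {
                return true;
            }
        }
        return false;
    }
}
```

 With this shape, a constant rhs such as '.*ABCD.*' never reaches the regex
 engine, and any other constant rhs is compiled once and then re-used for
 every following row.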

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784879#action_12784879
 ] 

Hadoop QA commented on PIG-922:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426641/PIG-922-p3_13.patch
  against trunk revision 886015.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 60 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 368 release audit warnings 
(more than the trunk's current 362 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/75/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/75/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/75/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/75/console

This message is automatically generated.

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_10.patch, PIG-922-p3_11.patch, PIG-922-p3_12.patch, 
 PIG-922-p3_13.patch, PIG-922-p3_2.patch, PIG-922-p3_3.patch, 
 PIG-922-p3_4.patch, PIG-922-p3_5.patch, PIG-922-p3_6.patch, 
 PIG-922-p3_7.patch, PIG-922-p3_8.patch, PIG-922-p3_9.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-12-02 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784901#action_12784901
 ] 

Dmitriy V. Ryaboy commented on PIG-966:
---

Quick question:
I don't remember if we've gone over this before -- why is the sortedness 
information considered part of the schema? Shouldn't it be part of the 
statistics?

 Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
 ---

 Key: PIG-966
 URL: https://issues.apache.org/jira/browse/PIG-966
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates

 I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces 
 significantly.  See http://wiki.apache.org/pig/LoadStoreRedesignProposal for 
 full details




[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-12-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784931#action_12784931
 ] 

Alan Gates commented on PIG-966:


You can make an argument for putting it in either place.  I argue for putting 
it in the schema for a couple of reasons:

It is useful to a large number of potential optimizations.

Unlike most other statistics, it can be used in correctness checks (e.g., the 
user asked for a merge join; is the data sorted on the join key?)

The only downside I can see is that some systems that understand column 
names and types won't necessarily understand sortedness (like JSON).  But it's 
no harder for the loader to figure out sortedness for the schema than it is for 
the statistics.
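
To illustrate the correctness-check point: a merge join gives correct results 
only if the data is ordered on the join key, so a planner that can read 
sortedness from the schema can reject the request before running anything. 
The sketch below is a minimal illustration under that assumption. SketchSchema 
and its methods are hypothetical names and do not model Pig's actual schema 
classes or the interfaces in the proposal.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical schema holder; Pig's actual schema classes are not modeled here. */
final class SketchSchema {
    final List<String> columns;   // column names, in order
    final List<String> sortKeys;  // columns the data is sorted on, in order; empty if unsorted

    SketchSchema(List<String> columns, List<String> sortKeys) {
        this.columns = columns;
        this.sortKeys = sortKeys;
    }

    /** Example schema: three columns, data sorted on (id, ts). */
    static SketchSchema example() {
        return new SketchSchema(Arrays.asList("id", "ts", "command"),
                                Arrays.asList("id", "ts"));
    }

    /** True if the data is sorted on joinKeys, i.e. they are a leading prefix of sortKeys. */
    boolean sortedOn(List<String> joinKeys) {
        return !joinKeys.isEmpty()
                && joinKeys.size() <= sortKeys.size()
                && sortKeys.subList(0, joinKeys.size()).equals(joinKeys);
    }

    /** Rejects a merge join up front instead of silently producing wrong output. */
    void requireMergeJoinable(List<String> joinKeys) {
        if (!sortedOn(joinKeys)) {
            throw new IllegalStateException(
                    "merge join requested, but data is not sorted on " + joinKeys);
        }
    }
}
```

Ordinary statistics (row counts, value distributions) only influence which 
plan is chosen; the check above decides whether a requested plan is valid at 
all.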





[jira] Updated: (PIG-1068) COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple'

2009-12-02 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1068:
--

Attachment: PIG-1068.patch

This patch fixes the problem by moving the unwrapping logic from the demuxer 
to the packager.

 COGROUP fails with 'Type mismatch in key from map: expected 
 org.apache.pig.impl.io.NullableText, recieved 
 org.apache.pig.impl.io.NullableTuple'
 ---

 Key: PIG-1068
 URL: https://issues.apache.org/jira/browse/PIG-1068
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Vikram Oberoi
Assignee: Richard Ding
 Fix For: 0.6.0

 Attachments: cogroup-bug.pig, log, PIG-1068.patch


 The COGROUP in the following script fails in its map:
 {code}
 logs = LOAD '$LOGS' USING PigStorage() AS (ts:int, id:chararray, command:chararray, comments:chararray);

 SPLIT logs INTO logins IF command == 'login', all_quits IF command == 'quit';

 -- Project login clients and count them by ID.
 login_info = FOREACH logins {
     GENERATE id as id,
     comments AS client;
 };

 logins_grouped = GROUP login_info BY (id, client);

 count_logins_by_client = FOREACH logins_grouped {
     generate group.id AS id, group.client AS client, COUNT($1) AS count;
 }

 -- Get the first quit.
 all_quits_grouped = GROUP all_quits BY id;

[jira] Updated: (PIG-1068) COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple'

2009-12-02 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1068:
--

Status: Patch Available  (was: Open)

 COGROUP fails with 'Type mismatch in key from map: expected 
 org.apache.pig.impl.io.NullableText, recieved 
 org.apache.pig.impl.io.NullableTuple'
 ---

 Key: PIG-1068
 URL: https://issues.apache.org/jira/browse/PIG-1068
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Vikram Oberoi
Assignee: Richard Ding
 Fix For: 0.6.0

 Attachments: cogroup-bug.pig, log, PIG-1068.patch


 The COGROUP in the following script fails in its map:
 {code}
 logs = LOAD '$LOGS' USING PigStorage() AS (ts:int, id:chararray, command:chararray, comments:chararray);

 SPLIT logs INTO logins IF command == 'login', all_quits IF command == 'quit';

 -- Project login clients and count them by ID.
 login_info = FOREACH logins {
     GENERATE id as id,
     comments AS client;
 };

 logins_grouped = GROUP login_info BY (id, client);

 count_logins_by_client = FOREACH logins_grouped {
     generate group.id AS id, group.client AS client, COUNT($1) AS count;
 }

 -- Get the first quit.
 all_quits_grouped = GROUP all_quits BY id;

 quits = FOREACH 

[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

2009-12-02 Thread Gaurav Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaurav Jain updated PIG-1111:
-

Affects Version/s: 0.7.0, 0.6.0
Status: Patch Available  (was: Open)


Please review and provide feedback at your earliest convenience

 [Zebra] multiple outputs support
 

 Key: PIG-1111
 URL: https://issues.apache.org/jira/browse/PIG-1111
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0, 0.7.0
Reporter: Gaurav Jain
Assignee: Gaurav Jain
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-.patch


 Zebra enables applications to stream data into different Zebra table instances.
 New interface added:
 setMultipleOutputs(JobConf jobconf, String commaSeparatedLocation, 
 Class<? extends ZebraOutputPartitioner> theClass)
 Zebra maintains a list of table instances based on commaSeparatedLocation 
 (in that order).
 The ZebraOutputPartitioner interface has a getOutputPartition method, which is 
 implemented by the application. It returns an index into the list, and Zebra 
 writes to that instance.
 We also introduce a new mapred property for setting multiple outputs:
 mapred.lib.table.multi.output.dirs
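
 The routing described above can be sketched in a few lines of plain Java. 
 This is an illustration of the idea only: SketchOutputPartitioner and 
 MultiOutputSketch are hypothetical stand-ins, and the single String argument 
 of getOutputPartition is an assumption, since the description does not give 
 the real method's parameters.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the partitioner contract; the parameter is assumed. */
interface SketchOutputPartitioner {
    /** Returns an index into the ordered list of output locations. */
    int getOutputPartition(String rowKey);
}

/** Hypothetical router; it only mimics "index into the list, write to that instance". */
final class MultiOutputSketch {
    private final List<String> locations;  // same order as the comma-separated string
    private final SketchOutputPartitioner partitioner;

    MultiOutputSketch(String commaSeparatedLocation, SketchOutputPartitioner partitioner) {
        this.locations = Arrays.asList(commaSeparatedLocation.split(","));
        this.partitioner = partitioner;
    }

    /** Picks the output location for one row by asking the application's partitioner. */
    String locationFor(String rowKey) {
        int index = partitioner.getOutputPartition(rowKey);
        if (index < 0 || index >= locations.size()) {
            throw new IndexOutOfBoundsException("partitioner returned " + index
                    + " for " + locations.size() + " outputs");
        }
        return locations.get(index);
    }
}
```

 For example, new MultiOutputSketch("/out/us,/out/other", 
 key -> key.startsWith("us") ? 0 : 1) would send rows whose key starts with 
 "us" to the first location and everything else to the second.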
  




[jira] Updated: (PIG-1118) expression with aggregate functions returning null, with accumulate interface

2009-12-02 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-1118:
-

Attachment: PIG_1118.patch

bug fix.

 expression with aggregate functions returning null, with accumulate interface
 -

 Key: PIG-1118
 URL: https://issues.apache.org/jira/browse/PIG-1118
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ying He
 Fix For: 0.7.0

 Attachments: PIG_1118.patch


 The problem is in trunk. It works fine in the 0.6 branch.
 l = load '/tmp/students.txt' as (a : chararray,b : chararray,c : int);
 grunt> g = group l by 1;
 grunt> dump g;
 (1,{(asdfxc,M,23),(qwer,F,21),(uhsdf,M,34),(zxldf,M,21),(qwer,F,23),(oiue,M,54)})
 grunt> f = foreach g generate SUM(l.c), 1 + SUM(l.c) + SUM(l.c);
 grunt> dump f;
 (176L,)
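
 For reference, the six values of c in the dump above add up to 176, which 
 matches the first field. The second expression, 1 + SUM(l.c) + SUM(l.c), 
 would therefore be expected to produce 353 rather than an empty (null) 
 field. A quick check of that arithmetic in plain Java (ExpectedSum is a 
 hypothetical helper, not Pig code):

```java
/** Hypothetical helper that redoes the arithmetic from the grunt session by hand. */
final class ExpectedSum {
    // Values of column c in the six tuples shown by "dump g".
    static final long[] C = {23, 21, 34, 21, 23, 54};

    /** Equivalent of SUM(l.c) over the single group. */
    static long sum() {
        long total = 0;
        for (long v : C) {
            total += v;
        }
        return total;
    }

    /** Equivalent of 1 + SUM(l.c) + SUM(l.c). */
    static long secondColumn() {
        long s = sum();
        return 1 + s + s;
    }
}
```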




[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-12-02 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784985#action_12784985
 ] 

Pradeep Kamath commented on PIG-922:


I reviewed the changes that pass the load signature to the Slicer/Slice and 
PigStorage so that column pruning works on the backend; the changes look good. 
The one change I wasn't clear about was the use of the signature in order by: 
currently LOSort's alias is used as the signature, and that would not be useful 
to the Slicer/Slice or PigStorage in the backend, since they would expect the 
LOLoad's alias.





[jira] Commented: (PIG-1086) Nested sort by * throw exception

2009-12-02 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784992#action_12784992
 ] 

Richard Ding commented on PIG-1086:
---

The script works if the input has no schema:

{code}
A = load '1.txt';
B = group A by a0;
C = foreach B { D = order A by *; generate group, D;};
explain C;
{code}

 Nested sort by * throw exception
 

 Key: PIG-1086
 URL: https://issues.apache.org/jira/browse/PIG-1086
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Daniel Dai

 The following script fails:
 A = load '1.txt' as (a0, a1, a2);
 B = group A by a0;
 C = foreach B { D = order A by *; generate group, D;};
 explain C;
 Here is the stack:
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
 at java.util.ArrayList.get(ArrayList.java:324)
 at org.apache.pig.impl.logicalLayer.schema.Schema.getField(Schema.java:752)
 at org.apache.pig.impl.logicalLayer.LOSort.getSortInfo(LOSort.java:332)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1365)
 at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:176)
 at org.apache.pig.impl.logicalLayer.LOSort.visit(LOSort.java:43)
 at org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:69)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:1274)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:130)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:45)
 at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:234)
 at org.apache.pig.PigServer.compilePp(PigServer.java:864)
 at org.apache.pig.PigServer.explain(PigServer.java:583)
 ... 8 more




[jira] Updated: (PIG-1111) [Zebra] multiple outputs support

2009-12-02 Thread Gaurav Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaurav Jain updated PIG-1111:
-

Status: Open  (was: Patch Available)


Submitting an update





[jira] Commented: (PIG-1118) expression with aggregate functions returning null, with accumulate interface

2009-12-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785025#action_12785025
 ] 

Olga Natkovich commented on PIG-1118:
-

Ying, the change looks good. Please, add a unit test for this bug.





[jira] Updated: (PIG-1118) expression with aggregate functions returning null, with accumulate interface

2009-12-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1118:


Status: Open  (was: Patch Available)





[jira] Commented: (PIG-1068) COGROUP fails with 'Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, recieved org.apache.pig.impl.io.NullableTuple'

2009-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785029#action_12785029
 ] 

Hadoop QA commented on PIG-1068:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426691/PIG-1068.patch
  against trunk revision 886015.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/76/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/76/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/76/console

This message is automatically generated.


[jira] Commented: (PIG-1118) expression with aggregate functions returning null, with accumulate interface

2009-12-02 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785043#action_12785043
 ] 

Ying He commented on PIG-1118:
--

Olga, thanks for the review.  A unit test is in the patch: TestAccumulator. 





[jira] Created: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint

2009-12-02 Thread Jing Huang (JIRA)
[zebra] should support  using org.apache.hadoop.zebra.pig.TableStorer() if user 
does not want to specify storage hint
-

 Key: PIG-1120
 URL: https://issues.apache.org/jira/browse/PIG-1120
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.6.0


If the user doesn't want to specify a storage hint, the current zebra 
implementation only supports  using org.apache.hadoop.zebra.pig.TableStorer('')  
(note the empty string passed to TableStorer).

We should support the form  using org.apache.hadoop.zebra.pig.TableStorer()  
as we do for  using org.apache.hadoop.zebra.pig.TableLoader()

sample pig script:
register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);

b = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);


c = join a by a, b by a;
d = foreach c generate a::a, a::b, b::c;
describe d;
dump d;
store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
--this will fail
--store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );





[jira] Updated: (PIG-1118) expression with aggregate functions returning null, with accumulate interface

2009-12-02 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1118:


Status: Patch Available  (was: Open)

Thanks, I am not sure how I missed it :)





[jira] Created: (PIG-1121) [zebre] zebra user forces pig script to have 'as xxx' in foreach statement in order to be able to store successfully

2009-12-02 Thread Jing Huang (JIRA)
[zebre] zebra user forces pig script to have 'as xxx' in foreach statement in 
order to be able to store successfully


 Key: PIG-1121
 URL: https://issues.apache.org/jira/browse/PIG-1121
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.6.0


In the following pig script, if the user writes 
b =  foreach a generate m1#'a' ; 

describe b will be:
b: {bytearray}
The zebra store will fail, since no name is passed to zebra, and zebra needs 
not only the type but also the name in order to store. 

=
If the user writes 
b =  foreach a generate m1#'a' as ms1;

describe b will be:
b: {ms1: bytearray}

Then the zebra store succeeds. 

=
Here is the full pig script. 
register /grid/0/dev/hadoopqa/jars/zebra.jar;
a = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);

b =  foreach a generate m1#'a' as ms1;
describe b;

store b into 'map1' using org.apache.hadoop.zebra.pig.TableStorer('');



So, we should either fix it or document it. 




[jira] Created: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)
[zebra] Zebra build.xml still uses 0.6 version
--

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


 Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Updated: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1122:
--

Attachment: PIG-1122.patch

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Updated: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1122:
--

Attachment: (was: PIG-1122.patch)

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-12-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785115#action_12785115
 ] 

Olga Natkovich commented on PIG-1114:
-

+1 on the changes. Will commit now to trunk and the 0.6 branch.

 MultiQuery optimization throws error when merging 2 level splits
 

 Key: PIG-1114
 URL: https://issues.apache.org/jira/browse/PIG-1114
 Project: Pig
  Issue Type: Bug
Reporter: Ankur
Assignee: Richard Ding
Priority: Critical
 Fix For: 0.6.0

 Attachments: PIG-1114.patch, Pig_1114_Client.log


 Multi-query optimization throws an error when merging 2 level splits. 
 Following is the script to reproduce the error
 data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
 ids = FOREACH data GENERATE id;
 allId = GROUP ids all;
 allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
 idGroup = GROUP ids by id;
 idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
 countTotal = cross idGroupCount, allIdCount;
 idCountTotal = foreach countTotal generate
 id,
 count,
 total,
 (double)count / (double)total as proportion;
 orderedCounts = order idCountTotal by count desc;
 STORE orderedCounts INTO 'mq_problem/ids';
 names = FOREACH data GENERATE name;
 allNames = GROUP names all;
 allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as total;
 nameGroup = GROUP names by name;
 nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as count;
 namesCrossed = cross nameGroupCount, allNamesCount;
 nameCountTotal = foreach namesCrossed generate
 name,
 count,
 total,
 (double)count / (double)total as proportion;
 nameCountsOrdered = order nameCountTotal by count desc;
 STORE nameCountsOrdered INTO 'mq_problem/names';
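
To make the intent of each branch of the script easier to follow, here is a rough Python sketch of what one branch computes: a count per distinct value, the overall count, and their ratio, ordered by count descending. It is an illustration only, not Pig's execution model; `count_with_proportion` and the sample `ids` are hypothetical, and the sample contains no nulls.

```python
from collections import Counter

def count_with_proportion(values):
    """Return (value, count, total, proportion) rows, largest count first."""
    counts = Counter(values)   # like GROUP ... BY value, then COUNT
    total = len(values)        # like GROUP ... ALL, then COUNT
    rows = [(v, c, total, c / total) for v, c in counts.items()]
    rows.sort(key=lambda row: row[1], reverse=True)  # like ORDER ... BY count DESC
    return rows

ids = [1, 2, 2, 3, 3, 3]       # hypothetical sample data
rows = count_with_proportion(ids)
for row in rows:
    print(row)
```

The script runs this same pattern twice, once over `id` and once over `name`, each ending in its own STORE.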




[jira] Updated: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1122:
--

Attachment: PIG-1122.patch

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Commented: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785118#action_12785118
 ] 

Yan Zhou commented on PIG-1122:
---

Note that the patch should be applied to trunk. 

Also note that there is no test case for this trivial versioning change, so any 
Hudson complaint in that regard should be ignored.

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Updated: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1122:
--

Status: Open  (was: Patch Available)

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Updated: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1122:
--

Status: Patch Available  (was: Open)

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Commented: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785123#action_12785123
 ] 

Yan Zhou commented on PIG-1122:
---

This has not caused any problems, since the CLASSPATH also contains the path 
pointing to the directory holding the Pig classes directly. Still, it is not 
ideal and could cause nasty headaches if someone has a leftover 0.6 Pig jar 
when building Zebra.

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Commented: (PIG-1122) [zebra] Zebra build.xml still uses 0.6 version

2009-12-02 Thread Chao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785127#action_12785127
 ] 

Chao Wang commented on PIG-1122:


+1

 [zebra] Zebra build.xml still uses 0.6 version
 --

 Key: PIG-1122
 URL: https://issues.apache.org/jira/browse/PIG-1122
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1122.patch


  Zebra still uses pig-0.6.0-dev-core.jar in build-contrib.xml. It should be 
 changed to pig-0.7.0-dev-core.jar on APACHE trunk only.




[jira] Commented: (PIG-1116) Remove redundant map-reduce job for merge join

2009-12-02 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785134#action_12785134
 ] 

Olga Natkovich commented on PIG-1116:
-

+1

 Remove redundant map-reduce job for merge join
 --

 Key: PIG-1116
 URL: https://issues.apache.org/jira/browse/PIG-1116
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
Assignee: Pradeep Kamath
 Fix For: 0.6.0

 Attachments: PIG-1116.patch


 In merge join, when we convert the right-hand-side file into a side file, we 
 only disconnect it from the map-reduce plan; we do not remove it. When the 
 query runs, the redundant load reads the data but does nothing with it. This 
 operation should be removed entirely. 
 Eg: 
 a = load '/user/pig/tests/data/zebra/singlefile/studentsortedtab10k' using 
 org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, gpa);
 b = load '/user/pig/tests/data/zebra/singlefile/votersortedtab10k' using 
 org.apache.hadoop.zebra.pig.TableLoader('', 'sorted') as (name, age, 
 registration, contributions);
 c = join a by name, b by name using merge;
 explain c;
 {code}
 #--
 # Map Reduce Plan  
 #--
 MapReduce node 1-21
 Map Plan
 Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/votersortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-13
 Global sort: false
 
 MapReduce node 1-20
 Map Plan
 Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-19
 |
 |---MergeJoin[tuple] - 1-16
     |
     |---Load(hdfs://wilbur20.labs.corp.sp1.yahoo.com:9020/user/pig/tests/data/zebra/singlefile/studentsortedtab10k:org.apache.hadoop.zebra.pig.TableLoader('','sorted')) - 1-12
 Global sort: false
 
 {code}
 1-21 should be removed.
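
 For readers unfamiliar with the join type involved, the following Python sketch shows the textbook merge-join idea over two inputs that are already sorted on the join key. It is not Pig's implementation and does not model the side file or the map-reduce plan; `merge_join` and the sample rows are hypothetical.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Join two lists of tuples that are both sorted ascending by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == rk:
                j_end += 1
            # Emit every pairing of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append(l_row + r_row)
            i, j = i_end, j_end
    return out

# Hypothetical rows shaped like (name, age, gpa) and
# (name, age, registration, contributions), sorted by name.
students = [("alice", 20, 3.5), ("bob", 22, 3.1), ("bob", 23, 2.9)]
voters = [("bob", 22, "democrat", 10.0), ("carol", 30, "green", 5.0)]
joined = merge_join(students, voters)
```

 Each input is read once in order, which is why a second, disconnected load of the right-hand side contributes nothing to the result.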




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-12-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Attachment: PIG-922-p3_14.patch

Addressed the review comments from Pradeep. We actually do not need to do 
anything special for order by. We only prune columns at the LOLoad at the front 
of the logical plan. Order by reads an intermediate input file, so nothing is 
pruned there. (When there is no input schema, order by reads the user's input 
file directly; however, column pruning only kicks in when the user gives an 
input schema, so that case does not arise.) 
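
As a loose illustration of what pruning columns at load time means, here is a small Python sketch. It is not the optimizer rule from the patch; `load_pruned`, the tab separator, and the sample schema are assumptions made for the example. It also shows why a declared schema is needed: without column names there is nothing to match the required columns against.

```python
def load_pruned(lines, schema, needed, sep="\t"):
    """Yield only the columns named in `needed`, in schema order."""
    keep = [i for i, name in enumerate(schema) if name in needed]
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        yield tuple(parts[i] for i in keep)

# Hypothetical input and schema.
lines = ["alice\t20\t3.5", "bob\t22\t3.1"]
schema = ["name", "age", "gpa"]

# Later operators only reference name and gpa, so age is dropped at load.
pruned = list(load_pruned(lines, schema, {"name", "gpa"}))
print(pruned)
```

Operators downstream of the load then never see the unused column.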

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_10.patch, PIG-922-p3_11.patch, PIG-922-p3_12.patch, 
 PIG-922-p3_13.patch, PIG-922-p3_14.patch, PIG-922-p3_2.patch, 
 PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch, 
 PIG-922-p3_6.patch, PIG-922-p3_7.patch, PIG-922-p3_8.patch, PIG-922-p3_9.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-12-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Open  (was: Patch Available)

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_10.patch, PIG-922-p3_11.patch, PIG-922-p3_12.patch, 
 PIG-922-p3_13.patch, PIG-922-p3_14.patch, PIG-922-p3_2.patch, 
 PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch, 
 PIG-922-p3_6.patch, PIG-922-p3_7.patch, PIG-922-p3_8.patch, PIG-922-p3_9.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-12-02 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Patch Available  (was: Open)

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_10.patch, PIG-922-p3_11.patch, PIG-922-p3_12.patch, 
 PIG-922-p3_13.patch, PIG-922-p3_14.patch, PIG-922-p3_2.patch, 
 PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch, 
 PIG-922-p3_6.patch, PIG-922-p3_7.patch, PIG-922-p3_8.patch, PIG-922-p3_9.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Commented: (PIG-1118) expression with aggregate functions returning null, with accumulate interface

2009-12-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785166#action_12785166
 ] 

Hadoop QA commented on PIG-1118:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426698/PIG_1118.patch
  against trunk revision 886015.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/79/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/79/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/79/console

This message is automatically generated.

 expression with aggregate functions returning null, with accumulate interface
 -

 Key: PIG-1118
 URL: https://issues.apache.org/jira/browse/PIG-1118
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ying He
 Fix For: 0.7.0

 Attachments: PIG_1118.patch


 The problem is in trunk. It works fine in the 0.6 branch.
 l = load '/tmp/students.txt' as (a : chararray,b : chararray,c : int);
 grunt> g = group l by 1;
 grunt> dump g;
 (1,{(asdfxc,M,23),(qwer,F,21),(uhsdf,M,34),(zxldf,M,21),(qwer,F,23),(oiue,M,54)})
 grunt> f = foreach g generate SUM(l.c), 1 + SUM(l.c) + SUM(l.c);
 grunt> dump f;
 (176L,)
