Build failed in Hudson: Pig-Patch-minerva.apache.org #83

2009-06-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/83/

--
[...truncated 93019 lines...]
 [exec] [junit] 09/06/15 23:27:09 INFO dfs.DataNode: PacketResponder 2 
for block blk_617666534651904453_1011 terminating
 [exec] [junit] 09/06/15 23:27:09 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37738 is added to 
blk_617666534651904453_1011 size 6
 [exec] [junit] 09/06/15 23:27:09 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:60443 is added to 
blk_617666534651904453_1011 size 6
 [exec] [junit] 09/06/15 23:27:09 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:49543
 [exec] [junit] 09/06/15 23:27:09 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:33050
 [exec] [junit] 09/06/15 23:27:09 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/15 23:27:09 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/15 23:27:10 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/15 23:27:10 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906152326_0002/job.jar. 
blk_4875537568425946817_1012
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: Receiving block 
blk_4875537568425946817_1012 src: /127.0.0.1:35048 dest: /127.0.0.1:60443
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: Receiving block 
blk_4875537568425946817_1012 src: /127.0.0.1:52294 dest: /127.0.0.1:37738
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: Receiving block 
blk_4875537568425946817_1012 src: /127.0.0.1:48953 dest: /127.0.0.1:58947
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: Received block 
blk_4875537568425946817_1012 of size 1413561 from /127.0.0.1
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: PacketResponder 0 
for block blk_4875537568425946817_1012 terminating
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: Received block 
blk_4875537568425946817_1012 of size 1413561 from /127.0.0.1
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: PacketResponder 1 
for block blk_4875537568425946817_1012 terminating
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:58947 is added to 
blk_4875537568425946817_1012 size 1413561
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: Received block 
blk_4875537568425946817_1012 of size 1413561 from /127.0.0.1
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.DataNode: PacketResponder 2 
for block blk_4875537568425946817_1012 terminating
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37738 is added to 
blk_4875537568425946817_1012 size 1413561
 [exec] [junit] 09/06/15 23:27:10 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:60443 is added to 
blk_4875537568425946817_1012 size 1413561
 [exec] [junit] 09/06/15 23:27:10 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200906152326_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/06/15 23:27:10 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200906152326_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906152326_0002/job.split. 
blk_37034840890077491_1013
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.DataNode: Receiving block 
blk_37034840890077491_1013 src: /127.0.0.1:52296 dest: /127.0.0.1:37738
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.DataNode: Receiving block 
blk_37034840890077491_1013 src: /127.0.0.1:33030 dest: /127.0.0.1:43694
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.DataNode: Receiving block 
blk_37034840890077491_1013 src: /127.0.0.1:35053 dest: /127.0.0.1:60443
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.DataNode: Received block 
blk_37034840890077491_1013 of size 14547 from /127.0.0.1
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.DataNode: PacketResponder 0 
for block blk_37034840890077491_1013 terminating
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:60443 is added to 
blk_37034840890077491_1013 size 14547
 [exec] [junit] 09/06/15 23:27:11 INFO dfs.DataNode: Received block 

[jira] Commented: (PIG-852) pig -version or pig -help returns exit code of 1

2009-06-16 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719942#action_12719942
 ] 

Milind Bhandarkar commented on PIG-852:
---

Note that JUnit tests that test the return code of a completely different JVM 
are kludgy at best. Therefore, writing a test case that checks the System.exit() 
return value is insane. Hadoop folks have *mostly* fixed this issue by having a 
public static run() method that can be invoked directly in the tests, so that 
its return value (which is what main() uses as an exit code) can be checked.

So, committers, please ignore the no-tests warning.
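
The pattern described above can be sketched roughly as follows. This is a
hypothetical illustration, not Pig's actual code; the class and flag handling
are invented for the example. main() delegates to a public static run() that
returns the would-be exit code, so a JUnit test can call run() in-process
instead of forking a JVM and asserting on System.exit().

```java
// Hypothetical sketch (not Pig's actual code): expose the exit code via
// a public static run() so tests can check it without forking a JVM.
public class PigMainSketch {
    public static int run(String[] args) {
        for (String arg : args) {
            if ("-version".equals(arg) || "-help".equals(arg)) {
                // Printing version/help is a success, so return 0.
                System.out.println("pig (sketch)");
                return 0;
            }
        }
        // Unrecognized invocation in this toy sketch: signal failure.
        return 1;
    }

    public static void main(String[] args) {
        // main() merely forwards run()'s result as the process exit code.
        System.exit(run(args));
    }
}
```

A test can then assert on run()'s return value directly, which is exactly the
property PIG-852 is about (help/version should exit 0).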

 pig -version or pig -help returns exit code of 1
 

 Key: PIG-852
 URL: https://issues.apache.org/jira/browse/PIG-852
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
 Environment: All
Reporter: Milind Bhandarkar
Assignee: Milind Bhandarkar
 Attachments: rc.patch


 {code}
 java -jar pig.jar -x local [-version|-help]
 {code}
 returns an exit code of 1 to the shell.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-849) Local engine loses records in splits

2009-06-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated PIG-849:
---

Status: Patch Available  (was: Open)

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Attachments: local_engine.patch, local_engine.patch


 When there is a split in the physical plan records can be dropped in certain 
 circumstances.
 The local split operator puts all records in a databag and turns over 
 iterators to the POSplitOutput operators. The problem is that the local split 
 also adds STATUS_NULL records to the bag. That will cause the databag's 
 iterator to prematurely return false on the hasNext call (so a STATUS_NULL 
 becomes a STATUS_EOP in the split output operators).
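
A toy model of this failure mode (names are mine; the real operators are the
local split and POSplitOutput): a consumer that interprets a null entry as
end-of-processing drops every record after the first null, which is why
null-status results must not be added to the bag.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model of the bug: a consumer that treats a null entry as EOP
// (end of processing) loses all records that follow it.
public class SplitNullSketch {
    // Count records until a null is seen, mimicking an operator that
    // reads a null-status record as STATUS_EOP.
    public static int countUntilEop(List<String> bag) {
        int n = 0;
        Iterator<String> it = bag.iterator();
        while (it.hasNext()) {
            String t = it.next();
            if (t == null) {
                return n; // premature end: later records are lost
            }
            n++;
        }
        return n;
    }
}
```

With a bag of [t1, null, t2] the consumer sees only one record; keeping nulls
out of the bag in the first place (the gist of the fix) restores both.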

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-849) Local engine loses records in splits

2009-06-16 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated PIG-849:
---

Attachment: local_engine.patch

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Attachments: local_engine.patch, local_engine.patch


 When there is a split in the physical plan records can be dropped in certain 
 circumstances.
 The local split operator puts all records in a databag and turns over 
 iterators to the POSplitOutput operators. The problem is that the local split 
 also adds STATUS_NULL records to the bag. That will cause the databag's 
 iterator to prematurely return false on the hasNext call (so a STATUS_NULL 
 becomes a STATUS_EOP in the split output operators).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-849) Local engine loses records in splits

2009-06-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719968#action_12719968
 ] 

Gunther Hagleitner commented on PIG-849:


The new patch has a unit test; otherwise it's the same.

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Attachments: local_engine.patch, local_engine.patch


 When there is a split in the physical plan records can be dropped in certain 
 circumstances.
 The local split operator puts all records in a databag and turns over 
 iterators to the POSplitOutput operators. The problem is that the local split 
 also adds STATUS_NULL records to the bag. That will cause the databag's 
 iterator to prematurely return false on the hasNext call (so a STATUS_NULL 
 becomes a STATUS_EOP in the split output operators).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #84

2009-06-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/84/

--
[...truncated 93228 lines...]
 [exec] [junit] 09/06/16 03:05:17 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:37831
 [exec] [junit] 09/06/16 03:05:17 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/16 03:05:17 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:47592 to delete  blk_535537321375412239_1005 
blk_-1167458228323501006_1004 blk_-2664737174212614868_1006
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:56761 to delete  blk_-1167458228323501006_1004 
blk_-2664737174212614868_1006
 [exec] [junit] 09/06/16 03:05:18 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/16 03:05:18 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906160304_0002/job.jar. 
blk_-4927217973331313637_1012
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Receiving block 
blk_-4927217973331313637_1012 src: /127.0.0.1:33793 dest: /127.0.0.1:37060
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Receiving block 
blk_-4927217973331313637_1012 src: /127.0.0.1:54343 dest: /127.0.0.1:56761
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Receiving block 
blk_-4927217973331313637_1012 src: /127.0.0.1:38991 dest: /127.0.0.1:47592
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Received block 
blk_-4927217973331313637_1012 of size 1413551 from /127.0.0.1
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: PacketResponder 0 
for block blk_-4927217973331313637_1012 terminating
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:47592 is added to 
blk_-4927217973331313637_1012 size 1413551
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Received block 
blk_-4927217973331313637_1012 of size 1413551 from /127.0.0.1
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56761 is added to 
blk_-4927217973331313637_1012 size 1413551
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: PacketResponder 1 
for block blk_-4927217973331313637_1012 terminating
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Received block 
blk_-4927217973331313637_1012 of size 1413551 from /127.0.0.1
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37060 is added to 
blk_-4927217973331313637_1012 size 1413551
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: PacketResponder 2 
for block blk_-4927217973331313637_1012 terminating
 [exec] [junit] 09/06/16 03:05:18 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200906160304_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/06/16 03:05:18 INFO fs.FSNamesystem: Reducing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200906160304_0002/job.jar. New replication 
is 2
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906160304_0002/job.split. 
blk_-7989552776344585634_1013
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Receiving block 
blk_-7989552776344585634_1013 src: /127.0.0.1:54345 dest: /127.0.0.1:56761
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Receiving block 
blk_-7989552776344585634_1013 src: /127.0.0.1:33797 dest: /127.0.0.1:37060
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Receiving block 
blk_-7989552776344585634_1013 src: /127.0.0.1:34635 dest: /127.0.0.1:37750
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Received block 
blk_-7989552776344585634_1013 of size 14547 from /127.0.0.1
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: PacketResponder 0 
for block blk_-7989552776344585634_1013 terminating
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Received block 
blk_-7989552776344585634_1013 of size 14547 from /127.0.0.1
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:37750 is added to 
blk_-7989552776344585634_1013 size 14547
 [exec] [junit] 09/06/16 03:05:18 INFO dfs.DataNode: Received block 
blk_-7989552776344585634_1013 of size 14547 from /127.0.0.1
 [exec] [junit] 09/06/16 03:05:18 INFO 

[jira] Commented: (PIG-849) Local engine loses records in splits

2009-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720013#action_12720013
 ] 

Hadoop QA commented on PIG-849:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12410762/local_engine.patch
  against trunk revision 784333.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/84/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/84/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/84/console

This message is automatically generated.

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Attachments: local_engine.patch, local_engine.patch


 When there is a split in the physical plan records can be dropped in certain 
 circumstances.
 The local split operator puts all records in a databag and turns over 
 iterators to the POSplitOutput operators. The problem is that the local split 
 also adds STATUS_NULL records to the bag. That will cause the databag's 
 iterator to prematurely return false on the hasNext call (so a STATUS_NULL 
 becomes a STATUS_EOP in the split output operators).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-851) Map type used as return type in UDFs not recognized at all times

2009-06-16 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-851:
---

Status: Patch Available  (was: Open)

 Map type used as return type in UDFs not recognized at all times
 

 Key: PIG-851
 URL: https://issues.apache.org/jira/browse/PIG-851
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Santhosh Srinivasan
 Fix For: 0.3.0

 Attachments: Pig_815_patch.txt


 When a UDF returns a map and the outputSchema method is not overridden, Pig 
 does not figure out the data type. As a result, the type is set to unknown, 
 resulting in a run-time failure. An example script and UDF follow:
 {code}
 public class mapUDF extends EvalFunc<Map<Object, Object>> {
     @Override
     public Map<Object, Object> exec(Tuple input) throws IOException {
         return new HashMap<Object, Object>();
     }

     // Note that the outputSchema method is commented out
     /*
     @Override
     public Schema outputSchema(Schema input) {
         try {
             return new Schema(new Schema.FieldSchema(null, null,
                 DataType.MAP));
         } catch (FrontendException e) {
             return null;
         }
     }
     */
 }
 {code}
 {code}
 grunt> a = load 'student_tab.data';
 grunt> b = foreach a generate EXPLODE(1);
 grunt> describe b;
 b: {Unknown}
 grunt> dump b;
 2009-06-15 17:59:01,776 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-06-15 17:59:01,781 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2080: Foreach currently does not handle type Unknown
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #85

2009-06-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/85/

--
[...truncated 93231 lines...]
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: Receiving block 
blk_-1728121942434331266_1011 src: /127.0.0.1:32802 dest: /127.0.0.1:50795
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: Received block 
blk_-1728121942434331266_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: PacketResponder 0 
for block blk_-1728121942434331266_1011 terminating
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50795 is added to 
blk_-1728121942434331266_1011 size 6
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: Received block 
blk_-1728121942434331266_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:33553 is added to 
blk_-1728121942434331266_1011 size 6
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: PacketResponder 1 
for block blk_-1728121942434331266_1011 terminating
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: Received block 
blk_-1728121942434331266_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: PacketResponder 2 
for block blk_-1728121942434331266_1011 terminating
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:51457 is added to 
blk_-1728121942434331266_1011 size 6
 [exec] [junit] 09/06/16 10:12:48 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:59015
 [exec] [junit] 09/06/16 10:12:48 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:44581
 [exec] [junit] 09/06/16 10:12:48 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/16 10:12:48 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: Deleting block 
blk_-7543410323506030326_1004 file 
dfs/data/data1/current/blk_-7543410323506030326
 [exec] [junit] 09/06/16 10:12:48 INFO dfs.DataNode: Deleting block 
blk_-2139022524517865788_1005 file 
dfs/data/data2/current/blk_-2139022524517865788
 [exec] [junit] 09/06/16 10:12:49 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/16 10:12:49 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/16 10:12:49 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906161011_0002/job.jar. 
blk_-8770850262137567712_1012
 [exec] [junit] 09/06/16 10:12:49 INFO dfs.DataNode: Receiving block 
blk_-8770850262137567712_1012 src: /127.0.0.1:56452 dest: /127.0.0.1:42603
 [exec] [junit] 09/06/16 10:12:49 INFO dfs.DataNode: Receiving block 
blk_-8770850262137567712_1012 src: /127.0.0.1:33488 dest: /127.0.0.1:51457
 [exec] [junit] 09/06/16 10:12:49 INFO dfs.DataNode: Receiving block 
blk_-8770850262137567712_1012 src: /127.0.0.1:32805 dest: /127.0.0.1:50795
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.DataNode: Received block 
blk_-8770850262137567712_1012 of size 1413708 from /127.0.0.1
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.DataNode: PacketResponder 0 
for block blk_-8770850262137567712_1012 terminating
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50795 is added to 
blk_-8770850262137567712_1012 size 1413708
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.DataNode: Received block 
blk_-8770850262137567712_1012 of size 1413708 from /127.0.0.1
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:51457 is added to 
blk_-8770850262137567712_1012 size 1413708
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.DataNode: PacketResponder 1 
for block blk_-8770850262137567712_1012 terminating
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.DataNode: Received block 
blk_-8770850262137567712_1012 of size 1413708 from /127.0.0.1
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:42603 is added to 
blk_-8770850262137567712_1012 size 1413708
 [exec] [junit] 09/06/16 10:12:50 INFO dfs.DataNode: PacketResponder 2 
for block blk_-8770850262137567712_1012 terminating
 [exec] [junit] 09/06/16 10:12:50 INFO fs.FSNamesystem: Increasing 
replication for file 
/tmp/hadoop-hudson/mapred/system/job_200906161011_0002/job.jar. New replication 
is 2
 

[jira] Commented: (PIG-851) Map type used as return type in UDFs not recognized at all times

2009-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720234#action_12720234
 ] 

Hadoop QA commented on PIG-851:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12410810/Pig_815_patch.txt
  against trunk revision 784333.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning message.

-1 javac.  The applied patch generated 227 javac compiler warnings (more 
than the trunk's current 224 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 162 release audit warnings 
(more than the trunk's current 160 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/85/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/85/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/85/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/85/console

This message is automatically generated.

 Map type used as return type in UDFs not recognized at all times
 

 Key: PIG-851
 URL: https://issues.apache.org/jira/browse/PIG-851
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Santhosh Srinivasan
 Fix For: 0.3.0

 Attachments: Pig_815_patch.txt


 When a UDF returns a map and the outputSchema method is not overridden, Pig 
 does not figure out the data type. As a result, the type is set to unknown, 
 resulting in a run-time failure. An example script and UDF follow:
 {code}
 public class mapUDF extends EvalFunc<Map<Object, Object>> {
     @Override
     public Map<Object, Object> exec(Tuple input) throws IOException {
         return new HashMap<Object, Object>();
     }

     // Note that the outputSchema method is commented out
     /*
     @Override
     public Schema outputSchema(Schema input) {
         try {
             return new Schema(new Schema.FieldSchema(null, null,
                 DataType.MAP));
         } catch (FrontendException e) {
             return null;
         }
     }
     */
 }
 {code}
 {code}
 grunt> a = load 'student_tab.data';
 grunt> b = foreach a generate EXPLODE(1);
 grunt> describe b;
 b: {Unknown}
 grunt> dump b;
 2009-06-15 17:59:01,776 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-06-15 17:59:01,781 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2080: Foreach currently does not handle type Unknown
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Rewire and multi-query load/store optimization

2009-06-16 Thread Alan Gates
+1 on option one.  The use of store-load was only to overcome a temporary 
problem in Pig.  We've fixed the problem, so let's not propagate it.  We will 
need to document this very clearly (maybe even to the point of issuing 
warnings in the parser when we see this combo) so users understand that this 
is now a hindrance rather than a help.


Alan.

On Jun 12, 2009, at 2:19 PM, Santhosh Srinivasan wrote:


With the implementation of rewire as part of the optimizer infrastructure, 
a bug was exposed in the load/store optimization in the multi-query feature. 
Below, I will articulate the bug and the ramifications of a few possible 
solutions.

Load/store optimization in the multi-query feature?
---

If a script has an explicit store and a corresponding load which loads the 
output of the store, the store-load combination can be optimized. An example 
will illustrate the concept.

Pre-conditions:

1. The store location and the load location should match
2. The store format and the load format should be compatible

{code}

A = load 'input';
B = group A by $0;
store B into 'output';
C = load 'output';
D = group C by $0;
store D into 'some_other_output';

{code}

In the script above, the output of the first store serves as the input of the 
second load (C). In addition, the store and load use PigStorage() as the 
store/load mechanism. In the logical plan, this combination is optimized by 
splitting B into the store and D.

Bug
---

When the load in the store/load combination was removed, the inner plans of 
the load's successors (in this case D) were not updated correctly. As a 
result, the projections in the inner plans still held references to 
non-existing operators.

Consequence of the bug fix
---

During the map-reduce (M/R) compilation the split operator is compiled into a 
store and a load. Prior to multi-query, each M/R boundary resulted in a 
temporary store using BinStorage. The subsequent load could infer the type, 
as BinStorage returns typed records, i.e., non-bytearray records.

With multi-query and the load/store optimization, the temporary BinStorage 
data is not generated. Instead, the subsequent load uses the output of the 
previous store as its input. Here, the load can return typed or untyped 
records depending on the loader. As a result, the operators in the map phase 
that rely on the type information (inferred from the logical plan) will fail 
due to type mismatch.

Possible Solutions
--

Solution 1
==
Switch off the load/store optimization. Users were primarily storing 
intermediate data within the same script to overcome Pig's limitation, i.e., 
the absence of the multi-query feature. Going forward, with multi-query 
turned on, users who store intermediate data will not enjoy all the benefits 
of the optimization.

Solution 2
==
After the M/R compilation is completed, during the final pass of the plan, 
fix the types of the projections to reflect typed/untyped data. In other 
words, if the loader is returning typed data then retain the types; else 
change the types to bytearray. In order to make this decision, loaders 
should support an interface to indicate if the records are typed or untyped.
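
Such a loader contract could look something like the sketch below. The
interface and method names here are my invention for illustration, not a
committed Pig API.

```java
// Hypothetical interface for Solution 2: a loader declares whether it
// emits typed records, so the final compilation pass can keep inferred
// types or downgrade projections to bytearray.
public class TypedLoaderSketch {
    public interface TypeAwareLoader {
        boolean returnsTypedRecords();
    }

    // Example decision the final pass might make for one projection.
    public static String projectionType(TypeAwareLoader loader,
                                        String inferredType) {
        return loader.returnsTypedRecords() ? inferredType : "bytearray";
    }
}
```

A typed loader (e.g. one behaving like BinStorage) would keep the inferred
type; an untyped one would force bytearray, avoiding the mismatch described
above.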


Thanks,
Santhosh




[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-06-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: In Progress  (was: Patch Available)

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load->Filter->Group and Load->Group) you have to write two rules.  But it 
 has the upside that the resulting rules know exactly what they are getting.  
 The original intent of this was to reduce the number of rules that needed to 
 be written.  But the resulting rules have to do a lot of work to understand 
 the operators they are working with.  With exact matches only, each rule 
 will know exactly the operators it is working on and can apply the logic of 
 shifting the operators around.  All four of the existing rules set all 
 entries of required to true, so removing this will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     for (Rule rule : mRules) {
         if (matcher.match(rule)) {
             // It matches the pattern.  Now check if the transformer
             // approves as well.
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     rule.transformer.transform(match);
                 }
             }
         }
     }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     boolean sawMatch;
     int numIterations = 0;
     do {
         sawMatch = false;
         for (Rule rule : mRules) {
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 // It matches the pattern.  Now check if the transformer
                 // approves as well.
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     sawMatch = true;
                     rule.transformer.transform(match);
                 }
             }
         }
         // Not sure if 1000 is the right number of iterations, maybe it
         // should be configurable so that large scripts don't stop too
         // early.
     } while (sawMatch && numIterations++ < 1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.
 The reason for iterating over the rules is so that each rule can be applied 
 multiple times as necessary.  This allows us to write simple rules, mostly 
 swaps between neighboring operators, without worrying that we get the plan 
 right in one pass.
 For example, we might have a plan that looks like 
 Load->Join->Filter->Foreach, and we want to optimize it to 
 Load->Foreach->Filter->Join.  With two simple rules (swap filter and join, 
 and swap foreach and filter), applied iteratively, we can get from the 
 initial to the final plan without needing to understand the big picture of 
 the entire plan.
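 As an aside, the fixpoint pattern the proposal describes can be sketched in
 isolation.  This is an illustrative, self-contained sketch only: the String
 plan and UnaryOperator rules below are hypothetical stand-ins for Pig's
 OperatorPlan and Rule types, not the actual classes.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Minimal sketch of the proposed fixpoint loop: keep applying simple
// rewrite rules until none of them changes the plan, or an iteration
// cap is hit.
public class FixpointSketch {
    public static String optimize(String plan,
                                  List<UnaryOperator<String>> rules,
                                  int maxIterations) {
        boolean sawMatch;
        int numIterations = 0;
        do {
            sawMatch = false;
            for (UnaryOperator<String> rule : rules) {
                // A rule returns the plan unchanged when it does not match.
                String next = rule.apply(plan);
                if (!next.equals(plan)) {
                    sawMatch = true;
                    plan = next;
                }
            }
            // Cap the passes so mutually-undoing rules cannot loop forever.
        } while (sawMatch && numIterations++ < maxIterations);
        return plan;
    }
}
```

 With a single swap rule such as s -> s.replace("Join>Filter", "Filter>Join"),
 the loop does one productive pass, then one quiescent pass, and stops.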
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple inputs.
  * @param second operator, will be pushed in front of 

[jira] Commented: (PIG-842) PigStorage should support multi-byte delimiters

2009-06-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720259#action_12720259
 ] 

Alan Gates commented on PIG-842:


I'm concerned about the performance hit of supporting multi-byte delimiters.  
Before we commit to doing this in PigStorage, we should test how much it slows 
down reading data.  If the slowdown is significant, we should consider having a 
PigMultiByteStorage or something similar that handles multi-byte delimiters.  
It could extend PigStorage and differ only in how it parses the records.
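
To make the cost concrete: a single-byte delimiter is one comparison per input
byte, while a multi-byte delimiter needs a sequence match at each position.
The sketch below is illustrative only; the class and method names are made up
and this is not PigStorage's actual parsing code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative multi-byte field splitter: at each byte offset, try to
// match the whole delimiter sequence before advancing.  The extra inner
// loop (absent in the single-byte case) is the performance concern.
public class MultiByteSplit {
    public static List<String> split(byte[] line, byte[] delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i + delim.length <= line.length; ) {
            boolean match = true;
            for (int j = 0; j < delim.length; j++) {
                if (line[i + j] != delim[j]) { match = false; break; }
            }
            if (match) {
                fields.add(new String(line, start, i - start));
                i += delim.length;   // skip past the delimiter
                start = i;
            } else {
                i++;
            }
        }
        fields.add(new String(line, start, line.length - start));
        return fields;
    }
}
```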

 PigStorage should support multi-byte delimiters
 ---

 Key: PIG-842
 URL: https://issues.apache.org/jira/browse/PIG-842
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Santhosh Srinivasan
 Fix For: 0.3.0


 Currently, PigStorage supports only single-byte delimiters. Users have 
 requested multi-byte delimiters. There are performance implications with 
 multi-byte delimiters: instead of looking for a single byte, PigStorage 
 would have to look for a pattern, à la BinStorage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-06-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: Patch Available  (was: In Progress)

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch, OptimizerPhase3_part2_1.patch



[jira] Commented: (PIG-851) Map type used as return type in UDFs not recognized at all times

2009-06-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720292#action_12720292
 ] 

Santhosh Srinivasan commented on PIG-851:
-

Review comments:

1. The new sources test/org/apache/pig/test/utils/MyUDFReturnMap.java and 
test/org/apache/pig/test/TestUDFReturnMap.java need to include the Apache 
license headers
2. The use of the internal class 
sun.reflect.generics.reflectiveObjects.ParameterizedTypeImpl results in 3 
compiler warnings and 1 javadoc warning. Can we use a different class?
3. The test case in TestUDFReturnMap runs the test in local mode (i.e., 
ExecType.LOCAL). Another test for map reduce mode, ExecType.MAPREDUCE, should 
be added.

 Map type used as return type in UDFs not recognized at all times
 

 Key: PIG-851
 URL: https://issues.apache.org/jira/browse/PIG-851
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Santhosh Srinivasan
 Fix For: 0.3.0

 Attachments: Pig_815_patch.txt


 When a UDF returns a map and the outputSchema method is not overridden, Pig 
 does not figure out the data type. As a result, the type is set to unknown, 
 resulting in a runtime failure. An example script and UDF follow:
 {code}
 public class mapUDF extends EvalFunc<Map<Object, Object>> {
     @Override
     public Map<Object, Object> exec(Tuple input) throws IOException {
         return new HashMap<Object, Object>();
     }

     // Note that the outputSchema method is commented out
     /*
     @Override
     public Schema outputSchema(Schema input) {
         try {
             return new Schema(new Schema.FieldSchema(null, null,
                 DataType.MAP));
         } catch (FrontendException e) {
             return null;
         }
     }
     */
 }
 {code}
 {code}
 grunt> a = load 'student_tab.data';
 grunt> b = foreach a generate EXPLODE(1);
 grunt> describe b;
 b: {Unknown}
 grunt> dump b;
 2009-06-15 17:59:01,776 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Failed!
 2009-06-15 17:59:01,781 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2080: Foreach currently does not handle type Unknown
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-797) Limit with ORDER BY producing wrong results

2009-06-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-797:
---

Status: Open  (was: Patch Available)

 Limit with ORDER BY producing wrong results
 ---

 Key: PIG-797
 URL: https://issues.apache.org/jira/browse/PIG-797
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Fix For: site

 Attachments: PIG-797.patch


 Query:
 A = load 'studenttab10k' as (name, age, gpa);
 B = group A by name;
 C = foreach B generate group, SUM(A.gpa) as rev;
 D = order C by rev;
 E = limit D 10;
 dump E;
 Output:
 (alice king,31.7)
 (alice laertes,26.453)
 (alice thompson,25.867)
 (alice van buren,23.59)
 (bob allen,19.902)
 (bob ichabod,29.0)
 (bob king,28.454)
 (bob miller,10.28)
 (bob underhill,28.137)
 (bob van buren,25.992)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-797) Limit with ORDER BY producing wrong results

2009-06-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-797:
---

Fix Version/s: (was: site)
   0.3.0
 Assignee: Daniel Dai
   Status: Patch Available  (was: Open)

 Limit with ORDER BY producing wrong results
 ---

 Key: PIG-797
 URL: https://issues.apache.org/jira/browse/PIG-797
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.3.0

 Attachments: PIG-797-2.patch, PIG-797.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-797) Limit with ORDER BY producing wrong results

2009-06-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-797:
---

Attachment: PIG-797-2.patch

The new patch solves the findbugs issues and adds a test case.

 Limit with ORDER BY producing wrong results
 ---

 Key: PIG-797
 URL: https://issues.apache.org/jira/browse/PIG-797
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Fix For: 0.3.0

 Attachments: PIG-797-2.patch, PIG-797.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-849) Local engine loses records in splits

2009-06-16 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720306#action_12720306
 ] 

Gunther Hagleitner commented on PIG-849:


Same errors as before. Ran the tests manually and they passed. The issue with 
the automated patch testing still seems to be present.

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Attachments: local_engine.patch, local_engine.patch


 When there is a split in the physical plan records can be dropped in certain 
 circumstances.
 The local split operator puts all records in a databag and turns over 
 iterators to the POSplitOutput operators. The problem is that the local split 
 also adds STATUS_NULL records to the bag. That will cause the databag's 
 iterator to prematurely return false on the hasNext call (so a STATUS_NULL 
 becomes a STATUS_EOP in the split output operators).
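
 The failure mode can be illustrated apart from Pig: a consumer that treats a
 null element as end-of-stream stops early when null "records" are buffered,
 so the fix direction is to skip them when filling the bag.  All names below
 are hypothetical stand-ins, not the actual POSplit/POSplitOutput code.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the bug: buffering null "records" truncates downstream
// iteration when the consumer interprets null as end-of-stream.
public class SplitBufferSketch {
    // Stand-in for the local split filling its databag; the fix is to
    // drop STATUS_NULL records (modeled here as nulls) up front.
    public static List<String> buffer(List<String> incoming, boolean skipNulls) {
        List<String> bag = new ArrayList<>();
        for (String rec : incoming) {
            if (rec == null && skipNulls) continue; // drop null records
            bag.add(rec);
        }
        return bag;
    }

    // Consumer that stops at the first null, mimicking an iterator whose
    // hasNext() prematurely returns false on a null entry.
    public static int consumed(List<String> bag) {
        int n = 0;
        for (String rec : bag) {
            if (rec == null) break;
            n++;
        }
        return n;
    }
}
```

 With input [a, null, b], the naive buffer loses record b, while the
 null-skipping buffer delivers both real records.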

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-Patch-minerva.apache.org #86

2009-06-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/86/

--
[...truncated 93151 lines...]
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_-8461596318273395362_1011
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: Receiving block 
blk_-8461596318273395362_1011 src: /127.0.0.1:41010 dest: /127.0.0.1:50390
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: Receiving block 
blk_-8461596318273395362_1011 src: /127.0.0.1:42440 dest: /127.0.0.1:41357
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: Receiving block 
blk_-8461596318273395362_1011 src: /127.0.0.1:50128 dest: /127.0.0.1:49392
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: Received block 
blk_-8461596318273395362_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: PacketResponder 0 
for block blk_-8461596318273395362_1011 terminating
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:49392 is added to 
blk_-8461596318273395362_1011 size 6
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: Received block 
blk_-8461596318273395362_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: PacketResponder 1 
for block blk_-8461596318273395362_1011 terminating
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41357 is added to 
blk_-8461596318273395362_1011 size 6
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: Received block 
blk_-8461596318273395362_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.DataNode: PacketResponder 2 
for block blk_-8461596318273395362_1011 terminating
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50390 is added to 
blk_-8461596318273395362_1011 size 6
 [exec] [junit] 09/06/16 13:39:32 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:59223
 [exec] [junit] 09/06/16 13:39:32 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:52520
 [exec] [junit] 09/06/16 13:39:32 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/06/16 13:39:32 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/06/16 13:39:32 INFO dfs.StateChange: BLOCK* ask 
127.0.0.1:49392 to delete  blk_3740516381899847209_1004 
blk_-1227787253757277492_1006 blk_-2388033644763996144_1005
 [exec] [junit] 09/06/16 13:39:33 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/06/16 13:39:33 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200906161338_0002/job.jar. 
blk_-5654994480572345644_1012
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: Receiving block 
blk_-5654994480572345644_1012 src: /127.0.0.1:50129 dest: /127.0.0.1:49392
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: Receiving block 
blk_-5654994480572345644_1012 src: /127.0.0.1:53783 dest: /127.0.0.1:40787
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: Receiving block 
blk_-5654994480572345644_1012 src: /127.0.0.1:42444 dest: /127.0.0.1:41357
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: Received block 
blk_-5654994480572345644_1012 of size 1425701 from /127.0.0.1
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: PacketResponder 0 
for block blk_-5654994480572345644_1012 terminating
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41357 is added to 
blk_-5654994480572345644_1012 size 1425701
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: Received block 
blk_-5654994480572345644_1012 of size 1425701 from /127.0.0.1
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: PacketResponder 1 
for block blk_-5654994480572345644_1012 terminating
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40787 is added to 
blk_-5654994480572345644_1012 size 1425701
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.DataNode: Received block 
blk_-5654994480572345644_1012 of size 1425701 from /127.0.0.1
 [exec] [junit] 09/06/16 13:39:33 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:49392 is added to 
blk_-5654994480572345644_1012 size 

[jira] Updated: (PIG-849) Local engine loses records in splits

2009-06-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-849:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. Thanks, Gunther, for contributing.

 Local engine loses records in splits
 

 Key: PIG-849
 URL: https://issues.apache.org/jira/browse/PIG-849
 Project: Pig
  Issue Type: Bug
Reporter: Gunther Hagleitner
 Attachments: local_engine.patch, local_engine.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-850) Dump produce wrong result while store into is ok

2009-06-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-850:
---

Attachment: PIG-850.patch

When we add an extra limit map-reduce operator (see 
[PIG-364|http://issues.apache.org/jira/browse/PIG-364]), we should mark the 
output file of the original map-reduce job as temporary; otherwise, dump will 
pick up the wrong output file.

 Dump produce wrong result while store into is ok
 --

 Key: PIG-850
 URL: https://issues.apache.org/jira/browse/PIG-850
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.3.0

 Attachments: PIG-850.patch


 The following script wrongly produces 20 outputs; however, if we change 
 dump to store into, the result is correct. Not sure if the problem occurs 
 only in the limited-sort case.
 A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
 B = order A by gpa parallel 2;
 C = limit B 10;
 dump C;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-850) Dump produce wrong result while store into is ok

2009-06-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-850:
---

Status: Patch Available  (was: Open)

 Dump produce wrong result while store into is ok
 --

 Key: PIG-850
 URL: https://issues.apache.org/jira/browse/PIG-850
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.3.0

 Attachments: PIG-850.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal: Pig-Patch-minerva.apache.org #87

2009-06-16 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/87/changes




[jira] Commented: (PIG-797) Limit with ORDER BY producing wrong results

2009-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720369#action_12720369
 ] 

Hadoop QA commented on PIG-797:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12410843/PIG-797-2.patch
  against trunk revision 785371.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/87/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/87/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/87/console

This message is automatically generated.

 Limit with ORDER BY producing wrong results
 ---

 Key: PIG-797
 URL: https://issues.apache.org/jira/browse/PIG-797
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.3.0

 Attachments: PIG-797-2.patch, PIG-797.patch


 Query:
 A = load 'studenttab10k' as (name, age, gpa);
 B = group A by name;
 C = foreach B generate group, SUM(A.gpa) as rev;
 D = order C by rev;
 E = limit D 10;
 dump E;
 Output:
 (alice king,31.7)
 (alice laertes,26.453)
 (alice thompson,25.867)
 (alice van buren,23.59)
 (bob allen,19.902)
 (bob ichabod,29.0)
 (bob king,28.454)
 (bob miller,10.28)
 (bob underhill,28.137)
 (bob van buren,25.992)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-06-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: In Progress  (was: Patch Available)

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch



[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-06-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: Patch Available  (was: In Progress)

 Proposed improvements to pig's optimizer
 

 Key: PIG-697
 URL: https://issues.apache.org/jira/browse/PIG-697
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Alan Gates
Assignee: Santhosh Srinivasan
 Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
 OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
 OptimizerPhase3_parrt1.patch, OptimizerPhase3_part2_2.patch


 I propose the following changes to pig optimizer, plan, and operator 
 functionality to support more robust optimization:
 1) Remove the required array from Rule.  This will change rules so that they 
 only match exact patterns instead of allowing missing elements in the pattern.
 This has the downside that if a given rule applies to two patterns (say 
 Load-Filter-Group, Load-Group) you have to write two rules.  But it has 
 the upside that
 the resulting rules know exactly what they are getting.  The original intent 
 of this was to reduce the number of rules that needed to be written.  But the
 resulting rules have do a lot of work to understand the operators they are 
 working with.  With exact matches only, each rule will know exactly the 
 operators it
 is working on and can apply the logic of shifting the operators around.  All 
 four of the existing rules set all entries of required to true, so removing 
 this
 will have no effect on them.
 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
 conversions or a certain number of iterations has been reached.  Currently the
 function is:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     for (Rule rule : mRules) {
         if (matcher.match(rule)) {
             // It matches the pattern.  Now check if the transformer
             // approves as well.
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     rule.transformer.transform(match);
                 }
             }
         }
     }
 }
 {code}
 It would change to be:
 {code}
 public final void optimize() throws OptimizerException {
     RuleMatcher matcher = new RuleMatcher();
     boolean sawMatch;
     int numIterations = 0;
     do {
         sawMatch = false;
         for (Rule rule : mRules) {
             List<List<O>> matches = matcher.getAllMatches();
             for (List<O> match : matches) {
                 // It matches the pattern.  Now check if the transformer
                 // approves as well.
                 if (rule.transformer.check(match)) {
                     // The transformer approves.
                     sawMatch = true;
                     rule.transformer.transform(match);
                 }
             }
         }
         // Not sure if 1000 is the right number of iterations, maybe it
         // should be configurable so that large scripts don't stop too
         // early.
     } while (sawMatch && numIterations++ < 1000);
 }
 {code}
 The reason for limiting the number of iterations is to avoid infinite loops.  
 The reason for iterating over the rules is so that each rule can be applied 
 multiple times as necessary.  This allows us to write simple rules, mostly 
 swaps between neighboring operators, without worrying about getting the plan 
 right in one pass.
 For example, we might have a plan that looks like 
 Load->Join->Filter->Foreach, and we want to optimize it to 
 Load->Foreach->Filter->Join.  With two simple rules (swap filter and join, 
 and swap foreach and filter), applied iteratively, we can get from the 
 initial to the final plan without needing to understand the big picture of 
 the entire plan.
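 A toy model shows why the fixed-point loop matters: a single adjacent-swap 
 rule may need to fire several times before the plan stops changing.  The 
 classes below are illustrative stand-ins, not Pig's plan API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the proposed fixed-point loop: one simple rule
// (push Filter in front of Join) applied until nothing changes.
class IterativeSwaps {

    // Swaps the first adjacent [Join, Filter] pair; true if a swap fired.
    static boolean pushFilterAboveJoin(List<String> plan) {
        for (int i = 0; i + 1 < plan.size(); i++) {
            if (plan.get(i).equals("Join") && plan.get(i + 1).equals("Filter")) {
                plan.set(i, "Filter");
                plan.set(i + 1, "Join");
                return true;
            }
        }
        return false;
    }

    static List<String> optimize(List<String> plan) {
        List<String> p = new ArrayList<>(plan);
        boolean sawMatch;
        int numIterations = 0;
        do {
            sawMatch = pushFilterAboveJoin(p);
        } while (sawMatch && numIterations++ < 1000);
        return p;
    }
}
```

 Here a single pass moves the Filter past only one Join; the do/while loop 
 keeps re-applying the rule until the plan reaches a fixed point, which is the 
 behavior the proposed optimize() relies on.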
 3) Add three calls to OperatorPlan:
 {code}
 /**
  * Swap two operators in a plan.  Both of the operators must have single
  * inputs and single outputs.
  * @param first operator
  * @param second operator
  * @throws PlanException if either operator is not single input and output.
  */
 public void swap(E first, E second) throws PlanException {
 ...
 }
 /**
  * Push one operator in front of another.  This function is for use when
  * the first operator has multiple inputs.  The caller can specify
  * which input of the first operator the second operator should be pushed to.
  * @param first operator, assumed to have multiple inputs.
  * @param second 
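 For illustration, the swap call above might look like the following on a plan 
 simplified to a linear list.  Pig's OperatorPlan is a graph, so the real 
 implementation also has to rewire predecessor and successor edges; the names 
 here are stand-ins:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for OperatorPlan.swap on a linear plan.  The real
// plan is a DAG, so swapping also means reconnecting edges; exchanging
// list positions is the linear analogue of that operation.
class LinearPlan {
    static <E> void swap(List<E> plan, E first, E second) {
        int i = plan.indexOf(first);
        int j = plan.indexOf(second);
        if (i < 0 || j < 0) {
            throw new IllegalArgumentException("operator not in plan");
        }
        plan.set(i, second);
        plan.set(j, first);
    }
}
```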

[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-06-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Attachment: OptimizerPhase3_part2_2.patch

Attached patch fixes the findbugs warning and cleans up the sources by removing 
commented-out code. The additional 35 compiler warning messages are related to 
type inference. At this point these messages are harmless.


[jira] Commented: (PIG-854) The automated build process should publish the diff in compiler warning messages

2009-06-16 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720384#action_12720384
 ] 

Nigel Daley commented on PIG-854:
-

Get the project javac and javadoc warnings to 0 and then this becomes a 
non-issue.  That's what Hadoop is working on.

 The automated build process should publish the diff in compiler warning 
 messages
 

 Key: PIG-854
 URL: https://issues.apache.org/jira/browse/PIG-854
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.3.0
 Environment: Hudson
Reporter: Santhosh Srinivasan
Assignee: Giridharan Kesavan
 Fix For: 0.3.0


 Currently, the automated build process publishes a report that captures the 
 difference in the number of warning messages between a patch and trunk. 
 However, the details of the new warning messages are not listed. For 
 findbugs, a URL that contains the details is published. A similar page is 
 required for the compiler warning messages.
 For reference, check out 
 https://issues.apache.org/jira/browse/PIG-697?focusedCommentId=12720326page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12720326
 The console output has the following line. Note that the details are stored 
 on the build machine. It's not exported as a web page for users to figure out 
 the details; at least, I was not able to do so. It would be extremely 
 helpful (I would say mandatory) to expose the details. If these details are 
 already present then please point me to the right location.
 {code}
 [exec] /home/hudson/tools/ant/latest/bin/ant -Djavac.args="-Xlint -Xmaxwarns 
 1000" -Declipse.home=/home/nigel/tools/eclipse/latest 
 -Djava5.home=/home/hudson/tools/java/latest1.5 
 -Dforrest.home=/home/nigel/tools/forrest/latest -DPigPatchProcess= clean tar 
 > 
 /home/hudson/hudson-slave/workspace/Pig-Patch-minerva.apache.org/patchprocess/patchJavacWarnings.txt
  2>&1
  [exec] There appear to be 224 javac compiler warnings before the patch 
 and 259 javac compiler warnings after applying the patch.
 {code}
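 What the requested report needs amounts to diffing the two warning logs 
 rather than only counting lines.  A rough sketch of that comparison; the 
 one-warning-per-line log format and class name are assumptions, not part of 
 the build scripts:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Rough sketch: list the warning lines present after the patch but not
// before, instead of reporting only the before/after counts.
class WarningDiff {
    static Set<String> newWarnings(List<String> before, List<String> after) {
        Set<String> diff = new LinkedHashSet<>(after);
        diff.removeAll(before);
        return diff;
    }
}
```

 Publishing the result of such a diff alongside the counts would answer the 
 question raised in this issue.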

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-850) Dump produce wrong result while store into is ok

2009-06-16 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720402#action_12720402
 ] 

Olga Natkovich commented on PIG-850:


+1; looks good. Please commit once our QA is done.

 Dump produce wrong result while store into is ok
 --

 Key: PIG-850
 URL: https://issues.apache.org/jira/browse/PIG-850
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.3.0

 Attachments: PIG-850.patch


 The following script wrongly produces 20 output rows; however, if we change 
 dump to store into, the result is correct. Not sure if the problem is only 
 for the limited sort case.
 A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
 B = order A by gpa parallel 2;
 C = limit B 10;
 dump C;
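 The 20 rows are consistent with one hypothesis: the limit is applied inside 
 each of the 2 reducers and the per-reducer results are concatenated without a 
 final top-10 pass.  A toy illustration of that hypothesis, not Pig's actual 
 execution path:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the hypothesized bug: limit 10 applied inside each
// of 2 reducers, with no final pass, yields 2 * 10 = 20 rows.
class PerReducerLimit {
    static List<Integer> runLimitPerReducer(List<List<Integer>> reducerInputs, int limit) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> partition : reducerInputs) {
            // Each reducer emits at most 'limit' rows of its own partition...
            out.addAll(partition.subList(0, Math.min(limit, partition.size())));
            // ...and nothing re-applies the limit to the combined output.
        }
        return out;
    }
}
```

 With parallel 2 and limit 10, each reducer emits 10 rows, giving 20 in total.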




[jira] Updated: (PIG-850) Dump produce wrong result while store into is ok

2009-06-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-850:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch submitted




[jira] Commented: (PIG-697) Proposed improvements to pig's optimizer

2009-06-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12720465#action_12720465
 ] 

Hadoop QA commented on PIG-697:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12410859/OptimizerPhase3_part2_2.patch
  against trunk revision 785450.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 259 javac compiler warnings (more 
than the trunk's current 224 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/89/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/89/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/89/console

This message is automatically generated.
