[jira] Updated: (PIG-893) support cast of chararray to other simple types

2009-08-10 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-893:
---

Status: Patch Available  (was: Open)

 support cast of chararray to other simple types
 ---

 Key: PIG-893
 URL: https://issues.apache.org/jira/browse/PIG-893
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Jeff Zhang
 Fix For: 0.4.0

 Attachments: Pig_893.Patch


 Pig should support casting of chararray to 
 integer, long, float, double, and bytearray. If the conversion fails for reasons 
 such as overflow, the cast should return null and log a warning.
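
A minimal sketch of the proposed semantics (hypothetical code, not Pig's actual implementation; `CharArrayCast` is an illustrative name): parse the chararray and return null on failure rather than throwing.

```java
// Hypothetical sketch of the PIG-893 semantics; CharArrayCast is an
// illustrative stand-in, not a real Pig class.
public class CharArrayCast {

    // Returns the parsed Integer, or null when the chararray is not a
    // valid int (malformed text, overflow) -- the null-plus-warning
    // behavior the issue asks for. Logging is elided here.
    public static Integer castToInteger(String chararray) {
        if (chararray == null) {
            return null;
        }
        try {
            return Integer.valueOf(chararray.trim());
        } catch (NumberFormatException e) {
            return null; // in Pig this would also log a warning
        }
    }
}
```

The same try/parse/return-null shape would apply to the long, float, and double variants.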

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-trunk #518

2009-08-10 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/518/

--
[...truncated 86842 lines...]
[junit] 09/08/10 14:42:20 INFO mapred.JobInProgress: Task 
'attempt_200908101441_0001_r_00_0' has completed 
task_200908101441_0001_r_00 successfully.
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_-6264389947015309622 is added to invalidSet of 127.0.0.1:41255
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_-6264389947015309622 is added to invalidSet of 127.0.0.1:38778
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_-6264389947015309622 is added to invalidSet of 127.0.0.1:57995
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_5765850477736280749 is added to invalidSet of 127.0.0.1:57995
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_5765850477736280749 is added to invalidSet of 127.0.0.1:38778
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_5765850477736280749 is added to invalidSet of 127.0.0.1:41255
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_7559869483135611448 is added to invalidSet of 127.0.0.1:41255
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_7559869483135611448 is added to invalidSet of 127.0.0.1:56802
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* NameSystem.delete: 
blk_7559869483135611448 is added to invalidSet of 127.0.0.1:57995
[junit] 09/08/10 14:42:20 INFO mapred.JobInProgress: Job 
job_200908101441_0001 has completed successfully.
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/temp-1684136488/tmp1187787419/_logs/history/localhost_1249915296024_job_200908101441_0001_hudson_Job8913032530943062158.jar.
 blk_9039513662785461979_1009
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: Receiving block 
blk_9039513662785461979_1009 src: /127.0.0.1:33493 dest: /127.0.0.1:57995
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: Receiving block 
blk_9039513662785461979_1009 src: /127.0.0.1:35971 dest: /127.0.0.1:41255
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: Receiving block 
blk_9039513662785461979_1009 src: /127.0.0.1:44847 dest: /127.0.0.1:56802
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: Received block 
blk_9039513662785461979_1009 of size 5095 from /127.0.0.1
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56802 is added to 
blk_9039513662785461979_1009 size 5095
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: PacketResponder 0 for block 
blk_9039513662785461979_1009 terminating
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: Received block 
blk_9039513662785461979_1009 of size 5095 from /127.0.0.1
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: PacketResponder 1 for block 
blk_9039513662785461979_1009 terminating
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41255 is added to 
blk_9039513662785461979_1009 size 5095
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: Received block 
blk_9039513662785461979_1009 of size 5095 from /127.0.0.1
[junit] 09/08/10 14:42:20 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:57995 is added to 
blk_9039513662785461979_1009 size 5095
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: PacketResponder 2 for block 
blk_9039513662785461979_1009 terminating
[junit] 09/08/10 14:42:20 INFO mapReduceLayer.MapReduceLauncher: 100% 
complete
[junit] 09/08/10 14:42:20 INFO mapReduceLayer.MapReduceLauncher: 
Successfully stored result in: 
hdfs://localhost:48231/tmp/temp-1684136488/tmp1187787419
[junit] 09/08/10 14:42:20 INFO mapReduceLayer.MapReduceLauncher: Records 
written : 1
[junit] 09/08/10 14:42:20 INFO mapReduceLayer.MapReduceLauncher: Bytes 
written : 107
[junit] 09/08/10 14:42:20 INFO mapReduceLayer.MapReduceLauncher: Success!
[junit] 09/08/10 14:42:20 INFO dfs.DataNode: 
DatanodeRegistration(127.0.0.1:56802, 
storageID=DS-641934704-67.195.138.8-56802-1249915294946, infoPort=57860, 
ipcPort=35751) Served block blk_-8213206887175561647_1009 to /127.0.0.1
[junit] 09/08/10 14:42:20 INFO executionengine.HExecutionEngine: Connecting 
to hadoop file system at: file:///
[junit] 09/08/10 14:42:20 INFO jvm.JvmMetrics: Cannot initialize JVM 
Metrics with processName=JobTracker, sessionId= - already initialized
[junit] 09/08/10 14:42:21 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input1.txt. blk_-7512886654948818052_1010
[junit] 09/08/10 14:42:21 INFO dfs.DataNode: Receiving block 
blk_-7512886654948818052_1010 src: /127.0.0.1:35974 dest: /127.0.0.1:41255
[junit] 09/08/10 14:42:21 INFO dfs.DataNode: Receiving block 

Build failed in Hudson: Pig-Patch-minerva.apache.org #155

2009-08-10 Thread Apache Hudson Server
See 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/155/changes

Changes:

[daijy] PIG-905: TOKENIZE throws exception on null data

[daijy] PIG-697: Proposed improvements to pig's optimizer, Phase5

--
[...truncated 103185 lines...]
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: PacketResponder 1 
for block blk_-6144671934640457499_1010 terminating
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Received block 
blk_-6144671934640457499_1010 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41819 is added to 
blk_-6144671934640457499_1010 size 6
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: PacketResponder 2 
for block blk_-6144671934640457499_1010 terminating
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: /user/hudson/input2.txt. blk_7692080556445682047_1011
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Receiving block 
blk_7692080556445682047_1011 src: /127.0.0.1:33904 dest: /127.0.0.1:41819
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Receiving block 
blk_7692080556445682047_1011 src: /127.0.0.1:44683 dest: /127.0.0.1:44402
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Receiving block 
blk_7692080556445682047_1011 src: /127.0.0.1:59965 dest: /127.0.0.1:40083
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Received block 
blk_7692080556445682047_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: PacketResponder 0 
for block blk_7692080556445682047_1011 terminating
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40083 is added to 
blk_7692080556445682047_1011 size 6
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Received block 
blk_7692080556445682047_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:44402 is added to 
blk_7692080556445682047_1011 size 6
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: PacketResponder 1 
for block blk_7692080556445682047_1011 terminating
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: Received block 
blk_7692080556445682047_1011 of size 6 from /127.0.0.1
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:41819 is added to 
blk_7692080556445682047_1011 size 6
 [exec] [junit] 09/08/10 19:12:18 INFO dfs.DataNode: PacketResponder 2 
for block blk_7692080556445682047_1011 terminating
 [exec] [junit] 09/08/10 19:12:18 INFO 
executionengine.HExecutionEngine: Connecting to hadoop file system at: 
hdfs://localhost:40520
 [exec] [junit] 09/08/10 19:12:18 INFO 
executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: 
localhost:51530
 [exec] [junit] 09/08/10 19:12:18 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
 [exec] [junit] 09/08/10 19:12:18 INFO 
mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
 [exec] [junit] 09/08/10 19:12:19 INFO 
mapReduceLayer.JobControlCompiler: Setting up single store job
 [exec] [junit] 09/08/10 19:12:19 WARN mapred.JobClient: Use 
GenericOptionsParser for parsing the arguments. Applications should implement 
Tool for the same.
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.StateChange: BLOCK* 
NameSystem.allocateBlock: 
/tmp/hadoop-hudson/mapred/system/job_200908101911_0002/job.jar. 
blk_-3237392627602153862_1012
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: Receiving block 
blk_-3237392627602153862_1012 src: /127.0.0.1:33907 dest: /127.0.0.1:41819
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: Receiving block 
blk_-3237392627602153862_1012 src: /127.0.0.1:44686 dest: /127.0.0.1:44402
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: Receiving block 
blk_-3237392627602153862_1012 src: /127.0.0.1:59968 dest: /127.0.0.1:40083
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: Received block 
blk_-3237392627602153862_1012 of size 1478322 from /127.0.0.1
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: PacketResponder 0 
for block blk_-3237392627602153862_1012 terminating
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: Received block 
blk_-3237392627602153862_1012 of size 1478322 from /127.0.0.1
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40083 is added to 
blk_-3237392627602153862_1012 size 1478322
 [exec] [junit] 09/08/10 19:12:19 INFO dfs.DataNode: PacketResponder 1 
for block blk_-3237392627602153862_1012 

[jira] Commented: (PIG-893) support cast of chararray to other simple types

2009-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741500#action_12741500
 ] 

Hadoop QA commented on PIG-893:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415997/Pig_893.Patch
  against trunk revision 801865.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 4 new Findbugs warnings.

-1 release audit.  The applied patch generated 161 release audit warnings 
(more than the trunk's current 160 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/155/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/155/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/155/console

This message is automatically generated.




[jira] Commented: (PIG-893) support cast of chararray to other simple types

2009-08-10 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741521#action_12741521
 ] 

Olga Natkovich commented on PIG-893:


The reason for the release audit issue is that one of the new files is missing 
the Apache license header. Not sure what the issue with FindBugs is. These 
issues need to be resolved, or at least investigated, before the patch can be 
committed.




[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741589#action_12741589
 ] 

Dmitriy V. Ryaboy commented on PIG-845:
---

Some comments below.
It's a big patch, so a lot of comments...

1. 
EndOfAllInput flags -- could you add comments here about what the point of this 
flag is? You explain what EndOfAllInputSetter does (which is actually rather 
self-explanatory) but not what the meaning of the flag is and how it's used. 
There is a bit of an explanation in PigMapBase, but it really belongs here.

2.
Could you explain the relationship between EndOfAllInput and (deleted) POStream?

3.
Comments in MRCompiler alternate between referring to the left MROp as 
LeftMROper and curMROper. Choose one.

4.
I am curious about the decision to throw compiler exceptions if MergeJoin 
requirements (number of inputs, etc.) aren't satisfied. It seems like a better 
user experience would be to log a warning and fall back to a regular join.

5.
Style notes for visitMergeJoin: 

It's a 200-line method. Any way you can break it up into smaller components? As 
is, it's hard to follow.

The if statements should be broken up into multiple lines to agree with the 
style guides.

Variable naming: you've got topPrj, prj, pkg, lr, ce, nig.. one at a time they 
are fine, but together in a 200-line method they are unreadable. Please 
consider more descriptive names.

6.
Kind of a global comment, since it applies to more than just MergeJoin:

It seems to me like we need a Builder for operators to clean up some of the 
new, set, set, set stuff.

Having the setters return this and a Plan's add() method return the plan, would 
let us replace this:

POProject topPrj = new POProject(new OperatorKey(scope, nig.getNextNodeId(scope)));
topPrj.setColumn(1);
topPrj.setResultType(DataType.TUPLE);
topPrj.setOverloaded(true);
rightMROpr.reducePlan.add(topPrj);
rightMROpr.reducePlan.connect(pkg, topPrj);

with this:

POProject topPrj = new POProject(new OperatorKey(scope, nig.getNextNodeId(scope)))
    .setColumn(1)
    .setResultType(DataType.TUPLE)
    .setOverloaded(true);

rightMROpr.reducePlan.add(topPrj).connect(pkg, topPrj);
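
To illustrate the fluent-setter idea, a minimal sketch (the class and its fields are hypothetical stand-ins, not Pig's real POProject): each setter returns `this`, so construction and configuration collapse into one chained expression.

```java
// Illustrative stand-in, not Pig's POProject: setters return `this`,
// which is what enables the chaining suggested above.
public class FluentOperator {
    private int column;
    private boolean overloaded;

    public FluentOperator setColumn(int column) {
        this.column = column;
        return this; // returning `this` enables chaining
    }

    public FluentOperator setOverloaded(boolean overloaded) {
        this.overloaded = overloaded;
        return this;
    }

    public int getColumn() { return column; }
    public boolean isOverloaded() { return overloaded; }
}
```

With setters shaped like this, `new FluentOperator().setColumn(1).setOverloaded(true)` configures the operator in a single expression.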


7.
Is the change to List<List<Byte>> keyTypes in POFRJoin related to MergeJoin or 
just rolled in?

8. MergeJoin

break getNext() into components.

I don't see you supporting left outer joins. Plans for that? At least document 
the planned approach.

Error codes being declared deep inside classes, and documented on the wiki, is 
a poor practice, imo. They should be pulled out into PigErrors (as lightweight 
final objects that have an error code, a name, and a description). I thought 
Santhosh made progress on this already, no?

Could you explain the problem with splits and streams? Why can't this work for 
them?


9. Sampler/Indexer:
9a
Looks like you create the same number of map tasks for this as you do for a 
join; all a sampling map task does is read one record and emit a single tuple.  
That seems wasteful; there is a lot of overhead in setting up these tiny jobs, 
which might get stuck behind other jobs running on the cluster, etc. If the 
underlying file has syncpoints, a smaller number of MR tasks can be created. If 
we know the ratio of sample tasks to full tasks, we can figure out how many 
records we should emit per job ( ceil(full_tasks/sample_tasks) ).  We can 
approximately achieve this by seeking through (end-offset)/num_to_emit and 
doing a sync() after that seek. It's approximate, but close enough for an index.
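
The arithmetic in 9a can be sketched as follows (hypothetical helpers, assuming the split covers the byte range [offset, end) and that sync() can realign the reader to the next record boundary; this is not part of Pig's actual sampler):

```java
// Hypothetical helpers mirroring the arithmetic in comment 9a.
public class SampleStride {

    // ceil(fullTasks / sampleTasks): how many records each sample task
    // should emit so fewer tasks still cover the whole input.
    public static long recordsPerSampleTask(long fullTasks, long sampleTasks) {
        return (fullTasks + sampleTasks - 1) / sampleTasks;
    }

    // Byte stride between successive seek-then-sync() points within the
    // split [offset, end); approximate, but close enough for an index.
    public static long seekStride(long end, long offset, long numToEmit) {
        return (end - offset) / numToEmit;
    }
}
```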

9b
Consider renaming to something like SortedFileIndexer, since it's coneivable 
that this component can be reused in a context other than a Merge Join.

10.
Would it make sense to expose this to the users via a 'CREATE INDEX' (or 
similar) command?
That way the index could be persisted, and the user could tell you to use an 
existing index instead of rescanning the data.

11.
I am not sure about the approach of pushing sampling above filters. Have you 
guys benchmarked this? Seems like you'd wind up reading the whole file in the 
sample job if the filter is selective enough (and high filter selectivity would 
also make materialize-sample go much faster).

Testing: 
12a
You should test for refusal to do a 3-way join and other error conditions (or a 
warning and successful failover to regular join -- my preference)

12b
You should do a proper unit test for the MergeJoinIndexer (or whatever we are 
calling it).



 PERFORMANCE: Merge Join
 ---

 Key: PIG-845
 URL: https://issues.apache.org/jira/browse/PIG-845
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Ashutosh Chauhan
 Attachments: merge-join-1.patch, merge-join-for-review.patch


 This join would work if the data for both tables is sorted on the join key.

-- 
This message is 

[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-10 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741621#action_12741621
 ] 

Pradeep Kamath commented on PIG-845:


Review comments:
1) In LogicalPlanTester.java, why is the following change required?
{noformat}
@@ -198,7 +198,7 @@
 private LogicalPlan buildPlan(String query, ClassLoader cldr) {
 
 LogicalPlanBuilder.classloader = 
LogicalPlanTester.class.getClassLoader() ;
-PigContext pigContext = new PigContext(ExecType.LOCAL, new 
Properties());
+PigContext pigContext = new PigContext(ExecType.MAPREDUCE, new 
Properties());
 try {
 pigContext.connect();
 } catch (ExecException e1) {
{noformat}

Typically when PigContext is constructed in Map-reduce mode, the properties 
should correspond to the cluster configuration. So the above initialization 
seems odd because the Properties object is an empty object in the constructor 
call above.

2) In PigMapBase.java:

public static final String END_OF_INP_IN_MAP = "pig.stream.in.map";

can change to

public static final String END_OF_INP_IN_MAP = "pig.blocking.operator.in.map";

and this should be put as a public static member of JobControlCompiler.

In JobControlCompiler.java,

jobConf.set("pig.stream.in.map", "true"); should change to use the above 
public static String.


3) Remove the following comment in QueryParser.jjt (line 302):
{code}
* Join parser. Currently can only handle skewed joins.
{code}

4) In QueryParser.jjt the joinPlans passed to the LOJoin constructor is not a 
LinkedMultiMap, but in LogToPhyTranslationVisitor the join plans are put in a 
LinkedMultiMap. If order is important, shouldn't QueryParser.jjt also change?

5) Some comments in LogToPhyTranslationVisitor about the different lists and 
maps would help :)

6) In validateMergeJoin() - the code only considers direct successors and 
predecessors of LOJoin. It should check the entire plan and ensure that 
predecessors of LOJoin all the way to the LOLoad are only LOForEach and 
LOFilter. Strictly we should not allow LOForEach since it could change sort 
order or position of join keys and hence invalidate the index - but we need it 
so that the Foreach introduced by the TypeCastInserter (when there is a schema 
for either of the inputs) remains. You should note in the documentation that 
only order- and join-key-position-preserving Foreachs and Filters are allowed 
as predecessors to merge join, and check the same in validateMergeJoin() - it 
is better to use a whitelist of allowed operators than a blacklist of 
disallowed ones (since the blacklist would need to be updated any time a new 
operator comes along). The exception source here is not really a bug but a 
user input error, since merge join really does not support other ops.

Again for the successors: all successors from merge join down to the map leaf 
should be checked to ensure stream is absent (really there should be no 
restriction on stream being present after the join - if there is an issue 
currently with this, it is fine to not allow stream, but eventually it would be 
good to not have any restriction on what follows the merge join). You can just 
use a visitor to check for the presence of stream in the plan - this should be 
done after complete LogToPhyTranslation is done, in visit(), so that the whole 
plan can be looked at.
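
The whitelist check described in comment 6 could look roughly like this (a hypothetical sketch; operator names are strings here, whereas Pig's real validateMergeJoin() would walk operator objects):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the whitelist validation; MergeJoinValidator
// is an illustrative name, not a real Pig class.
public class MergeJoinValidator {

    // Whitelist of operators allowed between the join and the load; a
    // blacklist would need updating every time a new operator is added.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("LOForEach", "LOFilter", "LOLoad"));

    public static boolean predecessorsValid(List<String> predecessorOps) {
        for (String op : predecessorOps) {
            if (!ALLOWED.contains(op)) {
                return false; // user input error, not a bug
            }
        }
        return true;
    }
}
```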

7) Is MRStreamHandler.java now replaced by 
/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
 ?

8) Some of the MRCompilerExceptions do not follow the error handling spec - 
errCode, errMsg, src

9) Should assert() statements in MRCompiler be replaced with exceptions, since 
assertions are disabled by default in Java?

10) In MRCompiler.java I wonder if you should change
{code}
rightMapPlan.disconnect(rightLoader, loadSucc);
rightMapPlan.remove(loadSucc);
{code}
to
{code}
rightMapPlan.trimBelow(rightLoader);
{code}
We really want to remove all operators in rightMapPlan other than the loader.

11) We should note in documentation that merge join only works for data sorted 
in ascending order. (the MRCompiler code assumes this - we should have sort 
check if possible - see performance comment below)

12) It would be good to add a couple of unit tests with a few operators after 
merge join to ensure merge join operates well with successors in the plan.

13) In POMergeJoin.java, comments about foreach should be cleaned up since 
foreach is no longer used. For example:
{code}
//variable which denotes whether we are returning tuples from the foreach 
operator
{code}

The following code can be factored out into a function since its repeated twice:
{code}
   case POStatus.STATUS_EOP:  // Current file has ended. 
Need to open next file by reading next index entry.
String prevFile =