[jira] Created: (PIG-926) Merge-Join phase 2

2009-08-16 Thread Ashutosh Chauhan (JIRA)
Merge-Join phase 2
--

                 Key: PIG-926
                 URL: https://issues.apache.org/jira/browse/PIG-926
             Project: Pig
          Issue Type: Improvement
          Components: impl
            Reporter: Ashutosh Chauhan
            Assignee: Ashutosh Chauhan
            Priority: Minor


This JIRA tracks the phase-2 work for merge join. Phase 1 has several 
limitations, which are listed at http://wiki.apache.org/pig/PigMergeJoin 
They will be addressed here.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-926) Merge-Join phase 2

2009-08-16 Thread Ashutosh Chauhan (JIRA)

 [ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-926:
-

Attachment: mj_phase2_1.patch

The attached first patch runs the full right-side pipeline in the indexer 
before sampling the tuple from the block. This has the following advantages:
a) It addresses the concern Pradeep raised in phase 1: strictly, we should not 
allow LOForeach, since it could change the sort order or the position of the 
join keys and hence invalidate the index. We need it, however, so that the 
Foreach introduced by the TypeCastInserter (when either input has a schema) 
remains. Because the pipeline now runs before the tuple is sampled, this 
becomes a non-issue.
b) Currently, type information does not reach the POSort that sorts the index 
entries in the reduce task of the index job. This happens to work for other 
reasons, but this patch fixes it.
c) It improves performance. Instead of always sampling the first record of the 
block, the index now contains an entry for the first record in the block for 
which a join may happen. This saves the time spent fetching right-side tuples 
over the network that could not be joined in any case.
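
The idea in (a) to (c) can be sketched in a few lines of Python. This is only 
an illustration, not the code in mj_phase2_1.patch: the names build_index, 
start_block and pipeline are hypothetical, blocks are modelled as in-memory 
lists, and the pipeline is assumed to cast the key to int and to drop tuples 
that cannot take part in the join.

```python
from bisect import bisect_left


def build_index(blocks, pipeline):
    """Build a sparse index with at most one (key, block_no) entry per block.

    blocks   -- list of blocks; each block is a list of raw (key, value)
                tuples, and the input is assumed to be sorted by key.
    pipeline -- hypothetical stand-in for the right-side pipeline: returns
                the processed tuple, or None if the tuple is dropped.
    """
    index = []
    for block_no, block in enumerate(blocks):
        for raw in block:
            out = pipeline(raw)
            if out is not None:
                # First tuple of this block that survives the pipeline.
                index.append((out[0], block_no))
                break
    # Keys are already typed here, so the sort compares ints, not strings.
    index.sort()
    return index


def start_block(index, left_key):
    """Return the block at which to start reading for left_key, or None.

    Conservative choice: the last indexed block whose entry key is strictly
    smaller than left_key, because equal keys may begin inside that block.
    """
    if not index:
        return None
    keys = [key for key, _ in index]
    pos = bisect_left(keys, left_key) - 1
    return index[max(pos, 0)][1]


def pipeline(raw):
    """Assumed pipeline: cast the key to int, drop tuples with empty value."""
    key, value = raw
    return (int(key), value) if value else None


blocks = [
    [("1", ""), ("2", ""), ("3", "a")],  # first surviving key is 3
    [("4", ""), ("5", "")],              # nothing survives the pipeline
    [("6", "b"), ("7", "c")],            # first surviving key is 6
]
index = build_index(blocks, pipeline)
```

In this toy input the second block yields no index entry at all, so a lookup 
never selects it as a starting point, and the entry for the first block is 
keyed on the first tuple that survives the pipeline rather than on the raw 
first record.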

