[jira] Updated: (PIG-930) merge join should handle compressed bz2 sorted files

2010-07-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-930:
---

Fix Version/s: (was: 0.8.0)

Unlinking from the release. We have not really seen users ask for this.

> merge join should handle compressed bz2 sorted files
> 
>
> Key: PIG-930
> URL: https://issues.apache.org/jira/browse/PIG-930
> Project: Pig
>  Issue Type: Bug
> Reporter: Pradeep Kamath
>
> There are two issues. First, POLoad, which is used to read the right-side 
> input, does not handle bz2 files right now. This needs to be fixed.
> Second, in the index map job we bindTo(startOfBlockOffSet) (this will 
> internally discard the first tuple if offset > 0). Then we do the following:
> {noformat}
> while (no tuple has survived the pipeline yet) {
>   pos = getPosition()
>   getNext()
>   run the tuple through the right-side pipeline, which could have a filter
> }
> emit(key, pos, filename)
> {noformat}
>  
> Then in the map job which does the join, we bindTo(pos > 0 ? pos - 1 : pos) 
> (we use pos - 1 because bindTo will discard the first tuple for pos > 0). Then 
> we do getNext().
> Now in bz2 compressed files, getPosition() returns a position which is not 
> really accurate. The problem is that it could be a position in the middle of a 
> compressed bz2 block. Then when we use that position to bindTo() in the final 
> map job, the code would first hunt for a bz2 block header, thus skipping the 
> whole current bz2 block.
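
The position handling described in the issue can be illustrated with a small Python sketch. This is not Pig's code: `LineLoader`, `build_index_entry`, `first_record_at`, and `block_reached_after_seek` are hypothetical names, and the loader is a toy newline-delimited reader standing in for POLoad. The sketch reads the index loop as "keep reading until one tuple survives the pipeline". The bz2 part does not decompress anything; it only models a reader that jumps to the next block header at or after the requested offset, with made-up header offsets.

```python
import bisect
import io


class LineLoader:
    """Toy newline-delimited loader (hypothetical, not Pig's POLoad)."""

    def __init__(self, data: bytes):
        self._stream = io.BytesIO(data)

    def bind_to(self, offset: int) -> None:
        self._stream.seek(offset)
        if offset > 0:
            # Discard the (possibly partial) first record, as the issue describes.
            self._stream.readline()

    def get_position(self) -> int:
        return self._stream.tell()

    def get_next(self):
        line = self._stream.readline()
        return line.rstrip(b"\n").decode() if line else None


def build_index_entry(data, block_start, survives):
    """Index job: return (record, pos) for the first record passing the filter."""
    loader = LineLoader(data)
    loader.bind_to(block_start)
    while True:
        pos = loader.get_position()  # offset where the next record starts
        record = loader.get_next()
        if record is None:
            return None  # nothing in this split survived
        if survives(record):
            return record, pos


def first_record_at(data, pos):
    """Join job: bind to pos - 1 so the discard step lands exactly on pos."""
    loader = LineLoader(data)
    loader.bind_to(pos - 1 if pos > 0 else pos)
    return loader.get_next()


def block_reached_after_seek(header_offsets, pos):
    """Toy bz2 model: index of the first block header at or after pos."""
    return bisect.bisect_left(header_offsets, pos)


# Uncompressed case: the recorded position round-trips to the same record.
data = b"1,a\n2,b\n3,c\n4,d\n"
entry = build_index_entry(data, block_start=5,
                          survives=lambda r: not r.startswith("3"))
record, pos = entry
same_record = first_record_at(data, pos)

# Block-compressed case: a position inside block 0 resumes at block 1,
# so the remaining records of block 0 are never read.
headers = [0, 100, 200]  # made-up compressed offsets of block headers
resumed_block = block_reached_after_seek(headers, 57 - 1)
```

In the uncompressed case the byte at `pos - 1` is the newline that ends the previous record, so discarding "the first tuple" consumes only that newline. The trick depends on positions being exact record boundaries, which is the property the issue says bz2 positions lack.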

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-930) merge join should handle compressed bz2 sorted files

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-930:
---

Fix Version/s: 0.8.0

This is likely no longer an issue in 0.7.0. Need to verify and add unit tests.

> merge join should handle compressed bz2 sorted files
> 
>
> Key: PIG-930
> URL: https://issues.apache.org/jira/browse/PIG-930
> Project: Pig
>  Issue Type: Bug
> Reporter: Pradeep Kamath
> Fix For: 0.8.0
>
>
> There are two issues. First, POLoad, which is used to read the right-side 
> input, does not handle bz2 files right now. This needs to be fixed.
> Second, in the index map job we bindTo(startOfBlockOffSet) (this will 
> internally discard the first tuple if offset > 0). Then we do the following:
> {noformat}
> while (no tuple has survived the pipeline yet) {
>   pos = getPosition()
>   getNext()
>   run the tuple through the right-side pipeline, which could have a filter
> }
> emit(key, pos, filename)
> {noformat}
>  
> Then in the map job which does the join, we bindTo(pos > 0 ? pos - 1 : pos) 
> (we use pos - 1 because bindTo will discard the first tuple for pos > 0). Then 
> we do getNext().
> Now in bz2 compressed files, getPosition() returns a position which is not 
> really accurate. The problem is that it could be a position in the middle of a 
> compressed bz2 block. Then when we use that position to bindTo() in the final 
> map job, the code would first hunt for a bz2 block header, thus skipping the 
> whole current bz2 block.
