[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
-------------------------------

    Attachment: PIG-953-5.patch

Zebra needs a global commit method to be able to build an index on the sorted
Zebra file. Attaching a new patch which introduces a CommittableStoreFunc
interface that extends StoreFunc and adds a commit() method. The Zebra store
function will implement this interface, and Pig will call commit() on the
CommittableStoreFunc when the job completes. This is not ideal: we could add
commit() to StoreFunc itself, but that would break existing store functions.
Also, if the changes in http://wiki.apache.org/pig/LoadStoreRedesignProposal
are implemented soon, this would change anyway. So this new interface is being
introduced to avoid breaking existing store functions until we move to the
interface changes recommended in the wiki.
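
A minimal Java sketch of the idea, for illustration only. The StoreFunc below
is a heavily simplified, hypothetical stand-in (the real Pig StoreFunc has a
different set of methods), and the no-argument commit() signature and the
JobCompletion.onJobSuccess() hook are assumptions, not the code in
PIG-953-5.patch. It shows why a separate sub-interface keeps existing store
functions working: PlainStore is unchanged, and commit() is only called on
stores that opt in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified stand-in for Pig's StoreFunc.
interface StoreFunc {
    void putNext(String record);   // called once per output record
    void finish();                 // called when a task finishes writing
}

// Opt-in sub-interface: existing StoreFunc implementations are untouched.
interface CommittableStoreFunc extends StoreFunc {
    // Called once after the whole job completes, e.g. to build a global index.
    void commit();
}

// Hypothetical Zebra-like store that builds its index only at commit time.
class IndexingStore implements CommittableStoreFunc {
    final List<String> rows = new ArrayList<>();
    boolean indexBuilt = false;
    public void putNext(String record) { rows.add(record); }
    public void finish() { }
    public void commit() { indexBuilt = true; }
}

// An existing store function that knows nothing about commit().
class PlainStore implements StoreFunc {
    final List<String> rows = new ArrayList<>();
    public void putNext(String record) { rows.add(record); }
    public void finish() { }
}

class JobCompletion {
    // Sketch of a job-completion hook: commit() is invoked only on stores
    // that implement CommittableStoreFunc. Returns how many were committed.
    static int onJobSuccess(List<? extends StoreFunc> stores) {
        int committed = 0;
        for (StoreFunc s : stores) {
            if (s instanceof CommittableStoreFunc) {
                ((CommittableStoreFunc) s).commit();
                committed++;
            }
        }
        return committed;
    }
}
```

Adding an abstract commit() to StoreFunc itself would instead force every
existing implementation, such as PlainStore above, to define the method.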

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
> PIG-953-5.patch, PIG-953.patch
>
>
> Currently, the merge join implementation in Pig constructs an index on the
> sorted data and uses that index to seek into the "right input" to perform
> the join efficiently. Some loaders (notably the Zebra loader) internally
> implement an index on sorted data and can perform this seek efficiently
> using their own index. So the use of the index needs to be abstracted in
> such a way that when the loader supports indexing, Pig uses it (indirectly,
> through the loader) and does not construct an index.
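
A minimal sketch of the abstraction described above, assuming a hypothetical
IndexableLoader interface with a seekNear() method. The names, the int keys,
and the fixed-size block layout are illustrative only, not Pig's or Zebra's
actual API. When the right input's loader implements the interface, the merge
join delegates the seek to it; otherwise it falls back to a Pig-built index of
the first key in each block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal loader over a sorted input split into fixed-size blocks.
interface Loader {
    void seekToBlock(int block);   // position at the start of a block
    Integer next();                // next key, or null at end of input
    int numBlocks();
    int firstKeyOf(int block);     // used only when Pig builds its own index
}

// Hypothetical opt-in interface: the loader seeks using its own index.
interface IndexableLoader extends Loader {
    // Position at or before the first record whose key >= target.
    void seekNear(int target);
}

class BlockLoader implements Loader {
    final int[] keys;
    final int blockSize;
    int pos = 0;
    BlockLoader(int[] keys, int blockSize) { this.keys = keys; this.blockSize = blockSize; }
    public void seekToBlock(int block) { pos = block * blockSize; }
    public Integer next() { return pos < keys.length ? keys[pos++] : null; }
    public int numBlocks() { return (keys.length + blockSize - 1) / blockSize; }
    public int firstKeyOf(int block) { return keys[block * blockSize]; }
}

// Stands in for a Zebra-like loader with an internal index (binary search here).
class SelfIndexedLoader extends BlockLoader implements IndexableLoader {
    SelfIndexedLoader(int[] keys, int blockSize) { super(keys, blockSize); }
    public void seekNear(int target) {
        int lo = 0, hi = keys.length;          // find first position with key >= target
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
    }
}

class MergeJoinSeek {
    static int indexBuilds = 0; // how often the fallback index had to be built

    // Position the right input near target, preferring the loader's own index.
    static void seek(Loader right, int target) {
        if (right instanceof IndexableLoader) {
            ((IndexableLoader) right).seekNear(target);
            return;
        }
        // Fallback index: first key of each block (rebuilt per call for brevity;
        // a real implementation would build it once). Pick the last block whose
        // first key is strictly less than target, since duplicates may span blocks.
        indexBuilds++;
        List<Integer> index = new ArrayList<>();
        for (int b = 0; b < right.numBlocks(); b++) index.add(right.firstKeyOf(b));
        int block = 0;
        for (int b = 0; b < index.size(); b++) if (index.get(b) < target) block = b;
        right.seekToBlock(block);
    }

    // Seek, then scan forward to the first key >= target (null if none).
    static Integer firstMatchAtOrAfter(Loader right, int target) {
        seek(right, target);
        for (Integer k = right.next(); k != null; k = right.next()) {
            if (k >= target) return k;
        }
        return null;
    }
}
```

Using "strictly less than" for the fallback matters because duplicate keys can
cross a block boundary: with blocks [1,2,2], [3,5,5], [5,8,9], a seek for 5
must start at the second block, not the third.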

-- 
This message is automatically generated by JIRA.