Dmitriy V. Ryaboy commented on PIG-953:

bq. Pradeep: Pig only guarantees order with limit following order - for any 
other relational operator following order there are no guarantees. Today it is 
true that filter or a column pruning foreach would also preserve order but this 
can change if needed in the future. There is explicit code to ensure the order-limit 
combination works by preserving order; there is no such explicit check for 
other operators (keeping it open for change in the future).

That actually tells me that an orderPreserving property on a LogicalOperator is 
a really good idea.
That way we can set it to true on all operators that are at the moment 
order-preserving (limit, filter, column-pruning foreach), and not commit to 
forever maintaining that contract. If filter starts changing order, the patch 
will simply have to include a change to set orderPreserving to false in 
POFilter, and everything will work automagically.
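A minimal sketch of the idea. The class hierarchy and the isOrderPreserving() method are hypothetical stand-ins, not Pig's actual LogicalOperator API. The base class makes no ordering promise, the operators that happen to keep order today opt in, and an optimization that depends on ORDER's output only checks the operators sitting between ORDER and its consumer:

```java
import java.util.List;

/**
 * Hypothetical sketch of an orderPreserving property on logical operators.
 * The LO* names mirror Pig's naming but these are NOT Pig's real classes.
 */
public class OrderPreservingSketch {

    /** Base class: by default an operator makes no ordering promise. */
    abstract static class LogicalOperator {
        boolean isOrderPreserving() {
            return false;
        }
    }

    /** Filter keeps surviving rows in input order today. */
    static class LOFilter extends LogicalOperator {
        @Override
        boolean isOrderPreserving() {
            return true; // a patch that makes filter reorder rows flips this
        }
    }

    static class LOLimit extends LogicalOperator {
        @Override
        boolean isOrderPreserving() {
            return true;
        }
    }

    /** Foreach preserves order only when it merely prunes columns. */
    static class LOForEach extends LogicalOperator {
        private final boolean columnPruningOnly;

        LOForEach(boolean columnPruningOnly) {
            this.columnPruningOnly = columnPruningOnly;
        }

        @Override
        boolean isOrderPreserving() {
            return columnPruningOnly;
        }
    }

    /** Any other operator (e.g. a join) inherits the default: false. */
    static class LOJoin extends LogicalOperator {}

    /** True if ORDER's sort order survives every operator after it. */
    static boolean orderSurvives(List<LogicalOperator> afterOrder) {
        for (LogicalOperator op : afterOrder) {
            if (!op.isOrderPreserving()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(orderSurvives(List.of(new LOFilter(), new LOLimit())));       // true
        System.out.println(orderSurvives(List.of(new LOForEach(false), new LOLimit()))); // false
    }
}
```

The point of the design is that no planner code hard-codes "filter keeps order"; the promise lives on the operator itself, so changing it is a one-line override.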

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> ---------------------------------------------------------------------------------------------------------
>                 Key: PIG-953
>                 URL: https://issues.apache.org/jira/browse/PIG-953
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: PIG-953.patch
> Currently the merge join implementation in pig includes construction of an index 
> on sorted data and use of that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, pig uses it 
> (indirectly through the loader) and does not construct an index. 
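A rough sketch of the abstraction described in the issue, under stated assumptions: SortedLoader, hasInternalIndex() and seekToKey() are hypothetical names, not Pig's loader API, and a binary search stands in for a loader's own index (such as zebra's). When the loader reports an index, merge join delegates the seek to it; otherwise it builds its own index from the first key of every block and scans forward from the nearest block:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names are Pig's actual API. */
public class MergeJoinSeekSketch {

    /** Sorted right-hand input of the merge join (keys in storage order). */
    interface SortedLoader {
        List<Integer> keys();

        /** Loaders that maintain their own index return true. */
        default boolean hasInternalIndex() {
            return false;
        }

        /** Position of the first key >= target, via the loader's index. */
        default int seekToKey(int target) {
            throw new UnsupportedOperationException("loader has no index");
        }
    }

    /** A loader with no index: the join must build one itself. */
    static class PlainLoader implements SortedLoader {
        private final List<Integer> keys;
        PlainLoader(List<Integer> keys) { this.keys = keys; }
        public List<Integer> keys() { return keys; }
    }

    /** A loader with its own index; a lower-bound binary search stands in for it. */
    static class IndexedLoader implements SortedLoader {
        private final List<Integer> keys;
        IndexedLoader(List<Integer> keys) { this.keys = keys; }
        public List<Integer> keys() { return keys; }
        @Override public boolean hasInternalIndex() { return true; }
        @Override public int seekToKey(int target) {
            int lo = 0, hi = keys.size();
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (keys.get(mid) < target) lo = mid + 1; else hi = mid;
            }
            return lo;
        }
    }

    /** Index built by the join itself: first key of each block -> position. */
    static TreeMap<Integer, Integer> buildIndex(List<Integer> keys, int blockSize) {
        TreeMap<Integer, Integer> index = new TreeMap<>();
        for (int pos = 0; pos < keys.size(); pos += blockSize) {
            index.putIfAbsent(keys.get(pos), pos); // keep earliest position per key
        }
        return index;
    }

    /** Seek into the right input, preferring the loader's own index. */
    static int seek(SortedLoader loader, int target, int blockSize) {
        if (loader.hasInternalIndex()) {
            return loader.seekToKey(target); // no index is constructed by the join
        }
        TreeMap<Integer, Integer> index = buildIndex(loader.keys(), blockSize);
        // Start at the last sampled key strictly below target (or at 0).
        Map.Entry<Integer, Integer> block = index.lowerEntry(target);
        int pos = (block == null) ? 0 : block.getValue();
        List<Integer> keys = loader.keys();
        while (pos < keys.size() && keys.get(pos) < target) {
            pos++; // scan forward to the first key >= target
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(1, 3, 3, 5, 7, 9, 9, 12);
        System.out.println(seek(new PlainLoader(keys), 9, 3));   // 5
        System.out.println(seek(new IndexedLoader(keys), 9, 3)); // 5
    }
}
```

Both paths return the same position; only the indexed loader avoids building an index inside the join.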

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
