[ 
https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated PIG-627:
-----------------------------------

    Attachment: multi-store-0303.patch

This patch adds support for multiple stores in a single MR job. It targets the 
multiquery branch and is needed to unblock concurrent development on the split 
operator.

This patch doesn't include enough unit tests yet. They will be added once the 
split operator can use multi stores (right now nothing actually uses these 
stores, so testing is difficult). To test the patch, I temporarily turned 
multi store on for all queries (even those with only one store) and then ran 
all the unit tests. All tests passed.

> PERFORMANCE: multi-query optimization
> -------------------------------------
>
>                 Key: PIG-627
>                 URL: https://issues.apache.org/jira/browse/PIG-627
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
>
>         Attachments: multi-store-0303.patch, multiquery_0223.patch, 
> multiquery_0224.patch
>
>
> Currently, if your Pig script contains multiple stores and some shared 
> computation, Pig will execute several independent queries. For instance:
> A = load 'data' as (a, b, c);
> B = filter A by a > 5;
> store B into 'output1';
> C = group B by b;
> store C into 'output2';
> This script will result in a map-only job that generates output1 followed by a 
> map-reduce job that generates output2. As a result, data is read, parsed and 
> filtered twice, which is unnecessary and costly.
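
The cost described in the issue can be illustrated with a toy model. The sketch 
below is plain Python, not Pig's execution engine or the code in the attached 
patch; the sample rows, CountingSource, run_independently and run_shared are 
all hypothetical names made up for this illustration. It counts how often the 
input is scanned when each store runs as its own query versus when load and 
filter are shared between both stores.

```python
# Toy model of the script in PIG-627; this is NOT how Pig is implemented.
from collections import defaultdict

# Hypothetical contents of 'data', as (a, b, c) tuples.
ROWS = [(3, "x", 1), (7, "x", 2), (9, "y", 3), (6, "y", 4)]


class CountingSource:
    """Wraps the input rows and counts how many times they are scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def load(self):
        self.scans += 1
        return list(self.rows)


def filter_a_gt_5(rows):
    # B = filter A by a > 5;
    return [row for row in rows if row[0] > 5]


def group_by_b(rows):
    # C = group B by b;
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


def run_independently(source):
    """One query per store: load and filter are repeated for each output."""
    output1 = filter_a_gt_5(source.load())              # job 1 -> 'output1'
    output2 = group_by_b(filter_a_gt_5(source.load()))  # job 2 -> 'output2'
    return output1, output2


def run_shared(source):
    """Shared plan: load and filter once, then feed both stores."""
    b = filter_a_gt_5(source.load())
    return b, group_by_b(b)
```

Under these assumptions both functions produce the same two outputs, but 
run_independently scans the input twice while run_shared scans it once.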

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
