[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated PIG-627:
-----------------------------------

    Attachment: multi-store-0303.patch

This patch introduces support for multiple stores in a single MR job. It is for the multiquery branch and is needed to unblock concurrent development on the split operator.

There aren't enough unit tests in this patch yet. They will be provided once the split operator can use multi stores (right now, nothing actually uses these stores, so testing is difficult).

To test the patch, I temporarily turned multi store on for all queries (even those with only one store) and then ran all the unit tests. All tests passed.

> PERFORMANCE: multi-query optimization
> -------------------------------------
>
>                 Key: PIG-627
>                 URL: https://issues.apache.org/jira/browse/PIG-627
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
>
>         Attachments: multi-store-0303.patch, multiquery_0223.patch, multiquery_0224.patch
>
>
> Currently, if your Pig script contains multiple stores and some shared
> computation, Pig will execute several independent queries. For instance:
> A = load 'data' as (a, b, c);
> B = filter A by a > 5;
> store B into 'output1';
> C = group B by b;
> store C into 'output2';
> This script will result in a map-only job that generates output1, followed by a
> map-reduce job that generates output2. As a result, the data is read, parsed and
> filtered twice, which is unnecessary and costly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.