PERFORMANCE: multi-query optimization

                 Key: PIG-627
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Olga Natkovich
             Fix For: types_branch

Currently, if your Pig script contains multiple stores and some shared 
computation, Pig will execute several independent queries. For instance:

A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';
C = group B by b;
store C into 'output2';

This script will result in a map-only job that generates output1, followed by 
a map-reduce job that generates output2. As a result, the data is read, parsed, 
and filtered twice, which is unnecessary and costly.
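
To make the duplicated work concrete, the script above behaves roughly as if 
it had been written as two separate queries. This is only an illustrative 
sketch, assuming each store is planned independently as described; the actual 
plans Pig generates may differ in detail.

```
-- Query 1 (map-only job): load, filter, write output1
A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'output1';

-- Query 2 (map-reduce job): the same load and filter are repeated,
-- then the filtered rows are grouped and written to output2
A = load 'data' as (a, b, c);
B = filter A by a > 5;
C = group B by b;
store C into 'output2';
```

The load and filter steps are identical in both queries, so they are the 
shared computation that is currently executed twice.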
