optimizing diamond queries

                 Key: PIG-920
                 URL: https://issues.apache.org/jira/browse/PIG-920
             Project: Pig
          Issue Type: Improvement
            Reporter: Olga Natkovich

The following query

A = load 'foo';
B = filter A by $0>1;
C = filter A by $1 == 'foo';
D = COGROUP C by $0, B by $0;

is not executed efficiently. Currently, it runs a map-only job that 
basically reads and writes the same data before doing the query processing.

A query where the data is loaded twice actually executes more efficiently.
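
For illustration, the load-twice variant would presumably look something like
the following. This is only a sketch: the aliases A1 and A2 are hypothetical
names chosen here, and the rest mirrors the query above.

A1 = load 'foo';
A2 = load 'foo';
B = filter A1 by $0>1;
C = filter A2 by $1 == 'foo';
D = COGROUP C by $0, B by $0;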

This is not an uncommon query and we should fix this issue.
