optimizing diamond queries
--------------------------
Key: PIG-920
URL: https://issues.apache.org/jira/browse/PIG-920
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
The following query
A = load 'foo';
B = filter A by $0 > 1;
C = filter A by $1 == 'foo';
D = COGROUP C by $0, B by $0;
......
is not executed efficiently. Currently, Pig runs a map-only job that simply
reads and writes the same data before the actual query processing begins.
The equivalent query that loads the data twice actually executes more
efficiently. This query pattern is common, and we should fix this issue.
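For comparison, here is a sketch of the "load the data twice" form that the
report says currently runs faster. It gives each filter branch its own load
of 'foo', so the plan no longer has a shared node feeding both branches. The
aliases A1 and A2 are hypothetical names introduced for illustration, and the
rest of the original script (elided above) is assumed to stay unchanged.

```pig
-- Workaround sketch: load 'foo' once per branch instead of sharing A.
-- A1 and A2 are hypothetical aliases used only for this illustration.
A1 = load 'foo';
A2 = load 'foo';
B = filter A1 by $0 > 1;
C = filter A2 by $1 == 'foo';
D = COGROUP C by $0, B by $0;
```

This trades a second scan of the input for avoiding the extra map-only job;
the point of the issue is that Pig should plan the original single-load
query at least this efficiently without the manual rewrite.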