[ https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771712#action_12771712 ]
Pradeep Kamath commented on PIG-920:
------------------------------------
In MultiQueryOptimizer.java (the numbers in the code blocks below are line
numbers):
It would be good to add some comments to the following code explaining why the
plan size should be 2 or 3 and what the POForEach is:
{noformat}
223     if (pl.size() == 2 || pl.size() == 3) {
224         PhysicalOperator root = pl.getRoots().get(0);
225         PhysicalOperator leaf = pl.getLeaves().get(0);
226         if (root instanceof POLoad && leaf instanceof POStore) {
227             if (pl.size() == 3) {
228                 PhysicalOperator mid = pl.getSuccessors(root).get(0);
229                 if (mid instanceof POForEach) {
230                     rtn = true;
231                 }
232             } else {
233                 rtn = true;
234             }
235         }
236     }
237 }
{noformat}
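For illustration only, here is a minimal sketch of how the check might read once commented. It is not the patch code: the class, the Kind enum, and the list-based plan are hypothetical stand-ins for the real Pig operators, and the comments describe only what the check accepts, not the intent behind it (which is what the requested comments should explain).

```java
import java.util.List;

// Hypothetical stand-in for the check at lines 223-237; not Pig's real classes.
public class DiamondCheckSketch {
    // Simplified operator kinds standing in for POLoad, POForEach, POStore, etc.
    public enum Kind { LOAD, FOREACH, FILTER, STORE }

    // A linear plan is modeled as an ordered list of operator kinds (assumption).
    public static boolean isLoadStoreOnly(List<Kind> plan) {
        // Only two shapes are accepted:
        //   size 2: load -> store
        //   size 3: load -> foreach -> store
        if (plan.size() != 2 && plan.size() != 3) {
            return false;
        }
        Kind root = plan.get(0);
        Kind leaf = plan.get(plan.size() - 1);
        if (root != Kind.LOAD || leaf != Kind.STORE) {
            return false;
        }
        // With three operators, the middle one must be a foreach.
        return plan.size() == 2 || plan.get(1) == Kind.FOREACH;
    }
}
```

Under these assumptions, a load/filter/store plan is rejected while load/store and load/foreach/store are accepted.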
Just to be safe it might be better to check that there is only 1 successor
before this code:
{noformat}
265 PhysicalOperator opSucc = succ.mapPlan.getSuccessors(op).get(0);
{noformat}
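A rough sketch of the kind of guard I have in mind, written against a hypothetical map of operator names to successor lists rather than the real plan API. It assumes the successor lookup may return null or a list of any size, and treats anything other than exactly one successor as "not applicable".

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a guarded getSuccessors(op).get(0) call.
public class SingleSuccessorSketch {
    // edges maps an operator name to its successors (assumed model of the plan).
    public static Optional<String> onlySuccessor(Map<String, List<String>> edges, String op) {
        List<String> succs = edges.get(op);
        // Bail out unless there is exactly one successor, so get(0) is safe
        // and a second successor is never silently ignored.
        if (succs == null || succs.size() != 1) {
            return Optional.empty();
        }
        return Optional.of(succs.get(0));
    }
}
```

The caller would then skip the optimization when the result is empty instead of proceeding with the first successor.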
Is the following by design even in the case where the splitter has multiple
successors?
{noformat}
309 return 1;
{noformat}
> optimizing diamond queries
> --------------------------
>
> Key: PIG-920
> URL: https://issues.apache.org/jira/browse/PIG-920
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> Assignee: Richard Ding
> Attachments: PIG-920.patch
>
>
> The following query
> A = load 'foo';
> B = filter A by $0>1;
> C = filter A by $1 == 'foo';
> D = COGROUP C by $0, B by $0;
> ......
> does not get executed efficiently. Currently, it runs a map-only job that
> basically reads and writes the same data before doing the query processing.
> A query where the data is loaded twice actually executes more efficiently.
> This is not an uncommon query and we should fix this issue.