[ 
https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771727#action_12771727
 ] 

Richard Ding commented on PIG-920:
----------------------------------

bq. It would be good to add some comments in the following code on why the plan 
size should be 2 or 3 and what the POForEach is

Will do.

bq. Just to be safe it might be better to check that there is only 1 successor 
before this code:

The load operator can have only one successor (supportsMultipleOutputs = false).

bq. Is the following by design even in the case where multiple successors are 
present for splitter?

The return value is the number of MR operators merged (that is, removed from 
the plan by this method). For this method, the return value can be either 0 or 1.


> optimizing diamond queries
> --------------------------
>
>                 Key: PIG-920
>                 URL: https://issues.apache.org/jira/browse/PIG-920
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>         Attachments: PIG-920.patch
>
>
> The following query
> A = load 'foo';
> B = filter A by $0>1;
> C = filter A by $1 == 'foo';
> D = COGROUP C by $0, B by $0;
> ......
> does not get executed efficiently. Currently, it runs a map-only job that
> basically reads and writes the same data before doing the query processing.
> A query where the data is loaded twice actually executes more efficiently.
> This is not an uncommon query and we should fix this issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
