[ 
https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605456#action_12605456
 ] 

sms edited comment on PIG-161 at 6/16/08 4:32 PM:
------------------------------------------------------------------

Consider the following example, a modification of Case 2 in the previous 
comment:

{code}
A = load 'myfile';
B = group A by $0;
C = foreach B {
    C1 = distinct $1;
    generate group + SUM(C1);
};
{code}

Top level plan:

load -> group -> foreach

The foreach will have a nested plan:

plan 1: project(1) -> distinct -> accumulate

The accumulate operator will have the following nested plan:

{noformat}
project( * )
            \
             SUM()
            /
project(group)
{noformat}

The accumulate operator requires two inputs:

1. The tuple from foreach for projecting 'group'
2. The bag from distinct for the aggregate SUM
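As a minimal sketch of why accumulate needs two inputs, its nested expression plan can be modeled as a small tree whose leaves read from two different sources: `project(group)` reads the tuple handed down by foreach, and `SUM(project(*))` reads the bag produced by distinct. The names below (`Inputs`, `Expr`, `evaluate`) are hypothetical and are not Pig's actual physical operator classes; the sketch also assumes the bag holds plain integers and that the `+` from `generate group + SUM(C1)` joins the two branches.

```java
import java.util.List;

public class AccumulateExprSketch {

    // Hypothetical holder for the two sources the expression tree reads from.
    static final class Inputs {
        final List<Object> foreachTuple;   // tuple handed down by foreach: (group, bag)
        final List<Integer> distinctBag;   // bag produced by distinct (assumed integers)

        Inputs(List<Object> foreachTuple, List<Integer> distinctBag) {
            this.foreachTuple = foreachTuple;
            this.distinctBag = distinctBag;
        }
    }

    // A node of the nested expression plan.
    interface Expr {
        long eval(Inputs in);
    }

    public static long evaluate(List<Object> foreachTuple, List<Integer> distinctBag) {
        // project(group): field 0 of the foreach tuple (input 1).
        Expr projectGroup = in -> ((Number) in.foreachTuple.get(0)).longValue();
        // SUM(project(*)): aggregate over the whole bag from distinct (input 2).
        Expr sumOfBag = in -> {
            long total = 0;
            for (int v : in.distinctBag) {
                total += v;
            }
            return total;
        };
        // '+' from "generate group + SUM(C1)" joins the two branches.
        Expr plan = in -> projectGroup.eval(in) + sumOfBag.eval(in);
        return plan.eval(new Inputs(foreachTuple, distinctBag));
    }

    public static void main(String[] args) {
        // group = 3, distinct bag = {1, 2, 5}
        System.out.println(evaluate(List.of(3, List.of(1, 2, 2, 5)), List.of(1, 2, 5))); // 11
    }
}
```

Evaluating the root cannot succeed unless both `foreachTuple` and `distinctBag` are available, which is exactly the two-input requirement listed above.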

With the proposed changes, accumulate will not be able to receive inputs from 
both foreach and distinct. To solve this problem, accumulate has to be made a 
proxy root: the input tuple from foreach is attached directly to accumulate, 
while the second input, the bag from distinct, is retrieved using getNext().
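The proxy-root arrangement can be sketched with a simple pull model: the real root `project(1)` and the proxy root accumulate both receive the foreach tuple by attachment, and accumulate pulls the distinct bag through `getNext()`. The classes below (`Op`, `ProjectBag`, `Distinct`, `Accumulate`, `runForeach`) are hypothetical stand-ins, not Pig's actual physical operators; the sketch assumes the grouped tuple is `(group, bag)` with an integer group key and a bag of plain integers.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class ProxyRootSketch {

    // Pull-style operator: each call produces the operator's result.
    interface Op {
        Object getNext();
    }

    // project(1): the real root of the nested plan; foreach attaches its tuple here.
    static class ProjectBag implements Op {
        private final int field;
        private List<Object> input;

        ProjectBag(int field) { this.field = field; }

        void attachInput(List<Object> tuple) { this.input = tuple; }

        public Object getNext() { return input.get(field); }
    }

    // distinct: pulls the bag from its predecessor and drops duplicates.
    static class Distinct implements Op {
        private final Op pred;

        Distinct(Op pred) { this.pred = pred; }

        @SuppressWarnings("unchecked")
        public Object getNext() {
            return new ArrayList<>(new LinkedHashSet<>((List<Integer>) pred.getNext()));
        }
    }

    // accumulate as a proxy root: the foreach tuple is attached directly,
    // while the bag is pulled from distinct via getNext().
    static class Accumulate implements Op {
        private final Op pred;
        private List<Object> attached;

        Accumulate(Op pred) { this.pred = pred; }

        void attachInput(List<Object> tuple) { this.attached = tuple; }

        @SuppressWarnings("unchecked")
        public Object getNext() {
            long sum = 0;
            for (Integer v : (List<Integer>) pred.getNext()) {
                sum += v;
            }
            // group + SUM(C1): group comes from the attached tuple.
            return ((Number) attached.get(0)).longValue() + sum;
        }
    }

    public static Object runForeach(List<Object> groupedTuple) {
        ProjectBag root = new ProjectBag(1);
        Accumulate acc = new Accumulate(new Distinct(root));
        // foreach attaches the same tuple to the plan root and to accumulate.
        root.attachInput(groupedTuple);
        acc.attachInput(groupedTuple);
        return acc.getNext();
    }

    public static void main(String[] args) {
        System.out.println(runForeach(List.of(3, List.of(1, 2, 2, 5)))); // 11
    }
}
```

Without the `acc.attachInput(...)` call, accumulate would have no way to see `group`, since distinct only forwards the bag.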

In addition to the changes proposed in the previous comment, the following 
changes have to be made:

1. In the logical layer, indicate whether accumulate requires its input from 
foreach.
2. In the physical layer (for foreach), attaching the input should attach the 
tuple to accumulate in addition to all the roots of the nested plans of foreach.
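The two changes can be sketched together: a flag recorded on the logical accumulate (change 1) decides whether the physical foreach adds that accumulate to the set of nodes it attaches its input tuple to (change 2). The names `LogicalAccumulate`, `needsForeachInput`, `NestedPlan`, and `attachTargets` are hypothetical illustrations, not existing Pig classes or fields.

```java
import java.util.ArrayList;
import java.util.List;

public class AttachTargetsSketch {

    // Change 1 (logical layer): accumulate records whether it needs the
    // tuple from foreach. Hypothetical class.
    public static final class LogicalAccumulate {
        final String name;
        final boolean needsForeachInput;

        public LogicalAccumulate(String name, boolean needsForeachInput) {
            this.name = name;
            this.needsForeachInput = needsForeachInput;
        }
    }

    // One nested plan of foreach: its root and the accumulate that ends it.
    public static final class NestedPlan {
        final String root;
        final LogicalAccumulate accumulate;

        public NestedPlan(String root, LogicalAccumulate accumulate) {
            this.root = root;
            this.accumulate = accumulate;
        }
    }

    // Change 2 (physical layer): foreach attaches its input tuple to every
    // nested-plan root plus every accumulate flagged as a proxy root.
    public static List<String> attachTargets(List<NestedPlan> plans) {
        List<String> targets = new ArrayList<>();
        for (NestedPlan p : plans) {
            targets.add(p.root);
            if (p.accumulate.needsForeachInput) {
                targets.add(p.accumulate.name);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<NestedPlan> plans = List.of(
            new NestedPlan("project(1)", new LogicalAccumulate("accumulate", true)));
        System.out.println(attachTargets(plans)); // [project(1), accumulate]
    }
}
```

An accumulate whose expressions use only the bag (for example a plain `generate SUM(C1)`) would leave the flag unset and receive no attached tuple.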

> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, BinCondAndNegative.patch, 
> CastAndMapLookUp.patch, incr2.patch, incr3.patch, incr4.patch, incr5.patch, 
> logToPhyTranslator.patch, missingOps.patch, 
> MRCompilerTests_PlansAndOutputs.txt, Phy_AbsClass.patch, physicalOps.patch, 
> physicalOps.patch, physicalOps.patch, physicalOps.patch, 
> physicalOps_latest.patch, POCast.patch, POCast.patch, podistinct.patch, 
> pogenerate.patch, pogenerate.patch, pogenerate.patch, posort.patch, 
> POUserFuncCorrection.patch, 
> TEST-org.apache.pig.test.TestLocalJobSubmission.txt, 
> TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, 
> TEST-org.apache.pig.test.TestLogToPhyCompiler.txt, 
> TEST-org.apache.pig.test.TestMapReduce.txt, 
> TEST-org.apache.pig.test.TestTypeCheckingValidator.txt, 
> TEST-org.apache.pig.test.TestUnion.txt, translator.patch, translator.patch, 
> translator.patch, translator.patch
>
>
> This bug tracks work to rework all of the physical operators as described in 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
