[ 
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593549#action_12593549
 ] 

Pi Song commented on PIG-158:
-----------------------------

Questions and comments as usual:-

1. I still don't quite understand this bit:-
[Santhosh]
>Generate is also used for a couple of things:
>i. In Group and Cogroup, the group by columns are wrapped in LOGenerate
>ii. In nested statements in foreach, the expression operators are wrapped in 
>LOGenerate. In the example below, b refers to the column in c. >LOGenerate 
>wraps up b as distinct expects a relational operator.

Let's formalize a bit to see what really these operator do:-
{noformat}
LOForEach : Bag x (f: Tuple -> Tuple ) -> Bag    where f is applied to every 
tuple in the input bag (the concept is like map in Haskell)
LOGenerate : Tuple -> Tuple     (and this should be where the nested plans go)
{noformat}
So, from the normal form "Foreach X generate $0, $1, $2 ", we can write:-
{noformat}
LOForEach : Bag x (LOGenerate: Tuple -> Tuple ) -> Bag 
{noformat}

For COGroup:-
{noformat}
LOCOGroup :  [ Bag_i, f_i: Bag -> Bag ]^n -> Bag      (f_i is f subscripted 
with i. Same for Bag_i)
where f_i is applied to the input Bag to get the intermediate Bag which 
contains Tuple [groupedColumn, inputTuple]
{noformat}
Thus, f_i cannot be LOGenerate (otherwise type mismatch) but LOForeach coupled 
with LOGenerate will be ok.

That's why I don't understand what you said. I think ForEach and Generate still 
have to go together anyway even in COGroup. Except that if the definition of 
LOCOGroup here is different (if say it also does iterating the input bags). But 
if we do so, we're not reusing the functionality of ForEach.

2. One more thing, if you say LOGenerate will be wrapped in LOCOGroup 
(regardless of the outcome of the first question). Then LOCOGroup may not need 
to handle all the nested plan directly.

So,
private MultiMap<LogicalOperator, LogicalPlan> mGroupByPlans;
might become
private Map<LogicalOperator, LogicalPlan> mGroupByPlans;
and then you put LOGenerate in LogicalPlan

This seems to be supported by the definition of LOCOGroup above which has a 
function (from Tuple to Tuple) as a parameter. Also as shown by the definition 
of LOForEach above, LOForEach should have a nested plan which contains 
LOGenerate. (I don't have time to think through this suggestion yet. Might be 
wrong. Though I believe it's quite right because things from math induction are 
usually right)


3. I think LOProject doesn't need an inner plan. The only scenario LOProject 
being used is to extract data from tuple which doesn't involve any nested 
calculations. In contrast LOGenerate needs a list of plans (which is already 
good) because this operator can do nested expressions.


4. LOGenerate - as we allow users to define aliases like this "generate $1, $2, 
($2*$3) + 5, $2+1  as (name,age,gpa,gpa2)", I think the rule should be:-
Case 1: Aliases are given. We associate given aliases with output column by 
column
Case 2: Aliases are not given. If the column is use as it is (eg. $1, $2 in 
example), we just propagate aliases to the output. If user manipulates the 
column somehow (eg. $2+1 in example), the output column should not be given an 
alias.

5. LOForEach has "private ArrayList<LogicalOperator> mOperators". Can ForEach 
be associated with more than one input?

6. In some operators, you haven't removed unused inner ExpressionOperators yet. 

7. I still think LOGenerate should have "List<Boolean> isFlattenList" because 
you can do flatten only in LOGenerate therefore flatten is not a property of 
operator







> Rework logical plan
> -------------------
>
>                 Key: PIG-158
>                 URL: https://issues.apache.org/jira/browse/PIG-158
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: logical_operators.patch, logical_operators_rev_1.patch, 
> logical_operators_rev_2.patch, logical_operators_rev_3.patch, 
> parser_changes.patch, parser_changes_v1.patch, parser_changes_v2.patch, 
> parser_changes_v3.patch, ParserErrors.txt, visitorWalker.patch
>
>
> Rework the logical plan in line with 
> http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to