[
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591287#action_12591287
]
Pi Song commented on PIG-158:
-----------------------------
2) Based on how you intend to use it, I would say LOProject is a function
from Tuple x Tuple to Tuple (in our data model):
{noformat}
LOProject:(Tuple x Tuple) -> Tuple
{noformat}
or, as a method:
{noformat}
OutputTuple LOProject(InputTuple, IndexListTuple)
{noformat}
So your example $1.($0, $1, $2) can be written as:
{noformat}
LOProject( LOProject(A_Tuple_From_Input_Bag, {1}), {0,1,2} )
{noformat}
which is the opposite of your solution. This is a bit tricky, isn't it?
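Here is a minimal Java sketch of that reading. It assumes a tuple is modelled as a plain List<Object> and the index-list tuple as a List<Integer>. ProjectSketch, project and nestedExample are hypothetical names, not Pig classes. It also assumes a single-index projection yields the field itself, the way $1 yields column 1, so that a nested tuple can be projected again:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: tuples are modelled as List<Object>, index lists as
// List<Integer>. This is not Pig's actual Tuple or LOProject implementation.
final class ProjectSketch {
    private ProjectSketch() {}

    // Projects the given column indices out of an input tuple.
    // Assumption: a single index yields the field itself (as $1 yields column 1),
    // so a nested tuple can be projected again; several indices build a new tuple.
    static Object project(List<?> input, List<Integer> indices) {
        if (indices.size() == 1) {
            return input.get(indices.get(0));
        }
        List<Object> out = new ArrayList<>(indices.size());
        for (int i : indices) {
            out.add(input.get(i));
        }
        return out;
    }

    // $1.($0, $1, $2): the inner projection runs first (upstream) and the
    // outer projection consumes its result (downstream).
    static Object nestedExample(List<?> row) {
        List<?> inner = (List<?>) project(row, List.of(1));
        return project(inner, List.of(0, 1, 2));
    }
}
```

For a row ("a", (10, 20, 30, 40), "c"), nestedExample returns (10, 20, 30). The inner projection picks column 1, and the outer one picks columns 0 to 2 of that nested tuple.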
From what I can see here, LOGenerate fits the nested plan model very well,
and it should have an ArrayList<LogicalPlan>. You can think of the inner
LOProject as the upstream operator and the outer one as the downstream operator.
You need a list of plans because a single generate may have to handle several
expressions, e.g. $1.($0, $1, $2), $2, $3.($1, $2).
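The following hypothetical Java sketch makes the nested-plan shape concrete. NestedPlan and GenerateSketch are illustrative names, not Pig's LogicalPlan or LOGenerate. Each plan is a chain of projection steps that runs from upstream to downstream, and the generate operator holds one plan per output expression. As an assumption, a single-index step yields the field itself rather than a one-column tuple:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's real LogicalPlan/LOGenerate classes.
// A nested plan is an ordered chain of projection steps: step 0 is the
// upstream (inner) LOProject, the last step is the downstream (outer) one.
final class NestedPlan {
    final List<int[]> steps = new ArrayList<>();

    NestedPlan then(int... indices) {
        steps.add(indices);
        return this;
    }

    // Runs the chain upstream-to-downstream on one input tuple.
    Object evaluate(List<?> tuple) {
        Object current = tuple;
        for (int[] idx : steps) {
            List<?> in = (List<?>) current;
            if (idx.length == 1) {
                current = in.get(idx[0]); // single index: the field itself
            } else {
                List<Object> out = new ArrayList<>();
                for (int i : idx) out.add(in.get(i));
                current = out;
            }
        }
        return current;
    }
}

// The generate operator keeps one nested plan per output expression.
final class GenerateSketch {
    final List<NestedPlan> plans = new ArrayList<>();

    List<Object> evaluate(List<?> input) {
        List<Object> output = new ArrayList<>();
        for (NestedPlan p : plans) output.add(p.evaluate(input));
        return output;
    }
}
```

For GENERATE $1.($0, $1, $2), $2, $3.($1, $2), you would add three plans: new NestedPlan().then(1).then(0, 1, 2), new NestedPlan().then(2) and new NestedPlan().then(3).then(1, 2).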
Another suggestion: we should also mark tuple operators as such by giving them
a common parent class.
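Here is a hypothetical sketch of that suggestion. A common abstract parent (TupleOperatorSketch, an illustrative name) lets code that walks a plan recognise tuple operators with a single instanceof check instead of listing each operator class. FilterOp stands in for a relational (bag-level) operator purely for contrast:

```java
// Hypothetical class names; not Pig's actual operator hierarchy.
abstract class LogicalOperatorSketch {
    abstract String name();
}

// Common parent that marks an operator as working on tuples.
abstract class TupleOperatorSketch extends LogicalOperatorSketch {}

final class ProjectOp extends TupleOperatorSketch {
    String name() { return "LOProject"; }
}

// A relational (bag-level) operator, included only for contrast.
final class FilterOp extends LogicalOperatorSketch {
    String name() { return "LOFilter"; }
}

final class TupleOps {
    private TupleOps() {}

    // One check covers every current and future tuple operator subclass.
    static boolean isTupleOperator(LogicalOperatorSketch op) {
        return op instanceof TupleOperatorSketch;
    }
}
```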
3) I think having nested operators (some of which might not be in any plan)
will cause more headaches, because we will have to handle special cases for
operators that are just floating around (and how they are handled depends on
the implementation of the operator those floating operators are attached to).
I keep emphasizing nested plans because, with a consistent nested model, we can
define our operations on plans recursively. I find that simpler than having the
same logic work differently in different places. I've started doing this by
implementing type checking and schema merging as recursive definitions, to show
that this approach really does make things simpler.
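As a rough illustration of the recursive style, here is a sketch using hypothetical Op and Plan classes (not Pig's API). The walk defines "visit every operator" once, and that single definition applies to the top-level plan and to every nested plan. A type checker or schema merger could be built on the same recursion. This sketch only collects operator names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: every operator lives in a plan, and an operator such as
// a generate may own nested plans. Names are illustrative, not Pig's API.
final class Op {
    final String name;
    final List<Plan> nested = new ArrayList<>();

    Op(String name, Plan... nestedPlans) {
        this.name = name;
        Collections.addAll(nested, nestedPlans);
    }
}

final class Plan {
    final List<Op> ops = new ArrayList<>();

    Plan(Op... ops) {
        Collections.addAll(this.ops, ops);
    }
}

final class RecursiveWalk {
    private RecursiveWalk() {}

    // One recursive definition covers the top-level plan and all nested plans,
    // so no operator position needs special-case handling.
    static List<String> allOperators(Plan plan) {
        List<String> names = new ArrayList<>();
        for (Op op : plan.ops) {
            names.add(op.name);
            for (Plan inner : op.nested) {
                names.addAll(allOperators(inner));
            }
        }
        return names;
    }
}
```

Consider a top-level plan containing LOLoad and an LOGenerate that owns the nested plans [LOProject[1], LOProject[0,1,2]] and [LOProject[2]]. For that plan, allOperators returns all five names, with each nested operator listed right after its owner.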
7) I have been trying to find special cases where a logger cannot be static. If
you know of any such cases, please shed some light.
8) Don't you think "flatten" should be associated with each column in
LOGenerate? LOGenerate could then have a "List<Boolean> isFlatten". Basically,
flattening is meaningless if the mapped column is not a bag.
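Here is a minimal sketch of per-column flatten flags. It assumes each output column's type is known as a simple string such as "bag" or "int", and FlattenSpec and meaninglessFlattens are hypothetical names. The flag list has to be List<Boolean>, because Java generics cannot take the primitive boolean. The check reports flags set on columns that are not bags:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-column flatten flags on a generate operator.
// Column types are plain strings ("bag", "int", ...); this is not Pig's API.
final class FlattenSpec {
    final List<String> columnTypes = new ArrayList<>();
    // Boxed Boolean: Java generics cannot hold the primitive boolean.
    final List<Boolean> isFlatten = new ArrayList<>();

    // Adds one output column together with its flatten flag.
    void addColumn(String type, boolean flatten) {
        columnTypes.add(type);
        isFlatten.add(flatten);
    }

    // Returns the indices of columns whose flatten flag is set even though the
    // column is not a bag, i.e. the flag has no meaning there.
    List<Integer> meaninglessFlattens() {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < columnTypes.size(); i++) {
            if (isFlatten.get(i) && !"bag".equals(columnTypes.get(i))) {
                bad.add(i);
            }
        }
        return bad;
    }
}
```

Keeping the flags in a list parallel to the list of nested plans means column i's plan and its flatten flag share the same index.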
One more question:
- ForEach and Generate always appear in the same statement, so they are always
used together. It looks like you have separated their responsibilities. Could
you please explain how they are meant to be used?
PS. As you can see, I really emphasize model consistency because I believe
that's how we simplify things. If we don't have too many exceptional cases,
the logical model can be much simpler.
> Rework logical plan
> -------------------
>
> Key: PIG-158
> URL: https://issues.apache.org/jira/browse/PIG-158
> Project: Pig
> Issue Type: Sub-task
> Components: impl
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: logical_operators.patch, logical_operators_rev_1.patch,
> logical_operators_rev_2.patch, logical_operators_rev_3.patch,
> parser_changes.patch, ParserErrors.txt, visitorWalker.patch
>
>
> Rework the logical plan in line with
> http://wiki.apache.org/pig/PigExecutionModel
--
This message is automatically generated by JIRA.