[
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589094#action_12589094
]
Santhosh Srinivasan commented on PIG-158:
-----------------------------------------
Pi,
Thanks for the comments. My responses are inline, marked with [Santhosh].
1) In COGroup, why is mInputs an ArrayList<String>? Shouldn't it be
ArrayList<LogicalOperator>? How do you plan to get inputs out of strings?
[Santhosh] Yes, it should be ArrayList<LogicalOperator>. I realized this when I
was changing the parser code. I have made these changes but not posted a patch
as the parser code changes are being tested.
2) Why does LOSort have getInput() while LOFilter and LOSplit do not? All of
them have one bag input plus expression input(s).
[Santhosh] I have added getInput() to LOFilter as part of the parser changes
(see previous response). It looks like I missed LOSplit. I will verify that
and add it.
3) I think the PigTypeDesign documentation on the wiki is out of date. Is LOProject
a replacement for FieldExpression?
[Santhosh] LOProject is for operations like A.($0,$1), A.name, etc. I am not
sure about the name FieldExpression. It could be that.
4) What is the right way to get a column name or a column index from LOProject
(when the column name or the column index is known)? At the moment
LOProject maintains "List<String> projection", which seems to contain column
names. If I refer to columns by $0,$1,$2, ..., what will be stored in this
string list?
[Santhosh] I have changed LOProject to take a list of integers instead of a
list of strings. The columns should be referred to by position.
5) How should algebraic functions (take a bag, output a dataatom) be handled in
the new type design? I haven't seen such operators yet.
[Santhosh] I haven't looked into that. Let me get back to you.
6) Should all the relational operators share the same RelationalOperator parent
class? (All of them share the same characteristic: they take a bag of tuples
as input and output a bag of tuples.)
[Santhosh] That's a good question. Currently, all the relational operators are
logical operators. With your proposal, there will be an equivalent of
expression operators. I would like to hear what other folks think about this.
7) Should all the relational operators always have getType() = DataType.BAG?
[Santhosh] That's true for most (all?) relational operators. I hope I have not
missed any. Let me double-check that statement.
8) What are setSchema() and getSchema() in relational operators? Do they refer
to the schema of the tuples in the output bag?
[Santhosh] Yes
9) How about setSchema(), getSchema() in expression operators?
[Santhosh] Most of the expression operators should return null. There are
exceptions: user-defined functions can return tuples that have a schema,
arithmetic operators on tuples will result in schemas, etc.
10) (I believe you know this) Do we plan to have bags containing datatypes
other than tuples?
[Santhosh] I don't think so.
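
To make points 1, 2, 4, 6 and 7 above concrete, here is a minimal Java sketch of
the shapes being discussed. It is illustrative only and is not the code in the
attached patches: the RelationalOperator parent from point 6 is still just a
proposal, and the DataType enum, LOLoad placeholder, constructor signatures and
getInputs()/getProjection() accessors are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LogicalPlanSketch {

    // Hypothetical stand-in for the DataType constants mentioned in point 7.
    enum DataType { BAG, UNKNOWN }

    // Hypothetical root of the operator hierarchy.
    static abstract class LogicalOperator {
        abstract DataType getType();
    }

    // Point 6 (a proposal, not settled in the thread): a shared parent for
    // relational operators. Point 7: its type is always BAG.
    static abstract class RelationalOperator extends LogicalOperator {
        @Override
        final DataType getType() { return DataType.BAG; }
    }

    // Placeholder leaf operator (assumed) so the examples have an input.
    static class LOLoad extends RelationalOperator { }

    // Point 2: a single-input operator exposing getInput().
    static class LOFilter extends RelationalOperator {
        private final LogicalOperator input;
        LOFilter(LogicalOperator input) { this.input = input; }
        LogicalOperator getInput() { return input; }
    }

    // Point 1: inputs are held as operators rather than as strings.
    static class COGroup extends RelationalOperator {
        private final ArrayList<LogicalOperator> mInputs;
        COGroup(List<LogicalOperator> inputs) { this.mInputs = new ArrayList<>(inputs); }
        List<LogicalOperator> getInputs() { return Collections.unmodifiableList(mInputs); }
    }

    // Point 4: columns are referred to by position, so $0,$1 is stored as [0, 1].
    static class LOProject extends LogicalOperator {
        private final LogicalOperator input;
        private final List<Integer> projection;
        LOProject(LogicalOperator input, List<Integer> projection) {
            this.input = input;
            this.projection = new ArrayList<>(projection);
        }
        LogicalOperator getInput() { return input; }
        List<Integer> getProjection() { return Collections.unmodifiableList(projection); }
        // Resolving the result type of a projection is out of scope for this sketch.
        @Override
        DataType getType() { return DataType.UNKNOWN; }
    }

    public static void main(String[] args) {
        LogicalOperator a = new LOLoad();
        LogicalOperator b = new LOLoad();
        LOFilter filter = new LOFilter(a);
        COGroup cogroup = new COGroup(Arrays.<LogicalOperator>asList(filter, b));
        LOProject project = new LOProject(a, Arrays.asList(0, 1));

        System.out.println("cogroup type: " + cogroup.getType());
        System.out.println("cogroup inputs: " + cogroup.getInputs().size());
        System.out.println("projected columns: " + project.getProjection());
    }
}
```

Making getType() final in the shared parent is one way to guarantee the BAG
invariant from point 7 for every relational operator; if some relational
operator turns out not to produce a bag, that choice would need revisiting.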
> Rework logical plan
> -------------------
>
> Key: PIG-158
> URL: https://issues.apache.org/jira/browse/PIG-158
> Project: Pig
> Issue Type: Sub-task
> Components: impl
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: logical_operators.patch, logical_operators_rev_1.patch,
> logical_operators_rev_2.patch, logical_operators_rev_3.patch,
> visitorWalker.patch
>
>
> Rework the logical plan in line with
> http://wiki.apache.org/pig/PigExecutionModel