[ 
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589094#action_12589094
 ] 

Santhosh Srinivasan commented on PIG-158:
-----------------------------------------

Pi,

Thanks for the comments. Please see my responses inline with [Santhosh]

1) In COGroup, why is mInputs an ArrayList<String>? Shouldn't it be 
ArrayList<LogicalOperator>? How do you plan to get inputs out of strings?

[Santhosh] Yes, it should be ArrayList<LogicalOperator>. I realized this when I 
was changing the parser code. I have made these changes but not posted a patch 
as the parser code changes are being tested.
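
A minimal sketch of what point 1 describes, for illustration only. The class 
bodies, constructors, and the LOLoad stand-in below are assumptions and are 
not taken from the attached patches; only the mInputs field and its element 
type come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the real base class.
abstract class LogicalOperator {
    private final String alias;

    LogicalOperator(String alias) { this.alias = alias; }

    String getAlias() { return alias; }
}

// Hypothetical input operator, used only to have something to cogroup.
class LOLoad extends LogicalOperator {
    LOLoad(String alias) { super(alias); }
}

class COGroup extends LogicalOperator {
    // Inputs are held as operators, so no alias-to-operator lookup is needed.
    private final ArrayList<LogicalOperator> mInputs;

    COGroup(String alias, List<LogicalOperator> inputs) {
        super(alias);
        mInputs = new ArrayList<LogicalOperator>(inputs);
    }

    List<LogicalOperator> getInputs() { return mInputs; }
}
```

With operators stored directly, a caller can walk from a COGroup to its inputs 
without resolving strings first.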

2) Why does LOSort have getInput() while LOFilter and LOSplit don't? All of 
them have 1 bag input + expression input(s).

[Santhosh] I have added getInput() to LOFilter as part of the parser changes 
(see previous response). It looks like I missed LOSplit. I will verify that 
and add it.

3) I think the PigTypeDesign documentation in Wiki is out-of-date. Is LOProject 
a replacement for FieldExpression?

[Santhosh] LOProject is for operations like A.($0,$1), A.name, etc. I am not 
sure about the name FieldExpression. It could be that.

4) What is the right way to get a column name or a column index from LOProject 
(if a column name is known or a column index is known)? At the moment, 
LOProject maintains "List<String> projection", which seems to contain column 
names. If I refer to columns by $0, $1, $2, ..., what will be stored in this 
string list?

[Santhosh] I have changed LOProject to take a list of integers instead of a 
list of strings. The columns should be referred to by position.
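
A rough sketch of a position-based projection, under stated assumptions. Only 
the idea of LOProject holding a list of integers comes from the reply above; 
the constructor, the getProjection() accessor, and the byName() helper that 
resolves names against a list of input field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the class from the attached patches.
class LOProject {
    // Zero-based column positions, e.g. A.($0,$1) -> [0, 1].
    private final List<Integer> projection;

    LOProject(List<Integer> projection) {
        this.projection = new ArrayList<Integer>(projection);
    }

    List<Integer> getProjection() { return projection; }

    // Assumed helper: turn names such as A.name into positions by looking
    // them up in the input's field names.
    static LOProject byName(List<String> inputFields, String... names) {
        List<Integer> cols = new ArrayList<Integer>();
        for (String name : names) {
            int idx = inputFields.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            cols.add(idx);
        }
        return new LOProject(cols);
    }
}
```

Under this sketch, both $0-style and name-style references end up as 
positions, so the operator itself never has to store column names.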

5) How should algebraic functions (which take a bag and output a dataatom) be 
handled in the new type design? I haven't seen such operators yet.

[Santhosh] I haven't looked into that. Let me get back to you.

6) Should all the relational operators share the same RelationalOperator parent 
class? All of them share the same characteristic: they take a bag of tuples as 
input and output a bag of tuples.

[Santhosh] That's a good question. Currently, all the relational operators are 
logical operators. With your proposal, there will be an equivalent of 
expression operators. I would like to hear what other folks think about this.

7) Should all the relational operators always have getType() = DataType.BAG?

[Santhosh] That's true for most (all?) relational operators. I hope I have not 
missed any. Let me double-check that statement.
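
One way the proposal in points 6 and 7 could look, as a sketch only. The 
thread leaves the question open, so this is not a settled design. The DataType 
enum is a simplified stand-in, and ExpressionOperator, IntConstant, and the 
empty LOFilter/LOSort bodies are hypothetical.

```java
// Simplified stand-in for the real type constants.
enum DataType { BAG, TUPLE, INTEGER }

abstract class LogicalOperator {
    abstract DataType getType();
}

// Proposed common parent: bag of tuples in, bag of tuples out.
abstract class RelationalOperator extends LogicalOperator {
    // Final, so no relational subclass can report anything other than BAG.
    @Override
    final DataType getType() { return DataType.BAG; }
}

// Hypothetical counterpart for operators that compute values.
abstract class ExpressionOperator extends LogicalOperator { }

class LOFilter extends RelationalOperator { }

class LOSort extends RelationalOperator { }

// Hypothetical expression operator with a scalar result type.
class IntConstant extends ExpressionOperator {
    @Override
    DataType getType() { return DataType.INTEGER; }
}
```

Putting getType() in the shared parent would make the "always BAG" rule a 
property of the hierarchy rather than something each operator must remember.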

8) What do setSchema() and getSchema() mean in relational operators? Do they 
refer to the schema of the tuples in the output bag?

[Santhosh] Yes

9) How about setSchema(), getSchema() in expression operators?

[Santhosh] Most of the expression operators should return null. There are 
exceptions: user defined functions can return tuples that have a schema, 
arithmetic operators on tuples will result in schemas, etc.
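
The behaviour described for point 9 could be sketched as follows. The Schema 
class here is a bare stand-in that only holds field names, and ScalarAdd and 
TupleReturningFunc are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Bare stand-in: just an ordered list of field names.
class Schema {
    final List<String> fieldNames;

    Schema(String... names) { fieldNames = Arrays.asList(names); }
}

abstract class ExpressionOperator {
    // Default for scalar-valued expressions: there is no schema to report.
    Schema getSchema() { return null; }
}

// Hypothetical scalar arithmetic operator; inherits the null default.
class ScalarAdd extends ExpressionOperator { }

// Hypothetical user defined function whose result is a tuple with a schema.
class TupleReturningFunc extends ExpressionOperator {
    private final Schema outputSchema;

    TupleReturningFunc(Schema outputSchema) { this.outputSchema = outputSchema; }

    @Override
    Schema getSchema() { return outputSchema; }
}
```

Callers in this sketch need a null check before using the schema of an 
expression operator.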

10) (I believe you know this) Do we plan to have bags containing datatypes 
other than tuples?

[Santhosh] I don't think so.

> Rework logical plan
> -------------------
>
>                 Key: PIG-158
>                 URL: https://issues.apache.org/jira/browse/PIG-158
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: logical_operators.patch, logical_operators_rev_1.patch, 
> logical_operators_rev_2.patch, logical_operators_rev_3.patch, 
> visitorWalker.patch
>
>
> Rework the logical plan in line with 
> http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.