[jira] Commented: (PIG-158) Rework logical plan

Pi Song (JIRA) Mon, 21 Apr 2008 07:50:42 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590935#action_12590935
 ]


Pi Song commented on PIG-158:
-----------------------------

Comments on the parser changes patch (Only for Operators because I think this 
has to be done right first):-

1) So seems like every operator will handle its own output schema right? That 
makes it easier for me.

2) in LOProject.getSchema() :-
{code}
                for (int colNum : mProjection) {
                    if (null != mExp) {
                        log.info("mExp is not null");
                        try {
                            fss.add(mExp.getSchema().getField(colNum));
                        } catch (Exception e) {
                            mSchema = null;
                            mIsSchemaComputed = false;
                            throw new IOException(e.getMessage());
                        }
                    } else {
                        log.info("mExp is null");
                        fss.add(new Schema.FieldSchema(null, 
DataType.BYTEARRAY));
                    }
                }
{code}
mExp is an ExpressionOperator. From what you said before .getSchema() will 
always return null except if we're dealing with tuple operators. I think this 
will make more sense if mExp becomes RelationOperator (We only do projection 
operations on relational operators, don't we?)

3) LOSplitOutput
{code}
    public Schema getSchema() throws IOException{
        if (!mIsSchemaComputed && (null == mSchema)) {
            // get our parent's schema
            Collection<LogicalOperator> s = mPlan.getSuccessors(this);
            try {
                LogicalOperator op = s.iterator().next();
                if (null == op) {
                    throw new IOException("Could not find operator in plan");
                }
                mSchema = s.iterator().next().getSchema();
                mIsSchemaComputed = true;
            } catch (IOException ioe) {
                mSchema = null;
                mIsSchemaComputed = false;
                throw ioe;
            }
        }
        return mSchema;
    }
{code}
shouldn't the output schema be the same as the input schema ? (So, use 
mPlan.getPredecessors()). Typing is done by induction so you're not supposed to 
use type from downstream operator to infer type in upstream operator.

4) In think LOSort should have "List<LogicalPlan> cols" because we should be 
able to sort by more than one field and each field can be nested expressions

5) LOCross also uses mPlan.getSuccessors()

6) LOSplit should have "List<LogicalPlan> conds" because we should be able to 
split be nested expressions

7) All "private Log log = LogFactory.getLog(getClass());" should be made static

8) LogicalOperator :-
{code}
     * A boolean variable to remember if input has to be flattened Used only in
     * the context of generate
     */
    private boolean mIsFlatten = false;
{code}
So, shouldn't this be in LOGenerate ?

9) I still think we should have LORelationalOperator superclass that has 
getSchema()/setSchema() and remove those two methods from ExpressionOperator.

10) LOForEach should have "List<LogicalPlan> evals" because each output column 
should be calculated used nested expressions.

11) LOCogroup should have "ArrayList<LogicalPlan> mGroupByCols", same reason as 
above.

12) What is LORegexp for? "Regex match" or "Regex replace" ? If it's "Regex 
match" so getType() should return DataType.BOOLEAN

PS. A lot of things, right? Possibly you're right but my understanding is 
wrong. If so, please give me some clues :). Keep doing good work, mate!!



> Rework logical plan
> -------------------
>
>                 Key: PIG-158
>                 URL: https://issues.apache.org/jira/browse/PIG-158
>             Project: Pig
>          Issue Type: Sub-task
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: logical_operators.patch, logical_operators_rev_1.patch, 
> logical_operators_rev_2.patch, logical_operators_rev_3.patch, 
> parser_changes.patch, ParserErrors.txt, visitorWalker.patch
>
>
> Rework the logical plan in line with 
> http://wiki.apache.org/pig/PigExecutionModel

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-158) Rework logical plan

Reply via email to