[
https://issues.apache.org/jira/browse/PIG-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590935#action_12590935
]
Pi Song commented on PIG-158:
-----------------------------
Comments on the parser changes patch (Only for Operators because I think this
has to be done right first):-
1) So seems like every operator will handle its own output schema right? That
makes it easier for me.
2) in LOProject.getSchema() :-
{code}
for (int colNum : mProjection) {
if (null != mExp) {
log.info("mExp is not null");
try {
fss.add(mExp.getSchema().getField(colNum));
} catch (Exception e) {
mSchema = null;
mIsSchemaComputed = false;
throw new IOException(e.getMessage());
}
} else {
log.info("mExp is null");
fss.add(new Schema.FieldSchema(null,
DataType.BYTEARRAY));
}
}
{code}
mExp is an ExpressionOperator. From what you said before .getSchema() will
always return null except if we're dealing with tuple operators. I think this
will make more sense if mExp becomes RelationOperator (We only do projection
operations on relational operators, don't we?)
3) LOSplitOutput
{code}
public Schema getSchema() throws IOException{
if (!mIsSchemaComputed && (null == mSchema)) {
// get our parent's schema
Collection<LogicalOperator> s = mPlan.getSuccessors(this);
try {
LogicalOperator op = s.iterator().next();
if (null == op) {
throw new IOException("Could not find operator in plan");
}
mSchema = s.iterator().next().getSchema();
mIsSchemaComputed = true;
} catch (IOException ioe) {
mSchema = null;
mIsSchemaComputed = false;
throw ioe;
}
}
return mSchema;
}
{code}
shouldn't the output schema be the same as the input schema ? (So, use
mPlan.getPredecessors()). Typing is done by induction so you're not supposed to
use type from downstream operator to infer type in upstream operator.
4) In think LOSort should have "List<LogicalPlan> cols" because we should be
able to sort by more than one field and each field can be nested expressions
5) LOCross also uses mPlan.getSuccessors()
6) LOSplit should have "List<LogicalPlan> conds" because we should be able to
split be nested expressions
7) All "private Log log = LogFactory.getLog(getClass());" should be made static
8) LogicalOperator :-
{code}
* A boolean variable to remember if input has to be flattened Used only in
* the context of generate
*/
private boolean mIsFlatten = false;
{code}
So, shouldn't this be in LOGenerate ?
9) I still think we should have LORelationalOperator superclass that has
getSchema()/setSchema() and remove those two methods from ExpressionOperator.
10) LOForEach should have "List<LogicalPlan> evals" because each output column
should be calculated used nested expressions.
11) LOCogroup should have "ArrayList<LogicalPlan> mGroupByCols", same reason as
above.
12) What is LORegexp for? "Regex match" or "Regex replace" ? If it's "Regex
match" so getType() should return DataType.BOOLEAN
PS. A lot of things, right? Possibly you're right but my understanding is
wrong. If so, please give me some clues :). Keep doing good work, mate!!
> Rework logical plan
> -------------------
>
> Key: PIG-158
> URL: https://issues.apache.org/jira/browse/PIG-158
> Project: Pig
> Issue Type: Sub-task
> Components: impl
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: logical_operators.patch, logical_operators_rev_1.patch,
> logical_operators_rev_2.patch, logical_operators_rev_3.patch,
> parser_changes.patch, ParserErrors.txt, visitorWalker.patch
>
>
> Rework the logical plan in line with
> http://wiki.apache.org/pig/PigExecutionModel
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.