[
https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584351#action_12584351
]
Alan Gates commented on PIG-161:
--------------------------------
Comments on Shubham's pogenerate patch:
Overall:
Before we can commit this, you'll need to add some unit tests for each of these
operators and fill out the auto-generated methods like name,
supportsMultipleInputs, visit, etc.
POGenerate:
Where possible, we should use native types rather than objects. For example,
on line 113 you declare restartIts to be a Boolean instead of boolean. This
requires an
unnecessary heap allocation and conversions between objects and native types on
the part of the JVM.
I don't understand the logic in getNext(Tuple). I find in a function like this
with a complicated algorithm that it often helps to put an overview of the
function and
what it does in a header comment, and then give detailed comments at each step
of the algorithm. In particular I don't follow the code between lines
206-250, or why we need the RewindableBagIterator.
A couple of comments on what I think I understand in getNext(Tuple):
1 I don't see logic to handle flattening of tuples. So if if I have a
script like: b = group a by ($0, $1); c = foreach b generate flatten(group),
COUNT($1) then
the resulting tuples coming out of the foreach should have 3 fields, since
group was a tuple of 2 fields.
2 Lines 186-195, if I understand this correctly, you are sticking all
non-tuple/non-bag values in a bag with one tuple which has one field. Why?
The old code did
stuff like this but this is exactly what we want to get away from. You
should be able to stick these results directly into the target tuple that you
will eventually
return, if there is no flattening to be done. If there is flattening to be
done the tuple you stick the results into will serve as the prototype for other
tuples
you will generate. But there's still no need to create that extra bag and
tuple to put the result in.
So I'm thinking the logic will look something like this:
{code:borderStyle=solid}
if (previous cross product to return results from) return next cross product
result;
Tuple target = new Tuple(size(inputs));
foreach (i: input) {
target[i] = input[i].getNext();
}
if (no flattening) return target;
else {
if (tuple flattening) remap target to expand flattened tuples;
if (bag flattening) {
remember cross product we're doing;
return first tuple from cross product;
}
}
{code}
I think you're doing most of this, with the expection of the two points I made
above.
What is createFile for?
Its very confusing that getRootOperators calls getLeaves on the contained
plans. I think it's doing the right thing, but at first glance the names
through me.
Arithmetic operators:
The code should not be swallowing error returns from its predecessors and
passing nulls. It should instead pass errors along.
What is the value of the ArithmeticOperator class? How do arithmetic operators
differ from any other binary operator such that there should be a class to
represent them?
We need to add a MOD operator for int and long too.
Doesn't this code make you wish you had a functional language so you didn't
have to write the same classes each time for each operator? :)
> Rework physical plan
> --------------------
>
> Key: PIG-161
> URL: https://issues.apache.org/jira/browse/PIG-161
> Project: Pig
> Issue Type: Sub-task
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: Phy_AbsClass.patch, pogenerate.patch
>
>
> This bug tracks work to rework all of the physical operators as described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.