[ 
https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584351#action_12584351
 ] 

Alan Gates commented on PIG-161:
--------------------------------

Comments on Shubham's pogenerate patch:

Overall:

Before we can commit this, you'll need to add some unit tests for each of these 
operators and fill out the auto-generated methods like name, 
supportsMultipleInputs, visit, etc.

POGenerate:
Where possible, we should use native types rather than objects.  For example, 
on line 113 you declare restartIts to be a Boolean instead of boolean.  This 
requires an
unnecessary heap allocation and conversions between objects and native types on 
the part of the JVM.

I don't understand the logic in getNext(Tuple).  I find in a function like this 
with a complicated algorithm that it often helps to put an overview of the 
function and
what it does in a header comment, and then give detailed comments at each step 
of the algorithm.  In particular I don't follow the code between lines
206-250, or why we need the RewindableBagIterator.

A couple of comments on what I think I understand in getNext(Tuple):

    1 I don't see logic to handle flattening of tuples.  So if if I have a 
script like:  b = group a by ($0, $1); c = foreach b generate flatten(group), 
COUNT($1) then
    the resulting tuples coming out of the foreach should have 3 fields, since 
group was a tuple of 2 fields.

    2 Lines 186-195, if I understand this correctly, you are sticking all 
non-tuple/non-bag values in a bag with one tuple which has one field.  Why?  
The old code did
    stuff like this but this is exactly what we want to get away from.  You 
should be able to stick these results directly into the target tuple that you 
will eventually
    return, if there is no flattening to be done.  If there is flattening to be 
done the tuple you stick the results into will serve as the prototype for other 
tuples
    you will generate.  But there's still no need to create that extra bag and 
tuple to put the result in.
    
So I'm thinking the logic will look something like this:

{code:borderStyle=solid}

if (previous cross product to return results from) return next cross product 
result;

Tuple target = new Tuple(size(inputs));
foreach (i: input) {
    target[i] = input[i].getNext();
}

if (no flattening) return target;
else {
    if (tuple flattening) remap target to expand flattened tuples;

    if (bag flattening) {
        remember cross product we're doing;
        return first tuple from cross product;
    }
}

{code}

I think you're doing most of this, with the expection of the two points I made 
above.


What is createFile for?

Its very confusing that getRootOperators calls getLeaves on the contained 
plans.  I think it's doing the right thing, but at first glance the names 
through me.

Arithmetic operators:

The code should not be swallowing error returns from its predecessors and 
passing nulls.  It should instead pass errors along.  

What is the value of the ArithmeticOperator class?  How do arithmetic operators 
differ from any other binary operator such that there should be a class to 
represent them?

We need to add a MOD operator for int and long too.

Doesn't this code make you wish you had a functional language so you didn't 
have to write the same classes each time for each operator? :)


> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: Phy_AbsClass.patch, pogenerate.patch
>
>
> This bug tracks work to rework all of the physical operators as described in 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to