[ 
https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593602#action_12593602
 ] 

Alan Gates commented on PIG-161:
--------------------------------

bq. 1) In POUserFunc.getNext(Integer) you appear to be doing some kind of cast 
from Comparison func. There are a couple of problems with this. One, you don't 
want an if instanceof in a getNext call. We can't afford to do that a million 
times during processing data. Two, physical operators should never be doing 
implicit casts. If we need a cast, it should inserted explicitly by the type 
checker.

bq. Shubham>> POUserFunc needs to handle both the EvalFunc and Comparison func. 
EvalFunc has the exec() method that does the processing while the Comparison 
func has the compare() method. In case of Integers, I need to find out which of 
the two functions I need to use and I do that using the if instanceof. I 
understand this increases the no. of instructions. Can you pls suggest 
something to work around this?? Do you think its a good idea to change the 
Comparison func interface to have an exec() method which can then call the 
compare method or not have a compare at all and have the exec() method do all 
the processing for Comparison func as well??

The best solution is probably to create a subclass of POUserFunc 
(POComparisonFunc maybe) that implements a getNext(Integer) specific to the 
comparison func.  The logical->physical translator can do the instanceof check 
then and instantiate the right class.  This avoids doing the instanceof on 
every getNext().

bq. Also, the cast is because func is an reference of Object type. I check if 
its a ComparisonFunc and then cast it to ComparisonFunc to access the compare 
method.

Ignore my comments on the cast, I was thinking that compare returned a boolean 
instead of an integer.

bq. 2) Several places in getNext are checking if func is null. The constructor 
should instead guarantee that the function has been called and then no checks 
should be done in getNext or anything it calls. This code is going to run once 
for every record processed, so we want to remove every instruction we can from 
it.

bq. Shubham>> Shravan had pointed out earlier that POUserFunc might not be 
serializable because of EvalFunc not being serializable. So I had to declare 
the Object func as transient. The null checking is to make sure that after 
deserialization func is instantiated with EvalFunc/ComparisonFunc.

Ok, but I assume that after deserialization on the MR side, func only needs to 
be instatiated once, but you are checking for it in a number of places.  It 
needs to only be checked once, preferably outside of the getNext loop if 
possible.





> Rework physical plan
> --------------------
>
>                 Key: PIG-161
>                 URL: https://issues.apache.org/jira/browse/PIG-161
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: arithmeticOperators.patch, incr2.patch, incr3.patch, 
> incr4.patch, incr5.patch, MRCompilerTests_PlansAndOutputs.txt, 
> Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch, 
> podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch, 
> posort.patch
>
>
> This bug tracks work to rework all of the physical operators as described in 
> http://wiki.apache.org/pig/PigTypesFunctionalSpec

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to