[
https://issues.apache.org/jira/browse/PIG-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593602#action_12593602
]
Alan Gates commented on PIG-161:
--------------------------------
bq. 1) In POUserFunc.getNext(Integer) you appear to be doing some kind of cast
from Comparison func. There are a couple of problems with this. One, you don't
want an if instanceof in a getNext call. We can't afford to do that a million
times during processing data. Two, physical operators should never be doing
implicit casts. If we need a cast, it should inserted explicitly by the type
checker.
bq. Shubham>> POUserFunc needs to handle both the EvalFunc and Comparison func.
EvalFunc has the exec() method that does the processing while the Comparison
func has the compare() method. In case of Integers, I need to find out which of
the two functions I need to use and I do that using the if instanceof. I
understand this increases the no. of instructions. Can you pls suggest
something to work around this?? Do you think its a good idea to change the
Comparison func interface to have an exec() method which can then call the
compare method or not have a compare at all and have the exec() method do all
the processing for Comparison func as well??
The best solution is probably to create a subclass of POUserFunc
(POComparisonFunc maybe) that implements a getNext(Integer) specific to the
comparison func. The logical->physical translator can do the instanceof check
then and instantiate the right class. This avoids doing the instanceof on
every getNext().
bq. Also, the cast is because func is an reference of Object type. I check if
its a ComparisonFunc and then cast it to ComparisonFunc to access the compare
method.
Ignore my comments on the cast, I was thinking that compare returned a boolean
instead of an integer.
bq. 2) Several places in getNext are checking if func is null. The constructor
should instead guarantee that the function has been called and then no checks
should be done in getNext or anything it calls. This code is going to run once
for every record processed, so we want to remove every instruction we can from
it.
bq. Shubham>> Shravan had pointed out earlier that POUserFunc might not be
serializable because of EvalFunc not being serializable. So I had to declare
the Object func as transient. The null checking is to make sure that after
deserialization func is instantiated with EvalFunc/ComparisonFunc.
Ok, but I assume that after deserialization on the MR side, func only needs to
be instatiated once, but you are checking for it in a number of places. It
needs to only be checked once, preferably outside of the getNext loop if
possible.
> Rework physical plan
> --------------------
>
> Key: PIG-161
> URL: https://issues.apache.org/jira/browse/PIG-161
> Project: Pig
> Issue Type: Sub-task
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: arithmeticOperators.patch, incr2.patch, incr3.patch,
> incr4.patch, incr5.patch, MRCompilerTests_PlansAndOutputs.txt,
> Phy_AbsClass.patch, physicalOps.patch, physicalOps.patch, physicalOps.patch,
> podistinct.patch, pogenerate.patch, pogenerate.patch, pogenerate.patch,
> posort.patch
>
>
> This bug tracks work to rework all of the physical operators as described in
> http://wiki.apache.org/pig/PigTypesFunctionalSpec
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.