[jira] Commented: (PIG-276) Allow UDFs to have different implementations based on input types

Alan Gates (JIRA) Thu, 19 Jun 2008 12:40:07 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606524#action_12606524
 ]


Alan Gates commented on PIG-276:
--------------------------------

Ideally we would like to support full function overloading for UDFs.  In the 
meantime, we need a way to allow some highly used UDFs to have separate 
implementations based on input types.  There are two reasons for this:

# Obeying the law of least astonishment.  Users don't expect SUM(int) to 
return a double.
# Performance.  Some crude testing showed that summing longs was 10x faster 
than summing doubles.  As some of these builtin functions are very frequently 
used, optimizing them is a worthwhile endeavor.

Based on discussions on PIG-162, I propose the following changes:

It will be possible to specify an implementation of EvalFunc for each type.  In 
the default implementation (such as SUM) there will be a method:

Class classForType(byte type); // uses DataType types

Given a type, this method will return the appropriate extension of EvalFunc to 
be used.  This will require the following changes:

# The EvalFunc class will need to have this method added.  It should have a 
default implementation that returns null.
# The type checker will need to be changed to call classForType as part of 
checking LOUserFunc.  If classForType returns anything other than null, the 
type checker will need to change mFuncSpec in LOUserFunc.  Currently, the 
parser does some checks on the function when it loads it (for example, making 
sure we can load the indicated class).  These checks should be factored out 
and put in LOUserFunc (or a helper class) so that the type checker can run the 
same checks after it swaps the function.  Also, LOUserFunc should change to 
keep a reference to the actual UDF (which the parser instantiates), so the 
type checker doesn't have to instantiate it again.
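
To make the two changes above concrete, here is a minimal sketch of how the 
hook and the swap could fit together.  None of this is Pig's actual code: 
EvalFunc, Sum and LongSum are simplified stand-ins, UserFuncNode is a 
hypothetical substitute for LOUserFunc, the Types constants are illustrative 
values rather than the real DataType constants, and resolve is an assumed name 
for the step the type checker would perform.

```java
import java.util.List;

/** Illustrative type codes; NOT the real DataType constants. */
final class Types {
    static final byte INTEGER = 1;
    static final byte LONG = 2;
    static final byte DOUBLE = 3;
    private Types() {}
}

/** Simplified stand-in for EvalFunc with the proposed hook. */
abstract class EvalFunc {
    abstract Object exec(List<Object> input);

    /** Default: null, meaning no type-specific implementation exists. */
    Class<? extends EvalFunc> classForType(byte type) {
        return null;
    }
}

/** Specialized implementation: sums as long. */
class LongSum extends EvalFunc {
    @Override
    Object exec(List<Object> input) {
        long total = 0;
        for (Object o : input) {
            total += ((Number) o).longValue();
        }
        return total;
    }
}

/** Default SUM: sums as double, but points int/long input at LongSum. */
class Sum extends EvalFunc {
    @Override
    Object exec(List<Object> input) {
        double total = 0;
        for (Object o : input) {
            total += ((Number) o).doubleValue();
        }
        return total;
    }

    @Override
    Class<? extends EvalFunc> classForType(byte type) {
        if (type == Types.INTEGER || type == Types.LONG) {
            return LongSum.class;
        }
        return null;
    }
}

/** Hypothetical stand-in for LOUserFunc: keeps the spec and the UDF instance. */
class UserFuncNode {
    String funcSpec;
    EvalFunc func;

    UserFuncNode(EvalFunc func) {
        this.func = func;
        this.funcSpec = func.getClass().getName();
    }
}

/** Assumed shape of the type checker step that swaps the function. */
final class TypeCheckerSketch {
    static void resolve(UserFuncNode node, byte inputType) {
        Class<? extends EvalFunc> specialized = node.func.classForType(inputType);
        if (specialized == null) {
            return; // keep the default implementation
        }
        try {
            // Same kind of check the parser does today: can the class be loaded?
            node.func = specialized.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + specialized.getName(), e);
        }
        node.funcSpec = specialized.getName();
    }
}
```

Because the node keeps the instantiated UDF, the sketch calls classForType on 
the existing instance and only instantiates a new class when a swap actually 
happens.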

As for builtins, we need to implement the following specialized functions:

|| External name || Input type || Output type || Mapped to || Comments ||
| SUM | long | long | longSum | will handle sum of ints too |
| SUM | double | double | doubleSum | will handle sum of floats too |
| MIN | int | int | intMin | |
| MIN | long | long | longMin | |
| MIN | float | float | floatMin | |
| MIN | double | double | doubleMin | |
| MIN | chararray | chararray | charMin | |
| MIN | bytearray | bytearray | byteMin | |
| MAX | int | int | intMax | |
| MAX | long | long | longMax | |
| MAX | float | float | floatMax | |
| MAX | double | double | doubleMax | |
| MAX | chararray | chararray | charMax | |
| MAX | bytearray | bytearray | byteMax | |
| AVG | long | double | longAvg | will handle avg of ints too |
| AVG | double | double | doubleAvg | will handle avg of floats too |
| concat | chararray | chararray | charConcat | new function to concatenate strings |
| concat | bytearray | bytearray | byteConcat | new function to concatenate bytearrays |
| size | bag | long | bagSize | returns number of tuples |
| size | tuple | long | tupleSize | returns number of elements |
| size | map | long | mapSize | returns number of keys |
| size | chararray | long | charSize | returns number of characters in chararray |
| size | bytearray | long | byteSize | returns number of bytes in bytearray |


The existing versions of SUM, MIN, MAX, and AVG will need to implement the 
classForType method.  Default versions of concat and size will need to be 
implemented that also implement the classForType method.  The default 
implementations of eval for these two new functions should just error out.
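
As a rough illustration of such a default, the sketch below shows a size 
function whose own evaluation method errors out and whose classForType maps 
input types to specialized classes.  It is a sketch under assumptions, not 
Pig's code: EvalFunc is a simplified stand-in, exec stands for the evaluation 
method, the Types constants are illustrative values rather than the real 
DataType constants, and only three of the five size variants are shown.

```java
import java.util.List;
import java.util.Map;

/** Illustrative type codes; NOT the real DataType constants. */
final class Types {
    static final byte CHARARRAY = 1;
    static final byte BYTEARRAY = 2;
    static final byte MAP = 3;
    static final byte DOUBLE = 4;
    private Types() {}
}

/** Simplified stand-in for EvalFunc with the proposed hook. */
abstract class EvalFunc {
    abstract Object exec(List<Object> input);

    Class<? extends EvalFunc> classForType(byte type) {
        return null;
    }
}

/** Number of characters in a chararray (modelled here as a String). */
class CharSize extends EvalFunc {
    @Override
    Object exec(List<Object> input) {
        return (long) ((String) input.get(0)).length();
    }
}

/** Number of bytes in a bytearray (modelled here as a byte[]). */
class ByteSize extends EvalFunc {
    @Override
    Object exec(List<Object> input) {
        return (long) ((byte[]) input.get(0)).length;
    }
}

/** Number of keys in a map. */
class MapSize extends EvalFunc {
    @Override
    Object exec(List<Object> input) {
        return (long) ((Map<?, ?>) input.get(0)).size();
    }
}

/** Default size: never evaluates itself, only dispatches by input type. */
class Size extends EvalFunc {
    @Override
    Object exec(List<Object> input) {
        throw new UnsupportedOperationException(
                "size has no implementation for this input type");
    }

    @Override
    Class<? extends EvalFunc> classForType(byte type) {
        switch (type) {
            case Types.CHARARRAY: return CharSize.class;
            case Types.BYTEARRAY: return ByteSize.class;
            case Types.MAP:       return MapSize.class;
            default:              return null; // unsupported: exec will error out
        }
    }
}
```

With this shape, an unsupported input type leaves the default in place, so the 
error surfaces when the function is evaluated; whether the type checker should 
instead reject it earlier is a design choice this sketch does not settle.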


> Allow UDFs to have different implementations based on input types
> -----------------------------------------------------------------
>
>                 Key: PIG-276
>                 URL: https://issues.apache.org/jira/browse/PIG-276
>             Project: Pig
>          Issue Type: Sub-task
>            Reporter: Alan Gates
>            Assignee: Pradeep Kamath
>


