Re: UDFs and types

Benjamin Reed Thu, 03 Jul 2008 08:44:23 -0700

You rock Pi!

It might be good to agree on best-fit rules. There are obvious ones: int
-> long and float -> double, but what about long -> int, long -> float,
and string -> float?


There are also the recursive fits, which might be purely theoretical:
should tuples of the form (long, {float}) fit to (double, {long}) or
(int, {long})? (That example might be invalid depending on the first
answer, but hopefully you get the idea.)
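
To make the question concrete, here is a rough sketch of what a recursive
fit check could look like. It is only an illustration under stated
assumptions: BestFitSketch, Type, Field and fits() are made-up names, not
Pig classes, and the rule table contains only the two "obvious" widenings
(int -> long, float -> double). The open cases (long -> int, long -> float,
string -> float) are simply rejected until rules are agreed.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Pig code base.
final class BestFitSketch {
    enum Type { INT, LONG, FLOAT, DOUBLE, CHARARRAY }

    /** A field is either a simple type or a nested bag/tuple of fields. */
    static final class Field {
        final Type type;           // set for simple fields, null for nested ones
        final List<Field> nested;  // set for nested fields, null for simple ones

        private Field(Type type, List<Field> nested) {
            this.type = type;
            this.nested = nested;
        }

        static Field of(Type t) { return new Field(t, null); }
        static Field bag(Field... fields) { return new Field(null, Arrays.asList(fields)); }
    }

    // Assumption: only the two uncontroversial widenings are allowed.
    private static final Map<Type, EnumSet<Type>> WIDENS_TO = new EnumMap<>(Type.class);
    static {
        WIDENS_TO.put(Type.INT, EnumSet.of(Type.LONG));
        WIDENS_TO.put(Type.FLOAT, EnumSet.of(Type.DOUBLE));
    }

    /** A simple type fits if it is identical or can be widened by one rule. */
    static boolean fits(Type actual, Type declared) {
        return actual == declared
            || WIDENS_TO.getOrDefault(actual, EnumSet.noneOf(Type.class)).contains(declared);
    }

    /** Simple fields compare by type; nested fields recurse into their contents. */
    static boolean fits(Field actual, Field declared) {
        if (actual.type != null || declared.type != null) {
            return actual.type != null && declared.type != null
                && fits(actual.type, declared.type);
        }
        return fits(actual.nested, declared.nested);
    }

    /** Two field lists fit if they have the same length and fit position by position. */
    static boolean fits(List<Field> actual, List<Field> declared) {
        if (actual.size() != declared.size()) {
            return false;
        }
        for (int i = 0; i < actual.size(); i++) {
            if (!fits(actual.get(i), declared.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumed rules, (int, {float}) fits (long, {double}), while
(long, {float}) fits neither (double, {long}) nor (int, {long}). Whether
that second result is the right answer is exactly the open question.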

ben

pi song wrote:
> +1 Agree.
>
> I will try to make "best fit" happen within 24 hours after you commit the
> new UDF design.
>
>
> On Thu, Jul 3, 2008 at 6:55 AM, Olga Natkovich <[EMAIL PROTECTED]> wrote:
>
>   
>> Sounds good to me.
>>
>> Olga
>>
>>     
>>> -----Original Message-----
>>> From: Alan Gates [mailto:[EMAIL PROTECTED]
>>> Sent: Wednesday, July 02, 2008 1:44 PM
>>> To: [email protected]
>>> Subject: UDFs and types
>>>
>>> With the introduction of types (see
>>> http://issues.apache.org/jira/browse/PIG-157) we need to
>>> decide how EvalFunc will interact with the types.  The
>>> original proposal was that the DEFINE keyword would be
>>> modified to allow specification of types for the UDF.  This
>>> has a couple of problems.  One, DEFINE is already used to
>>> specify constructor arguments.  Using it to also specify
>>> types will be confusing.  Two, it has been pointed out that
>>> this type information is a property of the UDF and should
>>> therefore be declared by the UDF, not in the script.
>>>
>>> Separately, as a way to allow simple function overloading, a
>>> change had been proposed to the EvalFunc interface to allow
>>> an EvalFunc to specify that for a given type, a different
>>> instance of EvalFunc should be used (see
>>> https://issues.apache.org/jira/browse/PIG-276).
>>>
>>> I would like to propose that we expand the changes in PIG-276
>>> to be more general.  Rather than adding classForType() as
>>> proposed in PIG-276, EvalFunc will instead add a function:
>>>
>>> public Map<Schema, FuncSpec> getArgToFuncMapping() {
>>>     return null;
>>> }
>>>
>>> Where FuncSpec is a new class that contains the name of the
>>> class that implements the UDF along with any necessary
>>> arguments for the constructor.
>>>
>>> The type checker will then, as part of type checking
>>> LOUserFunc, make a call to this function.  If it receives a
>>> null, it will simply leave the UDF as is, and make the
>>> assumption that the UDF can handle whatever datatype is being
>>> provided to it.  This will cover most existing UDFs, which
>>> will not override the default implementation.
>>>
>>> If a UDF wants to override the default, it should return a
>>> map that gives a FuncSpec for each type of schema that it can
>>> support.  For example, for the UDF concat, the map would have
>>> two entries:
>>> key: schema(chararray, chararray) value: StringConcat
>>> key: schema(bytearray, bytearray) value: ByteConcat
>>>
>>> The type checker will then take the schema of what is being
>>> passed to it and perform a lookup in the map.  If it finds an
>>> entry, it will use the associated FuncSpec.  If it does not,
>>> it will throw an exception saying that that EvalFunc cannot
>>> be used with those types.
>>>
>>> At this point, the type checker will make no effort to find a
>>> best fit function.  Either the fit is perfect, or it will not
>>> be done.  In the future we would like to modify the type
>>> checker to select a best fit.
>>> For example, if a UDF says it can handle schema(long) and the
>>> type checker finds it has schema(int), it can insert a cast
>>> to deal with that.  But in the first pass we will ignore this
>>> and depend on the user to insert the casts.
>>>
>>> Thoughts?
>>>
>>> Alan.
>>>
>>>       
>
>
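
For readers following the thread, the exact-match lookup Alan describes
could be sketched roughly as below. This is an illustration, not the
committed design: ExactMatchSketch, concatMapping() and resolve() are
hypothetical names, a schema is reduced to a list of type names, and a
FuncSpec is reduced to a class name. The real Schema and FuncSpec classes
carry more information (FuncSpec also holds constructor arguments).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins: schema = list of type names, FuncSpec = class name.
final class ExactMatchSketch {

    /** The two-entry mapping from the concat example in the proposal. */
    static Map<List<String>, String> concatMapping() {
        Map<List<String>, String> mapping = new HashMap<>();
        mapping.put(Arrays.asList("chararray", "chararray"), "StringConcat");
        mapping.put(Arrays.asList("bytearray", "bytearray"), "ByteConcat");
        return mapping;
    }

    /**
     * Picks the implementation for the given argument schema.
     * A null mapping means the UDF did not override the default, so it is
     * left as is. Otherwise the match must be exact; no casts are inserted.
     */
    static String resolve(String udf, Map<List<String>, String> mapping, List<String> args) {
        if (mapping == null) {
            return udf;
        }
        String spec = mapping.get(args);
        if (spec == null) {
            throw new IllegalArgumentException(
                udf + " cannot be used with argument types " + args);
        }
        return spec;
    }
}
```

With this sketch, concat called with (chararray, chararray) resolves to
StringConcat, a UDF with a null mapping is returned unchanged, and concat
called with (int, int) is rejected because no best-fit search is attempted.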
