+1, agreed. I will try to get "best fit" working within 24 hours after you commit the new UDF design.
On Thu, Jul 3, 2008 at 6:55 AM, Olga Natkovich <[EMAIL PROTECTED]> wrote:
> Sounds good to me.
>
> Olga
>
> -----Original Message-----
> > From: Alan Gates [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 02, 2008 1:44 PM
> > To: [email protected]
> > Subject: UDFs and types
> >
> > With the introduction of types (see
> > http://issues.apache.org/jira/browse/PIG-157) we need to
> > decide how EvalFunc will interact with the types. The
> > original proposal was that the DEFINE keyword would be
> > modified to allow specification of types for the UDF. This
> > has a couple of problems. One, DEFINE is already used to
> > specify constructor arguments. Using it to also specify
> > types will be confusing. Two, it has been pointed out that
> > this type information is a property of the UDF and should
> > therefore be declared by the UDF, not in the script.
> >
> > Separately, as a way to allow simple function overloading, a
> > change had been proposed to the EvalFunc interface to allow
> > an EvalFunc to specify that for a given type, a different
> > instance of EvalFunc should be used (see
> > https://issues.apache.org/jira/browse/PIG-276).
> >
> > I would like to propose that we expand the changes in PIG-276
> > to be more general. Rather than adding classForType() as
> > proposed in PIG-276, EvalFunc will instead add a function:
> >
> > public Map<Schema, FuncSpec> getArgToFuncMapping() {
> >     return null;
> > }
> >
> > Where FuncSpec is a new class that contains the name of the
> > class that implements the UDF along with any necessary
> > arguments for the constructor.
> >
> > The type checker will then, as part of type checking
> > LOUserFunc, make a call to this function. If it receives a
> > null, it will simply leave the UDF as is, and make the
> > assumption that the UDF can handle whatever datatype is being
> > provided to it. This will cover most existing UDFs, which
> > will not override the default implementation.
> >
> > If a UDF wants to override the default, it should return a
> > map that gives a FuncSpec for each type of schema that it can
> > support. For example, for the UDF concat, the map would have
> > two entries:
> >   key: schema(chararray, chararray)  value: StringConcat
> >   key: schema(bytearray, bytearray)  value: ByteConcat
> >
> > The type checker will then take the schema of what is being
> > passed to it and perform a lookup in the map. If it finds an
> > entry, it will use the associated FuncSpec. If it does not,
> > it will throw an exception saying that that EvalFunc cannot
> > be used with those types.
> >
> > At this point, the type checker will make no effort to find a
> > best fit function. Either the fit is perfect, or it will not
> > be done. In the future we would like to modify the type
> > checker to select a best fit.
> > For example, if a UDF says it can handle schema(long) and the
> > type checker finds it has schema(int), it can insert a cast
> > to deal with that. But in the first pass we will ignore this
> > and depend on the user to insert the casts.
> >
> > Thoughts?
> >
> > Alan.
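The resolution rule in Alan's proposal can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch and not Pig's actual code: `DataType`, the use of a `List<DataType>` as a stand-in for Pig's `Schema`, the trimmed-down `EvalFunc` and `FuncSpec` classes, the example UDF `Upper`, and `TypeChecker.resolve` are all written only for illustration. It covers the three cases described above: a null mapping leaves the UDF as is, an exact schema match selects the mapped FuncSpec, and anything else throws.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Pig's field types; only what the sketch needs.
enum DataType { CHARARRAY, BYTEARRAY, INTEGER, LONG }

// Hypothetical FuncSpec as described in the proposal: the name of the
// implementing class plus any constructor arguments.
final class FuncSpec {
    final String className;
    final List<String> ctorArgs;

    FuncSpec(String className, String... ctorArgs) {
        this.className = className;
        this.ctorArgs = Collections.unmodifiableList(Arrays.asList(ctorArgs));
    }
}

// Simplified stand-in for EvalFunc carrying only the proposed hook.
// The argument schema is modelled as a list of field types (assumption).
abstract class EvalFunc {
    // Default: no mapping, so the type checker leaves the UDF as is.
    public Map<List<DataType>, FuncSpec> getArgToFuncMapping() {
        return null;
    }
}

// Hypothetical existing UDF that keeps the default implementation.
class Upper extends EvalFunc { }

// The concat example from the proposal: one entry per supported schema.
class Concat extends EvalFunc {
    @Override
    public Map<List<DataType>, FuncSpec> getArgToFuncMapping() {
        Map<List<DataType>, FuncSpec> m = new HashMap<>();
        m.put(Arrays.asList(DataType.CHARARRAY, DataType.CHARARRAY), new FuncSpec("StringConcat"));
        m.put(Arrays.asList(DataType.BYTEARRAY, DataType.BYTEARRAY), new FuncSpec("ByteConcat"));
        return m;
    }
}

final class TypeChecker {
    // Picks the implementation for the given input schema.
    // Exact match only: no casts are inserted (no "best fit" yet).
    static FuncSpec resolve(EvalFunc udf, List<DataType> inputSchema) {
        Map<List<DataType>, FuncSpec> mapping = udf.getArgToFuncMapping();
        if (mapping == null) {
            // Keep the UDF itself and assume it accepts any input types.
            return new FuncSpec(udf.getClass().getName());
        }
        FuncSpec spec = mapping.get(inputSchema);
        if (spec == null) {
            throw new IllegalArgumentException(
                udf.getClass().getName() + " cannot be used with types " + inputSchema);
        }
        return spec;
    }
}

class UdfMappingDemo {
    public static void main(String[] args) {
        List<DataType> strings = Arrays.asList(DataType.CHARARRAY, DataType.CHARARRAY);
        System.out.println(TypeChecker.resolve(new Upper(), strings).className);  // Upper
        System.out.println(TypeChecker.resolve(new Concat(), strings).className); // StringConcat
        try {
            TypeChecker.resolve(new Concat(), Arrays.asList(DataType.INTEGER, DataType.INTEGER));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because matching is exact, `Concat` with an (int, int) input fails rather than falling back to some other entry. A later best-fit pass, such as matching schema(int) against an entry for schema(long) by inserting a cast, would hook in at the point where a missed lookup currently throws.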
