What about using annotations for this? Could we create an annotation, say @UDF, that allowed us to specify an input schema?
I imagine you could put quite a bit of information into the annotation, such as the function name, input args, return type, etc.

On Wed, Jul 2, 2008 at 3:43 PM, Alan Gates <[EMAIL PROTECTED]> wrote:

> With the introduction of types (see
> http://issues.apache.org/jira/browse/PIG-157) we need to decide how
> EvalFunc will interact with the types. The original proposal was that the
> DEFINE keyword would be modified to allow specification of types for the
> UDF. This has a couple of problems. One, DEFINE is already used to
> specify constructor arguments; using it to also specify types would be
> confusing. Two, it has been pointed out that this type information is a
> property of the UDF and should therefore be declared by the UDF, not in
> the script.
>
> Separately, as a way to allow simple function overloading, a change had
> been proposed to the EvalFunc interface to allow an EvalFunc to specify
> that for a given type, a different instance of EvalFunc should be used
> (see https://issues.apache.org/jira/browse/PIG-276).
>
> I would like to propose that we expand the changes in PIG-276 to be more
> general. Rather than adding classForType() as proposed in PIG-276,
> EvalFunc will instead add a function:
>
>     public Map<Schema, FuncSpec> getArgToFuncMapping() {
>         return null;
>     }
>
> where FuncSpec is a new class that contains the name of the class that
> implements the UDF along with any necessary arguments for the
> constructor.
>
> The type checker will then, as part of type checking LOUserFunc, make a
> call to this function. If it receives a null, it will simply leave the
> UDF as is and assume that the UDF can handle whatever datatype is being
> provided to it. This will cover most existing UDFs, which will not
> override the default implementation.
>
> If a UDF wants to override the default, it should return a map that gives
> a FuncSpec for each type of schema that it can support. For example, for
> the UDF concat, the map would have two entries:
>
>     key: schema(chararray, chararray)    value: StringConcat
>     key: schema(bytearray, bytearray)    value: ByteConcat
>
> The type checker will then take the schema of what is being passed to it
> and perform a lookup in the map. If it finds an entry, it will use the
> associated FuncSpec. If it does not, it will throw an exception saying
> that that EvalFunc cannot be used with those types.
>
> At this point, the type checker will make no effort to find a best-fit
> function. Either the fit is perfect, or it will not be done. In the
> future we would like to modify the type checker to select a best fit.
> For example, if a UDF says it can handle schema(long) and the type
> checker finds it has schema(int), it can insert a cast to deal with
> that. But in the first pass we will ignore this and depend on the user
> to insert the casts.
>
> Thoughts?
>
> Alan.
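To make the quoted proposal concrete, here is a minimal, self-contained sketch of the getArgToFuncMapping() hook for the concat example. Schema and FuncSpec below are simplified stand-ins (plain Java records), not Pig's real classes, and Concat is a hypothetical UDF front-end class:

```java
import java.util.List;
import java.util.Map;

// Simplified stand-ins for Pig's Schema and FuncSpec, for illustration
// only; the real classes carry more information (constructor args, etc.).
record Schema(List<String> fieldTypes) { }
record FuncSpec(String className) { }

// Hypothetical UDF front-end class implementing the proposed hook.
class Concat {
    // Returning null (the proposed default in EvalFunc) would mean
    // "accepts any input type"; returning a map lets the type checker
    // pick an implementation by exact schema match.
    public Map<Schema, FuncSpec> getArgToFuncMapping() {
        return Map.of(
            new Schema(List.of("chararray", "chararray")),
            new FuncSpec("StringConcat"),
            new Schema(List.of("bytearray", "bytearray")),
            new FuncSpec("ByteConcat"));
    }
}
```

The type checker's side is then just an exact map lookup on the input schema: a hit yields the FuncSpec to instantiate, and a miss is a type error, since in the first pass there is no best-fit casting.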
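On the annotation idea from the top of the thread: a rough sketch of what such a declaration could look like. The @UDF annotation, its fields, and the StringConcat class below are all hypothetical, nothing that exists in Pig; the point is that the type checker could read the same information reflectively instead of calling a method on an EvalFunc instance:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation letting a UDF declare its types declaratively.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UDF {
    String name();           // function name as used in scripts
    String[] inputSchema();  // e.g. {"chararray", "chararray"}
    String returnType();     // e.g. "chararray"
}

// Example: a UDF class carrying its own type information.
@UDF(name = "CONCAT",
     inputSchema = {"chararray", "chararray"},
     returnType = "chararray")
class StringConcat { }
```

One downside versus the getArgToFuncMapping() approach: a single annotation instance cannot easily express several alternative input schemas mapping to different implementing classes.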
