Volodymyr Vysotskyi created DRILL-7337:
------------------------------------------

             Summary: Add vararg UDFs support
                 Key: DRILL-7337
                 URL: https://issues.apache.org/jira/browse/DRILL-7337
             Project: Apache Drill
          Issue Type: Sub-task
    Affects Versions: 1.16.0
            Reporter: Volodymyr Vysotskyi
            Assignee: Volodymyr Vysotskyi
             Fix For: 1.17.0


The aim of this Jira is to add support for vararg UDFs, which simplifies UDF 
creation when a function must accept a varying number of arguments.
h2. Requirements for vararg UDFs:
 * It should be possible to register vararg UDFs with the same name, but with 
different argument types;
 * Only vararg UDFs with a single variable-length argument placed after all 
other arguments should be allowed;
 * A vararg UDF should have lower priority than a regular one when both are 
suitable for a call;
 * Besides simple functions, vararg support should be added to the aggregate 
functions.
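
The placement rule above (exactly one variable-length argument, declared last) could be checked roughly as follows. This is only a sketch under stated assumptions: {{ParamSketch}} and {{VarArgValidationSketch}} are hypothetical names used for illustration and are not Drill classes, and the real check in {{FunctionConverter}} may look quite different.

```java
import java.util.List;

/** Hypothetical parameter descriptor used only for this sketch; not a Drill class. */
final class ParamSketch {
  final String name;
  final boolean isArray; // true when the field is declared as an array, e.g. VarCharHolder[]

  ParamSketch(String name, boolean isArray) {
    this.name = name;
    this.isArray = isArray;
  }
}

/** Hypothetical validator illustrating the "single vararg, placed last" requirement. */
final class VarArgValidationSketch {

  /**
   * Returns true when the parameter list has exactly one array parameter
   * and that parameter is the last one in the list.
   */
  static boolean isValidVarArgSignature(List<ParamSketch> params) {
    if (params.isEmpty()) {
      return false; // a vararg UDF needs at least the vararg parameter itself
    }
    int last = params.size() - 1;
    for (int i = 0; i < params.size(); i++) {
      boolean shouldBeArray = (i == last);
      if (params.get(i).isArray != shouldBeArray) {
        return false; // array in a non-final position, or no array at the end
      }
    }
    return true;
  }
}
```

With this check, a signature such as (separator, inputs[]) is accepted, while (inputs[], separator) or (first[], second[]) is rejected.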

h2. Implementation details

The lifecycle of a UDF is as follows:
 * The UDF is validated in the {{FunctionConverter}} class. If validation 
passes (the UDF has the required fields with the required types, the required 
annotations, etc.), it is converted to a {{DrillFuncHolder}} and registered 
in the function registry. Corresponding {{SqlFunction}} instances are also 
created from the {{DrillFuncHolder}} to be used in Calcite;
 * When a query uses this UDF, Calcite validates that a UDF with the required 
name, number of arguments and argument types exists (for Drill, argument types 
are not checked at this stage);
 * Once Calcite has found the required {{SqlFunction}} instance, it 
uses Drill to find the required {{DrillFuncHolder}}. All the work of determining 
the most suitable function is done in {{FunctionResolver}} and in 
{{TypeCastRules.getCost()}};
 * At the execution stage, the {{DrillFuncHolder}} is found again using the 
{{FunctionCall}} instance;
 * The {{DrillFuncHolder}} is used for code generation.

Considering these steps, the first thing to be done for adding support for 
vararg UDFs is updating logic in {{FunctionConverter}} to allow registering 
vararg UDFs taking into account requirements declared above.

Calcite uses {{SqlOperandTypeChecker}} to verify the number of arguments, so 
Drill should provide its own checker for vararg UDFs so that they can be used. 
To determine whether a UDF is vararg, a new {{isVarArg}} property will be added 
to the {{FunctionTemplate}}.

{{TypeCastRules.getCost()}} method should be updated to be able to find vararg 
UDFs and prioritize regular UDFs.
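
One way to picture this prioritization is a cost function in which a matching vararg candidate always costs more than a matching regular one. The sketch below is an illustration only: {{CandidateSketch}}, {{ResolverSketch}} and the penalty value are hypothetical, argument types are ignored for brevity, and none of this is taken from the actual {{TypeCastRules.getCost()}} code.

```java
import java.util.List;

/** Hypothetical function candidate; not a Drill class. */
final class CandidateSketch {
  final String label;
  final int fixedArgs;  // number of non-vararg parameters
  final boolean varArg; // true when a trailing variable-length parameter is declared

  CandidateSketch(String label, int fixedArgs, boolean varArg) {
    this.label = label;
    this.fixedArgs = fixedArgs;
    this.varArg = varArg;
  }
}

/** Hypothetical resolver that prefers regular functions over vararg ones. */
final class ResolverSketch {
  static final int NO_MATCH = -1;
  static final int VARARG_PENALTY = 1000; // arbitrary value chosen for this sketch

  /** Cost of calling the candidate with the given number of arguments, or NO_MATCH. */
  static int cost(CandidateSketch candidate, int callArgCount) {
    if (candidate.varArg) {
      // The vararg part may be empty, so any count >= fixedArgs is accepted.
      return callArgCount >= candidate.fixedArgs ? VARARG_PENALTY : NO_MATCH;
    }
    return callArgCount == candidate.fixedArgs ? 0 : NO_MATCH;
  }

  /** Returns the cheapest matching candidate, or null when nothing matches. */
  static CandidateSketch resolve(List<CandidateSketch> candidates, int callArgCount) {
    CandidateSketch best = null;
    int bestCost = Integer.MAX_VALUE;
    for (CandidateSketch candidate : candidates) {
      int cost = cost(candidate, callArgCount);
      if (cost != NO_MATCH && cost < bestCost) {
        best = candidate;
        bestCost = cost;
      }
    }
    return best;
  }
}
```

In this sketch, a two-argument call resolves to a regular two-argument function even when a vararg function with the same name also matches, while a three-argument call falls through to the vararg function.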

Code generation logic should be updated to handle vararg UDFs. The generated 
code for a vararg argument will look like this:
{code:java}
NullableVarCharHolder[] inputs = new NullableVarCharHolder[3];
inputs[0] = out14;
inputs[1] = out19;
inputs[2] = out24;
{code}
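
To make the shape of that generated code concrete, the following sketch builds the same statements from a holder type, an array name and the names of the already evaluated argument variables. It is plain string concatenation written for illustration; {{VarArgCodegenSketch}} is a hypothetical name, and Drill's actual code generator is not assumed to work this way.

```java
import java.util.List;

/** Hypothetical helper that emits the array declaration for a vararg parameter. */
final class VarArgCodegenSketch {

  /**
   * Emits one array declaration sized to the number of arguments,
   * followed by one assignment per argument variable.
   */
  static String declareVarArgArray(String holderType, String arrayName, List<String> argVars) {
    StringBuilder code = new StringBuilder();
    code.append(holderType).append("[] ").append(arrayName)
        .append(" = new ").append(holderType)
        .append("[").append(argVars.size()).append("];\n");
    for (int i = 0; i < argVars.size(); i++) {
      code.append(arrayName).append("[").append(i).append("] = ")
          .append(argVars.get(i)).append(";\n");
    }
    return code.toString();
  }
}
```

Calling {{declareVarArgArray("NullableVarCharHolder", "inputs", ...)}} with the variables {{out14}}, {{out19}} and {{out24}} yields the four statements shown above; an empty argument list yields only a zero-length array declaration.
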
To create a custom vararg UDF, the new {{isVarArg}} property should be set to 
{{true}} in {{FunctionTemplate}}. After that, the vararg input should be 
declared as an array.

Here is an example of a vararg UDF:
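{code:java}
import io.netty.buffer.DrillBuf;
import javax.inject.Inject;
import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.VarCharHolder;

@FunctionTemplate(name = "concat_varchar",
                  isVarArg = true,
                  scope = FunctionTemplate.FunctionScope.SIMPLE)
public class VarCharConcatFunction implements DrillSimpleFunc {
  // The vararg input is declared as an array of value holders.
  @Param VarCharHolder[] inputs;
  @Output VarCharHolder out;
  @Inject DrillBuf buffer;

  @Override
  public void setup() {
  }

  @Override
  public void eval() {
    // Compute the total length of all inputs to size the output buffer.
    int length = 0;
    for (VarCharHolder input : inputs) {
      length += input.end - input.start;
    }
    out.buffer = buffer = buffer.reallocIfNeeded(length);
    out.start = out.end = 0;
    // Copy every input byte by byte into the output buffer.
    for (VarCharHolder input : inputs) {
      for (int id = input.start; id < input.end; id++) {
        out.buffer.setByte(out.end++, input.buffer.getByte(id));
      }
    }
  }
}
{code}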
h2. Limitations connected with vararg UDFs:
 * The nulls handling specified in {{FunctionTemplate}} does not affect vararg 
parameters, i.e. the user should add separate UDFs with non-nullable and 
nullable value holder vararg fields;
 * For value holder vararg fields, vararg UDFs support only values of the same 
type, including nullability. If the vararg field is a {{FieldReader}}, the UDF 
implementation is fully responsible for handling the types and nullability of 
the vararg inputs;
 * Scalar replacement does not happen for vararg arguments;
 * The UDF implementation should handle the case when the vararg field is empty.


