Recently, while writing a custom generic Hive UDF, I came across
unexpected behavior in the evaluate method. The UDF increments a
counter and writes it to a file. When I execute it directly, without
involving any table, it always returns one count too many, i.e. 2 instead of 1.
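For reference, the counting logic looks roughly like the sketch below. It is a simplified stand-in with the Hive types stripped out so that it runs on its own. The class name CounterUdfSketch, the field names, and the temp file are hypothetical. In the actual UDF this body sits inside evaluate(DeferredObject[]) of a GenericUDF subclass.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Standalone stand-in for the body of the UDF's evaluate method.
// Class name, field names and file handling are hypothetical.
class CounterUdfSketch {
    private int counter = 0;         // incremented once per evaluate call
    private final Path counterFile;  // file that receives the latest count

    CounterUdfSketch(Path counterFile) {
        this.counterFile = counterFile;
    }

    // In the real UDF this logic lives inside GenericUDF#evaluate(DeferredObject[]).
    int evaluate() throws IOException {
        counter++;
        // sysout log line used to see how often evaluate is invoked
        System.out.println("evaluate called, counter = " + counter);
        // Overwrite the file with the current count.
        Files.write(counterFile,
                Integer.toString(counter).getBytes(StandardCharsets.UTF_8));
        return counter;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("udf-counter", ".txt");
        CounterUdfSketch udf = new CounterUdfSketch(file);
        udf.evaluate();               // first invocation
        int result = udf.evaluate();  // a second invocation is enough to report 2
        System.out.println("returned " + result);
        Files.deleteIfExists(file);
    }
}
```

With logic like this, a single extra invocation of evaluate on the same instance is enough to turn the expected 1 into 2.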
When I added some logging inside the evaluate method, I
observed that the logs (sysout) were printed twice. On further research I
came across the @UDFType annotation and found that if we do not provide
this annotation in a custom UDF, the default is deterministic = true.
When I add this annotation to my custom UDF and set
@UDFType( deterministic = false ), my logs are printed only
once and the UDF returns the accurate count, i.e. 1, which implies that
evaluate is called only once when @UDFType( deterministic = false ).
I would like to understand the connection between @UDFType and the
evaluate method when the UDF is invoked directly, without a table.
Note: When I invoke my UDF on a table, I get the correct
count even with @UDFType( deterministic = true ).
Thanks in advance. :)