Re: Hive Custom UDF evaluate behavior when @UDFType is set

2018-04-10 Thread Jason Dere
This might have to do with constant propagation, since the function was listed
as deterministic. Try logging the stack trace inside evaluate during execution
and pasting both stack traces here; that may give more clues as to what is
going on.
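
One way to capture those stack traces is sketched below in plain Java. The
class EvaluateTraceDemo and its static evaluate method are hypothetical
stand-ins for the real UDF, which would extend Hive's GenericUDF. Only the
stack-capture helper would need to be copied into the actual evaluate method.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for the custom UDF class (assumption: the real class
// extends Hive's GenericUDF, which is not needed to show the logging idea).
class EvaluateTraceDemo {
    // Counts how many times evaluate() has run in this JVM.
    static int calls = 0;

    // Renders the current call stack as text. The Throwable is only created,
    // never thrown, so the caller's control flow is unaffected.
    static String currentStackTrace(String label) {
        StringWriter buffer = new StringWriter();
        new Throwable(label).printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    // Stand-in for the UDF's evaluate method: logs who called it, then
    // returns the trace so it can be inspected.
    static String evaluate() {
        calls++;
        String trace = currentStackTrace("evaluate call #" + calls);
        System.err.println(trace);
        return trace;
    }

    public static void main(String[] args) {
        evaluate();
    }
}
```

If evaluate really is invoked twice, comparing the frames below evaluate in
the two traces should show which part of Hive makes each call.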



From: PradeepKumar Yadav 
Sent: Monday, April 9, 2018 11:35 PM
To: user@hive.apache.org
Subject: Hive Custom UDF evaluate behavior when @UDFType is set

Hi,
Recently, while creating a custom generic Hive UDF, I came across unexpected
behavior in the evaluate method. The UDF contains logic to increment a counter
and write it to a file. When I execute it directly, without involving any
table, it always returns one extra count, i.e. 2.
When I added some logging inside the evaluate method, I observed that the logs
(sysout) were printed twice. On further research I came across the @UDFType
annotation and found that if we do not provide this annotation in a custom
UDF, the default is deterministic = true.
When I add this annotation to my custom UDF and set
@UDFType( deterministic = false ), my logs are printed only once and the UDF
returns the correct count, i.e. 1, which implies that evaluate is called only
once when @UDFType( deterministic = false ).
I would like to understand the connection between @UDFType and the evaluate
method when the UDF is invoked directly, without a table.

Note: When I invoke my UDF on a table I get the correct count even with
@UDFType( deterministic = true ).

Thanks in advance. :)
Regards,
PradeepKumar Yadav

