Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16068
My understanding is that a non-deterministic UDF does not need to be
stateful, but a stateful UDF has to be non-deterministic.
Here is the comment in Hive regarding this property:
/**
 * If a UDF stores state based on the sequence of records it has processed, it
 * is stateful. A stateful UDF cannot be used in certain expressions such as
 * case statement and certain optimizations such as AND/OR short circuiting
 * don't apply for such UDFs, as they need to be invoked for each record.
 * row_sequence is an example of stateful UDF. A stateful UDF is considered to
 * be non-deterministic, irrespective of what deterministic() returns.
 *
 * @return true
 */
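
To illustrate why short circuiting does not apply, here is a minimal sketch in plain Java. It is not Hive or Spark code: the RowSequence class and both method names are hypothetical stand-ins for a row_sequence-style UDF and for two ways an engine might evaluate the same expression.

```java
import java.util.ArrayList;
import java.util.List;

public class StatefulUdfSketch {
    // Hypothetical stand-in for a stateful UDF such as row_sequence:
    // the result depends on how many times it has been invoked so far.
    static final class RowSequence {
        private long count = 0;

        long evaluate() {
            return ++count;
        }
    }

    // Invokes the UDF for every record, as a stateful UDF requires.
    static List<Long> numberEveryRow(boolean[] flags) {
        RowSequence seq = new RowSequence();
        List<Long> out = new ArrayList<>();
        for (boolean flag : flags) {
            long n = seq.evaluate(); // always invoked, even if the value is unused
            out.add(flag ? n : -1L);
        }
        return out;
    }

    // Skips the UDF call when the flag is false, mimicking a short-circuit
    // optimization. The UDF's internal counter then falls behind.
    static List<Long> numberWithShortCircuit(boolean[] flags) {
        RowSequence seq = new RowSequence();
        List<Long> out = new ArrayList<>();
        for (boolean flag : flags) {
            out.add(flag ? seq.evaluate() : -1L);
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false, true};
        System.out.println(numberEveryRow(flags));         // [1, -1, 3]
        System.out.println(numberWithShortCircuit(flags)); // [1, -1, 2]
    }
}
```

The two evaluation strategies disagree on the third row, so the output depends on how often the function is called. That is why a stateful UDF has to be treated as non-deterministic. The reverse does not hold: a function that returns a random number is non-deterministic without keeping any per-record state.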