Github user maryannxue commented on the issue:
https://github.com/apache/spark/pull/19488
The point I was trying to make is that there are two types of non-deterministic
aggregate functions: the first is non-deterministic but not necessarily
arbitrary, and the second is "deliberately" arbitrary, such as RAND. For the
first category, I think it's intuitive to expect that two occurrences of
FIRST_VALUE(x) return the same value when a user writes something like
"SELECT FIRST(x), FIRST(x) + 1 FROM t".
I'd propose two options:
1. Take a step back and do not deduplicate non-deterministic functions, as
@cloud-fan first suggested.
2. Have a way to distinguish arbitrary functions (for UDFs as well) from
other non-deterministic cases, and avoid deduplication only for arbitrary
functions.
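
To make the difference between the two options concrete, here is a toy sketch in plain Python. It is not Spark code and does not reflect how the optimizer is implemented: `AggCall`, the `arbitrary` flag, and `dedup` are hypothetical names invented for illustration, and `RAND` is only a stand-in for a "deliberately" arbitrary function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical, simplified model of a function call such as FIRST(x)."""
    name: str
    arg: str
    deterministic: bool = True
    arbitrary: bool = False  # assumed flag marking "deliberately" arbitrary functions


def dedup(calls, skip_all_nondeterministic):
    """Return the calls that would be evaluated after deduplication.

    skip_all_nondeterministic=True  models option 1: never merge
    non-deterministic calls.
    skip_all_nondeterministic=False models option 2: only arbitrary calls
    are kept separate; other non-deterministic calls are merged.
    """
    seen = set()
    evaluated = []
    for call in calls:
        if skip_all_nondeterministic:
            keep_separate = not call.deterministic
        else:
            keep_separate = call.arbitrary
        if keep_separate:
            # Every occurrence gets its own evaluation.
            evaluated.append(call)
        elif call not in seen:
            # Identical calls share a single evaluation.
            seen.add(call)
            evaluated.append(call)
    return evaluated


first_x = AggCall("FIRST", "x", deterministic=False)
rand = AggCall("RAND", "", deterministic=False, arbitrary=True)

# Roughly: SELECT FIRST(x), FIRST(x) + 1, RAND(), RAND() FROM t
calls = [first_x, first_x, rand, rand]

option1 = dedup(calls, skip_all_nondeterministic=True)
option2 = dedup(calls, skip_all_nondeterministic=False)

print(len(option1))  # 4: nothing is merged
print(len(option2))  # 3: the two FIRST(x) calls share one evaluation
```

Under option 2 in this sketch, both occurrences of FIRST(x) resolve to the same evaluation, while each RAND call is still evaluated on its own.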
Thoughts?