Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/19488
  
    The point I was trying to make is that there are two types of non-deterministic
functions: those that are non-deterministic but not necessarily arbitrary, and
those that are deliberately arbitrary, such as RAND. For the first category, I
think it's intuitive to expect that two occurrences of FIRST(x) return the same
value when a user writes something like "SELECT FIRST(x), FIRST(x) + 1 FROM t".
    I'd propose two options:
    1. Take a step back and, as @cloud-fan first suggested, do not deduplicate
non-deterministic functions at all.
    2. Find a way to distinguish arbitrary functions (including UDFs) from other
non-deterministic functions, and skip deduplication only for the arbitrary ones.
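    To make option 2 concrete, here is a minimal Python model of the deduplication rule, not Spark's Catalyst API. The names `AggCall`, the `arbitrary` flag, `can_share`, `dedup_aggregates`, and the aggregate `RANDOM_PICK` are all hypothetical and exist only for illustration. Identical calls share one computed slot unless they are marked arbitrary; option 1 would correspond to changing `can_share` to return `call.deterministic` only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical, simplified model of an aggregate call in a query plan."""
    name: str                  # e.g. "FIRST"
    arg: str                   # e.g. "x"
    deterministic: bool = True
    arbitrary: bool = False    # hypothetical flag: result is deliberately random


def can_share(call: AggCall) -> bool:
    # Option 2: deterministic calls and non-arbitrary non-deterministic calls
    # (such as FIRST) may be computed once and reused; arbitrary ones never are.
    return call.deterministic or not call.arbitrary


def dedup_aggregates(calls):
    """Return (slots, mapping): the aggregates actually computed, and for each
    input occurrence the index of the slot that supplies its value."""
    slots = []
    index_of = {}   # shareable call -> slot index
    mapping = []
    for call in calls:
        if can_share(call) and call in index_of:
            # Reuse the existing slot for an identical shareable call.
            mapping.append(index_of[call])
            continue
        slots.append(call)
        idx = len(slots) - 1
        if can_share(call):
            index_of[call] = idx
        mapping.append(idx)
    return slots, mapping


# "SELECT FIRST(x), FIRST(x) + 1 FROM t": both occurrences read one slot.
first = AggCall("FIRST", "x", deterministic=False)
print(dedup_aggregates([first, first])[1])

# A hypothetical arbitrary aggregate used twice keeps two independent slots.
pick = AggCall("RANDOM_PICK", "x", deterministic=False, arbitrary=True)
print(dedup_aggregates([pick, pick])[1])
```

Under this rule, "FIRST(x) + 1" is guaranteed to be one more than the "FIRST(x)" in the same row, while each occurrence of an arbitrary function is still evaluated on its own.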
    Thoughts?
