Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19193
For 1), yes let's forbid it.
For 2), my feeling is that in the `Dataset` API we don't need `having`, because
it's easy to change the order of the operators: users can call `filter` first
and then `agg`. In SQL, by contrast, you would need a subquery, so `having` is
convenient there.
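To make the point concrete, here is a minimal sketch on plain Scala collections (not the Spark API, and with made-up sample data) of what a `having`-style query does: group, aggregate, then filter on an aggregated value, which in SQL fits in one clause but in an operator API is just an ordering of calls.

```scala
// Analogy on plain Scala collections (not Spark): HAVING-style filtering.
// Hypothetical sample data: (a, b) pairs.
val rows = Seq(("x", 2), ("x", 4), ("y", 1), ("y", 3), ("z", 6))

// SQL analogue: SELECT a, max(b) FROM t GROUP BY a HAVING sum(b) = 6
// Operator style: group, aggregate, then filter the aggregated groups.
val havingStyle: Map[String, Int] =
  rows.groupBy(_._1)
      .collect { case (a, g) if g.map(_._2).sum == 6 => a -> g.map(_._2).max }

println(havingStyle)  // keeps only groups whose sum(b) equals 6
```

The filtering step is just another operator applied after the aggregation, which is why a dedicated `having` adds little in an API where operators compose freely.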
For `df.groupBy('a).agg(max('b), rank().over(window)).where(sum('b) === 5)`,
I think it's valid for the query to fail, as Spark is not smart enough to
rewrite it and make it work. If we can find a way to rewrite and fix the
query, we can support it.