Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19193

For 1), yes, let's forbid it.

For 2), my feeling is that in the `Dataset` API we don't need `having`, because it's easy to change the order of the operators: users can call `filter` first and then `agg`. In SQL, by contrast, you would need a subquery, so `having` is convenient there.

For `df.groupBy('a).agg(max('b), rank().over(window)).where(sum('b) === 5)`, I think it's valid to fail, as Spark is not smart enough to rewrite your query to make it work. If we can find a way to rewrite and fix the query, we can support it.
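A minimal sketch of the point about operator ordering, using plain Scala collections instead of Spark (the data and names here are hypothetical, not from the PR): a `having`-style condition is just a `filter` applied to the output of the aggregation, so no dedicated `having` operator is needed.

```scala
// Hypothetical (key, value) rows standing in for a Spark DataFrame.
val rows = Seq(("a", 2), ("a", 3), ("b", 5), ("b", 1))

// SQL:  SELECT key, SUM(value) FROM t GROUP BY key HAVING SUM(value) = 5
// Dataset-style equivalent: aggregate first, then filter on the aggregate.
val havingLike = rows
  .groupBy(_._1)                                  // group by key
  .map { case (k, vs) => (k, vs.map(_._2).sum) }  // SUM(value) per key
  .filter { case (_, total) => total == 5 }       // the "having" condition
```

In real Spark this corresponds to `df.groupBy('key).agg(sum('value).as("s")).where('s === 5)`, which is why the Dataset API can get away without a separate `having`.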