GitHub user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20254#discussion_r161377477
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1364,7 +1364,9 @@ def subtract(self, other):
""" Return a new :class:`DataFrame` containing rows in this frame
but not in another frame.
- This is equivalent to `EXCEPT` in SQL.
+ This is equivalent to `EXCEPT DISTINCT` in SQL.
+
+ (Note: Before Spark 2.0, the behavior was equivalent to `EXCEPT
ALL` in SQL.)
--- End diff --
Actually, before 2.0, the behavior was not equivalent to EXCEPT ALL. For details, see
the PR: https://github.com/apache/spark/pull/12736
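
For reference, the difference between the two SQL operators named in the docstring can be sketched in plain Python. This is only an illustration of standard `EXCEPT DISTINCT` versus `EXCEPT ALL` semantics on lists of tuples. It is not Spark's implementation and makes no claim about the pre-2.0 behavior. The helper names `except_distinct` and `except_all` and the sample rows are made up for the example.

```python
from collections import Counter


def except_distinct(left, right):
    """Rows of `left` that never appear in `right`, with duplicates removed."""
    excluded = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in excluded and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def except_all(left, right):
    """Multiset difference: each row in `right` cancels one matching row in `left`."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # this occurrence is cancelled by a row in `right`
        else:
            result.append(row)
    return result


# Hypothetical sample data: ("a", 1) appears three times, ("b", 2) twice.
left = [("a", 1), ("a", 1), ("a", 1), ("b", 2), ("b", 2), ("c", 3)]
right = [("a", 1), ("c", 3)]

distinct_rows = except_distinct(left, right)  # [("b", 2)]
all_rows = except_all(left, right)  # [("a", 1), ("a", 1), ("b", 2), ("b", 2)]
```

With `EXCEPT DISTINCT`, any row present in the other frame is dropped entirely and the remaining rows are de-duplicated. With `EXCEPT ALL`, duplicates are counted, so two of the three `("a", 1)` rows survive.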