[GitHub] [spark] cloud-fan commented on a change in pull request #29461: [SPARK-32456][SS][FOLLOWUP] Update doc to note about using SQL statement with streaming Dataset

GitBox Wed, 09 Sep 2020 20:11:24 -0700


cloud-fan commented on a change in pull request #29461:
URL: https://github.com/apache/spark/pull/29461#discussion_r486036434




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3131,8 +3131,12 @@ class Dataset[T] private[sql](
    * Returns a new Dataset that contains only the unique rows from this Dataset.
    * This is an alias for `dropDuplicates`.
    *
+   * Note that for a streaming [[Dataset]], this method only returns distinct rows only once
+   * regardless of the output mode, which the behavior may not be same with `DISTINCT` in SQL
+   * against streaming [[Dataset]].
+   *
    * @note Equality checking is performed directly on the encoded representation of the data
-   * and thus is not affected by a custom `equals` function defined on `T`.

Review comment:
       unnecessary change?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

