HyukjinKwon commented on a change in pull request #27565: [WIP][SPARK-30791]
Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379354632
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3308,6 +3308,37 @@ class Dataset[T] private[sql](
files.toSet.toArray
}
+ /**
+ * Returns true when the query plan of the given Dataset will return the same results as this
+ * Dataset.
+ *
+ * Since its likely undecidable to generally determine if two given plans will produce the same
Review comment:
I would rewrite the doc as below if you guys think it's fine.
```
Returns `true` when the logical query plans inside both [[Dataset]]s are equal and
therefore return the same results.
@note The equality comparison here is simplified by tolerating cosmetic differences
such as attribute names.
@note This API can compare both [[Dataset]]s very quickly, but it can still return `false`
for [[Dataset]]s that return the same results, for instance, when they are built from
different plans. Such false-negative semantics can be useful for caching, as an example.
```
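To make the two notes concrete, here is a minimal toy sketch in plain Scala. It is not Spark's implementation: `Plan`, `Relation`, `Filter`, and everything inside `SemanticsSketch` are hypothetical names invented for illustration. It assumes that "cosmetic differences" means attribute names only, and it models the comparison as structural equality after replacing names with positional ids.

```scala
// Toy model only: these types are hypothetical and are NOT Spark classes.
object SemanticsSketch {
  sealed trait Plan
  // A leaf relation over a named source with named output columns.
  final case class Relation(source: String, columns: Vector[String]) extends Plan
  // Keeps rows where `column > bound`.
  final case class Filter(column: String, bound: Int, child: Plan) extends Plan

  // Output column names of a plan (filters do not change the schema).
  def output(plan: Plan): Vector[String] = plan match {
    case Relation(_, columns) => columns
    case Filter(_, _, child)  => output(child)
  }

  // Replace every attribute name with its position, so names stop mattering.
  // An unknown column name maps to "#-1" in this sketch.
  def canonicalize(plan: Plan): Plan = plan match {
    case Relation(source, columns) =>
      Relation(source, columns.indices.map(i => s"#$i").toVector)
    case Filter(column, bound, child) =>
      Filter(s"#${output(child).indexOf(column)}", bound, canonicalize(child))
  }

  // Cheap structural comparison of the canonical forms.
  def sameSemantics(a: Plan, b: Plan): Boolean = canonicalize(a) == canonicalize(b)

  // Equal canonical forms give equal hashes (case-class hashCode is structural).
  def semanticHash(plan: Plan): Int = canonicalize(plan).hashCode

  // Same plan shape, different attribute names.
  val renamedA: Plan = Filter("x", 1, Relation("t", Vector("x", "y")))
  val renamedB: Plan = Filter("u", 1, Relation("t", Vector("u", "v")))

  // Both keep exactly the rows with x > 2, but the plans differ in shape.
  val single: Plan    = Filter("x", 2, Relation("t", Vector("x", "y")))
  val redundant: Plan = Filter("x", 1, single)
}
```

In this sketch `sameSemantics(renamedA, renamedB)` holds because only the names differ, while `sameSemantics(single, redundant)` is `false` even though both plans select the same rows. That second case is the false negative the note describes: a cache keyed this way can miss a reusable result, but it never treats two different results as the same.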