WeichenXu123 commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r378957996
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3308,6 +3308,33 @@ class Dataset[T] private[sql](
files.toSet.toArray
}
+ /**
+ * Returns true when the query plan of the given Dataset will return the same results as this
+ * Dataset.
+ *
+ * Since it's likely undecidable to generally determine if two given plans will produce the same
+ * results, it is okay for this function to return false, even if the results are actually
+ * the same. Such behavior will not affect correctness, only the application of performance
+ * enhancements like caching. However, it is not acceptable to return true if the results could
+ * possibly be different.
+ *
+ * This function performs a modified version of equality that is tolerant of cosmetic
+ * differences like attribute naming and/or expression id differences.
+ *
+ * @since 3.0.0
+ */
+ @DeveloperApi
+ def sameSemantics(other: Dataset[T]): Boolean = {
Review comment:
Remove @DeveloperApi. This is now a user-facing API.
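To illustrate the contract in the doc comment of the hunk above (a false negative is acceptable, a false positive is not), here is a minimal, self-contained Scala sketch. It uses hypothetical toy plan classes (Scan, Filter, Project, Attr, Lit, Add, Gt), not Spark's Catalyst types, and its sameSemantics and semanticHash functions are illustrative analogues, not the Dataset implementation in this PR. Plans are canonicalized by dropping attribute names and renumbering expression ids in first-seen order, and two plans count as the same only when their canonical forms are structurally equal.

```scala
import scala.collection.mutable

// Hypothetical toy expression and plan types, NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String, exprId: Long) extends Expr
final case class Lit(value: Int) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr
final case class Gt(left: Expr, right: Expr) extends Expr

sealed trait Plan
final case class Scan(table: String, output: Seq[Attr]) extends Plan
final case class Filter(condition: Expr, child: Plan) extends Plan
final case class Project(exprs: Seq[Expr], child: Plan) extends Plan

object ToySemantics {
  // Drops attribute names and renumbers expression ids 0, 1, 2, ... in the
  // order they are first seen, visiting children before their parents.
  def canonicalize(plan: Plan): Plan = {
    val ids = mutable.Map.empty[Long, Long]
    def attr(a: Attr): Attr = Attr("", ids.getOrElseUpdate(a.exprId, ids.size.toLong))
    // Scala evaluates arguments left to right, so the numbering is deterministic.
    def expr(e: Expr): Expr = e match {
      case a: Attr   => attr(a)
      case l: Lit    => l
      case Add(l, r) => Add(expr(l), expr(r))
      case Gt(l, r)  => Gt(expr(l), expr(r))
    }
    def go(p: Plan): Plan = p match {
      case Scan(t, out) => Scan(t, out.map(attr))
      case Filter(c, child) =>
        val canonChild = go(child) // number the child's attributes first
        Filter(expr(c), canonChild)
      case Project(es, child) =>
        val canonChild = go(child)
        Project(es.map(expr), canonChild)
    }
    go(plan)
  }

  // Sound but incomplete: true only if the canonical forms are structurally equal.
  def sameSemantics(a: Plan, b: Plan): Boolean = canonicalize(a) == canonicalize(b)

  // Consistent with sameSemantics: equal canonical forms have equal hash codes.
  def semanticHash(p: Plan): Int = canonicalize(p).hashCode
}
```

In this sketch, reading the same table twice with fresh expression ids still compares equal, while `id + 1` versus `1 + id` is reported as different, which is an acceptable false negative under the contract. Spark's own canonicalization is more thorough (for instance, it also normalizes the operand order of some commutative expressions), but the sound-but-incomplete trade-off is the same.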