HyukjinKwon commented on a change in pull request #27565: [SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379228900
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -3308,6 +3308,31 @@ class Dataset[T] private[sql](
     files.toSet.toArray
   }
 
+  /**
+   * Returns true when the query plan of the given Dataset will return the same results as this
+   * Dataset.
+   *
+   * Since it's likely undecidable to generally determine if two given plans will produce the same
+   * results, it is okay for this function to return false, even if the results are actually
+   * the same. Such behavior will not affect correctness, only the application of performance
+   * enhancements like caching. However, it is not acceptable to return true if the results could
+   * possibly be different.
+   *
+   * This function performs a modified version of equality that is tolerant of cosmetic
+   * differences like attribute naming and/or expression id differences.
+   *
+   * @since 3.0.0
+   */
+  def sameSemantics(other: Dataset[T]): Boolean = {
+    queryExecution.analyzed.sameResult(other.queryExecution.analyzed)
+  }
+
+  /**
+   * Returns a `hashCode` for the calculation performed by the query plan of this Dataset. Unlike
+   * the standard `hashCode`, an attempt has been made to eliminate cosmetic differences.
+   */
+  def semanticHash: Int = queryExecution.analyzed.semanticHash()
 
 Review comment:
   I would make it a proper function, `semanticHash()`, to allow the same API usage in PySpark.
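
   For readers outside the PR, the idea behind these two methods can be sketched without Spark: canonicalize cosmetic details (such as analyzer-generated expression ids) before comparing or hashing plans, so that plans differing only cosmetically compare and hash as equal. The `ColRef`, `canonicalize`, `same_semantics`, and `semantic_hash` names below are hypothetical illustrations, not Spark internals; Python is used here since the comment concerns the PySpark API:

```python
# Toy sketch (NOT Spark's implementation) of semantic equality/hashing that is
# tolerant of cosmetic differences like generated expression ids.
from dataclasses import dataclass

@dataclass(frozen=True)
class ColRef:
    name: str
    expr_id: int  # analyzer-generated id; cosmetic, varies between runs

def canonicalize(cols):
    # Renumber expr ids in order of first appearance, erasing the
    # run-to-run variation while preserving which refs share an id.
    ids = {}
    out = []
    for c in cols:
        ids.setdefault(c.expr_id, len(ids))
        out.append(ColRef(c.name, ids[c.expr_id]))
    return tuple(out)

def same_semantics(a, b):
    return canonicalize(a) == canonicalize(b)

def semantic_hash(cols):
    return hash(canonicalize(cols))

# Same projection, but the "analyzer" assigned different expr ids.
a = [ColRef("id", 10), ColRef("value", 11)]
b = [ColRef("id", 42), ColRef("value", 43)]
print(tuple(a) == tuple(b))                  # False: raw ids differ
print(same_semantics(a, b))                  # True: cosmetic difference erased
print(semantic_hash(a) == semantic_hash(b))  # True
```

   The real methods delegate to the analyzed plan's `sameResult` and `semanticHash()`, which apply this kind of canonicalization across the whole plan tree.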

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
