HyukjinKwon commented on a change in pull request #27565: [WIP][SPARK-30791] Dataframe add sameSemantics and sementicHash method
URL: https://github.com/apache/spark/pull/27565#discussion_r379354632
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
 ##########
 @@ -3308,6 +3308,37 @@ class Dataset[T] private[sql](
     files.toSet.toArray
   }
 
+  /**
+   * Returns true when the query plan of the given Dataset will return the same results as this
+   * Dataset.
+   *
+   * Since its likely undecidable to generally determine if two given plans will produce the same
 
 Review comment:
   I would rewrite the doc as below if you guys think it's fine.
   
   ```
   Returns `true` when the logical query plans inside both [[Dataset]]s are equal and
   therefore return the same results.
   
   @note The equality comparison here is simplified by tolerating cosmetic differences
   such as attribute names.
   
   @note This API can compare both [[Dataset]]s very fast, but it can still return `false` for
   [[Dataset]]s that return the same results, for instance, from different plans. Such
   false-negative semantics can be useful, for example, when caching.
   ```
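
   To make the two notes concrete, here is a minimal toy sketch in plain Scala. It is
   not Spark's implementation: `Expr`, `Attr`, `Lit`, `Mul`, `Add` and `canonicalize`
   are hypothetical names invented for this illustration. The only assumption is that
   plans are compared after erasing cosmetic details such as attribute names.

   ```scala
   // Toy model only: none of these types are Spark classes.
   object SameSemanticsSketch {
     sealed trait Expr
     // A column reference: `name` is cosmetic, `ordinal` identifies the input column.
     final case class Attr(name: String, ordinal: Int) extends Expr
     final case class Lit(value: Long) extends Expr
     final case class Mul(left: Expr, right: Expr) extends Expr
     final case class Add(left: Expr, right: Expr) extends Expr

     // Erase attribute names so that naming differences no longer affect equality.
     def canonicalize(e: Expr): Expr = e match {
       case Attr(_, ordinal) => Attr("none", ordinal)
       case Lit(v)           => Lit(v)
       case Mul(l, r)        => Mul(canonicalize(l), canonicalize(r))
       case Add(l, r)        => Add(canonicalize(l), canonicalize(r))
     }

     // Structural equality on the canonical form: cheap, but purely syntactic.
     def sameSemantics(a: Expr, b: Expr): Boolean =
       canonicalize(a) == canonicalize(b)

     // Hash of the canonical form, so expressions that compare equal hash equally.
     def semanticHash(e: Expr): Int = canonicalize(e).hashCode

     def main(args: Array[String]): Unit = {
       val timesTwo        = Mul(Attr("id", 0), Lit(2))
       val timesTwoRenamed = Mul(Attr("renamed", 0), Lit(2))
       val plusSelf        = Add(Attr("id", 0), Attr("id", 0))

       // Only the attribute name differs, so the comparison tolerates it.
       println(sameSemantics(timesTwo, timesTwoRenamed)) // true
       // id * 2 and id + id give the same values, but the trees differ:
       // this is the false negative described in the second note.
       println(sameSemantics(timesTwo, plusSelf)) // false
     }
   }
   ```

   In this sketch a `false` answer only means "not provably the same", which is why
   such a check would suit a cache lookup: a miss costs a recomputation, never a
   wrong result.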
