hvanhovell commented on code in PR #47882:
URL: https://github.com/apache/spark/pull/47882#discussion_r1733354114
##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -3591,5 +1507,234 @@ class Dataset[T] private[sql] (
* We cannot deserialize a connect [[Dataset]] because of a class clash on
 * the server side. We null out the instance for now.
*/
+ @scala.annotation.unused("this is used by java serialization")
private def writeReplace(): Any = null
+
+ ////////////////////////////////////////////////////////////////////////////
+ // Return type overrides to make sure we return the implementation instead
Review Comment:
Improve this documentation a bit. There are three reasons for doing this:
- Retain the old signatures for binary compatibility.
- Java compatibility. The Java compiler uses the bytecode signatures, and
those would point to api.Dataset being returned instead of Dataset. This causes
issues when Java code tries to materialize results, or tries to use
functionality that is implementation specific.
- Scala method resolution runs into problems when ambiguous overloads are
scattered across the interface and the implementation; `drop` and `select`
suffered from this.
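
The second point can be sketched with a minimal, hypothetical example (the names `ApiDataset` and `ImplDataset` are illustrative stand-ins, not Spark's actual classes): a covariant return-type override makes the bytecode signature report the implementation type, so Java callers can chain implementation-specific methods without a cast.

```java
public class CovariantReturnDemo {

    // Stand-in for the api.Dataset interface type.
    static class ApiDataset {
        ApiDataset select(String col) { return this; }
    }

    // Stand-in for the implementation Dataset.
    static class ImplDataset extends ApiDataset {
        // Covariant override: narrows the return type so the bytecode
        // signature seen by the Java compiler is ImplDataset, not ApiDataset.
        @Override
        ImplDataset select(String col) { return this; }

        // Implementation-specific functionality not on the interface.
        String collectAsString() { return "result"; }
    }

    public static void main(String[] args) {
        ImplDataset ds = new ImplDataset();
        // Without the covariant override, select(...) would return ApiDataset
        // and this chain would not compile from Java.
        System.out.println(ds.select("a").collectAsString());
    }
}
```

If `ImplDataset` did not override `select`, the call `ds.select("a").collectAsString()` would fail to compile, because the Java compiler resolves against the declared return type in the bytecode.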
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]