Re: [PR] [SPARK-52104][CONNECT][SCALA] Validate column name eagerly in Spark Connect Scala Client [spark]

via GitHub Tue, 10 Jun 2025 06:42:54 -0700


hvanhovell commented on PR #50873:
URL: https://github.com/apache/spark/pull/50873#issuecomment-2959316695


   @xi-db the Connect API is supposed to be lazy. That we did this in Python is 
a mistake. Concretely, I can see two problems with this:
   - It can create quite a few extra RPCs.
   - It is misleading. By the time you submit something for execution, your 
underlying data might have changed, so you will see a failure anyway. Eager 
validation works for classic because we have eager analysis there, and the 
Dataset is bound at definition time instead of execution time.
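
To make the two problems concrete, here is a minimal toy model in plain Python. It is only a sketch: `FakeServer`, `LazyFrame`, and `EagerFrame` are hypothetical names invented for illustration and are not part of Spark or Spark Connect. The model assumes one round trip per eager column check and one per execution.

```python
class FakeServer:
    """Hypothetical stand-in for a remote server; counts round trips."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rpc_count = 0

    def analyze(self, column):
        # One round trip just to validate a single column name.
        self.rpc_count += 1
        if column not in self.columns:
            raise KeyError(f"unresolved column: {column}")

    def execute(self, plan):
        # One round trip; the plan is resolved against the *current* schema.
        self.rpc_count += 1
        for column in plan:
            if column not in self.columns:
                raise KeyError(f"unresolved column: {column}")
        return list(plan)


class LazyFrame:
    """Builds the plan locally; talks to the server only on collect()."""

    def __init__(self, server, plan=()):
        self.server = server
        self.plan = tuple(plan)

    def select(self, column):
        return LazyFrame(self.server, self.plan + (column,))

    def collect(self):
        return self.server.execute(self.plan)


class EagerFrame(LazyFrame):
    """Validates every column name against the server at definition time."""

    def select(self, column):
        self.server.analyze(column)
        return EagerFrame(self.server, self.plan + (column,))


def demo():
    # Lazy client: three selects, a single round trip at execution.
    lazy_server = FakeServer(["a", "b", "c"])
    LazyFrame(lazy_server).select("a").select("b").select("c").collect()

    # Eager client: three validation round trips plus the execution.
    eager_server = FakeServer(["a", "b", "c"])
    frame = EagerFrame(eager_server).select("a").select("b").select("c")

    # The underlying data changes between definition and execution.
    eager_server.columns.discard("c")
    try:
        frame.collect()
        failed_at_execution = False
    except KeyError:
        failed_at_execution = True

    return lazy_server.rpc_count, eager_server.rpc_count, failed_at_execution
```

In this toy model the eager client spends extra round trips on validation and still fails at execution once the schema has changed, which is the sense in which the eager check is misleading.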


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


