Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

via GitHub Tue, 05 Mar 2024 06:00:39 -0800


dtarima commented on PR #45181:
URL: https://github.com/apache/spark/pull/45181#issuecomment-1978837075


   > We can't cache the queryExecution in the Dataset itself because the queryExecution may come from other Dataset instance. See `isEmpty`:
   > 
   > ```scala
   > def isEmpty: Boolean = withAction("isEmpty", select().limit(1).queryExecution) { plan =>
   >   plan.executeTake(1).isEmpty
   > }
   > ```
   
   I don't see a problem here.
   Yes, we can only cache our own `queryExecution` instance, the one associated with our `logicalPlan` (the same way it is cached now, as a `val` in the constructor).
   
   `select().limit(1)` will create two more short-lived `Dataset` instances, but we don't care about them: they are an implementation detail unrelated to our `Dataset` instance.
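
   A toy sketch of the point above, not Spark's actual code: `ToyPlan`, `ToyQueryExecution`, `ToyDataset` and `planUsedByIsEmpty` are hypothetical stand-ins introduced only for illustration. It assumes that each transformation returns a new instance, so an instance can cache the `queryExecution` built from its own `logicalPlan` while an `isEmpty`-style action runs on a derived instance.

```scala
// Toy model only: these classes are hypothetical stand-ins, not Spark's real ones.
final case class ToyPlan(description: String)

final class ToyQueryExecution(val plan: ToyPlan)

final class ToyDataset(val logicalPlan: ToyPlan) {
  // Cached once per instance and tied to this instance's own logicalPlan.
  val queryExecution: ToyQueryExecution = new ToyQueryExecution(logicalPlan)

  // Each transformation returns a new instance with its own plan,
  // and therefore its own queryExecution.
  def select(): ToyDataset =
    new ToyDataset(ToyPlan(s"Project(${logicalPlan.description})"))

  def limit(n: Int): ToyDataset =
    new ToyDataset(ToyPlan(s"Limit($n, ${logicalPlan.description})"))

  // Same shape as the quoted isEmpty: the plan comes from a short-lived
  // derived instance, so this instance's cached value is never replaced.
  def planUsedByIsEmpty: ToyPlan = select().limit(1).queryExecution.plan
}

object ToyDatasetDemo {
  def main(args: Array[String]): Unit = {
    val ds = new ToyDataset(ToyPlan("Scan"))
    val cached = ds.queryExecution
    println(ds.planUsedByIsEmpty.description)
    println(ds.queryExecution eq cached)
  }
}
```

   In this model the two intermediate instances created by `select().limit(1)` each carry their own `queryExecution`, and the original instance keeps returning the same cached object.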
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

