Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

via GitHub Tue, 05 Mar 2024 05:43:13 -0800


dtarima commented on PR #45181:
URL: https://github.com/apache/spark/pull/45181#issuecomment-1978804309


   > > Regardless of the answer I think it makes sense to use the same approach 
for both Dataset states (persisted and unpersisted).
   > 
   > I agree. We can cache it in a lazy variable `queryExecutionPersisted`.
   
   :+1:  to `lazy val`
   
   > > The additional responsibility is independent so it should be in a 
separate method which provides proper QueryExecution: if we do that then we'll 
get something similar to def queryExecution: QueryExecution method above.
   > 
   > I'm afraid it'll be a user-facing change if we can only access 
`queryExecution` by method `queryExecution()`.
   
   I don't think we have a choice... Otherwise, using the unpersisted 
`QueryExecution` when the dataset is cached may result in inconsistencies. 
Basically, the bug is user-facing, so our fix has to be user-facing too, by 
definition.
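
   For illustration, a minimal sketch of the approach discussed above: one 
`lazy val` per Dataset state and a `queryExecution` method that picks between 
them. Only the name `queryExecutionPersisted` comes from this thread; 
`Plan`, `CacheRegistry`, `SketchQueryExecution`, `SketchDataset` and 
`queryExecutionUnpersisted` are hypothetical toy stand-ins, not Spark's 
classes, and this is not the implementation in the PR.

```scala
import scala.collection.mutable

// Toy stand-ins (assumptions for illustration only, not Spark's real types).
final case class Plan(description: String)
final case class SketchQueryExecution(plan: Plan, usesCache: Boolean)

// Hypothetical registry that records which plans are currently cached.
final class CacheRegistry {
  private val cached = mutable.Set.empty[Plan]
  def add(plan: Plan): Unit = cached += plan
  def remove(plan: Plan): Unit = cached -= plan
  def contains(plan: Plan): Boolean = cached.contains(plan)
}

final class SketchDataset(plan: Plan, registry: CacheRegistry) {
  // Planned without cached data; built at most once, on first access.
  private lazy val queryExecutionUnpersisted: SketchQueryExecution =
    SketchQueryExecution(plan, usesCache = false)

  // Planned against cached data; built at most once, on first access.
  private lazy val queryExecutionPersisted: SketchQueryExecution =
    SketchQueryExecution(plan, usesCache = true)

  // Same approach for both states: select the execution matching the
  // dataset's current cache status on every call.
  def queryExecution: SketchQueryExecution =
    if (registry.contains(plan)) queryExecutionPersisted
    else queryExecutionUnpersisted

  def cache(): this.type = { registry.add(plan); this }
  def unpersist(): this.type = { registry.remove(plan); this }
}
```

   Because `queryExecution` is a method here rather than a stored value, 
callers see the persisted variant after `cache()` and the unpersisted one 
again after `unpersist()`, which is the user-facing part of the change.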
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


