dtarima commented on PR #45181:
URL: https://github.com/apache/spark/pull/45181#issuecomment-1969241145
`Dataset#withAction` accepts `queryExecution` so all places where it gets it
unchanged would need to be fixed, like `withAction("collectAsArrowToR",
queryExecution)`.
And no need to change in other places where `queryExecution` is a different
instance, like `withAction("head", limit(n).queryExecution)`. I'm afraid the
code is going to be too implicit to quickly grasp the meaning of additional
`select` for new readers.
We could have something like this below (choosing a proper instance
depending on the caching state in one place):
```
class Dataset[T] private[sql](
@DeveloperApi @Unstable @transient val queryExecutionUnpersisted:
QueryExecution,
@DeveloperApi @Unstable @transient val queryExecutionPersisted:
QueryExecution,
@DeveloperApi @Unstable @transient val encoder: Encoder[T])
extends Serializable {
def this(qe: QueryExecution, encoder: Encoder[T]) = {
this(
new QueryExecution(qe.sparkSession, qe.logical, qe.tracker, qe.mode),
new QueryExecution(qe.sparkSession, qe.logical, qe.tracker, qe.mode),
encoder)
}
def queryExecution: QueryExecution = {
val cacheManager =
queryExecutionUnpersisted.sparkSession.sharedState.cacheManager
val plan = queryExecutionUnpersisted.logical
if (cacheManager.lookupCachedData(plan).isEmpty)
queryExecutionUnpersisted
else queryExecutionPersisted
}
...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]