maropu commented on a change in pull request #30138:
URL: https://github.com/apache/spark/pull/30138#discussion_r510816024
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
##########
@@ -55,6 +56,17 @@ class CacheManager extends Logging with
AdaptiveSparkPlanHelper {
@transient @volatile
private var cachedData = IndexedSeq[CachedData]()
+ /**
+ * Configurations needs to be turned off, to avoid regression for cached
query, so that the
+ * outputPartitioning of the underlying cached query plan can be leveraged
later.
+ * Configurations include:
+ * 1. AQE
+ * 2. Automatic bucketed table scan
+ */
+ private val configsOff = Seq(
Review comment:
nit: How about `configsOff` -> `forceDisableConfigs`?
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
##########
@@ -79,10 +91,11 @@ class CacheManager extends Logging with
AdaptiveSparkPlanHelper {
if (lookupCachedData(planToCache).nonEmpty) {
logWarning("Asked to cache already cached data.")
} else {
- // Turn off AQE so that the outputPartitioning of the underlying plan
can be leveraged.
- val sessionWithAqeOff = getOrCloneSessionWithAqeOff(query.sparkSession)
- val inMemoryRelation = sessionWithAqeOff.withActive {
- val qe = sessionWithAqeOff.sessionState.executePlan(planToCache)
+ // Turn off configs so that the outputPartitioning of the underlying
plan can be leveraged.
+ val sessionWithConfigsOff = getOrCloneSessionWithConfigsOff(
Review comment:
nit: it seems we don't this line break;
```
val sessionWithConfigsOff =
getOrCloneSessionWithConfigsOff(query.sparkSession, configsOff)
```
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanHelper.scala
##########
@@ -139,15 +138,21 @@ trait AdaptiveSparkPlanHelper {
}
/**
- * Returns a cloned [[SparkSession]] with adaptive execution disabled, or
the original
- * [[SparkSession]] if its adaptive execution is already disabled.
+ * Returns a cloned [[SparkSession]] with all specified configurations
disabled, or
+ * the original [[SparkSession]] if all configurations are already disabled.
*/
- def getOrCloneSessionWithAqeOff[T](session: SparkSession): SparkSession = {
- if (!session.sessionState.conf.adaptiveExecutionEnabled) {
+ def getOrCloneSessionWithConfigsOff(
Review comment:
Since this method is not only for AQE now, could you move this method
into a more suitable place, e.g., `object SparkSessoin` or somewhere?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]