[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-08-21 Thread via GitHub


dongjoon-hyun commented on code in PR #40390:
URL: https://github.com/apache/spark/pull/40390#discussion_r1300808158


##
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:
##
@@ -395,7 +395,10 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
    */
   private def getOrCloneSessionWithConfigsOff(session: SparkSession): SparkSession = {
     if (session.conf.get(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING)) {
-      session
+      // Bucketed scan only has one time overhead but can have multi-times benefits in cache,
+      // so we always do bucketed scan in a cached plan.
+      SparkSession.getOrCloneSessionWithConfigsOff(
+        session, SQLConf.AUTO_BUCKETED_SCAN_ENABLED :: Nil)

Review Comment:
   Thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-08-21 Thread via GitHub


dongjoon-hyun commented on code in PR #40390:
URL: https://github.com/apache/spark/pull/40390#discussion_r1299657447


##
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:
##
@@ -395,7 +395,10 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
    */
   private def getOrCloneSessionWithConfigsOff(session: SparkSession): SparkSession = {
     if (session.conf.get(SQLConf.CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING)) {
-      session
+      // Bucketed scan only has one time overhead but can have multi-times benefits in cache,
+      // so we always do bucketed scan in a cached plan.
+      SparkSession.getOrCloneSessionWithConfigsOff(
+        session, SQLConf.AUTO_BUCKETED_SCAN_ENABLED :: Nil)

Review Comment:
   Please also update the method description to reflect this code change.
   
   
https://github.com/apache/spark/blob/a21e19b6e7ac4c4c77b39d93a2da2cbe1c88c4c8/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L394
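
For readers following the diff, the shape of the patched branch can be sketched with a toy model. This is only an illustration under stated assumptions: `ToySession`, the config keys, and both helper functions are hypothetical stand-ins written in plain Scala, not Spark classes, and the sketch does not claim to reproduce how `SparkSession.getOrCloneSessionWithConfigsOff` is implemented. It models only the branch visible in the hunk: the session is returned as-is when the listed config is already off, and otherwise a copy is returned with that config turned off.

```scala
// Toy model only: nothing here is a Spark class or a Spark config key.
object CachedPlanSessionSketch {
  // Hypothetical key standing in for SQLConf.AUTO_BUCKETED_SCAN_ENABLED.
  val AutoBucketedScan = "toy.autoBucketedScan.enabled"

  // A session reduced to a map of boolean settings (unset means false here).
  final case class ToySession(conf: Map[String, Boolean]) {
    def isEnabled(key: String): Boolean = conf.getOrElse(key, false)
  }

  // Return the same session when none of `keys` is enabled;
  // otherwise return a copy with the enabled ones switched off.
  def getOrCloneWithConfigsOff(session: ToySession, keys: Seq[String]): ToySession = {
    val enabled = keys.filter(session.isEnabled)
    if (enabled.isEmpty) session
    else session.copy(conf = session.conf ++ enabled.map(_ -> false))
  }

  // Shape of the patched branch: before the change the session was returned
  // unchanged; after it, automatic bucketed-scan selection is switched off.
  def sessionForCachedPlan(session: ToySession): ToySession =
    getOrCloneWithConfigsOff(session, AutoBucketedScan :: Nil)

  def main(args: Array[String]): Unit = {
    val original = ToySession(Map(AutoBucketedScan -> true))
    println(sessionForCachedPlan(original))
  }
}
```

In this model the caller's session is never mutated, which is why a clone is made only when a listed setting actually needs to change.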






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub


dongjoon-hyun commented on code in PR #40390:
URL: https://github.com/apache/spark/pull/40390#discussion_r1133456778


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1493,15 +1493,14 @@ object SQLConf {
 
   val CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING =
     buildConf("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning")
-      .internal()

Review Comment:
   BTW, you don't need to expose this. As you can see with most legacy configs, `.internal()` is fine.
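
To illustrate the point about `.internal()`, here is a minimal toy sketch of a builder whose `internal()` call only affects whether an entry is listed as public. Everything in it is hypothetical: `ConfBuilder`, `ConfEntry`, `documentedKeys`, and the `toy.*` keys are invented for illustration and are not Spark's config classes. The assumption being modeled is that an internal entry is hidden from public listings while remaining an ordinary entry with a key and a default.

```scala
// Toy model only: these types are illustrative, not Spark's config API.
object InternalConfSketch {
  final case class ConfEntry(key: String, default: Boolean, isPublic: Boolean)

  final class ConfBuilder private (key: String, isPublic: Boolean) {
    // Mark the entry as internal: it stays defined but is not listed publicly.
    def internal(): ConfBuilder = new ConfBuilder(key, isPublic = false)
    def createWithDefault(default: Boolean): ConfEntry =
      ConfEntry(key, default, isPublic)
  }

  object ConfBuilder {
    // Entries are public unless internal() is called on the builder.
    def apply(key: String): ConfBuilder = new ConfBuilder(key, isPublic = true)
  }

  // Hypothetical entries; the keys and defaults are made up for this sketch.
  val hiddenEntry: ConfEntry =
    ConfBuilder("toy.hidden.flag").internal().createWithDefault(true)
  val publicEntry: ConfEntry =
    ConfBuilder("toy.public.flag").createWithDefault(false)

  // Only public entries would appear in a generated listing.
  def documentedKeys(entries: Seq[ConfEntry]): Seq[String] =
    entries.filter(_.isPublic).map(_.key)

  def main(args: Array[String]): Unit =
    println(documentedKeys(Seq(hiddenEntry, publicEntry)))
}
```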


