Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang closed pull request #45990: [SPARK-47804] Add Dataframe cache debug log
URL: https://github.com/apache/spark/pull/45990

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang commented on PR #45990:
URL: https://github.com/apache/spark/pull/45990#issuecomment-2058080259

Thanks, merging to master
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566605060

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:

@@ -1609,6 +1609,19 @@ object SQLConf {
       .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
       .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())

+  val DATAFRAME_CACHE_LOG_LEVEL = buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
RDD is a Spark core concept. Anyway, I respect your choice here.
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566492674

## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:

@@ -153,10 +153,42 @@ trait Logging {
     if (log.isDebugEnabled) log.debug(msg)
   }

+  protected def logDebug(entry: LogEntry): Unit = {
+    if (log.isDebugEnabled) {
+      withLogContext(entry.context) {
+        log.debug(entry.message)
+      }
+    }
+  }
+
+  protected def logDebug(entry: LogEntry, throwable: Throwable): Unit = {

Review Comment:
Let's skip this one if it is not needed.

## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:

@@ -153,10 +153,42 @@ trait Logging {
     if (log.isDebugEnabled) log.debug(msg)
   }

+  protected def logDebug(entry: LogEntry): Unit = {
+    if (log.isDebugEnabled) {
+      withLogContext(entry.context) {
+        log.debug(entry.message)
+      }
+    }
+  }
+
+  protected def logDebug(entry: LogEntry, throwable: Throwable): Unit = {
+    if (log.isDebugEnabled) {
+      withLogContext(entry.context) {
+        log.debug(entry.message, throwable)
+      }
+    }
+  }
+
   protected def logTrace(msg: => String): Unit = {
     if (log.isTraceEnabled) log.trace(msg)
   }

+  protected def logTrace(entry: LogEntry): Unit = {
+    if (log.isTraceEnabled) {
+      withLogContext(entry.context) {
+        log.trace(entry.message)
+      }
+    }
+  }
+
+  protected def logTrace(entry: LogEntry, throwable: Throwable): Unit = {

Review Comment:
ditto
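For context on how a `LogEntry`-based overload like the ones above is typically invoked, here is a hedged usage sketch. It assumes Spark's structured logging framework (the `log"..."` interpolator and `MDC` from `org.apache.spark.internal`, introduced in PR #45729); the class name `CacheDebugger` and the choice of `LogKeys.QUERY_PLAN` are illustrative assumptions, not code from this PR.

```scala
import org.apache.spark.internal.{Logging, LogKeys, MDC}

// Hypothetical caller: the log"..." interpolator produces a LogEntry whose
// MDC values are attached as structured fields, while the logDebug overload
// guards evaluation behind log.isDebugEnabled.
class CacheDebugger extends Logging {
  def onCacheHit(planDigest: String): Unit = {
    logDebug(log"Dataframe cache hit for plan ${MDC(LogKeys.QUERY_PLAN, planDigest)}")
  }
}
```

The point of the `LogEntry` overloads is that both the message rendering and the MDC context setup are skipped entirely when DEBUG/TRACE is disabled, so the hooks stay cheap on hot paths.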
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
anchovYu commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566490570

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:

@@ -1609,6 +1609,19 @@ object SQLConf {
       .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
       .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())

+  val DATAFRAME_CACHE_LOG_LEVEL = buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
I kept the Dataframe cache naming to differentiate it from the RDD cache.
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1563258451

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:

@@ -1609,6 +1609,19 @@ object SQLConf {
       .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
       .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())

+  val DATAFRAME_CACHE_LOG_LEVEL = buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
Also, let's make it an internal conf since it is for developers.
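To make the reviewer's suggestion concrete, here is a hedged sketch of what an internal, level-checked conf definition could look like inside `object SQLConf`. This is an illustration of the `buildConf` builder pattern (all builder methods shown — `internal()`, `doc`, `version`, `stringConf`, `transform`, `checkValues`, `createWithDefault` — exist in SQLConf), not the definition that was actually merged; the doc string, version, and default are assumptions.

```scala
// Sketch only: an internal conf gated to valid log4j level names.
val DATAFRAME_CACHE_LOG_LEVEL = buildConf("spark.sql.dataframeCache.logLevel")
  .internal()  // developer-facing, so excluded from public documentation
  .doc("Log level used for Dataframe cache lookups, hits and invalidations.")
  .version("4.0.0")
  .stringConf
  .transform(_.toUpperCase(java.util.Locale.ROOT))
  .checkValues(Set("TRACE", "DEBUG", "INFO", "WARN", "ERROR"))
  .createWithDefault("TRACE")
```

Marking a conf `internal()` keeps it out of the generated configuration docs while still letting developers set it at runtime for debugging sessions.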
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1563257890

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:

@@ -1609,6 +1609,19 @@ object SQLConf {
       .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
       .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())

+  val DATAFRAME_CACHE_LOG_LEVEL = buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
Will we need to debug `cache table` as well? Shall we rename the config to `spark.sql.cache.logLevel`?
Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]
gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1563255189

## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:

@@ -204,6 +215,8 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
       cachedData = cachedData.filterNot(cd => plansToUncache.exists(_ eq cd))
     }
     plansToUncache.foreach { _.cachedRepresentation.cacheBuilder.clearCache(blocking) }
+    CacheManager.logCacheOperation(s"Removed ${plansToUncache.size} Dataframe cache " +

Review Comment:
I think we need to migrate to the structured logging API here.
Framework: https://github.com/apache/spark/pull/45729
Example migration PR: https://github.com/apache/spark/pull/45834
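The migration the reviewer asks for generally swaps the `s"..."` interpolator for the structured `log"..."` interpolator, so values become queryable fields in JSON log output rather than text baked into the message. A hedged before/after sketch, assuming a suitable key exists in `LogKeys` (the key name `SIZE` here is an assumption):

```scala
// Before: plain string interpolation; the count is only recoverable
// by parsing the rendered message text.
logDebug(s"Removed ${plansToUncache.size} Dataframe cache entries")

// After (sketch): the MDC wrapper attaches the count under a structured
// key, in addition to rendering it into the human-readable message.
logDebug(log"Removed ${MDC(LogKeys.SIZE, plansToUncache.size)} Dataframe cache entries")
```

The rendered message is identical in both cases; the difference is that downstream log pipelines can filter or aggregate on the structured field instead of applying regexes to free text.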