Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub


gengliangwang closed pull request #45990: [SPARK-47804] Add Dataframe cache 
debug log
URL: https://github.com/apache/spark/pull/45990


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub


gengliangwang commented on PR #45990:
URL: https://github.com/apache/spark/pull/45990#issuecomment-2058080259

   Thanks, merging to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub


gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566605060


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1609,6 +1609,19 @@ object SQLConf {
 .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
 .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())
 
+  val DATAFRAME_CACHE_LOG_LEVEL = 
buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
   RDD is a Spark core concept. Anyway I respect your choice here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub


gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566492674


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -153,10 +153,42 @@ trait Logging {
 if (log.isDebugEnabled) log.debug(msg)
   }
 
+  protected def logDebug(entry: LogEntry): Unit = {
+if (log.isDebugEnabled) {
+  withLogContext(entry.context) {
+log.debug(entry.message)
+  }
+}
+  }
+
+  protected def logDebug(entry: LogEntry, throwable: Throwable): Unit = {

Review Comment:
   Let's skip this one if it is no needed



##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -153,10 +153,42 @@ trait Logging {
 if (log.isDebugEnabled) log.debug(msg)
   }
 
+  protected def logDebug(entry: LogEntry): Unit = {
+if (log.isDebugEnabled) {
+  withLogContext(entry.context) {
+log.debug(entry.message)
+  }
+}
+  }
+
+  protected def logDebug(entry: LogEntry, throwable: Throwable): Unit = {
+if (log.isDebugEnabled) {
+  withLogContext(entry.context) {
+log.debug(entry.message, throwable)
+  }
+}
+  }
+
   protected def logTrace(msg: => String): Unit = {
 if (log.isTraceEnabled) log.trace(msg)
   }
 
+  protected def logTrace(entry: LogEntry): Unit = {
+if (log.isTraceEnabled) {
+  withLogContext(entry.context) {
+log.trace(entry.message)
+  }
+}
+  }
+
+  protected def logTrace(entry: LogEntry, throwable: Throwable): Unit = {

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub


gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566492674


##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -153,10 +153,42 @@ trait Logging {
 if (log.isDebugEnabled) log.debug(msg)
   }
 
+  protected def logDebug(entry: LogEntry): Unit = {
+if (log.isDebugEnabled) {
+  withLogContext(entry.context) {
+log.debug(entry.message)
+  }
+}
+  }
+
+  protected def logDebug(entry: LogEntry, throwable: Throwable): Unit = {

Review Comment:
   Let's skip this one if it is no needed



##
common/utils/src/main/scala/org/apache/spark/internal/Logging.scala:
##
@@ -153,10 +153,42 @@ trait Logging {
 if (log.isDebugEnabled) log.debug(msg)
   }
 
+  protected def logDebug(entry: LogEntry): Unit = {
+if (log.isDebugEnabled) {
+  withLogContext(entry.context) {
+log.debug(entry.message)
+  }
+}
+  }
+
+  protected def logDebug(entry: LogEntry, throwable: Throwable): Unit = {
+if (log.isDebugEnabled) {
+  withLogContext(entry.context) {
+log.debug(entry.message, throwable)
+  }
+}
+  }
+
   protected def logTrace(msg: => String): Unit = {
 if (log.isTraceEnabled) log.trace(msg)
   }
 
+  protected def logTrace(entry: LogEntry): Unit = {
+if (log.isTraceEnabled) {
+  withLogContext(entry.context) {
+log.trace(entry.message)
+  }
+}
+  }
+
+  protected def logTrace(entry: LogEntry, throwable: Throwable): Unit = {

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub


anchovYu commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1566490570


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1609,6 +1609,19 @@ object SQLConf {
 .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
 .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())
 
+  val DATAFRAME_CACHE_LOG_LEVEL = 
buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
   I kept the Dataframe cache naming to differentiate it from the RDD cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-12 Thread via GitHub


gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1563258451


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1609,6 +1609,19 @@ object SQLConf {
 .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
 .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())
 
+  val DATAFRAME_CACHE_LOG_LEVEL = 
buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
   Also, let's make it an internal conf since it is for developers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-12 Thread via GitHub


gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1563257890


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1609,6 +1609,19 @@ object SQLConf {
 .checkValues(StorageLevelMapper.values.map(_.name()).toSet)
 .createWithDefault(StorageLevelMapper.MEMORY_AND_DISK.name())
 
+  val DATAFRAME_CACHE_LOG_LEVEL = 
buildConf("spark.sql.dataframeCache.logLevel")

Review Comment:
   Will we need to debug `cache table` as well? Shall we rename the config as 
   `spark.sql.cache.logLevel`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-12 Thread via GitHub


gengliangwang commented on code in PR #45990:
URL: https://github.com/apache/spark/pull/45990#discussion_r1563255189


##
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:
##
@@ -204,6 +215,8 @@ class CacheManager extends Logging with 
AdaptiveSparkPlanHelper {
   cachedData = cachedData.filterNot(cd => plansToUncache.exists(_ eq cd))
 }
 plansToUncache.foreach { 
_.cachedRepresentation.cacheBuilder.clearCache(blocking) }
+CacheManager.logCacheOperation(s"Removed ${plansToUncache.size} Dataframe 
cache " +

Review Comment:
   I think we need to migrate to the structured logging API here.
   Framework: https://github.com/apache/spark/pull/45729
   Example migration PR: https://github.com/apache/spark/pull/45834



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org