HeartSaVioR commented on a change in pull request #34502:
URL: https://github.com/apache/spark/pull/34502#discussion_r751808093



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
##########
@@ -112,8 +112,20 @@ class RocksDB(
         closeDB()
         val metadata = fileManager.loadCheckpointFromDfs(version, workingDir)
         openDB()
-        numKeysOnWritingVersion = metadata.numKeys
-        numKeysOnLoadedVersion = metadata.numKeys
+
+        val numKeys = if (!conf.trackTotalNumberOfRows) {
+          // we don't track the total number of rows - discard the number being tracked
+          -1L
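
For reference, a minimal sketch of the sentinel pattern this diff introduces; the `else` branch and the two assignments are inferred from the removed lines above, not quoted verbatim from the PR:

```scala
// Sketch: when row tracking is disabled, seed the key counters with -1L
// ("unknown") instead of the key count recorded in the checkpoint.
val numKeys = if (!conf.trackTotalNumberOfRows) {
  -1L                // sentinel: total number of rows is not tracked
} else {
  metadata.numKeys   // restore the count recorded in the checkpoint metadata
}
numKeysOnWritingVersion = numKeys
numKeysOnLoadedVersion = numKeys
```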

Review comment:
       [class SQLMetric(val metricType: String, initValue: Long = 0L) extends AccumulatorV2[Long, Long] {
         // This is a workaround for SPARK-11013.
         // We may use -1 as initial value of the accumulator, if the accumulator is valid, we will
         // update it at the end of task and the value will be at least 0. Then we can filter out the -1
         // values before calculate max, min, etc.](https://github.com/apache/spark/blob/6450f6bfa9697fbdd81d93d88bbfd2459b3837d3/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L40-L44)
   
   Even if we separate the values for "no key" vs "don't know", the value will go through SQLMetric, and negative values do not contribute to the accumulation.
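
   To make the quoted behavior concrete, here is a small, self-contained Scala sketch (hypothetical names, not Spark's actual `SQLMetric`/`AccumulatorV2` code) of the filtering described above: negative values are dropped before aggregation, so a -1 meaning "count unknown" is indistinguishable from a metric that was never updated:

```scala
object SentinelMetricSketch {
  // Hypothetical stand-in for a metric seeded with the -1 "invalid" sentinel.
  final case class Metric(var value: Long = -1L) {
    def set(v: Long): Unit = value = v
  }

  // Mimics the filter quoted from SQLMetrics.scala: drop negative values
  // before computing sum/min/max.
  def stats(metrics: Seq[Metric]): Option[(Long, Long, Long)] = {
    val valid = metrics.map(_.value).filter(_ >= 0)
    if (valid.isEmpty) None else Some((valid.sum, valid.min, valid.max))
  }

  def main(args: Array[String]): Unit = {
    val tracked = Metric(); tracked.set(10L) // tracking enabled: 10 keys
    val unknown = Metric()                   // left at -1: "don't know"
    println(stats(Seq(tracked, unknown)))    // Some((10,10,10)): the -1 is invisible
    println(stats(Seq(Metric())))            // None: every value filtered out
  }
}
```

   In other words, whatever value we pick for "don't know", as long as it is negative it is filtered out on the SQLMetric side rather than surfaced to the user.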




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


