[GitHub] [spark] HeartSaVioR commented on a change in pull request #31495: [SPARK-34383][SS] Optimize WAL commit phase via reducing cost of filesystem operations

GitBox Mon, 08 Feb 2021 18:51:01 -0800


HeartSaVioR commented on a change in pull request #31495:
URL: https://github.com/apache/spark/pull/31495#discussion_r572532292




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLog.scala
##########
@@ -46,6 +47,23 @@ import org.apache.spark.sql.connector.read.streaming.{Offset => OffsetV2}
 class OffsetSeqLog(sparkSession: SparkSession, path: String)
   extends HDFSMetadataLog[OffsetSeq](sparkSession, path) {
 
+  private val cachedMetadata = new ju.TreeMap[Long, OffsetSeq]()

Review comment:
These metadata logs have different usage patterns and characteristics, and
this change is tied specifically to the usage pattern of offsetLog.
For example, commitLog is only appended to once per batch, and it is read only
when the query is started or restored from a checkpoint, so it would gain
little from the same cache.
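
To illustrate why the cache pays off only for a log that is read back repeatedly, here is a minimal sketch of the idea. It is not the code in this PR: `SlowLog`, `CachedLog`, `maxEntries`, and the `reads` counter are hypothetical names, and the in-memory map merely stands in for files on the filesystem.

```scala
import java.{util => ju}
import scala.collection.mutable

// Hypothetical stand-in for a file-backed metadata log; NOT Spark's HDFSMetadataLog.
class SlowLog[T] {
  private val files = mutable.Map.empty[Long, T]
  var reads = 0 // counts simulated filesystem reads

  // Returns false if the batch was already written, true otherwise.
  def add(batchId: Long, value: T): Boolean =
    if (files.contains(batchId)) false
    else { files(batchId) = value; true }

  def get(batchId: Long): Option[T] = {
    reads += 1
    files.get(batchId)
  }
}

// Keeps the most recent entries in memory so repeated reads skip the "filesystem".
class CachedLog[T](maxEntries: Int) extends SlowLog[T] {
  private val cached = new ju.TreeMap[Long, T]()

  override def add(batchId: Long, value: T): Boolean = {
    val added = super.add(batchId, value)
    if (added) {
      cached.put(batchId, value)
      // TreeMap keeps keys sorted, so the first entry is the oldest batch.
      while (cached.size() > maxEntries) cached.pollFirstEntry()
    }
    added
  }

  // Serve from the cache when possible; fall back to the slow path on a miss.
  override def get(batchId: Long): Option[T] =
    Option(cached.get(batchId)).orElse(super.get(batchId))
}
```

A log that is written once per batch and read only at startup or on restore would almost never hit such a cache, which is why the change is scoped to offsetLog.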




