HeartSaVioR commented on a change in pull request #24936: [SPARK-24634][SS] Add 
a new metric regarding number of rows later than watermark plus allowed delay
URL: https://github.com/apache/spark/pull/24936#discussion_r329928176
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -201,26 +201,32 @@ trait ProgressReporter extends Logging {
   }
 
   /** Extract statistics about stateful operators from the executed query plan. */
-  private def extractStateOperatorMetrics(hasNewData: Boolean): Seq[StateOperatorProgress] = {
+  private def extractStateOperatorMetrics(
+      hasNewData: Boolean,
+      runBatch: Boolean): Seq[StateOperatorProgress] = {
     if (lastExecution == null) return Nil
-    // lastExecution could belong to one of the previous triggers if `!hasNewData`.
+    // lastExecution could belong to one of the previous triggers if `!hasNewData && !runBatch`.
     // Walking the plan again should be inexpensive.
     lastExecution.executedPlan.collect {
       case p if p.isInstanceOf[StateStoreWriter] =>
         val progress = p.asInstanceOf[StateStoreWriter].getProgress()
-        if (hasNewData) progress else progress.copy(newNumRowsUpdated = 0)
+        if (hasNewData || runBatch) {
 
 Review comment:
   `runBatch` is needed here because we don't want to reset the value of `newNumLateInputRows` if a batch ran, even if it ran with empty data.
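
   To make the intent concrete, here is a minimal standalone sketch of the condition. It is not the code in this PR: `OperatorProgressSketch`, `ProgressSketch.report`, and the two field names are hypothetical stand-ins for `StateOperatorProgress`, and zeroing both counters in the stale case is an assumption made only for illustration.

```scala
// Hypothetical, simplified stand-in for StateOperatorProgress.
// Only the two per-batch counters discussed in this review are modelled.
final case class OperatorProgressSketch(
    numRowsUpdated: Long,
    numLateInputRows: Long)

object ProgressSketch {
  // If a batch executed in this trigger (with or without new data), the
  // metrics read from the plan are current, so report them unchanged.
  // Otherwise the plan belongs to an earlier trigger and the per-batch
  // counters are stale; zeroing them here is an assumption of this sketch.
  def report(
      progress: OperatorProgressSketch,
      hasNewData: Boolean,
      runBatch: Boolean): OperatorProgressSketch = {
    if (hasNewData || runBatch) progress
    else progress.copy(numRowsUpdated = 0L, numLateInputRows = 0L)
  }
}
```

   With only `hasNewData` in the condition, a batch that ran on empty data would take the `else` branch and lose its late-row count; adding `runBatch` keeps it.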

