HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528586781



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
   More precisely, it comes down to how we define the "processing time" (y axis)
in the graph in
https://github.com/apache/spark/pull/30427#issuecomment-730844687.
   
   A query that pulls recent events is expected to have processing time close
to wall-clock time. That only breaks when we deal with historical data, which
doesn't have the "ideal" processing time. One approach that could give a rough
guess would be tracking event time, but given that Spark takes the max event
time to calculate the watermark (while other engines take the min event time),
the gap is likely to be pretty similar across batches.
   
   In the historical case, as well as the real-time case (as Spark picks the
max event time), tracking the gap between the global watermark and the min
event time would be more helpful, as we can at least see whether the watermark
delay is enough to cover the min event time of the next batch. This is pretty
specific to Spark's case, though.
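   
   To make that concrete, here is a minimal sketch of such a series. It is not
code from this PR: `EventTimeSample`, `WatermarkGap`, and `gapsToNextBatch`
are hypothetical names, the values are assumed to be already parsed into epoch
milliseconds, and pairing each batch's watermark with the *next* batch's min
event time is my reading of the comparison described above.

```scala
// Hypothetical holder for per-batch values; not a Spark class.
final case class EventTimeSample(
    batchTimestampMs: Long, // wall-clock timestamp of the batch
    watermarkMs: Long,      // global watermark reported for the batch
    minEventTimeMs: Long)   // min event time observed in the batch

object WatermarkGap {
  // For each consecutive pair of batches, returns
  // (timestamp of the later batch, gap in seconds), where the gap is the later
  // batch's min event time minus the earlier batch's watermark.
  // Assumption: a negative gap means the later batch contained rows older than
  // that watermark, i.e. the watermark delay did not cover them.
  def gapsToNextBatch(samples: Seq[EventTimeSample]): Seq[(Long, Double)] =
    samples.zip(samples.drop(1)).map { case (prev, next) =>
      (next.batchTimestampMs, (next.minEventTimeMs - prev.watermarkMs) / 1000.0)
    }
}
```

   The result has the same `(x, y)` shape as `watermarkData` in the diff, so it
could in principle be plotted the same way.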
   
   (So, like I said, there are several useful lines to plot that can be
compared with each other to produce meaning. I'm just not taking the step of
devoting my life to being a frontend engineer.)






