HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528586781
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent:
StreamingQueryTab)
<br />
}
+ def generateWatermark(
+ query: StreamingQueryUIData,
+ minBatchTime: Long,
+ maxBatchTime: Long,
+ jsCollector: JsCollector): Seq[Node] = {
+ // This is made sure on caller side but put it here to be defensive
+ require(query.lastProgress != null)
+ if (query.lastProgress.eventTime.containsKey("watermark")) {
+ val watermarkData = query.recentProgress.flatMap { p =>
+ val batchTimestamp = parseProgressTimestamp(p.timestamp)
+ val watermarkValue =
parseProgressTimestamp(p.eventTime.get("watermark"))
+ if (watermarkValue > 0L) {
+ // seconds
+ Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
Review comment:
More correctly, how we define the "processing time" in the graph in
https://github.com/apache/spark/pull/30427#issuecomment-730844687. (y axis)
The query which pulls recent events are expected to have processing time as
wall clock. That is only broken when we deal with historical data - that's not
having the "ideal" processing time. One of approaches which can rough guess
would be tracking event time, but given Spark takes max event time to calculate
watermark (while other engines take min event time) the gap is more likely
pretty much similar across batches.
In historical case, as well as real time case (as Spark picks max event
time), tracking the gap between global watermark and min event time would be
more helpful, as we can at least see whether the watermark delay is enough to
cover the min event time of the next batch. This is pretty specific to Spark's
case, though.
(So likewise I said, there're several useful lines to plot which can be
compared between and produce the meaning. I just don't take the step to go my
life for frontend engineer.)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]