viirya commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730852396


   > Technically, the graph is almost meaningless for processing time, because 
the event timestamp would be nearly the same as the batch timestamp. Even if 
the query is lagging, once the next batch is launched, the event timestamps of 
the inputs will match the batch timestamp.
   > 
   > The graph will be helpful if they're using either "ingest time" (not 
timestamped by Spark, but timestamped when ingested into the input storage), 
which could show the processing lag, or "event time", which is the best case 
for showing the gap.
   
   The gap is calculated as the difference between the batch timestamp (this 
should be processing time, right? Because the trigger clock is `SystemClock` by 
default) and the watermark. My previous question may not have been clear. If we 
process historical data or some simulated data, the event time could be far 
from the processing time. For example, if we process data from 2010 to 2019, 
would the gap now be current time - 2010-xx-xx...?
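   To make the concern concrete, here is a minimal sketch of how such a gap 
metric would behave when replaying historical data. The function name and the 
timestamps are hypothetical, chosen only to illustrate the question above; this 
is not Spark's actual implementation:

```python
from datetime import datetime, timezone

def watermark_gap_ms(batch_timestamp_ms: int, watermark_ms: int) -> int:
    # Hypothetical helper: the gap metric under discussion is simply the
    # batch (processing-time) timestamp minus the current watermark.
    return batch_timestamp_ms - watermark_ms

# Replaying historical data: the watermark sits in 2010, while the batch
# timestamp comes from SystemClock, i.e. "now" (here fixed for illustration).
watermark = datetime(2010, 6, 1, tzinfo=timezone.utc)
batch_ts = datetime(2020, 11, 20, tzinfo=timezone.utc)

gap_ms = watermark_gap_ms(int(batch_ts.timestamp() * 1000),
                          int(watermark.timestamp() * 1000))
gap_days = gap_ms // (1000 * 60 * 60 * 24)
print(gap_days)  # roughly a decade's worth of days
```

   With event times that old, the metric reports a gap of about a decade, which 
is what the question above is getting at.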


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


