HeartSaVioR commented on issue #27557: [SPARK-30804][SS] Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog
URL: https://github.com/apache/spark/pull/27557#issuecomment-589333368

> > For streaming workloads, latency is the first class consideration.
>
> When the query is not running properly.

OK, I admit my experience has mostly been with "low-latency" workloads, but even when Spark runs in micro-batch mode, that doesn't mean latency is unimportant. In a streaming workload, latency is the thing that "defines" whether the query is running properly or not. Spark itself had to claim that a micro-batch could run in sub-second time, because latency has been one of the major downsides of Spark Streaming, and continuous processing had to be introduced to address it.

Higher latency doesn't only mean the output will be late. When you turn on "latestFirst" (together with maxFilesPerTrigger, since in this case we assume we can't process all the inputs) to start reading from the latest files, the latency of a batch defines the boundary of the inputs. It's a critical metric that operators should always observe via their monitoring approaches (alerts, a time-series DB and dashboard, etc.) so they can find out what is happening when the latency fluctuates a lot.

> I think it's debug information which helps developers to find out what's the issue and not users (INFO is more like to users in my understanding).

I'm not sure who you mean by "users". AFAIK, in many cases (not all cases, for sure), users = developers = operators.
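For context, the kind of change the PR title describes, measuring the elapsed time of a compact operation and logging it, can be sketched with a small timing helper. This is a minimal illustrative sketch, not Spark's actual implementation; the helper name `time_taken_ms` and the `compact` callable are hypothetical:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("CompactibleFileStreamLog")

def time_taken_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def compact(batch_id, entries):
    # Hypothetical stand-in for the real compaction work: merge all
    # entries of previous batches into a single compacted list.
    return list(entries)

result, elapsed_ms = time_taken_ms(compact, 42, ["f1", "f2", "f3"])
# Logging at INFO (rather than DEBUG) makes the figure visible to
# operators monitoring batch latency, which is the point being argued.
logger.info("Compacting took %.1f ms for compact batch 42", elapsed_ms)
```

Whether this belongs at INFO or DEBUG is exactly the disagreement in the thread; the sketch only shows where such a measurement would sit.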
