Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21721
If batch queries also leverage AccumulatorV2 for metrics, IMHO we might not
need to redesign the metrics API from scratch. For batch and micro-batch modes
the metrics API works without any concerns (it is getting requests for
improvement, though), while in continuous mode the metrics just don't work
because tasks never finish.
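To illustrate why never-finishing tasks break metrics: accumulator updates made on a task-side copy only reach the driver when the finished task's copy is merged back. Below is a minimal toy sketch of that contract (the class and names are mine, not Spark's `AccumulatorV2` API) showing that the driver-side value stays empty until a merge happens:

```scala
// Toy illustration of the accumulator contract (not Spark's AccumulatorV2):
// each task works on its own copy, and the driver only observes the values
// when a completed task's copy is merged back. In continuous mode the task
// never completes, so the merge step never runs and the metric stays at zero.
class ToyLongAccumulator {
  private var sum: Long = 0L
  def add(v: Long): Unit = sum += v
  def merge(other: ToyLongAccumulator): Unit = sum += other.sum
  def value: Long = sum
  // Driver hands each task a fresh, zeroed copy.
  def copyAndReset(): ToyLongAccumulator = new ToyLongAccumulator
}

object AccumulatorDemo {
  def main(args: Array[String]): Unit = {
    val driverAcc = new ToyLongAccumulator
    val taskAcc = driverAcc.copyAndReset()

    // The task records five processed rows locally.
    (1 to 5).foreach(_ => taskAcc.add(1))

    // Before the task finishes, the driver sees nothing.
    println(driverAcc.value)

    // At task completion (batch / micro-batch), the copy is merged back.
    driverAcc.merge(taskAcc)
    println(driverAcc.value)
  }
}
```

The key point is the last step: in batch and micro-batch that merge fires at task completion, which is exactly the event continuous mode never produces.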
The change in metrics affects both the query status and the SQL tab in the UI.
I haven't looked too deeply into metrics in continuous mode, so I'm not sure
about the current state of the UI or its ideal shape; I'll spend some time
playing with it.
My 2 cents: once we have the existing metrics working well, we could find ways
to make them work with continuous mode too, without breaking other things.
One thing I would like to ask ourselves is: should we treat the epoch id as a
batch id? For checkpointing we already do, and some streaming frameworks
represent the `stream between epochs` as a `logical batch`, which makes sense
to me. If we deal with watermarks we are likely to update the watermark per
epoch, and likewise when dealing with state, so if my understanding is correct
the epoch id looks like just an alias of the batch id.
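The `stream between epochs` = `logical batch` idea can be sketched concretely: slice the stream at epoch boundaries, treat each slice as a batch, and derive a per-epoch watermark from it. The `Event` type, `perEpochWatermarks` helper, and the simple max-minus-lateness rule below are all my own illustrative assumptions, not Spark code:

```scala
// Hypothetical sketch: the records between two epoch markers form one
// "logical batch", keyed by epoch id just as micro-batch output is keyed
// by batch id. Event and perEpochWatermarks are illustrative names only.
case class Event(epochId: Long, eventTime: Long)

// Group events into per-epoch batches, then compute a watermark for each
// epoch as (max event time in the epoch) - (allowed lateness).
def perEpochWatermarks(events: Seq[Event], lateness: Long): Map[Long, Long] =
  events.groupBy(_.epochId).map { case (epoch, batch) =>
    epoch -> (batch.map(_.eventTime).max - lateness)
  }
```

Under this framing, advancing the watermark (and committing state) once per epoch is the same bookkeeping we already do once per batch, which is why the epoch id reads like an alias of the batch id.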