Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21721
My 2 cents: the root cause is that the lifecycle of query progress reporting is
tied to `finishTrigger`, and we read the updated metrics from the executed plan.
Continuous mode has neither `finishTrigger` nor a finished plan to read them
from.
I'm not aware of how/when updated information for the nodes of the physical
plan is transmitted from the executors to the driver, but we should avoid using
the executed plan as the source of that information and find an alternative
that is compatible with both micro-batch and continuous mode. This applies not
only to metrics but also to watermarks.
I'm not sure it is viable, but it could be done via RPC or whatever, as long as
we can aggregate the information on the driver. Then each operator can send its
information to the driver directly, and the driver can aggregate it and use it
once a batch or an epoch is finished.
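To make the idea concrete, here is a rough sketch in plain Python rather than
Spark code. Every name in it (`OperatorReport`, `DriverProgressAggregator`,
`receive`, `finish`) is hypothetical and not an existing Spark API, and the RPC
layer is replaced by a direct method call. It only shows the shape: operators
push reports tagged with a batch/epoch id, and the driver aggregates them when
that batch or epoch finishes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorReport:
    """Hypothetical message an operator task would send to the driver."""
    epoch: int              # batch id (micro-batch) or epoch id (continuous)
    operator_id: int        # which physical operator produced the report
    num_rows: int           # rows processed since the previous report
    event_time_max_ms: int  # max event time seen; an input for watermarks


class DriverProgressAggregator:
    """Hypothetical driver-side receiver; stands in for an RPC endpoint."""

    def __init__(self):
        # Reports are buffered per batch/epoch until it is finished.
        self._reports = defaultdict(list)

    def receive(self, report):
        # Would be invoked by the RPC layer for each incoming report.
        self._reports[report.epoch].append(report)

    def finish(self, epoch):
        # Called once a batch or an epoch is finished, in either mode.
        reports = self._reports.pop(epoch, [])
        rows = defaultdict(int)
        for r in reports:
            rows[r.operator_id] += r.num_rows
        max_event_time = max(
            (r.event_time_max_ms for r in reports), default=None
        )
        return {
            "epoch": epoch,
            "rows_per_operator": dict(rows),
            "max_event_time_ms": max_event_time,
        }


agg = DriverProgressAggregator()
agg.receive(OperatorReport(epoch=0, operator_id=1, num_rows=10, event_time_max_ms=1000))
agg.receive(OperatorReport(epoch=0, operator_id=1, num_rows=5, event_time_max_ms=1500))
agg.receive(OperatorReport(epoch=1, operator_id=1, num_rows=7, event_time_max_ms=2000))
progress = agg.finish(0)
```

Since `finish` is keyed only by the batch/epoch id, the same path would serve
both modes without touching the executed plan. How the reports actually reach
the driver is left open here.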