GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21721
Since I'm actively working on the data source v2 API, this caught my
attention. Do we have a story for metrics in the data source v2 streaming API?
It seems odd to me to add public APIs that only work for micro-batch.
For the streaming API, the abstraction is: we have a logical scan for a
streaming source in a query (to keep query-specific state like offsets), and a
physical scan to do the actual work, one per micro-batch or one for the entire
continuous query (if `needsReconfigure` is false). See the sketch below.
Where do metrics fit into this abstraction? It's OK that they only
work for micro-batch for now, but we need a clear plan for how we can and will
make them work for continuous processing. One possible shape is sketched below.
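Purely as an assumption to make the question concrete (reusing `PhysicalScan`
from the sketch above; `ReportsSourceMetrics` is a made-up name): metrics could
attach to the physical scan, so micro-batch mode gets per-batch metrics
naturally, while continuous mode would poll the long-running scan periodically.

```scala
// Hypothetical mix-in, not a real DataSourceV2 interface: a physical
// scan opts in to reporting metrics for its batch or epoch.
trait ReportsSourceMetrics {
  // e.g. Map("numInputRows" -> 1000L)
  def metrics(): Map[String, Long]
}

// Example source opting in, using PhysicalScan from the sketch above.
class MyPhysicalScan extends PhysicalScan with ReportsSourceMetrics {
  override def planPartitions(): Seq[InputPartition] = Seq.empty
  override def metrics(): Map[String, Long] = Map("numInputRows" -> 0L)
}
```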
cc @tdas @zsxwing @rdblue