GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20386
There is a lesson I learned from streaming data source v1: even if it's
totally internal, there are people already using it and asking us not to
remove the API.
I think the same is true for the file-based data source. It's internal, but
people may still use it. Although we haven't found any use case for
`onTaskCommit` among the built-in data sources, it may be required by external
data sources.
One possible use case: the implementation needs a 2-phase commit on the
driver side. It can then use `onTaskCommit` to finish the first phase earlier.
Or someone may want to collect the commit messages received so far and report
statistics regularly; that also requires `onTaskCommit`.
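To make the two use cases above concrete, here is a minimal, language-agnostic sketch of a driver-side writer that receives each task's commit message as soon as the task commits. It is NOT Spark's API: `CommitMessage`, `StatsReportingWriter`, `on_task_commit`, `commit`, and the staging paths are all hypothetical stand-ins, and "preparing" an output is modeled as simply recording its path. The point is only that a per-task callback lets the first commit phase and running statistics happen before the job-level commit.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CommitMessage:
    """Hypothetical per-task message sent from an executor to the driver."""
    task_id: int
    rows_written: int
    staged_path: str


class StatsReportingWriter:
    """Hypothetical driver-side writer modeling an onTaskCommit-style callback."""

    def __init__(self, report_every: int = 2) -> None:
        self.report_every = report_every
        self.received: List[CommitMessage] = []
        self.reports: List[str] = []
        # Outputs that finished phase 1 of a hypothetical 2-phase commit.
        self.prepared: Set[str] = set()

    def on_task_commit(self, msg: CommitMessage) -> None:
        # Invoked once per successful task, before the whole job finishes.
        self.received.append(msg)
        # Phase 1: prepare this task's output now instead of waiting for all tasks.
        self.prepared.add(msg.staged_path)
        # Report running statistics every `report_every` committed tasks.
        if len(self.received) % self.report_every == 0:
            total = sum(m.rows_written for m in self.received)
            self.reports.append(f"{len(self.received)} tasks, {total} rows")

    def commit(self, messages: List[CommitMessage]) -> List[str]:
        # Phase 2: the job-level commit only finalizes what phase 1 prepared.
        missing = [m.staged_path for m in messages if m.staged_path not in self.prepared]
        if missing:
            raise RuntimeError(f"unprepared outputs: {missing}")
        return sorted(m.staged_path for m in messages)


# Usage: three tasks commit one by one, then the job commits.
writer = StatsReportingWriter(report_every=2)
msgs = [CommitMessage(i, 10 * (i + 1), f"/tmp/staging/part-{i}") for i in range(3)]
for m in msgs:
    writer.on_task_commit(m)
print(writer.reports)  # ['2 tasks, 30 rows']
print(writer.commit(msgs))
```

Without the per-task callback, all of this work would have to wait until the job-level `commit` receives the full array of messages at once.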