GitHub user jose-torres commented on the issue:
https://github.com/apache/spark/pull/20710
As you say, there's no strict semantic need for createDataWriter() to take
arguments. We could simply have each DataWriter identify itself by a random
UUID and require upstream components to keep track of which UUIDs map to the
writers they care about. But the current API is designed so that each data
writer can identify its logical place in the query, and epoch ID is an
important part of that. (I expect it would be infeasible to migrate existing
sources to an API that didn't provide things like partition ID or attempt
number.)
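
To make the contrast concrete, here is a rough sketch using simplified stand-in traits rather than the real Spark interfaces (the names `SimpleDataWriter`, `SimpleDataWriterFactory`, `PositionAwareFactory`, and the exact parameter list are assumptions for illustration only, not the DataSourceV2 API):

```scala
// Simplified stand-ins for the DataSourceV2 writer interfaces; not the real Spark API.
trait SimpleDataWriter {
  def write(record: String): Unit
  def commit(): Unit
}

trait SimpleDataWriterFactory {
  // The (partitionId, attemptNumber, epochId) triple lets each writer identify
  // its logical place in the query: which partition it owns, which retry it is,
  // and which streaming epoch its output belongs to.
  def createDataWriter(partitionId: Int, attemptNumber: Int, epochId: Long): SimpleDataWriter
}

// A sink that names its output by logical position, so retries and epochs can
// be reconciled without any side channel back to the driver.
class PositionAwareFactory(queryId: String) extends SimpleDataWriterFactory {
  override def createDataWriter(
      partitionId: Int,
      attemptNumber: Int,
      epochId: Long): SimpleDataWriter = new SimpleDataWriter {
    private val path = s"$queryId/epoch=$epochId/part=$partitionId/attempt=$attemptNumber"
    private val buffer = scala.collection.mutable.ArrayBuffer.empty[String]
    override def write(record: String): Unit = buffer += record
    override def commit(): Unit = println(s"committing ${buffer.size} records to $path")
  }
}
```

Under the UUID alternative described above, createDataWriter would take no arguments, and upstream components would have to maintain their own UUID-to-partition/epoch mapping to know which commits belong where.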
StreamWriter is the separate streaming interface, and DataWriterFactory
implementations in streaming queries will always come from a StreamWriter.
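
Continuing the same simplified sketch, the streaming side would be the component that hands out such a factory; again, `SimpleStreamWriter` and its methods are illustrative stand-ins, not the actual StreamWriter interface:

```scala
// Simplified stand-in for the streaming-side writer; not the real Spark API.
trait SimpleStreamWriter {
  def createWriterFactory(): SimpleDataWriterFactory
  // Streaming commits happen per epoch, which is why the epoch ID also matters
  // to the individual data writers the factory creates.
  def commit(epochId: Long): Unit
}

class PositionAwareStreamWriter(queryId: String) extends SimpleStreamWriter {
  override def createWriterFactory(): SimpleDataWriterFactory = new PositionAwareFactory(queryId)
  override def commit(epochId: Long): Unit = println(s"epoch $epochId committed for query $queryId")
}
```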