Github user jose-torres commented on the issue:

    https://github.com/apache/spark/pull/20710
  
    As you say, there's no strict semantic need to have createDataWriter() take 
arguments. We could simply have each DataWriter identify itself by a random 
UUID, and require upstream components to keep track of which UUIDs map to which 
of the writers they care about. But the current API is designed to let each 
data writer identify its logical place in the query, and epoch ID is an 
important part of that. (I expect it would be infeasible to migrate existing 
sources to an API which didn't provide things like partition ID or attempt 
number.)
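
    To make the tradeoff concrete, here is a rough sketch of the two designs. 
None of this is the real Spark interface: Writer, createPositionalWriter, 
UuidWriter and the upstream map are hypothetical names made up for 
illustration, and the (partitionId, attemptNumber, epochId) argument list is 
only my assumption of what an argument-taking createDataWriter() would pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class WriterIdentitySketch {

    /** Hypothetical stand-in for a data writer; not the Spark interface. */
    interface Writer {
        String describe();
    }

    /**
     * Design A (assumed argument list): the factory is told the writer's
     * logical place in the query, so the writer can describe itself.
     */
    static Writer createPositionalWriter(int partitionId, int attemptNumber, long epochId) {
        return () -> "partition=" + partitionId
                + " attempt=" + attemptNumber
                + " epoch=" + epochId;
    }

    /**
     * Design B: the factory takes no arguments. The writer only knows a
     * random UUID and has no idea where it sits in the query.
     */
    static final class UuidWriter implements Writer {
        final UUID id = UUID.randomUUID();

        @Override
        public String describe() {
            return "writer=" + id;
        }
    }

    public static void main(String[] args) {
        // Design A: identity travels with the writer.
        Writer positional = createPositionalWriter(3, 0, 7L);
        System.out.println(positional.describe());

        // Design B: upstream code must keep its own UUID -> position table.
        Map<UUID, String> upstream = new HashMap<>();
        UuidWriter anonymous = new UuidWriter();
        upstream.put(anonymous.id, "partition=3 attempt=0 epoch=7");
        System.out.println(anonymous.describe() + " -> " + upstream.get(anonymous.id));
    }
}
```

    In the second design the writer itself cannot say which partition, 
attempt, or epoch it belongs to. That information lives only in the upstream 
map, which is the bookkeeping burden described above.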
    
    StreamWriter is the separate streaming interface, and DataWriterFactory 
implementations in streaming queries will always come from a StreamWriter.

