Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20397
About the renaming: a lot of people, including @rxin, have complained to me
that the names are inconsistent. I named it `ReadTask` at the beginning
because it really works like a task. But I believe that after 2.3 more and
more people will complain about the inconsistency, because the difference
between `ReadTask` and `DataWriterFactory` is too subtle: both are
responsible for serializing information and initializing the actual
reader/writer on the executor side. The only difference is that we get a
single `DataWriterFactory`, serialize it, and send it to all partitions,
which means we implicitly "copy" the writer factory to every partition. For
`ReadTask`, we get many of them and send each one to its corresponding
partition, so there is no "copy". I think the renaming is worth it to avoid
future confusion.
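
To make the one-versus-many distinction concrete, here is a rough sketch. The two interfaces are simplified, hypothetical stand-ins that only borrow the names from this discussion; they are not the real Spark interfaces, and the `planReadTasks`/`planWriterFactory` helpers are made up for illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class NamingSketch {
    // Hypothetical stand-in: one instance per input partition.
    interface ReadTask extends Serializable {
        String createReader();
    }

    // Hypothetical stand-in: a single instance shared by all partitions.
    interface DataWriterFactory extends Serializable {
        String createWriter(int partitionId);
    }

    // Read side: the driver gets many objects and sends each one to
    // its own partition, so nothing is copied.
    static List<ReadTask> planReadTasks(int numPartitions) {
        List<ReadTask> tasks = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            final int split = i;
            tasks.add(() -> "reader for split " + split);
        }
        return tasks;
    }

    // Write side: the driver gets one object, which is serialized and
    // implicitly "copied" to every partition.
    static DataWriterFactory planWriterFactory() {
        return partitionId -> "writer for partition " + partitionId;
    }

    public static void main(String[] args) {
        for (ReadTask task : planReadTasks(3)) {
            System.out.println(task.createReader());
        }
        DataWriterFactory factory = planWriterFactory();
        for (int p = 0; p < 3; p++) {
            System.out.println(factory.createWriter(p));
        }
    }
}
```

In both cases the object is created on the driver, shipped to an executor, and used there to build the actual reader or writer, which is why the two names look inconsistent.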