GitHub user rdblue commented on the issue:
https://github.com/apache/spark/pull/20397
This is more confusing, not less. Look at @jiangxb1987's comment above: "We
shall create only one DataReaderFactory, and have that create multiple data
readers." It is not clear why the API requires a list of factories instead of
just one. If this is renamed to "factory", is it a requirement that the factory
be able to create more than one data reader for the same task?
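For context, a minimal sketch of the two shapes being debated: a list of per-task factories versus one factory asked for a reader per task. The interface and method names here are illustrative only, not Spark's actual DataSourceV2 interfaces.

```java
// Hypothetical sketch of the two API shapes under discussion; the interface
// names are illustrative, not Spark's actual DataSourceV2 interfaces.
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class FactoryShapes {

    // Shape 1: the planner returns one factory per task, so the driver
    // hands executors a list of factories, each creating one reader.
    interface PerTaskFactory<T> extends Serializable {
        Iterable<T> createReader();
    }

    // Shape 2: a single factory; a reader is requested for each task.
    interface SingleFactory<T> extends Serializable {
        Iterable<T> createReader(int task);
    }

    static int[] sums() {
        // Shape 1: two tasks, two factories.
        List<PerTaskFactory<Integer>> factories = Arrays.asList(
                () -> Arrays.asList(1, 2),
                () -> Arrays.asList(3, 4));

        // Shape 2: one factory creating a reader for each of two tasks.
        SingleFactory<Integer> single =
                task -> Arrays.asList(task * 2 + 1, task * 2 + 2);

        int sum1 = 0;
        for (PerTaskFactory<Integer> f : factories)
            for (int v : f.createReader()) sum1 += v;

        int sum2 = 0;
        for (int task = 0; task < 2; task++)
            for (int v : single.createReader(task)) sum2 += v;

        // Both shapes read the same rows; the difference is only in
        // how many factory objects are created and serialized.
        return new int[] { sum1, sum2 };
    }

    public static void main(String[] args) {
        int[] s = sums();
        System.out.println(s[0] + " " + s[1]);
    }
}
```

Either shape can express the same scan; the question raised above is which one the names should promise.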
To the point about serializing and sending to executors, "factory" doesn't
imply that any more than "task" does. The fact that these objects are
serialized needs to be made clear in the documentation.
The read and write side behave differently. They do not need to mirror one
another's naming when that makes names less precise. This isn't forcing users
to look at a subtle difference. It is just breaking the (wrong) assumption that
both read and write sides have the same behavior.
@rxin, any opinion here?