GitHub user rdblue commented on the issue:
https://github.com/apache/spark/pull/20397
This is more confusing, not less. Look at @jiangxb1987's comment above: "We
shall create only one DataReaderFactory, and have that create multiple data
readers." It is not clear why the API requires a list of factories instead of
just one. If this is renamed to "factory", is it a requirement that the factory
be able to create more than one data reader for the same task?
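For context, a minimal sketch of the two shapes being debated: a list of per-task factories versus one factory asked for a reader per task. The interface and method names here are illustrative only, not Spark's actual DataSourceV2 interfaces.

```java
// Hypothetical sketch of the two API shapes under discussion; the interface
// names are illustrative, not Spark's actual DataSourceV2 interfaces.
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

public class FactoryShapes {

    // Shape 1: the planner returns one factory per task, so the driver
    // hands executors a list of factories, each creating one reader.
    interface PerTaskFactory<T> extends Serializable {
        Iterable<T> createReader();
    }

    // Shape 2: a single factory; a reader is requested for each task.
    interface SingleFactory<T> extends Serializable {
        Iterable<T> createReader(int task);
    }

    static int[] sums() {
        // Shape 1: two tasks, two factories.
        List<PerTaskFactory<Integer>> factories = Arrays.asList(
                () -> Arrays.asList(1, 2),
                () -> Arrays.asList(3, 4));

        // Shape 2: one factory creating a reader for each of two tasks.
        SingleFactory<Integer> single =
                task -> Arrays.asList(task * 2 + 1, task * 2 + 2);

        int sum1 = 0;
        for (PerTaskFactory<Integer> f : factories)
            for (int v : f.createReader()) sum1 += v;

        int sum2 = 0;
        for (int task = 0; task < 2; task++)
            for (int v : single.createReader(task)) sum2 += v;

        // Both shapes read the same rows; the difference is only in
        // how many factory objects are created and serialized.
        return new int[] { sum1, sum2 };
    }

    public static void main(String[] args) {
        int[] s = sums();
        System.out.println(s[0] + " " + s[1]);
    }
}
```

Either shape can express the same scan; the question raised above is which one the names should promise.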
To the point about serializing and sending to executors, "factory" doesn't
imply that any more than "task" does. The fact that these objects are
serialized needs to be made clear in the documentation.
The read and write side behave differently. They do not need to mirror one
another's naming when that makes names less precise. This isn't forcing users
to look at a subtle difference. It is just breaking the (wrong) assumption that
both read and write sides have the same behavior.
@rxin, any opinion here?