Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20387
  
    If the reader is to be created and configured by the relation, then the 
relation needs to be able to set the table, path, and other properties. This 
adds necessary data to the relation that is no longer passed directly to the 
reader from `DataFrameReader`.
    
    From the other thread on this, I think we agree that minimizing the number 
of places that work with `DataSourceOptions` and the specific option strings is 
a good idea. So it makes sense to define the relation using `TableIdentifier`. 
Other code paths that create `DataSourceV2Relation` also need the table name to 
be passed this way.
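
    To make the shape of this concrete, here is a rough, self-contained sketch 
of the idea, not the code in this PR. `TableIdent`, `SketchReader`, 
`SketchRelation`, and the `"database"`/`"table"`/`"path"` option keys are all 
hypothetical stand-ins. The point is only that the relation holds the typed 
table, path, and options immutably, and is the one place that turns them into 
reader configuration.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
final case class TableIdent(table: String, database: Option[String] = None)

// Stand-in for a configured reader: it only records the config it was given.
final case class SketchReader(config: Map[String, String])

// Immutable relation node: everything needed to configure the reader is fixed
// at construction time, so nothing has to be set on the reader from outside.
final case class SketchRelation(
    source: String,
    table: Option[TableIdent] = None,
    path: Option[String] = None,
    options: Map[String, String] = Map.empty) {

  // The relation translates its typed fields into option strings in one
  // place. The key names used here are assumptions made for this sketch.
  def newReader(): SketchReader = {
    val tableOpts: Seq[(String, String)] = table.toSeq.flatMap { t =>
      t.database.map(db => "database" -> db).toSeq :+ ("table" -> t.table)
    }
    val pathOpts: Seq[(String, String)] = path.map(p => "path" -> p).toSeq
    SketchReader(options ++ tableOpts ++ pathOpts)
  }
}

object RelationSketch {
  def main(args: Array[String]): Unit = {
    val relation = SketchRelation(
      source = "example",
      table = Some(TableIdent("events", Some("db"))),
      options = Map("mergeSchema" -> "true"))
    // Print the configuration the relation hands to the reader it creates.
    println(relation.newReader().config)
  }
}
```

    With this shape, any caller that builds the relation passes a typed 
identifier, and only `newReader()` needs to know the option strings.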
    
    I guess we *could* revert the change and add it in a separate commit, but I 
don't see a reason for the extra work. It would be impractical to backport a 
later `TableIdentifier` change without this immutability change. Similarly, why 
would someone want to move to an immutable plan, but leave some left-over logic 
for configuration in `DataFrameReader`?
    
    I don't see why we wouldn't want to have these options in the immutable 
relation node from the start. Do you have a case in mind that I'm missing?

