GitHub user rdblue commented on the issue:

https://github.com/apache/spark/pull/20387

If the reader is to be created and configured by the relation, then the relation needs to be able to set the table, path, and other properties. This adds necessary data to the relation that is no longer passed directly to the reader from `DataFrameReader`.

From the other thread on this, I think we agree that minimizing the number of places that work with `DataSourceOptions` and the specific option strings is a good idea. So it makes sense to define the relation using `TableIdentifier`. Other paths that create `DataSourceV2Relation` need the table name to be passed like this.

I guess we *could* revert the change and add it in a separate commit, but I don't see a reason for the extra work. It would be impractical to backport a later `TableIdentifier` change without this immutability change. Similarly, why would someone want to move to an immutable plan, but leave some left-over configuration logic in `DataFrameReader`?

I don't see why we wouldn't want to have these options in the immutable relation node from the start. Do you have a case in mind that I'm missing?
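To make the shape of the argument concrete, here is a rough sketch in plain Java. It is not the code in this PR and does not use Spark's classes: `TableId`, `SketchReader`, `SketchRelation`, and the `"table"` / `"path"` option keys are all hypothetical stand-ins. The point it illustrates is that an immutable relation can carry the table, path, and options itself, and be the only place that turns them into option strings when it creates a reader.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a table identifier (not Spark's TableIdentifier).
final class TableId {
    final Optional<String> database;
    final String table;

    TableId(Optional<String> database, String table) {
        this.database = database;
        this.table = table;
    }

    // "db.table" when a database is set, otherwise just "table".
    String unquoted() {
        return database.map(db -> db + "." + table).orElse(table);
    }
}

// Hypothetical reader: it only sees the final, already-resolved configuration.
final class SketchReader {
    final Map<String, String> config;

    SketchReader(Map<String, String> config) {
        this.config = config;
    }
}

// Hypothetical immutable relation node: all fields are final and the options
// map is defensively copied, so a relation never changes after construction.
final class SketchRelation {
    private final Optional<TableId> table;
    private final Optional<String> path;
    private final Map<String, String> options;

    SketchRelation(Optional<TableId> table, Optional<String> path,
                   Map<String, String> options) {
        this.table = table;
        this.path = path;
        this.options = Collections.unmodifiableMap(new LinkedHashMap<>(options));
    }

    // "Changing" the table returns a new relation; this one is left untouched.
    SketchRelation withTable(TableId newTable) {
        return new SketchRelation(Optional.of(newTable), path, options);
    }

    // The only place that knows the option strings (assumed keys, for illustration).
    SketchReader newReader() {
        Map<String, String> config = new LinkedHashMap<>(options);
        table.ifPresent(t -> config.put("table", t.unquoted()));
        path.ifPresent(p -> config.put("path", p));
        return new SketchReader(Collections.unmodifiableMap(config));
    }
}

final class RelationSketchDemo {
    public static void main(String[] args) {
        Map<String, String> userOptions = new LinkedHashMap<>();
        userOptions.put("mergeSchema", "true");

        SketchRelation relation =
            new SketchRelation(Optional.empty(), Optional.empty(), userOptions)
                .withTable(new TableId(Optional.of("db"), "events"));

        // The caller never touches option strings; the relation configures the reader.
        System.out.println(relation.newReader().config);
    }
}
```

In a layout like this, any code path that builds the relation passes a table identifier rather than assembling option strings, which is the property I'm arguing for above.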