rdblue commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-545600442

> Do you mean we should block this PR until we figure out the behavior of path-based tables?

No, and sorry for the misunderstanding! My point is that your claim that inference won't be used in the write path is not necessarily correct and depends on the behavior we decide for path-based tables.

> But seems now we are unable to keep file source skipping schema/partition inference during write.

I think it's an exaggeration to say "unable". Partition inference in particular can be done much more easily and efficiently than depending on a recursive directory listing to find all data files. Granted, the current implementation would need to change, but do you really think that "unable" is an accurate description?

The problem is that this needs to be decided because it affects the API that will go into Spark 3.0. I think we should go with what we agreed was a good solution for the API -- adding the `inferSchema` and `inferPartitioning` methods -- because I haven't heard a very strong argument against it. Let's talk about this in the next v2 sync to get more opinions.
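
On the partition-inference point above, here is a rough sketch of how partition columns might be read from a single path rather than from a recursive listing of every data file. It is an illustration only, not the implementation in this PR or in Spark: the class `PartitionPathSketch` and the method `inferPartitionColumns` are hypothetical names, and the sketch assumes a Hive-style `key=value` directory layout, which the comment does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: reads partition column names
// from the `key=value` directory segments of one relative file path.
final class PartitionPathSketch {
    static List<String> inferPartitionColumns(String relativePath) {
        List<String> columns = new ArrayList<>();
        String[] segments = relativePath.split("/");
        // The last segment is assumed to be the data file name, so skip it.
        for (int i = 0; i < segments.length - 1; i++) {
            int eq = segments[i].indexOf('=');
            // Only segments of the form key=value count as partition directories.
            if (eq > 0) {
                columns.add(segments[i].substring(0, eq));
            }
        }
        return columns;
    }
}
```

Under these assumptions, one path such as `date=2019-10-23/hour=5/part-00000.parquet` is enough to recover the column names `date` and `hour`. Whether a real file source could rely on a single path depends on the path-based table behavior still under discussion.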
