rdblue commented on issue #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-544698226

> inferSchema/inferPartitioning need to list files, and we should only do file listing once when we scan a directory without user-specified schema

Why should this internal concern of one source affect the API? Partitioning inference does not require listing all of the files in a table, and I doubt it is a good idea to do that for schema inference either. If a table is small, it doesn't matter if this work is done twice (until the duplication is fixed); and if a table is really large, then doing a full listing for schema inference isn't a good idea anyway.

> when writing to a directory, no schema/partition inference should be done.

This statement makes assumptions about the behavior of path-based tables, and that behavior hasn't been clearly defined yet. Can you be more specific about the case and how you think path-based tables will behave?

I disagree that no schema or partition inference should be done for writes. Maybe it isn't done today, but if there is existing data, Spark shouldn't allow writing new data that would break the table by using an incompatible schema or partition layout. In that case, we would want to infer the schema and partitioning.

Also, even when it isn't necessary to infer the schema and partitioning, this information still needs to be passed to the table. When running a CTAS operation, Spark might be called with `partitionBy`; in that case, if Spark doesn't call `inferPartitioning`, then what is the problem?
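To illustrate the point that the "list files only once" concern can stay internal to the source rather than shaping the API: a source can cache its listing and let both inference calls share it. The sketch below is hypothetical and not Spark's actual `TableProvider` API; the class and method names (`FileTable`, `infer_schema`, `infer_partitioning`) and the toy inference logic are assumptions for illustration only.

```python
from functools import cached_property

class FileTable:
    """Hypothetical file-backed table that lists files at most once."""

    def __init__(self, list_files):
        # list_files is a callable that performs the (expensive) listing
        self._list_files = list_files

    @cached_property
    def _files(self):
        # cached_property ensures the listing runs at most once,
        # no matter how many inference calls follow
        return self._list_files()

    def infer_schema(self):
        # toy "schema inference": the distinct file extensions
        return {f.rsplit(".", 1)[-1] for f in self._files}

    def infer_partitioning(self):
        # toy "partition inference": keys from key=value directory names
        keys = []
        for f in self._files:
            for part in f.split("/"):
                if "=" in part:
                    key = part.split("=", 1)[0]
                    if key not in keys:
                        keys.append(key)
        return keys

listings = 0

def listing():
    # simulated directory scan; counts how often it actually runs
    global listings
    listings += 1
    return ["data/year=2019/part-0.parquet", "data/year=2020/part-1.parquet"]

table = FileTable(listing)
table.infer_schema()        # triggers the single listing
table.infer_partitioning()  # reuses the cached listing
print(listings)             # -> 1
```

With this shape, calling both inference methods costs one listing, so the API contract (separate schema and partition inference) need not change to avoid duplicate scans.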
