rdblue commented on issue #25651: [SPARK-28948][SQL] Support passing all Table 
metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-544698226
 
 
   > inferSchema/inferPartitioning need to list files, and we should only do 
file listing once when we scan a directory without user-specified schema
   
   Why should this internal concern of one source affect the API? Partitioning 
inference does not require listing all the files in a table, and I doubt that 
it is a good idea to do that for schema inference either. If a table is small, 
it doesn't matter if this work is done twice (until the duplication is fixed); 
and if a table is really large, then a full file listing isn't a good idea for 
schema inference anyway.
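   To make that concrete, here is a minimal sketch (the object and method names are mine, not Spark's internals): Hive-style partition columns can be recovered from `key=value` directory segments alone, so inference can stop as soon as the partition structure is known rather than enumerating every leaf file.
   
   ```scala
   // Illustrative only: infers partition column names from directory path
   // segments of the form key=value, without touching leaf files.
   object PartitionInference {
     def inferPartitionColumns(paths: Seq[String]): Seq[String] =
       paths
         .flatMap(_.split("/")
           .filter(_.contains("="))          // keep only key=value segments
           .map(_.takeWhile(_ != '=')))      // extract the column name
         .distinct
   }
   ```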
   
   > when writing to a directory, no schema/partition inference should be done.
   
   This statement makes assumptions about the behavior of path-based tables and 
that behavior hasn't been clearly defined yet. Can you be more specific about 
the case and how you think path-based tables will behave?
   
   I disagree that no schema or partition inference should be done for writing. 
Maybe it isn't done today, but if there is existing data, Spark shouldn't allow 
writing new data that will break a table by using an incompatible schema or 
partition layout. In that case, we would want to infer the schema and 
partitioning.
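   A rough sketch of the kind of write-side validation I mean (the types and the compatibility rule here are deliberately simplified and are not Spark's actual logic): before appending, compare the incoming schema against the one inferred from existing data and reject incompatible writes.
   
   ```scala
   // Illustrative only: a toy schema compatibility check for appends.
   final case class Field(name: String, dataType: String)
   
   object WriteValidation {
     // Every incoming field must exist in the existing schema with the
     // same type (a deliberately simplified rule for illustration).
     def isCompatible(existing: Seq[Field], incoming: Seq[Field]): Boolean = {
       val byName = existing.map(f => f.name -> f.dataType).toMap
       incoming.forall(f => byName.get(f.name).contains(f.dataType))
     }
   }
   ```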
   
   Also, even when schema and partitioning don't need to be inferred, that 
information still needs to be passed to the table. When running a CTAS 
operation, Spark might be called with `partitionBy`. In that case, if Spark 
doesn't call `inferPartitioning`, then what is the problem?
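   The control flow I have in mind is roughly this (names are illustrative and do not reflect the real TableProvider API): when the user supplies `partitionBy`, Spark already holds the partitioning and can hand it straight to the provider; inference is only a fallback.
   
   ```scala
   // Illustrative only: prefer user-supplied partitioning over inference.
   object CtasFlow {
     // `inferred` is by-name, so inference runs only when actually needed.
     def partitioningFor(userPartitionBy: Seq[String],
                         inferred: => Seq[String]): Seq[String] =
       if (userPartitionBy.nonEmpty) userPartitionBy  // no inference needed
       else inferred                                  // fall back to inference
   }
   ```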

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services