rdblue commented on issue #25651: [SPARK-28948][SQL] Support passing all Table 
metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#issuecomment-545600442
 
 
   > Do you mean we should block this PR until we figure out the behavior of 
path-based tables?
   
   No, and sorry for the misunderstanding! My point is that your claim that 
inference won't be used in the write path is not necessarily correct and 
depends on the behavior we decide for path-based tables.
   
   > But seems now we are unable to keep file source skipping schema/partition 
inference during write.
   
   I think it's an exaggeration to say "unable". Partition inference in 
particular can be done much more easily and efficiently than depending on a 
recursive directory listing to find all data files. Granted, the current 
implementation would need to change, but do you really think that "unable" is 
an accurate description?
   
   The problem is that this needs to be decided because it affects the API that 
will go into Spark 3.0. I think we should go with what we agreed was a good 
solution for the API -- adding the `inferSchema` and `inferPartitioning` 
methods -- because I haven't heard a very strong argument against it. Let's 
talk about this in the next v2 sync to get more opinions.
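
As a rough illustration of the API shape under discussion, here is a minimal, self-contained sketch. It is not the actual Spark `TableProvider` interface: only the method names `inferSchema` and `inferPartitioning` come from the discussion above, while `SketchProvider`, `getTable`, `loadTable`, `CountingProvider`, and the use of plain string lists for schema and partitioning are assumptions made for the example. The idea it shows is that the caller runs inference only for whatever was not supplied.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are real Spark classes.
final class InferenceSketch {

    // Stand-in for a v2 provider: inference is separate from table loading.
    interface SketchProvider {
        List<String> inferSchema(Map<String, String> options);
        List<String> inferPartitioning(Map<String, String> options);
        String getTable(List<String> schema, List<String> partitioning,
                        Map<String, String> options);
    }

    // Caller side: infer only what the user or catalog did not provide.
    // A null argument means "not provided" in this sketch.
    static String loadTable(SketchProvider provider, List<String> schema,
                            List<String> partitioning, Map<String, String> options) {
        List<String> resolvedSchema =
            schema != null ? schema : provider.inferSchema(options);
        List<String> resolvedPartitioning =
            partitioning != null ? partitioning : provider.inferPartitioning(options);
        return provider.getTable(resolvedSchema, resolvedPartitioning, options);
    }

    // Toy provider that counts how often inference is requested.
    static final class CountingProvider implements SketchProvider {
        int inferenceCalls = 0;

        @Override
        public List<String> inferSchema(Map<String, String> options) {
            inferenceCalls++;
            return Arrays.asList("id", "day"); // made-up column names
        }

        @Override
        public List<String> inferPartitioning(Map<String, String> options) {
            inferenceCalls++;
            return Arrays.asList("day"); // made-up partition column
        }

        @Override
        public String getTable(List<String> schema, List<String> partitioning,
                               Map<String, String> options) {
            return "table(schema=" + schema + ", partitioning=" + partitioning + ")";
        }
    }

    public static void main(String[] args) {
        CountingProvider provider = new CountingProvider();
        Map<String, String> options = Collections.emptyMap();
        // Everything supplied: no inference needed.
        System.out.println(loadTable(provider, Arrays.asList("id", "day"),
                                     Arrays.asList("day"), options));
        // Nothing supplied: both inference methods run.
        System.out.println(loadTable(provider, null, null, options));
        System.out.println("inference calls: " + provider.inferenceCalls);
    }
}
```

Whether a write to a path-based table would take the first or the second route in `main` is exactly the open question about path-based table behavior.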
