rdblue commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#discussion_r336713473
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java
 ##########
 @@ -36,26 +35,12 @@
 public interface TableProvider {
 
   /**
-   * Return a {@link Table} instance to do read/write with user-specified options.
+   * Return a {@link Table} instance with the given table options to do read/write.
+   * Implementations should infer the table schema and partitioning.
    *
    * @param options the user-specified options that can identify a table, e.g. file path, Kafka
    *                topic name, etc. It's an immutable case-insensitive string-to-string map.
    */
+  // TODO: this should take a Map<String, String> as table properties.
 
 Review comment:
   My point was that Spark needs to infer the partitioning of the table, not 
exhaustively list directories. This can be done more quickly than in the 
current implementation, by deferring the full file listing until later and 
getting only the directory structure for `inferPartitioning`.
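   
   As a rough illustration of inferring partitioning from directory structure 
alone, here is a minimal sketch. It assumes Hive-style `key=value` directory 
names; `PartitionInferenceSketch` and `inferPartitionColumns` are hypothetical 
names used only for this example, not Spark APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Spark code: derives partition column names from
// directory paths only, without listing the files inside them.
final class PartitionInferenceSketch {

    // Assumes Hive-style layout: <root>/<col1>=<val1>/<col2>=<val2>/...
    static List<String> inferPartitionColumns(String root, List<String> dirs) {
        // LinkedHashSet keeps columns in first-seen (outermost-first) order.
        Set<String> columns = new LinkedHashSet<>();
        for (String dir : dirs) {
            if (!dir.startsWith(root)) {
                continue; // not under this table's location
            }
            String relative = dir.substring(root.length());
            if (!relative.isEmpty() && !relative.startsWith("/")) {
                continue; // sibling path that merely shares the prefix
            }
            for (String segment : relative.split("/")) {
                int eq = segment.indexOf('=');
                if (eq > 0) {
                    columns.add(segment.substring(0, eq));
                }
            }
        }
        return new ArrayList<>(columns);
    }
}
```

   Only directory paths are passed in, so the cost depends on the number of 
partition directories rather than the number of files.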
   
   The static cache I'm talking about is a cache of metastore connections, not 
files. In this case, you could build your file list for a location and cache 
that for some period of time, using it for partition and schema inference, as 
well as for the `FileIndex` in the table you created. Caching would also help 
consistency because all versions of the table loaded within that period would 
see the same files (and the cache could be refreshed, of course). But these 
concerns shouldn't affect the API.
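   
   A minimal sketch of such a time-bounded file-list cache, under stated 
assumptions: `FileListingCacheSketch`, its `lister` function, and the 
explicit `nowMillis` parameter are all hypothetical and exist only to make 
the example self-contained and testable. This is not how Spark implements it.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: caches the file list per table location for a fixed
// time-to-live, so inference and the file index can share one listing.
final class FileListingCacheSketch {

    private static final class Entry {
        final List<String> files;
        final long loadedAtMillis;

        Entry(List<String> files, long loadedAtMillis) {
            this.files = files;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<String, List<String>> lister;

    FileListingCacheSketch(long ttlMillis, Function<String, List<String>> lister) {
        this.ttlMillis = ttlMillis;
        this.lister = lister;
    }

    // Returns the cached listing while it is younger than the TTL,
    // otherwise lists the location again and stores the new result.
    List<String> filesFor(String location, long nowMillis) {
        Entry entry = cache.compute(location, (loc, old) ->
            (old != null && nowMillis - old.loadedAtMillis < ttlMillis)
                ? old
                : new Entry(lister.apply(loc), nowMillis));
        return entry.files;
    }

    // Explicit refresh: drop the entry so the next lookup lists again.
    void refresh(String location) {
        cache.remove(location);
    }
}
```

   Every caller within the TTL sees the same list, which is the consistency 
benefit mentioned above, and none of this needs to appear in the 
`TableProvider` interface.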
