rdblue commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
URL: https://github.com/apache/spark/pull/25651#discussion_r336713473
########## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java ##########
@@ -36,26 +35,12 @@ public interface TableProvider {
   /**
-   * Return a {@link Table} instance to do read/write with user-specified options.
+   * Return a {@link Table} instance with the given table options to do read/write.
+   * Implementations should infer the table schema and partitioning.
    *
    * @param options the user-specified options that can identify a table, e.g. file path, Kafka
    *                topic name, etc. It's an immutable case-insensitive string-to-string map.
    */
+  // TODO: this should take a Map<String, String> as table properties.

Review comment:
   My point was that Spark needs to infer the partitioning of the table, not exhaustively list directories. This can be done more quickly than in the current implementation, by listing all files later and extracting just the directory structure for `inferPartitioning`.

   The static cache I'm talking about is a cache of metastore connections, not files. In this case, you could build the file list for a location and cache it for some period of time, using it for partition and schema inference as well as for the `FileIndex` in the table you created. Caching would also help consistency, because the same files would be in all versions of the table loaded within that period (and the cache could be refreshed, of course).

   But these concerns shouldn't affect the API.
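The caching idea in the comment could be sketched roughly as follows. This is a hypothetical, simplified illustration, not Spark's actual implementation: `FileListingCache`, `TTL_MS`, and `inferPartitionColumns` are made-up names. The point is that one time-bounded file listing per location can serve partition inference and the table's `FileIndex` alike, so all table instances loaded within the TTL see a consistent file set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of a time-bounded cache of file listings per table
// location. A cached listing is reused for both partition/schema inference
// and the FileIndex until it expires, then refreshed.
public class FileListingCache {
    private static final long TTL_MS = 60_000; // illustrative refresh period

    private static final class Entry {
        final List<String> files;
        final long loadedAt;
        Entry(List<String> files, long loadedAt) {
            this.files = files;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    // Return the cached listing for `location`, invoking `lister` only when
    // there is no entry yet or the existing entry has expired.
    public List<String> listFiles(String location, Supplier<List<String>> lister) {
        Entry e = cache.get(location);
        long now = System.currentTimeMillis();
        if (e == null || now - e.loadedAt > TTL_MS) {
            e = new Entry(lister.get(), now);
            cache.put(location, e);
        }
        return e.files;
    }

    // Infer partition column names from the directory structure alone,
    // e.g. ".../date=2019-10-01/hour=00/part-0.parquet" yields [date, hour],
    // without any extra directory walks beyond the cached listing.
    public static List<String> inferPartitionColumns(List<String> files) {
        LinkedHashSet<String> cols = new LinkedHashSet<>();
        for (String f : files) {
            for (String segment : f.split("/")) {
                int eq = segment.indexOf('=');
                if (eq > 0) {
                    cols.add(segment.substring(0, eq));
                }
            }
        }
        return new ArrayList<>(cols);
    }
}
```

A caller would list a location once, infer partitioning from the returned paths, and hand the same list to the `FileIndex`; a second load of the same location within the TTL reuses the cached listing instead of hitting storage again.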