cloud-fan commented on a change in pull request #26297: [SPARK-29665][SQL] refine the TableProvider interface
URL: https://github.com/apache/spark/pull/26297#discussion_r341933239
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java
 ##########
 @@ -36,26 +39,34 @@
 public interface TableProvider {
 
   /**
-   * Return a {@link Table} instance to do read/write with user-specified options.
+   * Infer the schema of the table that is identified by the given options.
+   *
+   * @param options The options that can identify a table, e.g. file path, Kafka topic name, etc.
+   *                It's an immutable case-insensitive string-to-string map.
    *
-   * @param options the user-specified options that can identify a table, e.g. file path, Kafka
-   *                topic name, etc. It's an immutable case-insensitive string-to-string map.
    */
-  Table getTable(CaseInsensitiveStringMap options);
+  StructType inferSchema(CaseInsensitiveStringMap options);
 
   /**
-   * Return a {@link Table} instance to do read/write with user-specified schema and options.
-   * <p>
-   * By default this method throws {@link UnsupportedOperationException}, implementations should
-   * override this method to handle user-specified schema.
-   * </p>
-   * @param options the user-specified options that can identify a table, e.g. file path, Kafka
-   *                topic name, etc. It's an immutable case-insensitive string-to-string map.
-   * @param schema the user-specified schema.
-   * @throws UnsupportedOperationException
+   * Infer the partitioning of the table that is identified by the given options.
+   *
+   * @param schema The schema of the table.
+   * @param options The options that can identify a table, e.g. file path, Kafka topic name, etc.
+   *                It's an immutable case-insensitive string-to-string map.
+   */
+  Transform[] inferPartitioning(StructType schema, CaseInsensitiveStringMap options);
 
 Review comment:
   We can remove the schema parameter and make the API more flexible, but I'm not sure we need such flexibility.
   
   As I mentioned in the PR description, Spark only supports:
   1) inferring both the schema and the partitioning;
   2) specifying the schema and inferring the partitioning;
   3) specifying both the schema and the partitioning.
   
   It would be very odd to let users specify the partitioning while inferring the schema. Since the partitioning depends on the schema (e.g. you can't pick a non-existent column as a partition column), I think it generally makes sense to have the schema parameter in `inferPartitioning`, as in the sketch below.
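   
   To make that dependency concrete, here is a minimal sketch of a provider. Everything in it is hypothetical (`ExampleProvider`, the fixed schema, the `date` partition column), and it assumes the `getTable(schema, partitioning, properties)` shape this PR is moving toward; the point is only that `inferPartitioning` can validate its partition columns against whatever schema it receives:
   
   ```java
   import java.util.Arrays;
   import java.util.Map;
   
   import org.apache.spark.sql.connector.catalog.Table;
   import org.apache.spark.sql.connector.catalog.TableProvider;
   import org.apache.spark.sql.connector.expressions.Expressions;
   import org.apache.spark.sql.connector.expressions.Transform;
   import org.apache.spark.sql.types.DataTypes;
   import org.apache.spark.sql.types.StructType;
   import org.apache.spark.sql.util.CaseInsensitiveStringMap;
   
   // Hypothetical provider, only to illustrate why inferPartitioning takes
   // the schema: the partition columns it returns must exist in that schema.
   public class ExampleProvider implements TableProvider {
   
     @Override
     public StructType inferSchema(CaseInsensitiveStringMap options) {
       // A real file source would scan files under options.get("path");
       // a fixed schema keeps the sketch self-contained.
       return new StructType()
           .add("id", DataTypes.LongType)
           .add("data", DataTypes.StringType)
           .add("date", DataTypes.DateType);
     }
   
     @Override
     public Transform[] inferPartitioning(StructType schema, CaseInsensitiveStringMap options) {
       // The schema may be inferred (case 1) or user-specified (case 2), so
       // only report "date" as a partition column if it actually exists.
       if (Arrays.asList(schema.fieldNames()).contains("date")) {
         return new Transform[] { Expressions.identity("date") };
       }
       return new Transform[0];
     }
   
     // Stub: building the actual Table is out of scope for this sketch.
     @Override
     public Table getTable(StructType schema, Transform[] partitioning, Map<String, String> properties) {
       throw new UnsupportedOperationException("not implemented in this sketch");
     }
   }
   ```
   
   With this shape, case 2 falls out naturally: Spark passes the user-specified schema into `inferPartitioning` instead of the inferred one, and the provider never has to guess which columns exist.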
