brkyvz commented on a change in pull request #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
URL: https://github.com/apache/spark/pull/24560#discussion_r309357901
##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/SupportsNamespaces.java
##########
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalog.v2;
+
+import org.apache.spark.sql.catalyst.analysis.NamespaceAlreadyExistsException;
+import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException;
+
+import java.util.Map;
+
+/**
+ * Catalog methods for working with namespaces.
+ * <p>
+ * If an object such as a table, view, or function exists, its parent namespaces must also exist
+ * and must be returned by the discovery methods {@link #listNamespaces()} and
+ * {@link #listNamespaces(String[])}.
+ * <p>
+ * Catalog implementations are not required to maintain the existence of namespaces independent of
+ * objects in a namespace. For example, a function catalog that loads functions using reflection
+ * and uses Java packages as namespaces is not required to support the methods to create, alter, or
+ * drop a namespace. Implementations are allowed to discover the existence of objects or namespaces
+ * without throwing {@link NoSuchNamespaceException} when no namespace is found.
+ */
+public interface SupportsNamespaces extends CatalogPlugin {
+
+  /**
+   * Return a default namespace for the catalog.
+   * <p>
+   * When this catalog is set as the current catalog, the namespace returned by this method will be
+   * set as the current namespace.
+   * <p>
+   * The namespace returned by this method is not required to exist.
+   *
+   * @return a multi-part namespace
+   */
+  default String[] defaultNamespace() {
+    return new String[0];
+  }
+
+  /**
+   * List top-level namespaces from the catalog.
+   * <p>
+   * If an object such as a table, view, or function exists, its parent namespaces must also exist
+   * and must be returned by this discovery method. For example, if table a.b.t exists, this method
+   * must return ["a"] in the result array.
+   *
+   * @return an array of multi-part namespace names
+   */
+  String[][] listNamespaces() throws NoSuchNamespaceException;
+
+  /**
+   * List namespaces in a namespace.
+   * <p>
+   * If an object such as a table, view, or function exists, its parent namespaces must also exist
+   * and must be returned by this discovery method. For example, if table a.b.t exists, this method
+   * invoked as listNamespaces(["a"]) must return ["a", "b"] in the result array.
+   *
+   * @param namespace a multi-part namespace
+   * @return an array of multi-part namespace names
+   * @throws NoSuchNamespaceException If the namespace does not exist (optional)
+   */
+  String[][] listNamespaces(String[] namespace) throws NoSuchNamespaceException;
+
+  /**
+   * Test whether a namespace exists.
+   * <p>
+   * If an object such as a table, view, or function exists, its parent namespaces must also exist.
+   * For example, if table a.b.t exists, this method invoked as namespaceExists(["a"]) or
+   * namespaceExists(["a", "b"]) must return true.
+   *
+   * @param namespace a multi-part namespace
+   * @return true if the namespace exists, false otherwise
+   */
+  default boolean namespaceExists(String[] namespace) {
+    try {
+      loadNamespaceMetadata(namespace);
+      return true;
+    } catch (NoSuchNamespaceException e) {
+      return false;
+    }
+  }
+
+  /**
+   * Load metadata properties for a namespace.
+   *
+   * @param namespace a multi-part namespace
+   * @return a string map of properties for the given namespace
+   * @throws NoSuchNamespaceException If the namespace does not exist (optional)
+   * @throws UnsupportedOperationException If namespace properties are not supported
+   */
+  Map<String, String> loadNamespaceMetadata(String[] namespace) throws NoSuchNamespaceException;
+
+  /**
+   * Create a namespace in the catalog.
+   *
+   * @param namespace a multi-part namespace
+   * @param metadata a string map of properties for the given namespace
+   * @throws NamespaceAlreadyExistsException If the namespace already exists
+   * @throws UnsupportedOperationException If create is not a supported operation
+   */
+  void createNamespaceMetadata(

Review comment:
@cloud-fan But the whole goal of DSV2 is to let the implementation decide the (internal) behavior while Spark enforces the semantics. If you truncate a table with Spark, all Spark needs is that the next time the table is queried, there is no data in it. It doesn't care whether the table (data source) physically deleted all the data or just did a logical deletion.

- `CREATE TABLE` should fail if the namespace doesn't exist => This makes sense to me. I'd say this is the behavior that users expect. I don't think this would be a blocker if people start to implement object-store-backed catalogs.
  Then they can run `CREATE NAMESPACE a.b.c` and try the `CREATE TABLE` again.
- If a namespace already exists, then IMHO `CREATE NAMESPACE` should throw an error, and they should run `ALTER NAMESPACE` to add the metadata.
- Regarding implicit namespaces in object stores, I feel that this still means that someone created the namespace through some other means (imagine using a GUI to create a MySQL table instead of the shell), and `ALTER NAMESPACE` should be used to add the metadata.

wdyt @rdblue @mccheah
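
To make the semantics proposed above concrete, here is a minimal sketch against a hypothetical in-memory catalog. Everything in it is an assumption for illustration: the class `NamespaceSemanticsSketch`, its methods, and the two unchecked exception types are stand-ins and are not part of the `SupportsNamespaces` API under review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory catalog; none of these names come from the Spark API.
final class NamespaceSemanticsSketch {

  // Unchecked stand-in for NoSuchNamespaceException.
  static final class NoSuchNamespace extends RuntimeException {
    NoSuchNamespace(String[] ns) {
      super("Namespace not found: " + String.join(".", ns));
    }
  }

  // Unchecked stand-in for NamespaceAlreadyExistsException.
  static final class NamespaceAlreadyExists extends RuntimeException {
    NamespaceAlreadyExists(String[] ns) {
      super("Namespace already exists: " + String.join(".", ns));
    }
  }

  private final Map<List<String>, Map<String, String>> namespaces = new HashMap<>();
  private final Set<List<String>> tables = new HashSet<>();

  // Copy the array into a list so it can be used as a stable map key.
  private static List<String> key(String[] ns) {
    return new ArrayList<>(Arrays.asList(ns));
  }

  // CREATE NAMESPACE: throws if the namespace already exists.
  void createNamespace(String[] ns, Map<String, String> metadata) {
    if (namespaces.containsKey(key(ns))) {
      throw new NamespaceAlreadyExists(ns);
    }
    namespaces.put(key(ns), new HashMap<>(metadata));
  }

  // ALTER NAMESPACE: adds metadata to a namespace that already exists.
  void alterNamespace(String[] ns, Map<String, String> changes) {
    Map<String, String> current = namespaces.get(key(ns));
    if (current == null) {
      throw new NoSuchNamespace(ns);
    }
    current.putAll(changes);
  }

  // CREATE TABLE: throws if the parent namespace does not exist.
  void createTable(String[] ns, String name) {
    if (!namespaces.containsKey(key(ns))) {
      throw new NoSuchNamespace(ns);
    }
    List<String> ident = key(ns);
    ident.add(name);
    tables.add(ident);
  }

  // Returns a copy of the metadata stored for the namespace.
  Map<String, String> loadNamespaceMetadata(String[] ns) {
    Map<String, String> current = namespaces.get(key(ns));
    if (current == null) {
      throw new NoSuchNamespace(ns);
    }
    return new HashMap<>(current);
  }
}
```

With this sketch, `createTable` on `a.b.c` fails until `createNamespace` has been called for it, a second `createNamespace` on the same name throws, and metadata is added afterwards through `alterNamespace`. How the catalog stores any of this internally stays up to the implementation.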
