Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r200418152
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java ---
    @@ -0,0 +1,123 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.catalog;
    +
    +import org.apache.spark.sql.catalyst.TableIdentifier;
    +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
    +import org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException;
    +import org.apache.spark.sql.catalyst.expressions.Expression;
    +import org.apache.spark.sql.types.StructType;
    +
    +import java.util.Arrays;
    +import java.util.Collections;
    +import java.util.List;
    +import java.util.Map;
    +
    +public interface TableCatalog {
    +  /**
    +   * Load table metadata by {@link TableIdentifier identifier} from the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @return the table's metadata
    +   * @throws NoSuchTableException If the table doesn't exist.
    +   */
    +  Table loadTable(TableIdentifier ident) throws NoSuchTableException;
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), Collections.emptyMap());
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  default Table createTable(TableIdentifier ident,
    +                            StructType schema,
    +                            Map<String, String> properties) throws TableAlreadyExistsException {
    +    return createTable(ident, schema, Collections.emptyList(), properties);
    +  }
    +
    +  /**
    +   * Create a table in the catalog.
    +   *
    +   * @param ident a table identifier
    +   * @param schema the schema of the new table, as a struct type
    +   * @param partitions a list of expressions to use for partitioning data in the table
    +   * @param properties a string map of table properties
    +   * @return metadata for the new table
    +   * @throws TableAlreadyExistsException If a table already exists for the identifier
    +   */
    +  Table createTable(TableIdentifier ident,
    +                    StructType schema,
    +                    List<Expression> partitions,
    --- End diff ---
    
    I recommend reading the SPIP's "Proposed Changes" section, which goes into more detail than this comment can. In short, you're thinking of partitions as columns, like Hive tables do, but that is a narrow definition that prevents the underlying format from optimizing queries.
    
    Partitions of a table are derived from the column data through some transform. For example, partitioning by day applies a day transform to a timestamp column: `day(ts)`. Hive doesn't keep track of that transform, so users must handle it themselves by maintaining both a `ts` column and a separate `day` column. This leads to a few problems, including:
    * Hive has no ability to transform `ts > X` into the partition predicate `day >= day(X)`. Queries that don't account for the table's physical storage by adding partition predicates by hand result in full table scans (see the sketch after this list).
    * Users can insert any value they choose into the `day` partition, and it is up to them to do it correctly.
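    
    To make the first point concrete, here is a minimal runnable sketch -- my illustration, not anything in this PR -- of how a source that knows the `day` transform can derive a partition predicate from a data predicate:
    
    ```java
    import java.time.Instant;
    import java.time.ZoneOffset;
    
    public class DayTransformSketch {
      // day(ts): maps a timestamp to its partition value (days since epoch)
      static long day(Instant ts) {
        return ts.atZone(ZoneOffset.UTC).toLocalDate().toEpochDay();
      }
    
      public static void main(String[] args) {
        Instant x = Instant.parse("2018-07-04T16:30:00Z");
        // Because day() is monotonic, the data predicate `ts > X` implies the
        // partition predicate `day >= day(X)`, so the source can prune
        // partitions without the user adding the predicate by hand.
        System.out.println("scan partitions where day >= " + day(x));
      }
    }
    ```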
    
    Also consider bucketing. Bucketing is another transform that is effectively a partitioning of the table's files: `bucket = hash(col) % N`. The reason bucketing is handled as a special case in Hive is that using it _requires_ knowing the transform and the relationship between the bucket number and its column. If we instead think of partitioning as grouping data by common values of a set of transforms, then buckets are just another partition that we can use for purposes like bucketed joins, or for limiting scans when looking up specific values.
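    
    A similarly hedged sketch for the bucket transform shows why a lookup for a single value only needs to read one bucket:
    
    ```java
    public class BucketTransformSketch {
      // bucket = hash(col) % N; floorMod keeps the result non-negative
      static int bucket(Object col, int numBuckets) {
        return Math.floorMod(col.hashCode(), numBuckets);
      }
    
      public static void main(String[] args) {
        int numBuckets = 16;
        // A lookup for id = 42 needs only the files in one bucket, and two
        // tables bucketed the same way can be joined bucket-by-bucket.
        System.out.println("read bucket " + bucket(42, numBuckets) + " of " + numBuckets);
      }
    }
    ```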
    
    If the transform is _identity_ -- just copy the value into the partition data -- then you have exactly the functionality that Hive provides. But by building the transformations into the partitioning layer, we can do more to optimize queries while still hiding the physical layout of the table.
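    
    The identity case is the degenerate transform (again my sketch, not this PR's API): the partition value is just the column value, which is Hive's behavior:
    
    ```java
    public class IdentityTransformSketch {
      // identity: the partition value is exactly the column value,
      // matching Hive-style PARTITIONED BY columns
      static Object identity(Object col) {
        return col;
      }
    
      public static void main(String[] args) {
        System.out.println("partition day=" + identity("2018-07-04"));
      }
    }
    ```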
    
    Using Expression allows Spark to pass `day(ts)` to the data source. It is up to the source to decide which expressions it supports. The current FS tables would reject any expression that isn't just a column reference. Iceberg supports `identity`, `year`, `month`, `day`, `hour`, `truncate`, and `bucket` transforms.
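    
    To illustrate, here is a hedged sketch of how a source might validate the partition expressions it is given in `createTable`. `Transform` and `name()` are hypothetical stand-ins, not the catalyst Expression API:
    
    ```java
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    
    interface Transform {
      String name();  // hypothetical: e.g. "day", "bucket", or "identity"
    }
    
    class TransformValidator {
      // An Iceberg-like source accepts several transforms; a current FS table
      // would accept only "identity" (a plain column reference).
      private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
          "identity", "year", "month", "day", "hour", "truncate", "bucket"));
    
      static void validate(List<Transform> partitions) {
        for (Transform t : partitions) {
          if (!SUPPORTED.contains(t.name())) {
            throw new IllegalArgumentException(
                "Unsupported partition transform: " + t.name());
          }
        }
      }
    }
    ```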

