rdblue commented on a change in pull request #23606: [SPARK-26666][SQL] Support DSv2 overwrite and dynamic partition overwrite. URL: https://github.com/apache/spark/pull/23606#discussion_r250295514
##########
File path: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsOverwrite.java
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.writer;
+
+import org.apache.spark.sql.sources.Filter;
+
+/**
+ * Write builder trait for tables that support overwrite by filter.
+ * <p>
+ * Overwriting data by filter will delete any data that matches the filter and replace it with data
+ * that is committed in the write.
+ */
+public interface SupportsOverwrite extends WriteBuilder {

Review comment:

> I think the semantic can also apply to non-partitioned tables but that will be very hard to implement.

Not necessarily. JDBC sources can implement this fairly easily, and those that support transactions can make it an atomic operation. There are also strategies that can work for unpartitioned tables, like deleting data files whose min/max ranges show that all of their rows are matched by a filter.

> BTW for a normal INSERT OVERWRITE, which needs to truncate the entire table, the filters will be a single true literal?

I think that truncate is slightly different.
Because it is fairly easy to support truncate, but not overwrite by expression, I think that truncate should be a separate operation in the v2 API. I would make `SupportsOverwrite` extend `SupportsTruncate` with a default that calls overwrite with `true` like you suggest, but I think we will need to add a `true` filter for that.

Also, what do you mean by "normal INSERT OVERWRITE"? What operation is that? Right now, `INSERT OVERWRITE` is effectively dynamic partition overwrite. Unpartitioned tables are truncated because they have just one "partition". Do you agree with that summary?
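
To make the truncate proposal concrete, here is a rough, self-contained sketch of how `SupportsOverwrite` could extend `SupportsTruncate` and supply a default `truncate()` that delegates to overwrite with an always-true filter. Everything other than the two interface names is an assumption for illustration: the `overwrite(Filter[])` and `truncate()` signatures, the `AlwaysTrue` class, and the stand-in `Filter` and `WriteBuilder` types are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverwriteSketch {
    // Simplified stand-in for org.apache.spark.sql.sources.Filter (assumption).
    interface Filter {}

    // Hypothetical "true" filter that matches every row; a filter like this
    // would have to be added for the default below to work.
    static final class AlwaysTrue implements Filter {
        @Override public String toString() { return "AlwaysTrue"; }
    }

    // Simplified stand-in for the v2 WriteBuilder (assumption).
    interface WriteBuilder {}

    // Truncate as its own, easy-to-support operation (signature assumed).
    interface SupportsTruncate extends WriteBuilder {
        WriteBuilder truncate();
    }

    // Overwrite by filter; truncate comes for free by matching all rows.
    interface SupportsOverwrite extends WriteBuilder, SupportsTruncate {
        WriteBuilder overwrite(Filter[] filters);

        @Override
        default WriteBuilder truncate() {
            return overwrite(new Filter[] { new AlwaysTrue() });
        }
    }

    // Toy builder that only records which operation was requested.
    static final class RecordingBuilder implements SupportsOverwrite {
        final List<String> calls = new ArrayList<>();

        @Override
        public WriteBuilder overwrite(Filter[] filters) {
            calls.add("overwrite" + Arrays.toString(filters));
            return this;
        }
    }

    public static void main(String[] args) {
        RecordingBuilder builder = new RecordingBuilder();
        builder.truncate(); // routed through overwrite(AlwaysTrue)
        System.out.println(builder.calls); // prints [overwrite[AlwaysTrue]]
    }
}
```

With this shape, a source that can only truncate would implement `SupportsTruncate` alone, while a source that supports overwrite by filter would get truncate without extra code.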
