[GitHub] spark pull request #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write AP...

rdblue Wed, 22 Aug 2018 14:42:43 -0700

Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22190#discussion_r212119716
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/BatchOverwriteSupport.java ---
    @@ -0,0 +1,61 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.sources.v2.writer;
    +
    +import org.apache.spark.sql.catalyst.plans.logical.Filter;
    +import org.apache.spark.sql.sources.v2.DataSourceOptions;
    +import org.apache.spark.sql.types.StructType;
    +
    +/**
    + * An interface that adds support to {@link BatchWriteSupport} for a replace data operation that
    + * replaces a subset of the output table with the output of a write operation. The subset removed is
    + * determined by a set of filter expressions.
    + * <p>
    + * Data source implementations can implement this interface in addition to {@link BatchWriteSupport}
    + * to support idempotent write operations that replace data matched by a set of delete filters with
    + * the result of the write operation.
    + * <p>
    + * This is used to build idempotent writes. For example, a query that produces a daily summary
    + * may be run several times as new data arrives. Each run should replace the output of the last
    + * run for a particular day in the partitioned output table. Such a job would write using this
    + * WriteSupport and would pass a filter matching the previous job's output, like
    + * <code>$"day" === '2018-08-22'</code>, to remove that data and commit the replacement data at
    + * the same time.
    + */
    +public interface BatchOverwriteSupport extends BatchWriteSupport {
    --- End diff --
    
    This class will be used to create the `WriteConfig` for idempotent overwrite operations. It would be triggered by an overwrite like this (the API could be different):
    
    ```
    df.writeTo("table").overwrite($"day" === "2018-08-22")
    ```
    
    That would produce an `OverwriteData(source, deleteFilter, query)` logical plan, which would result in the exec node calling this to create the write config.
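
To make the intended semantics concrete, here is a minimal in-memory sketch in Java of "delete the rows matching a filter and commit the replacement rows in one step". It is not the Spark implementation and does not use any Spark API: `OverwriteSketch`, `Row`, `overwrite`, and `snapshot` are hypothetical names chosen for illustration, and a plain `java.util.function.Predicate` stands in for the delete filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for a partitioned output table; not a Spark class. */
final class OverwriteSketch {

    /** One output row: a day partition value and a summary count (assumed schema). */
    static final class Row {
        final String day;
        final long count;

        Row(String day, long count) {
            this.day = day;
            this.count = count;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Row)) {
                return false;
            }
            Row other = (Row) o;
            return day.equals(other.day) && count == other.count;
        }

        @Override
        public int hashCode() {
            return Objects.hash(day, count);
        }

        @Override
        public String toString() {
            return "Row(" + day + ", " + count + ")";
        }
    }

    private List<Row> rows = new ArrayList<>();

    /**
     * Drops every row matched by deleteFilter and adds the replacement rows.
     * The new list is built off to the side and published with one reference
     * swap, which stands in for a data source's atomic commit.
     */
    synchronized void overwrite(Predicate<Row> deleteFilter, List<Row> replacement) {
        List<Row> next = new ArrayList<>();
        for (Row r : rows) {
            if (!deleteFilter.test(r)) {
                next.add(r);
            }
        }
        next.addAll(replacement);
        rows = next;
    }

    /** Returns a copy of the current table contents. */
    synchronized List<Row> snapshot() {
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        OverwriteSketch table = new OverwriteSketch();
        Predicate<Row> isDay = r -> r.day.equals("2018-08-22");

        // Earlier output for another day; it must survive later overwrites.
        table.overwrite(r -> false, Arrays.asList(new Row("2018-08-21", 5)));

        // First run of the daily summary, then a rerun after more data arrived.
        table.overwrite(isDay, Arrays.asList(new Row("2018-08-22", 10)));
        table.overwrite(isDay, Arrays.asList(new Row("2018-08-22", 12)));

        System.out.println(table.snapshot());
    }
}
```

Because each run passes a filter that matches exactly the output of the previous run for that day, rerunning the job replaces the earlier rows instead of duplicating them, and rows for other days are left alone. Repeating the same run with the same input leaves the table unchanged, which is the idempotence property described in the Javadoc.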



