[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

gatorsmile Sun, 25 Mar 2018 00:05:09 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20208#discussion_r176930962
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -815,6 +815,54 @@ should start with, they can set `basePath` in the data source options. For examp
     when `path/to/table/gender=male` is the path of the data and
     users set `basePath` to `path/to/table/`, `gender` will be a partitioning column.
     
    +### Schema Evolution
    +
    +Users can control schema evolution in several ways. For example, new file can have additional
    +new column. All file-based data sources (`csv`, `json`, `orc`, and `parquet`) except `text`
    +data source supports this. Note that `text` data source always has a fixed single string column
    +schema.
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala"  markdown="1">
    +val df1 = Seq("a", "b").toDF("col1")
    +val df2 = df1.withColumn("col2", lit("x"))
    +
    +df1.write.save("/tmp/evolved_data/part=1")
    +df2.write.save("/tmp/evolved_data/part=2")
    +
    +spark.read.schema("col1 string, col2 string").load("/tmp/evolved_data").show
    ++----+----+----+
    +|col1|col2|part|
    ++----+----+----+
    +|   a|   x|   2|
    +|   b|   x|   2|
    +|   a|null|   1|
    +|   b|null|   1|
    ++----+----+----+
    +</div>
    +
    +</div>
    +
    +The following schema evolutions are supported in `csv`, `json`, `orc`, and `parquet` file-based
    +data sources.
    +
    +  1. Add a column
    +  2. Remove a column
    --- End diff --
    
    In the SQL standard, removing a column also removes all of its data. However, we do not support that. Users can still see the old data after they add a column with the same name as the one they previously removed.
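
To make the concern concrete, here is a minimal sketch. It is not taken from the PR: the object name `DropAndReAddColumn`, the helper `col2AfterReAdd`, and the scratch path are hypothetical, and it assumes a local Spark 2.3+ session with Parquet as the format. It "removes" `col2` by reading with a narrower schema, then "re-adds" it by reading with the original schema. Since the files are never rewritten, the old `col2` values are expected to reappear instead of being gone.

```scala
import org.apache.spark.sql.SparkSession

object DropAndReAddColumn {
  // Writes two columns, then reads the same files back under two schemas.
  // Returns the col2 values visible after the column is "re-added".
  def col2AfterReAdd(spark: SparkSession, path: String): Set[String] = {
    import spark.implicits._

    Seq(("a", "x"), ("b", "y")).toDF("col1", "col2")
      .write.mode("overwrite").parquet(path)

    // "Remove" col2: the narrower read schema hides it; the files are untouched.
    val narrowed = spark.read.schema("col1 string").parquet(path)
    assert(narrowed.columns.sameElements(Array("col1")))

    // "Re-add" col2 under the same name: previously written values are read again.
    val widened = spark.read.schema("col1 string, col2 string").parquet(path)
    widened.select("col2").as[String].collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-readd-sketch")
      .getOrCreate()
    // Hypothetical scratch location; any writable path works.
    try println(col2AfterReAdd(spark, "/tmp/drop_readd_demo"))
    finally spark.stop()
  }
}
```

If the docs list "Remove a column" as supported, it may be worth stating that this is a read-schema change only, so it does not match the SQL-standard meaning of dropping a column.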


