[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

gatorsmile Sun, 25 Mar 2018 00:08:49 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20208#discussion_r176931012
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -815,6 +815,54 @@ should start with, they can set `basePath` in the data source options. For examp
     when `path/to/table/gender=male` is the path of the data and
     users set `basePath` to `path/to/table/`, `gender` will be a partitioning column.
     
    +### Schema Evolution
    +
    +Users can control schema evolution in several ways. For example, new file can have additional
    +new column. All file-based data sources (`csv`, `json`, `orc`, and `parquet`) except `text`
    +data source supports this. Note that `text` data source always has a fixed single string column
    +schema.
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala"  markdown="1">
    +val df1 = Seq("a", "b").toDF("col1")
    +val df2 = df1.withColumn("col2", lit("x"))
    +
    +df1.write.save("/tmp/evolved_data/part=1")
    +df2.write.save("/tmp/evolved_data/part=2")
    +
    +spark.read.schema("col1 string, col2 string").load("/tmp/evolved_data").show
    ++----+----+----+
    +|col1|col2|part|
    ++----+----+----+
    +|   a|   x|   2|
    +|   b|   x|   2|
    +|   a|null|   1|
    +|   b|null|   1|
    ++----+----+----+
    +</div>
    +
    +</div>
    +
    +The following schema evolutions are supported in `csv`, `json`, `orc`, and `parquet` file-based
    +data sources.
    +
    +  1. Add a column
    +  2. Remove a column
    +  3. Change a column position
    --- End diff --
    
    Do we support it? When people issue `select * from tab`, we automatically move the partition columns to the end of the schema.
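
    One way to check the ordering is a small round trip. This is a minimal sketch, not code from the PR: the object name `PartitionColumnOrder`, the helper `readBackColumns`, the sample rows, and the temp directory are all made up for illustration, and it assumes a local Spark session with the schema inferred on read.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object PartitionColumnOrder {
  // Writes a partitioned Parquet table whose partition column is declared
  // FIRST in the DataFrame, then returns the column order Spark reports
  // when the table is read back with an inferred schema.
  def readBackColumns(spark: SparkSession, path: String): Seq[String] = {
    import spark.implicits._
    Seq((1, "a"), (2, "b"))
      .toDF("part", "col1")   // partition column deliberately placed first
      .write
      .mode("overwrite")      // the target directory may already exist
      .partitionBy("part")
      .parquet(path)
    spark.read.parquet(path).columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partition-column-order")
      .getOrCreate()
    try {
      // Hypothetical scratch location; any writable directory works.
      val path = Files.createTempDirectory("partition_order").toString
      println(readBackColumns(spark, path).mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
```

    If `part` comes back after `col1` even though it was written first, the position of a partition column is not something users control, which is why item 3 may need a caveat.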


