GitHub user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20208#discussion_r176930962
--- Diff: docs/sql-programming-guide.md ---
@@ -815,6 +815,54 @@ should start with, they can set `basePath` in the data
source options. For examp
when `path/to/table/gender=male` is the path of the data and
users set `basePath` to `path/to/table/`, `gender` will be a partitioning
column.
+### Schema Evolution
+
+Users can control schema evolution in several ways. For example, new file can have additional
+new column. All file-based data sources (`csv`, `json`, `orc`, and `parquet`) except `text`
+data source supports this. Note that `text` data source always has a fixed single string column
+schema.
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+val df1 = Seq("a", "b").toDF("col1")
+val df2 = df1.withColumn("col2", lit("x"))
+
+df1.write.save("/tmp/evolved_data/part=1")
+df2.write.save("/tmp/evolved_data/part=2")
+
+spark.read.schema("col1 string, col2 string").load("/tmp/evolved_data").show
++----+----+----+
+|col1|col2|part|
++----+----+----+
+|   a|   x|   2|
+|   b|   x|   2|
+|   a|null|   1|
+|   b|null|   1|
++----+----+----+
+</div>
+
+</div>
+
+The following schema evolutions are supported in `csv`, `json`, `orc`, and `parquet` file-based
+data sources.
+
+ 1. Add a column
+ 2. Remove a column
--- End diff --
In the SQL standard, removing a column also removes all of its data.
However, we do not support that. Users can still see the old data if they later
add a column with the same name as the one they removed.
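
A minimal sketch of that behavior, assuming "removing" and "adding" a column are both done by changing the user-specified read schema over the same Parquet files. The path, the app name, and the local `SparkSession` setup are hypothetical and not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, for illustration only.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("schema-evolution-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical output location.
val path = "/tmp/evolved_data_sketch"

// Write files that physically contain both col1 and col2.
Seq(("a", "x"), ("b", "y")).toDF("col1", "col2")
  .write.mode("overwrite").parquet(path)

// "Remove" col2 by reading with a narrower schema.
// The files on disk are not rewritten, so col2's data is still there.
spark.read.schema("col1 string").parquet(path).show()

// "Add" a column named col2 again: the previously written values
// become visible again instead of being null.
spark.read.schema("col1 string, col2 string").parquet(path).show()
```

If that is the behavior being documented, "Remove a column" only hides the column from the reader and deletes nothing, which is worth stating explicitly in the guide.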