Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/20208#discussion_r176933664
--- Diff: docs/sql-programming-guide.md ---
@@ -815,6 +815,54 @@ should start with, they can set `basePath` in the data source options. For examp
when `path/to/table/gender=male` is the path of the data and
users set `basePath` to `path/to/table/`, `gender` will be a partitioning column.
+### Schema Evolution
+
+Users can control schema evolution in several ways. For example, a new file can have an additional
+column. All file-based data sources (`csv`, `json`, `orc`, and `parquet`) except the `text`
+data source support this. Note that the `text` data source always has a fixed schema with a single
+string column.
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
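+import org.apache.spark.sql.functions.lit
+import spark.implicits._
+
+val df1 = Seq("a", "b").toDF("col1")
+val df2 = df1.withColumn("col2", lit("x"))
+
+// Write each DataFrame into its own partition directory; only `part=2` has `col2`.
+df1.write.save("/tmp/evolved_data/part=1")
+df2.write.save("/tmp/evolved_data/part=2")
+
+// Read both partitions with the evolved schema; `col2` is null where the file lacks it.
+spark.read.schema("col1 string, col2 string").load("/tmp/evolved_data").show()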
++----+----+----+
+|col1|col2|part|
++----+----+----+
+|   a|   x|   2|
+|   b|   x|   2|
+|   a|null|   1|
+|   b|null|   1|
++----+----+----+
+</div>
+
+</div>
+
+The following schema evolutions are supported in `csv`, `json`, `orc`, and `parquet` file-based
+data sources.
+
+ 1. Add a column
+ 2. Remove a column
+ 3. Change a column position
+ 4. Change a column type (`byte` -> `short` -> `int` -> `long`, `float` -> `double`)
--- End diff --
Yep. `Upcast`s are safe. This PR doesn't aim to cover or guarantee unsafe casting at this stage.
Although these are straightforward `upcast`s, not all Spark file-based data sources seem to
support them (based on the test cases). This PR is trying to set a clear boundary and to document
those gaps.
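For reference, here is a minimal Scala sketch of why the listed widenings count as safe upcasts while the reverse direction does not. It only exercises plain JVM numeric conversions, with no Spark involved. `UpcastSketch` and its method names are hypothetical, and the sketch says nothing about which Spark file-based data sources actually accept these type changes.

```scala
// Hypothetical sketch: plain Scala/JVM conversions only, not Spark's cast logic.
object UpcastSketch {

  // byte -> short -> int -> long: widen every Byte value, then narrow it back.
  def byteChainIsLossless: Boolean =
    (Byte.MinValue.toInt to Byte.MaxValue.toInt).forall { i =>
      val b = i.toByte
      b.toShort.toInt.toLong.toByte == b && b.toLong == i.toLong
    }

  // short -> int -> long: same round trip over every Short value.
  def shortChainIsLossless: Boolean =
    (Short.MinValue.toInt to Short.MaxValue.toInt).forall { i =>
      val s = i.toShort
      s.toInt.toLong.toShort == s && s.toLong == i.toLong
    }

  // The Int range is too large to enumerate here, so check a few boundary values.
  def intToLongIsLossless: Boolean =
    Seq(Int.MinValue, -1, 0, 1, Int.MaxValue).forall(i => i.toLong.toInt == i)

  // Every finite float (and infinity) is exactly representable as a double,
  // so the round trip preserves the exact bit pattern.
  def floatToDoubleIsLossless: Boolean =
    Seq(Float.MinValue, -0.0f, Float.MinPositiveValue, 0.1f, Float.MaxValue, Float.PositiveInfinity)
      .forall { f =>
        java.lang.Float.floatToIntBits(f.toDouble.toFloat) == java.lang.Float.floatToIntBits(f)
      }

  // Narrowing (an unsafe cast) silently wraps around instead: 300 - 256 = 44.
  def narrowShortToByte: Byte = 300.toShort.toByte

  def main(args: Array[String]): Unit = {
    println(s"byte chain lossless:      $byteChainIsLossless")
    println(s"short chain lossless:     $shortChainIsLossless")
    println(s"int -> long lossless:     $intToLongIsLossless")
    println(s"float -> double lossless: $floatToDoubleIsLossless")
    println(s"300 narrowed to byte:     $narrowShortToByte")
  }
}
```

Running `UpcastSketch.main` prints `true` for the four widening checks and `44` for the narrowing case, which is exactly the kind of silent value change that unsafe casts would introduce.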