Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20208#discussion_r162837578
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaEvolutionTest.scala ---
@@ -0,0 +1,406 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}
+
+/**
+ * Schema can evolve in several ways and the followings are supported in file-based data sources.
+ *
+ * 1. Add a column
+ * 2. Remove a column
+ * 3. Change a column position
+ * 4. Change a column type
+ *
+ * Here, we consider safe evolution without data loss. For example, data type evolution should be
+ * from small types to larger types like `int`-to-`long`, not vice versa.
+ *
+ * So far, file-based data sources have schema evolution coverages like the followings.
+ *
+ * | File Format  | Coverage     | Note                                                   |
+ * | ------------ | ------------ | ------------------------------------------------------ |
+ * | TEXT         | N/A          | Schema consists of a single string column.             |
+ * | CSV          | 1, 2, 4      |                                                        |
+ * | JSON         | 1, 2, 3, 4   |                                                        |
+ * | ORC          | 1, 2, 3, 4   | Native vectorized ORC reader has the widest coverage.  |
+ * | PARQUET      | 1, 2, 3      |                                                        |
--- End diff --
Ohaaa, the schema is explicitly set here. Sorry, I missed it.
---