cloud-fan commented on a change in pull request #35705:
URL: https://github.com/apache/spark/pull/35705#discussion_r818658970



##########
File path: docs/sql-migration-guide.md
##########
@@ -109,123 +110,123 @@ license: |
     * and the method `spark.catalog.refreshTable`
   In Spark 3.1 and earlier, table refreshing leaves dependents uncached.
 
-  - In Spark 3.2, the usage of `count(tblName.*)` is blocked to avoid 
producing ambiguous results. Because `count(*)` and `count(tblName.*)` will 
output differently if there is any null values. To restore the behavior before 
Spark 3.2, you can set 
`spark.sql.legacy.allowStarWithSingleTableIdentifierInCount` to `true`.
+  - Since Spark 3.2, the usage of `count(tblName.*)` is blocked to avoid 
producing ambiguous results, because `count(*)` and `count(tblName.*)` produce 
different results if there are any null values. To restore the behavior before 
Spark 3.2, you can set 
`spark.sql.legacy.allowStarWithSingleTableIdentifierInCount` to `true`.
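
A minimal sketch of the ambiguity, assuming a Spark 3.2 spark-shell session (a `SparkSession` named `spark` in scope); the inline table is made up for illustration:

```scala
// Assumes a spark-shell session, i.e. a SparkSession named `spark` is in scope.
// One row is NULL, so the two counts disagree: all_rows = 2, non_null = 1.
spark.sql(
  "SELECT count(*) AS all_rows, count(col) AS non_null " +
  "FROM VALUES (1), (CAST(NULL AS INT)) AS t(col)"
).show()
// Spark 3.2 rejects count(t.*) because it could plausibly mean either count above.
// Setting spark.sql.legacy.allowStarWithSingleTableIdentifierInCount=true restores
// the pre-3.2 behavior.
```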
 
-  - In Spark 3.2, we support typed literals in the partition spec of INSERT 
and ADD/DROP/RENAME PARTITION. For example, `ADD PARTITION(dt = 
date'2020-01-01')` adds a partition with date value `2020-01-01`. In Spark 3.1 
and earlier, the partition value will be parsed as string value `date 
'2020-01-01'`, which is an illegal date value, and we add a partition with null 
value at the end.
+  - Since Spark 3.2, we support typed literals in the partition spec of INSERT 
and ADD/DROP/RENAME PARTITION. For example, `ADD PARTITION(dt = 
date'2020-01-01')` adds a partition with the date value `2020-01-01`. In Spark 
3.1 and earlier, the partition value is parsed as the string value `date 
'2020-01-01'`, which is an illegal date value, and we end up adding a partition 
with a null value.
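
A small sketch of the typed-literal behavior, assuming a Spark 3.2 spark-shell session; the table name `part_tbl` is invented for the example:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.
spark.sql("CREATE TABLE part_tbl(id INT, dt DATE) USING parquet PARTITIONED BY (dt)")

// Since Spark 3.2 the typed literal is evaluated, so the partition value is the
// DATE 2020-01-01; in Spark 3.1 and earlier the spec was read as the string
// "date'2020-01-01'", which is not a valid date, so the partition value ended up NULL.
spark.sql("ALTER TABLE part_tbl ADD PARTITION (dt = date'2020-01-01')")
spark.sql("SHOW PARTITIONS part_tbl").show(truncate = false)
```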
       
-  - In Spark 3.2, `DataFrameNaFunctions.replace()` no longer uses exact string 
match for the input column names, to match the SQL syntax and support qualified 
column names. Input column name having a dot in the name (not nested) needs to 
be escaped with backtick \`. Now, it throws `AnalysisException` if the column 
is not found in the data frame schema. It also throws 
`IllegalArgumentException` if the input column name is a nested column. In 
Spark 3.1 and earlier, it used to ignore invalid input column name and nested 
column name.
+  - Since Spark 3.2, `DataFrameNaFunctions.replace()` no longer uses exact 
string match for the input column names, to match the SQL syntax and support 
qualified column names. An input column name containing a dot (not nested) 
needs to be escaped with backticks \`. It now throws `AnalysisException` if the 
column is not found in the data frame schema, and `IllegalArgumentException` if 
the input column name refers to a nested column. In Spark 3.1 and earlier, 
invalid input column names and nested column names were silently ignored.
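
A hedged sketch of the stricter column matching, assuming a Spark 3.2 spark-shell session; the column names are invented:

```scala
// Assumes a spark-shell session; spark.implicits._ provides toDF on Seq.
import spark.implicits._

// A flat column whose name literally contains a dot (it is not a nested field).
val df = Seq(("unknown", 1), ("US", 2)).toDF("country.code", "id")

// Since Spark 3.2 the dotted name must be backtick-quoted, matching SQL syntax.
df.na.replace("`country.code`", Map("unknown" -> "??")).show()

// A name that does not resolve now fails fast instead of being silently ignored:
// df.na.replace("no_such_col", Map("unknown" -> "??"))   // AnalysisException
```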
 
-  - In Spark 3.2, the dates subtraction expression such as `date1 - date2` 
returns values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the returned 
type is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you 
can set `spark.sql.legacy.interval.enabled` to `true`.
+  - Since Spark 3.2, date subtraction expressions such as `date1 - date2` 
return values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the returned 
type is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you 
can set `spark.sql.legacy.interval.enabled` to `true`.
 
-  - In Spark 3.2, the timestamps subtraction expression such as `timestamp 
'2021-03-31 23:48:00' - timestamp '2021-01-01 00:00:00'` returns values of 
`DayTimeIntervalType`. In Spark 3.1 and earlier, the type of the same 
expression is `CalendarIntervalType`. To restore the behavior before Spark 3.2, 
you can set `spark.sql.legacy.interval.enabled` to `true`.
+  - Since Spark 3.2, timestamp subtraction expressions such as `timestamp 
'2021-03-31 23:48:00' - timestamp '2021-01-01 00:00:00'` return values of 
`DayTimeIntervalType`. In Spark 3.1 and earlier, the type of the same 
expression is `CalendarIntervalType`. To restore the behavior before Spark 3.2, 
you can set `spark.sql.legacy.interval.enabled` to `true`.
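
A quick way to see the new result type, as a sketch assuming a Spark 3.2 spark-shell session:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.
val df = spark.sql(
  "SELECT timestamp'2021-03-31 23:48:00' - timestamp'2021-01-01 00:00:00' AS diff")

df.printSchema()
// Spark 3.2:             diff: interval day to second  (DayTimeIntervalType)
// Spark 3.1 and earlier: diff: interval                (CalendarIntervalType)
// spark.conf.set("spark.sql.legacy.interval.enabled", true) restores the old type.
```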
 
-  - In Spark 3.2, `CREATE TABLE .. LIKE ..` command can not use reserved 
properties. You need their specific clauses to specify them, for example, 
`CREATE TABLE test1 LIKE test LOCATION 'some path'`. You can set 
`spark.sql.legacy.notReserveProperties` to `true` to ignore the 
`ParseException`, in this case, these properties will be silently removed, for 
example: `TBLPROPERTIES('owner'='yao')` will have no effect. In Spark version 
3.1 and below, the reserved properties can be used in `CREATE TABLE .. LIKE ..` 
command but have no side effects, for example, 
`TBLPROPERTIES('location'='/tmp')` does not change the location of the table 
but only create a headless property just like `'a'='b'`.
+  - Since Spark 3.2, the `CREATE TABLE .. LIKE ..` command cannot use reserved 
properties. You need to use their specific clauses to specify them, for example, 
`CREATE TABLE test1 LIKE test LOCATION 'some path'`. You can set 
`spark.sql.legacy.notReserveProperties` to `true` to ignore the 
`ParseException`; in this case, these properties are silently removed, for 
example, `TBLPROPERTIES('owner'='yao')` has no effect. In Spark 3.1 and 
earlier, the reserved properties can be used in the `CREATE TABLE .. LIKE ..` 
command but have no side effects, for example, 
`TBLPROPERTIES('location'='/tmp')` does not change the location of the table 
but only creates a headless property, just like `'a'='b'`.
 
-  - In Spark 3.2, `TRANSFORM` operator can't support alias in inputs. In Spark 
3.1 and earlier, we can write script transform like `SELECT TRANSFORM(a AS c1, 
b AS c2) USING 'cat' FROM TBL`.
+  - Since Spark 3.2, the `TRANSFORM` operator does not support aliases in its 
inputs. In Spark 3.1 and earlier, you could write a script transform like 
`SELECT TRANSFORM(a AS c1, b AS c2) USING 'cat' FROM TBL`.
 
-  - In Spark 3.2, `TRANSFORM` operator can support 
`ArrayType/MapType/StructType` without Hive SerDe, in this mode, we use 
`StructsToJosn` to convert `ArrayType/MapType/StructType` column to `STRING` 
and use `JsonToStructs` to parse `STRING` to `ArrayType/MapType/StructType`. In 
Spark 3.1, Spark just support case `ArrayType/MapType/StructType` column as 
`STRING` but can't support parse `STRING` to `ArrayType/MapType/StructType` 
output columns.
+  - Since Spark 3.2, the `TRANSFORM` operator supports 
`ArrayType/MapType/StructType` without Hive SerDe. In this mode, Spark uses 
`StructsToJson` to convert `ArrayType/MapType/StructType` columns to `STRING` 
and `JsonToStructs` to parse `STRING` back into `ArrayType/MapType/StructType`. 
In Spark 3.1, Spark only supports casting `ArrayType/MapType/StructType` 
columns to `STRING`, but cannot parse `STRING` into 
`ArrayType/MapType/StructType` output columns.
 
-  - In Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' YEAR 
TO MONTH` and the unit list interval literals like `INTERVAL '3' DAYS '1' HOUR` 
are converted to ANSI interval types: `YearMonthIntervalType` or 
`DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are 
converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, 
you can set `spark.sql.legacy.interval.enabled` to `true`.
+  - Since Spark 3.2, the unit-to-unit interval literals like `INTERVAL '1-1' 
YEAR TO MONTH` and the unit list interval literals like `INTERVAL '3' DAYS '1' 
HOUR` are converted to ANSI interval types: `YearMonthIntervalType` or 
`DayTimeIntervalType`. In Spark 3.1 and earlier, such interval literals are 
converted to `CalendarIntervalType`. To restore the behavior before Spark 3.2, 
you can set `spark.sql.legacy.interval.enabled` to `true`.
 
-  - In Spark 3.2, the unit list interval literals can not mix year-month 
fields (YEAR and MONTH) and day-time fields (WEEK, DAY, ..., MICROSECOND). For 
example, `INTERVAL 1 month 1 hour` is invalid in Spark 3.2. In Spark 3.1 and 
earlier, there is no such limitation and the literal returns value of 
`CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set 
`spark.sql.legacy.interval.enabled` to `true`.
+  - Since Spark 3.2, unit list interval literals cannot mix year-month fields 
(YEAR and MONTH) and day-time fields (WEEK, DAY, ..., MICROSECOND). For 
example, `INTERVAL 1 month 1 hour` is invalid in Spark 3.2. In Spark 3.1 and 
earlier, there is no such limitation and the literal returns a value of 
`CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set 
`spark.sql.legacy.interval.enabled` to `true`.
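
A sketch of the restriction and one way around it, assuming a Spark 3.2 spark-shell session:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.

// Fails in Spark 3.2: year-month (MONTH) and day-time (HOUR) units cannot be mixed.
// spark.sql("SELECT timestamp'2021-01-01 00:00:00' + INTERVAL 1 month 1 hour")

// Workaround: add the two ANSI intervals separately.
spark.sql(
  "SELECT timestamp'2021-01-01 00:00:00' + INTERVAL '1' MONTH + INTERVAL '1' HOUR AS ts"
).show(truncate = false)

// Or set spark.sql.legacy.interval.enabled=true to get the old CalendarIntervalType back.
```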
 
-  - In Spark 3.2, Spark supports `DayTimeIntervalType` and 
`YearMonthIntervalType` as inputs and outputs of `TRANSFORM` clause in Hive 
`SERDE` mode, the behavior is different between Hive `SERDE` mode and `ROW 
FORMAT DELIMITED` mode when these two types are used as inputs. In Hive `SERDE` 
mode, `DayTimeIntervalType` column is converted to `HiveIntervalDayTime`, its 
string format is `[-]?d h:m:s.n`, but in `ROW FORMAT DELIMITED` mode the format 
is `INTERVAL '[-]?d h:m:s.n' DAY TO TIME`. In Hive `SERDE` mode, 
`YearMonthIntervalType` column is converted to `HiveIntervalYearMonth`, its 
string format is `[-]?y-m`, but in `ROW FORMAT DELIMITED` mode the format is 
`INTERVAL '[-]?y-m' YEAR TO MONTH`.
+  - Since Spark 3.2, Spark supports `DayTimeIntervalType` and 
`YearMonthIntervalType` as inputs and outputs of the `TRANSFORM` clause in Hive 
`SERDE` mode. The behavior differs between Hive `SERDE` mode and `ROW FORMAT 
DELIMITED` mode when these two types are used as inputs. In Hive `SERDE` mode, 
a `DayTimeIntervalType` column is converted to `HiveIntervalDayTime` and its 
string format is `[-]?d h:m:s.n`, but in `ROW FORMAT DELIMITED` mode the format 
is `INTERVAL '[-]?d h:m:s.n' DAY TO TIME`. In Hive `SERDE` mode, a 
`YearMonthIntervalType` column is converted to `HiveIntervalYearMonth` and its 
string format is `[-]?y-m`, but in `ROW FORMAT DELIMITED` mode the format is 
`INTERVAL '[-]?y-m' YEAR TO MONTH`.
 
-  - In Spark 3.2, `hash(0) == hash(-0)` for floating point types. Previously, 
different values were generated.
+  - Since Spark 3.2, `hash(0) == hash(-0)` for floating point types. 
Previously, different values were generated.
 
-  - In Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will 
throw `AnalysisException`. To restore the behavior before Spark 3.2, you can 
set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
+  - Since Spark 3.2, `CREATE TABLE AS SELECT` with non-empty `LOCATION` will 
throw `AnalysisException`. To restore the behavior before Spark 3.2, you can 
set `spark.sql.legacy.allowNonEmptyLocationInCTAS` to `true`.
 
-  - In Spark 3.2, special datetime values such as `epoch`, `today`, 
`yesterday`, `tomorrow`, and `now` are supported in typed literals or in cast 
of foldable strings only, for instance, `select timestamp'now'` or `select 
cast('today' as date)`. In Spark 3.1 and 3.0, such special values are supported 
in any casts of strings to dates/timestamps. To keep these special values as 
dates/timestamps in Spark 3.1 and 3.0, you should replace them manually, e.g. 
`if (c in ('now', 'today'), current_date(), cast(c as date))`.
+  - Since Spark 3.2, special datetime values such as `epoch`, `today`, 
`yesterday`, `tomorrow`, and `now` are supported only in typed literals or in 
casts of foldable strings, for instance, `select timestamp'now'` or `select 
cast('today' as date)`. In Spark 3.1 and 3.0, such special values are supported 
in any cast of a string to a date/timestamp. To keep these special values as 
dates/timestamps in Spark 3.1 and 3.0, you should replace them manually, e.g. 
`if (c in ('now', 'today'), current_date(), cast(c as date))`.
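
A short sketch contrasting where the special strings still work, assuming a Spark 3.2 spark-shell session; the view `t` and column `c` are invented:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.

// Still supported in Spark 3.2: typed literals and casts of foldable strings.
spark.sql("SELECT timestamp'now' AS ts, CAST('today' AS DATE) AS d").show(truncate = false)

// No longer special when the string comes from data; rewrite it explicitly instead.
spark.sql("SELECT * FROM VALUES ('today'), ('2021-01-01') AS t(c)")
  .createOrReplaceTempView("t")
spark.sql(
  "SELECT IF(c IN ('now', 'today'), current_date(), CAST(c AS DATE)) AS d FROM t"
).show()
```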
   
-  - In Spark 3.2, `FloatType` is mapped to `FLOAT` in MySQL. Prior to this, it 
used to be mapped to `REAL`, which is by default a synonym to `DOUBLE 
PRECISION` in MySQL. 
+  - Since Spark 3.2, `FloatType` is mapped to `FLOAT` in MySQL. Prior to this, 
it used to be mapped to `REAL`, which is by default a synonym for `DOUBLE 
PRECISION` in MySQL.
 
-  - In Spark 3.2, the query executions triggered by `DataFrameWriter` are 
always named `command` when being sent to `QueryExecutionListener`. In Spark 
3.1 and earlier, the name is one of `save`, `insertInto`, `saveAsTable`.
+  - Since Spark 3.2, the query executions triggered by `DataFrameWriter` are 
always named `command` when being sent to `QueryExecutionListener`. In Spark 
3.1 and earlier, the name is one of `save`, `insertInto`, `saveAsTable`.
   
-  - In Spark 3.2, `Dataset.unionByName` with `allowMissingColumns` set to true 
will add missing nested fields to the end of structs. In Spark 3.1, nested 
struct fields are sorted alphabetically.
+  - Since Spark 3.2, `Dataset.unionByName` with `allowMissingColumns` set to 
true will add missing nested fields to the end of structs. In Spark 3.1, nested 
struct fields are sorted alphabetically.
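
A minimal sketch of the nested-field ordering change, assuming a Spark 3.2 spark-shell session; all names are invented:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.
val df1 = spark.sql("SELECT named_struct('a', 1, 'c', 3) AS s")
val df2 = spark.sql("SELECT named_struct('a', 1, 'b', 2) AS s")

// allowMissingColumns = true also fills in missing *nested* fields with nulls.
val unioned = df1.unionByName(df2, allowMissingColumns = true)
unioned.printSchema()
// Spark 3.2: missing struct fields are appended at the end of the struct.
// Spark 3.1: nested struct fields were sorted alphabetically instead.
unioned.show()
```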
 
-  - In Spark 3.2, create/alter view will fail if the input query output 
columns contain auto-generated alias. This is necessary to make sure the query 
output column names are stable across different spark versions. To restore the 
behavior before Spark 3.2, set 
`spark.sql.legacy.allowAutoGeneratedAliasForView` to `true`.
+  - Since Spark 3.2, creating or altering a view will fail if the input query 
output columns contain an auto-generated alias. This is necessary to make sure 
the query output column names are stable across different Spark versions. To 
restore the behavior before Spark 3.2, set 
`spark.sql.legacy.allowAutoGeneratedAliasForView` to `true`.
 
-  - In Spark 3.2, date +/- interval with only day-time fields such as `date 
'2011-11-11' + interval 12 hours` returns timestamp. In Spark 3.1 and earlier, 
the same expression returns date. To restore the behavior before Spark 3.2, you 
can use `cast` to convert timestamp as date.
+  - Since Spark 3.2, date +/- interval with only day-time fields such as `date 
'2011-11-11' + interval 12 hours` returns a timestamp. In Spark 3.1 and 
earlier, the same expression returns a date. To restore the behavior before 
Spark 3.2, you can use `cast` to convert the timestamp to a date.
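
A sketch of the return-type change and the suggested `cast`, assuming a Spark 3.2 spark-shell session:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.
val df = spark.sql(
  """SELECT
    |  date'2011-11-11' + INTERVAL 12 hours AS as_timestamp,               -- TIMESTAMP in 3.2
    |  CAST(date'2011-11-11' + INTERVAL 12 hours AS DATE) AS as_date       -- pre-3.2 result
    |""".stripMargin)

df.printSchema()
df.show(truncate = false)
```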
 
 ## Upgrading from Spark SQL 3.0 to 3.1
 
-  - In Spark 3.1, statistical aggregation function includes `std`, `stddev`, 
`stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, 
`corr` will return `NULL` instead of `Double.NaN` when `DivideByZero` occurs 
during expression evaluation, for example, when `stddev_samp` applied on a 
single element set. In Spark version 3.0 and earlier, it will return 
`Double.NaN` in such case. To restore the behavior before Spark 3.1, you can 
set `spark.sql.legacy.statisticalAggregate` to `true`.
+  - Since Spark 3.1, statistical aggregation functions including `std`, 
`stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, 
`covar_samp`, and `corr` return `NULL` instead of `Double.NaN` when 
`DivideByZero` occurs during expression evaluation, for example, when 
`stddev_samp` is applied to a single-element set. In Spark 3.0 and earlier, 
they return `Double.NaN` in such cases. To restore the behavior before Spark 
3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
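
A quick sketch of the NULL-vs-NaN change, assuming a Spark 3.1 (or later) spark-shell session:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.

// A single-element group: the sample standard deviation divides by (n - 1) = 0.
spark.sql("SELECT stddev_samp(col) FROM VALUES (1.0) AS t(col)").show()
// Spark 3.1+:            NULL
// Spark 3.0 and earlier: NaN

// Restoring the old behavior (the legacy config named in the item above):
// spark.conf.set("spark.sql.legacy.statisticalAggregate", true)
```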
 
-  - In Spark 3.1, grouping_id() returns long values. In Spark version 3.0 and 
earlier, this function returns int values. To restore the behavior before Spark 
3.1, you can set `spark.sql.legacy.integerGroupingId` to `true`.
+  - Since Spark 3.1, grouping_id() returns long values. In Spark 3.0 or 
earlier, this function returns int values. To restore the behavior before Spark 
3.1, you can set `spark.sql.legacy.integerGroupingId` to `true`.
 
-  - In Spark 3.1, SQL UI data adopts the `formatted` mode for the query plan 
explain results. To restore the behavior before Spark 3.1, you can set 
`spark.sql.ui.explainMode` to `extended`.
+  - Since Spark 3.1, SQL UI data adopts the `formatted` mode for the query 
plan explain results. To restore the behavior before Spark 3.1, you can set 
`spark.sql.ui.explainMode` to `extended`.
   
-  - In Spark 3.1, `from_unixtime`, `unix_timestamp`,`to_unix_timestamp`, 
`to_timestamp` and `to_date` will fail if the specified datetime pattern is 
invalid. In Spark 3.0 or earlier, they result `NULL`.
+  - Since Spark 3.1, `from_unixtime`, `unix_timestamp`, `to_unix_timestamp`, 
`to_timestamp` and `to_date` will fail if the specified datetime pattern is 
invalid. In Spark 3.0 or earlier, they return `NULL`.
   
-  - In Spark 3.1, the Parquet, ORC, Avro and JSON datasources throw the 
exception `org.apache.spark.sql.AnalysisException: Found duplicate column(s) in 
the data schema` in read if they detect duplicate names in top-level columns as 
well in nested structures. The datasources take into account the SQL config 
`spark.sql.caseSensitive` while detecting column name duplicates.
+  - Since Spark 3.1, the Parquet, ORC, Avro and JSON datasources throw the 
exception `org.apache.spark.sql.AnalysisException: Found duplicate column(s) in 
the data schema` on read if they detect duplicate names in top-level columns as 
well as in nested structures. The datasources take the SQL config 
`spark.sql.caseSensitive` into account while detecting column name duplicates.
 
-  - In Spark 3.1, structs and maps are wrapped by the `{}` brackets in casting 
them to strings. For instance, the `show()` action and the `CAST` expression 
use such brackets. In Spark 3.0 and earlier, the `[]` brackets are used for the 
same purpose. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
+  - Since Spark 3.1, structs and maps are wrapped in `{}` brackets when 
casting them to strings. For instance, the `show()` action and the `CAST` 
expression use such brackets. In Spark 3.0 or earlier, `[]` brackets are used 
for the same purpose. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
 
-  - In Spark 3.1, NULL elements of structures, arrays and maps are converted 
to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements 
are converted to empty strings. To restore the behavior before Spark 3.1, you 
can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
+  - Since Spark 3.1, NULL elements of structures, arrays and maps are 
converted to "null" when casting them to strings. In Spark 3.0 or earlier, NULL 
elements are converted to empty strings. To restore the behavior before Spark 
3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
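
A small sketch covering both string-casting changes above (brackets and NULL rendering), assuming a Spark 3.1 spark-shell session:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.
spark.sql(
  "SELECT CAST(named_struct('a', 1, 'b', CAST(NULL AS STRING)) AS STRING) AS s"
).show(truncate = false)
// Spark 3.1: curly braces and an explicit "null", e.g. {1, null}
// Spark 3.0: square brackets, and the NULL element rendered as an empty string
// spark.conf.set("spark.sql.legacy.castComplexTypesToString.enabled", true) restores
// the Spark 3.0 output.
```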
 
-  - In Spark 3.1, when `spark.sql.ansi.enabled` is false, Spark always returns 
null if the sum of decimal type column overflows. In Spark 3.0 or earlier, in 
the case, the sum of decimal type column may return null or incorrect result, 
or even fails at runtime (depending on the actual query plan execution).
+  - Since Spark 3.1, when `spark.sql.ansi.enabled` is false, Spark always 
returns null if the sum of a decimal type column overflows. In Spark 3.0 or 
earlier, in that case, the sum of a decimal type column may return null or an 
incorrect result, or even fail at runtime (depending on the actual query plan 
execution).
 
-  - In Spark 3.1, `path` option cannot coexist when the following methods are 
called with path parameter(s): `DataFrameReader.load()`, 
`DataFrameWriter.save()`, `DataStreamReader.load()`, or 
`DataStreamWriter.start()`. In addition, `paths` option cannot coexist for 
`DataFrameReader.load()`. For example, `spark.read.format("csv").option("path", 
"/tmp").load("/tmp2")` or `spark.read.option("path", "/tmp").csv("/tmp2")` will 
throw `org.apache.spark.sql.AnalysisException`. In Spark version 3.0 and below, 
`path` option is overwritten if one path parameter is passed to above methods; 
`path` option is added to the overall paths if multiple path parameters are 
passed to `DataFrameReader.load()`. To restore the behavior before Spark 3.1, 
you can set `spark.sql.legacy.pathOptionBehavior.enabled` to `true`.
+  - Since Spark 3.1, the `path` option cannot coexist with path parameter(s) 
passed to the following methods: `DataFrameReader.load()`, 
`DataFrameWriter.save()`, `DataStreamReader.load()`, or 
`DataStreamWriter.start()`. In addition, the `paths` option cannot coexist for 
`DataFrameReader.load()`. For example, `spark.read.format("csv").option("path", 
"/tmp").load("/tmp2")` or `spark.read.option("path", "/tmp").csv("/tmp2")` will 
throw `org.apache.spark.sql.AnalysisException`. In Spark 3.0 or earlier, the 
`path` option is overwritten if one path parameter is passed to the above 
methods; the `path` option is added to the overall paths if multiple path 
parameters are passed to `DataFrameReader.load()`. To restore the behavior 
before Spark 3.1, you can set `spark.sql.legacy.pathOptionBehavior.enabled` to 
`true`.
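
A sketch of the now-failing pattern and its fixes, assuming a Spark 3.1 spark-shell session; the paths are placeholders:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`; paths are placeholders.

// Fails with AnalysisException in Spark 3.1: both an option and a parameter supply a path.
// spark.read.option("path", "/tmp").csv("/tmp2")

// Either pass the path one way only...
val df = spark.read.option("header", "true").csv("/tmp2")

// ...or opt back into the old precedence rules:
// spark.conf.set("spark.sql.legacy.pathOptionBehavior.enabled", true)
```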
 
-  - In Spark 3.1, `IllegalArgumentException` is returned for the incomplete 
interval literals, e.g. `INTERVAL '1'`, `INTERVAL '1 DAY 2'`, which are 
invalid. In Spark 3.0, these literals result in `NULL`s.
+  - Since Spark 3.1, `IllegalArgumentException` is thrown for incomplete 
interval literals, e.g. `INTERVAL '1'`, `INTERVAL '1 DAY 2'`, which are 
invalid. In Spark 3.0, these literals result in `NULL`s.
 
-  - In Spark 3.1, we remove the built-in Hive 1.2. You need to migrate your 
custom SerDes to Hive 2.3. See 
[HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) for more details.
+  - Since Spark 3.1, the built-in Hive 1.2 is removed. You need to migrate 
your custom SerDes to Hive 2.3. See 
[HIVE-15167](https://issues.apache.org/jira/browse/HIVE-15167) for more details.
   
-  - In Spark 3.1, loading and saving of timestamps from/to parquet files fails 
if the timestamps are before 1900-01-01 00:00:00Z, and loaded (saved) as the 
INT96 type. In Spark 3.0, the actions don't fail but might lead to shifting of 
the input timestamps due to rebasing from/to Julian to/from Proleptic Gregorian 
calendar. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.parquet.int96RebaseModeInRead` or/and 
`spark.sql.legacy.parquet.int96RebaseModeInWrite` to `LEGACY`.
+  - Since Spark 3.1, loading and saving of timestamps from/to parquet files 
fails if the timestamps are before 1900-01-01 00:00:00Z and loaded (saved) as 
the INT96 type. In Spark 3.0, the actions don't fail but might lead to shifting 
of the input timestamps due to rebasing between the Julian and Proleptic 
Gregorian calendars. To restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.parquet.int96RebaseModeInRead` and/or 
`spark.sql.legacy.parquet.int96RebaseModeInWrite` to `LEGACY`.
   
-  - In Spark 3.1, the `schema_of_json` and `schema_of_csv` functions return 
the schema in the SQL format in which field names are quoted. In Spark 3.0, the 
function returns a catalog string without field quoting and in lower case. 
+  - Since Spark 3.1, the `schema_of_json` and `schema_of_csv` functions return 
the schema in the SQL format in which field names are quoted. In Spark 3.0, the 
functions return a catalog string without field quoting and in lower case.
 
-  - In Spark 3.1, refreshing a table will trigger an uncache operation for all 
other caches that reference the table, even if the table itself is not cached. 
In Spark 3.0 the operation will only be triggered if the table itself is cached.
+  - Since Spark 3.1, refreshing a table will trigger an uncache operation for 
all other caches that reference the table, even if the table itself is not 
cached. In Spark 3.0, the operation will only be triggered if the table itself 
is cached.
   
-  - In Spark 3.1, creating or altering a permanent view will capture runtime 
SQL configs and store them as view properties. These configs will be applied 
during the parsing and analysis phases of the view resolution. To restore the 
behavior before Spark 3.1, you can set 
`spark.sql.legacy.useCurrentConfigsForView` to `true`.
+  - Since Spark 3.1, creating or altering a permanent view will capture 
runtime SQL configs and store them as view properties. These configs will be 
applied during the parsing and analysis phases of the view resolution. To 
restore the behavior before Spark 3.1, you can set 
`spark.sql.legacy.useCurrentConfigsForView` to `true`.
 
-  - In Spark 3.1, the temporary view will have same behaviors with the 
permanent view, i.e. capture and store runtime SQL configs, SQL text, catalog 
and namespace. The capatured view properties will be applied during the parsing 
and analysis phases of the view resolution. To restore the behavior before 
Spark 3.1, you can set `spark.sql.legacy.storeAnalyzedPlanForView` to `true`.
+  - Since Spark 3.1, temporary views have the same behavior as permanent 
views, i.e. capture and store runtime SQL configs, SQL text, catalog and 
namespace. The captured view properties will be applied during the parsing 
and analysis phases of the view resolution. To restore the behavior before 
Spark 3.1, you can set `spark.sql.legacy.storeAnalyzedPlanForView` to `true`.
 
-  - In Spark 3.1, temporary view created via `CACHE TABLE ... AS SELECT` will 
also have the same behavior with permanent view. In particular, when the 
temporary view is dropped, Spark will invalidate all its cache dependents, as 
well as the cache for the temporary view itself. This is different from Spark 
3.0 and below, which only does the latter. To restore the previous behavior, 
you can set `spark.sql.legacy.storeAnalyzedPlanForView` to `true`.
+  - Since Spark 3.1, a temporary view created via `CACHE TABLE ... AS SELECT` 
also has the same behavior as a permanent view. In particular, when the 
temporary view is dropped, Spark will invalidate all its cache dependents, as 
well as the cache for the temporary view itself. In Spark 3.0 or earlier, Spark 
only does the latter. To restore the previous behavior, you can set 
`spark.sql.legacy.storeAnalyzedPlanForView` to `true`.
 
   - Since Spark 3.1, CHAR/CHARACTER and VARCHAR types are supported in the 
table schema. Table scan/insertion will respect the char/varchar semantic. If 
char/varchar is used in places other than table schema, an exception will be 
thrown (CAST is an exception that simply treats char/varchar as string like 
before). To restore the behavior before Spark 3.1, which treats them as STRING 
types and ignores a length parameter, e.g. `CHAR(4)`, you can set 
`spark.sql.legacy.charVarcharAsString` to `true`.
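
A brief sketch of the char/varchar semantics, assuming a Spark 3.1 spark-shell session; the table name is invented:

```scala
// Assumes a spark-shell session with a SparkSession named `spark`.
spark.sql("CREATE TABLE char_tbl(c CHAR(4), v VARCHAR(4)) USING parquet")
spark.sql("INSERT INTO char_tbl VALUES ('ab', 'ab')")

// CHAR(4) values are padded to length 4 on read; VARCHAR(4) enforces a max length.
spark.sql("SELECT c, length(c), v, length(v) FROM char_tbl").show()

// Inserting a string longer than 4 characters into either column fails.
// spark.sql("INSERT INTO char_tbl VALUES ('too long', 'too long')")

// Pre-3.1 behavior (treat both as plain STRING, ignore the length):
// spark.conf.set("spark.sql.legacy.charVarcharAsString", true)
```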
 
-  - In Spark 3.1, `AnalysisException` is replaced by its sub-classes that are 
thrown for tables from Hive external catalog in the following situations:
+  - Since Spark 3.1, `AnalysisException` is replaced by its sub-classes that 
are thrown for tables from Hive external catalog in the following situations:
     * `ALTER TABLE .. ADD PARTITION` throws `PartitionsAlreadyExistException` 
if new partition exists already
     * `ALTER TABLE .. DROP PARTITION` throws `NoSuchPartitionsException` for 
not existing partitions
 
 ## Upgrading from Spark SQL 3.0.1 to 3.0.2
 
-  - In Spark 3.0.2, `AnalysisException` is replaced by its sub-classes that 
are thrown for tables from Hive external catalog in the following situations:
+  - Since Spark 3.0.2, `AnalysisException` is replaced by its sub-classes that 
are thrown for tables from Hive external catalog in the following situations:
     * `ALTER TABLE .. ADD PARTITION` throws `PartitionsAlreadyExistException` 
if new partition exists already
     * `ALTER TABLE .. DROP PARTITION` throws `NoSuchPartitionsException` for 
not existing partitions
 
-  - In Spark 3.0.2, `PARTITION(col=null)` is always parsed as a null literal 
in the partition spec. In Spark 3.0.1 or earlier, it is parsed as a string 
literal of its text representation, e.g., string "null", if the partition 
column is string type. To restore the legacy behavior, you can set 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` as true.
+  - Since Spark 3.0.2, `PARTITION(col=null)` is always parsed as a null 
literal in the partition spec. In Spark 3.0.1 or earlier, it is parsed as a 
string literal of its text representation, e.g., the string "null", if the 
partition column is of string type. To restore the legacy behavior, you can set 
`spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` to `true`.
 
-  - In Spark 3.0.0, the output schema of `SHOW DATABASES` becomes `namespace: 
string`. In Spark version 2.4 and earlier, the schema was `databaseName: 
string`. Since Spark 3.0.2, you can restore the old schema by setting 
`spark.sql.legacy.keepCommandOutputSchema` to `true`.
+  - Since Spark 3.0.2, the output schema of `SHOW DATABASES` becomes 
`namespace: string`. In Spark 3.0.1 or earlier, the schema was `databaseName: 
string`. Since Spark 3.0.2, you can restore the old schema by setting 
`spark.sql.legacy.keepCommandOutputSchema` to `true`.

Review comment:
       Can you check the commit history? I do remember that we made the 
behavior change first, then added a legacy config to restore the old behavior.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
