[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...

2018-12-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23238#discussion_r239708569
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -141,6 +141,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated 
as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as 
`SELECT 1 FROM range(10) WHERE true`  and returns 10 rows. This violates SQL 
standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without 
GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) 
HAVING true` will return only one row. To restore the previous behavior, set 
`spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
 
+  - In version 2.3 and earlier, when reading from a Parquet data source 
table, Spark always returns null for any column whose column names in Hive 
metastore schema and Parquet schema are in different letter cases, no matter 
whether `spark.sql.caseSensitive` is set to true or false. Since 2.4, when 
`spark.sql.caseSensitive` is set to false, Spark does case insensitive column 
name resolution between Hive metastore schema and Parquet schema, so even 
column names are in different letter cases, Spark returns corresponding column 
values. An exception is thrown if there is ambiguity, i.e. more than one 
Parquet column is matched. This change also applies to Parquet Hive tables when 
`spark.sql.hive.convertMetastoreParquet` is set to true.
--- End diff --

Hi, @seancxmao. Maybe the following?
```
- `spark.sql.caseSensitive` is set to true or false
+ `spark.sql.caseSensitive` is set to `true` or `false`
```
```
- `spark.sql.caseSensitive` is set to false
+ `spark.sql.caseSensitive` is set to `false`
```
```
- `spark.sql.hive.convertMetastoreParquet` is set to true
+ `spark.sql.hive.convertMetastoreParquet` is set to `true`
```
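
For readers following the thread, the resolution rule described in the diff above can be sketched in plain Python. This is only an illustrative model, not Spark's implementation: the function name `resolve_parquet_column` and its parameters are hypothetical, and the exact-match behavior for `case_sensitive=True` is an assumption, since the diff only describes the case where `spark.sql.caseSensitive` is `false`.

```python
from typing import List, Optional


def resolve_parquet_column(
    metastore_name: str,
    parquet_columns: List[str],
    case_sensitive: bool,
) -> Optional[str]:
    """Return the Parquet column matching a metastore column name, or None.

    Hypothetical helper modelling the 2.4 behavior described in the diff;
    None stands in for "Spark returns null for this column".
    """
    if case_sensitive:
        # Assumption: only an exact-case match counts in case-sensitive mode.
        return metastore_name if metastore_name in parquet_columns else None

    # Case-insensitive mode: gather every Parquet column whose
    # lowercased name equals the lowercased metastore name.
    wanted = metastore_name.lower()
    matches = [c for c in parquet_columns if c.lower() == wanted]
    if len(matches) > 1:
        # More than one Parquet column matched, so the lookup is ambiguous.
        raise ValueError(
            f"Ambiguous match for '{metastore_name}': {matches}"
        )
    return matches[0] if matches else None
```

With this model, `resolve_parquet_column("id", ["ID", "name"], False)` finds `ID`, while the same lookup against `["ID", "id"]` raises an error because two Parquet columns match.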


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...

2018-12-05 Thread seancxmao
GitHub user seancxmao opened a pull request:

https://github.com/apache/spark/pull/23238

[SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-insensitive field 
resolution when reading from Parquet

## What changes were proposed in this pull request?
#22148 introduces a behavior change. Following the discussion at #22184, 
this PR updates the migration guide for upgrading from Spark 2.3 to 2.4.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/seancxmao/spark SPARK-25132-doc-2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23238


commit 5bbcf41f34f2ca160da7ef4ebe4c54d15a2d09b5
Author: seancxmao 
Date:   2018-12-05T15:05:38Z

[SPARK-25132][SQL][FOLLOWUP] Update migration doc for case-insensitive 
field resolution when reading from Parquet



