[GitHub] spark pull request #21217: [SPARK-24151][SQL] Fix CURRENT_DATE, CURRENT_TIME...

dongjoon-hyun Tue, 17 Jul 2018 15:47:37 -0700

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21217#discussion_r203205223
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1857,6 +1857,7 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
       - In version 2.3 and earlier, Spark converts Parquet Hive tables by 
default but ignores table properties like `TBLPROPERTIES (parquet.compression 
'NONE')`. This happens for ORC Hive table properties like `TBLPROPERTIES 
(orc.compress 'NONE')` in case of `spark.sql.hive.convertMetastoreOrc=true`, 
too. Since Spark 2.4, Spark respects Parquet/ORC specific table properties 
while converting Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id 
int) STORED AS PARQUET TBLPROPERTIES (parquet.compression 'NONE')` would 
generate Snappy parquet files during insertion in Spark 2.3, and in Spark 2.4, 
the result would be uncompressed parquet files.
       - Since Spark 2.0, Spark converts Parquet Hive tables by default for 
better performance. Since Spark 2.4, Spark converts ORC Hive tables by default, 
too. It means Spark uses its own ORC support by default instead of Hive SerDe. 
As an example, `CREATE TABLE t(id int) STORED AS ORC` would be handled with 
Hive SerDe in Spark 2.3, and in Spark 2.4, it would be converted into Spark's 
ORC data source table and ORC vectorization would be applied. To set `false` to 
`spark.sql.hive.convertMetastoreOrc` restores the previous behavior.
       - In version 2.3 and earlier, CSV rows are considered as malformed if at 
least one column value in the row is malformed. CSV parser dropped such rows in 
the DROPMALFORMED mode or outputs an error in the FAILFAST mode. Since Spark 
2.4, CSV row is considered as malformed only when it contains malformed column 
values requested from CSV datasource, other values can be ignored. As an 
example, CSV file contains the "id,name" header and one row "1234". In Spark 
2.4, selection of the id column consists of a row with one column value 1234 
but in Spark 2.3 and earlier it is empty in the DROPMALFORMED mode. To restore 
the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to 
`false`.
    +  - In versions 2.2.1 and 2.3.0, if `spark.sql.caseSensitive` is set to 
true, then the `CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly 
became case-sensitive and would resolve to columns (unless typed in lower 
case). In later versions, this has been fixed and the functions are no longer 
case-sensitive.
    --- End diff --
    
    Since then, 2.2.2 and 2.3.1 have also been released, and voting on 2.3.2 has 
already started. 
    So the affected range seems to be `2.2.1 ~ 2.3.2`.
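
    For illustration only, the suggested range can be written as a small version check. This is a minimal sketch: the helper names are hypothetical, and the bounds are taken from the range proposed in this comment, not from a confirmed list of affected releases.

```python
# Hypothetical helper. The bounds below are an assumption based on the
# range suggested in this review comment (2.2.1 ~ 2.3.2, inclusive).
SUGGESTED_FIRST = (2, 2, 1)
SUGGESTED_LAST = (2, 3, 2)


def parse_version(version):
    """Turn a string such as '2.3.1' into (2, 3, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def is_in_suggested_range(version):
    """Return True if the version lies within the suggested inclusive range."""
    return SUGGESTED_FIRST <= parse_version(version) <= SUGGESTED_LAST


if __name__ == "__main__":
    for v in ["2.2.0", "2.2.1", "2.2.2", "2.3.0", "2.3.1", "2.3.2", "2.4.0"]:
        print(v, is_in_suggested_range(v))
```

    Comparing tuples of integers rather than raw strings keeps the ordering numeric, so a version such as `2.10.0` would not sort before `2.3.2`.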



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
