Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20484#discussion_r167707320
--- Diff: docs/sql-programming-guide.md ---
@@ -1776,6 +1776,35 @@ working with timestamps in `pandas_udf`s to get the best performance, see
## Upgrading From Spark SQL 2.2 to 2.3
+ - Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC
+ file format for ORC files. To do that, the following configurations are newly
+ added or have their default values changed. For ORC tables, the vectorized
+ reader is used for tables created with `USING ORC`. With
+ `spark.sql.hive.convertMetastoreOrc=true`, it is also used for tables created
+ with `USING HIVE OPTIONS (fileFormat 'ORC')`.
--- End diff ---
> The vectorized reader is used for the native ORC tables (e.g., the ones
> created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to
> `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For
> the Hive ORC serde tables (e.g., the ones created using the clause
> `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used
> when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
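
For illustration, a minimal Scala sketch showing these configurations set explicitly (the table name `orc_native_tbl` and the sample schema are hypothetical, not part of the proposed doc text):

```scala
import org.apache.spark.sql.SparkSession

object OrcReaderConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcReaderConfigExample")
      .enableHiveSupport()
      .getOrCreate()

    // Native ORC tables (created with `USING ORC`) use the vectorized reader
    // when both of these are enabled.
    spark.conf.set("spark.sql.orc.impl", "native")
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

    // Hive ORC serde tables (created with `USING HIVE OPTIONS (fileFormat 'ORC')`)
    // go through the same reader path when this conversion is enabled.
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

    // Hypothetical table demonstrating the `USING ORC` clause.
    spark.sql("CREATE TABLE IF NOT EXISTS orc_native_tbl (id INT, name STRING) USING ORC")
    spark.sql("SELECT * FROM orc_native_tbl").show()

    spark.stop()
  }
}
```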