Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/20484#discussion_r167707320
--- Diff: docs/sql-programming-guide.md ---
@@ -1776,6 +1776,35 @@ working with timestamps in `pandas_udf`s to get the best performance, see
## Upgrading From Spark SQL 2.2 to 2.3
+ - Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC
+ file format for ORC files. To do that, the following configurations are newly
+ added or have their default values changed. For ORC tables, the vectorized
+ reader is used for tables created with `USING ORC`. With
+ `spark.sql.hive.convertMetastoreOrc=true`, it is also used for tables created
+ with `USING HIVE OPTIONS (fileFormat 'ORC')`.
--- End diff ---
> The vectorized reader is used for the native ORC tables (e.g., the ones
> created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to
> `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For
> the Hive ORC serde tables (e.g., the ones created using the clause
> `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used
> when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
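
For illustration, a minimal Scala sketch showing these configurations set explicitly (the table name `orc_native_tbl` and the sample schema are hypothetical, not part of the proposed doc text):

```scala
import org.apache.spark.sql.SparkSession

object OrcReaderConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrcReaderConfigExample")
      .enableHiveSupport()
      .getOrCreate()

    // Native ORC tables (created with `USING ORC`) use the vectorized reader
    // when both of these are enabled.
    spark.conf.set("spark.sql.orc.impl", "native")
    spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")

    // Hive ORC serde tables (created with `USING HIVE OPTIONS (fileFormat 'ORC')`)
    // go through the same reader path when this conversion is enabled.
    spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")

    // Hypothetical table demonstrating the `USING ORC` clause.
    spark.sql("CREATE TABLE IF NOT EXISTS orc_native_tbl (id INT, name STRING) USING ORC")
    spark.sql("SELECT * FROM orc_native_tbl").show()

    spark.stop()
  }
}
```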