Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22746#discussion_r226226005
--- Diff: docs/sql-data-sources-other.md ---
@@ -0,0 +1,114 @@
+---
+layout: global
+title: Other Data Sources
+displayTitle: Other Data Sources
+---
+
+* Table of contents
+{:toc}
+
+## ORC Files
+
+Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
+To support it, the following configurations were added. The vectorized reader is used for
+native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl`
+is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For Hive ORC
+serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
+the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.
+
+<table class="table">
+ <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
+ <tr>
+ <td><code>spark.sql.orc.impl</code></td>
+ <td><code>native</code></td>
+ <td>The name of the ORC implementation. It can be either <code>native</code> or <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4. <code>hive</code> means the ORC library in Hive 1.2.1.</td>
+ </tr>
+ <tr>
+ <td><code>spark.sql.orc.enableVectorizedReader</code></td>
+ <td><code>true</code></td>
+ <td>Enables vectorized ORC decoding in the <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in the <code>native</code> implementation. For the <code>hive</code> implementation, this is ignored.</td>
+ </tr>
+</table>
+
+## JSON Datasets
--- End diff --
Got it, will change it soon.
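
For reference, the rule the ORC paragraph in this diff describes can be sketched in plain Python. This is only an illustration of the wording above, not Spark code: `uses_vectorized_orc_reader` and `DEFAULTS` are hypothetical names, the two defaults are copied from the table in the diff, and treating an unset `spark.sql.hive.convertMetastoreOrc` as `false` is an assumption of this sketch (the diff does not state its default, so check the configuration docs for your Spark version).

```python
# Hypothetical helper, not part of Spark: it only models the rule
# described in the ORC section of the quoted diff.

# Defaults copied from the property table in the diff.
DEFAULTS = {
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.enableVectorizedReader": "true",
}


def uses_vectorized_orc_reader(conf, hive_serde_table=False):
    """Return True if, per the documented rule, the vectorized reader applies.

    conf: dict of explicitly set Spark SQL properties (string values).
    hive_serde_table: True for tables created with
        USING HIVE OPTIONS (fileFormat 'ORC'), False for USING ORC.
    """
    merged = {**DEFAULTS, **conf}
    native = merged["spark.sql.orc.impl"] == "native"
    vectorized = merged["spark.sql.orc.enableVectorizedReader"] == "true"

    if not hive_serde_table:
        # Native ORC tables need only the two properties from the table.
        return native and vectorized

    # Hive ORC serde tables additionally need convertMetastoreOrc=true.
    # Assumption of this sketch: an unset value counts as "false".
    convert = merged.get("spark.sql.hive.convertMetastoreOrc", "false") == "true"
    return native and vectorized and convert
```

With the defaults from the table, a `USING ORC` table gets the vectorized reader, while a Hive serde table gets it only once `spark.sql.hive.convertMetastoreOrc` is explicitly `true` in this model.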
---