[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...

dongjoon-hyun Wed, 14 Feb 2018 10:44:03 -0800

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20610#discussion_r168271532
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1004,6 +1004,24 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
     </tr>
     </table>
     
    +## ORC Files
    +
    +Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files. To support this, the following configurations were added. The vectorized reader is used for native ORC tables (e.g., those created with the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For Hive ORC serde tables (e.g., those created with the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is set to `true`.
    +
    +<table class="table">
    +  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
    +  <tr>
    +    <td><code>spark.sql.orc.impl</code></td>
    +    <td><code>hive</code></td>
    +    <td>The name of the ORC implementation. It can be either <code>native</code> or <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4.1. <code>hive</code> means the ORC library in Hive 1.2.1, which was used prior to Spark 2.3.</td>
    +  </tr>
    +  <tr>
    +    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    +    <td><code>true</code></td>
    +    <td>Enables vectorized ORC decoding in the <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in the <code>native</code> implementation. For the <code>hive</code> implementation, this setting is ignored.</td>
    +  </tr>
    +</table>
    --- End diff --
    
    @gatorsmile, this is now its own section.
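
For readers following the conditions in the new section, here is a minimal sketch in plain Python (no Spark dependency) of when the vectorized reader applies. It encodes only the rules as literally stated in the paragraph above and the two defaults from the table. The function name `uses_vectorized_reader` and the table-kind labels `native` and `hive_serde` are hypothetical and are not part of Spark. Treating an unset `spark.sql.hive.convertMetastoreOrc` as "not enabled" is an assumption, since the diff does not state its default, and any further interaction between the Hive serde case and the other two settings is not modeled.

```python
# Defaults as listed in the table in this diff.
DEFAULTS = {
    "spark.sql.orc.impl": "hive",
    "spark.sql.orc.enableVectorizedReader": "true",
}


def uses_vectorized_reader(table_kind, conf):
    """Return True if the prose above says the vectorized ORC reader is used.

    table_kind is a hypothetical label: "native" for tables created with
    USING ORC, "hive_serde" for USING HIVE OPTIONS (fileFormat 'ORC').
    conf maps configuration names to string values, overriding DEFAULTS.
    """
    settings = {**DEFAULTS, **conf}
    if table_kind == "native":
        # Both conditions from the paragraph must hold.
        return (
            settings["spark.sql.orc.impl"] == "native"
            and settings["spark.sql.orc.enableVectorizedReader"] == "true"
        )
    if table_kind == "hive_serde":
        # Assumption: an unset value is treated as not enabled.
        return settings.get("spark.sql.hive.convertMetastoreOrc") == "true"
    raise ValueError(f"unknown table kind: {table_kind!r}")
```

With the defaults from the table (`spark.sql.orc.impl` = `hive`), a native ORC table does not get the vectorized reader under this model until `spark.sql.orc.impl` is switched to `native`.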


