[GitHub] spark pull request #20484: [SPARK-23313][DOC] Add a migration guide for ORC

dongjoon-hyun Fri, 02 Feb 2018 00:16:05 -0800

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20484#discussion_r165580466
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1776,6 +1776,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see
     
     ## Upgrading From Spark SQL 2.2 to 2.3
     
    +  - Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files and Hive ORC tables. To do that, the following configurations are newly added or change their default values.
    +
    +    <table class="table">
    +      <tr>
    +        <th>
    +          <b>Property Name</b>
    +        </th>
    +        <th>
    +          <b>Default</b>
    +        </th>
    +        <th>
    +          <b>Meaning</b>
    +        </th>
    +      </tr>
    +      <tr>
    +        <td>
    +          spark.sql.orc.impl
    +        </td>
    +        <td>
    +          native
    +        </td>
    +        <td>
    +          The name of ORC implementation: `native` means the native ORC support that is built on Apache ORC 1.4.1 instead of the ORC library in Hive 1.2.1. It is `hive` by default prior to Spark 2.3.
    +        </td>
    +      </tr>
    +      <tr>
    +        <td>
    +          spark.sql.orc.enableVectorizedReader
    +        </td>
    +        <td>
    +          true
    +        </td>
    +        <td>
    +          Enables vectorized orc decoding in `native` implementation. If `false`, a new non-vectorized ORC reader is used in `native` implementation. For `hive` implementation, this is ignored.
    +        </td>
    +      </tr>
    +      <tr>
    +        <td>
    +          spark.sql.orc.filterPushdown
    +        </td>
    +        <td>
    +          true
    +        </td>
    +        <td>
    +          Enables filter pushdown for ORC files. It is `false` by default prior to Spark 2.3.
    +        </td>
    +      </tr>
    +      <tr>
    +        <td>
    +          spark.sql.hive.convertMetastoreOrc
    +        </td>
    +        <td>
    +          true
    +        </td>
    +        <td>
    +          Enable Spark's ORC support instead of Hive SerDe when reading from and writing to Hive ORC tables. It is `false` by default prior to Spark 2.3.
    --- End diff --
    
    Sounds good. I'll update like this.
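
For readers who want the pre-2.3 behavior back, the table in the diff suggests which properties to override. The following is only a sketch: the property names and values are copied from the proposed table above and have not been checked against a released Spark build, and the helper name `to_conf_flags` is hypothetical. It renders the overrides as `--conf key=value` arguments for `spark-submit`.

```python
# Pre-2.3 values as stated in the proposed migration table above.
# Assumption: these match the table in the diff, not necessarily a released build.
PRE_2_3_ORC_DEFAULTS = {
    "spark.sql.orc.impl": "hive",
    "spark.sql.orc.filterPushdown": "false",
    "spark.sql.hive.convertMetastoreOrc": "false",
}


def to_conf_flags(settings):
    """Render a dict of Spark properties as spark-submit `--conf key=value` arguments.

    Hypothetical helper for illustration; keys are sorted so the output is stable.
    """
    flags = []
    for key, value in sorted(settings.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Prints one line that can be pasted after `spark-submit`.
    print(" ".join(to_conf_flags(PRE_2_3_ORC_DEFAULTS)))
```

`spark.sql.orc.enableVectorizedReader` is left out on purpose: according to the table it is ignored when the `hive` implementation is selected.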


