HyukjinKwon commented on a change in pull request #32546:
URL: https://github.com/apache/spark/pull/32546#discussion_r632384490
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -909,13 +909,10 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* }}}
* The text files will be encoded as UTF-8.
*
- * You can set the following option(s) for writing text files:
- * <ul>
- * <li>`compression` (default `null`): compression codec to use when saving to file. This can be
- * one of the known case-insensitive shorten names (`none`, `bzip2`, `gzip`, `lz4`,
- * `snappy` and `deflate`). </li>
- * <li>`lineSep` (default `\n`): defines the line separator that should be used for writing.</li>
Review comment:
This isn't `orc`. It's `text`.
##########
File path: docs/sql-data-sources-orc.md
##########
@@ -172,3 +172,32 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
<td>2.0.0</td>
</tr>
</table>
+
+## Data Source Option
+
+Data source options of ORC can be set via:
+* the `.option`/`.options` methods of `DataFrameReader` or `DataFrameWriter`
+* the `.option`/`.options` methods of `DataStreamReader` or `DataStreamWriter`
+
+<table class="table">
+ <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+ <tr>
+ <td><code>mergeSchema</code></td>
+ <td>None</td>
+ <td>sets whether we should merge schemas collected from all ORC part-files. This will override <code>spark.sql.orc.mergeSchema</code>. The default value is specified in <code>spark.sql.orc.mergeSchema</code>.</td>
+ <td>read</td>
+ </tr>
+ <tr>
+ <td><code>compression</code></td>
+ <td>None</td>
+ <td>compression codec to use when saving to file. This can be one of the known case-insensitive shorten names (none, snappy, zlib, lzo, and zstd). This will override <code>orc.compress</code> and <code>spark.sql.orc.compression.codec</code>. If None is set, it uses the value specified in <code>spark.sql.orc.compression.codec</code>.</td>
+ <td>write</td>
+ </tr>
+ <tr>
+ <td><code>lineSep</code></td>
Review comment:
Can you remove this? It's a text source option, not an ORC one.