Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22839#discussion_r228307486
--- Diff: docs/sql-data-sources-load-save-functions.md ---
@@ -82,6 +82,49 @@ To load a CSV file you can use:
</div>
</div>
+The extra options are also used during write operation.
+For example, you can control bloom filters and dictionary encodings for ORC data sources.
+The following ORC example will create bloom filter on `favorite_color` and use dictionary encoding for `name` and `favorite_color`.
+For Parquet, there exists `parquet.enable.dictionary`, too.
+To find more detailed information about the extra ORC/Parquet options,
+visit the official Apache ORC/Parquet websites.
+
+<div class="codetabs">
+
+<div data-lang="scala" markdown="1">
+{% include_example manual_save_options_orc scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_save_options_orc java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_save_options_orc python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_save_options_orc r/RSparkSQLExample.R %}
+</div>
+
+<div data-lang="sql" markdown="1">
+
+{% highlight sql %}
+CREATE TABLE users_with_options (
+ name STRING,
+ favorite_color STRING,
+ favorite_numbers array<integer>
+) USING ORC
+OPTIONS (
+ orc.bloom.filter.columns 'favorite_color',
+ orc.dictionary.key.threshold '1.0'
+)
+{% endhighlight %}
--- End diff --
Hi, @dbtsai. This is a backport of #22801 without `orc.column.encoding.direct`.
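
For readers who want to see what the documented options look like on the DataFrame API side, here is a minimal PySpark sketch. It is not the `manual_save_options_orc` snippet referenced in the diff: the helper name `write_orc_with_options`, the `demo` function, and the output path are hypothetical, and the input path assumes the `users.orc` sample file from the Spark source tree. It passes the same two options as the SQL example and leaves out `orc.column.encoding.direct`, matching this backport.

```python
# Options taken from the SQL example in the diff; `orc.column.encoding.direct`
# is intentionally omitted, as in this backport.
ORC_WRITE_OPTIONS = {
    "orc.bloom.filter.columns": "favorite_color",
    "orc.dictionary.key.threshold": "1.0",
}


def write_orc_with_options(df, path, options=ORC_WRITE_OPTIONS):
    """Write `df` as ORC, forwarding each extra option to the writer.

    `df` is expected to be a pyspark.sql.DataFrame. The function name and
    the `path` argument are illustrative and not part of the PR.
    """
    writer = df.write.format("orc")
    for key, value in options.items():
        # DataFrameWriter.option returns the writer, so calls can be chained.
        writer = writer.option(key, value)
    writer.save(path)


def demo():
    # Requires a local PySpark installation. The input path is an assumption:
    # it points at the sample file shipped with the Spark examples.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("orc-options-sketch").getOrCreate()
    users = spark.read.orc("examples/src/main/resources/users.orc")
    write_orc_with_options(users, "users_with_options.orc")
    spark.stop()
```

Keeping the options in a plain dictionary makes it easy to compare them against the `OPTIONS (...)` clause of the SQL example; calling `demo()` from the root of a Spark checkout should exercise the write path, assuming the sample file is present.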
---