Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22746#discussion_r226235672
--- Diff: docs/sql-data-sources-load-save-functions.md ---
@@ -0,0 +1,283 @@
+---
+layout: global
+title: Generic Load/Save Functions
+displayTitle: Generic Load/Save Functions
+---
+
+* Table of contents
+{:toc}
+
+
+In the simplest form, the default data source (`parquet` unless otherwise configured by
+`spark.sql.sources.default`) will be used for all operations.
+
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example generic_load_save_functions scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example generic_load_save_functions java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+
+{% include_example generic_load_save_functions python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+
+{% include_example generic_load_save_functions r/RSparkSQLExample.R %}
+
+</div>
+</div>
+
+### Manually Specifying Options
+
+You can also manually specify the data source that will be used along with any extra options
+that you would like to pass to the data source. Data sources are specified by their fully qualified
+name (e.g., `org.apache.spark.sql.parquet`), but for built-in sources you can also use their short
+names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). DataFrames loaded from any data
+source type can be converted into other types using this syntax.
+
+To load a JSON file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options r/RSparkSQLExample.R %}
+</div>
+</div>
+
+To load a CSV file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+</div>
+</div>
+
+### Run SQL on files directly
+
+Instead of using the read API to load a file into a DataFrame and query it, you can also query that
+file directly with SQL.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example direct_sql scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example direct_sql java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example direct_sql python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example direct_sql r/RSparkSQLExample.R %}
+
+</div>
+</div>
+
+### Save Modes
+
+Save operations can optionally take a `SaveMode`, which specifies how to handle existing data if
+present. It is important to realize that these save modes do not use any locking and are not
+atomic. Additionally, when performing an `Overwrite`, the data will be deleted before writing out the
+new data.
+
+<table class="table">
+<tr><th>Scala/Java</th><th>Any Language</th><th>Meaning</th></tr>
+<tr>
+ <td><code>SaveMode.ErrorIfExists</code> (default)</td>
+  <td><code>"error"</code> or <code>"errorifexists"</code> (default)</td>
+ <td>
+ When saving a DataFrame to a data source, if data already exists,
+ an exception is expected to be thrown.
+ </td>
+</tr>
+<tr>
+ <td><code>SaveMode.Append</code></td>
+ <td><code>"append"</code></td>
+ <td>
+ When saving a DataFrame to a data source, if data/table already exists,
+ contents of the DataFrame are expected to be appended to existing data.
+ </td>
+</tr>
+<tr>
+ <td><code>SaveMode.Overwrite</code></td>
+ <td><code>"overwrite"</code></td>
+ <td>
+ Overwrite mode means that when saving a DataFrame to a data source,
+    if data/table already exists, existing data is expected to be overwritten by the contents of
+ the DataFrame.
+ </td>
+</tr>
+<tr>
+ <td><code>SaveMode.Ignore</code></td>
+ <td><code>"ignore"</code></td>
+ <td>
+    Ignore mode means that when saving a DataFrame to a data source, if data already exists,
+    the save operation is expected to not save the contents of the DataFrame and to not
--- End diff --
nit: `expected to not ... to not ...` -> `expected not to ... not to ...`?
---