Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22746#discussion_r226235672
--- Diff: docs/sql-data-sources-load-save-functions.md ---
@@ -0,0 +1,283 @@
+---
+layout: global
+title: Generic Load/Save Functions
+displayTitle: Generic Load/Save Functions
+---
+
+* Table of contents
+{:toc}
+
+
+In the simplest form, the default data source (`parquet` unless otherwise configured by
+`spark.sql.sources.default`) will be used for all operations.
+
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example generic_load_save_functions scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example generic_load_save_functions java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+
+{% include_example generic_load_save_functions python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+
+{% include_example generic_load_save_functions r/RSparkSQLExample.R %}
+
+</div>
+</div>
+
+### Manually Specifying Options
+
+You can also manually specify the data source that will be used along with any extra options
+that you would like to pass to the data source. Data sources are specified by their fully qualified
+name (e.g., `org.apache.spark.sql.parquet`), but for built-in sources you can also use their short
+names (`json`, `parquet`, `jdbc`, `orc`, `libsvm`, `csv`, `text`). DataFrames loaded from any data
+source type can be converted into other types using this syntax.
+
+To load a JSON file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options r/RSparkSQLExample.R %}
+</div>
+</div>
+
+To load a CSV file you can use:
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example manual_load_options_csv scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example manual_load_options_csv java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+</div>
+</div>
+
+### Run SQL on files directly
+
+Instead of using the read API to load a file into a DataFrame and query it, you can also query that
+file directly with SQL.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+{% include_example direct_sql scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+{% include_example direct_sql java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+{% include_example direct_sql python/sql/datasource.py %}
+</div>
+
+<div data-lang="r" markdown="1">
+{% include_example direct_sql r/RSparkSQLExample.R %}
+
+</div>
+</div>
+
+### Save Modes
+
+Save operations can optionally take a `SaveMode`, which specifies how to handle existing data if
+present. It is important to realize that these save modes do not use any locking and are not
+atomic. Additionally, when performing an `Overwrite`, the data will be deleted before writing out the
+new data.
+
+<table class="table">
+<tr><th>Scala/Java</th><th>Any Language</th><th>Meaning</th></tr>
+<tr>
+ <td><code>SaveMode.ErrorIfExists</code> (default)</td>
+  <td><code>"error"</code> or <code>"errorifexists"</code> (default)</td>
+ <td>
+ When saving a DataFrame to a data source, if data already exists,
+ an exception is expected to be thrown.
+ </td>
+</tr>
+<tr>
+ <td><code>SaveMode.Append</code></td>
+ <td><code>"append"</code></td>
+ <td>
+ When saving a DataFrame to a data source, if data/table already exists,
+ contents of the DataFrame are expected to be appended to existing data.
+ </td>
+</tr>
+<tr>
+ <td><code>SaveMode.Overwrite</code></td>
+ <td><code>"overwrite"</code></td>
+ <td>
+ Overwrite mode means that when saving a DataFrame to a data source,
+    if data/table already exists, existing data is expected to be overwritten by the contents of
+ the DataFrame.
+ </td>
+</tr>
+<tr>
+ <td><code>SaveMode.Ignore</code></td>
+ <td><code>"ignore"</code></td>
+ <td>
+    Ignore mode means that when saving a DataFrame to a data source, if data already exists,
+    the save operation is expected to not save the contents of the DataFrame and to not
--- End diff --
nit: `expected to not ... to not ...` -> `expected not to ... not to ...`?
---