Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22746#discussion_r226231876
--- Diff: docs/sql-data-sources-jdbc.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: JDBC To Other Databases
+displayTitle: JDBC To Other Databases
+---
+
+* Table of contents
+{:toc}
+
+Spark SQL also includes a data source that can read data from other databases using JDBC. This
+functionality should be preferred over using [JdbcRDD](api/scala/index.html#org.apache.spark.rdd.JdbcRDD).
+This is because the results are returned
+as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources.
+The JDBC data source is also easier to use from Java or Python as it does not require the user to
+provide a ClassTag.
+(Note that this is different than the Spark SQL JDBC server, which allows other applications to
+run queries using Spark SQL).
+
+To get started you will need to include the JDBC driver for your particular database on the
+spark classpath. For example, to connect to postgres from the Spark Shell you would run the
+following command:
+
+{% highlight bash %}
+bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar
+{% endhighlight %}
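+
+For example, once the driver is on the classpath, a table can be read over JDBC as follows
+(a minimal sketch; the URL, table name, and credentials below are placeholders):
+
+{% highlight scala %}
+// Read a remote table into a DataFrame over JDBC. The url, dbtable,
+// user, and password values are placeholders for your own settings.
+val jdbcDF = spark.read
+  .format("jdbc")
+  .option("url", "jdbc:postgresql://localhost/test")
+  .option("dbtable", "schema.tablename")
+  .option("user", "username")
+  .option("password", "password")
+  .load()
+{% endhighlight %}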
+
+Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
+the Data Sources API. Users can specify the JDBC connection properties in the data source options.
+<code>user</code> and <code>password</code> are normally provided as connection properties for
+logging into the data sources. In addition to the connection properties, Spark also supports
+the following case-insensitive options:
+
+<table class="table">
+ <tr><th>Property Name</th><th>Meaning</th></tr>
+ <tr>
+ <td><code>url</code></td>
+ <td>
+ The JDBC URL to connect to. The source-specific connection properties may be specified in the URL. e.g., <code>jdbc:postgresql://localhost/test?user=fred&password=secret</code>
+ </td>
+ </tr>
+
+ <tr>
+ <td><code>dbtable</code></td>
+ <td>
+ The JDBC table that should be read from or written into. Note that when using it in the read
+ path anything that is valid in a <code>FROM</code> clause of a SQL query can be used.
+ For example, instead of a full table you could also use a subquery in parentheses. It is not
+ allowed to specify `dbtable` and `query` options at the same time.
+ </td>
+ </tr>
+ <tr>
+ <td><code>query</code></td>
+ <td>
+ A query that will be used to read data into Spark. The specified query will be parenthesized and used
+ as a subquery in the <code>FROM</code> clause. Spark will also assign an alias to the subquery clause.
+ As an example, Spark will issue a query of the following form to the JDBC Source.<br><br>
+ <code> SELECT <columns> FROM (<user_specified_query>) spark_gen_alias</code><br><br>
+ Below are a couple of restrictions when using this option.<br>
+ <ol>
+ <li> It is not allowed to specify `dbtable` and `query` options at the same time. </li>
+ <li> It is not allowed to spcify `query` and `partitionColumn` options at the same time. When specifying
--- End diff ---
`spcify` -> `specify`
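
Unrelated to the typo: since this paragraph describes how the `query` option is wrapped as
`SELECT <columns> FROM (<user_specified_query>) spark_gen_alias`, here is a small usage sketch
for reference (the URL, query, and credentials are made-up examples, not part of this patch):

```scala
// Illustrative only: Spark parenthesizes the given query and reads it
// as an aliased subquery in the FROM clause.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("query", "SELECT c1, c2 FROM t1")
  .option("user", "username")
  .option("password", "password")
  .load()
```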