Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22746#discussion_r226231876
--- Diff: docs/sql-data-sources-jdbc.md ---
@@ -0,0 +1,223 @@
+---
+layout: global
+title: JDBC To Other Databases
+displayTitle: JDBC To Other Databases
+---
+
+* Table of contents
+{:toc}
+
+Spark SQL also includes a data source that can read data from other databases using JDBC. This
+functionality should be preferred over using [JdbcRDD](api/scala/index.html#org.apache.spark.rdd.JdbcRDD).
+This is because the results are returned
+as a DataFrame and they can easily be processed in Spark SQL or joined with other data sources.
+The JDBC data source is also easier to use from Java or Python as it does not require the user to
+provide a ClassTag.
+(Note that this is different than the Spark SQL JDBC server, which allows other applications to
+run queries using Spark SQL).
+
+To get started you will need to include the JDBC driver for your particular database on the
+spark classpath. For example, to connect to postgres from the Spark Shell you would run the
+following command:
+
+{% highlight bash %}
+bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.4.1207.jar
+{% endhighlight %}
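+
+For example, once the driver is on the classpath, a table can be read over JDBC as follows
+(a minimal sketch; the URL, table name, and credentials below are placeholders):
+
+{% highlight scala %}
+// Read a remote table into a DataFrame over JDBC. The url, dbtable,
+// user, and password values are placeholders for your own settings.
+val jdbcDF = spark.read
+  .format("jdbc")
+  .option("url", "jdbc:postgresql://localhost/test")
+  .option("dbtable", "schema.tablename")
+  .option("user", "username")
+  .option("password", "password")
+  .load()
+{% endhighlight %}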
+
+Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
+the Data Sources API. Users can specify the JDBC connection properties in the data source options.
+<code>user</code> and <code>password</code> are normally provided as connection properties for
+logging into the data sources. In addition to the connection properties, Spark also supports
+the following case-insensitive options:
+
+<table class="table">
+ <tr><th>Property Name</th><th>Meaning</th></tr>
+ <tr>
+ <td><code>url</code></td>
+ <td>
+ The JDBC URL to connect to. The source-specific connection properties may be specified in the URL. e.g., <code>jdbc:postgresql://localhost/test?user=fred&password=secret</code>
+ </td>
+ </tr>
+
+ <tr>
+ <td><code>dbtable</code></td>
+ <td>
+ The JDBC table that should be read from or written into. Note that when using it in the read
+ path anything that is valid in a <code>FROM</code> clause of a SQL query can be used.
+ For example, instead of a full table you could also use a subquery in parentheses. It is not
+ allowed to specify `dbtable` and `query` options at the same time.
+ </td>
+ </tr>
+ <tr>
+ <td><code>query</code></td>
+ <td>
+ A query that will be used to read data into Spark. The specified query will be parenthesized and used
+ as a subquery in the <code>FROM</code> clause. Spark will also assign an alias to the subquery clause.
+ As an example, Spark will issue a query of the following form to the JDBC Source.<br><br>
+ <code> SELECT <columns> FROM (<user_specified_query>) spark_gen_alias</code><br><br>
+ Below are a couple of restrictions when using this option.<br>
+ <ol>
+ <li> It is not allowed to specify `dbtable` and `query` options at the same time. </li>
+ <li> It is not allowed to spcify `query` and `partitionColumn` options at the same time. When specifying
--- End diff ---
`spcify` -> `specify`
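
Unrelated to the typo: since this paragraph describes how the `query` option is wrapped as
`SELECT <columns> FROM (<user_specified_query>) spark_gen_alias`, here is a small usage sketch
for reference (the URL, query, and credentials are made-up examples, not part of this patch):

```scala
// Illustrative only: Spark parenthesizes the given query and reads it
// as an aliased subquery in the FROM clause.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("query", "SELECT c1, c2 FROM t1")
  .option("user", "username")
  .option("password", "password")
  .load()
```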