[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32723:
URL: https://github.com/apache/spark/pull/32723#discussion_r643577724



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -301,23 +306,22 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* Don't create too many partitions in parallel on a large cluster; 
otherwise Spark might crash
* your external database systems.
*
-   * @param url JDBC database url of the form `jdbc:subprotocol:subname`.
+   * You can find the JDBC-specific options for reading table via JDBC in

Review comment:
   Can we change: "JDBC-specific options for reading table" -> 
"JDBC-specific option and parameter documentation for reading tables"?

##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -282,6 +282,10 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* Construct a `DataFrame` representing the database table accessible via 
JDBC URL
* url named table and connection properties.
*
+   * You can find the JDBC-specific options for reading table via JDBC in

Review comment:
   This should be "reading a table" or "reading tables".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32723:
URL: https://github.com/apache/spark/pull/32723#discussion_r643577193



##
File path: docs/sql-data-sources-jdbc.md
##
@@ -39,6 +39,8 @@ following command:
 ./bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars 
postgresql-9.4.1207.jar
 {% endhighlight %}
 
+## Data Source Option
+
 Tables from the remote database can be loaded as a DataFrame or Spark SQL 
temporary view using

Review comment:
   `Tables from the remote database can be ... ction properties for logging 
into the data sources`: this description isn't about data source options. Can you 
change the description to something like:
   
   Spark supports the following case-insensitive options for JDBC. The Data 
source options of JDBC can be set via:
   
   the .option/.options methods of
   ...
   
   For connection properties, users can specify the JDBC connection properties 
in the data source options. `user` and `password` are 
normally provided as connection properties for logging into the data sources.
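
To make the suggested wording concrete, here is a minimal Python sketch of setting JDBC data source options through `.option` and `.options`. The helper name `read_jdbc_table` and its parameters are hypothetical; `spark` is assumed to be an existing SparkSession, and `url`, `dbtable`, `user`, and `password` are used as option names on the assumption that they match the options listed on the JDBC docs page.

```python
def read_jdbc_table(spark, url, table, user, password):
    """Load a remote table as a DataFrame via JDBC data source options.

    `spark` is assumed to be an existing SparkSession (hypothetical helper).
    """
    return (
        spark.read.format("jdbc")
        # Single options are set one at a time with .option(key, value).
        .option("url", url)
        .option("dbtable", table)
        # Connection properties such as user and password can be passed
        # together with .options(**kwargs).
        .options(user=user, password=password)
        .load()
    )
```

A call might look like `read_jdbc_table(spark, "jdbc:postgresql:dbserver", "schema.tablename", "username", "password")`, where every argument value is a placeholder.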







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32723:
URL: https://github.com/apache/spark/pull/32723#discussion_r642803769



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##
@@ -754,6 +746,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) 
{
* or "SERIALIZABLE", corresponding to standard 
transaction
* isolation levels defined by JDBC's Connection 
object, with default
* of "READ_UNCOMMITTED".
+   *
+   *

Review comment:
   Let's remove these empty lines







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32723:
URL: https://github.com/apache/spark/pull/32723#discussion_r642803647



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -301,23 +306,22 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* Don't create too many partitions in parallel on a large cluster; 
otherwise Spark might crash
* your external database systems.
*
-   * @param url JDBC database url of the form `jdbc:subprotocol:subname`.
+   * You can find the JDBC-specific options for reading table via JDBC in
+   * <a href="https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option">
+   *   Data Source Option</a> in the version you use.
+   *
* @param table Name of the table in the external database.
-   * @param columnName the name of a column of numeric, date, or timestamp type
-   *   that will be used for partitioning.
-   * @param lowerBound the minimum value of `columnName` used to decide 
partition stride.
-   * @param upperBound the maximum value of `columnName` used to decide 
partition stride.
-   * @param numPartitions the number of partitions. This, along with 
`lowerBound` (inclusive),
-   *  `upperBound` (exclusive), form partition strides for 
generated WHERE
-   *  clause expressions used to split the column 
`columnName` evenly. When
-   *  the input is less than 1, the number is set to 1.
+   * @param columnName alias of `partitionColumn` option. Refer to 
`partitionColumn` in

Review comment:
   ```suggestion
  * @param columnName Alias of `partitionColumn` option. Refer to 
`partitionColumn` in
   ```
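
For readers following the `lowerBound`, `upperBound`, and `numPartitions` text removed in the hunk above, here is a simplified Python sketch of how such bounds could turn into per-partition WHERE predicates. It is an illustration under assumptions, not Spark's actual implementation: the function name `partition_predicates` is hypothetical, and the open-ended first and last ranges (including the NULL handling) are an assumed way of keeping every row covered.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one WHERE predicate per partition (simplified, hypothetical)."""
    # Per the removed doc text: an input below 1 is treated as 1.
    num_partitions = max(1, num_partitions)
    if num_partitions == 1:
        return [None]  # None stands for "no filter" in this sketch

    # The bounds only decide the stride; they do not filter rows.
    stride = (upper_bound - lower_bound) // num_partitions

    predicates = []
    for i in range(num_partitions):
        start = lower_bound + i * stride
        end = start + stride
        if i == 0:
            # Assumption: first range is open below and also takes NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Assumption: last range is open above.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates
```

With `partition_predicates("id", 0, 100, 4)` the sketch yields `id < 25 OR id IS NULL`, `id >= 25 AND id < 50`, `id >= 50 AND id < 75`, and `id >= 75`. It does not treat the case where the stride rounds down to zero specially.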







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-05-31 Thread GitBox


HyukjinKwon commented on a change in pull request #32723:
URL: https://github.com/apache/spark/pull/32723#discussion_r642788618



##
File path: python/pyspark/sql/readwriter.py
##
@@ -627,8 +627,6 @@ def jdbc(self, url, table, column=None, lowerBound=None, 
upperBound=None, numPar
 
 Parameters
 ----------
-url : str
-a JDBC URL of the form ``jdbc:subprotocol:subname``
 table : str
 the name of the table
 column : str, optional

Review comment:
   I think we can remove `lowerBound`, `upperBound`, and `numPartitions`.
   And, fix the description of `column` to something like:
   
   Alias of `partitionColumn` option. Refer to `partitionColumn` in `Data 
Source Option <...>`_ in the version you use.
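
As a rough illustration of the alias relationship described here, the sketch below maps `jdbc()`-style arguments onto the equivalent data source option names. The helper `partitioning_options` is hypothetical and is not part of PySpark; the option names `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` are assumed to match those on the Data Source Option page.

```python
def partitioning_options(column, lower_bound, upper_bound, num_partitions):
    """Map jdbc()-style arguments to data source option names (hypothetical)."""
    return {
        # `column` in jdbc() is an alias of the `partitionColumn` option.
        "partitionColumn": column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }
```

The resulting dict could then be passed as `spark.read.format("jdbc").options(**partitioning_options("id", 0, 100, 4))`, assuming `spark` is an existing SparkSession and the remaining options such as `url` and `dbtable` are set separately.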



