[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15292


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82541651
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -17,47 +17,132 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
     @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+    this(parameters ++ Map(
+      JDBCOptions.JDBC_URL -> url,
+      JDBCOptions.JDBC_TABLE_NAME -> table))
+  }
+
+  val asConnectionProperties: Properties = {
+    val properties = new Properties()
+    // We should avoid to pass the options into properties. See SPARK-17776.
+    parameters.filterKeys(!jdbcOptionNames.contains(_))
+      .foreach { case (k, v) => properties.setProperty(k, v) }
+    properties
+  }
+
   // ------------------------------------------------------------
   // Required parameters
   // ------------------------------------------------------------
-  require(parameters.isDefinedAt("url"), "Option 'url' is required.")
-  require(parameters.isDefinedAt("dbtable"), "Option 'dbtable' is required.")
+  require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
+  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
-  val url = parameters("url")
+  val url = parameters(JDBC_URL)
   // name of table
-  val table = parameters("dbtable")
+  val table = parameters(JDBC_TABLE_NAME)
+
+  // ------------------------------------------------------------
+  // Optional parameters
+  // ------------------------------------------------------------
+  val driverClass = {
+    val userSpecifiedDriverClass = parameters.get(JDBC_DRIVER_CLASS)
+    userSpecifiedDriverClass.foreach(DriverRegistry.register)
+
+    // Performing this part of the logic on the driver guards against the corner-case where the
+    // driver returned for a URL is different on the driver and executors due to classpath
+    // differences.
+    userSpecifiedDriverClass.getOrElse {
+      DriverManager.getDriver(url).getClass.getCanonicalName
+    }
+  }
 
   // ------------------------------------------------------------
-  // Optional parameter list
+  // Optional parameters only for reading
   // ------------------------------------------------------------
   // the column used to partition
-  val partitionColumn = parameters.getOrElse("partitionColumn", null)
+  val partitionColumn = parameters.getOrElse(JDBC_PARTITION_COLUMN, null)
   // the lower bound of partition column
-  val lowerBound = parameters.getOrElse("lowerBound", null)
+  val lowerBound = parameters.getOrElse(JDBC_LOWER_BOUND, null)
   // the upper bound of the partition column
-  val upperBound = parameters.getOrElse("upperBound", null)
+  val upperBound = parameters.getOrElse(JDBC_UPPER_BOUND, null)
   // the number of partitions
-  val numPartitions = parameters.getOrElse("numPartitions", null)
-
+  val numPartitions = parameters.getOrElse(JDBC_NUM_PARTITIONS, null)
   require(partitionColumn == null ||
     (lowerBound != null && upperBound != null && numPartitions != null),
-    "If 'partitionColumn' is specified then 'lowerBound', 'upperBound'," +
-      " and 'numPartitions' are required.")
+    s"If '$JDBC_PARTITION_COLUMN' is specified then '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND'," +
+      s" and '$JDBC_NUM_PARTITIONS' are required.")
+  val fetchSize = {
+    val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt
+    require(size >= 0,
+      s"Invalid value `${size.toString}` for parameter " +
+        s"`$JDBC_BATCH_FETCH_SIZE`. The minimum value is 0. When the value is 0, " +
+        "the JDBC driver ignores the value and does the estimates.")
+    size
+  }
 
   // ------------------------------------------------------------
-  // The options for DataFrameWriter
+  // Optional parameters only for writing
   // ------------------------------------------------------------
   // if to truncate the table from the JDBC database
-  val isTruncate = parameters.getOrElse("truncate", "false").toBoolean
+  val isTruncate = parameters.getOrElse(JDBC_TRUNCATE, "false").toBoolean
   // the create table option , which can be table_options or partition_options.
   // E.g., "CREATE TABLE t (name string) 

[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82538242
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
(quotes the same JDBCOptions.scala hunk shown in the previous message)

[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82537727
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
(quotes the same JDBCOptions.scala hunk shown above)

[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82524289
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -229,13 +229,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       table: String,
       parts: Array[Partition],
       connectionProperties: Properties): DataFrame = {
-    val props = new Properties()
-    extraOptions.foreach { case (key, value) =>
-      props.put(key, value)
-    }
-    // connectionProperties should override settings in extraOptions
--- End diff --

I will put this back just to be safe.
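
For clarity, a small self-contained sketch (not the actual DataFrameReader code) of the override order being discussed: options set on the reader are applied first, then the explicit connectionProperties win on conflicts.

import java.util.Properties

object PropertiesMergeSketch {
  // Hypothetical stand-ins for DataFrameReader's extraOptions and the
  // user-supplied connectionProperties argument.
  def mergedProperties(
      extraOptions: Map[String, String],
      connectionProperties: Properties): Properties = {
    val props = new Properties()
    extraOptions.foreach { case (key, value) => props.put(key, value) }
    // connectionProperties should override settings in extraOptions
    props.putAll(connectionProperties)
    props
  }

  def main(args: Array[String]): Unit = {
    val extra = Map("fetchsize" -> "100", "user" -> "fred")
    val conn = new Properties()
    conn.setProperty("user", "alice")
    // "user" -> "alice" wins; "fetchsize" -> "100" is kept.
    println(mergedProperties(extra, conn))
  }
}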





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82515285
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -229,13 +229,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
       table: String,
       parts: Array[Partition],
       connectionProperties: Properties): DataFrame = {
-    val props = new Properties()
-    extraOptions.foreach { case (key, value) =>
-      props.put(key, value)
-    }
-    // connectionProperties should override settings in extraOptions
--- End diff --

should we still keep this comment?





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82339188
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -17,47 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
     @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+    this(parameters ++ Map("url" -> url, "dbtable" -> table))
+  }
+
+  val asProperties: Properties = {
--- End diff --

I hear you. Let me fix then.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82333056
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -17,47 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
     @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+    this(parameters ++ Map("url" -> url, "dbtable" -> table))
+  }
+
+  val asProperties: Properties = {
--- End diff --

Sounds good, but it seems both are fine since the actual class name is `Properties`; I guess the intended meaning is connection properties, though. Maybe this one depends on personal taste.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82332285
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -17,47 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
     @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+    this(parameters ++ Map("url" -> url, "dbtable" -> table))
+  }
+
+  val asProperties: Properties = {
--- End diff --

I think the function name needs an update. How about 
`asConnectionProperties`?





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82330973
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -17,47 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
+import java.sql.{Connection, DriverManager}
+import java.util.Properties
+
 /**
  * Options for the JDBC data source.
  */
 class JDBCOptions(
     @transient private val parameters: Map[String, String])
   extends Serializable {
 
+  import JDBCOptions._
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+    this(parameters ++ Map("url" -> url, "dbtable" -> table))
--- End diff --

Change them to `JDBC_URL` and `JDBC_TABLE_NAME`?
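
For illustration, a sketch of how such option-name constants might be grouped in a `JDBCOptions` companion object. Only the two names (`url`, `dbtable`) come from the diff above; the registration helper is an assumption, not necessarily what was merged.

object JDBCOptions {
  // Names of options that are Spark-specific (and therefore should not be
  // forwarded to the JDBC driver as connection properties).
  private val jdbcOptionNames = scala.collection.mutable.ArrayBuffer.empty[String]

  private def newOption(name: String): String = {
    jdbcOptionNames += name
    name
  }

  val JDBC_URL = newOption("url")
  val JDBC_TABLE_NAME = newOption("dbtable")
}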





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329856
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 <table class="table">
   <tr><th>Property Name</th><th>Meaning</th></tr>
   <tr>
     <td><code>url</code></td>
     <td>
-      The JDBC URL to connect to.
+      The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
--- End diff --

Sure, that sounds cleaner and more correct.
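
As a hedged illustration of the point (assuming a SparkSession in scope as `spark`, e.g. in spark-shell; host and table names are made up), credentials can go either into the URL or into separate options:

// Credentials embedded in the JDBC URL.
val viaUrl = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test?user=fred&password=secret")
  .option("dbtable", "people")
  .load()

// Equivalent: credentials passed as separate options.
val viaOptions = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("dbtable", "people")
  .option("user", "fred")
  .option("password", "secret")
  .load()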





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329644
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 <table class="table">
   <tr><th>Property Name</th><th>Meaning</th></tr>
   <tr>
     <td><code>url</code></td>
     <td>
-      The JDBC URL to connect to.
+      The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
     </td>
   </tr>
+
+  <tr>
+    <td><code>user</code></td>
+    <td>
+      The user to connect as.
--- End diff --

Sorry, after rethinking it, I think we do not need `user` and `password` 
here.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329571
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 <table class="table">
   <tr><th>Property Name</th><th>Meaning</th></tr>
   <tr>
     <td><code>url</code></td>
     <td>
-      The JDBC URL to connect to.
+      The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
--- End diff --

How about this change?

_The source-specific connection properties may be specified in the URL. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret_





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82329341
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 <table class="table">
   <tr><th>Property Name</th><th>Meaning</th></tr>
   <tr>
     <td><code>url</code></td>
     <td>
-      The JDBC URL to connect to.
+      The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
--- End diff --

Actually, this is not accurate. Let me think about it.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r82306624
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1014,16 +1014,31 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.
 {% endhighlight %}
 
 Tables from the remote database can be loaded as a DataFrame or Spark SQL Temporary table using
-the Data Sources API. The following options are supported:
+the Data Sources API. The following case-sensitive options are supported:
 
 <table class="table">
   <tr><th>Property Name</th><th>Meaning</th></tr>
   <tr>
     <td><code>url</code></td>
     <td>
-      The JDBC URL to connect to.
+      The JDBC URL to connect to. It might contain user and password information. e.g., jdbc:postgresql://localhost/test?user=fred&password=secret
     </td>
   </tr>
+
+  <tr>
+    <td><code>user</code></td>
+    <td>
+      The user to connect as.
+    </td>
+  </tr>
+
+  <tr>
+    <td><code>password</code></td>
+    <td>
+      The password to use when connecting.
--- End diff --

`the password for logging into the database `





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-04 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r81868131
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -35,7 +50,12 @@ class JDBCOptions(
   val table = parameters("dbtable")
 
   // ------------------------------------------------------------
-  // Optional parameter list
+  // Optional parameters
+  // ------------------------------------------------------------
+  val driver = parameters.get(JDBCOptions.JDBC_DRIVER_CLASS)
--- End diff --

Oh, right. Yeap, I will check and try to address the comments below too.
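
For reference, a hedged example of supplying the driver class explicitly through the `driver` option (assuming a SparkSession in scope as `spark`; the PostgreSQL class name and table are just examples):

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("dbtable", "people")
  .option("driver", "org.postgresql.Driver")  // explicit JDBC driver class
  .load()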





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r81804213
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1024,6 +1024,7 @@ the Data Sources API. The following options are supported:
       The JDBC URL to connect to.
--- End diff --

Could you also improve this? You know, `url` might contain `user` and 
`password`, if users do not provide `user` and `password` in the option list. 

BTW, I think we also need to document `user` and `password` here.





[GitHub] spark pull request #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up ...

2016-10-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15292#discussion_r81802372
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1048,28 +1049,42 @@ the Data Sources API. The following options are supported:
       <code>partitionColumn</code> must be a numeric column from the table in question. Notice
       that <code>lowerBound</code> and <code>upperBound</code> are just used to decide the
       partition stride, not for filtering the rows in table. So all rows in the table will be
-      partitioned and returned.
+      partitioned and returned. This option applies only to reading.
     </td>
   </tr>
 
   <tr>
     <td><code>fetchsize</code></td>
     <td>
-      The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows).
+      The JDBC fetch size, which determines how many rows to fetch per round trip. This can help performance on JDBC drivers which default to low fetch size (eg. Oracle with 10 rows). This option applies only to reading.
     </td>
   </tr>
-  
+
+  <tr>
+    <td><code>batchsize</code></td>
+    <td>
+      The JDBC batch size, which determines how many rows to insert per round trip. This can help performance on JDBC drivers. This option applies only to writing.
--- End diff --

Could you also add the default size here?
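
For context, a hedged usage sketch of the read-only vs write-only options being documented (assuming a SparkSession in scope as `spark`; URL, table, and bounds are made up):

// Reading: partition options and fetchsize apply only here.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("dbtable", "people")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "100000")
  .option("numPartitions", "10")
  .option("fetchsize", "1000")
  .load()

// Writing: batchsize applies only here.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/test")
  .option("dbtable", "people_copy")
  .option("batchsize", "1000")
  .save()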

