subject:"\[GitHub\] spark pull request #21590\: \[SPARK\-24423\]\[SQL\] Add a new option for JDBC sour..."

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198340472
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

aha, ok. Thanks for the kind explanation.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21590


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198315709
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

@maropu Currently we disallow it to be on the safe side. Lets take your 
example. When using the query option to pass on the query, we basically expect 
the users to supply 
```SQL
select c0 p0, c1 p1, c2 p2 from t where c0 > 1
```
In spark , we will parentesize the query and add in an alias to confirm to 
the table subquery syntax. Given the user input the above query, he could 
decide to qualify the partition column names with the table name. So he could 
do the following :

``` SQL
al df = spark.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://localhost:5432/postgres?user=maropu")
  .option("query", "select c0 p0, c1 p1, c2 p2 from t where c0 > 1")
  .option("partitionColumn", "t.p2")  ==> User qualifies the column names.
  .option("lowerBound", "1")
  .option("upperBound", "3")
  .option("numPartitions", "2")
  .load()
```

In this case we will end up generating the query of the following form -
``` SQL
select * from (select c0 p0, c1 p1, c2 p2 from t where c0 > 1) 
__SPARK_GEN_ALIAS where t.p2 >= 1 and t.p2 <=3
```
However this would be an invalid query. In the query option, its possible 
to specify a complex query involving joins. 

Thats the reason, we disallow it to be in safe side. In the dbtable option, 
users are responsible to explicitly specify the alias and would now how to 
qualify the partition columns.

Lets see if we can improve this in future. If you have some ideas, please 
let us know.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198314687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

@dilipbiswal Could you reply @maropu with an example?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198313914
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala
 ---
@@ -86,7 +86,8 @@ class JdbcRelationProvider extends 
CreatableRelationProvider
 
   case SaveMode.ErrorIfExists =>
 throw new AnalysisException(
-  s"Table or view '${options.table}' already exists. SaveMode: 
ErrorIfExists.")
+  s"Table or view '${options.table}' already exists. " +
+s"SaveMode: ErrorIfExists.")
--- End diff --

No change, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198027023
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -174,3 +209,25 @@ object JDBCOptions {
   val JDBC_TXN_ISOLATION_LEVEL = newOption("isolationLevel")
   val JDBC_SESSION_INIT_STATEMENT = newOption("sessionInitStatement")
 }
+
+class JdbcOptionsInWrite(
--- End diff --

That depends on comitter's decisions: @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022351
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
--- End diff --

@maropu Sure.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022370
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+case (Some(name), None) =>
+  if (name.isEmpty) {
+throw new IllegalArgumentException(s"Option '${JDBC_TABLE_NAME}' 
can not be empty.")
--- End diff --

@maropu OK.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022388
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+case (Some(name), None) =>
+  if (name.isEmpty) {
+throw new IllegalArgumentException(s"Option '${JDBC_TABLE_NAME}' 
can not be empty.")
+  } else {
+name.trim
+  }
+case (None, Some(subquery)) =>
+  if (subquery.isEmpty) {
+throw new IllegalArgumentException(s"Option `${JDBC_QUERY_STRING}` 
can not be empty.")
--- End diff --

@maropu OK.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022295
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -174,3 +209,25 @@ object JDBCOptions {
   val JDBC_TXN_ISOLATION_LEVEL = newOption("isolationLevel")
   val JDBC_SESSION_INIT_STATEMENT = newOption("sessionInitStatement")
 }
+
+class JdbcOptionsInWrite(
+@transient override val parameters: CaseInsensitiveMap[String])
+  extends JDBCOptions(parameters) {
+
+  import JDBCOptions._
+
+  def this(parameters: Map[String, String]) = 
this(CaseInsensitiveMap(parameters))
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+this(CaseInsensitiveMap(parameters ++ Map(
+  JDBCOptions.JDBC_URL -> url,
+  JDBCOptions.JDBC_TABLE_NAME -> table)))
+  }
+
+  require(
+parameters.get(JDBC_TABLE_NAME).isDefined,
+s"Option '${JDBCOptions.JDBC_TABLE_NAME}' is required. " +
+  s"Option '${JDBCOptions.JDBC_QUERY_STRING}' is not applicable while 
writing.")
+
+  val destinationTable = parameters(JDBC_TABLE_NAME)
+}
--- End diff --

@maropu I had it as table and refactored it just before i pushed :-). I 
will change it back.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022202
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -150,6 +183,7 @@ class JDBCOptions(
 }
 
 object JDBCOptions {
--- End diff --

@maropu Sure.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022168
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+case (Some(name), None) =>
+  if (name.isEmpty) {
+throw new IllegalArgumentException(s"Option '${JDBC_TABLE_NAME}' 
can not be empty.")
+  } else {
+name.trim
+  }
+case (None, Some(subquery)) =>
+  if (subquery.isEmpty) {
+throw new IllegalArgumentException(s"Option `${JDBC_QUERY_STRING}` 
can not be empty.")
+  } else {
+s"(${subquery}) 
__SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}"
+  }
+  }
 
-  // 
--- End diff --

@maropu OK.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-26 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198022141
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -174,3 +209,25 @@ object JDBCOptions {
   val JDBC_TXN_ISOLATION_LEVEL = newOption("isolationLevel")
   val JDBC_SESSION_INIT_STATEMENT = newOption("sessionInitStatement")
 }
+
+class JdbcOptionsInWrite(
--- End diff --

@maropu Can i take this on as a follow-up ? The reason is i am not fully 
familiar with all the options. I need to study those a bit more before i 
refactor them. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198021171
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+case (Some(name), None) =>
+  if (name.isEmpty) {
+throw new IllegalArgumentException(s"Option '${JDBC_TABLE_NAME}' 
can not be empty.")
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198021225
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+case (Some(name), None) =>
+  if (name.isEmpty) {
+throw new IllegalArgumentException(s"Option '${JDBC_TABLE_NAME}' 
can not be empty.")
+  } else {
+name.trim
+  }
+case (None, Some(subquery)) =>
+  if (subquery.isEmpty) {
+throw new IllegalArgumentException(s"Option `${JDBC_QUERY_STRING}` 
can not be empty.")
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198021157
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
--- End diff --

nit: remove braces: '{' and '}'


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198019979
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -174,3 +209,25 @@ object JDBCOptions {
   val JDBC_TXN_ISOLATION_LEVEL = newOption("isolationLevel")
   val JDBC_SESSION_INIT_STATEMENT = newOption("sessionInitStatement")
 }
+
+class JdbcOptionsInWrite(
+@transient override val parameters: CaseInsensitiveMap[String])
+  extends JDBCOptions(parameters) {
+
+  import JDBCOptions._
+
+  def this(parameters: Map[String, String]) = 
this(CaseInsensitiveMap(parameters))
+
+  def this(url: String, table: String, parameters: Map[String, String]) = {
+this(CaseInsensitiveMap(parameters ++ Map(
+  JDBCOptions.JDBC_URL -> url,
+  JDBCOptions.JDBC_TABLE_NAME -> table)))
+  }
+
+  require(
+parameters.get(JDBC_TABLE_NAME).isDefined,
+s"Option '${JDBCOptions.JDBC_TABLE_NAME}' is required. " +
+  s"Option '${JDBCOptions.JDBC_QUERY_STRING}' is not applicable while 
writing.")
+
+  val destinationTable = parameters(JDBC_TABLE_NAME)
+}
--- End diff --

Is it bad to change `destinationTable` to `table` for simplicity? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198019710
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -150,6 +183,7 @@ class JDBCOptions(
 }
 
 object JDBCOptions {
--- End diff --

Move `JDBCOptions` in the end of this file


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198018682
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,32 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  // table name or a table subquery.
+  val tableOrQuery = (parameters.get(JDBC_TABLE_NAME), 
parameters.get(JDBC_QUERY_STRING)) match {
+case (Some(name), Some(subquery)) =>
+  throw new IllegalArgumentException(
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified at the same time."
+  )
+case (None, None) =>
+  throw new IllegalArgumentException(
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+case (Some(name), None) =>
+  if (name.isEmpty) {
+throw new IllegalArgumentException(s"Option '${JDBC_TABLE_NAME}' 
can not be empty.")
+  } else {
+name.trim
+  }
+case (None, Some(subquery)) =>
+  if (subquery.isEmpty) {
+throw new IllegalArgumentException(s"Option `${JDBC_QUERY_STRING}` 
can not be empty.")
+  } else {
+s"(${subquery}) 
__SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}"
+  }
+  }
 
-  // 
--- End diff --

Don't need to remove this


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-25 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r198018360
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -174,3 +209,25 @@ object JDBCOptions {
   val JDBC_TXN_ISOLATION_LEVEL = newOption("isolationLevel")
   val JDBC_SESSION_INIT_STATEMENT = newOption("sessionInitStatement")
 }
+
+class JdbcOptionsInWrite(
--- End diff --

How about adding `JdbcOptionsInRead` for read-only options?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-24 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197631699
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

I was thinking the case below (this can be accepted in `dbtable` and 
`query` is not?):
```

postgres=# select * from t;
 c0 | c1 | c2
++
  1 |  1 |  1
  2 |  2 |  2
  3 |  3 |  3
(3 rows)

scala> :paste
val df = spark.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://localhost:5432/postgres?user=maropu")
  .option("dbtable", "(select c0 p0, c1 p1, c2 p2 from t where c0 > 1) t")
  .option("partitionColumn", "p2")
  .option("lowerBound", "1")
  .option("upperBound", "3")
  .option("numPartitions", "2")
  .load()

scala> df.show
+---+---+---+   

| p0| p1| p2|
+---+---+---+
|  2|  2|  2|
|  3|  3|  3|
+---+---+---+
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620329
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

Another option is to follow what we are doing in another PR: 
https://github.com/apache/spark/pull/21247 ? We are facing the same issue 
there. The options are shared by both read and write paths. However, the 
limitations are different. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620436
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
--- End diff --

Using a tuple match here?
```
(tableName, query) match {
  case ...
}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620455
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
--- End diff --

Great! Please add them to the doc as I mentioned above?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620140
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the 
datasource.
+   SELECT columns FROM (user_specified_query) 
spark_gen_alias
+  Its not allowed to specify `dbtable` and `query` options at the same 
time.
--- End diff --

Also document the limitation of `query` when having `partitionColumn` and 
the workaround?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620099
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the 
datasource.
--- End diff --

`datasource` -> `JDBC source`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620021
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses. Its not
--- End diff --

Nit: `It is`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620115
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the 
datasource.
+   SELECT columns FROM (user_specified_query) 
spark_gen_alias
+  Its not allowed to specify `dbtable` and `query` options at the same 
time.
--- End diff --

`Its` -> `it is`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620496
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala
 ---
@@ -58,6 +58,10 @@ class JdbcRelationProvider extends 
CreatableRelationProvider
   parameters: Map[String, String],
   df: DataFrame): BaseRelation = {
 val options = new JDBCOptions(parameters)
+require(
+  options.tableName.isDefined,
+  s"Option '${JDBCOptions.JDBC_TABLE_NAME}' is required. " +
+s"Option '${JDBCOptions.JDBC_QUERY_STRING}' is not applicable 
while writing.")
--- End diff --

Let us create a `JDBCOptionsInWrite`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620483
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

@maropu The new option `query` is just a syntactic sugar for simplifying 
the work from many basic JDBC users. We can improve it in the future. For 
example, parsing the user-specified query and make all the other options work. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620091
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
--- End diff --

`a alias` -> `an alias` 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620340
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
--- End diff --

-> `can not be specified at the same time`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620384
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subQuery if subQuery.nonEmpty => s"(${subQuery}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

`__SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}__`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-22 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197356603
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

yea, it's okay to depend on xiao's opnion.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-22 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197355161
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

@maropu Thank you for taking the time to think about this throughly. A 
couple of questions/comments.
1) Looks like for read path we give precedence to dbtable over query. I 
feel its good to explicitly disallow this with a clear message in case of an 
ambiguity.
2) Usage of lazy here (especially to trigger errors) makes me a little 
nervous. Like if we want to introduce a debug statement to print the variables 
in side the QueryOptions class, things will not work any more, right ? Thats 
the reason, i had opted to check for the "invalid query option in write path" 
in the write function itself (i.e when i am sure of the calling context). 
Perhaps that how its used every where in which case it may be okay to follow 
the same approach here. 

I am okay with this. Lets get some opinion from @gatorsmile. Once i have 
the final set of comments, i will make the changes. Thanks again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-22 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197349327
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

I think, since the `tableName` and `query` variables don't need to be 
exposed to other classes, can we remove them?

Btw, I feel sharing the `tableName` variable int both write/read paths 
makes code some complicated, so how about splitting the variable into two part: 
`tableOrQuery` for reading and `outputName` for writing?

e.g., 
https://github.com/apache/spark/commit/d62372ab0e855c359122609f1805ce83661d510e


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-21 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197347130
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subQuery if subQuery.nonEmpty => s"(${subQuery}) 
spark_gen_${curId.getAndIncrement()}"
+  case subQuery => subQuery
+}
+  }
+
+  require(tableOrQuery.nonEmpty,
+s"Empty string is not allowed in either '$JDBC_TABLE_NAME' or 
'${JDBC_QUERY_STRING}' options"
+  )
+
 
-  // 
--- End diff --

nit: revert this line


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196975167
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,18 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the 
datasource.
+   SELECT columns FROM (user_specified_query) 
spark_generated_alias
--- End diff --

@viirya OK, i will add this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196897983
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
+  case subq => subq
+}
+  }
+
+  require(tableExpression.nonEmpty,
--- End diff --

@gengliangwang I see your point. Does this read better to you ?

```
require(tableOrQuery.nonEmpty,
s"Empty string is not allowed in either '$JDBC_TABLE_NAME' or 
'${JDBC_QUERY_STRING}' options"
  )
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196895138
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
--- End diff --

@gengliangwang Thanks.. Actually i had tried a couple of different ways. 
Some how i found this a little hard to follow when i embed the error message. I 
like to check things upfront along with comments on top easy to follow. But if 
others find this easy to follow as well, then i will change.
```SQL
val tableExpression = if (parameters.isDefinedAt(JDBC_TABLE_NAME)) {
require(!parameters.isDefinedAt(JDBC_QUERY_STRING),
  s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
)
parameters.get(JDBC_TABLE_NAME).get.trim
  } else {
require(parameters.isDefinedAt(JDBC_QUERY_STRING),
  s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
)
s"(${parameters.get(JDBC_QUERY_STRING)}) ${curId.getAndIncrement()}"
  }
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196847962
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
--- End diff --

@viirya ok. i will change. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196848042
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

@viirya Will change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196847882
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,18 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
--- End diff --

@viirya I think its better to let users know how we generate the from 
clause. That way they can choose to qualify the partition columns if needed. 
However, if you strongly feel otherwise, i will remove from doc.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196846549
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

@maropu Yeah. we need an alias. Systems like postgress require a mandatory 
table subquery alias.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196788131
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
+  case subq => subq
--- End diff --

Ok. I saw you forbidding it later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196787410
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

nit: `subq` -> `subQuery`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196787193
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
--- End diff --

`expression` sounds a bit confusing. `tableOrQuery`? `tableSource`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196785859
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,18 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the 
datasource.
+   SELECT columns FROM (user_specified_query) 
spark_generated_alias
--- End diff --

We can mention `dbtable` and `query` can't be specified at the same time.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196786172
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,18 @@ the following case-insensitive options:
   
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid 
in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you 
could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that 
when using it in the read
+  path anything that is valid in a FROM clause of a SQL 
query can be used.
+  For example, instead of a full table you could also use a subquery 
in parentheses.
+
+  
+  
+query
+
+  A query that will be used to read data into Spark. The specified 
query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also 
assign a alias to the subquery clause.
--- End diff --

Is it necessary to mention this query alias? Meaningful to end users?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196783966
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
+  case subq => subq
--- End diff --

hmm, an empty subquery?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196732502
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

yea, simpler, better. Btw, we essentially need the alias name for a query? 
When we currently describe  a query in `dbtable`, it seems we don't need the 
name? 
https://github.com/apache/spark/blob/bc111463a766a5619966a282fbe0fec991088ceb/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala#L1196


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196658307
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
--- End diff --

Personally I prefer:
```
  val tableExpression = if (parameters.isDefinedAt(JDBC_TABLE_NAME)) {
require(!parameters.isDefinedAt(JDBC_QUERY_STRING), "...")
parameters.get(JDBC_TABLE_NAME).get.trim
  } else {
require(parameters.isDefinedAt(JDBC_QUERY_STRING), "...")
s"(${parameters.get(JDBC_QUERY_STRING)}) ${curId.getAndIncrement()}"
  }
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-20 Thread gengliangwang

Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196657947
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
+  case subq => subq
+}
+  }
+
+  require(tableExpression.nonEmpty,
--- End diff --

The error check and error message here are confusing. It seems telling user 
that the two options can be both specified.
Maybe we should just check the defined one and improve the error message.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196634511
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -1206,4 +1207,92 @@ class JDBCSuite extends SparkFunSuite
 }.getMessage
 assert(errMsg.contains("Statement was canceled or the session timed 
out"))
   }
+
+  test("query JDBC option - negative tests") {
+val query = "SELECT * FROM  test.people WHERE theid = 1"
+// load path
+val e1 = intercept[RuntimeException] {
+  val df = spark.read.format("jdbc")
+.option("Url", urlWithUserAndPass)
+.option("query", query)
--- End diff --

@HyukjinKwon Thanks.. I will update the doc.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196623246
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -1206,4 +1207,92 @@ class JDBCSuite extends SparkFunSuite
 }.getMessage
 assert(errMsg.contains("Statement was canceled or the session timed 
out"))
   }
+
+  test("query JDBC option - negative tests") {
+val query = "SELECT * FROM  test.people WHERE theid = 1"
+// load path
+val e1 = intercept[RuntimeException] {
+  val df = spark.read.format("jdbc")
+.option("Url", urlWithUserAndPass)
+.option("query", query)
--- End diff --

shall we update 
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#jdbc-to-other-databases
 too?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196490214
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

@maropu Don't mind using a constant name ? "spark_gen_alias" ?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196489749
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

@maropu So since the we auto generate a subquery alias here for easy of 
use, we r disallowing the query option together with partition columns. As 
users wouldn't know how to qualify the partition columns given the suquery 
alias is generated implicitly. In this case, we ask them to use the existing 
dbtable to specify the query where they are in control to specify the alias 
themselves. Another option i considered is to introduce "queryAlias" as another 
option. But thought to avoid it for simplicity.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196487627
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
--- End diff --

@maropu These two requires are using tableName and query which is computed 
in lines before. Thats why i have placed these two requires after. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196407423
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
+
+  // table name or a table expression.
+  val tableExpression = tableName.map(_.trim).getOrElse {
+// We have ensured in the code above that either dbtable or query is 
specified.
+query.get match {
+  case subq if subq.nonEmpty => s"(${subq}) 
spark_gen_${curId.getAndIncrement()}"
--- End diff --

We need `curId`? A constant name is bad?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196406329
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -109,6 +134,20 @@ class JDBCOptions(
 s"When reading JDBC data sources, users need to specify all or none 
for the following " +
   s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', 
'$JDBC_UPPER_BOUND', " +
   s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+s"""
+   |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not 
be specified together.
+   |Please define the query using `$JDBC_TABLE_NAME` option instead 
and make sure to qualify
+   |the partition columns using the supplied subquery alias to resolve 
any ambiguity.
+   |Example :
+   |spark.read.format("jdbc")
+   |.option("dbtable", "(select c1, c2 from t1) as subq")
+   |.option("partitionColumn", "subq.c1"
+   |.load()
+ """.stripMargin
+  )
--- End diff --

Can't we define these two parameters simultaneously? How about the case 
where columns in query output have a partition column? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196403496
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+!(tableName.isDefined && query.isDefined),
+s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be 
specified."
+  )
--- End diff --

Can you put these `require` in the head of this section (line 68)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r196402958
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is 
required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option 
'$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are 
conflicting in nature.
+  require(
+tableName.isDefined || query.isDefined,
+s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
--- End diff --

Probably, it'd better to put these `require`s in the `Required parameters` 
section.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread dilipbiswal

GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/21590

[SPARK-24423][SQL] Add a new option  for JDBC sources

## What changes were proposed in this pull request?
Here is the description in the JIRA -

Currently, our JDBC connector provides the option `dbtable` for users to 
specify the to-be-loaded JDBC source table. 

 ```SQL
 val jdbcDf = spark.read
   .format("jdbc")
   .option("*dbtable*", "dbName.tableName")
   .options(jdbcCredentials: Map)
   .load()
 ```

Normally, users do not fetch the whole JDBC table due to the poor 
performance/throughput of JDBC. Thus, they normally just fetch a small set of 
tables. For advanced users, they can pass a subquery as the option.   

 ```SQL
 val query = """ (select * from tableName limit 10) as tmp """
 val jdbcDf = spark.read
   .format("jdbc")
   .option("*dbtable*", query)
   .options(jdbcCredentials: Map)
   .load()
 ```
However, this is straightforward to end users. We should simply allow users 
to specify the query by a new option `query`. We will handle the complexity for 
them. 

 ```SQL
 val query = """select * from tableName limit 10"""
 val jdbcDf = spark.read
   .format("jdbc")
   .option("*{color:#ff}query{color}*", query)
   .options(jdbcCredentials: Map)
   .load()
```

## How was this patch tested?
Added tests in JDBCSuite and JDBCWriterSuite.
Also tested against MySQL, Postgress, Oracle, DB2 (using docker 
infrastructure) to make sure there are no syntax issues.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark SPARK-24423

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21590






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

63 matches

Mail list logo