[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22461


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-28 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r221411618
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).

  * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
  * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the data source options for special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable this flag to avoid Oracle DATE columns being resolved as timestamps.
--- End diff --

@maropu @gatorsmile Could you please suggest something else to do here?
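
For readers following the thread, a minimal self-contained sketch of the kind of read the proposed doc text describes. The URL, schema, table, and credentials below are hypothetical placeholders, not values from the PR:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("OracleDateExample").getOrCreate()

// Placeholder connection details -- substitute your own Oracle host, service,
// credentials, and table.
val oracleJdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"

val df = spark.read
  .format("jdbc")
  .option("url", oracleJdbcUrl)
  .option("dbtable", "MY_SCHEMA.MY_TABLE")
  .option("user", "scott")
  .option("password", "tiger")
  // Vendor-specific property, passed straight through to the Oracle JDBC driver.
  // Without it, Oracle DATE columns are resolved as Catalyst TimestampType.
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  .load()

df.printSchema() // DATE columns should now resolve to DateType
```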


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219546151
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).

  * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
  * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the data source options for special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable this flag to avoid Oracle DATE columns being resolved as timestamps.
--- End diff --

This looks fine to me. 

@maropu Your idea is great! We need more examples in the troubleshooting section. Currently, our documentation for Spark SQL and Core needs a major update. Maybe we can do it after we finish the reorg of the documentation: https://issues.apache.org/jira/browse/SPARK-24499?


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219542061
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

I see. We still have a date column in the input dataframe. 
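
For context, a sketch of the kind of partitioned read the test above performs, assuming a `spark` session and `oracleJdbcUrl` as in the earlier sketch. The table and partition column names here are illustrative guesses, not copied from the suite:

```scala
// Partitioned JDBC read bounded on a TIMESTAMP column (names illustrative only).
val df = spark.read
  .format("jdbc")
  .option("url", oracleJdbcUrl)                 // placeholder URL
  .option("dbtable", "datetime_partition_test") // hypothetical table
  .option("partitionColumn", "t")               // hypothetical TIMESTAMP column
  .option("lowerBound", "2018-07-04 03:30:00.0")
  .option("upperBound", "2018-07-27 14:11:05.0")
  .option("numPartitions", 2)
  // Needed because the table also has a DATE column `d`; without this flag the
  // Oracle driver maps DATE to java.sql.Timestamp and Spark infers TimestampType.
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  .load()
```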


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219540966
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1489,7 +1489,7 @@ See the [Apache Avro Data Source Guide](avro-data-source-guide.html).

  * The JDBC driver class must be visible to the primordial class loader on the client session and on all executors. This is because Java's DriverManager class does a security check that results in it ignoring all drivers not visible to the primordial class loader when one goes to open a connection. One convenient way to do this is to modify compute_classpath.sh on all worker nodes to include your driver JARs.
  * Some databases, such as H2, convert all names to upper case. You'll need to use upper case to refer to those names in Spark SQL.
-
+ * Users can specify vendor-specific JDBC connection properties in the data source options for special treatment. For example, `spark.read.format("jdbc").option("url", oracleJdbcUrl).option("oracle.jdbc.mapDateToTimestamp", "false")`. `oracle.jdbc.mapDateToTimestamp` defaults to true; users often need to disable this flag to avoid Oracle DATE columns being resolved as timestamps.
--- End diff --

Ur.. I see. This section is for troubleshooting, so I think it would be better to illustrate a concrete trouble case and then describe a solution for it.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219539062
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.

 Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in the data source options.
 `user` and `password` are normally provided as connection properties for
-logging into the data sources. In addition to the connection properties, Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

I have moved this description to the `Troubleshooting` section and tried to brush up the wording. Writing good documentation is sometimes more difficult than writing code. I really need your help :)


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219537942
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
+  // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail the test result
+  // comparison. E.g. expected 2018-07-06 while actual is 2018-07-06 00:00:00.0.
--- End diff --

OK, I will remove the duplicate description.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219522094
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.

 Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in the data source options.
 `user` and `password` are normally provided as connection properties for
-logging into the data sources. In addition to the connection properties, Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

Also, it would be better to brush up the description so that users can understand it more easily...


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219520789
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1287,8 +1287,18 @@ bin/spark-shell --driver-class-path postgresql-9.4.1207.jar --jars postgresql-9.

 Tables from the remote database can be loaded as a DataFrame or Spark SQL temporary view using
 the Data Sources API. Users can specify the JDBC connection properties in the data source options.
 `user` and `password` are normally provided as connection properties for
-logging into the data sources. In addition to the connection properties, Spark also supports
-the following case-insensitive options:
+logging into the data sources. Vendor-specific connection properties can also be passed to the
+underlying JDBC driver in the same way. For example:
+
+{% highlight scala %}
+// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, a column of Oracle
+// DATE type will be resolved as Catalyst TimestampType, which is probably not the desired behavior.
+spark.read.format("jdbc")
+  .option("url", oracleJdbcUrl)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+{% endhighlight %}
+
--- End diff --

Probably it's better to put this description in the `Troubleshooting` section: https://spark.apache.org/docs/2.3.1/sql-programming-guide.html#troubleshooting


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219518705
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
+  // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail the test result
+  // comparison. E.g. expected 2018-07-06 while actual is 2018-07-06 00:00:00.0.
--- End diff --

I think we don't need this duplicate description.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219518607
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +468,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
+  // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail the test result
+  // comparison. E.g. expected 2018-07-06 while actual is 2018-07-06 00:00:00.0.
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
+  .option("sessionInitStatement",
+"ALTER SESSION SET NLS_TIMESTAMP_FORMAT = '-MM-DD 
HH24:MI:SS.FF'")
--- End diff --

I think we don't need this duplicate description.
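
For context, `sessionInitStatement` is a standard Spark JDBC option (since Spark 2.3) that executes a custom SQL statement on each newly opened database session before data is read. A minimal sketch, assuming a `spark` session and `oracleJdbcUrl` as in the earlier sketch, with a hypothetical table name:

```scala
val df = spark.read
  .format("jdbc")
  .option("url", oracleJdbcUrl)        // placeholder URL
  .option("dbtable", "datetime_test")  // hypothetical table
  // Runs once per opened connection; pins the session's timestamp format so
  // string timestamp literals parse predictably regardless of NLS defaults.
  .option("sessionInitStatement",
    "ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF'")
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  .load()
```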


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219403696
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

Yes, we need this. Otherwise, Spark will read column `d` values as the Catalyst timestamp type, which will fail the test.

https://user-images.githubusercontent.com/12194089/45865915-9e730800-bdb1-11e8-9a42-a1394c601166.png
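
To illustrate the type difference described above, a sketch against a hypothetical table with an Oracle DATE column `d` (not code from the PR; `spark` and `oracleJdbcUrl` as in the earlier sketch):

```scala
// Driver default (oracle.jdbc.mapDateToTimestamp = true): DATE maps to
// java.sql.Timestamp, so Spark infers d as TimestampType and values render
// like "2018-07-06 00:00:00.0".
val asTimestamp = spark.read.format("jdbc")
  .option("url", oracleJdbcUrl)        // placeholder URL
  .option("dbtable", "datetime_test")  // hypothetical table
  .load()
asTimestamp.printSchema()              // d: timestamp

// With the flag disabled: DATE maps to java.sql.Date, so Spark infers d as
// DateType and values render like "2018-07-06", matching the test expectation.
val asDate = spark.read.format("jdbc")
  .option("url", oracleJdbcUrl)
  .option("dbtable", "datetime_test")
  .option("oracle.jdbc.mapDateToTimestamp", "false")
  .load()
asDate.printSchema()                   // d: date
```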



---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219392779
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-04 03:30:00.0")
   .option("upperBound", "2018-07-27 14:11:05.0")
   .option("numPartitions", 2)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

Do we need this line?


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-20 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219386919
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-06")
   .option("upperBound", "2018-07-20")
   .option("numPartitions", 3)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

OK. I will add notes to http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases, and will also add comments to the code.


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219359831
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-06")
   .option("upperBound", "2018-07-20")
   .option("numPartitions", 3)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

Also leave comments before line 445 to explain why we need to do this. 


---




[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...

2018-09-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22461#discussion_r219359638
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -442,6 +442,8 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   .option("lowerBound", "2018-07-06")
   .option("upperBound", "2018-07-20")
   .option("numPartitions", 3)
+  .option("oracle.jdbc.mapDateToTimestamp", "false")
--- End diff --

This is good to know. Could you help update the doc to explain that we can pass options through to the underlying JDBC driver, and give some examples?


---
