This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 6ea983511b62 [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType`
6ea983511b62 is described below

commit 6ea983511b62a6971142f4f98ff9077ba50b9a0b
Author: Kent Yao <y...@apache.org>
AuthorDate: Wed Mar 13 20:29:14 2024 +0800

    [SPARK-47375][SQL] Add guidelines for timestamp mapping in `JdbcDialect#getCatalystType`
    
    ### What changes were proposed in this pull request?
    
    This PR adds guidelines for mapping database timestamps to Spark SQL
    timestamps through the standard JDBC API and Spark's `JdbcDialect`
    abstraction. The details can be viewed directly in the method
    descriptions, so they are not repeated here.
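
    As an illustration of the kind of mapping these guidelines describe, here is a
    minimal, self-contained Scala sketch of the decision logic. It is hypothetical:
    `SketchType`, `mapTimestamp`, and the stand-in type objects are invented for this
    example, and `-101` is the Oracle-specific type id cited in the new doc; the real
    implementation lives in `JdbcDialect#getCatalystType` and returns Catalyst types.

    ```scala
    import java.sql.Types

    // Stand-ins for Spark's Catalyst timestamp types; the actual guidelines map to
    // org.apache.spark.sql.types.TimestampType (LTZ) and TimestampNTZType.
    sealed trait SketchType
    case object TimestampLTZ extends SketchType
    case object TimestampNTZ extends SketchType

    // Oracle's vendor-specific code for TIMESTAMP WITH TIME ZONE (per the new doc).
    val OracleTimestampWithTimeZone: Int = -101

    // Decision logic mirroring the proposed guidelines: zoned timestamps always map
    // to the LTZ type; plain TIMESTAMP honors the preferTimestampNTZ option.
    def mapTimestamp(sqlType: Int, preferTimestampNTZ: Boolean): Option[SketchType] =
      sqlType match {
        case Types.TIMESTAMP_WITH_TIMEZONE | OracleTimestampWithTimeZone =>
          Some(TimestampLTZ)
        case Types.TIMESTAMP =>
          Some(if (preferTimestampNTZ) TimestampNTZ else TimestampLTZ)
        case _ =>
          None // defer to the default type mapping
      }
    ```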
    
    ### Why are the changes needed?
    
    These guidelines help us revise the built-in JDBC data source later
    without controversy. They also encourage custom dialects to follow the
    same conventions.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this PR only changes developer API documentation.
    
    ### How was this patch tested?
    
    doc build
    
    ### Was this patch authored or co-authored using generative AI tooling?
    no
    
    Closes #45496 from yaooqinn/SPARK-47375.
    
    Authored-by: Kent Yao <y...@apache.org>
    Signed-off-by: Kent Yao <y...@apache.org>
---
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 49 +++++++++++++++++++---
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
index 6621282647d4..7c5e476d9786 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala
@@ -89,12 +89,49 @@ abstract class JdbcDialect extends Serializable with Logging {
 
   /**
    * Get the custom datatype mapping for the given jdbc meta information.
-   * @param sqlType The sql type (see java.sql.Types)
-   * @param typeName The sql type name (e.g. "BIGINT UNSIGNED")
-   * @param size The size of the type.
-   * @param md Result metadata associated with this type.
-   * @return The actual DataType (subclasses of [[org.apache.spark.sql.types.DataType]])
-   *         or null if the default type mapping should be used.
+   *
+   * Guidelines for mapping database-defined timestamps to Spark SQL timestamps:
+   * <ul>
+   *   <li>
+   *     TIMESTAMP WITHOUT TIME ZONE if preferTimestampNTZ ->
+   *     [[org.apache.spark.sql.types.TimestampNTZType]]
+   *   </li>
+   *   <li>
+   *     TIMESTAMP WITHOUT TIME ZONE if !preferTimestampNTZ ->
+   *     [[org.apache.spark.sql.types.TimestampType]](LTZ)
+   *   </li>
+   *   <li>TIMESTAMP WITH TIME ZONE -> [[org.apache.spark.sql.types.TimestampType]](LTZ)</li>
+   *   <li>TIMESTAMP WITH LOCAL TIME ZONE -> [[org.apache.spark.sql.types.TimestampType]](LTZ)</li>
+   *   <li>
+   *     If the TIMESTAMP cannot be distinguished by `sqlType` and `typeName`, preferTimestampNTZ
+   *     is respected for now, but we may need to add another option in the future if necessary.
+   *   </li>
+   * </ul>
+   *
+   * @param sqlType Refers to [[java.sql.Types]] constants, or other constants defined by the
+   *                target database, e.g. `-101` is Oracle's TIMESTAMP WITH TIME ZONE type.
+   *                This value is returned by [[java.sql.ResultSetMetaData#getColumnType]].
+   * @param typeName The column type name used by the database (e.g. "BIGINT UNSIGNED"). This is
+   *                 sometimes used to determine the target data type when `sqlType` is not
+   *                 sufficient, i.e. when multiple database types are conflated into a single id.
+   *                 This value is returned by [[java.sql.ResultSetMetaData#getColumnTypeName]].
+   * @param size The size of the type, e.g. the maximum precision for numeric types, the length
+   *             for character strings, etc.
+   *             This value is returned by [[java.sql.ResultSetMetaData#getPrecision]].
+   * @param md Result metadata associated with this type. This contains additional information
+   *           from [[java.sql.ResultSetMetaData]] or user-specified options.
+   *           <ul>
+   *             <li>
+   *               `isTimestampNTZ`: Whether to read a TIMESTAMP WITHOUT TIME ZONE value as
+   *               [[org.apache.spark.sql.types.TimestampNTZType]]. This is configured by
+   *               `JDBCOptions.preferTimestampNTZ`.
+   *             </li>
+   *             <li>
+   *               `scale`: The length of the fractional part, see
+   *               [[java.sql.ResultSetMetaData#getScale]].
+   *             </li>
+   *           </ul>
+   * @return An Option of the actual DataType (subclasses of [[org.apache.spark.sql.types.DataType]])
+   *         or None if the default type mapping should be used.
    */
   def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = None


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
