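[GitHub] [spark] shivsood opened a new pull request #24969: [SPARK-28151][SQL] Fix MsSqlServerDialect Byte/Short/Float type mappings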

GitBox Fri, 12 Jul 2019 12:45:01 -0700

shivsood opened a new pull request #24969: [SPARK-28151][SQL] Fix
MsSqlServerDialect Byte/Short/Float type mappings
URL: https://github.com/apache/spark/pull/24969

Fixes the incorrect mapping of ByteType, ShortType and FloatType when
reading and writing SQL Server tables.
### ByteType issue
Writing a dataframe with a column of type ByteType fails when using the JDBC
connector for SQL Server. Appending to and reading from tables also fail. The
problem is due to the following:

1. (Write path) Incorrect mapping of ByteType in getCommonJDBCType() in
JdbcUtils.scala, where ByteType is mapped to the type name "BYTE". It should be
mapped to "TINYINT". The current mapping is:
case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT))

In getCatalystType() (JDBC to Catalyst type mapping), TINYINT is mapped to
IntegerType, while it should be mapped to ByteType. Mapping to an integer is
fine from the point of view of upcasting, but it leads to a 4-byte allocation
per value rather than the 1 byte needed for ByteType.

2. (Read path) The read path ends up calling makeGetter(dt: DataType, metadata:
Metadata). This function sets the value in the RDD row according to the data
type. There is no case for ByteType here, so reads will fail with an error
once getCatalystType() is fixed.
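
The two ByteType mapping problems described above can be illustrated with a small standalone model. This is only a sketch and not Spark's code: `CatalystType`, `JdbcTypeDef`, `ByteTypeMapping`, `toJdbcType` and `toCatalystType` are hypothetical stand-ins for Spark's data types, its `JdbcType`, and the write/read mapping functions. Only `java.sql.Types` is a real API here.

```scala
import java.sql.Types

// Hypothetical stand-ins for Spark's Catalyst types (not the real classes).
sealed trait CatalystType
case object ByteType extends CatalystType
case object IntegerType extends CatalystType

// Hypothetical stand-in for Spark's JdbcType:
// the type name used in DDL plus the java.sql.Types code.
final case class JdbcTypeDef(databaseTypeDefinition: String, jdbcNullType: Int)

object ByteTypeMapping {
  // Write path: the column definition must use a type name the server knows.
  // SQL Server rejects "BYTE" (see the error in this description),
  // so the name has to be "TINYINT".
  def toJdbcType(dt: CatalystType): Option[JdbcTypeDef] = dt match {
    case ByteType    => Some(JdbcTypeDef("TINYINT", Types.TINYINT))
    case IntegerType => Some(JdbcTypeDef("INTEGER", Types.INTEGER))
  }

  // Read path: map the JDBC type code back to the narrowest matching type,
  // so a TINYINT column becomes a 1-byte ByteType instead of a 4-byte integer.
  def toCatalystType(sqlType: Int): Option[CatalystType] = sqlType match {
    case Types.TINYINT => Some(ByteType)
    case Types.INTEGER => Some(IntegerType)
    case _             => None
  }
}
```

In this model a ByteType column round-trips: `toJdbcType(ByteType)` yields the name "TINYINT", and `toCatalystType(Types.TINYINT)` yields `ByteType` again.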

Note: These issues were found when reading/writing with SQL Server. A PR to
fix these mappings in MsSqlServerDialect will be submitted soon.

Error seen when writing a table:

(JDBC Write failed,com.microsoft.sqlserver.jdbc.SQLServerException: Column,
parameter, or variable #2: Cannot find data type BYTE.)
com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or
variable #2: Cannot find data type BYTE.

com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254)

com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608)

com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:859)
..

### ShortType and FloatType issue
ShortType and FloatType are not mapped to the correct JDBC types when using
the JDBC connector. This results in tables and Spark dataframes being created
with unintended types.

Some example issues:

A write from a dataframe with a ShortType column results in a SQL table whose
column type is INTEGER as opposed to SMALLINT, and thus a larger table than
expected.
A read results in a dataframe with type IntegerType as opposed to ShortType.

FloatType has an issue on the read path. On the write path, the Spark data type
'FloatType' is correctly mapped to the JDBC equivalent data type 'Real'. But on
the read path, when JDBC data types are converted to Catalyst data types
(getCatalystType), 'Real' is incorrectly mapped to 'DoubleType' rather
than 'FloatType'.
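
The read-side widening of SMALLINT and REAL can be sketched with a standalone model. This is an illustration under assumptions, not Spark's implementation: `CatalystType`, `ShortFloatReadMapping`, `widened` and `narrow` are hypothetical names, and only `java.sql.Types` is a real API.

```scala
import java.sql.Types

// Hypothetical stand-ins for Spark's Catalyst types (not the real classes).
sealed trait CatalystType
case object ShortType extends CatalystType
case object IntegerType extends CatalystType
case object FloatType extends CatalystType
case object DoubleType extends CatalystType

object ShortFloatReadMapping {
  // Behaviour described above: SMALLINT and REAL are widened on read.
  def widened(sqlType: Int): Option[CatalystType] = sqlType match {
    case Types.SMALLINT => Some(IntegerType)
    case Types.REAL     => Some(DoubleType)
    case Types.INTEGER  => Some(IntegerType)
    case Types.DOUBLE   => Some(DoubleType)
    case _              => None
  }

  // Intended behaviour: check the narrow mappings first,
  // then fall back to the generic mapping for every other type code.
  def narrow(sqlType: Int): Option[CatalystType] = sqlType match {
    case Types.SMALLINT => Some(ShortType)
    case Types.REAL     => Some(FloatType)
    case other          => widened(other)
  }
}
```

The fallback in `narrow` mirrors how a dialect-specific mapping would typically override only the types it cares about and defer to the common mapping otherwise.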

## What changes were proposed in this pull request?
(MsSqlServerDialect.scala) ByteType is mapped to "TINYINT" and ShortType is
mapped to "SMALLINT".
(JdbcUtils.scala) A case for ByteType is added to makeGetter to enable reading
TINYINT columns as ByteType.

## How was this patch tested?
Unit test - Added and passed.
Integration Test - Tested end to end with SQLServer using JDBC connector.



----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
