shivsood opened a new pull request #25344: [WIP][SPARK-2815][SQL] Mapped 
ByteType to TinyINT for MsSQLServerDialect
URL: https://github.com/apache/spark/pull/25344
 
 
   Writing a DataFrame with a column of type ByteType fails when using the JDBC 
connector for SQL Server. The problem is due to:
   - (Write path) Incorrect mapping of ByteType in getCommonJDBCType() in 
JdbcUtils.scala, where ByteType is mapped to the SQL type name BYTE. It should be 
mapped to TINYINT:
       case ByteType => Option(JdbcType("BYTE", java.sql.Types.TINYINT))
   
   - In getCatalystType() (the JDBC-to-Catalyst type mapping), TINYINT is mapped 
to IntegerType, while it should be mapped to ByteType. Mapping to IntegerType is 
acceptable from the point of view of upcasting, but it leads to a 4-byte 
allocation rather than 1 byte per ByteType value.
   
   - (Read path) The read path ends up calling makeGetter(dt: DataType, 
metadata: Metadata), which sets the value in the RDD row according to the data 
type. There is no case for ByteType here, so reads will fail with an error once 
getCatalystType() is fixed.
   
   Note: These issues were found when reading from and writing to SQL Server.
   
   Error seen when writing a table:
   
   ```
   (JDBC Write failed,com.microsoft.sqlserver.jdbc.SQLServerException: Column, 
parameter, or variable #2: Cannot find data type BYTE.)
   com.microsoft.sqlserver.jdbc.SQLServerException: Column, parameter, or 
variable #2: Cannot find data type BYTE.
   
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254)
   
com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608)
   
com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:859)
   ```
   
   ## What changes were proposed in this pull request?
   Change the mapping of ByteType from BYTE to TINYINT in MsSQLServerDialect.
   
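   A rough sketch of how the two mappings could be overridden in 
MsSqlServerDialect.scala (the override signatures come from JdbcDialect; the 
exact placement inside the existing dialect is an assumption):
   
   ```scala
   // Sketch only: dialect overrides; surrounding class body is elided.
   import java.sql.Types
   import org.apache.spark.sql.types._
   import org.apache.spark.sql.jdbc.JdbcType
   
   // Write path: map Catalyst ByteType to SQL Server's 1-byte TINYINT
   // instead of the nonexistent BYTE type name.
   override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
     case ByteType => Some(JdbcType("TINYINT", Types.TINYINT))
     case _ => None  // fall back to the common JDBC mappings
   }
   
   // Read path: map TINYINT back to ByteType (1 byte) rather than
   // IntegerType (4 bytes).
   override def getCatalystType(
       sqlType: Int, typeName: String, size: Int,
       md: MetadataBuilder): Option[DataType] = {
     if (sqlType == Types.TINYINT) Some(ByteType) else None
   }
   ```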
   
   ## How was this patch tested?
   The patch was tested with an integration test, by adding a test case to 
MsSqlServerIntegrationSuite.scala.
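   A minimal shape for such a round-trip test could be the following (jdbcUrl 
and the table name are placeholders; the actual suite derives the connection 
details from the test container):
   
   ```scala
   // Sketch of a ByteType round-trip; "spark" and "jdbcUrl" assumed in scope.
   import org.apache.spark.sql.Row
   import org.apache.spark.sql.types._
   
   val schema = StructType(Seq(StructField("b", ByteType)))
   // Stay in 0..127: SQL Server's TINYINT is unsigned (0-255).
   val data = Seq(Row(0.toByte), Row(42.toByte), Row(Byte.MaxValue))
   val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
   
   // Write should produce a TINYINT column instead of failing on "BYTE".
   df.write.jdbc(jdbcUrl, "bytetype_test", new java.util.Properties)
   
   // Read back: the schema should come back as ByteType, not IntegerType.
   val readBack = spark.read.jdbc(jdbcUrl, "bytetype_test", new java.util.Properties)
   assert(readBack.schema.head.dataType === ByteType)
   ```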
   
   Without the fix, df.write fails with the following error when the DataFrame 
contains a ByteType column.
   
   ```
   19/08/02 18:25:44 INFO Executor: Finished task 0.0 in stage 7.0 (TID 7). 
1197 bytes result sent to driver
   19/08/02 18:25:44 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 
7) in 43 ms on localhost (executor driver) (1/2)
   19/08/02 18:25:44 INFO CodeGenerator: Code generated in 14.586963 ms
   19/08/02 18:25:44 ERROR Executor: Exception in task 1.0 in stage 7.0 (TID 8)
   java.lang.RuntimeException: Error while encoding: 
java.lang.RuntimeException: java.lang.Integer is not a valid external type for 
schema of tinyint
   if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
org.apache.spark.sql.Row, true]), 0, serialNum), ByteType) AS serialNum#231
        at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:344)
        at 
org.apache.spark.sql.SparkSession.$anonfun$createDataFrame$1(SparkSession.scala:367)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
        at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:731)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:662)
        at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:845)
   ```
   
   
