beliefer commented on code in PR #49453:
URL: https://github.com/apache/spark/pull/49453#discussion_r1971150984
##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala:
##########
@@ -241,6 +241,84 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest
assert(rows10(0).getString(0) === "amy")
assert(rows10(1).getString(0) === "alex")
}
+
+ // MySQL Connector/J uses collation 'utf8mb4_0900_ai_ci' for the connection.
+ // The MySQL server 9.1.0 uses collation 'utf8mb4_0900_ai_ci' for databases by default.
+ // This method uses the string column directly, as the result of the cast has the same collation.
+ def testCastStringTarget(stringLiteral: String, stringCol: String): String = stringCol
+
+ test("SPARK-50793: MySQL JDBC Connector failed to cast some types") {
+ val tableName = catalogName + ".test_cast_function"
+ withTable(tableName) {
+ val stringValue = "0"
+ val stringLiteral = "'0'"
+ val stringCol = "string_col"
+ val longValue = 0L
+ val longCol = "long_col"
+ val binaryValue = Array[Byte](0x30)
+ val binaryLiteral = "x'30'"
+ val binaryCol = "binary_col"
+ val doubleValue = 0.0
+ val doubleLiteral = "0.0"
+ val doubleCol = "double_col"
+ // Create the table via Spark SQL so the columns use Spark SQL data types
+ sql(
+ s"CREATE TABLE $tableName ($stringCol STRING, $longCol LONG, " +
+ s"$binaryCol BINARY, $doubleCol DOUBLE)")
+ sql(
+ s"INSERT INTO $tableName VALUES($stringLiteral, $longValue,
$binaryLiteral, $doubleValue)")
+
+ def testCast(
+ castType: String,
+ sourceCol: String,
+ targetCol: String,
+ targetDataType: DataType,
+ targetValue: Any): Unit = {
+ val sql = s"SELECT CAST($sourceCol AS $castType) AS target " +
+ s"FROM $tableName WHERE CAST($sourceCol AS $castType) = $targetCol"
+ val df = spark.sql(sql)
Review Comment:
I think we should create the test table with MySQL DDL syntax. For example:
```
CREATE TABLE t
(
c1 VARCHAR(20) CHARACTER SET utf8mb4,
c2 TEXT CHARACTER SET latin1 COLLATE latin1_general_cs
);
```
Then let `MySQLDialect.getCatalystType` support converting a JDBC collation to
a Spark collation, and `MySQLDialect.getJDBCType` support converting a Spark
collation to a JDBC collation.
Finally, we test with `select * from t where c2 = 'abc'` in Spark. In fact, the
query we push down to MySQL will be
`select * from t where c2 = _latin1'abc' COLLATE latin1_general_cs`.
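For illustration, here is a minimal, self-contained sketch of the round trip
described above. The object name, helper names, and the MySQL-to-Spark
collation correspondences are assumptions for the example, not
`MySQLDialect`'s actual API:
```
// Hypothetical sketch: a two-way mapping between MySQL collations and Spark
// collation names, plus the predicate that would be pushed down to MySQL.
object CollationMappingSketch {
  // Assumed correspondence between MySQL and Spark collations.
  private val mysqlToSpark: Map[String, String] = Map(
    "latin1_general_cs" -> "UTF8_BINARY", // case-sensitive comparison
    "utf8mb4_0900_ai_ci" -> "UTF8_LCASE"  // case-insensitive (approximation)
  )
  private val sparkToMysql: Map[String, String] = mysqlToSpark.map(_.swap)

  // Direction getCatalystType would need: JDBC metadata -> Spark collation.
  def toSparkCollation(mysqlCollation: String): Option[String] =
    mysqlToSpark.get(mysqlCollation)

  // Direction getJDBCType would need: Spark collation -> MySQL collation.
  def toMySQLCollation(sparkCollation: String): Option[String] =
    sparkToMysql.get(sparkCollation)

  // The predicate pushed down for `c2 = 'abc'` on a latin1 column. Note that
  // MySQL requires the charset introducer (_latin1) before the literal.
  def pushedPredicate(
      col: String, value: String, charset: String, collation: String): String =
    s"$col = _$charset'$value' COLLATE $collation"
}

// CollationMappingSketch.pushedPredicate("c2", "abc", "latin1", "latin1_general_cs")
// returns: c2 = _latin1'abc' COLLATE latin1_general_cs
```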