sunxiaoguang commented on code in PR #49453:
URL: https://github.com/apache/spark/pull/49453#discussion_r1976368257
##########
connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala:
##########
@@ -241,6 +241,84 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationV2Suite with V2JDBCTest
     assert(rows10(0).getString(0) === "amy")
     assert(rows10(1).getString(0) === "alex")
   }
+
+  // MySQL Connector/J uses collation 'utf8mb4_0900_ai_ci' for the connection.
+  // The MySQL server 9.1.0 uses collation 'utf8mb4_0900_ai_ci' for databases by default.
+  // This method uses the string column directly, as the result of the cast has the same collation.
+  def testCastStringTarget(stringLiteral: String, stringCol: String): String = stringCol
+
+  test("SPARK-50793: MySQL JDBC Connector failed to cast some types") {
+    val tableName = catalogName + ".test_cast_function"
+    withTable(tableName) {
+      val stringValue = "0"
+      val stringLiteral = "'0'"
+      val stringCol = "string_col"
+      val longValue = 0L
+      val longCol = "long_col"
+      val binaryValue = Array[Byte](0x30)
+      val binaryLiteral = "x'30'"
+      val binaryCol = "binary_col"
+      val doubleValue = 0.0
+      val doubleLiteral = "0.0"
+      val doubleCol = "double_col"
+      // CREATE the table using types defined in Spark SQL
+      sql(
+        s"CREATE TABLE $tableName ($stringCol STRING, $longCol LONG, " +
+          s"$binaryCol BINARY, $doubleCol DOUBLE)")
+      sql(
+        s"INSERT INTO $tableName VALUES($stringLiteral, $longValue, $binaryLiteral, $doubleValue)")
+
+      def testCast(
+          castType: String,
+          sourceCol: String,
+          targetCol: String,
+          targetDataType: DataType,
+          targetValue: Any): Unit = {
+        val sql = s"SELECT CAST($sourceCol AS $castType) AS target " +
+          s"FROM $tableName WHERE CAST($sourceCol AS $castType) = $targetCol"
+        val df = spark.sql(sql)
Review Comment:
> You just need to support pushing down the collation to the H2 dialect as an example, or select MySQL. Other dialects remain for a follow-up PR. First, make the DS V2 pushdown framework support collation, and select MySQL or H2 as a basic implementation. Then continue with this PR.
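If I understand the suggestion correctly, the dialect-level piece would look roughly like the sketch below. To be clear, `MySQLCollationSupport`, `collateClause`, and the mapping entries are my assumptions for illustration, not existing Spark APIs:
```scala
// Hypothetical sketch only: map Spark collation names to MySQL collation
// names inside the MySQL dialect, and skip pushdown for anything unmapped.
object MySQLCollationSupport {
  // Illustrative entries; real coverage would need per-collation review.
  private val sparkToMySQL: Map[String, String] = Map(
    "UTF8_BINARY" -> "utf8mb4_bin",       // binary comparison on both sides
    "UNICODE_CI" -> "utf8mb4_0900_as_ci"  // case-insensitive, accent-sensitive
  )

  // Returns a COLLATE clause for a pushed-down string predicate, or None
  // when there is no known MySQL equivalent (the predicate then stays in Spark).
  def collateClause(sparkCollation: String): Option[String] =
    sparkToMySQL.get(sparkCollation).map(c => s"COLLATE $c")
}
```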
I also have some quick questions about this.
- I assume a feature like this should go through an RFC procedure. I will try to figure that out myself; meanwhile, I would really appreciate any pointers to similar previous work as references.
- The subtle differences between collations and encodings can be very tricky. What is your suggestion for tables whose collations we know Spark does not support yet? And how should we handle collations that differ slightly but behave the same overall?
- As a related question, different versions of the MySQL server support different sets of collations. Should we support only the latest MySQL server, or do we need to consider the server version and pick a collation available in that version? A rough sketch of the version check I have in mind follows below.
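A toy version of that check, where the fallback collation and the way the server version is obtained are placeholders:
```scala
// Hypothetical sketch: the '0900' collations only exist on MySQL 8.0+, so an
// older server would need a fallback such as utf8mb4_general_ci.
def caseInsensitiveCollation(serverMajorVersion: Int): String =
  if (serverMajorVersion >= 8) "utf8mb4_0900_ai_ci"
  else "utf8mb4_general_ci"
```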
Really appreciate your help, thanks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]