Zawa-ll commented on PR #47246:
URL: https://github.com/apache/spark/pull/47246#issuecomment-2222038902
> When can this happen?
Thanks for your question, @HyukjinKwon.
This issue typically happens when a string column is compared to an integer
literal in Spark SQL. For instance, imagine a dataset with a string column
`id`, where we want to keep only the rows whose `id` is not equal to -1.
Here's a small example to illustrate:
```
import spark.implicits._

case class Person(id: String, name: String)

val personDF = Seq(
  Person("a", "amit"),
  Person("b", "abhishek")
).toDF()
personDF.createOrReplaceTempView("person_ddf")

// The string column `id` is compared with the integer literal -1.
val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
val resultDF = spark.sql(sqlQuery)
resultDF.show()  // Unexpectedly prints no rows.
```
In this case, the query `id <> -1` ends up returning an empty result set
because Spark SQL casts the values of the `id` column to integers for the
comparison. Since "a" and "b" can't be cast to integers, the cast produces
NULL, the predicate evaluates to NULL rather than true, and every row is
filtered out.
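To see the failure in isolation, here is a minimal sketch of the cast
behaviour, assuming the same `spark` session as above and the default
`spark.sql.ansi.enabled=false` (where an invalid cast yields NULL instead of
raising an error):
```
// With ANSI mode off, casting a non-numeric string to INT yields NULL,
// so the predicate NULL <> -1 evaluates to NULL and the row is dropped.
spark.sql("SELECT CAST('a' AS INT) AS casted").show()
// +------+
// |casted|
// +------+
// |  null|
// +------+
```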
To fix this, I updated the comparison logic so that the integer operand is
converted to a string before the comparison. This ensures the comparison is
performed consistently on strings and behaves predictably.
Here's the updated logic:
```
protected lazy val ordering: Ordering[Any] = new Ordering[Any] {
  override def compare(x: Any, y: Any): Int = {
    (x, y) match {
      // Mixed string/int operands: compare both sides as strings
      // instead of casting the string side to an integer.
      case (xs: String, yi: Int) => xs.compareTo(yi.toString)
      case (xi: Int, ys: String) => xi.toString.compareTo(ys)
      // All other type combinations keep the existing behaviour.
      case _ => TypeUtils.getInterpretedOrdering(left.dataType).compare(x, y)
    }
  }
}
```
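For illustration only, here is a self-contained sketch of the two mixed-type
cases; `mixedOrdering` is a hypothetical standalone name, and the fallback to
`TypeUtils.getInterpretedOrdering` is stubbed out with an exception because it
needs the surrounding expression's context:
```
// Hypothetical standalone version of the mixed-type comparison above.
val mixedOrdering: Ordering[Any] = new Ordering[Any] {
  override def compare(x: Any, y: Any): Int = (x, y) match {
    case (xs: String, yi: Int) => xs.compareTo(yi.toString)
    case (xi: Int, ys: String) => xi.toString.compareTo(ys)
    case _ => throw new IllegalArgumentException("only String/Int pairs")
  }
}

// "a" is compared with "-1" as strings, so the result is non-zero
// and a row like Person("a", "amit") survives the `id <> -1` filter.
assert(mixedOrdering.compare("a", -1) != 0)
assert(mixedOrdering.compare(-1, "a") != 0)
```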
In addition, the JIRA ticket for this issue is SPARK-48652.