Zawa-ll commented on PR #47246:
URL: https://github.com/apache/spark/pull/47246#issuecomment-2222038902
> When can this happen?
Thanks for your question, @HyukjinKwon.
This issue typically happens when a string column is compared to an integer
literal in Spark SQL. For instance, imagine a dataset with a string column
`id`, where we want to keep only the rows whose `id` is not equal to -1.
Here's a small example to illustrate:
```
import spark.implicits._

case class Person(id: String, name: String)

val personDF = Seq(
  Person("a", "amit"),
  Person("b", "abhishek")
).toDF()
personDF.createOrReplaceTempView("person_ddf")

// The string column `id` is compared with the integer literal -1.
val sqlQuery = "SELECT * FROM person_ddf WHERE id <> -1"
val resultDF = spark.sql(sqlQuery)
resultDF.show()  // Unexpectedly prints no rows.
```
In this case, the query `id <> -1` ends up returning an empty result set
because Spark SQL casts the values of the `id` column to integers for the
comparison. Since "a" and "b" can't be cast to integers, the cast produces
NULL, the predicate evaluates to NULL rather than true, and every row is
filtered out.
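To see the failure in isolation, here is a minimal sketch of the cast
behaviour, assuming the same `spark` session as above and the default
`spark.sql.ansi.enabled=false` (where an invalid cast yields NULL instead of
raising an error):
```
// With ANSI mode off, casting a non-numeric string to INT yields NULL,
// so the predicate NULL <> -1 evaluates to NULL and the row is dropped.
spark.sql("SELECT CAST('a' AS INT) AS casted").show()
// +------+
// |casted|
// +------+
// |  null|
// +------+
```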
To fix this, I updated the comparison logic so that the integer operand is
converted to a string before the comparison. This ensures the comparison is
performed consistently on strings and behaves predictably.
Here's the updated logic:
```
protected lazy val ordering: Ordering[Any] = new Ordering[Any] {
  override def compare(x: Any, y: Any): Int = {
    (x, y) match {
      // Mixed string/int operands: compare both sides as strings
      // instead of casting the string side to an integer.
      case (xs: String, yi: Int) => xs.compareTo(yi.toString)
      case (xi: Int, ys: String) => xi.toString.compareTo(ys)
      // All other type combinations keep the existing behaviour.
      case _ => TypeUtils.getInterpretedOrdering(left.dataType).compare(x, y)
    }
  }
}
```
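For illustration only, here is a self-contained sketch of the two mixed-type
cases; `mixedOrdering` is a hypothetical standalone name, and the fallback to
`TypeUtils.getInterpretedOrdering` is stubbed out with an exception because it
needs the surrounding expression's context:
```
// Hypothetical standalone version of the mixed-type comparison above.
val mixedOrdering: Ordering[Any] = new Ordering[Any] {
  override def compare(x: Any, y: Any): Int = (x, y) match {
    case (xs: String, yi: Int) => xs.compareTo(yi.toString)
    case (xi: Int, ys: String) => xi.toString.compareTo(ys)
    case _ => throw new IllegalArgumentException("only String/Int pairs")
  }
}

// "a" is compared with "-1" as strings, so the result is non-zero
// and a row like Person("a", "amit") survives the `id <> -1` filter.
assert(mixedOrdering.compare("a", -1) != 0)
assert(mixedOrdering.compare(-1, "a") != 0)
```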
In addition, the JIRA ticket for this issue is SPARK-48652.