vladanvasi-db commented on code in PR #48737:
URL: https://github.com/apache/spark/pull/48737#discussion_r1832164996


##########
sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala:
##########
@@ -83,9 +83,18 @@ class StringType private (val collationId: Int) extends AtomicType with Serializ
   override def jsonValue: JValue = JString("string")
 
   override def equals(obj: Any): Boolean =

Review Comment:
   In my opinion, although this may be the correct approach to object equality
in Java/Scala, I think it could cause more problems in the future, even though
I refactored the places in the code where we compare the `collationId`. There
are many other places that compare `DataType`s directly, and for `StringType`
this is currently done by comparing the `collationId`. These places are very
hard to detect, and I agree with Stefan that `StringType ==
StringType("UTF8_BINARY")` returning false here would be misleading.
Furthermore, anyone adding code afterwards will have to be very careful when
comparing `DataType`s, since the `equals` method would not work as expected for
`StringType`. The same applies to `hashCode`: its call sites are very hard to
detect. For example, `CollationTypeCoercion` uses the `distinct` method of a
Scala `List`, which relies on the objects' `hashCode` (together with `equals`),
so `hashCode` needs to work properly there. The other places in the code that
call `hashCode` on `StringType` are almost impossible to find all at once.
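To illustrate the concern, here is a minimal sketch using a hypothetical,
simplified model. `ModelStringType`, `RefStringType`, `EqualityDemo` and the
use of id 0 for UTF8_BINARY are assumptions for illustration only, not Spark's
actual `StringType` code. With `collationId`-based `equals`/`hashCode`, the
default singleton equals a freshly constructed instance with the same id, and
`List.distinct` collapses them. With default reference equality, neither holds.

```scala
// Hypothetical, simplified model; NOT Spark's actual StringType.
// Collation id 0 stands in for UTF8_BINARY purely for illustration.
class ModelStringType(val collationId: Int) {
  // Two instances are equal whenever their collation ids match.
  override def equals(obj: Any): Boolean = obj match {
    case other: ModelStringType => other.collationId == collationId
    case _                      => false
  }
  // Must agree with equals so hash-based collections treat equal values alike.
  override def hashCode(): Int = collationId.hashCode
}

// Default singleton, analogous to the `StringType` object.
object ModelStringType extends ModelStringType(0) {
  def apply(collationId: Int): ModelStringType = new ModelStringType(collationId)
}

// Contrast: a hypothetical type that keeps the default reference-based
// equals/hashCode inherited from AnyRef.
class RefStringType(val collationId: Int)
object RefStringType extends RefStringType(0)

object EqualityDemo {
  def main(args: Array[String]): Unit = {
    // The singleton and a fresh id-0 instance compare equal ...
    println(ModelStringType == ModelStringType(0)) // true
    // ... and distinct (backed by a hash set) keeps one element per id.
    println(List(ModelStringType, ModelStringType(0), ModelStringType(1)).distinct.size) // 2
    // With reference equality, the same comparison is false ...
    println(RefStringType == new RefStringType(0)) // false
    // ... and distinct keeps every instance, including both id-0 values.
    println(List(RefStringType, new RefStringType(0), new RefStringType(1)).distinct.size) // 3
  }
}
```

Note that `equals` and `hashCode` are overridden together in the model: if only
one of them changed, equal values could land in different hash buckets and
`distinct` would silently keep duplicates.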


