mitkedb commented on code in PR #45216:
URL: https://github.com/apache/spark/pull/45216#discussion_r1499330134


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala:
##########
@@ -586,9 +584,17 @@ object ContainsExpressionBuilder extends StringBinaryPredicateExpressionBuilderB
 }
 
 case class Contains(left: Expression, right: Expression) extends StringPredicate {
-  override def compare(l: UTF8String, r: UTF8String): Boolean = l.contains(r)
+  override def compare(l: UTF8String, r: UTF8String): Boolean = {
+    val collationID = right.dataType.asInstanceOf[StringType].collationId

Review Comment:
   @dbatomic @uros-db 
   
   Should we special-case the default UTF8 binary collation here, to avoid any
   performance regression on the default code path?
   
   Something like the following:
   
   if right.dataType.isDefaultCollationFinalVariable
      default / current implementation
   else
      collation implementation
   
   (IIRC it is important that the variable is final, as that lets the JIT
   optimize the check significantly better.)
   
   The codegen version of the code should also remain identical to how it was
   before we added collations. I am guessing we can check the collation ID at
   compile time and generate exactly the same codegen code for UTF8 binary.
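To make the suggestion concrete, here is a minimal, Spark-free sketch of the two ideas: a flag computed once per expression instance that guards the per-row fast path, and a branch taken at code-generation time so that the default collation emits the same code as before. Everything in it is an assumption for illustration: `CollationFastPathSketch`, `ContainsSketch`, `Utf8BinaryCollationId` (taken to be 0 here), `genContainsCode`, and the emitted `hypotheticalCollationContains` name are not from the PR, plain `String` stands in for `UTF8String`, and the case-insensitive branch is only a placeholder for a real collation-aware comparison.

```scala
// Hypothetical sketch: none of these names are Spark APIs or part of the PR.
object CollationFastPathSketch {
  // Assumption: id 0 denotes the default UTF8 binary collation.
  val Utf8BinaryCollationId: Int = 0

  final class ContainsSketch(collationId: Int) {
    // Computed once when the expression is built. A `val` in Scala compiles
    // to a final field, so the per-row branch below tests a stable value.
    private val isDefaultCollation: Boolean =
      collationId == Utf8BinaryCollationId

    def compare(l: String, r: String): Boolean =
      if (isDefaultCollation) {
        // Default path: same work as before collations existed.
        l.contains(r)
      } else {
        // Placeholder for a collation-aware comparison (assumption:
        // case-insensitive matching stands in for the real thing).
        val locale = java.util.Locale.ROOT
        l.toLowerCase(locale).contains(r.toLowerCase(locale))
      }
  }

  // Codegen idea: decide while generating code, so the default collation
  // produces exactly the snippet it produced before.
  def genContainsCode(collationId: Int, l: String, r: String): String =
    if (collationId == Utf8BinaryCollationId) {
      s"($l).contains($r)"
    } else {
      // The helper name in the emitted code is hypothetical.
      s"hypotheticalCollationContains($l, $r, $collationId)"
    }
}
```

With this shape the interpreted path pays one predictable branch per row for the default collation, and the generated code for UTF8 binary carries no collation check at all. Whether the JIT actually folds the branch would need to be confirmed with a benchmark.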
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

