HyukjinKwon commented on a change in pull request #26875:
URL: https://github.com/apache/spark/pull/26875#discussion_r443208133



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
##########
@@ -240,11 +245,16 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress
       }
     } else {
       val rightStr = ctx.freshName("rightStr")
-      val pattern = ctx.freshName("pattern")
+      val pattern = ctx.addMutableState(patternClass, "pattern")
+      val lastRightStr = ctx.addMutableState(classOf[String].getName, "lastRightStr")
+
       nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
         s"""
           String $rightStr = $eval2.toString();
-          $patternClass $pattern = $patternClass.compile($rightStr);
+          if (!$rightStr.equals($lastRightStr)) {

Review comment:
   The positive cases are good enough. The concern I heard was actually that here we add some overhead for the string comparison when the strings are very long.
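To make the trade-off concrete, here is a minimal stand-alone Scala model of what the new generated code does per row. It is only a sketch, assuming plain `java.util.regex.Pattern` and `String` inputs instead of Spark's codegen and `UTF8String`; `CachedRLike` and its `compileCount` counter are hypothetical names for illustration, not Spark APIs. It is written script-style (e.g. a scala-cli `.sc` file or the REPL).

```scala
import java.util.regex.Pattern

// Hypothetical model of the generated code in the diff: the compiled Pattern
// and the last pattern string are kept as mutable state, and the regex is
// recompiled only when the right-hand string changes.
final class CachedRLike {
  private var lastRightStr: String = null
  private var pattern: Pattern = null
  var compileCount = 0 // instrumentation for this sketch only

  def eval(left: String, rightStr: String): Boolean = {
    // String.equals(null) is false, so the first row always compiles.
    if (!rightStr.equals(lastRightStr)) {
      pattern = Pattern.compile(rightStr)
      // Assumed: the PR also updates lastRightStr here (the hunk is cut off).
      lastRightStr = rightStr
      compileCount += 1
    }
    // RLIKE is a substring match, so use find rather than matches.
    pattern.matcher(left).find(0)
  }
}

val r = new CachedRLike
val rows = Seq("apple" -> "^a", "banana" -> "^a", "cherry" -> "rr")
val results = rows.map { case (s, p) => r.eval(s, p) }
println(results)        // List(true, false, true)
println(r.compileCount) // 2
```

Compared with the old code, the added cost is one `equals` call on every row; the saving is every `Pattern.compile` that gets skipped.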
   
   Can we identify the worst cases? It's okay to show the trade-off explicitly. I tend to agree that compiling the pattern once is better in general. Feel free to reopen the PR.
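One way to pin down the worst cases: `String.equals` returns early on identity or on a length mismatch, and otherwise scans characters until the first difference. Since `$eval2.toString()` appears to build a new `String` from the `UTF8String` on every row, the identity shortcut likely never fires. The hypothetical helper `charsCompared` below models that work (it is not the JDK code itself), so the cases can be compared under that assumption.

```scala
// Hypothetical model of the work done by java.lang.String.equals:
// identity check, null/length check, then a scan that stops at the
// first mismatching character. Returns the number of chars compared.
def charsCompared(a: String, b: String): Int =
  if (a eq b) 0
  else if (b == null || a.length != b.length) 0
  else {
    var i = 0
    while (i < a.length && a.charAt(i) == b.charAt(i)) i += 1
    // i equal chars, plus the mismatching char if the scan stopped early
    if (i < a.length) i + 1 else i
  }

val n = 10000
val base = "a" * (n - 1)

// Constant long pattern, but a fresh String object on every row:
// full scan each row, compile skipped.
val sameA = new String(base + "x")
val sameB = new String(base + "x")

// Same length, differing only in the last char:
// full scan each row AND a recompile anyway.
val lastDiffA = base + "x"
val lastDiffB = base + "y"

// Different lengths: rejected without scanning, then recompiled.
val shorter = "a" * 10

println(charsCompared(sameA, sameB))         // 10000
println(charsCompared(lastDiffA, lastDiffB)) // 10000
println(charsCompared(lastDiffA, shorter))   // 0
```

Under this model, the worst case is a column of same-length patterns that differ only near the end: each row pays a full O(n) scan and still recompiles, which is pure overhead over the old code. A constant long pattern also pays the O(n) scan per row, but in exchange skips the compile.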
   
   cc @rednaxelafx as well FYI.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.