Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/19811#discussion_r157415970
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
---
@@ -112,15 +112,14 @@ case class Like(left: Expression, right: Expression)
extends StringRegexExpressi
override protected def doGenCode(ctx: CodegenContext, ev: ExprCode):
ExprCode = {
val patternClass = classOf[Pattern].getName
val escapeFunc = StringUtils.getClass.getName.stripSuffix("$") +
".escapeLikeRegex"
- val pattern = ctx.freshName("pattern")
if (right.foldable) {
val rVal = right.eval()
if (rVal != null) {
val regexStr =
StringEscapeUtils.escapeJava(escape(rVal.asInstanceOf[UTF8String].toString()))
- ctx.addMutableState(patternClass, pattern,
- s"""$pattern = ${patternClass}.compile("$regexStr");""")
+ val pattern = ctx.addMutableState(patternClass, "patternLike",
+ v => s"""$v = ${patternClass}.compile("$regexStr");""",
forceInline = true)
--- End diff --
my only concern is about point 2. I think it is a dangerous thing to do.
What if we generate a lot of frequently used variable? I think it is safer at
the moment to consider only 1 and 3 in the decision whether to inline or not.
In the future, with a different codegen method, we might then define a
threshold over which we generate an array for the given class, otherwise we use
plain variables, which IMHO would be the best option but at the moment it is
not feasible...
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]