beliefer edited a comment on issue #27507: [SPARK-24884][SQL] Support regexp function regexp_extract_all
URL: https://github.com/apache/spark/pull/27507#issuecomment-585518801

> I have a high-level question. Do we have huge advantage to generate Java code?
>
> One advantage is to store the result of `Pattern.compile()` into each global variable for caching while the non-generated code shares one variable for cache.
> On the other hand, the size of the result is not small. Which trade-off do we select? Space or performance?

LIKE and RLIKE cache the result of `Pattern.compile()`. RegExpReplace and RegExpExtract take a different approach: https://github.com/apache/spark/blob/5e3c092dc055ca0f1a2f523efa5f305555b991e6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L438. If the pattern string is a constant, the two approaches achieve the same goal. If the pattern string is a variable, the cost of recompiling the pattern seems unavoidable.
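
To make the trade-off concrete, here is a minimal Java sketch of a "remember the last regex" cache. It is not Spark's code: `RegexCacheSketch`, `LastPatternCache`, `extractAll`, and `compileCount` are hypothetical names, and the sketch only assumes that the approach linked above recompiles when the incoming pattern string differs from the previous one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCacheSketch {
    /** Hypothetical helper: keeps the last regex string and its compiled Pattern. */
    static final class LastPatternCache {
        private String lastRegex;   // regex string seen on the previous call
        private Pattern pattern;    // compiled form of lastRegex
        int compileCount = 0;       // instrumentation for this sketch only

        Pattern get(String regex) {
            // Recompile only when the regex string differs from the previous one.
            if (!regex.equals(lastRegex)) {
                lastRegex = regex;
                pattern = Pattern.compile(regex);
                compileCount++;
            }
            return pattern;
        }
    }

    /** Collects the given capture group from every match of regex in input. */
    static List<String> extractAll(LastPatternCache cache, String input,
                                   String regex, int group) {
        Matcher m = cache.get(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        // Constant pattern: compiled once, reused for every row.
        LastPatternCache constant = new LastPatternCache();
        for (String row : new String[] {"1-2", "3-4", "5-6"}) {
            extractAll(constant, row, "(\\d+)-(\\d+)", 1);
        }
        System.out.println("constant pattern compiles: " + constant.compileCount);

        // Pattern changes on every row: the cache never hits.
        LastPatternCache varying = new LastPatternCache();
        String[] regexes = {"a", "b", "a", "b"};
        for (String regex : regexes) {
            extractAll(varying, "ab", regex, 0);
        }
        System.out.println("varying pattern compiles: " + varying.compileCount);
    }
}
```

With a constant pattern the sketch compiles once, which matches what a precompiled field would give. When the pattern string changes from row to row, every call recompiles, so neither approach avoids the cost.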
