[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16893 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16893#discussion_r100707748 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -27,21 +28,28 @@ object StringUtils { // replace the % with .*, match 0 or more times with any character def escapeLikeRegex(v: String): String = { --- End diff -- Actually, I'm wrong: we can't quite do that simplification because the old code has a subtle bug related to backslash-escaping. Due to the complexity and terseness the old implementation, it's a little non-obvious to spot that the `case (prev, '\\') => ""` has the effect of always ignoring backslash characters, so this method is incapable of producing a backslash in its output. This is a problem if the user wants to write a `LIKE` pattern to match backslashes then this is impossible with the current code. It turns out that this is covered by #15398, which also implements performance improvements for this code, so I guess this PR and JIRA is redundant :( I thought #15398 had been merged / fixed by now, but I guess not. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16893#discussion_r100707421 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -27,21 +28,28 @@ object StringUtils { // replace the % with .*, match 0 or more times with any character def escapeLikeRegex(v: String): String = { --- End diff -- I think we could further simplify this by replacing `previousChar` with a boolean, `nextCharacterIsEscaped` (or `inEscape`). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/16893#discussion_r100705983 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -27,21 +28,28 @@ object StringUtils { // replace the % with .*, match 0 or more times with any character def escapeLikeRegex(v: String): String = { --- End diff -- nit: while you are at it, can you also make the var names better ? `v` -> `input` `c` -> `currentChar` `prev` -> `previousChar` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...
GitHub user lins05 opened a pull request: https://github.com/apache/spark/pull/16893 [SPARK-19555][SQL] Improve the performance of StringUtils.escapeLikeRegex method ## What changes were proposed in this pull request? Copied from [SPARK-19555](https://issues.apache.org/jira/browse/SPARK-19555) Spark's `StringUtils.escapeLikeRegex()` method is written inefficiently, performing tons of object allocations due to the use `zip()`, `flatMap()` , and `mkString`. Instead, I think method should be rewritten in an imperative style using a Java string builder. This method can become a performance bottleneck in cases where regex expressions are used with non-constant-foldable expressions (e.g. the regex expression comes from the data rather than being part of the query). ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lins05/spark spark-19555-improve-escape-like-regex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16893.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16893 commit bbcdeda98c14705b6de3efab70f2c58bc4539bb9 Author: Shuai Lin Date: 2017-02-11T07:46:34Z [SPARK-19555][SQL] Improve the performance of StringUtils.escapeLikeRegex --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org