[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...

2017-05-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16893


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...

2017-02-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16893#discussion_r100707748
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -27,21 +28,28 @@ object StringUtils {
   // replace the % with .*, match 0 or more times with any character
   def escapeLikeRegex(v: String): String = {
--- End diff --

Actually, I'm wrong: we can't quite do that simplification because the old 
code has a subtle bug related to backslash-escaping. Due to the complexity and 
terseness the old implementation, it's a little non-obvious to spot that the 
`case (prev, '\\') => ""` has the effect of always ignoring backslash 
characters, so this method is incapable of producing a backslash in its output. 
This is a problem if the user wants to write a `LIKE` pattern to match 
backslashes then this is impossible with the current code.

It turns out that this is covered by #15398, which also implements 
performance improvements for this code, so I guess this PR and JIRA is 
redundant :(

I thought #15398 had been merged / fixed by now, but I guess not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...

2017-02-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16893#discussion_r100707421
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -27,21 +28,28 @@ object StringUtils {
   // replace the % with .*, match 0 or more times with any character
   def escapeLikeRegex(v: String): String = {
--- End diff --

I think we could further simplify this by replacing `previousChar` with a 
boolean, `nextCharacterIsEscaped` (or `inEscape`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...

2017-02-12 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/16893#discussion_r100705983
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ---
@@ -27,21 +28,28 @@ object StringUtils {
   // replace the % with .*, match 0 or more times with any character
   def escapeLikeRegex(v: String): String = {
--- End diff --

nit: while you are at it, can you also make the var names better ?

`v` -> `input`
`c` -> `currentChar`
`prev` -> `previousChar`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16893: [SPARK-19555][SQL] Improve the performance of Str...

2017-02-10 Thread lins05
GitHub user lins05 opened a pull request:

https://github.com/apache/spark/pull/16893

[SPARK-19555][SQL] Improve the performance of StringUtils.escapeLikeRegex 
method

## What changes were proposed in this pull request?

Copied from [SPARK-19555](https://issues.apache.org/jira/browse/SPARK-19555)

Spark's `StringUtils.escapeLikeRegex()` method is written inefficiently, 
performing tons of object allocations due to the use `zip()`, `flatMap()` , and 
`mkString`. Instead, I think method should be rewritten in an imperative style 
using a Java string builder.

This method can become a performance bottleneck in cases where regex 
expressions are used with non-constant-foldable expressions (e.g. the regex 
expression comes from the data rather than being part of the query).

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lins05/spark 
spark-19555-improve-escape-like-regex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16893


commit bbcdeda98c14705b6de3efab70f2c58bc4539bb9
Author: Shuai Lin 
Date:   2017-02-11T07:46:34Z

[SPARK-19555][SQL] Improve the performance of StringUtils.escapeLikeRegex




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org