[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r338011227 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -149,10 +153,17 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi } else { val pattern = ctx.freshName("pattern") val rightStr = ctx.freshName("rightStr") + val escapeChar = escapeCharOpt.map { str => +if (str.equals("\"") || str.equals("\\")) { Review comment: I remember this is to accept [the query](https://github.com/apache/spark/pull/25001#issuecomment-511803526). The suggested looks nice and we need comments here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r309971143 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala ## @@ -23,13 +23,30 @@ import org.apache.spark.sql.catalyst.util.StringUtils._ class StringUtilsSuite extends SparkFunSuite { test("escapeLikeRegex") { -assert(escapeLikeRegex("abdef") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") -assert(escapeLikeRegex("a\\__b") === "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E") -assert(escapeLikeRegex("a_%b") === "(?s)\\Qa\\E..*\\Qb\\E") -assert(escapeLikeRegex("a%\\%b") === "(?s)\\Qa\\E.*\\Q%\\E\\Qb\\E") -assert(escapeLikeRegex("a%") === "(?s)\\Qa\\E.*") -assert(escapeLikeRegex("**") === "(?s)\\Q*\\E\\Q*\\E") -assert(escapeLikeRegex("a_b") === "(?s)\\Qa\\E.\\Qb\\E") +val expectedEscapedStrArr = Array("(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E", Review comment: Can you assign these expected strings to each independent variable? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r309970475 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: For now, can you fix code to throw an exception for the case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308523852 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: I like the standard behivour, but we should wait for the other review's comment. cc: @dongjoon-hyun @gatorsmile This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308523852 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: I like the behivour, but we should wait for the other review's comment. cc: @dongjoon-hyun @gatorsmile This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308523229 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: If we follow `1)`, we should throw an exception for the empty case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308085976 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: oh... that's a weird case... the standard says something about this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308085976 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: oh... the standard says something about this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308081254 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: Before that, I just want to make sure that that behaivour is consistent among the other DBMSs. (I'm not sure we need to accept that empty case.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308081254 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: Before that, I just want to make sure that that behaivour is consistent among the other DBMSs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308059472 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: Could you show us these behivours? (I'm not sure that the current one is ok) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308059472 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: Could you show us these behivours? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308057910 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -118,6 +118,295 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkLiteralRow("""%SystemDrive%\Users\John""" like _, """\%SystemDrive\%\\Users%""", true) } + test("LIKE Pattern ESCAPE '/'") { + Review comment: These tests below are super duplicated between each other. IMHO we don't need these over-killing tests and more simpler tests are ok to me here (Basic tests already have been done in https://github.com/apache/spark/pull/25001/files#diff-c36b16be93e0c63a4e3fad37bd904b29R25) Can you write tests like this? ``` Seq("/", "#", "\"", ...).foreach { escapeChar => test(s"LIKE Pattern ESCAPE '$escapeChar'") { // More simpler tests here... } } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308056865 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala ## @@ -23,13 +23,27 @@ import org.apache.spark.sql.catalyst.util.StringUtils._ class StringUtilsSuite extends SparkFunSuite { test("escapeLikeRegex") { -assert(escapeLikeRegex("abdef") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") -assert(escapeLikeRegex("a\\__b") === "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E") -assert(escapeLikeRegex("a_%b") === "(?s)\\Qa\\E..*\\Qb\\E") -assert(escapeLikeRegex("a%\\%b") === "(?s)\\Qa\\E.*\\Q%\\E\\Qb\\E") -assert(escapeLikeRegex("a%") === "(?s)\\Qa\\E.*") -assert(escapeLikeRegex("**") === "(?s)\\Q*\\E\\Q*\\E") -assert(escapeLikeRegex("a_b") === "(?s)\\Qa\\E.\\Qb\\E") +assert(escapeLikeRegex("abdef", "\\") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") +assert(escapeLikeRegex("abdef", "/") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") +assert(escapeLikeRegex("abdef", "\"") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") +assert(escapeLikeRegex("a\\__b", "\\") === "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E") +assert(escapeLikeRegex("a/__b", "/") === "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E") +assert(escapeLikeRegex("a\"__b", "\"") === "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E") Review comment: nit: To make it easy to see, how about this format? ``` val expectedEscapedStr = "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E" assert(escapeLikeRegex("a\\__b", "\\") === expectedEscapedStr) assert(escapeLikeRegex("a/__b", "/") === expectedEscapedStr) assert(escapeLikeRegex("a\"__b", "\"") === expectedEscapedStr) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308054916 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ## @@ -97,7 +97,9 @@ package object dsl { case _ => In(expr, list) } -def like(other: Expression): Expression = Like(expr, other) +def like(other: Expression): Expression = Like(expr, other, None) +def like(other: Expression, escapeChar: String): Expression = + Like(expr, other, Some(escapeChar)) Review comment: ``` def like(other: Expression): Expression = Like(expr, other, None) def like(other: Expression, escapeChar: String): Expression = Like(expr, other, Some(escapeChar)) ``` -> ``` def like(other: Expression, escapeCharOption: Option[String] = None): Expression = Like(expr, other, escapeCharOption) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308054143 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: > `You can specify '' as escape character so that disables the escape mechanism...` Is this behaviour the same with the other DBMSs? > I adds some description of '' specified as escape character, we don't need it? Yeah, I feel its a corner case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308054143 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: > `You can specify '' as escape character so that disables the escape mechanism...` Is this behaviour the same with the other DBMSs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308054143 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: > `You can specify '' as escape character so that disables the escape mechanism...` Is this behaviour the same with the other DBMSs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308053733 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala ## @@ -175,7 +175,13 @@ class ExpressionParserSuite extends AnalysisTest { test("like expressions") { assertEqual("a like 'pattern%'", 'a like "pattern%") +assertEqual("a like 'pattern%' escape '#'", 'a.like("pattern%", "#")) +assertEqual("a like 'pattern%' escape '\"'", 'a.like("pattern%", "\"")) +intercept("a like 'pattern%' escape '##'", "Escape string must be empty or one character.") assertEqual("a not like 'pattern%'", !('a like "pattern%")) +assertEqual("a not like 'pattern%' escape '#'", !('a.like("pattern%", "#"))) +assertEqual("a not like 'pattern%' escape '\"'", !('a.like("pattern%", "\""))) +intercept("a not like 'pattern%' escape '\"/'", "Escape string must be empty or one character.") assertEqual("a rlike 'pattern%'", 'a rlike "pattern%") Review comment: Can you move these new assertions into a new test unit? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308052836 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -118,6 +118,295 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkLiteralRow("""%SystemDrive%\Users\John""" like _, """\%SystemDrive\%\\Users%""", true) } + test("LIKE Pattern ESCAPE '/'") { + +// null handling +checkLiteralRow(Literal.create(null, StringType).like(_, "/"), "a", null) +checkEvaluation( + Literal.create("a", StringType).like(Literal.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(Literal.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create("a", StringType), "/"), true) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create("a", StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create(null, StringType), "/"), null) + +// simple patterns +checkLiteralRow("abdef" like(_, "/"), "abdef", true) +checkLiteralRow("a_%b" like(_, "/"), "a/__b", true) +checkLiteralRow("addb" like(_, "/"), "a_%b", true) +checkLiteralRow("addb" like(_, "/"), "a/__b", false) +checkLiteralRow("addb" like(_, "/"), "a%/%b", false) +checkLiteralRow("a_%b" like(_, "/"), "a%/%b", true) +checkLiteralRow("addb" like(_, "/"), "a%", true) +checkLiteralRow("addb" like(_, "/"), "**", false) +checkLiteralRow("abc" like(_, "/"), "a%", true) +checkLiteralRow("abc" like(_, "/"), "b%", false) +checkLiteralRow("abc" like(_, "/"), "bc%", false) +checkLiteralRow("a\nb" like(_, "/"), "a_b", true) +checkLiteralRow("ab" like(_, "/"), "a%b", true) +checkLiteralRow("a\nb" like(_, "/"), "a%b", true) + +// empty input +checkLiteralRow("" like(_, "/"), "", true) +checkLiteralRow("a" like(_, "/"), "", false) +checkLiteralRow("" like(_, "/"), "a", false) + +// SI-17647 double-escaping backslash +checkLiteralRow("""""" like(_, "/"), """%//%""", true) +checkLiteralRow("""%%""" like(_, "/"), """%%""", true) +checkLiteralRow("""/__""" like(_, "/"), """///__""", true) +checkLiteralRow("""///__""" like(_, "/"), """%//%/%""", false) +checkLiteralRow("""_///%""" like(_, "/"), """%//""", false) + +// unicode +// scalastyle:off nonascii +checkLiteralRow("a\u20ACa" like(_, "/"), "_\u20AC_", true) +checkLiteralRow("a€a" like(_, "/"), "_€_", true) +checkLiteralRow("a€a" like(_, "/"), "_\u20AC_", true) +checkLiteralRow("a\u20ACa" like(_, "/"), "_€_", true) +// scalastyle:on nonascii + +// invalid escaping +val invalidEscape = intercept[AnalysisException] { + evaluateWithoutCodegen("""a""" like("""/a""", "/")) +} +assert(invalidEscape.getMessage.contains("pattern")) + +val endEscape = intercept[AnalysisException] { + evaluateWithoutCodegen("""a""" like("""a/""", "/")) +} +assert(endEscape.getMessage.contains("pattern")) + +// case +checkLiteralRow("A" like(_, "/"), "a%", false) +checkLiteralRow("a" like(_, "/"), "A%", false) +checkLiteralRow("AaA" like(_, "/"), "_a_", true) + +// example +checkLiteralRow( + """%SystemDrive%/Users/John""" like(_, "/"), """/%SystemDrive/%//Users%""", true) + } + + test("LIKE Pattern ESCAPE '#'") { + +// null handling +checkLiteralRow(Literal.create(null, StringType).like(_, "#"), "a", null) +checkEvaluation( + Literal.create("a", StringType).like(Literal.create(null, StringType), "#"), null) +checkEvaluation( + Literal.create(null, StringType).like(Literal.create(null, StringType), "#"), null) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create("a", StringType), "#"), true) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create(null, StringType), "#"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create("a", StringType), "#"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create(null, StringType), "#"), null) + +// simple patterns +checkLiteralRow("abdef" like(_, "#"), "abdef", true) +checkLiteralRow("a_%b" like(_, "#"), "a#__b", true) +checkLiteralRow("addb" like(_, "#"), "a_%b", true) +checkLiteralRow("addb" like(_, "#"), "a#__b", false) +checkLiteralRow("addb" like(_, "#"), "a%#%b", false) +checkLiteralRow("a_%b" like(_, "#"), "a%#%b", true) +checkLiteralRow("addb"
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308052738 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -118,6 +118,295 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkLiteralRow("""%SystemDrive%\Users\John""" like _, """\%SystemDrive\%\\Users%""", true) } + test("LIKE Pattern ESCAPE '/'") { + Review comment: nit: remove a blank. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308052815 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -118,6 +118,295 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkLiteralRow("""%SystemDrive%\Users\John""" like _, """\%SystemDrive\%\\Users%""", true) } + test("LIKE Pattern ESCAPE '/'") { + +// null handling +checkLiteralRow(Literal.create(null, StringType).like(_, "/"), "a", null) +checkEvaluation( + Literal.create("a", StringType).like(Literal.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(Literal.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create("a", StringType), "/"), true) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create("a", StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create(null, StringType), "/"), null) + +// simple patterns +checkLiteralRow("abdef" like(_, "/"), "abdef", true) +checkLiteralRow("a_%b" like(_, "/"), "a/__b", true) +checkLiteralRow("addb" like(_, "/"), "a_%b", true) +checkLiteralRow("addb" like(_, "/"), "a/__b", false) +checkLiteralRow("addb" like(_, "/"), "a%/%b", false) +checkLiteralRow("a_%b" like(_, "/"), "a%/%b", true) +checkLiteralRow("addb" like(_, "/"), "a%", true) +checkLiteralRow("addb" like(_, "/"), "**", false) +checkLiteralRow("abc" like(_, "/"), "a%", true) +checkLiteralRow("abc" like(_, "/"), "b%", false) +checkLiteralRow("abc" like(_, "/"), "bc%", false) +checkLiteralRow("a\nb" like(_, "/"), "a_b", true) +checkLiteralRow("ab" like(_, "/"), "a%b", true) +checkLiteralRow("a\nb" like(_, "/"), "a%b", true) + +// empty input +checkLiteralRow("" like(_, "/"), "", true) +checkLiteralRow("a" like(_, "/"), "", false) +checkLiteralRow("" like(_, "/"), "a", false) + +// SI-17647 double-escaping backslash +checkLiteralRow("""""" like(_, "/"), """%//%""", true) +checkLiteralRow("""%%""" like(_, "/"), """%%""", true) +checkLiteralRow("""/__""" like(_, "/"), """///__""", true) +checkLiteralRow("""///__""" like(_, "/"), """%//%/%""", false) +checkLiteralRow("""_///%""" like(_, "/"), """%//""", false) + +// unicode +// scalastyle:off nonascii +checkLiteralRow("a\u20ACa" like(_, "/"), "_\u20AC_", true) +checkLiteralRow("a€a" like(_, "/"), "_€_", true) +checkLiteralRow("a€a" like(_, "/"), "_\u20AC_", true) +checkLiteralRow("a\u20ACa" like(_, "/"), "_€_", true) +// scalastyle:on nonascii + +// invalid escaping +val invalidEscape = intercept[AnalysisException] { + evaluateWithoutCodegen("""a""" like("""/a""", "/")) +} +assert(invalidEscape.getMessage.contains("pattern")) + +val endEscape = intercept[AnalysisException] { + evaluateWithoutCodegen("""a""" like("""a/""", "/")) +} +assert(endEscape.getMessage.contains("pattern")) + +// case +checkLiteralRow("A" like(_, "/"), "a%", false) +checkLiteralRow("a" like(_, "/"), "A%", false) +checkLiteralRow("AaA" like(_, "/"), "_a_", true) + +// example +checkLiteralRow( + """%SystemDrive%/Users/John""" like(_, "/"), """/%SystemDrive/%//Users%""", true) + } + + test("LIKE Pattern ESCAPE '#'") { + Review comment: nit: remove a blank. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308052627 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -39,9 +39,16 @@ object StringUtils extends Logging { * throw an [[AnalysisException]]. * * @param pattern the SQL pattern to convert + * @param escapeStr the escape string contains one character. * @return the equivalent Java regular expression of the pattern */ - def escapeLikeRegex(pattern: String): String = { + def escapeLikeRegex(pattern: String, escapeStr: String): String = { +assert(escapeStr.length <= 1) +val escapeChar = if (escapeStr.length == 0) { + "" +} else { + escapeStr.charAt(0) +} Review comment: Can you move this check to `AstBuilder`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308052627 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -39,9 +39,16 @@ object StringUtils extends Logging { * throw an [[AnalysisException]]. * * @param pattern the SQL pattern to convert + * @param escapeStr the escape string contains one character. * @return the equivalent Java regular expression of the pattern */ - def escapeLikeRegex(pattern: String): String = { + def escapeLikeRegex(pattern: String, escapeStr: String): String = { +assert(escapeStr.length <= 1) +val escapeChar = if (escapeStr.length == 0) { + "" +} else { + escapeStr.charAt(0) +} Review comment: Can you move this logic to `AstBuilder`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r308052538 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. You can specify '' as escape + character so that disables the escape mechanism, which makes it impossible to turn off + the special meaning of underscore and percent signs in the pattern. Review comment: I like the previous one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r306098357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -39,27 +39,36 @@ object StringUtils extends Logging { * throw an [[AnalysisException]]. * * @param pattern the SQL pattern to convert + * @param escapeStr the escape string contains one character. * @return the equivalent Java regular expression of the pattern */ - def escapeLikeRegex(pattern: String): String = { + def escapeLikeRegex(pattern: String, escapeStr: String): String = { +val escapeChar = escapeStr.charAt(0) val in = pattern.toIterator val out = new StringBuilder() def fail(message: String) = throw new AnalysisException( s"the pattern '$pattern' is invalid, $message") while (in.hasNext) { - in.next match { -case '\\' if in.hasNext => + val cur = in.next + if (cur == escapeChar) { Review comment: You can write like this? ``` in.next match { case c if c == escapeChar && in.hasNext => ``` but, I think this rewriting has no harm. So, I leave it to other reviewers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305181626 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ## @@ -484,6 +484,8 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { """.stripMargin) .load(testFile("test-data/postgresql/tenk.data")) .createOrReplaceTempView("tenk1") + + session.sql("select * from tenk1 where stringu1 like stringu2 escape '\"'") Review comment: This is not correct and I meant we need test s in `resources/sql-tests/inputs/`. Or, to check if the parser works well for the new token `ESCAPE`, its ok to add tests in `ExpressionParserSuite`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305180881 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -1240,7 +1240,14 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case SqlBaseParser.IN => invertIfNotDefined(In(e, ctx.expression.asScala.map(expression))) case SqlBaseParser.LIKE => -invertIfNotDefined(Like(e, expression(ctx.pattern))) +val escapeOpt = Option(ctx.escapeChar).map(string).map { str => + if (str.length > 1) { +throw new ParseException("Invalid escape string." + + "Escape string must be empty or one character.", ctx) Review comment: Any test for this exception? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305178415 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -39,27 +39,36 @@ object StringUtils extends Logging { * throw an [[AnalysisException]]. * * @param pattern the SQL pattern to convert + * @param escapeStr the escape string contains one character. * @return the equivalent Java regular expression of the pattern */ - def escapeLikeRegex(pattern: String): String = { + def escapeLikeRegex(pattern: String, escapeStr: String): String = { +val escapeChar = escapeStr.charAt(0) val in = pattern.toIterator val out = new StringBuilder() def fail(message: String) = throw new AnalysisException( s"the pattern '$pattern' is invalid, $message") while (in.hasNext) { - in.next match { -case '\\' if in.hasNext => + val cur = in.next + if (cur == escapeChar) { Review comment: Any reason to rewrite this part from pattern-matching to if-then? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305172239 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -39,27 +39,36 @@ object StringUtils extends Logging { * throw an [[AnalysisException]]. * * @param pattern the SQL pattern to convert + * @param escapeStr the escape string contains one character. * @return the equivalent Java regular expression of the pattern */ - def escapeLikeRegex(pattern: String): String = { + def escapeLikeRegex(pattern: String, escapeStr: String): String = { +val escapeChar = escapeStr.charAt(0) Review comment: this might need to `assert(escapeStr.length == 1)`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305171884 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,38 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. """, examples = """ Examples: > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%' true + > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/' + true """, note = """ Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { + + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr) override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches() - override def toString: String = s"$left LIKE $right" + override def toString: String = s"$left LIKE $right" + +escapeCharOpt.map(str => s" ESCAPE $str").getOrElse("") Review comment: `str` has the same handling with line 150-156 for special chars? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r305171610 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,38 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' . If an escape character + precedes a special symbol or another escape character, the following character is matched + literally. It is invalid to escape any other character. """, examples = """ Examples: > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%' true + > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/' + true """, note = """ Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { + + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr) Review comment: nit: How about this? Then, remove the `escapeStr` variable. ``` override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeCharOpt.getOrElse("\\")) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r303892528 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -118,6 +118,154 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkLiteralRow("""%SystemDrive%\Users\John""" like _, """\%SystemDrive\%\\Users%""", true) } + test("LIKE Pattern ESCAPE '/'") { + +// null handling +checkLiteralRow(Literal.create(null, StringType).like(_, "/"), "a", null) +checkEvaluation( + Literal.create("a", StringType).like(Literal.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(Literal.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create("a", StringType), "/"), true) +checkEvaluation( + Literal.create("a", StringType).like(NonFoldableLiteral.create(null, StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create("a", StringType), "/"), null) +checkEvaluation( + Literal.create(null, StringType).like(NonFoldableLiteral.create(null, StringType), "/"), null) + +// simple patterns +checkLiteralRow("abdef" like(_, "/"), "abdef", true) +checkLiteralRow("a_%b" like(_, "/"), "a/__b", true) +checkLiteralRow("addb" like(_, "/"), "a_%b", true) +checkLiteralRow("addb" like(_, "/"), "a/__b", false) +checkLiteralRow("addb" like(_, "/"), "a%/%b", false) +checkLiteralRow("a_%b" like(_, "/"), "a%/%b", true) +checkLiteralRow("addb" like(_, "/"), "a%", true) +checkLiteralRow("addb" like(_, "/"), "**", false) +checkLiteralRow("abc" like(_, "/"), "a%", true) +checkLiteralRow("abc" like(_, "/"), "b%", false) +checkLiteralRow("abc" like(_, "/"), "bc%", false) +checkLiteralRow("a\nb" like(_, "/"), "a_b", true) +checkLiteralRow("ab" like(_, "/"), "a%b", true) +checkLiteralRow("a\nb" like(_, "/"), "a%b", true) + +// empty input +checkLiteralRow("" like(_, "/"), "", true) +checkLiteralRow("a" like(_, "/"), "", false) +checkLiteralRow("" like(_, "/"), "a", false) + +// SI-17647 double-escaping backslash +checkLiteralRow("""""" like(_, "/"), """%//%""", true) +checkLiteralRow("""%%""" like(_, "/"), """%%""", true) +checkLiteralRow("""/__""" like(_, "/"), """///__""", true) +checkLiteralRow("""///__""" like(_, "/"), """%//%/%""", false) +checkLiteralRow("""_///%""" like(_, "/"), """%//""", false) + +// unicode +// scalastyle:off nonascii +checkLiteralRow("a\u20ACa" like(_, "/"), "_\u20AC_", true) +checkLiteralRow("a€a" like(_, "/"), "_€_", true) +checkLiteralRow("a€a" like(_, "/"), "_\u20AC_", true) +checkLiteralRow("a\u20ACa" like(_, "/"), "_€_", true) +// scalastyle:on nonascii + +// invalid escaping +val invalidEscape = intercept[AnalysisException] { + evaluateWithoutCodegen("""a""" like("""/a""", "/")) +} +assert(invalidEscape.getMessage.contains("pattern")) Review comment: "pattern"? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r303878494 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' but a different one + can be selected by using the ESCAPE clause. If an escape character precedes a special + symbol or another escape character, the following character is matched literally. It is + invalid to escape any other character. """, examples = """ Examples: > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%' true + > SELECT '%SystemDrive%/Users/John' _FUNC_ '/%SystemDrive/%//Users%' ESCAPE '/' + true """, note = """ Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { + + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr) override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches() - override def toString: String = s"$left LIKE $right" + override def toString: String = s"$left LIKE $right" + +escapeCharOpt.map(str => s" ESCAPE $str") Review comment: ``` override def toString: String = s"$left LIKE $right" + escapeCharOpt.map(str => s" ESCAPE $str") ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r303890645 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -142,10 +148,11 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi } else { val pattern = ctx.freshName("pattern") val rightStr = ctx.freshName("rightStr") + val escapeChar = escapeCharOpt.getOrElse("""""") Review comment: Have you checked if the other meta characters (e.g., ") work correctly? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r303876036 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -70,8 +70,8 @@ abstract class StringRegexExpression extends BinaryExpression * Simple RegEx pattern matching function */ @ExpressionDescription( - usage = "str _FUNC_ pattern - Returns true if str matches pattern, " + -"null if any arguments are null, false otherwise.", + usage = "str _FUNC_ pattern[ escape] - Returns true if str matches `pattern` with `escape`" + Review comment: `[ escape]` -> `[ ESCAPE escape]`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r303877847 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,33 +83,39 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. - Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a optional string. The default escape character is the '\' but a different one + can be selected by using the ESCAPE clause. If an escape character precedes a special + symbol or another escape character, the following character is matched literally. It is + invalid to escape any other character. Review comment: How about this? ``` * escape - an optional string. The default escape character is the '\' . If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302360877 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -103,13 +103,16 @@ abstract class StringRegexExpression extends BinaryExpression Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { + + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr) override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches() - override def toString: String = s"$left LIKE $right" + override def toString: String = s"$left LIKE $right ESCAPE $escapeStr" Review comment: This is an optional feature, so I think its better to hide this if escapeCharOpt = None. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302302687 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -70,7 +70,7 @@ abstract class StringRegexExpression extends BinaryExpression * Simple RegEx pattern matching function */ @ExpressionDescription( - usage = "str _FUNC_ pattern - Returns true if str matches pattern, " + + usage = "str _FUNC_ pattern escape - Returns true if str matches pattern with escape, " + Review comment: Is `escape` optional? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302307011 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -51,71 +51,123 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // null handling checkLiteralRow(Literal.create(null, StringType).like(_), "a", null) +checkLiteralRow(Literal.create(null, StringType).like(_, "/"), "a", null) Review comment: plz drop these tests and add a new test case for various escape characters? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302304973 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -70,7 +70,7 @@ abstract class StringRegexExpression extends BinaryExpression * Simple RegEx pattern matching function */ @ExpressionDescription( - usage = "str _FUNC_ pattern - Returns true if str matches pattern, " + + usage = "str _FUNC_ pattern escape - Returns true if str matches pattern with escape, " + Review comment: Also, you need to add an example below... (could you check carefully again by referencing the other examples...?) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302305938 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -142,10 +145,11 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi } else { val pattern = ctx.freshName("pattern") val rightStr = ctx.freshName("rightStr") + val escapeChar = escapeCharOpt.getOrElse("") Review comment: plz use `"""` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302306449 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -103,13 +106,16 @@ abstract class StringRegexExpression extends BinaryExpression Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") Review comment: duplicated in the line 148? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r302305300 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -103,13 +103,16 @@ abstract class StringRegexExpression extends BinaryExpression Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { + + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + override def escape(v: String): String = StringUtils.escapeLikeRegex(v, escapeStr) override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches() - override def toString: String = s"$left LIKE $right" + override def toString: String = s"$left LIKE $right ESCAPE $escapeStr" Review comment: `escapeStr` always printed if `escapeCharOpt` = None? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851495 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -103,13 +106,16 @@ abstract class StringRegexExpression extends BinaryExpression Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) + extends StringRegexExpression { - override def escape(v: String): String = StringUtils.escapeLikeRegex(v) + private lazy val escapeStr = escapeCharOpt.getOrElse("\\") Review comment: we need this lazy? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851663 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala ## @@ -51,71 +51,123 @@ class RegexpExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { // null handling checkLiteralRow(Literal.create(null, StringType).like(_), "a", null) +checkLiteralRow(Literal.create(null, StringType).like(_, "/"), "a", null) Review comment: Why we need these tests? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851464 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,16 +83,19 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. + The default escape character is '\'. If an escape character precedes a special symbol + or another escape character, the following character is matched literally. It is + invalid to escape any other character. Review comment: How about just moving this section into the description for `escape`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300850906 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ## @@ -97,7 +97,9 @@ package object dsl { case _ => In(expr, list) } -def like(other: Expression): Expression = Like(expr, other) +def like(other: Expression): Expression = Like(expr, other, None) +def like(other: Expression, escapeChar: String): Expression = + Like(expr, other, Option(escapeChar)) Review comment: Option->Some This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851598 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -1240,7 +1240,15 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case SqlBaseParser.IN => invertIfNotDefined(In(e, ctx.expression.asScala.map(expression))) case SqlBaseParser.LIKE => -invertIfNotDefined(Like(e, expression(ctx.pattern))) +val escapeOpt = Option(ctx.escapeChar).map{ str => + if (str.getText.length() > 1) { Review comment: nit: length() -> length This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851675 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtilsSuite.scala ## @@ -23,13 +23,20 @@ import org.apache.spark.sql.catalyst.util.StringUtils._ class StringUtilsSuite extends SparkFunSuite { test("escapeLikeRegex") { -assert(escapeLikeRegex("abdef") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") -assert(escapeLikeRegex("a\\__b") === "(?s)\\Qa\\E\\Q_\\E.\\Qb\\E") -assert(escapeLikeRegex("a_%b") === "(?s)\\Qa\\E..*\\Qb\\E") -assert(escapeLikeRegex("a%\\%b") === "(?s)\\Qa\\E.*\\Q%\\E\\Qb\\E") -assert(escapeLikeRegex("a%") === "(?s)\\Qa\\E.*") -assert(escapeLikeRegex("**") === "(?s)\\Q*\\E\\Q*\\E") -assert(escapeLikeRegex("a_b") === "(?s)\\Qa\\E.\\Qb\\E") +assert(escapeLikeRegex("abdef", "\\") === "(?s)\\Qa\\E\\Qb\\E\\Qd\\E\\Qe\\E\\Qf\\E") Review comment: Could you add a new test case instead of adding them in the existing one? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851549 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -1240,7 +1240,15 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case SqlBaseParser.IN => invertIfNotDefined(In(e, ctx.expression.asScala.map(expression))) case SqlBaseParser.LIKE => -invertIfNotDefined(Like(e, expression(ctx.pattern))) +val escapeOpt = Option(ctx.escapeChar).map{ str => Review comment: nit: `.map { str =>` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300851471 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,16 +83,19 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. + The default escape character is '\'. If an escape character precedes a special symbol + or another escape character, the following character is matched literally. It is + invalid to escape any other character. Review comment: Also update the `usage` section. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300519376 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala ## @@ -65,6 +65,38 @@ object StringUtils extends Logging { "(?s)" + out.result() // (?s) enables dotall mode, causing "." to match new lines } + def escapeLikeRegex(pattern: String, escapeStr: String): String = { Review comment: Why we need the two version of `escapeLikeRegex`? Can we merge them? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300518433 ## File path: sql/core/src/test/resources/sql-tests/results/typeCoercion/native/implicitTypeCasts.sql.out ## @@ -317,38 +317,38 @@ struct -- !query 39 select 1 like '%' FROM t -- !query 39 schema -struct +struct Review comment: Could you remove this change by overriding some functions, e.g., simpleString? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300517662 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -83,16 +83,19 @@ abstract class StringRegexExpression extends BinaryExpression % matches zero or more characters in the input (similar to .* in posix regular expressions) - The escape character is '\'. If an escape character precedes a special symbol or another - escape character, the following character is matched literally. It is invalid to escape - any other character. + The default escape character is '\'. If an escape character precedes a special symbol + or another escape character, the following character is matched literally. It is + invalid to escape any other character. Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc". When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc". + * escape - a string expression. The default escape character is the '\' but a different one Review comment: a string expression? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300516911 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -662,7 +662,8 @@ predicate : NOT? kind=BETWEEN lower=valueExpression AND upper=valueExpression | NOT? kind=IN '(' expression (',' expression)* ')' | NOT? kind=IN '(' query ')' -| NOT? kind=(RLIKE | LIKE) pattern=valueExpression +| NOT? kind=RLIKE pattern=valueExpression +| NOT? kind=LIKE pattern=valueExpression (ESCAPE escapeChar=valueExpression)? Review comment: `valueExpression` -> `STRING`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300517831 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -103,13 +106,20 @@ abstract class StringRegexExpression extends BinaryExpression Use RLIKE to match with standard regular expressions. """, since = "1.0.0") -case class Like(left: Expression, right: Expression) extends StringRegexExpression { +case class Like( + left: Expression, + right: Expression, + escapeCharOpt: Option[String] = None) extends StringRegexExpression { Review comment: ``` case class Like(left: Expression, right: Expression, escapeCharOpt: Option[String] = None) extends StringRegexExpression { ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300517573 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -1240,7 +1240,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case SqlBaseParser.IN => invertIfNotDefined(In(e, ctx.expression.asScala.map(expression))) case SqlBaseParser.LIKE => -invertIfNotDefined(Like(e, expression(ctx.pattern))) +val escapeOpt = Option(ctx.escapeChar).map(_.getText) +val like = Like(e, expression(ctx.pattern), escapeOpt) +invertIfNotDefined(like) Review comment: Needs to check if `escapeOpt` has a valid char for an escape ? In PostgreSQL; ``` postgres=# select * from t where a like 'a%c' escape 'invalid'; ERROR: invalid escape string HINT: Escape string must be empty or one character. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300519634 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ## @@ -822,6 +822,16 @@ class Column(val expr: Expression) extends Logging { */ def like(literal: String): Column = withExpr { Like(expr, lit(literal).expr) } + /** + * SQL like expression. Returns a boolean column based on a SQL LIKE ESCAPE match. + * + * @group expr_ops + * @since 3.0.0 + */ + def like(literal: String, escapeStr: String): Column = withExpr { +Like(expr, lit(literal).expr, Option(escapeStr)) + } + Review comment: I'm neutral on adding this. Is that enough to just use the SQL interface for this escape? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300519115 ## File path: docs/sql-keywords.md ## @@ -103,6 +103,7 @@ Below is a list of all the keywords in Spark SQL. DROPnon-reservednon-reservedreserved ELSEreservednon-reservedreserved ENDreservednon-reservedreserved + ESCAPEnon-reservednon-reservedreserved Review comment: Ur, IMHO this should be reserved even in spark. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300204709 ## File path: docs/sql-keywords.md ## @@ -103,6 +103,7 @@ Below is a list of all the keywords in Spark SQL. DROPnon-reservednon-reservedreserved ELSEreservednon-reservedreserved ENDreservednon-reservedreserved + ESCAPEnon-reservednon-reservedreserved Review comment: Also, update `TableIdentifierParserSuite`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax
maropu commented on a change in pull request #25001: [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax URL: https://github.com/apache/spark/pull/25001#discussion_r300204451 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala ## @@ -65,6 +65,59 @@ abstract class StringRegexExpression extends BinaryExpression override def sql: String = s"${left.sql} ${prettyName.toUpperCase(Locale.ROOT)} ${right.sql}" } +abstract class StringRegexV2Expression extends TernaryExpression Review comment: like this? https://github.com/apache/spark/compare/master...maropu:SPARK-28083 ``` /* 066 */ String filter_rightStr_0 = scan_value_1.toString(); /* 067 */ java.util.regex.Pattern filter_pattern_0 = java.util.regex.Pattern.compile(org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(filter_rightStr_0, "\")); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org