spark git commit: [SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior
Repository: spark
Updated Branches:
  refs/heads/master 04901dd03 -> 609ba5f2b


[SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior

## What changes were proposed in this pull request?

The new SQL parser was introduced in Spark 2.0. All string literals are unescaped in the parser. This seems to cause an issue with regex pattern strings.

The following code reproduces it:

    val data = Seq("\u0020\u0021\u0023", "abc")
    val df = data.toDF()

    // 1st usage: works in 1.6
    // Let parser parse pattern string
    val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")
    // 2nd usage: works in 1.6, 2.x
    // Call Column.rlike so the pattern string is a literal which doesn't go through parser
    val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

    // In 2.x, we need to add backslashes so the regex pattern is parsed correctly
    val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")

Following the discussion in #17736, this patch adds a config to fall back to 1.6 string literal parsing and mitigate the migration issue.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh

Closes #17887 from viirya/add-config-fallback-string-parsing.
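The escaping problem described above can be reproduced without a Spark session. The following is only a rough sketch: `unescape`, `patternSeenByRegex`, and the `escapedStringLiterals` flag are hypothetical stand-ins written for illustration, not the parser code touched by this commit, and the unescaping rule is deliberately simplified (a backslash followed by any character yields that character).

```scala
import java.util.regex.Pattern

object StringLiteralEscapingSketch {
  // Hypothetical, simplified model of the parser's unescaping step:
  // a backslash followed by any character is replaced by that character.
  // The real parser recognises more escape sequences than this.
  def unescape(literal: String): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < literal.length) {
      val c = literal.charAt(i)
      if (c == '\\' && i + 1 < literal.length) {
        sb.append(literal.charAt(i + 1))
        i += 2
      } else {
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }

  // Assumption: when the flag is true the literal reaches the regex engine
  // unchanged (1.6-style); when false it is unescaped first (2.x default).
  def patternSeenByRegex(sqlLiteral: String, escapedStringLiterals: Boolean): String =
    if (escapedStringLiterals) sqlLiteral else unescape(sqlLiteral)

  // Partial match, in the spirit of rlike.
  def matches(pattern: String, input: String): Boolean =
    Pattern.compile(pattern).matcher(input).find()

  // Space, '!' and '#': the code points U+0020, U+0021, U+0023 from the commit message.
  val input: String = " !#"

  def main(args: Array[String]): Unit = {
    // Single-escaped literal, unescaped by the parser: the hex escapes are lost.
    println(matches(patternSeenByRegex("^\\x20[\\x20-\\x23]+$", escapedStringLiterals = false), input))
    // Double-escaped literal, unescaped by the parser: the hex escapes survive.
    println(matches(patternSeenByRegex("^\\\\x20[\\\\x20-\\\\x23]+$", escapedStringLiterals = false), input))
    // Single-escaped literal with the fallback flag: passed through as written.
    println(matches(patternSeenByRegex("^\\x20[\\x20-\\x23]+$", escapedStringLiterals = true), input))
  }
}
```

Under these assumptions the first call prints `false` and the other two print `true`, which mirrors the `rlike1` / `rlike3` difference in the commit message and the intent of the fallback config.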
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/609ba5f2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/609ba5f2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/609ba5f2

Branch: refs/heads/master
Commit: 609ba5f2b9fd89b1b9971d08f7cc680d202dbc7c
Parents: 04901dd
Author: Liang-Chi Hsieh
Authored: Fri May 12 11:15:10 2017 +0800
Committer: Wenchen Fan
Committed: Fri May 12 11:15:10 2017 +0800

----------------------------------------------------------------------
 .../sql/catalyst/catalog/SessionCatalog.scala   |   2 +-
 .../expressions/regexpExpressions.scala         |  33 -
 .../spark/sql/catalyst/parser/AstBuilder.scala  |  11 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala |   8 +-
 .../spark/sql/catalyst/parser/ParserUtils.scala |   6 +
 .../org/apache/spark/sql/internal/SQLConf.scala |  10 ++
 .../catalyst/parser/ExpressionParserSuite.scala | 128 +--
 .../spark/sql/execution/SparkSqlParser.scala    |   2 +-
 .../org/apache/spark/sql/DatasetSuite.scala     |  13 ++
 9 files changed, 171 insertions(+), 42 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/609ba5f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 18e5146..f6653d3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -73,7 +73,7 @@ class SessionCatalog(
       functionRegistry,
       conf,
       new Configuration(),
-      CatalystSqlParser,
+      new CatalystSqlParser(conf),
       DummyFunctionResourceLoader)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/609ba5f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
index 3fa8458..aa5a1b5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
@@ -86,6 +86,13 @@ abstract class StringRegexExpression extends BinaryExpression
         escape character, the following character is matched literally. It is invalid to escape
         any other character.
 
+        Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
+        to match "\abc", the pattern should be "\\abc".
+
+        When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
+        to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
+        enabled, the pattern to match "\abc" should be "\abc".
+
     Examples:
       > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'
       true
@@ -144,7 +151,31 @@ case class Like(left: Expression, right: Expression) extends StringRegexExpressi
 }
 
 @ExpressionDescription(
-  usage = "str _FUNC_ regexp - Returns true if `str` matches `
spark git commit: [SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 5844151bc -> 3d1908fd5


[SPARK-20399][SQL] Add a config to fallback string literal parsing consistent with old sql parser behavior

## What changes were proposed in this pull request?

The new SQL parser was introduced in Spark 2.0. All string literals are unescaped in the parser. This seems to cause an issue with regex pattern strings.

The following code reproduces it:

    val data = Seq("\u0020\u0021\u0023", "abc")
    val df = data.toDF()

    // 1st usage: works in 1.6
    // Let parser parse pattern string
    val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")
    // 2nd usage: works in 1.6, 2.x
    // Call Column.rlike so the pattern string is a literal which doesn't go through parser
    val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

    // In 2.x, we need to add backslashes so the regex pattern is parsed correctly
    val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")

Following the discussion in #17736, this patch adds a config to fall back to 1.6 string literal parsing and mitigate the migration issue.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh

Closes #17887 from viirya/add-config-fallback-string-parsing.

(cherry picked from commit 609ba5f2b9fd89b1b9971d08f7cc680d202dbc7c)
Signed-off-by: Wenchen Fan


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d1908fd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d1908fd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d1908fd

Branch: refs/heads/branch-2.2
Commit: 3d1908fd58fd9b1970cbffebdb731bfe4c776ad9
Parents: 5844151
Author: Liang-Chi Hsieh
Authored: Fri May 12 11:15:10 2017 +0800
Committer: Wenchen Fan
Committed: Fri May 12 11:15:26 2017 +0800

----------------------------------------------------------------------
 .../sql/catalyst/catalog/SessionCatalog.scala   |   2 +-
 .../expressions/regexpExpressions.scala         |  33 -
 .../spark/sql/catalyst/parser/AstBuilder.scala  |  11 +-
 .../spark/sql/catalyst/parser/ParseDriver.scala |   8 +-
 .../spark/sql/catalyst/parser/ParserUtils.scala |   6 +
 .../org/apache/spark/sql/internal/SQLConf.scala |  10 ++
 .../catalyst/parser/ExpressionParserSuite.scala | 128 +--
 .../spark/sql/execution/SparkSqlParser.scala    |   2 +-
 .../org/apache/spark/sql/DatasetSuite.scala     |  13 ++
 9 files changed, 171 insertions(+), 42 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3d1908fd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 18e5146..f6653d3 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -73,7 +73,7 @@ class SessionCatalog(
       functionRegistry,
       conf,
       new Configuration(),
-      CatalystSqlParser,
+      new CatalystSqlParser(conf),
       DummyFunctionResourceLoader)
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3d1908fd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
index 3fa8458..aa5a1b5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala
@@ -86,6 +86,13 @@ abstract class StringRegexExpression extends BinaryExpression
         escape character, the following character is matched literally. It is invalid to escape
         any other character.
 
+        Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
+        to match "\abc", the pattern should be "\\abc".
+
+        When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
+        to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
+        enabled, the pattern to match "\abc" should be "\abc".
+
     Examples:
       > SELECT '%SystemDrive%\Users\John' _FUNC_ '\%SystemDrive\%\\Users%'
       true
@@ -144,7 +151,31 @@ case class Like(left: Expression, right: Expression) extends StringR