cloud-fan commented on code in PR #43203:
URL: https://github.com/apache/spark/pull/43203#discussion_r1347515797
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala:
##########
@@ -92,11 +92,14 @@ abstract class StringRegexExpression extends BinaryExpression
      _ matches any one character in the input (similar to . in posix regular expressions)\
      % matches zero or more characters in the input (similar to .* in posix regular expressions)<br><br>
-    Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
-    to match "\abc", the pattern should be "\\abc".<br><br>
+    Since Spark 2.0, string literals are unescaped in our SQL parser, see the unescaping
+    rules at <a href="https://spark.apache.org/docs/latest/sql-ref-literals.html#string-literal">String Literal</a>.
+    For example, in order to match "\abc", the pattern should be "\\abc".<br><br>
      When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back
      to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
-    enabled, the pattern to match "\abc" should be "\abc".
+    enabled, the pattern to match "\abc" should be "\abc".<br><br>
+    The `pattern` argument might be a raw string literal (with the `r` prefix) to avoid
Review Comment:
I think we should be more strongly opinionated here:
```
It's recommended to use a raw string literal (with the `r` prefix) to avoid
escaping special characters in the pattern string, if any.
```
Then we can add some examples that use raw string literals (the behavior should be the same as turning off `escapedStringLiterals`).
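
For illustration, the escaping pitfall that the raw-string recommendation addresses can be sketched in Python, whose `r"..."` literals behave analogously to the `r'...'` SQL syntax under discussion (this is an analogy, not Spark's implementation):

```python
import re

# Raw string literals are not processed for escape sequences by the parser.
plain = "\\abc"   # the parser unescapes \\ into a single backslash
raw = r"\abc"     # raw literal: the backslash is kept exactly as written
assert plain == raw

# To match a literal backslash with a regex pattern, a plain string needs
# four backslashes (parser halves them, then the regex engine halves again);
# a raw string needs only two.
assert re.match("\\\\abc", "\\abc")
assert re.match(r"\\abc", "\\abc")
```

This mirrors the docstring's point: with default parsing, matching "\abc" requires doubling every backslash in the pattern, whereas a raw string literal keeps the pattern readable.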
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]