JoshRosen commented on code in PR #43801:
URL: https://github.com/apache/spark/pull/43801#discussion_r1400156026
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:
##########
@@ -238,21 +261,21 @@ protected String visitStartsWith(String l, String r) {
// Remove quotes at the beginning and end.
// e.g. converts "'str'" to "str".
String value = r.substring(1, r.length() - 1);
- return l + " LIKE '" + value + "%'";
+ return l + " LIKE '" + escapeSpecialChar(value) + "%'";
Review Comment:
We might need to add an `ESCAPE "\"` to explicitly specify the escape
character to use in the `LIKE` because I don't know that all databases have a
default escape character.
For example, from [Oracle
docs](https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-0779657B-06A8-441F-90C5-044B47862A0A):
> If `esc_char` is not specified, then there is no default escape character.
And
[Snowflake](https://docs.snowflake.com/en/sql-reference/functions/like#usage-notes):
> There is no default escape character.
> If you use the backslash as an escape character, then you must specify
escape the backslash in the ESCAPE clause. For example, the following command
specifies that the escape character is the backslash, and then uses that escape
character to search for ‘%’ as a literal (without the escape character, the ‘%’
would be treated as a wildcard):
> `'SOMETHING%' LIKE '%\\%%' ESCAPE '\\';`
[T-SQL](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql)
says that it supports `LIKE` but that
> ESCAPE and STRING_ESCAPE are not supported in Azure Synapse Analytics or
Analytics Platform System (PDW).
and also seems to
[suggest](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver16#pattern-match-with-the-escape-clause)
that `ESCAPE` must always be specified when using escape characters:
> To search for the percent sign as a character instead of as a wildcard
character, the ESCAPE keyword and escape character must be provided
---
Given all of this, I am wondering whether we might need to allow the
pushdown implementation to vary on a per-SQL-dialect basis, or, alternatively,
have some way to skip pushdown of these expressions in cases where they need to
be escaped but the underlying database does not support the `ESCAPE` clause.
##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:
##########
@@ -238,21 +261,21 @@ protected String visitStartsWith(String l, String r) {
// Remove quotes at the beginning and end.
// e.g. converts "'str'" to "str".
String value = r.substring(1, r.length() - 1);
- return l + " LIKE '" + value + "%'";
+ return l + " LIKE '" + escapeSpecialChar(value) + "%'";
Review Comment:
We might need to add an `ESCAPE "\"` to explicitly specify the escape
character to use in the `LIKE` because I don't know that all databases have a
default escape character.
For example, from [Oracle
docs](https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-0779657B-06A8-441F-90C5-044B47862A0A):
> If `esc_char` is not specified, then there is no default escape character.
And
[Snowflake](https://docs.snowflake.com/en/sql-reference/functions/like#usage-notes):
> There is no default escape character.
> If you use the backslash as an escape character, then you must specify
escape the backslash in the ESCAPE clause. For example, the following command
specifies that the escape character is the backslash, and then uses that escape
character to search for ‘%’ as a literal (without the escape character, the ‘%’
would be treated as a wildcard):
> `'SOMETHING%' LIKE '%\\%%' ESCAPE '\\';`
[T-SQL](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql)
says that it supports `LIKE` but that
> ESCAPE and STRING_ESCAPE are not supported in Azure Synapse Analytics or
Analytics Platform System (PDW).
and also seems to
[suggest](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver16#pattern-match-with-the-escape-clause)
that `ESCAPE` must always be specified when using escape characters:
> To search for the percent sign as a character instead of as a wildcard
character, the ESCAPE keyword and escape character must be provided
---
Given all of this, I am wondering whether we might need to allow the
pushdown implementation to vary on a per-SQL-dialect basis, or, alternatively,
have some way to skip pushdown of these expressions in cases where they need to
be escaped but the underlying database does not support the `ESCAPE` clause.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]