Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/8391#discussion_r39345066
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala
---
@@ -275,6 +275,10 @@ private[sql] class JDBCRDD(
*/
private def compileFilter(f: Filter): String = f match {
case EqualTo(attr, value) => s"$attr = ${compileValue(value)}"
+ // Since the null-safe equality operator is not a standard SQL
operator,
+ // This was written as using is-null and normal equality.
+ case EqualNullSafe(attr, value) =>
+ s"($attr = ${compileValue(value)} OR ($attr IS NULL AND
${compileValue(value)} IS NULL))"
--- End diff --
Yes, it looks so..
If SparkSQL creates the SQL (for datasources) like the below, it can be a
problem..
```
SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
-> 1, 1, 0
SELECT 1 = 1, NULL = NULL, 1 = NULL;
-> 1, NULL, NULL
```
However, as I see the codes, I think the `compileFilter()` is only used to
construct WHERE clause.
I tested the expression at MySQL and they look ok.
In details,
I tested four cases at MySQL.
1. attr is a column, value is null.
2. attr is a column, value is not null.
3. attr is null, value is null.
4. attr is null, value is not null.
- `CREATE`
```SQL
CREATE TABLE TestTable
(`id` int, `test` varchar(7))
;
```
- `INSERT`
```SQL
INSERT INTO TestTable
(`id`, `test`)
VALUES
(1, null),
(2, 'OpenAM'),
(3, 'OpenDJ')
;
```
- `SELECT`
```SQL
/*1. attr is a column, value is null. */
SELECT
*
FROM
TestTable
WHERE
(test = NULL OR (test IS NULL AND NULL IS NULL));
SELECT
*
FROM
TestTable
WHERE
test <=> NULL;
```
```SQL
/*2. attr is a column, value is not null.*/
SELECT
*
FROM
TestTable
WHERE
(test = 'OpenAM' OR (test IS NULL AND 'OpenAM' IS NULL));
SELECT
*
FROM
TestTable
WHERE
test <=> 'OpenAM';
```
```SQL
/*3. attr is null, value is null.*/
SELECT
*
FROM
TestTable
WHERE
(NULL = NULL OR (NULL IS NULL AND NULL IS NULL));
SELECT
*
FROM
TestTable
WHERE
NULL <=> NULL;
```
```SQL
/*4. attr is null, value is not null.*/
SELECT
*
FROM
TestTable
WHERE
(NULL = 'OpenAM' OR (NULL IS NULL AND 'OpenAM' IS NULL));
SELECT
*
FROM
TestTable
WHERE
NULL <=> 'OpenAM';
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]