GitHub user bomeng opened a pull request:
https://github.com/apache/spark/pull/11789
[SPARK-13858] [SQL] fix the data type cast issue
## What changes were proposed in this pull request?
I believe this is a critical bug in Spark's cast handling. It was found
while running the TPC tests.
Use case:
A user table (T) has one column (COL) of type FLOAT, and it contains the
tuple (1.49f). The queries are:
```
select COL from T where COL = 1.49;
-- or
select COL from T where COL between 0 and 1.49;
```
Spark returns no rows for either query. The reason can be reproduced in
the Scala shell:
```
val a = 1.49f   // a: Float = 1.49
val b = 1.49    // b: Double = 1.49
val c = a == b  // c: Boolean = false
val d = a <= b  // d: Boolean = false
```
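A minimal sketch of why the comparison fails, using only plain Scala (no Spark involved). The names `widened` and `gap` are illustrative and do not come from the patch. The Float closest to 1.49 is not the same binary value as the Double closest to 1.49, and widening the Float to a Double preserves that difference rather than removing it.

```scala
// The Float literal is rounded to the nearest 32-bit value,
// the Double literal to the nearest 64-bit value.
val f: Float = 1.49f
val d: Double = 1.49

// Widening is exact: it keeps the Float's rounding error unchanged.
val widened: Double = f.toDouble

// The difference between the two approximations of 1.49.
val gap: Double = widened - d

println(widened)        // a value slightly above 1.49
println(widened == d)   // false: this is what "COL = 1.49" evaluates
println(widened <= d)   // false: this is the upper bound of "between 0 and 1.49"
```

Because the widened value lands slightly above the Double literal, both the equality test and the upper-bound test reject the row.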
As shown above, casting a FLOAT to DOUBLE does not guarantee that the
comparison remains correct. Many queries use =, <=, or BETWEEN, so this can
noticeably affect the correctness of query results.
The solution is to use Decimal for the comparison. Since
DecimalType.forType() is already in place, we can simply leverage that
function.
1.49 is the value I found that triggers this cast issue; there may be more.
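The idea of comparing as decimals can be illustrated outside Spark with the JVM's `java.math.BigDecimal`. This is only a sketch under stated assumptions: `decimalCompare` is a hypothetical helper, not code from the patch, and it builds each decimal from the value's shortest string form. Whether Spark's own Decimal conversion behaves the same way is not confirmed here.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical helper (not part of Spark): compare a Float and a Double
// through their shortest decimal string forms instead of widening.
// Note: NaN and infinities have no decimal form and would throw here.
def decimalCompare(f: Float, d: Double): Int = {
  val left = new JBigDecimal(java.lang.Float.toString(f))    // "1.49" for 1.49f
  val right = new JBigDecimal(java.lang.Double.toString(d))  // "1.49" for 1.49
  left.compareTo(right)
}

// COL = 1.49
val equalAsDecimal: Boolean = decimalCompare(1.49f, 1.49) == 0

// COL between 0 and 1.49
val inRange: Boolean =
  decimalCompare(1.49f, 0.0) >= 0 && decimalCompare(1.49f, 1.49) <= 0

println(equalAsDecimal) // true
println(inRange)        // true
```

Under this approach both predicates from the use case accept the row, since the two values denote the same decimal number 1.49.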
## How was this patch tested?
I ran the full SQL tests, updated the expected results for one test suite,
and added a new test case for this scenario.
All tests currently pass.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bomeng/spark SPARK-13858
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11789.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11789
----
commit d315fac89655a89febb44d206f896f87da4e43d2
Author: bomeng <[email protected]>
Date: 2016-03-17T17:48:55Z
fix the data type cast issue
commit 83d8d9deef85fed16ddfa015c60650bc34debc00
Author: bomeng <[email protected]>
Date: 2016-03-17T18:05:45Z
update the comments
----