[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-19 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536117#comment-13536117
 ] 

Knut Anders Hatlen commented on DERBY-6017:
---

Thanks, Bryan.

> Am I right in conceptualizing this as "we need to be implicitly casting the
> values to the correct type in certain situations, and we're currently not doing so."?

For the values in the IN list, I think that's the right way to conceptualize it.

For the simple comparison operations (for example a predicate such as 
9223372036854775805 = 9.223372036854776E18) I'm not so sure. But that's the 
part of the problem that I suggested we didn't focus on in this issue.

I think it's a good idea to make our docs say that equality comparisons 
involving floating point values may have surprising results.

> IN lists with constants may return wrong results
> 
>
> Key: DERBY-6017
> URL: https://issues.apache.org/jira/browse/DERBY-6017
> Project: Derby
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 10.9.1.0
>Reporter: Knut Anders Hatlen
>Assignee: Knut Anders Hatlen
>
> Given this table:
> ij> connect 'jdbc:derby:memory:db;create=true';
> ij> create table t(x bigint);
> 0 rows inserted/updated/deleted
> ij> insert into t values 9223372036854775805, 9223372036854775806, 
> 9223372036854775807;
> 3 rows inserted/updated/deleted
> A query that uses an IN list that contains all the three values actually 
> stored in the table, returns all three rows as expected:
> ij> select * from t where x in (9223372036854775805, 9223372036854775806, 
> 9223372036854775807);
> X   
> 
> 9223372036854775805 
> 9223372036854775806 
> 9223372036854775807 
> 3 rows selected
> However, if we add a value whose type precedence is higher, like a DOUBLE 
> value, and that value happens to be equal to the approximation of the other 
> values in the IN list when they are cast from BIGINT to DOUBLE, only one row 
> is returned:
> ij> select * from t where x in (9223372036854775805, 9223372036854775806, 
> 9223372036854775807, 9.223372036854776E18);
> X   
> 
> 9223372036854775805 
> 1 row selected
> I believe this query should return all three rows too.
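The collision described in the issue is easy to reproduce outside the database. The sketch below uses plain Java long-to-double conversion, which is assumed here to round the same way as Derby's BIGINT-to-DOUBLE cast. It is not Derby code, and the class and method names are made up for illustration:

```java
class BigintToDouble {
    /** Widens a BIGINT (modelled as long) to DOUBLE and compares it with a DOUBLE value. */
    static boolean equalsAsDouble(long x, double d) {
        // Widening long -> double rounds to the nearest representable double.
        return (double) x == d;
    }

    public static void main(String[] args) {
        long[] stored = {9223372036854775805L, 9223372036854775806L, 9223372036854775807L};
        double inListDouble = 9.223372036854776E18;
        for (long x : stored) {
            // Just below 2^63 adjacent doubles are 1024 apart, so all three values round to 2^63.
            System.out.println(x + " -> " + (double) x + ", equal to the DOUBLE value: "
                    + equalsAsDouble(x, inListDouble));
        }
        // Compared as BIGINTs the values are distinct: prints -1.
        System.out.println(Long.compare(stored[0], stored[1]));
    }
}
```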

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-19 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536050#comment-13536050
 ] 

Bryan Pendleton commented on DERBY-6017:


A little bit of quick searching confirms my suspicion that most databases have this problem, and that comparison of floating point values can lead to confusing results:

http://stackoverflow.com/questions/2567434/mysql-floating-point-comparison-issues

http://yongjun-jiao.blogspot.com/2011/11/floating-point-number-equality.html

https://kb.askmonty.org/en/numeric-operations/

It might be worth, as a related issue or sub-task, considering ways to improve our documentation in this area so that we guide users toward safer ways of comparing floating point values, e.g., some form of:

WHERE ABS(float_column - other_value) < small_epsilon

with an appropriate suggestion for what small_epsilon should be.
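As an illustration of what such documentation might show, here is a minimal Java sketch of the absolute form above next to a relative-tolerance variant whose tolerance scales with the operands, which matters for values as large as the ones in this issue. The method names and tolerance values are hypothetical choices, not a Derby API and not a recommendation for a specific small_epsilon:

```java
class ApproxEquals {
    /** Absolute tolerance: the Java analogue of ABS(a - b) < epsilon. */
    static boolean absClose(double a, double b, double epsilon) {
        return Math.abs(a - b) < epsilon;
    }

    /** Relative tolerance: the allowed difference grows with the magnitude of the operands. */
    static boolean relClose(double a, double b, double relEpsilon) {
        return Math.abs(a - b) <= relEpsilon * Math.max(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                // false: binary rounding error
        System.out.println(absClose(sum, 0.3, 1e-9));  // true
        System.out.println(relClose(sum, 0.3, 1e-12)); // true

        double big = 9.223372036854776E18;             // 2^63
        double next = Math.nextUp(big);                // neighbouring doubles above 2^63 are 2048 apart
        System.out.println(absClose(big, next, 1e-9));  // false: a tiny absolute epsilon is useless here
        System.out.println(relClose(big, next, 1e-12)); // true
    }
}
```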




[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-19 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536036#comment-13536036
 ] 

Bryan Pendleton commented on DERBY-6017:


Thank you, Knut Anders, for digging deep into the standard to explore these topics.

I see no holes in your logic; it seems unavoidable that (a) these queries are intended to be clearly defined by the standard, and (b) Derby is doing it wrong. I can *wish* that the standard were written differently, but it ain't so... :)

I think your proposed approach is excellent.

Hopefully the fact that we currently seem to behave correctly in the ANY queries and in the table value constructor (VALUES ...) gives some clues about what needs to be included in the other queries to give them the right form.

Am I right in conceptualizing this as "we need to be implicitly casting the values to the correct type in certain situations, and we're currently not doing so."?



[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-19 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535934#comment-13535934
 ] 

Knut Anders Hatlen commented on DERBY-6017:
---

I've tried to interpret what the standard says. Here are the relevant parts 
I've found:

> 8.4 <in predicate> - Syntax Rules
>
> 2) Let IVL be an <in value list>.
> ( IVL )
> is equivalent to the <table value constructor>:
> ( VALUES IVL )

So, according to this rule, the following two queries should be equivalent 
(which they are not currently):

ij> select * from t where x in (9223372036854775805, 9223372036854775806, 
9223372036854775807, 9.223372036854776E18);
X   

9223372036854775805 

1 row selected
ij> select * from t where x in (values 9223372036854775805, 
9223372036854775806, 9223372036854775807, 9.223372036854776E18);
X   

9223372036854775805 
9223372036854775806 
9223372036854775807 

3 rows selected

Furthermore, it says:

> 8.4 <in predicate> - Syntax Rules
>
> 5) The expression
> RVC IN IPV
> is equivalent to
> RVC = ANY IPV

So to find the correct semantics for IN, we need to rewrite the query to ANY. 
That is,

select * from t where x = any (values 9223372036854775805, 9223372036854775806, 
9223372036854775807, 9.223372036854776E18);

and see what the standard says about that. (This particular ANY query returns 
three rows in Derby, which is the same as the IN (VALUES ...) query above.)

This leads us to:

> 8.8 <quantified comparison predicate> - Syntax Rules
>
> 1) Let RV1 and RV2 be <row value predicand>s whose declared types are
> respectively that of the <row value predicand> and the row type of the
> <table subquery>. The Syntax Rules of Subclause 8.2, “<comparison predicate>”,
> are applied to:
> RV1 <comp op> RV2

That is, for the comparisons, the value on the right-hand side should have the 
row type of the sub-query.

And the row type of our VALUES sub-query is DOUBLE (or at least some 
approximate numeric type), since 7.3 <table value constructor> says the row type 
is determined by applying Subclause 9.3, “Data types of results of aggregations”, 
whose syntax rule 3d says:

> If any data type in DTS is approximate numeric, then each data type in DTS
> shall be numeric and the result data type is approximate numeric with
> implementation-defined precision.

Derby does produce the right type for the <table value constructor>:

ij> values 9223372036854775805, 9223372036854775806, 9223372036854775807, 
9.223372036854776E18;
1 
--
9.223372036854776E18  
9.223372036854776E18  
9.223372036854776E18  
9.223372036854776E18  

4 rows selected

The ANY query should therefore end up like:

  select * from t where x = 9.223372036854776E18 or x = 9.223372036854776E18 or 
x = 9.223372036854776E18 or x = 9.223372036854776E18;

Or even simpler, because the DOUBLE representation of all four values happens 
to be the same:

  select * from t where x = 9.223372036854776E18;
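The chain of rules above (IN list, then VALUES, then = ANY, with the row type taken from rule 3d) can be modelled in a few lines of Java. This is a hypothetical sketch, not Derby's implementation: a Double stands in for an approximate numeric value, a Long for BIGINT, and the BIGINT = DOUBLE comparison is assumed to convert x to DOUBLE, which is how Derby currently evaluates x = 9.223372036854776E18:

```java
import java.util.List;

class InAsAnySketch {
    /**
     * Rule 3d of Subclause 9.3 as quoted: if any value is approximate numeric
     * (modelled here as a Double), the row type is approximate numeric.
     */
    static boolean rowTypeIsApproximate(List<Number> values) {
        return values.stream().anyMatch(v -> v instanceof Double);
    }

    /** Models x = ANY (VALUES v1, v2, ...) for a BIGINT x. */
    static boolean equalsAny(long x, List<Number> values) {
        if (rowTypeIsApproximate(values)) {
            // Every element is cast to the approximate row type, and x is
            // compared as a double too (assumed BIGINT = DOUBLE behaviour).
            return values.stream().anyMatch(v -> (double) x == v.doubleValue());
        }
        return values.stream().anyMatch(v -> x == v.longValue());
    }

    public static void main(String[] args) {
        List<Number> inList = List.<Number>of(
                9223372036854775805L, 9223372036854775806L,
                9223372036854775807L, 9.223372036854776E18);
        for (long x : new long[] {9223372036854775805L, 9223372036854775806L, 9223372036854775807L}) {
            System.out.println(x + " matches: " + equalsAny(x, inList));
        }
    }
}
```

Under these assumptions all three rows match, which is the result the IN (VALUES ...) and ANY queries give.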

Now, 8.2 <comparison predicate> - General Rules, says this:

> 2) Numbers are compared with respect to their algebraic value.

No more details than that, I'm afraid. And no mention of converting the 
operands to the dominant type, as far as I can see.

Derby currently returns these three rows for the query:

ij> select * from t where x = 9.223372036854776E18;
X   

9223372036854775805 
9223372036854775806 
9223372036854775807 

3 rows selected

I'm not completely convinced that all those three values have the same 
algebraic value as 9.223372036854776E18. But in any case I think changing how 
Derby performs numeric comparisons is outside the scope of this issue.
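To make the doubt about "algebraic value" concrete, one possible reading is an exact comparison with no rounding at all. The Java sketch below assumes that reading; the names are hypothetical, BigDecimal is used only as a convenient exact representation, and this is not how Derby compares numbers. Under that reading none of the three BIGINT values equals 9.223372036854776E18, whose exact binary value is 2^63:

```java
import java.math.BigDecimal;

class AlgebraicValue {
    /**
     * Compares a BIGINT and a DOUBLE by exact value: BigDecimal.valueOf(long)
     * is exact, and new BigDecimal(double) yields the exact binary value of d.
     */
    static int compareExactly(long x, double d) {
        return BigDecimal.valueOf(x).compareTo(new BigDecimal(d));
    }

    public static void main(String[] args) {
        double d = 9.223372036854776E18;
        System.out.println(new BigDecimal(d)); // 9223372036854775808, i.e. 2^63
        for (long x : new long[] {9223372036854775805L, 9223372036854775806L, 9223372036854775807L}) {
            // -1 means x is algebraically smaller than d
            System.out.println(x + " vs " + d + ": " + compareExactly(x, d));
        }
    }
}
```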

So how's this for a plan? In this issue, let's assume Derby's equality 
comparison operator does the right thing. The goal for now should be to make an 
<in predicate> behave the same way as the ANY query the SQL standard says it
should be equivalent to. We should have tests that use the results from the 
equivalent ANY queries as canons, and those tests would also alert us if we 
later make changes to the comparison operator in a way that makes ANY and IN 
behave inconsistently.


[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-18 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535311#comment-13535311
 ] 

Knut Anders Hatlen commented on DERBY-6017:
---

I think the query is allowed by the SQL standard. The first paragraph in 
SQL:2003, part 2, section 4.4.1 (Introduction to numbers) says: "A number is 
either an exact numeric value or an approximate numeric value. Any two numbers 
are comparable." So no such luck.

We probably need to study the standard more closely to find out what the exact 
semantics are, though. In particular: is it the sorting or the binary search 
that uses the right kind of comparison? If it's the sorting (which uses the same 
kind of comparison for all the values, based on the dominant type), I suspect 
the problem also affects IN lists that don't have constants. For example:

ij> create table t3(b1 bigint, b2 bigint, d double);
0 rows inserted/updated/deleted
ij> insert into t3 values (9223372036854775805, 9223372036854775806, 1);
1 row inserted/updated/deleted
ij> select * from t3 where b1 in (b2, d);
B1  |B2  |D 


0 rows selected

If it is correct that the dominant type should be used, I would have expected 
the above query to return one row, as there is a DOUBLE value in the IN list, 
and b1=b2 when they are converted to DOUBLE.
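The two candidate interpretations for this row of t3 can be sketched in Java. Both methods are hypothetical models, not Derby code: one compares each pair in that pair's dominant type, the other compares everything in the dominant type of the whole list (DOUBLE). The pairwise model gives the zero-row result seen above, and the dominant-type model gives the one row expected:

```java
class InListWithColumns {
    /** b1 IN (b2, d), where each comparison uses the dominant type of its own pair. */
    static boolean pairwiseIn(long b1, long b2, double d) {
        return b1 == b2              // BIGINT vs BIGINT: compared as long
                || (double) b1 == d; // BIGINT vs DOUBLE: compared as double
    }

    /** b1 IN (b2, d), where every comparison uses the dominant type of the whole list. */
    static boolean dominantIn(long b1, long b2, double d) {
        return (double) b1 == (double) b2 || (double) b1 == d;
    }

    public static void main(String[] args) {
        long b1 = 9223372036854775805L;
        long b2 = 9223372036854775806L;
        double d = 1;
        System.out.println("pairwise: " + pairwiseIn(b1, b2, d));      // false
        System.out.println("dominant type: " + dominantIn(b1, b2, d)); // true
    }
}
```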

Another puzzling result that doesn't involve constants is this:

ij> create table t4 (b bigint);
0 rows inserted/updated/deleted
ij> insert into t4 values 9223372036854775806, 9223372036854775807;
2 rows inserted/updated/deleted
ij> create table t5 (d double);
0 rows inserted/updated/deleted
ij> insert into t5 values 9.223372036854776E18;
1 row inserted/updated/deleted
ij> select * from t4 where b in (select d from t5);
B   

9223372036854775807 

1 row selected
ij> select * from t4 where b in (select cast(d as double) from t5);
B   

9223372036854775806 
9223372036854775807 

2 rows selected

Is it correct that the two queries should return different results? The only 
difference is that the first query accesses the D column with no cast, and the 
second one casts D to DOUBLE. But since D already is a DOUBLE column, I 
wouldn't expect the cast to make any difference.



[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-18 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535049#comment-13535049
 ] 

Bryan Pendleton commented on DERBY-6017:


A very clear description, and a very clear analysis; thank you very much!

The fact that floating point comparisons are approximate has always tripped me up; I wish that SQL had made it illegal to perform an exact comparison ("=", "IN", etc.) on a floating point type.

Then we could have just declared this query illegal, and forced the user to think more clearly about what computation they were trying to express.

Is there any hope for such a resolution in the SQL standard?




[jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results

2012-12-18 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534914#comment-13534914
 ] 

Knut Anders Hatlen commented on DERBY-6017:
---

I believe this happens because of optimizations that are performed if the IN 
list consists of constants only.

Such IN lists are sorted at compile time so that binary search can be used to 
find if there's a match at run time. That's all good. However, the sorting and 
the binary search use different ordering. The sorting (in 
ValueNodeList.sortInAscendingOrder()) uses the ordering of the type with the 
highest precedence of the target and all the operands. The binary search (in 
DataType.in()) uses the ordering of the type with the highest precedence of 
each pair of values that it compares.

In the query above, this means:

The sorting happens using the type with the highest precedence of all the 
values. That is, DOUBLE. All the four values in the IN list have the same 
DOUBLE value, so the list is already sorted, regardless of how we order the 
actual values. But when binary search is performed at run time, BIGINT 
semantics are used for some of the comparisons (those that involve BIGINTs 
only) and DOUBLE comparison for others (those that involve a DOUBLE value). So 
the binary search does not see the list as one that contains values that are all 
equal.
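The mismatch between the sort order and the search order can be reproduced with a small Java model. Everything here is hypothetical and much simpler than ValueNodeList.sortInAscendingOrder() and DataType.in(): Long stands for BIGINT, Double for DOUBLE, the sort compares every value as a double, and the binary search compares a pair as long only when both values are Long. Because all four values are equal as doubles, any order counts as sorted; the sketch uses a stable sort on one such ordering, in which the search then misses 9223372036854775805:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortVsSearch {
    /** Sort ordering: the dominant type of the whole list (DOUBLE in this example). */
    static final Comparator<Number> DOMINANT = Comparator.comparingDouble(Number::doubleValue);

    /** Search ordering: BIGINT if both values are Long, DOUBLE otherwise. */
    static int comparePairwise(Number a, Number b) {
        if (a instanceof Long && b instanceof Long) {
            return Long.compare(a.longValue(), b.longValue());
        }
        return Double.compare(a.doubleValue(), b.doubleValue());
    }

    /** Plain binary search that uses the pairwise ordering for every probe. */
    static boolean binarySearch(List<Number> sorted, Number key) {
        int low = 0;
        int high = sorted.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = comparePairwise(sorted.get(mid), key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Number> inList = new ArrayList<>(List.<Number>of(
                9223372036854775807L, 9223372036854775806L,
                9223372036854775805L, 9.223372036854776E18));
        // All four values are equal as doubles, so this stable sort leaves the order unchanged.
        inList.sort(DOMINANT);
        for (long x : new long[] {9223372036854775805L, 9223372036854775806L, 9223372036854775807L}) {
            System.out.println(x + " found: " + binarySearch(inList, x));
        }
    }
}
```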

Additionally, during preprocessing, there is code to simplify the predicate if 
it's an IN list where all values are equal. This check also uses the dominant 
type, DOUBLE, and finds that the list indeed contains only one distinct value. 
It therefore eliminates the IN list and replaces it with a simple equality 
check using just one of the values in the IN list. That is, it rewrites the 
query from

select * from t where x in (9223372036854775805, 9223372036854775806, 
9223372036854775807, 9.223372036854776E18)

to

select * from t where x = 9223372036854775805

Those two queries are equivalent if the equality operator uses DOUBLE 
semantics. Unfortunately, the information about what's the dominant type is 
lost when the IN list is eliminated, and the equality check is performed using 
BIGINT semantics instead. The result is that only a single row matches.
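A hypothetical Java sketch of why the rewrite changes the result: the same single kept value matches three rows if the comparison stays in DOUBLE, but only one if it falls back to BIGINT. The row values come from the example table; the class and method names are made up:

```java
class DuplicateElimination {
    static final long[] ROWS = {9223372036854775805L, 9223372036854775806L, 9223372036854775807L};

    /** Rows matching "x = kept" when the comparison uses BIGINT semantics. */
    static int countBigint(long kept) {
        int n = 0;
        for (long x : ROWS) {
            if (x == kept) {
                n++;
            }
        }
        return n;
    }

    /** Rows matching "x = kept" when the comparison keeps the dominant type, DOUBLE. */
    static int countDouble(long kept) {
        int n = 0;
        for (long x : ROWS) {
            if ((double) x == (double) kept) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long kept = 9223372036854775805L; // the single value left after the rewrite
        System.out.println("BIGINT semantics: " + countBigint(kept) + " row(s)"); // 1
        System.out.println("DOUBLE semantics: " + countDouble(kept) + " row(s)"); // 3
    }
}
```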

So I think there are two things that need to be fixed:

1) The sorting and the binary search must be made consistent.

2) The duplicate elimination must preserve type information.
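One possible direction for both points, sketched in Java under the same simplifying assumptions (hypothetical names, DOUBLE treated as the dominant type, and not necessarily how Derby ends up fixing this): convert every IN-list value to the dominant type once, then sort and search with that single ordering, so that both steps, and any later duplicate elimination, see the same values:

```java
import java.util.Arrays;

class ConsistentInList {
    /** Casts every IN-list value to the dominant type (DOUBLE here) once, then sorts. */
    static double[] normalize(Number... values) {
        double[] converted = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            converted[i] = values[i].doubleValue();
        }
        Arrays.sort(converted);
        return converted;
    }

    /** Searches with the same DOUBLE ordering that was used for sorting. */
    static boolean in(long x, double[] sortedList) {
        return Arrays.binarySearch(sortedList, (double) x) >= 0;
    }

    public static void main(String[] args) {
        double[] list = normalize(9223372036854775807L, 9223372036854775806L,
                9223372036854775805L, 9.223372036854776E18);
        for (long x : new long[] {9223372036854775805L, 9223372036854775806L, 9223372036854775807L}) {
            System.out.println(x + " found: " + in(x, list));
        }
    }
}
```

In this model all three rows are found, matching the = ANY result; the trade-off is that the original BIGINT values are no longer distinguishable once converted.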
