Hi,

I've got a question and I don't know exactly whether the behavior is  
standard or expected or not.

Assuming the following schema and data:
   CREATE TABLE t1 (id, name);
   INSERT INTO t1 values (1, 'John');
   INSERT INTO t1 values (2, 'Arnold');
   INSERT INTO t1 values (3, 'Zork');

We have some queries that do roughly this:

   CREATE VIEW v1 AS SELECT id FROM t1 ORDER BY name;
   CREATE TABLE t2 AS SELECT id FROM v1;
   SELECT id from t2;

This produces the ids in the expected order:

2
1
3


However, when fetching DISTINCT values, the values are returned in  
sorted order, not in the 'native' order in the table:

   SELECT DISTINCT id from t2;

1
2
3



Up until version 3.5.1 (well actually that was the last version before  
3.5.9 that we used), sqlite would return the distinct values in the  
same order as in the query without the DISTINCT keyword.

I'm suspecting that there is nothing that guarantees the order in  
which results are returned unless I explicitely specify an "ORDER BY"  
term, so this may be according to the standards, but it's nonetheless  
a change that hurts us.
What we did so far was to create a temporary table that contained  
record IDs in a specific order, e.g.

   CREATE TEMP TABLE foo AS SELECT record_id FROM bar WHERE <some  
condition> ORDER BY lastname;

then, after some further processing, we'd do something like

   SELECT DISTINCT record_id FROM foo;

to get the record IDs in the defined order and eventually something like

   SELECT lastname, firstname, contact_info
   FROM foo
   LEFT JOIN bar ON foo.record_id=bar.record_id
   LEFT JOIN contacts ON contacts.bar_id=bar.record_id
   WHERE foo.record_id=5;

to get the information for a single record.

The reasoning behind putting this sorted list of record IDs into a  
temporary table is not really obvious from this example, but our real  
code does a lot more processing and reuses some classes that use  
temporary or real tables to store record selections. These are mainly  
used as sets of records to display in lists for certain [dynamically  
constructed] filter conditions. The display code will then pick the  
record id and run a simple query to fetch the displayed properties for  
that record. Since we can't cache the information for all records in  
the displayed set (might be a really large set initially), we came up  
with this approach to reduce the size of the "cache" to the table  
containing just the record IDs. However, in order to get a fast  
access, we have to rely on the order of the record IDs in the table to  
match the order induced by the "ORDER BY" expression...

So, to cut a long story short: is this change in the returned order  
for DISTINCT queries between version 3.5.1 and 3.5.9 a side effect of  
switching the query engine? If so, is it an undesired side effect or  
is it just something we have to live with?


Thanks,
</jum>

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to