I believe I may have found a bug in SQLite 3.18. I've got a query that returns 
a correct result set when there are no indexes in the database, but returns an 
incorrect result set if recompiled after an index has been added. The incorrect 
result set has no collation applied, so the problem seems to be that when 
SQLite uses the index to optimize the query, it forgets to apply collation when 
grouping and ordering the results, so the results are in uncollated order as 
they appeared in the index.

Here's a slightly simplified* version of the query. This just returns a sorted 
list of all the unique artist names (the database is generated from an iTunes 
library, with each row containing a JSON object representing a track.) The 
"LCUnicode_CD_" custom collation compares UTF-8-encoded Unicode strings 
ignoring case and diacritics; it's been tested enough that I trust it.

        SELECT json_extract(body, '$.Artist') 
        FROM kv_default 
        WHERE json_extract body, '$.Artist') IS NOT NULL AND json_extract body, 
'$.Compilation') IS NULL 
        GROUP BY json_extract(body, '$.Artist') COLLATE LCUnicode_CD_ 
        ORDER BY json_extract(body, '$.Artist') COLLATE LCUnicode_CD_;

With no index in the database, this query returns the correct results. Artists 
are sorted ignoring case and diacritical marks, and artist names that have 
inconsistent case or diacritics on different tracks (like "The B-52s" vs "The 
b-52s", or "Björk" vs "Bjork") only appear once.

Now I add an index:

        CREATE INDEX byArtist 
        ON kv_default (json_extract(body, '$.Compilation'), json_extract(body, 
'$.Artist'))

After creating the index, recompiling and running the query now produces 
incorrect results: the collation is ignored, so the sort order is 
case-sensitive, accented letters appear after Z, and duplicate artist names 
with different capitalization/accents appear.

Before the index is added, EXPLAIN QUERY PLAN gives:
        0|0|0| SCAN TABLE kv_default
        0|0|0| USE TEMP B-TREE FOR GROUP BY

After the index is added, it gives:
        0|0|0| SEARCH TABLE kv_default USING INDEX byArtist (<expr>=?)

It's forgotten that it needs to sort/group the rows coming from the index, 
because the index isn't collated. :(

I haven't yet tried to create a minimal reproduction, but I'm guessing I could 
reproduce this without using JSON functions or custom collations.

—Jens

* I've omitted some irrelevant result columns, and substituted json_extract for 
the actual custom function I use which has similar behavior (extracting a value 
out of an encoded blob.)
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to