I believe I may have found a bug in SQLite 3.18. I've got a query that returns
a correct result set when there are no indexes in the database, but returns an
incorrect result set if recompiled after an index has been added. The incorrect
result set has no collation applied, so the problem seems to be that when
SQLite uses the index to optimize the query, it forgets to apply collation when
grouping and ordering the results, so the results are in uncollated order as
they appeared in the index.
Here's a slightly simplified* version of the query. This just returns a sorted
list of all the unique artist names (the database is generated from an iTunes
library, with each row containing a JSON object representing a track.) The
"LCUnicode_CD_" custom collation compares UTF-8-encoded Unicode strings
ignoring case and diacritics; it's been tested enough that I trust it.
SELECT json_extract(body, '$.Artist')
FROM kv_default
WHERE json_extract body, '$.Artist') IS NOT NULL AND json_extract body,
'$.Compilation') IS NULL
GROUP BY json_extract(body, '$.Artist') COLLATE LCUnicode_CD_
ORDER BY json_extract(body, '$.Artist') COLLATE LCUnicode_CD_;
With no index in the database, this query returns the correct results. Artists
are sorted ignoring case and diacritical marks, and artist names that have
inconsistent case or diacritics on different tracks (like "The B-52s" vs "The
b-52s", or "Björk" vs "Bjork") only appear once.
Now I add an index:
CREATE INDEX byArtist
ON kv_default (json_extract(body, '$.Compilation'), json_extract(body,
'$.Artist'))
After creating the index, recompiling and running the query now produces
incorrect results: the collation is ignored, so the sort order is
case-sensitive, accented letters appear after Z, and duplicate artist names
with different capitalization/accents appear.
Before the index is added, EXPLAIN QUERY PLAN gives:
0|0|0| SCAN TABLE kv_default
0|0|0| USE TEMP B-TREE FOR GROUP BY
After the index is added, it gives:
0|0|0| SEARCH TABLE kv_default USING INDEX byArtist (<expr>=?)
It's forgotten that it needs to sort/group the rows coming from the index,
because the index isn't collated. :(
I haven't yet tried to create a minimal reproduction, but I'm guessing I could
reproduce this without using JSON functions or custom collations.
—Jens
* I've omitted some irrelevant result columns, and substituted json_extract for
the actual custom function I use which has similar behavior (extracting a value
out of an encoded blob.)
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users