On 2014/05/26 00:22, Andrew Arnott wrote:
I'm using C# with SQLite.cs and SQLiteAsync.cs. I started with the slow
version and upon realizing perf was *really* bad I experimented with
different forms and discovered the fast version worked great. On Windows
Phone 8.1, the slow one took ~20 seconds to execute while the fast one was
well under a second. The "Attachment" table contains binary data that can
be over a MB in size.
It isn't clear why the slow one would be so much slower. In my naïve
opinion the query analyze in SQLite should recognize and treat these two as
the same but obviously the execution strategy is vastly different. Is there
an opportunity here to optimize SQLite for the slow query form?
var fast = await this.Database.QueryAsync<Entities.Message>(@"
SELECT DISTINCT m.Id FROM Message m
INNER JOIN Attachment a ON a.MessageId = m.Id
WHERE m.RemoteContactId = ?
",
new object[] { this.RemoteParty.Id });
var slow = await this.Database.QueryAsync<Entities.Message>(@"
SELECT m.Id FROM Message m
WHERE m.RemoteContactId = ? AND (
SELECT COUNT(Id) FROM Attachment a
WHERE a.MessageId = m.Id
) > 0", new object[] { this.RemoteParty.Id });
The slow one will be slow on any system. You are essentially having the Query Engine compile a list of Id's from a table and upon
each iteration which satisfies the first (Left) part of the statement has to then perform a sub-Query by going through an entire
second table every time and counting up all the instances of a.MessageId=m.Id and then comparing the results (which may be in the
thousands for all we know) to zero. This can never be fast if the second table is really large and the looked-up field is not indexed.
Contrast the first (fast) query which requires compiling again a list of IDs, but this time by means of a simple Join which will
have a simple lookup on the RemoteContactId (quite possibly a primary key or some other form of index) which would be very fast to
get to. No iterating the entire list to get counts. A better second query would be one using EXIST in stead of COUNT()>0 which
essentially does the same thing, but EXISTS would look only for 1 candidate and not need to count them all.
To be clear - this can only ever matter if the second table is huge, or at least contains very many messages (or records) with the
same RemoteContactId.
To answer your specific question though - it isn't clear why the one is so much slower, the only plausible explanation is a really
large second table dataset that is unindexed, or if indexed, contains very many similar Remote contact Ids. Then, IF this is the
case, it has nothing much to do with the "query analyze" (by which I presume you mean the Query planner) has no way of "realizing"
those are the same, nor any way of improving the lookup since the query itself forces the count, which will be time-consuming in
this case.
The first one is a better Query in SQL terms too. As an interesting aside, I found that if you can find a way to use GROUP BY in
stead of DISTINCT (having to tweak the other Selected fields a bit) I get much better query times in SQLite specifically. I think
this is to do with the way grouping sorts results or looks up existing ones working different to the distinct method, but this is
just my hypothesis, however the speed increase is real and consistent through versions (since 3.7.7 anyway - I did not test before
then).
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users