Re: [sqlite] Two equivalent queries with very different perf

RSmith Sun, 25 May 2014 19:52:27 -0700


On 2014/05/26 00:22, Andrew Arnott wrote:

I'm using C# with SQLite.cs and SQLiteAsync.cs. I started with the slow
version and upon realizing perf was *really* bad I experimented with
different forms and discovered the fast version worked great. On Windows
Phone 8.1, the slow one took ~20 seconds to execute while the fast one was
well under a second. The "Attachment" table contains binary data that can
be over a MB in size.


It isn't clear why the slow one would be so much slower. In my naïve
opinion the query analyze in SQLite should recognize and treat these two as
the same but obviously the execution strategy is vastly different. Is there
an opportunity here to optimize SQLite for the slow query form?

var fast = await this.Database.QueryAsync<Entities.Message>(@"
        SELECT DISTINCT m.Id FROM Message m
        INNER JOIN Attachment a ON a.MessageId = m.Id
        WHERE m.RemoteContactId = ?
        ",
        new object[] { this.RemoteParty.Id });


var slow = await this.Database.QueryAsync<Entities.Message>(@"
        SELECT m.Id FROM Message m
        WHERE m.RemoteContactId = ? AND (
                SELECT COUNT(Id) FROM Attachment a
                WHERE a.MessageId = m.Id
        ) > 0", new object[] { this.RemoteParty.Id });

The slow one will be slow on any system. You are essentially having the Query Engine compile a list of Id's from a table and uponeach iteration which satisfies the first (Left) part of the statement has to then perform a sub-Query by going through an entiresecond table every time and counting up all the instances of a.MessageId=m.Id and then comparing the results (which may be in thethousands for all we know) to zero. This can never be fast if the second table is really large and the looked-up field is not indexed.

Contrast the first (fast) query which requires compiling again a list of IDs, but this time by means of a simple Join which willhave a simple lookup on the RemoteContactId (quite possibly a primary key or some other form of index) which would be very fast toget to. No iterating the entire list to get counts. A better second query would be one using EXIST in stead of COUNT()>0 whichessentially does the same thing, but EXISTS would look only for 1 candidate and not need to count them all.

To be clear - this can only ever matter if the second table is huge, or at least contains very many messages (or records) with thesame RemoteContactId.

To answer your specific question though - it isn't clear why the one is so much slower, the only plausible explanation is a reallylarge second table dataset that is unindexed, or if indexed, contains very many similar Remote contact Ids. Then, IF this is thecase, it has nothing much to do with the "query analyze" (by which I presume you mean the Query planner) has no way of "realizing"those are the same, nor any way of improving the lookup since the query itself forces the count, which will be time-consuming inthis case.

The first one is a better Query in SQL terms too. As an interesting aside, I found that if you can find a way to use GROUP BY instead of DISTINCT (having to tweak the other Selected fields a bit) I get much better query times in SQLite specifically. I thinkthis is to do with the way grouping sorts results or looks up existing ones working different to the distinct method, but this isjust my hypothesis, however the speed increase is real and consistent through versions (since 3.7.7 anyway - I did not test beforethen).



_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Two equivalent queries with very different perf

Reply via email to