On 2014/05/26 00:22, Andrew Arnott wrote:
I'm using C# with SQLite.cs and SQLiteAsync.cs. I started with the slow
version and upon realizing perf was *really* bad I experimented with
different forms and discovered the fast version worked great. On Windows
Phone 8.1, the slow one took ~20 seconds to execute while the fast one was
well under a second. The "Attachment" table contains binary data that can
be over a MB in size.

It isn't clear why the slow one would be so much slower. In my naïve
opinion the query analyze in SQLite should recognize and treat these two as
the same but obviously the execution strategy is vastly different. Is there
an opportunity here to optimize SQLite for the slow query form?

var fast = await this.Database.QueryAsync<Entities.Message>(@"
        SELECT DISTINCT m.Id FROM Message m
        INNER JOIN Attachment a ON a.MessageId = m.Id
        WHERE m.RemoteContactId = ?
        ",
        new object[] { this.RemoteParty.Id });


var slow = await this.Database.QueryAsync<Entities.Message>(@"
        SELECT m.Id FROM Message m
        WHERE m.RemoteContactId = ? AND (
                SELECT COUNT(Id) FROM Attachment a
                WHERE a.MessageId = m.Id
        ) > 0", new object[] { this.RemoteParty.Id });


The slow one will be slow on any system. You are essentially having the Query Engine compile a list of Id's from a table and upon each iteration which satisfies the first (Left) part of the statement has to then perform a sub-Query by going through an entire second table every time and counting up all the instances of a.MessageId=m.Id and then comparing the results (which may be in the thousands for all we know) to zero. This can never be fast if the second table is really large and the looked-up field is not indexed.

Contrast the first (fast) query which requires compiling again a list of IDs, but this time by means of a simple Join which will have a simple lookup on the RemoteContactId (quite possibly a primary key or some other form of index) which would be very fast to get to. No iterating the entire list to get counts. A better second query would be one using EXIST in stead of COUNT()>0 which essentially does the same thing, but EXISTS would look only for 1 candidate and not need to count them all.

To be clear - this can only ever matter if the second table is huge, or at least contains very many messages (or records) with the same RemoteContactId.

To answer your specific question though - it isn't clear why the one is so much slower, the only plausible explanation is a really large second table dataset that is unindexed, or if indexed, contains very many similar Remote contact Ids. Then, IF this is the case, it has nothing much to do with the "query analyze" (by which I presume you mean the Query planner) has no way of "realizing" those are the same, nor any way of improving the lookup since the query itself forces the count, which will be time-consuming in this case.

The first one is a better Query in SQL terms too. As an interesting aside, I found that if you can find a way to use GROUP BY in stead of DISTINCT (having to tweak the other Selected fields a bit) I get much better query times in SQLite specifically. I think this is to do with the way grouping sorts results or looks up existing ones working different to the distinct method, but this is just my hypothesis, however the speed increase is real and consistent through versions (since 3.7.7 anyway - I did not test before then).


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to