[sqlite] Wrong Index Select with Large Table Joined to Small Table

Denis Burke Thu, 21 Jan 2016 18:09:58 -0500

Thank you - great feedback.

What you have said about the auto indexes really helps.  There were 4
tables involved in the example I sent.  Each of the 4 was using an
autoindex.


Here is a comparison:

Original query: 207 seconds
w/ 2 real indexes suggested by DRH:  42 seconds
w/ all 4 tables with real indexes: 0.16 second

My lesson: I thought "auto indexes" were just that: indexes.  When I
created a PK on a table and saw the autoindex, I assumed it would behave
the same as a 'regular' index with the same fields.  This has shown me that
could not be further from the truth. So it seems I need to add 'real'
indexes that mirror the columns defined in the PK - for all tables.


For the indexed columns that are all NULL - we have many instances of this
schema and in some instances those column are fully populated.  So in any
given DB instance, there is probably at least one index on a column that is
NULL. but for obvious support reasons we try to keep all DBs on the same
schema.  Your suggestion to make it a partial index makes sense.  When
would you NOT want to add the "where [indexedColumn] IS NOT NULL"?  Seems
like it would always be helpful.


Thanks again for the excellent insight.

-Denis Burke


On Thu, Jan 21, 2016 at 4:43 PM, Richard Hipp <drh at sqlite.org> wrote:

> On 1/21/16, Richard Hipp <drh at sqlite.org> wrote:
> > On 1/21/16, Denis Burke <burkeden at gmail.com> wrote:
> >>
> >> That query takes 4 minutes to run...
> >
> > Can you please run ".fullschema" on your database from the sqlite3.exe
> > command line shell and post the output
> >
>
> [Mr. Burke sent the .fullschema output to me off-list]
>
> I've studied your schema, and as far as I can tell, SQLite is making
> the right algorithm choice based on the information it has available
> to it.
>
> The "fast" query (the one where you specify an index using the INDEXED
> BY clause) actually create two "automatic indexes" that are discarded
> after the query completes.  There is a lot of performance risk in
> choosing to create an automatic index, because creating an index is
> expensive and SQLite has no way to know in advance how well that index
> might work out.  And because of this risk, the estimated cost of using
> an automatic index is artificially inflated.  Apparently in your case
> the automatic indexes work out quite well, but as I said, the query
> planner has no way of knowing this in advance.  So I think it is
> making the right choice.
>
> My recommendation is that you use CREATE INDEX to create real indexes
> that cover the same columns as the two automatic indexes identified in
> the EXPLAIN QUERY PLAN output of your fast query.  To wit:
>
>    CREATE INDEX x1 ON PatientLocGroups(PatLocGrpID, Hospital,
> ClientID, PatientLocID);
>    CREATE INDEX x2 ON PriorityGroups(PriorityGroupID, ClientID,
> PriorityID);
>
> I don't have any idea what data you are storing, but I wonder if
> PatLogGrpID ought not be the PRIMARY KEY of the PatientLocGroups
> table, and if PriorityGroupID ought not be the PRIMARY KEY of the
> PriorityGroups table.  If that works, it would obviate the need for
> these indexes, most likely.  But I don't know if that would be
> appropriate for your data - only you know that.
>
> Further to the previous paragraph, please note the difference between
> INT PRIMARY KEY and INTEGER PRIMARY KEY.  You want the latter most
> likely.
>
> I suspect that if you make some of the changes above, your problem
> will go away, and many of your other queries will be faster as well.
>
> The other thing I saw was in the sqlite_stat1 data.  There are many
> entries like this:
>
> INSERT INTO sqlite_stat1 VALUES('ResultData','IndxRDRsltDate','7020427
> 7020427');
>
> Whenever you see the first two numbers in the final column of
> sqlite_stat1 are the same, that indicates bad index.  You should drop
> it.  It is never being used (because the query planner knows that it
> is an awful index) and is merely taking up space.
>
> In your case, the indexed column appears to always be NULL.  Perhaps
> you are thinking that you occasionally have a non-NULL entry that you
> want to look up quick?  The way to do that is with a partial index.
> (https://www.sqlite.org/partialindex.html)  For the specific example
> above, you want:
>
>    CREATE INDEX [IndxRDRsltDate] ON [ResultData] ([RsltDate] COLLATE
> NOCASE ASC) WHERE RsltDate IS NOT NULL;
>
> I think that "WHERE column IS NOT NULL" clause at the end will help you a
> lot.
> --
> D. Richard Hipp
> drh at sqlite.org
>

[sqlite] Wrong Index Select with Large Table Joined to Small Table

Reply via email to