Thank you - great feedback.
What you have said about the auto indexes really helps. There were 4
tables involved in the example I sent. Each of the 4 was using an
autoindex.
Here is a comparison:
Original query: 207 seconds
w/ 2 real indexes suggested by DRH: 42 seconds
w/ all 4 tables with real indexes: 0.16 second
My lesson: I thought "auto indexes" were just that: indexes. When I
created a PK on a table and saw the autoindex, I assumed it would behave
the same as a 'regular' index with the same fields. This has shown me that
could not be further from the truth. So it seems I need to add 'real'
indexes that mirror the columns defined in the PK - for all tables.
For the indexed columns that are all NULL - we have many instances of this
schema and in some instances those column are fully populated. So in any
given DB instance, there is probably at least one index on a column that is
NULL. but for obvious support reasons we try to keep all DBs on the same
schema. Your suggestion to make it a partial index makes sense. When
would you NOT want to add the "where [indexedColumn] IS NOT NULL"? Seems
like it would always be helpful.
Thanks again for the excellent insight.
-Denis Burke
On Thu, Jan 21, 2016 at 4:43 PM, Richard Hipp <drh at sqlite.org> wrote:
> On 1/21/16, Richard Hipp <drh at sqlite.org> wrote:
> > On 1/21/16, Denis Burke <burkeden at gmail.com> wrote:
> >>
> >> That query takes 4 minutes to run...
> >
> > Can you please run ".fullschema" on your database from the sqlite3.exe
> > command line shell and post the output
> >
>
> [Mr. Burke sent the .fullschema output to me off-list]
>
> I've studied your schema, and as far as I can tell, SQLite is making
> the right algorithm choice based on the information it has available
> to it.
>
> The "fast" query (the one where you specify an index using the INDEXED
> BY clause) actually create two "automatic indexes" that are discarded
> after the query completes. There is a lot of performance risk in
> choosing to create an automatic index, because creating an index is
> expensive and SQLite has no way to know in advance how well that index
> might work out. And because of this risk, the estimated cost of using
> an automatic index is artificially inflated. Apparently in your case
> the automatic indexes work out quite well, but as I said, the query
> planner has no way of knowing this in advance. So I think it is
> making the right choice.
>
> My recommendation is that you use CREATE INDEX to create real indexes
> that cover the same columns as the two automatic indexes identified in
> the EXPLAIN QUERY PLAN output of your fast query. To wit:
>
> CREATE INDEX x1 ON PatientLocGroups(PatLocGrpID, Hospital,
> ClientID, PatientLocID);
> CREATE INDEX x2 ON PriorityGroups(PriorityGroupID, ClientID,
> PriorityID);
>
> I don't have any idea what data you are storing, but I wonder if
> PatLogGrpID ought not be the PRIMARY KEY of the PatientLocGroups
> table, and if PriorityGroupID ought not be the PRIMARY KEY of the
> PriorityGroups table. If that works, it would obviate the need for
> these indexes, most likely. But I don't know if that would be
> appropriate for your data - only you know that.
>
> Further to the previous paragraph, please note the difference between
> INT PRIMARY KEY and INTEGER PRIMARY KEY. You want the latter most
> likely.
>
> I suspect that if you make some of the changes above, your problem
> will go away, and many of your other queries will be faster as well.
>
> The other thing I saw was in the sqlite_stat1 data. There are many
> entries like this:
>
> INSERT INTO sqlite_stat1 VALUES('ResultData','IndxRDRsltDate','7020427
> 7020427');
>
> Whenever you see the first two numbers in the final column of
> sqlite_stat1 are the same, that indicates bad index. You should drop
> it. It is never being used (because the query planner knows that it
> is an awful index) and is merely taking up space.
>
> In your case, the indexed column appears to always be NULL. Perhaps
> you are thinking that you occasionally have a non-NULL entry that you
> want to look up quick? The way to do that is with a partial index.
> (https://www.sqlite.org/partialindex.html) For the specific example
> above, you want:
>
> CREATE INDEX [IndxRDRsltDate] ON [ResultData] ([RsltDate] COLLATE
> NOCASE ASC) WHERE RsltDate IS NOT NULL;
>
> I think that "WHERE column IS NOT NULL" clause at the end will help you a
> lot.
> --
> D. Richard Hipp
> drh at sqlite.org
>