Dave Hayden wrote: > On Jun 20, 2004, at 9:07 PM, Darren Duncan wrote: > >> Generally speaking, you should only use indexes on table columns that have >> a lot of distinct values, and each one only appears a few times. You should >> not use indexes on columns that have few distinct values and each appears >> many times; in the latter case, a full table scan would be faster. > > > That's weird. I would have thought that having any index at all to pare down > the result set would make things faster..? Wouldn't the select here: > > CREATE TABLE tmp ( flag boolean, name text ); > > SELECT name FROM tmp WHERE flag = 1 AND name LIKE '%foo%'; > > run faster with an index on the flag column since it can scan just the flag = > 1 rows instead of the full table? >
Let N be the total number of entires in the table and let K be the number of entries with a matching key. Then to do a full table scan requires O(N) time. To use an index, you first consult the index to get a list of K rowids of potential matches. This takes time O(K). Then for each rowid you have to look up the corresponding row in the btree, an O(logN) operation. So the total time for using an index is O(KlogN).
If there is an upper bound on the number of duplicate key entries in a table (in other words if there are lots of different keys) then K will be a constant and O(KlogN) simplifies to O(logN) which is clearly less than O(N). In that case, it pays to use the index.
But suppose there are only a fixed number of different key values. As the N grows, the number of duplicate keys will increase. K will be a linear function of N. So O(KlogN) becomes O(NlogN) which is clearly more than O(N). In that case, it is best to use a full table scan when N is large.
-- D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]