Re: [sqlite] SELECT behaviour with INDEX

Jay A. Kreibich Thu, 17 Jun 2010 07:45:34 -0700

On Thu, Jun 17, 2010 at 08:44:25AM -0500, Black, Michael (IS) scratched on the 
wall:
> Is there any advantage/disadvantage to having seperate indexes in a
> case like this?


  In a case like this, no.  If each column was indexed individually one
  one of them would be used in this query.  There is a (very) small
  performance gain in using the compound index, but you'd have to do a
  lot of data analysis to figure out if the additional overhead and
  storage from the second column is worth the gains in query speed.
  That's a very subjective analysis anyways.


  In any index placement there are advantages and disadvantages.
  Different indexes provide different performance boosts, and you very
  much need the right tool for the job.  It is a hard problem.


> In other words, as a simple design goal to have a separate index
> that matches whatever you select on and order on?

  Yes and no.  If the only concern is query performance, then in theory
  this is true.  But indexes have cost associated.  They add noticeable
  overhead to INSERT/UPDATE/DELETE, and each additional index adds
  additional overhead.  They also add size to the database.  Even if
  your query-to-modification ratio is very very high on the query side,
  adding indexes that provide very marginal benefits may still hurt
  you, thanks to increased I/O costs and cache crowding.

  So, like everything, there is a huge "it depends."  The biggest thing
  is likely query-to-modify ratio, followed by query frequency.  You
  don't spend a lot of time optimizing the query (or index structure)
  for a maintenance routine that runs once a week, especially if that
  modification adds 0.1% overhead to the query you run 10,000 times a
  day (or maybe you do, but the circumstance would be extreme).

  Index placement is deep magic, and is highly dependent on the
  database, the database product, the platform (cache memory, disk
  speeds), the queries, the query frequency, the data flux rate, the
  phase of the moon, and the alignment of the planets.

  It also represents this odd conflict of interest in the whole
  philosophy of SQL query design.  In theory, SQL is all about saying
  what you want, and letting the database engine to all kinds of
  mysterious stuff to get you the results as fast as possible.  In
  practice, it only half works that way, so there is a lot of
  second-guessing about where and when to place indexes.  In many ways,
  that takes more skill and understanding than if SQL just allowed more
  explicit query designation.

  Modern query optimizers are pretty good at doing a good job with the
  tools at hand, but the system is not well setup to provide feedback
  to the developer-- "Put an index on this column and you'll boost
  performance 23.54%", for example.  This isn't unique to SQLite, it
  is an issue with all SQL databases.  Indexes are outside of the
  relational model, so they have evolved in a somewhat haphazard way,
  like the rest of SQL.  It doesn't help that the foundation ideas were
  laid down when a megabyte or two of data was HUGE.  It hasn't scaled
  so well to datasets where good indexes are a critical requirement,
  and not a helpful addition.
  

> It seems to me that if you do "SELECT * FROM log WHERE
> id_clie...@idclient ORDER BY utc DESC LIMIT 1" aren't you better off
> with two indexes, one for id_client and one for utc?  That way you
> don't have to worry about order dependence at all.  It seems only
> if you start doing WHERE clauses on both should you need a compound index.

  Doesn't work that way.  Once a set of rows have been extracted from an
  index, the data is no longer part of the original table, so a second
  index can't be used.  You can look deeper into the same index, but
  not a second index.

  For example, I hand you two phone books--- one ordered by the
  standard last-name, first-name, etc., and a "reverse" phone book that
  is ordered by telephone number (for number => name lookups).

  I then ask you to provide a list of all the people with a last name
  of "Smith", and I want the list ordered by phone number.  You can use
  either phonebook to get the answer, but you can't use both.  You
  either lookup all the Smiths and manually sort that sub-set, or you
  scan the whole reverse book, plucking out the Smiths as you find them
  (and knowing the list is already in order and no further sorting is
  required).

  The only way to use an index to fill both conditions is if the
  phonebook is sorted by both name and then phone number... e.g. a
  compound index on just the right columns.
  
  
  As a general rule of thumb, most database systems can only use one
  index from one table per query (or subquery).  There are a small
  handful of exceptions (such as chained OR conditions), but that's
  a good place to start.

> And in this specific case unless you have a boat load of utc's for
> each client_id the utc index isn't going to buy you much at all.

  Correct.  It is doubtful the I/O overhead is worth it.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] SELECT behaviour with INDEX

Reply via email to