Re: [PERFORM] Quesion on the use of indexes

2010-08-17 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun ago 16 23:33:29 -0400 2010:
 Benjamin Krajmalnik k...@servoyant.com writes:
  A little background - I have various multi-column indexes whenever I
  have queries which restrict the output based on the values of the 2
  fields (for example, a client code and the date of a transaction).
 
  Is there a performance gain using this approach as opposed to using 2
  separate indexes, one on the first column and one on the second column?
 
 Maybe, maybe not ... it's going to depend on a bunch of factors, one of
 which is what your update load is like compared to the queries that read
 the indexes.  There's a bit of coverage of this in the fine manual: see
 http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
 and the next few pages.

Another important factor is how selective is each clause in isolation
compared to how selective they are together.  We have found that doing
BitmapAnd of two bitmap-scanned indexes is sometimes much too slow
compared to a two-column index.  (I have yet to see a case where indexes
beyond two columns are useful; at this point, combined bitmap indexscans
are enough.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Quesion on the use of indexes

2010-08-17 Thread gnuoytr
Here's a quote from the docs:

To combine multiple indexes, the system scans each needed index and prepares a 
bitmap  in memory giving the locations of table rows that are reported as 
matching that index's conditions. The bitmaps are then ANDed and ORed together 
as needed by the query. Finally, the actual table rows are visited and 
returned. The table rows are visited in physical order, because that is how the 
bitmap is laid out; this means that any ordering of the original indexes is 
lost, and so a separate sort step will be needed if the query has an ORDER BY  
clause. For this reason, and because each additional index scan adds extra 
time, the planner will sometimes choose to use a simple index scan even though 
additional indexes are available that could have been used as well

So, if you have a multi-column index, supply values from the major component on 
down, not skipping any (but not necessarily supplying all components, leaving 
the tail components), and the index is clustered, then you will get the best 
performance on a range scan.  For equality scans, who knows?  For high 
selectivity (meaning here, few hits) of single indexes the cost of preparing 
the bitmaps and such may be less than traversing the multi-index and visiting 
the table.  For non-clustered multi-column, my bet would be on the single 
indexes up to some small number of indexes.

And, as the docs say, the optimizer may well decide that it isn't worth the 
effort to use more than the most selective single index.

Robert

 Original message 
Date: Tue, 17 Aug 2010 11:07:39 -0400
From: pgsql-performance-ow...@postgresql.org (on behalf of Alvaro Herrera 
alvhe...@commandprompt.com)
Subject: Re: [PERFORM] Quesion on the use of indexes  
To: Tom Lane t...@sss.pgh.pa.us
Cc: Benjamin Krajmalnik k...@servoyant.com,pgsql-performance 
pgsql-performance@postgresql.org

Excerpts from Tom Lane's message of lun ago 16 23:33:29 -0400 2010:
 Benjamin Krajmalnik k...@servoyant.com writes:
  A little background - I have various multi-column indexes whenever I
  have queries which restrict the output based on the values of the 2
  fields (for example, a client code and the date of a transaction).
 
  Is there a performance gain using this approach as opposed to using 2
  separate indexes, one on the first column and one on the second column?
 
 Maybe, maybe not ... it's going to depend on a bunch of factors, one of
 which is what your update load is like compared to the queries that read
 the indexes.  There's a bit of coverage of this in the fine manual: see
 http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
 and the next few pages.

Another important factor is how selective is each clause in isolation
compared to how selective they are together.  We have found that doing
BitmapAnd of two bitmap-scanned indexes is sometimes much too slow
compared to a two-column index.  (I have yet to see a case where indexes
beyond two columns are useful; at this point, combined bitmap indexscans
are enough.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Quesion on the use of indexes

2010-08-16 Thread Benjamin Krajmalnik
A little background - I have various multi-column indexes whenever I
have queries which restrict the output based on the values of the 2
fields (for example, a client code and the date of a transaction).

Is there a performance gain using this approach as opposed to using 2
separate indexes, one on the first column and one on the second column?

 

The reason I am asking is that my coding convetion goes back to the days
where I used ISAM tables, so the systems did not know how to use more
than a single index.

In some cases, I may have an index on (columna, columnb)  and one on
(columnb, columna) due to the data access patterns.  If there are no
performance gains in having these multi-part indexes, and performance
will be the same as having one index solely on columna and one solely on
columnb, then I can reduce the disk usage significantly in some cases.