On Tue, Mar 07, 2006 at 07:09:15PM +0100, PFC wrote:
> 
> >The problem is that you're now talking about doing 2 index scans instead
> >of just one and a sort.
> 
>       It depends on what you call an index scan :
>       a- Scanning just the index (no heap lookup) to create a bitmap

Sure, and then you scan the other index and read the heap at the same
time (b). Your plan requires doing both. The question is: in what cases
will it be faster to scan the extra index and build the bitmap vs. just
doing a sort.

>       b- Scanning the index and hitting the heap in index order to 
>       retrieve the  rows
> 
>       (a) should be quite fast, because indexes generally use less space 
>       than  the main table, and have good locality of reference. (b) is OK if 
> the 
> table fits in memory, but if it has to seek on every row from the heap...

If the table fits in memory, who cares? A sort should be damn fast at
that point, because you're dealing with a small set of data.

>       So, when doing :
>       SELECT * FROM products WHERE category=C ORDER BY price LIMIT 20;
> 
>       If the category contains few products, using the index on category 
>       then  sorting is good.
>       However, if the category contains many items, postgres is likely to 
>       use  the index on price to avoid the sort. It needs to lose time 
> fetching 

Have you actually seen this behavior? My experience is that you have to
have a correlation somewhere over 80-90% before an index scan is favored
over a seqscan + sort (which as I mentioned before appears to be
broken).

>       The bitmap trick I proposed in my previous post would be even more  
> interesting if the table is clustered on category (which seems a  
> reasonable thing to do).

In which case it's highly unlikely that using the price index will buy
you anything.

>       Does the cost estimator know about this kind of correlation ?

Yes. The problem is that the index scan cost estimator figures out a
best and worst case cost, and then interpolates between the two using
correlation^2. IMO it should be using abs(correlation) to do this, and
there's some data at http://stats.distributed.net/~decibel/ that backs
this up. There's also been some discussions on -hackers (search the
archives for "index cost correlation nasby"), but I've not had time to
follow up on this. If you wanted to test a new index cost formula it
would be a one line change to the code.
-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

Reply via email to