Arjen van der Meijden <[EMAIL PROTECTED]> writes:
>> The file is attached. (bz)Grepping for 'Was executed on 8.2 final slower 
>> than 8.2 dev' with -A1 will show you the time comparisons, which you 
>> could then look up using your favorite editor.

I dug through this file, and it seems that all the cases where 8.2 final
is choosing a markedly worse plan are instances of the same query, for
which 8.2dev chose a nestloop plan with an inner indexscan on a
clustered index.  8.2 final is failing to choose that because in the
nestloop case it's not giving any cost credit for the index's
correlation.  Obviously a clustered index should have very high
correlation.  In the test case, half a dozen or so heap tuples need to
be fetched per index scan, and because of the correlation it's likely
they are all on the same heap page ... but the costing is assuming that
they are scattered randomly.  I had punted on this point back in June
because it seemed too complicated to handle in combination with the
cross-scan caching, but obviously we need to do something.  After
thinking a bit more, I propose the attached patch --- to be applied on
top of the other ones I sent you --- which seems to fix the problem
here.  Please give it a try.

                        regards, tom lane

*** src/backend/optimizer/path/costsize.c.orig  Fri Dec  8 15:58:34 2006
--- src/backend/optimizer/path/costsize.c       Thu Dec 14 15:37:40 2006
***************
*** 276,288 ****
        if (outer_rel != NULL && outer_rel->rows > 1)
        {
                /*
!                * For repeated indexscans, scale up the number of tuples fetched in
                 * the Mackert and Lohman formula by the number of scans, so that we
!                * estimate the number of pages fetched by all the scans. Then
                 * pro-rate the costs for one scan.  In this case we assume all the
!                * fetches are random accesses.  XXX it'd be good to include
!                * correlation in this model, but it's not clear how to do that
!                * without double-counting cache effects.
                 */
                double          num_scans = outer_rel->rows;
  
--- 276,287 ----
        if (outer_rel != NULL && outer_rel->rows > 1)
        {
                /*
!                * For repeated indexscans, the appropriate estimate for the
!                * uncorrelated case is to scale up the number of tuples fetched in
                 * the Mackert and Lohman formula by the number of scans, so that we
!                * estimate the number of pages fetched by all the scans; then
                 * pro-rate the costs for one scan.  In this case we assume all the
!                * fetches are random accesses.
                 */
                double          num_scans = outer_rel->rows;
  
***************
*** 291,297 ****
                                                                                        (double) index->pages,
                                                                                        root);
  
!               run_cost += (pages_fetched * random_page_cost) / num_scans;
        }
        else
        {
--- 290,316 ----
                                                                                        (double) index->pages,
                                                                                        root);
  
!               max_IO_cost = (pages_fetched * random_page_cost) / num_scans;
! 
!               /*
!                * In the perfectly correlated case, the number of pages touched
!                * by each scan is selectivity * table_size, and we can use the
!                * Mackert and Lohman formula at the page level to estimate how
!                * much work is saved by caching across scans.  We still assume
!                * all the fetches are random accesses, though, which is an
!                * overestimate that's hard to correct for without
!                * double-counting the cache effects.  (But in most cases where
!                * such a plan is actually interesting, only one page would get
!                * fetched per scan anyway, so it shouldn't matter much.)
!                */
!               pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
! 
!               pages_fetched = index_pages_fetched(pages_fetched * num_scans,
!                                                                                       baserel->pages,
!                                                                                       (double) index->pages,
!                                                                                       root);
! 
!               min_IO_cost = (pages_fetched * random_page_cost) / num_scans;
        }
        else
        {
***************
*** 312,326 ****
                min_IO_cost = random_page_cost;
                if (pages_fetched > 1)
                        min_IO_cost += (pages_fetched - 1) * seq_page_cost;
  
!               /*
!                * Now interpolate based on estimated index order correlation to get
!                * total disk I/O cost for main table accesses.
!                */
!               csquared = indexCorrelation * indexCorrelation;
  
!               run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
!       }
  
        /*
         * Estimate CPU costs per tuple.
--- 331,345 ----
                min_IO_cost = random_page_cost;
                if (pages_fetched > 1)
                        min_IO_cost += (pages_fetched - 1) * seq_page_cost;
+       }
  
!       /*
!        * Now interpolate based on estimated index order correlation to get
!        * total disk I/O cost for main table accesses.
!        */
!       csquared = indexCorrelation * indexCorrelation;
  
!       run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
  
        /*
         * Estimate CPU costs per tuple.