I have learned you can use:

        SET random = 0;

to force identical statistics every time you run ANALYZE.

---------------------------------------------------------------------------

Mary Edie Meredith wrote:
> I certainly don't claim that it is appropriate to force customers into a
> full analysis, particularly if random sampling versus a full scan of the
> data reveals little to no performance differences in the plans.  Being
> able to sample accurately is _very nice for large tables.
> 
> For our testing purposes, however, consistent results are extremely
> important. We have observed that small difference in one plan for one of
> 22 queries can cause a difference in the DBT-3 results.  If this
> happens, a small change in performance runs between two Linux kernels
> may appear to be due to the kernels, when in fact it is due to the plan
> change.  
> 
> We know that the plans are _exactly the same if the data in the
> pg_statistics table is the same from run to run (all other things being
> equal).  So what we need to have is identical optimizer costs
> (pg_statistics) for the same table data for each.
> 
> I feel certain that the pg_statistics table will be identical from run
> to run if analyze looks at every row.   Thus our hope to find a way to
> get that.
> 
> We did runs over night.  We can confirm that VACUUM FULL ANALYZE does
> not produce the same pg_statistics run to run.  With the default (10)
> default_statistics_target the plans are also different.
> 
> We ran additional tests with default_statistics_target set to 1000 (the
> max I believe).  The plans are the same over the different runs, but the
> pg_statistics table has different cost values.  The performance results
> of the runs are consistent (we would expect this with the same plans). 
> The resulting performance metrics are similar to the best plans we see
> using the default histogram size (good news).
> 
> However, we worry that one day the cost will change enough for whatever
> reason to cause a plan change, especially for a larger database scale
> factor (database size/row size). 
> 
> I know we appear to be an isolated case, but customers also do testing
> and may have the same consistency issues we have.  I can also imagine
> cases where customers want to guarantee that plans stay the same
> (between replicated sites, for example).  If two developers are
> analyzing changes to the optimizer, don't you want the costs used for
> testing on their two systems to be identical for comparison purposes?
> 
> Anyway, IMHO I believe that an option for an ANALYZE FULL ("sampling"
> all rows) would be valuable.   Any other ideas for how to force this
> without code change are very welcome.  
> 
> Thanks for your info!
> 
> 
> 
> On Thu, 2003-09-04 at 16:16, Tom Lane wrote:
> > Mary Edie Meredith <[EMAIL PROTECTED]> writes:
> > > Stephan Szabo kindly responded to our earlier queries suggesting we look
> > > at default_statistics_target and ALTER TABLE ALTER COLUMN SET
> > > STATISTICS.  
> > 
> > > These determine the number of bins in the histogram for a given column. 
> > > But for a large number of rows (for example 6 million) the maximum value
> > > (1000) does not guarantee that ANALYZE will do a full scan of the table.
> > > We do not see a way to guarantee the same statistics run to run without
> > > forcing ANALYZE to examine every row of every table.  
> > 
> > Do you actually still have a problem with the plans changing when the
> > stats target is above 100 or so?  I think the notion of "force ANALYZE
> > to do a full scan" is inherently wrongheaded ... it certainly would not
> > produce numbers that have anything to do with ordinary practice.
> > 
> > If you have data statistics that are so bizarre that the planner still
> > gets things wrong with a target of 1000, then I'd like to know more
> > about why.
> > 
> >                     regards, tom lane
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> >     (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
> -- 
> Mary Edie Meredith <[EMAIL PROTECTED]>
> Open Source Development Lab
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
> 

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to