Patch applied. Thanks. Your documentation changes can be viewed in
five minutes using links on the developer's page,
http://www.postgresql.org/developer/testing.
---
Mark Kirkwood wrote:
> Bruce Momjian wrote:
> > Mark Kirkwood wrote:
> >
> >>At Tom's suggestion, I am going to amend the page to fit into the
> >>'internals' chapter as opposed to 'performance tips' one. I might do
> >>this first, and send you the resulting page.
> >
> > That sounds good, that this become part of the developer docs.
> >
>
> Here is the amended version. I have placed it in its own chapter located
> immediately after 'bki Backend Interface', however there is nothing
> special about that location... feel free to move it around :-)
>
> Mark
>
>
>
>
>
> diff -Naur sgml.orig/filelist.sgml sgml/filelist.sgml
> --- sgml.orig/filelist.sgml Mon Feb 14 15:02:16 2005
> +++ sgml/filelist.sgmlTue Feb 15 09:52:33 2005
> @@ -77,6 +77,7 @@
>
>
>
> +
>
>
>
> diff -Naur sgml.orig/howplanstats.sgml sgml/howplanstats.sgml
> --- sgml.orig/howplanstats.sgml Thu Jan 1 12:00:00 1970
> +++ sgml/howplanstats.sgmlTue Feb 15 17:18:30 2005
> @@ -0,0 +1,370 @@
> +
> +
> +
> + How the Planner Uses Statistics
> +
> +
> + This chapter builds on the material covered in linkend="using-explain">
> + and , and shows how the planner uses the
> + system statistics to estimate the number of rows each stage in a query
> might
> + return. This is a significant part of the planning / optimizing process,
> + providing much of the raw material for cost calculation.
> +
> +
> +
> + The intent of this chapter is not to document the code —
> + better done in the code itself, but to present an overview of how it
> works.
> + This will perhaps ease the learning curve for someone who subsequently
> + wishes to read the code. As a consequence, the approach chosen is to
> analyze
> + a series of incrementally more complex examples.
> +
> +
> +
> + The outputs and algorithms shown below are taken from version 8.0.
> + The behaviour of earlier (or later) versions may vary.
> +
> +
> +
> + Row Estimation Examples
> +
> +
> + row estimation
> + planner
> +
> +
> +
> + Using examples drawn from the regression test database, let's start with
> a
> + very simple query:
> +
> +EXPLAIN SELECT * FROM tenk1;
> +
> + QUERY PLAN
> +-
> + Seq Scan on tenk1 (cost=0.00..445.00 rows=1 width=244)
> +
> +
> + How the planner determines the cardinality of tenk1
> + is covered in , but is repeated here for
> + completeness. The number of rows is looked up from
> + pg_class:
> +
> +
> +SELECT reltuples, relpages FROM pg_class WHERE relname = 'tenk1';
> +
> + relpages | reltuples
> +--+---
> + 345 | 1
> +
> + The planner will check the relpages estimate
> + (this is a cheap operation) and if incorrect may scale
> + reltuples to obtain a row estimate. In this
> case it
> + does not, thus:
> +
> +
> +rows = 1
> +
> +
> +
> +
> +
> + let's move on to an example with a range condition in its
> + WHERE clause:
> +
> +
> +EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 1000;
> +
> + QUERY PLAN
> +
> + Seq Scan on tenk1 (cost=0.00..470.00 rows=1031 width=244)
> + Filter: (unique1 < 1000)
> +
> +
> + The planner examines the WHERE clause condition:
> +
> +
> +unique1 < 1000
> +
> +
> + and looks up the restriction function for the operator
> + < in pg_operator.
> + This is held in the column oprrest,
> + and the result in this case is scalarltsel.
> + The scalarltsel function retrieves the histogram for
> + unique1 from
> pg_statistics
> + - we can follow this by using the simpler pg_stats
> + view:
> +
> +
> +SELECT histogram_bounds FROM pg_stats
> +WHERE tablename='tenk1' AND attname='unique1';
> +
> + histogram_bounds
> +--
> + {1,970,1943,2958,3971,5069,6028,7007,7919,8982,9995}
> +
> +
> + Next the fraction of the histogram occupied by < 1000
> + is worked out. This is the selectivity. The histogram divides the range
> + into equal frequency buckets, so all we have to do is locate the bucket
> + that our value is in and count part of it and
> + all of the ones before. The value 1000 is clearly in
> + the second (970 - 1943) bucket, so by assuming a linear distribution of
> + values inside each bucket we can calculate the selectivity as:
> +
> +
> +selectivity = (1 + (1000 - bckt[2].min)/(bckt[2].max -
> bckt[2].min))/num_bckts
> += (1 + (1000 - 970)/(1943 - 970))/10
> += 0.1031
> +
> +
> + that is, one whole bucket plus a linear fraction of the se