Re: [PERFORM] statistics target for columns in unique constraint?

2013-05-20 Thread Kevin Grittner
ach alanchi...@gmail.com wrote:

 One quick follow up since I'm expecting y'all might know:  Do the
 statistics targets actually speed performance on an index search
 itself; the actual lookup?  Or are they JUST to inform the planner
 towards the best pathway decision?

Since the statistics are just a random sampling and generally not
completely up-to-date, they really can't be used for anything other
than *estimating* relative costs in order to try to pick the best
plan.  Once a plan is chosen, its execution time is not influenced
by the statistics.  A higher statistics target can increase
planning time.  In a complex query with many joins and many indexes
on the referenced tables, the increase in planning time can be
significant.  I have seen cases where blindly increasing the
default statistics target resulted in planning time which was
longer than run time -- without any increase in plan quality.

Generally when something is configurable, it's because there can be
benefit to adjusting it.  If there were a single setting which could
not be materially improved upon for some cases, we wouldn't expose
a configuration option.  Not only is this setting globally adjustable,
you can also override it for individual columns -- again, we don't go
to the trouble of supporting that without a good reason.
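
For illustration, a minimal sketch of that per-column override (table and
column names here are hypothetical, and the target values are placeholders,
not recommendations):

  -- Raise the target only for the column whose estimates are actually poor:
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;

  -- Or put it back to the global default (default_statistics_target):
  ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS -1;

  -- Either change takes effect at the next ANALYZE:
  ANALYZE orders;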

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] statistics target for columns in unique constraint?

2013-05-18 Thread ach
Thanks guys!  I'm gonna try tuning the statistics back down to 10 on that
table and see what that does to the insertion rates.  Oh and for Mark:  Not
to worry - I'd actually tuned the stats there up myself a while ago in an
experiment to see if -that- would've sped insertions some; back before I'd
had enough mileage on Postgres for it to have occurred to me that it might
just have been useless ;-)

One quick follow up since I'm expecting y'all might know:  Do the statistics
targets actually speed performance on an index search itself; the actual
lookup?  Or are they JUST to inform the planner towards the best pathway
decision?  In other words if I have statistics set to 1000, say, in one
case, and the planner chose the exact same path it would have if they'd just
been set to 100, would the lookup return faster when the stats were at 1000? 
Or would it actually take the same time either way?  My hunch is it's the
latter...






Re: [PERFORM] statistics target for columns in unique constraint?

2013-05-13 Thread Marti Raudsepp
On Mon, May 13, 2013 at 6:01 PM, ach alanchi...@gmail.com wrote:
 what I'm wondering is, since
 the unique constraint already covers the whole table and all rows in
 entirety, is it really necessary for statistics to be set that high on
 those?

AFAIK if there are exact-matching unique constraints/indexes for a
query's WHERE clause, the planner will deduce that the query only
returns 1 row and won't consult statistics at all.
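
As a hypothetical illustration of that (all names invented here):

  -- A unique constraint (here the primary key) matched exactly by the WHERE
  -- clause lets the planner estimate a single row without consulting stats:
  CREATE TABLE accounts (
      account_id bigint PRIMARY KEY,
      balance    numeric
  );

  -- The row estimate for this query should be 1 whatever the statistics
  -- target on account_id happens to be:
  EXPLAIN SELECT * FROM accounts WHERE account_id = 42;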

 Or does that only serve to slow down inserts to that table?

It doesn't slow down inserts directly. Tables are analyzed in the
background by autovacuum. However, I/O traffic from autovacuum analyze
may slow down inserts running concurrently.

Regards,
Marti




Re: [PERFORM] statistics target for columns in unique constraint?

2013-05-13 Thread Mark Kirkwood

On 14/05/13 10:10, Marti Raudsepp wrote:

On Mon, May 13, 2013 at 6:01 PM, ach alanchi...@gmail.com wrote:

what I'm wondering is, since
the unique constraint already covers the whole table and all rows in
entirety, is it really necessary for statistics to be set that high on
those?


AFAIK if there are exact-matching unique constraints/indexes for a
query's WHERE clause, the planner will deduce that the query only
returns 1 row and won't consult statistics at all.


Or does that only serve to slow down inserts to that table?


It doesn't slow down inserts directly. Tables are analyzed in the
background by autovacuum. However, I/O traffic from autovacuum analyze
may slow down inserts running concurrently.




A higher stats target means larger stats structures - which in 
turn means that the planning stage of *all* queries may be impacted - 
e.g. planning takes up more memory and is slightly slower as these larger 
structures are read, iterated over, freed etc.


So if your only access is via a defined unique key, then (as Marti 
suggests) a large setting for stats target would seem to be unnecessary.


If you have access to a test environment I'd recommend you model the 
effect of reducing the stats target (back to the default of 100 or even 
to the old default of 10).


A little paranoia: maybe switch on statement logging and ensure that 
there are no *other* ways this table is accessed... the fact that the 
number was cranked up from the default is a little suspicious!
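
As a rough sketch of both checks (table, column and database names are
hypothetical):

  -- See which columns carry a per-column override (-1 means "use the default"):
  SELECT attname, attstattarget
  FROM   pg_attribute
  WHERE  attrelid = 'busy_table'::regclass AND attnum > 0 AND NOT attisdropped;

  -- The "paranoia" step: log every statement for a while and confirm the
  -- table really is only reached through its unique key:
  ALTER DATABASE mydb SET log_statement = 'all';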


Regards

Mark






Re: [PERFORM] Statistics and Multi-Column indexes

2011-07-15 Thread lars

On 07/10/2011 02:31 PM, Samuel Gendler wrote:
What about partitioning tables by tenant id and then maintaining 
indexes on each partition independent of tenant id, since constraint 
exclusion should handle filtering by tenant id for you.  That seems 
like a potentially more tolerable variant of #5. How many tenants are 
we talking about?  I gather partitioning starts to become problematic 
when the number of partitions gets large.



I thought I had replied... Apparently I didn't.

The database can grow in two dimensions: the number of tenants and the 
number of rows per tenant.
We have many tenants with relatively little data and a few with a lot of 
data. So the number of tenants is known ahead of time and might be 1000's.

-- Lars




Re: [PERFORM] Statistics and Multi-Column indexes

2011-07-11 Thread Kevin Grittner
lars lhofha...@yahoo.com wrote:
 
 We are maintaining a large multi tenant database where *all*
 tables have a tenant-id and all indexes and PKs lead with the
 tenant-id. Statistics and counts for all the other columns are
 only really meaningful within the context of the tenant they
 belong to.
 
 There appear to be five options for me:
 1. Using single column indexes on all interesting columns and rely
 on PostgreSQLs bitmap indexes to combine them (which are pretty
 cool).
 
Those are cool -- when programmers are skeptical that they should
just say *what* they want and let the database figure out how to get
it, I like to point out that this option is available to the
planner, but not to application programming.  Of course, there are a
great many other reasons *I* find more compelling, but this one tends
to impress application programmers.
 
Assuming you keep the primary key as a multi-column index, this
seems like a good place to start.
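 
A minimal, hypothetical sketch of option 1 (names invented; whether the
planner actually chooses a BitmapAnd depends on the data):

  CREATE TABLE items (
      tenant_id  int    NOT NULL,
      item_id    bigint NOT NULL,
      status     text,
      created_at timestamptz,
      PRIMARY KEY (tenant_id, item_id)
  );
  CREATE INDEX items_status_idx  ON items (status);
  CREATE INDEX items_created_idx ON items (created_at);

  -- The planner may combine the two single-column indexes with a BitmapAnd
  -- rather than needing a composite index on (status, created_at):
  EXPLAIN SELECT * FROM items
  WHERE  status = 'open' AND created_at >= '2011-07-01';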
 
 2. Use multi column indexes and accept that sometimes Postgres
 picks the wrong index (because a non-tenant-id column might seem
 highly selective over the table, but it is not for a particular
 tenant - or vice versa).
 
If you have a lot of queries which access data based on matching
some set of columns, an occasional multicolumn index in addition to
the individual column index may be worth it.
 
You might want to avoid prepared statements, since these are planned
for the general case and can fall down badly for the extremes.
 
 3. Use a functional index that combines multiple columns and only
 query via these, that causes statistics gathering for the
 expression. I.e. create index i on t((tenantid||column1)) and
 SELECT ... FROM t WHERE tenantid||column1 = '...'
 
I would hold off on that until I saw evidence of a specific need.
 
 4. Play with n_distinct and/or set the statistics for the inner
 columns to some fixed values that lead to the plans that we want.
 
Try not to think in terms of plans we want, but in terms of
modeling your costs so that, given your tables and indexes, the
PostgreSQL planner will do a good job of picking a fast plan.  You
normally need to tweak a few of the costing factors to match the
reality of your server and load.
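 
For example (the values here are purely illustrative, not recommendations):

  -- Typical knobs for aligning the cost model with the actual hardware:
  SET random_page_cost = 2.0;         -- lower when the working set is mostly cached
  SET effective_cache_size = '8GB';   -- roughly shared_buffers plus OS cache
  -- cpu_tuple_cost, cpu_index_tuple_cost etc. rarely need changing.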
 
 5. Have a completely different schema and maybe a database per
 tenant.
 
 Are there any other options?
 
If most queries operate within a single tenant and you have less
than 100 tenants, you might think about partitioned tables.  Beyond
100, or if most queries need to look at many partitions, it becomes
problematic.
 
 I don't think #2 is actually an option. We have some tenants with
 many (sometimes 100s of) millions of rows per table, and picking the
 wrong index would be disastrous.
 
You can always drop counter-productive or unused indexes.  I think
it's best to look at indexes, as much as possible, as database
tuning options, rather than something with any semantic meaning.  If
you're doing things right, an index will never change the result of
any query, so you are free to try different tunings and see what's
fastest with your schema and your production load.
 
We have tables with 100s of millions of rows where everything is
indexed by county number.  The biggest county has about 20% of the
rows; the smallest about 0.05%.  We generally do pretty well with
(1) and (2).  We do occasionally find it useful to create an index
with a WHERE clause, though.  You might want to consider those.
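 
A hypothetical sketch of such a partial index (names and the county number
are invented):

  -- Index only the rows a hot query actually touches, e.g. the biggest county:
  CREATE INDEX docs_big_county_filed_idx
      ON documents (filed_date)
      WHERE county_no = 13;

  -- A query repeating the same WHERE condition can use the much smaller index:
  EXPLAIN SELECT * FROM documents
  WHERE  county_no = 13 AND filed_date >= '2011-01-01';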
 
-Kevin



Re: [PERFORM] Statistics and Multi-Column indexes

2011-07-10 Thread Samuel Gendler
On Sun, Jul 10, 2011 at 2:16 PM, lars lhofha...@yahoo.com wrote:

 I know this has been discussed various times...

 We are maintaining a large multi tenant database where *all* tables have a
 tenant-id and all indexes and PKs lead with the tenant-id.
 Statistics and counts for all the other columns are only really meaningful
 within the context of the tenant they belong to.

 There appear to be five options for me:
 1. Using single column indexes on all interesting columns and rely on
 PostgreSQLs bitmap indexes to combine them (which are pretty cool).
 2. Use multi column indexes and accept that sometimes Postgres picks the
 wrong index (because a non-tenant-id
 column might seem highly selective over the table, but it is not for a
 particular tenant - or vice versa).
 3. Use a functional index that combines multiple columns and only query via
 these, that causes statistics
 gathering for the expression.
 I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE
 tenantid||column1 = '...'
 4. Play with n_distinct and/or set the statistics for the inner columns to
 some fixed values that lead to the plans that we want.
 5. Have a completely different schema and maybe a database per tenant.


What about partitioning tables by tenant id and then maintaining indexes on
each partition independent of tenant id, since constraint exclusion should
handle filtering by tenant id for you.  That seems like a potentially more
tolerable variant of #5. How many tenants are we talking about?  I gather
partitioning starts to become problematic when the number of partitions gets
large.
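
A minimal sketch of that idea using the inheritance-style partitioning
available at the time (all names hypothetical; one child per tenant or per
range of tenants):

  CREATE TABLE events (
      tenant_id int    NOT NULL,
      event_id  bigint NOT NULL,
      payload   text
  );

  CREATE TABLE events_t1 (CHECK (tenant_id = 1)) INHERITS (events);
  CREATE TABLE events_t2 (CHECK (tenant_id = 2)) INHERITS (events);

  -- Per-partition indexes need not lead with tenant_id:
  CREATE INDEX events_t1_event_id_idx ON events_t1 (event_id);
  CREATE INDEX events_t2_event_id_idx ON events_t2 (event_id);

  -- With constraint_exclusion = partition, a query that pins tenant_id
  -- should scan only the matching child table:
  SET constraint_exclusion = partition;
  EXPLAIN SELECT * FROM events WHERE tenant_id = 2 AND event_id = 12345;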


Re: [PERFORM] Statistics use with functions

2009-05-08 Thread Tom Lane
Matthew Wakeling matt...@flymine.org writes:
 When I wrap the fields in the constraints in a LOWER() function, the 
 planner stops looking at the statistics and makes a wild guess, even 
 though it is very obvious from just looking what the result should be. 

Well, in general the planner can't assume anything about the statistics
of a function result, since it doesn't know how the function behaves.
In this case, however, you evidently have an index on lower(distance)
which should have caused ANALYZE to gather stats on the values of that
functional expression.  It looks like there might be something wrong
there --- can you look into pg_stats and see if there is such an entry
and if it looks sane?

 Also, why does the BitmapAnd say zero actual rows?

There isn't any reasonably-inexpensive way for EXPLAIN ANALYZE to
determine how many rows are represented by a bitmap result, so it
doesn't try.

regards, tom lane



Re: [PERFORM] Statistics use with functions

2009-05-08 Thread Tom Lane
Matthew Wakeling matt...@flymine.org writes:
 Ah, now I see it - I re-analysed, and found entries in pg_stats where 
 tablename is the name of the index. Now the query plans correctly and has 
 the right estimates. So, one needs to analyse AFTER creating indexes - 
 didn't know that.

Yes, for functional indexes it's helpful to do that.  Doesn't matter
for plain-old-plain-old indexes.
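
A small sketch of that behaviour (the index definition here is inferred from
the plan further down the thread, so treat it as an assumption):

  CREATE INDEX geneflankingregion__distance_equals
      ON geneflankingregion (LOWER(distance));

  -- Expression statistics are only gathered once ANALYZE runs after the index
  -- exists, and they show up in pg_stats under the index's name, not the table's:
  ANALYZE geneflankingregion;

  SELECT tablename, attname, n_distinct
  FROM   pg_stats
  WHERE  tablename = 'geneflankingregion__distance_equals';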

regards, tom lane



Re: [PERFORM] Statistics use with functions

2009-05-08 Thread Matthew Wakeling

On Fri, 8 May 2009, Tom Lane wrote:

In this case, however, you evidently have an index on lower(distance)
which should have caused ANALYZE to gather stats on the values of that
functional expression.  It looks like there might be something wrong
there --- can you look into pg_stats and see if there is such an entry
and if it looks sane?


What should I be looking for? I don't see anything obvious from this:

modmine-r9=# select attname from pg_stats where tablename = 
'geneflankingregion';

Ah, now I see it - I re-analysed, and found entries in pg_stats where 
tablename is the name of the index. Now the query plans correctly and has 
the right estimates. So, one needs to analyse AFTER creating indexes - 
didn't know that.


modmine-r9=# explain analyse SELECT * FROM geneflankingregion WHERE 
LOWER(distance) = '10.0kb' AND LOWER(direction) = 'upstream';

                                 QUERY PLAN
---------------------------------------------------------------------------
 Bitmap Heap Scan on geneflankingregion
     (cost=1197.19..11701.87 rows=45614 width=212)
     (actual time=18.336..153.825 rows=45502 loops=1)
   Recheck Cond: (lower(distance) = '10.0kb'::text)
   Filter: (lower(direction) = 'upstream'::text)
   ->  Bitmap Index Scan on geneflankingregion__distance_equals
         (cost=0.00..1185.78 rows=91134 width=0)
         (actual time=16.565..16.565 rows=91004 loops=1)
         Index Cond: (lower(distance) = '10.0kb'::text)
 Total runtime: 199.282 ms
(6 rows)

Matthew

--
It is better to keep your mouth closed and let people think you are a fool
than to open it and remove all doubt.  -- Mark Twain



Re: [PERFORM] Statistics issue

2008-05-31 Thread Tom Lane
Vlad Arkhipov [EMAIL PROTECTED] writes:
 EXPLAIN ANALYZE
 SELECT *
 FROM i
 WHERE d BETWEEN '2007-05-12' AND '2007-05-12'

 Index Scan using i_d on i  (cost=0.00..2.39 rows=1 width=402) (actual
 time=0.053..4.284 rows=1721 loops=1)
   Index Cond: ((d >= '2007-05-12'::date) AND (d <= '2007-05-12'::date))
 Total runtime: 6.645 ms

 EXPLAIN ANALYZE
 SELECT *
 FROM i
 WHERE d = '2007-05-12'

 Index Scan using i_d on i  (cost=0.00..38.97 rows=1572 width=402)
 (actual time=0.044..4.250 rows=1721 loops=1)
   Index Cond: (d = '2007-05-12'::date)
 Total runtime: 6.619 ms

Hmm, I wonder whether we shouldn't do something like this
http://archives.postgresql.org/pgsql-committers/2008-03/msg00128.php
for all range conditions, not just those made up by
prefix_selectivity().

regards, tom lane



Re: [PERFORM] statistics buffer is full

2006-04-02 Thread Qingqing Zhou

Gábriel Ákos [EMAIL PROTECTED] wrote

 I've got this message while heavily inserting into a database. What should 
 I tune and how? It is postgresql 8.1.3.

 2006-03-29 14:16:57.513 CEST:LOG:  statistics buffer is full


Since your server is under heavy load, the common trick is to increase 
PGSTAT_RECVBUFFERSZ in include/pgstat.h and recompile your server.

Regards,
Qingqing 


