Re: [PERFORM] statistics target for columns in unique constraint?
ach wrote: > One quick follow up since I'm expecting y'all might know: Do the > statistics targets actually speed performance on an index search > itself; the actual lookup? Or are the JUST to inform the planner > towards the best pathway decision? Since the statistics are just a random sampling and generally not completely up-to-date, they really can't be used for anything other than *estimating* relative costs in order to try to pick the best plan. Once a plan is chosen, its execution time is not influenced by the statistics. A higher statistics target can increase planning time. In a complex query with many joins and many indexes on the referenced tables, the increase in planning time can be significant. I have seen cases where blindly increasing the default statistics target resulted in planning time which was longer than run time -- without any increase in plan quality. Generally when something is configurable, it's because there can be benefit to adjusting it. If there was a single setting which could not be materially improved upon for some cases, we wouldn't expose a configuration option. This is something which is not only globally adjustable, you can override the setting for individual columns -- again, we don't go to the trouble of supporting that without a good reason. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] statistics target for columns in unique constraint?
Thanks guys! I'm gonna try tuning the statistics back down to 10 on that table and see what that does to the insertion rates. Oh and for Mark: Not to worry - i'd actually tuned the stats there up myself awhile ago in an experiment to see if -that- would've sped insertions some; back before i'd had enough mileage on postgres for it to have occurred to me that might just have been useless ;-) One quick follow up since I'm expecting y'all might know: Do the statistics targets actually speed performance on an index search itself; the actual lookup? Or are the JUST to inform the planner towards the best pathway decision? In other words if I have statistics set to 1000, say, in one case, and the planner chose the exact same path it would have if they'd just been set to 100, would the lookup return faster when the stats were at 1000? Or would it actually take the same time either way? My hunch is it's the latter... -- View this message in context: http://postgresql.1045698.n5.nabble.com/statistics-target-for-columns-in-unique-constraint-tp5755256p5756093.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] statistics target for columns in unique constraint?
On 14/05/13 10:10, Marti Raudsepp wrote: On Mon, May 13, 2013 at 6:01 PM, ach wrote: what I'm wondering is, since the unique constraint already covers the whole table and all rows in entirety, is it really necessary for statistics to be set that high on those? AFAIK if there are exact-matching unique constraints/indexes for a query's WHERE clause, the planner will deduce that the query only returns 1 row and won't consult statistics at all. Or does that only serve to slow down inserts to that table? It doesn't slow down inserts directly. Tables are analyzed in the background by autovacuum. However, I/O traffic from autovacuum analyze may slow down inserts running concurrently. A higher number in stats target means larger stats structures - which in turn means that the planning stage of *all* queries may be impacted - e.g takes up more memory, slightly slower as these larger structures are read, iterated over, free'd etc. So if your only access is via a defined unique key, then (as Marti suggests) - a large setting for stats target would seem to be unnecessary. If you have access to a test environment I'd recommend you model the effect of reducing stats target down (back to the default of 100 or even to the old version default of 10). A little - paranoia - maybe switch on statement logging and ensure that there are no *other* ways this table is accessed...the fact that the number was cranked up from the default is a little suspicious! Regards Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] statistics target for columns in unique constraint?
On Mon, May 13, 2013 at 6:01 PM, ach wrote: > what I'm wondering is, since > the unique constraint already covers the whole table and all rows in > entirety, is it really necessary for statistics to be set that high on > those? AFAIK if there are exact-matching unique constraints/indexes for a query's WHERE clause, the planner will deduce that the query only returns 1 row and won't consult statistics at all. > Or does that only serve to slow down inserts to that table? It doesn't slow down inserts directly. Tables are analyzed in the background by autovacuum. However, I/O traffic from autovacuum analyze may slow down inserts running concurrently. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Statistics and Multi-Column indexes
On 07/10/2011 02:31 PM, Samuel Gendler wrote: What about partitioning tables by tenant id and then maintaining indexes on each partition independent of tenant id, since constraint exclusion should handle filtering by tenant id for you. That seems like a potentially more tolerable variant of #5 How many tenants are we talking about? I gather partitioning starts to become problematic when the number of partitions gets large. I thought I had replied... Apparently I didn't. The database can grow in two dimensions: The number of tenants and the number of rows per tenant. We have many tenants with relatively little data and a few with a lot of data. So the number of tenants is known ahead of time and might be 1000's. -- Lars -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Statistics and Multi-Column indexes
lars wrote: > We are maintaining a large multi tenant database where *all* > tables have a tenant-id and all indexes and PKs lead with the > tenant-id. Statistics and counts for the all other columns are > only really meaningful within the context of the tenant they > belong to. > > There appear to be five options for me: > 1. Using single column indexes on all interesting columns and rely > on PostgreSQLs bitmap indexes to combine them (which are pretty > cool). Those are cool -- when programmers are skeptical that they should just say *what* they want and let the database figure out how to get it, I like to point out that this option is available to the planner, but not to application programming. Of course, there are a great many other reason *I* find more compelling, but this one tends to impress application programmers. Assuming you keep the primary key as a multi-column index, this seems like a good place to start. > 2. Use multi column indexes and accept that sometimes Postgres > pick the wrong index (because a non-tenant-id column might seem > highly selective over the table, but it is not for a particular > tenant - or vice versa). If you have a lot of queries which access data based on matching some set of columns, an occasional multicolumn index in addition to the individual column index may be worth it. You might want to avoid prepared statements, since these are planned for the general case and can fall down badly for the extremes. > 3. Use a functional index that combines multiple columns and only > query via these, that causes statistics gathering for the > expression. I.e. create index i on t((tenantid||column1)) and > SELECT ... FROM t WHERE tenantid||column1 = '...' I would hold off on that until I saw evidence of a specific need. > 4. Play with n_distinct and/or set the statistics for the inner > columns to some fixed values that lead to the plans that we want. Try not to think in terms of "plans we want", but in terms of modeling your costs so that, given your tables and indexes, the PostgreSQL planner will do a good job of picking a fast plan. You normally need to tweak a few of the costing factors to match the reality of your server and load. > 5. Have a completely different schema and maybe a database per > tenant. > Are there any other options? If most queries operate within a single tenant and you have less than 100 tenants, you might think about partitioned tables. Beyond 100, or if most queries need to look at many partitions, it becomes problematic. > I don't think #2 is actually an option. We have some tenants with > many (sometimes 100s) millions of rows per table, and picking the > wrong index would be disastrous. You can always drop counter-productive or unused indexes. I think it's best to look at indexes, as much as possible, as database tuning options, rather than something with any semantic meaning. If you're doing things right, an index will never change the result of any query, so you are free to try different tunings and see what's fastest with your schema and your production load. We have tables with 100s of millions of rows where everything is indexed by county number. The biggest county has about 20% of the rows; the smallest about 0.05%. We generally do pretty well with (1) and (2). We do occasionally find it useful to create an index with a WHERE clause, though. You might want to consider those. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Statistics and Multi-Column indexes
On Sun, Jul 10, 2011 at 2:16 PM, lars wrote: > I know this has been discussed various times... > > We are maintaining a large multi tenant database where *all* tables have a > tenant-id and all indexes and PKs lead with the tenant-id. > Statistics and counts for the all other columns are only really meaningful > within the context of the tenant they belong to. > > There appear to be five options for me: > 1. Using single column indexes on all interesting columns and rely on > PostgreSQLs bitmap indexes to combine them (which are pretty cool). > 2. Use multi column indexes and accept that sometimes Postgres pick the > wrong index (because a non-tenant-id > column might seem highly selective over the table, but it is not for a > particular tenant - or vice versa). > 3. Use a functional index that combines multiple columns and only query via > these, that causes statistics > gathering for the expression. > I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE > tenantid||column1 = '...' > 4. Play with n_distinct and/or set the statistics for the inner columns to > some fixed values that lead to the plans that we want. > 5. Have a completely different schema and maybe a database per tenant. > > What about partitioning tables by tenant id and then maintaining indexes on each partition independent of tenant id, since constraint exclusion should handle filtering by tenant id for you. That seems like a potentially more tolerable variant of #5 How many tenants are we talking about? I gather partitioning starts to become problematic when the number of partitions gets large.
Re: [PERFORM] Statistics use with functions
Matthew Wakeling writes: > Ah, now I see it - I re-analysed, and found entries in pg_stats where > tablename is the name of the index. Now the query plans correctly and has > the right estimates. So, one needs to analyse AFTER creating indexes - > didn't know that. Yes, for functional indexes it's helpful to do that. Doesn't matter for plain-old-plain-old indexes. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Statistics use with functions
On Fri, 8 May 2009, Tom Lane wrote: In this case, however, you evidently have an index on lower(distance) which should have caused ANALYZE to gather stats on the values of that functional expression. It looks like there might be something wrong there --- can you look into pg_stats and see if there is such an entry and if it looks sane? What should I be looking for? I don't see anything obvious from this: modmine-r9=# select attname from pg_stats where tablename = 'geneflankingregion'; Ah, now I see it - I re-analysed, and found entries in pg_stats where tablename is the name of the index. Now the query plans correctly and has the right estimates. So, one needs to analyse AFTER creating indexes - didn't know that. modmine-r9=# explain analyse SELECT * FROM geneflankingregion WHERE LOWER(distance) = '10.0kb' AND LOWER(direction) = 'upstream'; QUERY PLAN - Bitmap Heap Scan on geneflankingregion (cost=1197.19..11701.87 rows=45614 width=212) (actual time=18.336..153.825 rows=45502 loops=1) Recheck Cond: (lower(distance) = '10.0kb'::text) Filter: (lower(direction) = 'upstream'::text) -> Bitmap Index Scan on geneflankingregion__distance_equals (cost=0.00..1185.78 rows=91134 width=0) (actual time=16.565..16.565 rows=91004 loops=1) Index Cond: (lower(distance) = '10.0kb'::text) Total runtime: 199.282 ms (6 rows) Matthew -- It is better to keep your mouth closed and let people think you are a fool than to open it and remove all doubt. -- Mark Twain -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Statistics use with functions
Matthew Wakeling writes: > When I wrap the fields in the constraints in a LOWER() function, the > planner stops looking at the statistics and makes a wild guess, even > though it is very obvious from just looking what the result should be. Well, in general the planner can't assume anything about the statistics of a function result, since it doesn't know how the function behaves. In this case, however, you evidently have an index on lower(distance) which should have caused ANALYZE to gather stats on the values of that functional expression. It looks like there might be something wrong there --- can you look into pg_stats and see if there is such an entry and if it looks sane? > Also, why does the BitmapAnd say zero actual rows? There isn't any reasonably-inexpensive way for EXPLAIN ANALYZE to determine how many rows are represented by a bitmap result, so it doesn't try. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Statistics issue
Vlad Arkhipov <[EMAIL PROTECTED]> writes: > EXPLAIN ANALYZE > SELECT * > FROM i > WHERE d BETWEEN '2007-05-12' AND '2007-05-12' > Index Scan using i_d on i (cost=0.00..2.39 rows=1 width=402) (actual > time=0.053..4.284 rows=1721 loops=1) > Index Cond: ((d >= '2007-05-12'::date) AND (d <= '2007-05-12'::date)) > Total runtime: 6.645 ms > EXPLAIN ANALYZE > SELECT * > FROM i > WHERE d = '2007-05-12' > Index Scan using i_d on i (cost=0.00..38.97 rows=1572 width=402) > (actual time=0.044..4.250 rows=1721 loops=1) > Index Cond: (d = '2007-05-12'::date) > Total runtime: 6.619 ms Hmm, I wonder whether we shouldn't do something like this http://archives.postgresql.org/pgsql-committers/2008-03/msg00128.php for all range conditions, not just those made up by prefix_selectivity(). regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] statistics buffer is full
""Gábriel Ákos"" <[EMAIL PROTECTED]> wrote > > I've got this message while heavily inserting into a database. What should > I tune and how? It is postgresql 8.1.3. > > 2006-03-29 14:16:57.513 CEST:LOG: statistics buffer is full > Since your server is in a heavy load, so the common trick is to increase PGSTAT_RECVBUFFERSZ in include/pgstat.h and recompile your server. Regards, Qingqing ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Statistics not working??
On Fri, 11 Mar 2005, Hugo Ferreira wrote: Hi there! I think I may have a problem with the statistics in my postgresql 8.0 running under Windowx XP. When I view both pg_stat_all_tables and pg_stat_all_indexes, all the numeric columns that should hold the statistics are 0 (zero). My configuration file has the following: stats_start_collector = true stats_command_string = true stats_reset_on_server_start = false Any tip? You need to define stats_block_level and/or stats_row_level Thanks in advance, Hugo Ferreira Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] statistics
Thanks for the answer. I know the question was to primitive (it claims: i have no idea about databases). But i simply didn't find the answer and if a don't ask i won't learn. Someday i will talk with Tom Lane about how to improve the planner but until that day comes i have a lot of technical things to learn. _ Add photos to your messages with MSN 8. Get 2 months FREE*. http://join.msn.com/?page=features/featuredemail ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] statistics
On 07/04/2004 22:05 Jaime Casanova wrote: What the statistics are? Where can i view it? where can i find info about its field and why are they valuable information to performance? thanx in advance, Jaime Casanova OK. An idiot's guide to statistics by a full-time idiot... Let's start with a simple premise. I'm a RDBMS (forget that I'm actually an idiot for a moment...) and I've been asked for select * from foo where bar = 7; How do I go about fulfilling the reequest in the most efficient manner? (i.e., ASAP!) One way might be to read through the whole table and return only those rows which match the where criteron - a sequential scan on the table. But wait a minute, there is an index on column bar. Could I use this instead? Well, of course, I could use it but I have to keep sight of the goal of returning the data ASAP and I know that the act of reading index/reading table/... will have a performance penalty due to a lot more head movement on the disk. So how do I make chose between a sequential scan and an index scan? Let's lokk at a couple of extreme scenarios: 1) let's look at the condition where all or virtually all of the bar columns are populated wityh the value 7. In this case it would be more efficient to read sequentially through the table. 2) the opposite of (1) - very few of the bar columns have the value 7. In this case using the index could be a winner. So generalising, I need to be able to estimate whether doing a sequential scan is more efficient that an index scan and this comes down to 2 factors: a) the cost of moving the disk heads all over the place (random page cost) b) the spread of values in the selecting column(s) (a) is specfified in postgresql.conf (see archives for much discusion about what the value should be..) (b) is determined by the dastardly trick of actually sampling the data in the table!!! That's what analyze does. It samples your table(s) and uses the result to feeede into it's descision about when to flip between sequential and index scans. Hope this makes some kind of sense... -- Paul Thomas +--+-+ | Thomas Micro Systems Limited | Software Solutions for Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +--+-+ ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match