Re: [PERFORM] 60 core performance with 9.3
I've been assisting Mark with the benchmarking of these new servers. The drop-off in both throughput and CPU utilisation that we've been observing as the client count increases has led me to investigate which lwlocks are dominant at different client counts. I've recompiled postgres with Andres' LWLock improvements, Kevin's libnuma patch and with LWLOCK_STATS enabled. The LWLOCK_STATS below suggest that ProcArrayLock might be the main source of locking that's causing throughput to take a dive as the client count increases beyond the core count.

wal_buffers = 256MB
checkpoint_segments = 1920
wal_sync_method = open_datasync

pgbench -s 2000 -T 600 results:

 clients |  tps
---------+-------
       6 |  9490
      12 | 17558
      24 | 25681
      48 | 41175
      96 | 48954
     192 | 31887
     384 | 15564

LWLOCK_STATS at 48 clients

 Lock              |    Blk  | SpinDelay | Blk % | SpinDelay %
-------------------+---------+-----------+-------+-------------
 BufFreelistLock   |   31144 |        11 |  1.64 |   1.62
 ShmemIndexLock    |     192 |         1 |  0.01 |   0.15
 OidGenLock        |   32648 |        14 |  1.72 |   2.06
 XidGenLock        |   35731 |        18 |  1.88 |   2.64
 ProcArrayLock     |  291121 |       215 | 15.36 |  31.57
 SInvalReadLock    |   32136 |        13 |  1.70 |   1.91
 SInvalWriteLock   |   32141 |        12 |  1.70 |   1.76
 WALBufMappingLock |   31662 |        15 |  1.67 |   2.20
 WALWriteLock      |  825380 |        45 | 36.31 |   6.61
 CLogControlLock   |  583458 |       337 | 26.93 |  49.49

LWLOCK_STATS at 96 clients

 Lock              |    Blk  | SpinDelay | Blk % | SpinDelay %
-------------------+---------+-----------+-------+-------------
 BufFreelistLock   |   62954 |        12 |  1.54 |   0.27
 ShmemIndexLock    |   62635 |         4 |  1.54 |   0.09
 OidGenLock        |   92232 |        22 |  2.26 |   0.50
 XidGenLock        |   98326 |        18 |  2.41 |   0.41
 ProcArrayLock     |  928871 |      3188 | 22.78 |  72.57
 SInvalReadLock    |   58392 |        13 |  1.43 |   0.30
 SInvalWriteLock   |   57429 |        14 |  1.41 |   0.32
 WALBufMappingLock |  138375 |        14 |  3.39 |   0.32
 WALWriteLock      | 1480707 |        42 | 36.31 |   0.96
 CLogControlLock   | 1098239 |      1066 | 26.93 |  27.27

LWLOCK_STATS at 384 clients

 Lock              |    Blk  | SpinDelay | Blk % | SpinDelay %
-------------------+---------+-----------+-------+-------------
 BufFreelistLock   |  184298 |       158 |  1.93 |   0.03
 ShmemIndexLock    |  183573 |       164 |  1.92 |   0.03
 OidGenLock        |  184558 |       173 |  1.93 |   0.03
 XidGenLock        |  200239 |       213 |  2.09 |   0.04
 ProcArrayLock     | 4035527 |    579666 | 42.22 |  98.62
 SInvalReadLock    |  182204 |       152 |  1.91 |   0.03
 SInvalWriteLock   |  182898 |       137 |  1.91 |   0.02
 WALBufMappingLock |  219936 |       215 |  2.30 |   0.04
 WALWriteLock      | 3172725 |       457 | 24.67 |   0.08
 CLogControlLock   | 1012458 |      6423 | 10.59 |   1.09

The same test done with a read-only workload shows virtually no SpinDelay at all.

Any thoughts or comments on these results are welcome!

Regards,

Matt.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
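As a sanity check, the SpinDelay % column can be recomputed from the raw SpinDelay counts. A small Go sketch, using the values from the 384-client table above:

```go
package main

import "fmt"

// spinDelay384 holds the raw SpinDelay counts from the 384-client run above.
var spinDelay384 = map[string]int{
	"BufFreelistLock":   158,
	"ShmemIndexLock":    164,
	"OidGenLock":        173,
	"XidGenLock":        213,
	"ProcArrayLock":     579666,
	"SInvalReadLock":    152,
	"SInvalWriteLock":   137,
	"WALBufMappingLock": 215,
	"WALWriteLock":      457,
	"CLogControlLock":   6423,
}

// share returns a lock's percentage of the total spin delay.
func share(delays map[string]int, lock string) float64 {
	total := 0
	for _, d := range delays {
		total += d
	}
	return 100 * float64(delays[lock]) / float64(total)
}

func main() {
	// ProcArrayLock accounts for nearly all spin delay at 384 clients.
	fmt.Printf("ProcArrayLock: %.2f%%\n", share(spinDelay384, "ProcArrayLock")) // prints 98.62%
}
```

This matches the 98.62 figure in the table, supporting the suspicion that ProcArrayLock is where the remaining throughput goes once the client count exceeds the core count.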
[PERFORM] OFFSET/LIMIT - Disparate Performance w/ Go application
Hi all. This might be tricky in so much as there's a few moving parts (when isn't there?), but I've tried to test the postgres side as much as possible. Trying to work out a potential database bottleneck with an HTTP application (written in Go):

- Pages that render HTML templates but don't perform DB queries can hit ~36k+ req/s
- Pages that perform a SELECT on a single row net about ~6.6k req/s:
  db.Get(l, "SELECT * FROM listings WHERE id = $1 AND expiry_date > current_date", l.Id)
- Pages that SELECT multiple rows with OFFSET and LIMIT conditions struggle to top 1.3k req/s

There's very little "extra" logic around these queries: you can find the code here (about 39 lines for both functions): https://gist.github.com/elithrar/b2497b0b473da64932b5

Other pertinent details:

- It's always been about this slow to my knowledge
- The table is a test database with about 40 rows, although performance doesn't change noticeably even with a few hundred (it's unlikely to ever be more than 10,000 rows over its lifetime)
- Running PostgreSQL 9.3.4 on OS X w/ a 3.5GHz i7, 12GB RAM, 128GB PCI-E SSD. The Go application peaks at about 40MB memory when hitting 37k req/s — so there doesn't appear to be an issue of it eating into the available RAM on the machine
- I'm also aware that HTTP benchmarks aren't the most reliable thing, but I'm using wrk -c 400 -t 32 -d 15s to stress it out
- The application has a connection pool via the lib/pq driver (https://github.com/lib/pq) with MaxOpen set to 256 connections. Stack size is 8GB and max socket connections are set to 1024 (running out of FDs isn't the problem here from what I can see).
Relevant postgresql.conf settings — everything else should be default, including fsync/synchronous commits (on) for obvious reasons:

max_connections = 512
shared_buffers = 2048MB
temp_buffers = 16MB
work_mem = 4MB
wal_buffers = 16
checkpoint_segments = 16
random_page_cost = 2.0
effective_cache_size = 8192MB

The query in question is: http://explain.depesz.com/s/7g8 — and the table schema is as below:

            Table "public.listings"
    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 id            | character varying(17)    | not null
 title         | text                     |
 company       | text                     |
 location      | text                     |
 description   | text                     |
 rendered_desc | text                     |
 term          | text                     |
 commute       | text                     |
 company_url   | text                     |
 rep           | text                     |
 rep_email     | text                     |
 app_method    | text                     |
 app_email     | text                     |
 app_url       | text                     |
 posted_date   | timestamp with time zone |
 edited_date   | timestamp with time zone |
 renewed_date  | timestamp with time zone |
 expiry_date   | timestamp with time zone |
 slug          | text                     |
 charge_id     | text                     |
 sponsor_id    | text                     |
 tsv           | tsvector                 |
Indexes:
    "listings_pkey" PRIMARY KEY, btree (id)
    "fts" gin (tsv)
    "listings_expiry_date_idx" btree (expiry_date)
    "listings_fts_idx" gin (to_tsvector('english'::regconfig, ((((((((title || ' '::text) || company) || ' '::text) || location) || ' '::text) || term) || ' '::text) || commute)))
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON listings FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('tsv', 'pg_catalog.english', 'title', 'company', 'location', 'term', 'commute')

The single row query has a query plan here: http://explain.depesz.com/s/1Np (this is where I see 6.6k req/s at the application level). Some pgbench results from this machine as well:

$ pgbench -c 128 -C -j 4 -T 15 -M extended -S
starting vacuum...end.
transaction type: SELECT only
scaling factor: 1
query mode: extended
number of clients: 128
number of threads: 4
duration: 15 s
number of transactions actually processed: 17040
tps = 1134.481459 (including connections establishing)
tps = 56884.093652 (excluding connections establishing)

Ultimately I'm not expecting a miracle — database ops are nearly always the slowest part of a web server outside the latency to the client itself — but I'd expect something a little closer (even 10% of 33k would be a lot better). And of course, this is somewhat academic because I don't expect to see four million hits an hour — but I'd also like to catch problems for future reference.

Thanks in advance.
Re: [PERFORM] OFFSET/LIMIT - Disparate Performance w/ Go application
Thanks for the replies Jeff, Tom and Merlin.

>> Pages that SELECT multiple rows with OFFSET and LIMIT conditions
>> struggle to top 1.3k req/s
>
> Is that tested at the OFFSET and LIMIT of 0 and 15, as shown in the
> explain plan?

Yes — 0 (OFFSET) and 16 (LIMIT), or 15 and 31 (i.e. "second page" of results). There's no difference on that front. For context, OFFSET is a multiple of 15 (i.e. 15 results per page) and LIMIT is always 15 + 1 in an attempt to fetch one more result, get the len of the returned slice and then return paginate = true + slice the last result off if there's more than 15.

>> The query in question is: http://explain.depesz.com/s/7g8 and the table
>> schema is as below:
>
> The reported runtime of 0.078 ms should be able to sustain nearly 10
> times the reported rate of 1.3k/s, so the bottleneck would seem to be
> elsewhere.
>
> Perhaps the bottleneck is formatting the result set in postgres to be
> sent over the wire, then sending it over the wire, then parsing it in
> the Go connection library to hand back to the Go user code, and then the
> Go user code doing something meaningful with it.
>
> What happens if you get rid of the offset and the order by, and just use
> limit? I bet it doesn't change the speed much (because that is not where
> the bottleneck is).
>
> You seem to be selecting an awful lot of wide columns. Do you really
> need to see all of them?

- Testing SELECT * FROM … with just LIMIT 15 and no offset yields 1299 request/s at the front end of the application.
- Testing SELECT id, title, company, location, commute, term, expiry_date (a subset of fields) with LIMIT 15 and no OFFSET yields 1800 request/s at the front end.

There's definitely an increase to be realised there (I'd say by just tossing the rendered HTML field). Based on your comments about the Go side of things, I ran a quick test by cutting the table down to 6 records from the 39 in the test DB in all previous tests. This skips the pagination logic (below) and yields 3068 req/s on the front-end.
// Determine if we have more than one page of results.
// If so, trim the extra result off and set pagination = true
if len(listings) > opts.PerPage {
	paginate = true
	listings = listings[:opts.PerPage]
}

So there certainly appears to be a bottleneck on the Go side as well (outside of even the DB driver), probably from the garbage generated from slicing the slice, although I'd be keen to know if there's a better way to approach returning a paginated list of results.

> Well, you can also do client side pagination using the row-wise
> comparison feature, implemented by you :-). Cursors can be the best
> approach, but it's nice to know the client side approach if you're
> really stateless and/or want to be able to pick up external changes
> during the browse.

What would be a better approach here? The cursor approach isn't ideal in my case (although I could make it work), but what other options are there that are stateless?

>> Some pgbench results from this machine as well:
>> $ pgbench -c 128 -C -j 4 -T 15 -M extended -S
>
> This is just benchmarking how fast you can make and break connections to
> the database. Because your app is using an embedded connection pooler,
> this benchmark isn't very relevant to your situation.

Noted — thanks.

On 13 Jun 2014, at 4:46 AM, Jeff Janes jeff.ja...@gmail.com wrote:

snip
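The row-wise comparison approach amounts to passing the sort key of the last row on the previous page and filtering with a tuple comparison instead of OFFSET, along the lines of this hypothetical query (not from the gist): `SELECT ... FROM listings WHERE (posted_date, id) < ($1, $2) ORDER BY posted_date DESC, id DESC LIMIT 16`. A minimal in-memory Go sketch of the same predicate, where `Listing` is a simplified stand-in rather than the app's actual struct:

```go
package main

import "fmt"

// Listing is a simplified stand-in for the app's listing row.
type Listing struct {
	Date int    // stand-in for posted_date
	ID   string
}

// nextPage returns up to perPage listings strictly "before" the cursor in
// (Date DESC, ID DESC) order, plus whether another page remains. It fetches
// one extra row and trims it, mirroring the LIMIT 15+1 trick above.
func nextPage(sorted []Listing, curDate int, curID string, perPage int) ([]Listing, bool) {
	var out []Listing
	for _, l := range sorted { // sorted is already in (Date DESC, ID DESC) order
		// row-wise comparison: (l.Date, l.ID) < (curDate, curID)
		if l.Date < curDate || (l.Date == curDate && l.ID < curID) {
			out = append(out, l)
			if len(out) == perPage+1 {
				break
			}
		}
	}
	more := len(out) > perPage
	if more {
		out = out[:perPage]
	}
	return out, more
}

func main() {
	data := []Listing{{4, "d"}, {3, "c"}, {2, "b"}, {1, "a"}}
	page, more := nextPage(data, 5, "", 2) // cursor "after" everything: first page
	fmt.Println(page, more)
}
```

Unlike OFFSET, the database can serve each page with an index scan starting at the cursor, so the cost stays flat regardless of page number, and it remains stateless: the cursor is just the last row's sort key, carried in the URL.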
Re: [PERFORM] In progress INSERT wrecks plans on table
On Tue, 2013-05-07 at 18:32 +1200, Mark Kirkwood wrote:
> On 07/05/13 18:10, Simon Riggs wrote:
>> On 7 May 2013 01:23, mark.kirkw...@catalyst.net.nz wrote:
>>> I'm thinking that a variant of (2) might be simpler to implement: (I
>>> think Matt C essentially beat me to this suggestion - he originally
>>> discovered this issue). It is probably good enough for only *new*
>>> plans to react to the increased/increasing number of in progress
>>> rows. So this would require backends doing significant numbers of row
>>> changes to either directly update pg_statistic or report their in
>>> progress numbers to the stats collector. The key change here is the
>>> partial execution numbers would need to be sent. Clearly one would
>>> need to avoid doing this too often (!) - possibly only when number of
>>> changed rows > autovacuum_analyze_scale_factor proportion of the
>>> relation concerned or similar.
>>
>> Are you loading using COPY? Why not break down the load into chunks?
>
> INSERT - but we could maybe workaround by chunking the INSERT. However
> that *really* breaks the idea that in SQL you just say what you want,
> not how the database engine should do it! And more practically means
> that the most obvious and clear way to add your new data has nasty side
> effects, and you have to tiptoe around muttering secret incantations to
> make things work well :-)

We also had the same problem with an UPDATE altering the data distribution in such a way that trivial but frequently executed queries cause massive server load until auto analyze sorted out the stats.

-- 
Matt Clarkson
Catalyst.Net Limited
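The chunking workaround discussed above can be done client-side without changing the individual statements much: split the rows into batches and issue one INSERT (or COPY) per batch, optionally running ANALYZE between batches so new plans see the changed distribution. A sketch of just the batching step in Go (illustrative only; the thread doesn't show the actual loader code):

```go
package main

import "fmt"

// chunk splits rows into batches of at most n, so a bulk load can be issued
// as several INSERTs (with ANALYZE runs in between) instead of one giant one.
func chunk(rows []int, n int) [][]int {
	var out [][]int
	for len(rows) > n {
		out = append(out, rows[:n])
		rows = rows[n:]
	}
	if len(rows) > 0 {
		out = append(out, rows)
	}
	return out
}

func main() {
	rows := make([]int, 10)
	for _, batch := range chunk(rows, 4) {
		// In a real loader this would be: INSERT the batch, then ANALYZE.
		fmt.Println(len(batch))
	}
}
```

As the thread notes, this trades declarative clarity for predictable plans; it only helps because statistics get refreshed at each batch boundary.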
Re: [PERFORM] In progress INSERT wrecks plans on table
> (2) seems fairly hard generically, since we'd have to keep track of the
> tids returned from the IndexScan to allow us to switch to a different
> plan and avoid re-issuing rows that we've already returned. But maybe if
> we adapted the IndexScan plan type so that it adopted a more page
> oriented approach internally, it could act like a bitmapscan. Anyway,
> that would need some proof that it would work and sounds like a fair
> task.
>
> (1) sounds more easily possible and plausible. At the moment we have
> enable_indexscan = off. If we had something like
> plan_cost_weight_indexscan = N, we could selectively increase the cost
> of index scans so that they would be less likely to be selected. i.e.
> plan_cost_weight_indexscan = 2 would mean an indexscan would need to be
> half the cost of any other plan before it was selected. (parameter name
> selected so it could apply to all parameter types). The reason to apply
> this weighting would be to calculate risk adjusted cost not just
> estimated cost.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

Another option would be for the bulk insert/update/delete to track the distribution stats as the operation progresses and if it detects that it is changing the distribution of data beyond a certain threshold it would update the pg stats accordingly.

-- 
Matt Clarkson
Catalyst.Net Limited
Re: [PERFORM] Estimation question...
I get it now... project_id=115 has a frequency of 0.09241 in pg_stats. So if ((project_id = 115) AND (project_id = 115)) is considered as two independent conditions, the row estimate ends up being 0.09241 * 0.09241 * 1.20163e+07 (reltuples from pg_class) = 102614.

http://www.postgresql.org/docs/9.0/static/row-estimation-examples.html was a big help.

Matt

On Wed, Feb 27, 2013 at 9:08 AM, Matt Daw m...@shotgunsoftware.com wrote:

Quick follow up... I've found that the row estimate in:

explain select count(id) from versions where project_id IN (80,115) AND project_id=115;

 Aggregate  (cost=178572.75..178572.76 rows=1 width=4)
   ->  Index Scan using dneg_versions_project_id on versions  (cost=0.00..178306.94 rows=106323 width=4)
         Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))

... is the sum of two other estimates, seen when rewriting the query using OR instead of IN:

explain select count(id) from versions where (project_id = 80 OR project_id = 115) AND project_id=115;

 Aggregate  (cost=305896.95..305896.96 rows=1 width=4)
   ->  Bitmap Heap Scan on versions  (cost=2315.08..305632.00 rows=105980 width=4)
         Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR ((project_id = 115) AND (project_id = 115)))
         ->  BitmapOr  (cost=2315.08..2315.08 rows=106323 width=0)
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..94.52 rows=3709 width=0)
                     Index Cond: ((project_id = 80) AND (project_id = 115))
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..2167.57 rows=102614 width=0)
                     Index Cond: ((project_id = 115) AND (project_id = 115))

106323 = 3709 + 102614

Looks like the underlying problem is that the estimate for ((project_id = 115) AND (project_id = 115)) doesn't end up being the same as (project_id=115) on its own.

Matt

On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw m...@shotgunsoftware.com wrote:

Howdy, the query generator in my app sometimes creates redundant filters of the form:

project_id IN ( list of projects user has permission to see ) AND project_id = single project user is looking at

... and this is leading to a bad estimate (and thus a bad plan) on a few complex queries. I've included simplified examples below. This server is running 9.0.10 and the statistics target has been updated to 1000 on the project_id column. I've also loaded the one table into a 9.2.2 instance and replicated the behaviour. I can change how the query is being generated, but I'm curious why I'm getting a bad estimate. Is this an expected result? Thanks!

Matt

1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

explain (analyze,buffers) select count(id) from versions WHERE project_id=115;

 Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual time=1531.341..1531.342 rows=1 loops=1)
   Buffers: shared hit=452619
   ->  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 loops=1)
         Recheck Cond: (project_id = 115)
         Buffers: shared hit=452619
         ->  Bitmap Index Scan on versions_project_id  (cost=0.00..33959.45 rows=1142461 width=0) (actual time=139.709..139.709 rows=1116037 loops=1)
               Index Cond: (project_id = 115)
               Buffers: shared hit=22077
 Total runtime: 1531.399 ms

2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

explain (analyze,buffers) select count(id) from versions WHERE project_id IN (80,115) AND project_id=115;

 Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual time=1637.889..1637.889 rows=1 loops=1)
   Buffers: shared hit=458389
   ->  Bitmap Heap Scan on versions  (cost=3546.56..326793.17 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 loops=1)
         Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
         Buffers: shared hit=458389
         ->  Bitmap Index Scan on versions_project_id  (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 rows=1125436 loops=1)
               Index Cond: ((project_id = ANY ('{80,115}'::integer
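The multiplication described in the follow-up above can be checked directly (0.09241 is the pg_stats frequency for project_id=115, 1.20163e+07 is reltuples from pg_class):

```go
package main

import "fmt"

// combinedEstimate multiplies the selectivity of each "independent"
// condition, which is what the planner does when it treats
// (project_id = 115) AND (project_id = 115) as two separate clauses.
func combinedEstimate(freq, reltuples float64) float64 {
	return freq * freq * reltuples
}

func main() {
	// Applying the 0.09241 selectivity twice shrinks the estimate ~10x.
	fmt.Printf("%.0f\n", combinedEstimate(0.09241, 1.20163e+07)) // prints 102614
}
```

This reproduces the rows=102614 figure on the second Bitmap Index Scan, confirming that the redundant clause is being applied as an independent condition rather than being collapsed.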
Re: [PERFORM] Estimation question...
Quick follow up... I've found that the row estimate in:

explain select count(id) from versions where project_id IN (80,115) AND project_id=115;

 Aggregate  (cost=178572.75..178572.76 rows=1 width=4)
   ->  Index Scan using dneg_versions_project_id on versions  (cost=0.00..178306.94 rows=106323 width=4)
         Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))

... is the sum of two other estimates, seen when rewriting the query using OR instead of IN:

explain select count(id) from versions where (project_id = 80 OR project_id = 115) AND project_id=115;

 Aggregate  (cost=305896.95..305896.96 rows=1 width=4)
   ->  Bitmap Heap Scan on versions  (cost=2315.08..305632.00 rows=105980 width=4)
         Recheck Cond: (((project_id = 80) AND (project_id = 115)) OR ((project_id = 115) AND (project_id = 115)))
         ->  BitmapOr  (cost=2315.08..2315.08 rows=106323 width=0)
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..94.52 rows=3709 width=0)
                     Index Cond: ((project_id = 80) AND (project_id = 115))
               ->  Bitmap Index Scan on dneg_versions_project_id  (cost=0.00..2167.57 rows=102614 width=0)
                     Index Cond: ((project_id = 115) AND (project_id = 115))

106323 = 3709 + 102614

Looks like the underlying problem is that the estimate for ((project_id = 115) AND (project_id = 115)) doesn't end up being the same as (project_id=115) on its own.

Matt

On Tue, Feb 26, 2013 at 11:35 AM, Matt Daw m...@shotgunsoftware.com wrote:

Howdy, the query generator in my app sometimes creates redundant filters of the form:

project_id IN ( list of projects user has permission to see ) AND project_id = single project user is looking at

... and this is leading to a bad estimate (and thus a bad plan) on a few complex queries. I've included simplified examples below. This server is running 9.0.10 and the statistics target has been updated to 1000 on the project_id column. I've also loaded the one table into a 9.2.2 instance and replicated the behaviour.

I can change how the query is being generated, but I'm curious why I'm getting a bad estimate. Is this an expected result? Thanks!

Matt

1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

explain (analyze,buffers) select count(id) from versions WHERE project_id=115;

 Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual time=1531.341..1531.342 rows=1 loops=1)
   Buffers: shared hit=452619
   ->  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 loops=1)
         Recheck Cond: (project_id = 115)
         Buffers: shared hit=452619
         ->  Bitmap Index Scan on versions_project_id  (cost=0.00..33959.45 rows=1142461 width=0) (actual time=139.709..139.709 rows=1116037 loops=1)
               Index Cond: (project_id = 115)
               Buffers: shared hit=22077
 Total runtime: 1531.399 ms

2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

explain (analyze,buffers) select count(id) from versions WHERE project_id IN (80,115) AND project_id=115;

 Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual time=1637.889..1637.889 rows=1 loops=1)
   Buffers: shared hit=458389
   ->  Bitmap Heap Scan on versions  (cost=3546.56..326793.17 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 loops=1)
         Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
         Buffers: shared hit=458389
         ->  Bitmap Index Scan on versions_project_id  (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 rows=1125436 loops=1)
               Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
               Buffers: shared hit=22076
 Total runtime: 1637.941 ms
[PERFORM] Estimation question...
Howdy, the query generator in my app sometimes creates redundant filters of the form:

project_id IN ( list of projects user has permission to see ) AND project_id = single project user is looking at

... and this is leading to a bad estimate (and thus a bad plan) on a few complex queries. I've included simplified examples below. This server is running 9.0.10 and the statistics target has been updated to 1000 on the project_id column. I've also loaded the one table into a 9.2.2 instance and replicated the behaviour.

I can change how the query is being generated, but I'm curious why I'm getting a bad estimate. Is this an expected result? Thanks!

Matt

1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

explain (analyze,buffers) select count(id) from versions WHERE project_id=115;

 Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual time=1531.341..1531.342 rows=1 loops=1)
   Buffers: shared hit=452619
   ->  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 loops=1)
         Recheck Cond: (project_id = 115)
         Buffers: shared hit=452619
         ->  Bitmap Index Scan on versions_project_id  (cost=0.00..33959.45 rows=1142461 width=0) (actual time=139.709..139.709 rows=1116037 loops=1)
               Index Cond: (project_id = 115)
               Buffers: shared hit=22077
 Total runtime: 1531.399 ms

2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

explain (analyze,buffers) select count(id) from versions WHERE project_id IN (80,115) AND project_id=115;

 Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual time=1637.889..1637.889 rows=1 loops=1)
   Buffers: shared hit=458389
   ->  Bitmap Heap Scan on versions  (cost=3546.56..326793.17 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 loops=1)
         Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
         Buffers: shared hit=458389
         ->  Bitmap Index Scan on versions_project_id  (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 rows=1125436 loops=1)
               Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115))
               Buffers: shared hit=22076
 Total runtime: 1637.941 ms
[PERFORM] Query plan, nested EXISTS
 Aggregate  (cost=821544.18..821544.19 rows=1 width=0) (actual time=5046.492..5046.492 rows=1 loops=1)
   ->  Hash Semi Join  (cost=735371.03..821521.73 rows=8977 width=0) (actual time=4941.968..5045.968 rows=7116 loops=1)
         Hash Cond: (a.id = b.note_id)
         ->  Index Scan using notes_retirement_date_project on notes a  (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.639..68.751 rows=12469 loops=1)
               Index Cond: (project_id = 114)
         ->  Hash  (cost=712116.23..712116.23 rows=1417424 width=4) (actual time=4931.956..4931.956 rows=297401 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 2633kB
               ->  Hash Join  (cost=620484.32..712116.23 rows=1417424 width=4) (actual time=3362.472..4864.816 rows=297401 loops=1)
                     Hash Cond: (b.entity_id = c.id)
                     ->  Seq Scan on note_links b  (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.079..622.277 rows=1509795 loops=1)
                           Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))
                     ->  Hash  (cost=618673.97..618673.97 rows=144828 width=4) (actual time=3362.337..3362.337 rows=155834 loops=1)
                           Buckets: 16384  Batches: 1  Memory Usage: 5479kB
                           ->  HashAggregate  (cost=617225.69..618673.97 rows=144828 width=4) (actual time=3289.861..3335.344 rows=155834 loops=1)
                                 ->  Seq Scan on versions c  (cost=0.00..616863.62 rows=144828 width=4) (actual time=217.080..3133.870 rows=155834 loops=1)
                                       Filter: ((retirement_date IS NULL) AND ((sg_status_list)::text = 'ip'::text))
 Total runtime: 5051.414 ms
(17 rows)

Does anything come to mind that would help me debug why this plan is being chosen? Thanks!

Matt
Re: [PERFORM] Query plan, nested EXISTS
Hi Tom, thank you very much. I'll load these tables onto a 9.2 instance and report back.

Matt

On Fri, Sep 28, 2012 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Matt Daw m...@shotgunsoftware.com writes:

Howdy, I've been debugging a client's slow query today and I'm curious about the query plan. It's picking a plan that hashes lots of rows from the versions table (on v9.0.10)...

EXPLAIN ANALYZE
SELECT COUNT(*) FROM notes a
WHERE a.project_id = 114
AND EXISTS (
    SELECT 1 FROM note_links b
    WHERE b.note_id = a.id
    AND b.entity_type = 'Version'
    AND EXISTS (
        SELECT 1 FROM versions c
        WHERE c.id = b.entity_id
        AND c.code ILIKE '%comp%'
        AND c.retirement_date IS NULL
    )
    AND b.retirement_date IS NULL
)

I think the real problem here is that 9.0 is incapable of avoiding a full table scan on note_links, which means it doesn't really have any better option than to do the inner EXISTS as a full-table semijoin. This is because it can't push a.id down through two levels of join, and because the semijoins don't commute, there's no way to get a.id into the scan of note_links to pull out only the useful rows.

The hack with LIMIT avoids this problem by preventing the inner EXISTS from being treated as a full-fledged semijoin; but of course that hack leaves you vulnerable to very bad plans if the statistics are such that a nestloop join isn't the best bet for the inner EXISTS.

The work I did for parameterized paths in 9.2 was intended to address exactly this type of scenario. I would be interested to know if 9.2 does this any better for you.

regards, tom lane
Re: [PERFORM] Query plan, nested EXISTS
Hi Tom, v9.2.1 looks good!

 Aggregate  (cost=420808.99..420809.00 rows=1 width=0) (actual time=147.345..147.345 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..420786.71 rows=8914 width=0) (actual time=13.847..147.219 rows=894 loops=1)
         ->  Index Scan using notes_retirement_date_project on notes a  (cost=0.00..67959.22 rows=12535 width=4) (actual time=13.811..71.741 rows=12469 loops=1)
               Index Cond: (project_id = 114)
         ->  Nested Loop Semi Join  (cost=0.00..28.14 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=12469)
               ->  Index Scan using note_links_note on note_links b  (cost=0.00..12.37 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=12469)
                     Index Cond: (note_id = a.id)
                     Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))
                     Rows Removed by Filter: 1
               ->  Index Scan using versions_pkey on versions c  (cost=0.00..15.76 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=11794)
                     Index Cond: (id = b.entity_id)
                     Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text))
                     Rows Removed by Filter: 1
 Total runtime: 147.411 ms
(14 rows)

On Fri, Sep 28, 2012 at 2:47 PM, Matt Daw m...@shotgunsoftware.com wrote:

Hi Tom, thank you very much. I'll load these tables onto a 9.2 instance and report back.

Matt

On Fri, Sep 28, 2012 at 2:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Matt Daw m...@shotgunsoftware.com writes:

Howdy, I've been debugging a client's slow query today and I'm curious about the query plan. It's picking a plan that hashes lots of rows from the versions table (on v9.0.10)...
EXPLAIN ANALYZE
SELECT COUNT(*) FROM notes a
WHERE a.project_id = 114
AND EXISTS (
    SELECT 1 FROM note_links b
    WHERE b.note_id = a.id
    AND b.entity_type = 'Version'
    AND EXISTS (
        SELECT 1 FROM versions c
        WHERE c.id = b.entity_id
        AND c.code ILIKE '%comp%'
        AND c.retirement_date IS NULL
    )
    AND b.retirement_date IS NULL
)

I think the real problem here is that 9.0 is incapable of avoiding a full table scan on note_links, which means it doesn't really have any better option than to do the inner EXISTS as a full-table semijoin. This is because it can't push a.id down through two levels of join, and because the semijoins don't commute, there's no way to get a.id into the scan of note_links to pull out only the useful rows.

The hack with LIMIT avoids this problem by preventing the inner EXISTS from being treated as a full-fledged semijoin; but of course that hack leaves you vulnerable to very bad plans if the statistics are such that a nestloop join isn't the best bet for the inner EXISTS.

The work I did for parameterized paths in 9.2 was intended to address exactly this type of scenario. I would be interested to know if 9.2 does this any better for you.

regards, tom lane
[PERFORM] Performance of Seq Scan from buffer cache
Howdy. I'm curious what besides raw hardware speed determines the performance of a Seq Scan that comes entirely out of shared buffers… I ran the following on the client's server I'm profiling, which is otherwise idle:

EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT * FROM notes;

 Seq Scan on notes  (cost=0.00..94004.88 rows=1926188 width=862) (actual time=0.009..1673.702 rows=1926207 loops=1)
   Buffers: shared hit=74743
 Total runtime: 3110.442 ms
(3 rows)

… and that's about 9x slower than what I get on my laptop with the same data. I ran stream-scaling on the machine and the results seem reasonable (8644.1985 MB/s with 1 core, up to 25017 MB/s with 12 cores). The box is running 2.6.26.6-49 and postgresql 9.0.6.

I'm stumped as to why it's so much slower; any ideas on what might explain it… or other benchmarks I could run to try to narrow down the cause? Thanks!

Matt
Re: [PERFORM] Performance of Seq Scan from buffer cache
Ugh, never mind. I ran ltrace and it's spending 99% of its time in gettimeofday.

select count(*) from notes;

  count
---------
 1926207
(1 row)

Time: 213.950 ms

explain analyze select count(*) from notes;

 Aggregate  (cost=99274.59..99274.60 rows=1 width=0) (actual time=2889.325..2889.325 rows=1 loops=1)
   ->  Seq Scan on notes  (cost=0.00..94459.07 rows=1926207 width=0) (actual time=0.005..1475.218 rows=1926207 loops=1)
 Total runtime: 2889.360 ms
(3 rows)

Time: 2889.842 ms

On Tuesday, 21 August, 2012 at 3:57 PM, Matt Daw wrote:

Howdy. I'm curious what besides raw hardware speed determines the performance of a Seq Scan that comes entirely out of shared buffers… I ran the following on the client's server I'm profiling, which is otherwise idle:

EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT * FROM notes;

 Seq Scan on notes  (cost=0.00..94004.88 rows=1926188 width=862) (actual time=0.009..1673.702 rows=1926207 loops=1)
   Buffers: shared hit=74743
 Total runtime: 3110.442 ms
(3 rows)

… and that's about 9x slower than what I get on my laptop with the same data. I ran stream-scaling on the machine and the results seem reasonable (8644.1985 MB/s with 1 core, up to 25017 MB/s with 12 cores). The box is running 2.6.26.6-49 and postgresql 9.0.6. I'm stumped as to why it's so much slower, any ideas on what might explain it… or other benchmarks I could run to try to narrow down the cause? Thanks!

Matt
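The gettimeofday finding fits the numbers: EXPLAIN ANALYZE adds roughly (2889.360 - 213.950) ms of instrumentation overhead across 1,926,207 rows. A quick Go check of the per-row cost (attributing the whole difference to timing calls is an assumption, not something the ltrace output above proves):

```go
package main

import "fmt"

// overheadPerRowMicros estimates the per-row instrumentation overhead of
// EXPLAIN ANALYZE by subtracting the plain query's runtime and dividing
// by the number of rows the Seq Scan produced.
func overheadPerRowMicros(analyzeMS, plainMS float64, rows int) float64 {
	return (analyzeMS - plainMS) * 1000.0 / float64(rows)
}

func main() {
	perRow := overheadPerRowMicros(2889.360, 213.950, 1926207)
	fmt.Printf("%.2f us/row\n", perRow) // prints 1.39 us/row
}
```

About 1.4 µs per row, spread over a couple of clock reads per row per plan node, is consistent with a slow clock source on that box, which would explain why EXPLAIN ANALYZE looks so much worse than the bare query.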
Re: [PERFORM] Slowing UPDATEs inside a transaction
Robert Haas wrote: Old row versions have to be kept around until they're no longer of interest to any still-running transaction. Thanks for the explanation. Regarding the snippet above, why would the intermediate history of multiply-modified uncommitted rows be of interest to anything, or is the current behaviour simply cheaper overall in terms of cpu/developer time? -- The information contained in this message is confidential and is intended for the addressee only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorised use, disclosure, copying or alteration of this message is strictly forbidden. Critical Software Ltd. reserves the right to monitor and record e-mail messages sent to and from this address for the purposes of investigating or detecting any unauthorised use of its system and ensuring its effective operation. Critical Software Ltd. registered in England, 04909220. Registered Office: IC2, Keele Science Park, Keele, Staffordshire, ST5 5NH. This message has been scanned for security threats by iCritical. For further information, please visit www.icritical.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Slowing UPDATEs inside a transaction
Hi. I've only been using PostgreSQL properly for a week or so, so I apologise if this has been covered numerous times, however Google is producing nothing of use.

I'm trying to import a large amount of legacy data (billions of denormalised rows) into a pg database with a completely different schema, de-duplicating bits of it on-the-fly while maintaining a reference count. The procedures to do this have proven easy to write, however the speed is not pretty. I've spent some time breaking down the cause and it's come down to a simple UPDATE as evidenced below:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1);

CREATE OR REPLACE FUNCTION test() RETURNS int AS $$
DECLARE i int;
BEGIN
    FOR i IN 1..1 LOOP
        UPDATE foo SET b=b+1 WHERE a=1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

When run individually, this function produces the following timing:

Time: 1912.593 ms
Time: 1937.183 ms
Time: 1941.607 ms
Time: 1943.673 ms
Time: 1944.738 ms

However, when run inside a transaction (to simulate a function doing the same work) I get this:

START TRANSACTION
Time: 0.836 ms
Time: 1908.931 ms
Time: 5370.110 ms
Time: 8809.378 ms
Time: 12274.294 ms
Time: 15698.745 ms
Time: 19218.389 ms

There is no disk i/o and the postgresql process runs 100% cpu.
Server is amd64 FreeBSD 8-STABLE w/16GB RAM running postgresql 9.0.3 from packages.

Looking at the timing of real data (heavily grouped), it seems the speed of UPDATEs can vary dependent on how heavily updated a row is, so I set out to produce a test case:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1),(2,1),(3,1),(4,1);

CREATE OR REPLACE FUNCTION test(int) RETURNS int AS $$
DECLARE i int;
BEGIN
    FOR i IN 1..1 LOOP
        UPDATE foo SET b=1 WHERE a=$1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

START TRANSACTION;
SELECT test(1);  Time: 1917.305 ms
SELECT test(2);  Time: 1926.758 ms
SELECT test(3);  Time: 1926.498 ms
SELECT test(1);  Time: 5376.691 ms
SELECT test(2);  Time: 5408.231 ms
SELECT test(3);  Time: 5403.654 ms
SELECT test(1);  Time: 8842.425 ms
SELECT test(4);  Time: 1925.954 ms
COMMIT;
START TRANSACTION;
SELECT test(1);  Time: 1911.153 ms

As you can see, the more an individual row is updated /within a transaction/, the slower it becomes for some reason. Unfortunately in my real-world case, I need to do many billions of these UPDATEs. Is there any way I can get around this without pulling my huge source table out of the database and feeding everything in line-at-a-time from outside the database?

Thanks.
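The slowdown reported above is consistent with each UPDATE having to step over the dead row versions left by every previous UPDATE of the same row inside the still-open transaction: none of those versions can be pruned while the transaction that created them is running. A toy model (illustrative only; the loop count n = 10,000 is a guess, since the archived message appears to have lost the real figure) reproduces the same roughly 1x/3x/5x progression as the reported batch timings:

```python
def chain_walk_cost(versions_already_created):
    # Each UPDATE of the same row inside one open transaction leaves the
    # previous version in place, so the work to reach the live tuple grows
    # with the number of dead versions accumulated so far.
    return versions_already_created + 1

def batch_cost(n_updates, already_done=0):
    # Total simulated work for a batch of n_updates repeated UPDATEs,
    # given how many versions already exist on the row.
    return sum(chain_walk_cost(already_done + i) for i in range(n_updates))

n = 10_000  # hypothetical loop count; not from the archived message
first = batch_cost(n)                       # fresh row, cheapest batch
second = batch_cost(n, already_done=n)      # same row, run again in same xact
third = batch_cost(n, already_done=2 * n)
print(second / first, third / first)  # roughly 3x and 5x the first batch
```

That linear growth per update (quadratic total) matches the observed 1.9 s, 5.4 s, 8.8 s sequence for repeated calls on the same row, and the reset back to ~1.9 s after COMMIT.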
Re: [PERFORM] Slow-ish Query Needs Some Love
On Feb 2, 6:06 am, Edgardo Portal egportal2...@yahoo.com wrote:

On 2010-02-02, Matt White mattw...@gmail.com wrote: I have a relatively straightforward query that by itself isn't that slow, but we have to run it up to 40 times on one webpage load, so it needs to run much faster than it does. Here it is:

SELECT COUNT(*) FROM users, user_groups
WHERE users.user_group_id = user_groups.id
  AND NOT users.deleted
  AND user_groups.partner_id IN (partner_id_1, partner_id_2);

The structure is partners have user groups which have users. In the test data there are over 200,000 user groups and users but only ~3000 partners. Anyone have any bright ideas on how to speed this query up?

Can you avoid running it 40 times, maybe by restructuring the query (or making a view) along the lines of the following and adding some logic to your page?

SELECT p.partner_id, ug.user_group_id, u.id, count(*)
FROM partners p
LEFT JOIN user_groups ug ON ug.partner_id = p.partner_id
LEFT JOIN users u ON u.user_group_id = ug.id
WHERE NOT u.deleted
GROUP BY 1, 2, 3;

Thanks for the suggestion. The view didn't seem to speed things up. Perhaps we can reduce the number of times it's called, we'll see. Any additional ideas would be helpful. Thanks.
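Edgardo's restructuring idea - one grouped query that returns every partner's count at once, instead of 40 separate COUNT(*) calls - can be sketched with any SQL engine. Here is a minimal, hypothetical demonstration using Python's sqlite3 with made-up partner ids (the schema mirrors the thread's tables, not the poster's real data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_groups (id INTEGER PRIMARY KEY, partner_id INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY, user_group_id INTEGER, deleted INTEGER);
    INSERT INTO user_groups VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO users VALUES (1, 1, 0), (2, 1, 0), (3, 2, 1), (4, 3, 0);
""")

# One grouped query replaces one COUNT(*) round-trip per partner;
# the page logic then looks counts up in the resulting dict.
rows = conn.execute("""
    SELECT ug.partner_id, COUNT(*)
    FROM users u JOIN user_groups ug ON u.user_group_id = ug.id
    WHERE NOT u.deleted
    GROUP BY ug.partner_id
""").fetchall()
counts = dict(rows)
print(counts)  # {100: 2, 200: 1}
```

The win is not the per-query plan but the query count: the join over users is scanned once total rather than once per partner.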
[PERFORM] Slow-ish Query Needs Some Love
I have a relatively straightforward query that by itself isn't that slow, but we have to run it up to 40 times on one webpage load, so it needs to run much faster than it does. Here it is:

SELECT COUNT(*) FROM users, user_groups
WHERE users.user_group_id = user_groups.id
  AND NOT users.deleted
  AND user_groups.partner_id IN (partner_id_1, partner_id_2);

The structure is partners have user groups which have users. In the test data there are over 200,000 user groups and users but only ~3000 partners. Anyone have any bright ideas on how to speed this query up? Here's the query plan:

 Aggregate  (cost=12574.53..12574.54 rows=1 width=0) (actual time=2909.298..2909.299 rows=1 loops=1)
   ->  Hash Join  (cost=217.79..12566.08 rows=3378 width=0) (actual time=2909.284..2909.284 rows=0 loops=1)
         Hash Cond: (users.user_group_id = user_groups.id)
         ->  Seq Scan on users  (cost=0.00..11026.11 rows=206144 width=4) (actual time=0.054..517.811 rows=205350 loops=1)
               Filter: (NOT deleted)
         ->  Hash  (cost=175.97..175.97 rows=3346 width=4) (actual time=655.054..655.054 rows=22 loops=1)
               ->  Nested Loop  (cost=0.27..175.97 rows=3346 width=4) (actual time=1.327..428.406 rows=22 loops=1)
                     ->  HashAggregate  (cost=0.27..0.28 rows=1 width=4) (actual time=1.259..1.264 rows=2 loops=1)
                           ->  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1.181..1.240 rows=2 loops=1)
                     ->  Index Scan using user_groups_partner_id_idx on user_groups  (cost=0.00..133.86 rows=3346 width=8) (actual time=0.049..96.992 rows=11 loops=2)
                           Index Cond: (user_groups.partner_id = (partner_all_subpartners(3494)))

The one obvious thing that everyone will point out is the sequential scan on users, but there actually is an index on users.deleted. When I forced sequential scanning off, it ran slower, so the planner wins again. Thanks for any help you can offer.
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Rajesh Kumar Mallah wrote:

I've checked out the latest Areca controllers, but the manual available on their website states there's a limitation of 32 disks in an array...

Where exactly is there a limitation of 32 drives? The datasheet of the 1680 states support for up to 128 drives using enclosures.

The 1680 manual: http://www.areca.us//support/download/RaidCards/Documents/Manual_Spec/SAS_Manual.zip

Page 25: "Note: 1. The maximum no. is 32 disk drived included in a single RAID set"

Page 49: "1. Up to 32 disk drives can be included in a single RAID set. 2. Up to 8 RAID sets can be created per controller" (point 2 meaning you can't do s/w RAID over umpteen h/w RAID1 pairs)

Page 50: "To create RAID 30/50/60 volume, you need create multiple RAID sets first with the same disk members on each RAID set. The max no. disk drives per volume set: 32 for RAID 0/1/10/3/5/6 and 128 for RAID 30/50/60."

...and a few more times saying the same thing.
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Scott Carey wrote: You probably don't want a single array with more than 32 drives anyway, it's almost always better to start carving out chunks and using software raid 0 or 1 on top of that for various reasons. I wouldn't put more than 16 drives in one array on any of these RAID cards, they're just not optimized for really big arrays and tend to fade between 6 to 16 in one array, depending on the quality.

This is what I'm looking at now. The server I'm working on at the moment currently has a PERC6/e and 3x MD1000s which needs to be tested in a few setups. I need to code a benchmarker yet (I haven't found one yet that can come close to replicating our DB usage patterns), but I intend to try:

1. 3x h/w RAID10 (one per shelf), software RAID0
2. lots x h/w RAID1, software RAID0 if the PERC will let me create enough arrays
3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks
4. 2x h/w RAID30, software RAID0

I'm not holding much hope out for the last one :) I'm just glad work on a rewrite of my inherited backend systems should start soon; get rid of the multi-TB MySQL hell and move to a distributed PG setup on dirt cheap Dell R200s/blades.

You can do direct-attached storage to 100+ drives or more if you want. The price and manageability cost go up a lot if it gets too big however. Having global hot spare drives is critical. Not that the cost of using SANs and such is low... SAS expanders have made DAS with large arrays very accessible though.

For large storage arrays (RAID60 or similar) you can't beat a RAID controller and disk shelf(s), especially if you keep the raidsets small and use cheap ludicrous-capacity SATA disks. You just need to be aware that performance doesn't scale well/easily over 1-2 shelves on the things.
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Glyn Astill wrote: Stupid question, but why do people bother with the Perc line of cards if the LSI brand is better? It seems the headache of trying to get the Perc cards to perform is not worth any money saved.

I think in most cases the Dell cards actually cost more; people end up stuck with them because they come bundled with their servers - they find out too late that they've got a lemon. That's what's been happening with me... The fact Dell prices can have a fair bit of downward movement when you get the account manager on the phone makes them especially attractive to the people controlling the purse strings.

The biggest reason for me however is the lack of comparative reviews. I struggled to get the LSI card to replace the PERC3 because all I had to go on was qualitative mailing list/forum posts from strangers. The only way I got it was to make the argument that other than trying the LSI, we'd have no choice other than replacing the server+shelf+disks. I want to see just how much better a high-end Areca/Adaptec controller is, but I just don't think I can get approval for a £1000 card because some guy on the internet said the PERC sucks. Would that same person say it sucked if it came in Areca packaging? Am I listening to the advice of a professional, or a fanboy?
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Arjen van der Meijden wrote: Afaik the Perc 5/i and /e are more or less rebranded LSI cards (they're not identical in layout etc), so it would be a bit weird if they performed much less than the similar LSIs, wouldn't you think?

I've recently had to replace a PERC4/DC with the exact same card made by LSI (320-2) because the PERC's firmware was crippled. Its idea of RAID10 actually appears to be concatenated RAID1 arrays. Since replacing it and rebuilding the array on the LSI card, performance has been considerably better (14-disk SCSI shelf).

Areca may be the fastest around right now, but if you'd like to get it all from one supplier, it's not too bad to be stuck with Dell's PERC 5 or 6 series.

The PERC6 isn't too bad, however it grinds to a halt when the IO queue gets large and it has the serious limitation of not supporting more than 8 spans, so trying to build a RAID10 array greater than 16 disks is pointless if you're not just after the extra capacity.

Are there any reasonable choices for bigger (3+ shelf) direct-connected RAID10 arrays, or are hideously expensive SANs the only option? I've checked out the latest Areca controllers, but the manual available on their website states there's a limitation of 32 disks in an array...
Re: [PERFORM] too many clog files
Hi Duan. As others have said, you should probably attempt to run pg_dump to export your database. If that doesn't work, consider restoring from backup. If the dump does work, you can create a clean PGDATA directory (using initdb like when you set up your original installation), and create a fresh copy of your database using the dump file. Then abandon your potentially damaged PGDATA directory.

For future reference:

- The autovacuum parameter in postgresql.conf is off by default under Postgres 8.1. You should probably turn it on to ensure regular vacuuming, unless you have your own cronjob to do the vacuuming.

- About finding old transactions, there are 2 places you have to look. The usual place is pg_stat_activity. The 2nd place is pg_prepared_xacts, where prepared transactions are listed. If there's a prepared transaction in your system, it might explain why your old commit-logs aren't being purged. The following query shows both prepared and normal transactions:

select l.transactionid,
       age(l.transactionid) as age, /* measured in number of other transactions elapsed, not in terms of time */
       l.pid,
       case when l.pid is null then true else false end as is_prepared,
       a.backend_start,
       p.prepared as time_xact_was_prepared,
       p.gid as prepared_name
from pg_locks l
left outer join pg_stat_activity a on l.pid = a.procpid
left outer join pg_prepared_xacts p on l.transactionid = p.transaction
where l.locktype = 'transactionid'
  and l.mode = 'ExclusiveLock'
  and l.granted
order by age(l.transactionid) desc ;

 transactionid | age |  pid | is_prepared |         backend_start         |    time_xact_was_prepared     |      prepared_name
---------------+-----+------+-------------+-------------------------------+-------------------------------+--------------------------
        316645 |  44 |      | t           |                               | 2008-09-09 00:31:46.724178-07 | my_prepared_transaction1
        316689 |   0 | 6093 | f           | 2008-09-09 00:40:10.928287-07 |                               |
(2 rows)

Note that unless you run this query as a superuser (e.g. postgres), the columns from pg_stat_activity will only be visible for sessions that belong to you.
To roll back this example prepared transaction, you'd type:

ROLLBACK PREPARED 'my_prepared_transaction1';

Hope this helps! Matt
Re: [PERFORM] too many clog files
Alvaro Herrera wrote: Move the old clog files back where they were, and run VACUUM FREEZE in all your databases. That should clean up all the old pg_clog files, if you're really that desperate.

Has anyone actually seen a CLOG file get removed under 8.2 or 8.3? How about 8.1?

I'm probably missing something, but looking at src/backend/commands/vacuum.c (under 8.2.9 and 8.3.3), it seems like vac_truncate_clog() scans through *all* tuples of pg_database looking for the oldest datfrozenxid. Won't that always be template0, which as far as I know can never be vacuumed (or otherwise connected to)?

postgres=# select datname, datfrozenxid, age(datfrozenxid), datallowconn
           from pg_database order by age(datfrozenxid), datname ;

  datname  | datfrozenxid |   age    | datallowconn
-----------+--------------+----------+--------------
 template1 |     36347792 |     3859 | t
 postgres  |     36347733 |     3918 | t
 mss_test  |     36347436 |     4215 | t
 template0 |          526 | 36351125 | f
(4 rows)

I looked at several of my 8.2 databases' pg_clog directories, and they all have all the sequentially numbered segments (through the current segment). Would it be reasonable for vac_truncate_clog() to skip databases where datallowconn is false (i.e. template0)? Looking back to the 8.1.13 code, it does exactly that:

    if (!dbform->datallowconn)
        continue;

Also, Duan, if you have lots of files under pg_clog, you may be burning through transactions faster than necessary. Do your applications leave autocommit turned on?

And since no one else mentioned it, as a work-around for a small filesystem you can potentially shut down your database, move the pg_clog directory to a separate filesystem, and create a symlink to it under your PGDATA directory. That's not a solution, just a mitigation.
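The 8.1-style skip being argued for can be modeled in a few lines. This is an illustrative sketch only (not PostgreSQL code, and it ignores xid wraparound arithmetic for simplicity), using the pg_database rows shown above:

```python
def truncation_cutoff(databases, skip_unconnectable=True):
    """Oldest datfrozenxid that limits how far pg_clog can be truncated.

    Mirrors the 8.1-style loop quoted above: databases that can never be
    connected to (template0) are skipped, so they cannot pin old clog
    segments forever. Plain min() stands in for real xid-age comparison.
    """
    candidates = [
        d["datfrozenxid"]
        for d in databases
        if d["datallowconn"] or not skip_unconnectable
    ]
    return min(candidates)

dbs = [
    {"datname": "template1", "datfrozenxid": 36347792, "datallowconn": True},
    {"datname": "postgres",  "datfrozenxid": 36347733, "datallowconn": True},
    {"datname": "mss_test",  "datfrozenxid": 36347436, "datallowconn": True},
    {"datname": "template0", "datfrozenxid": 526,      "datallowconn": False},
]
print(truncation_cutoff(dbs))                            # 36347436: clog can advance
print(truncation_cutoff(dbs, skip_unconnectable=False))  # 526: nothing ever truncated
```

With the skip, the cutoff tracks the connectable databases and old segments become removable; without it, template0's ancient datfrozenxid pins every segment, which is exactly the behaviour being questioned.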
Re: [PERFORM] limit clause breaks query planner?
Tom Lane [EMAIL PROTECTED] writes: Matt Smiley [EMAIL PROTECTED] writes: So an Index Scan is always going to have a higher cost estimate than an equivalent Seq Scan returning the same result rows (unless random_page_cost is 1). That's why I think the planner is always preferring the plan that uses a Seq Scan.

If that were the case, we'd never choose an indexscan at all...

You're right, that was a silly guess.

It's true that a plain indexscan is not preferred for queries that will return a large fraction of the table. However, it should be willing to use a bitmap scan for this query, given default cost settings (the default cost settings will cause it to prefer bitmap scan for retrieving up to about a third of the table, in my experience). I too am confused about why it doesn't prefer that choice in the OP's example.

It looks like the bitmap scan has a higher cost estimate because the entire bitmap index must be built before beginning the heap scan and returning rows up the pipeline. The row-count limit can't be pushed lower than the bitmap-heap-scan like it can for the basic index-scan.

test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# set enable_indexscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;

 Limit  (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 rows=15 loops=1)
   ->  Bitmap Heap Scan on my_table  (cost=17070.22..69478.96 rows=988217 width=8) (actual time=606.892..606.983 rows=15 loops=1)
         Recheck Cond: (b = 3)
         Filter: (a IS NULL)
         ->  Bitmap Index Scan on idx_b  (cost=0.00..16823.17 rows=109 width=0) (actual time=592.657..592.657 rows=100 loops=1)
               Index Cond: (b = 3)
 Total runtime: 607.340 ms
(7 rows)

It would be interesting to alter the random_page_cost setting and see if he gets different results.
Using an unmodified postgresql.conf, the cost estimate for an index-scan was so much higher than for a seqscan that random_page_cost had to be set below 0.2 before the index-scan was preferred. However, it looks like this was mainly because effective_cache_size was too small. The planner thought the cache was only 128 MB, while the size of the complete table+index was (39492 + 21946 pages) * 8 KB/page = about 480 MB. It makes sense for the cost estimate to be so much higher if blocks are expected to be repeatedly re-fetched from disk. I wonder if David's effective_cache_size is too small.

test_8_3_3=# reset all ;
RESET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;

 Limit  (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.74 rows=988217 width=8) (actual time=0.028..0.138 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 3))
 Total runtime: 0.338 ms
(4 rows)

test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;

 Limit  (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..3029924.36 rows=988217 width=8) (actual time=0.043..0.100 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.308 ms
(5 rows)

test_8_3_3=# set random_page_cost = 0.19 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ;

 Limit  (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..161190.65 rows=988217 width=8) (actual time=0.042..0.097 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.307 ms
(5 rows)

Now fix effective_cache_size and try again.
test_8_3_3=# reset all ;
RESET
test_8_3_3=# set effective_cache_size = '500MB' ;
SET
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15
Re: [PERFORM] limit clause breaks query planner?
Tom Lane [EMAIL PROTECTED] writes: I'm not sure offhand whether the existing correlation stats would be of use for it, or whether we'd have to get ANALYZE to gather additional data.

Please forgive the tangent, but would it be practical to add support for gathering statistics on an arbitrary expression associated with a table, rather than just on materialized columns? For example:

analyze my_tab for expression 'my_func(my_tab.col)' ;

It seems like any time you'd consider using a functional index, this feature would let the planner calculate decent selectivity estimates for the expression's otherwise opaque data distribution. The expression might be treated as a virtual column on the table; not sure if that helps or hurts. Should I post this question on pgsql-hackers?
Re: [PERFORM] limit clause breaks query planner?
Hi David. Early in this thread, Pavel suggested:

    you should partial index
    create index foo(b) on mytable where a is null;

Rather, you might try the opposite partial index (where a is NOT null) as a replacement for the original unqualified index on column A. This new index will be ignored by the query you're trying to tune, but it'll be available to the other queries that filter to a non-null value of column A. (Omitting NULL from that index should be ok because you normally wouldn't want to use an index when 95% of the table's rows match the filtered key.) Then you can temporarily disable Seq Scans in your session for just this one query, as follows:

SQL> create table my_table ( a int, b int ) ;
CREATE TABLE
SQL> create index idx_a_not_null on my_table ( a ) where a is not null ;
CREATE INDEX
SQL> create index idx_b on my_table ( b ) ;
CREATE INDEX
SQL> insert into my_table (a, b)
     select case when random() <= 0.95 then null else i end as a,
            mod(i, 10) as b
     from generate_series(1, 10000000) s(i) ;
INSERT 0 10000000
SQL> analyze my_table ;
ANALYZE

Review the statistics available to the optimizer:

SQL> select attname, null_frac, n_distinct, most_common_vals, most_common_freqs, histogram_bounds, correlation
     from pg_stats where tablename = 'my_table' order by attname ;

 attname | null_frac | n_distinct |   most_common_vals    | most_common_freqs | histogram_bounds | correlation
---------+-----------+------------+-----------------------+-------------------+------------------+-------------
 a       |     0.945 |         -1 |                       |                   | {2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768} | 1
 b       |         0 |         10 | {9,4,3,1,2,6,8,5,7,0} | {0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.098,0.096,0.09} | | 0.127294
(2 rows)

SQL> select relname, reltuples, relpages from pg_class
     where relname in ('my_table', 'idx_a_not_null', 'idx_b') order by relname ;

    relname     | reltuples | relpages
----------------+-----------+----------
 idx_a_not_null |    499955 |     1100
 idx_b          |     1e+07 |    21946
 my_table       |     1e+07 |    39492
(3 rows)

Run the test query, first without disabling Seq Scan to show this example reproduces the plan you're
trying to avoid.

SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;

 Limit  (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.00 rows=929250 width=8) (actual time=0.061..0.159 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 5))
 Total runtime: 0.371 ms
(4 rows)

Now run the same query without the Seq Scan option.

SQL> set enable_seqscan = false ;
SET
SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ;

 Limit  (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..2869913.63 rows=929250 width=8) (actual time=0.072..0.130 rows=15 loops=1)
         Index Cond: (b = 5)
         Filter: (a IS NULL)
 Total runtime: 0.341 ms
(5 rows)

SQL> reset enable_seqscan ;
RESET

Yes, it's unsavory to temporarily adjust a session-level parameter to tune a single query, but I don't know of a less intrusive way to avoid the SeqScan. Here's why I think it might be your simplest option:

As far as I can tell, the plan nodes for accessing the table/index are unaware of the LIMIT. The cost of the Limit node is estimated as the cost of its input row-source multiplied by the ratio of requested/returned rows. For example, from the preceding plan output:

2869913.63 upper cost for the Index Scan * (15 row limit / 929250 returned rows) = 46.326 upper cost for the Limit node

The underlying plan nodes each assume that all the rows matching their filter predicates will be returned up the pipeline; the cost estimate is only reduced at the Limit node. A Seq Scan and an Index Scan (over a complete index) will both expect the same number of input rows (pg_class.reltuples). They also produce the same estimated result set, since
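The Limit-node arithmetic described in that message is easy to check directly. A one-function sketch using the numbers from the index-scan plan (illustrative only; real Limit costing also carries the input's startup cost forward, which is zero here):

```python
def limit_node_cost(input_total_cost, input_rows, limit_rows):
    # The Limit node scales its input's total cost by the fraction of
    # rows it expects to actually pull through the pipeline.
    return input_total_cost * (limit_rows / input_rows)

# Numbers from the index-scan plan: total cost 2869913.63,
# 929250 estimated rows, LIMIT 15.
cost = limit_node_cost(2869913.63, 929250, 15)
print(round(cost, 2))  # 46.33, matching the Limit node's upper cost
```

This is why the planner's preference flips under LIMIT: only the Limit node discounts for early termination, so whichever input has the cheapest *total* cost per row wins, regardless of startup behaviour.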
Re: [SOLVED] [PERFORM] Query plan excluding index on view
Removing the constants definitely did take care of the issue on 8.3 (still same query plan on 8.1). Thanks for your help in getting this resolved, and sorry again for not including all relevant information in my initial request.

On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane [EMAIL PROTECTED] wrote: Matt Klinker [EMAIL PROTECTED] writes:

--Joined View:
CREATE OR REPLACE VIEW directory_listing AS
SELECT school.id, school.name, school.description, 119075291 AS listing_type_fid FROM school
UNION ALL
SELECT company.id, company.name, company.description, 119074833 AS listing_type_fid FROM company;

Ah, there's the problem :-(. Can you get rid of the constants here? The planner's currently not smart about UNION ALL subqueries unless their SELECT lists contain just simple column references. (Yes, fixing that is on the todo list, but don't hold your breath... it'll be 8.4 material at the earliest.)

regards, tom lane
Re: [PERFORM] Query plan excluding index on view
on company c (cost=0.00..9.67 rows=1 width=424) Index Cond: (c.id = xref.listing_fid) On Thu, Apr 3, 2008 at 11:49 PM, Tom Lane [EMAIL PROTECTED] wrote: Matt Klinker [EMAIL PROTECTED] writes: Sorry for not including this extra bit originally. Below is the explain detail from both the query to the view that takes longer and then the query directly to the single table that performs quickly. ... - Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 rows=18384890 width=251) - Seq Scan on company (cost=0.00..1102073.90 rows=18384890 The presence of a Subquery Scan node tells me that either this is a much older PG version than you stated, or there are some interesting details to the query that you omitted. Please drop the fan-dance routine and show us a complete reproducible test case. regards, tom lane
[PERFORM] Query plan excluding index on view
I'm trying to fine-tune this query to return in a reasonable amount of time and am having difficulties getting the query to run the way I'd like. I have a couple of semi-related entities that are stored in individual tables, say, A and B. There is then a view created that pulls together the common fields from these 2 tables. These are then related through a m:m relationship to a classification. Quick definitions of all of this follow:

Table: ItemA
  id - primary key
  name
  description
  (addtl fields for A)

Table: ItemB
  id - primary key
  name
  description
  (addtl fields for B)

View: Combined
  SELECT id, name, description FROM ItemA
  UNION ALL
  SELECT id, name, description FROM ItemB

Table: xref
  id - primary key
  item_id - indexed, points to either ItemA.id or ItemB.id
  classification_id - indexed, points to classification.id

Table: classification
  id - primary key
  name

I'm trying to query from the classification, through the xref, and to the view to get a list of Items (either A or B) that are tied to a specific classification. My query is rather simple, basically as follows:

SELECT id, name, description
FROM combined c
INNER JOIN xref ON c.id = xref.item_id
WHERE xref.classification_id = 1

This query runs in about 2-3 minutes (I should mention that ItemA has ~18M records and xref has ~26M records - and both will continue to grow). The explain text shows a disregard for the indexes on ItemA and ItemB and a sequence scan is done on both of them. However, if I rewrite this query to join directly to ItemA rather than to the view it runs in ~50ms because it now uses the proper index.

I know it's generally requested to include the EXPLAIN text when submitting a specific question, but I thought perhaps this was generic enough that someone might at least have some suggestions. If required I can certainly work up a simpler example, or I could include my actual explain (though it doesn't exactly match everything defined above as I tried to keep things rather generic).
Any links would be nice as well, from all my searching the past few days, most of the performance tuning resources I could find where about tuning the server itself, not really a specific query - at least not one that dealt with this issue. If you've read this far - thank you much!
Re: [PERFORM] Query plan excluding index on view
Sorry for not including this extra bit originally. Below is the explain detail from both the query to the view that takes longer and then the query directly to the single table that performs quickly.

Hash Join  (cost=49082.96..1940745.80 rows=11412 width=76)
  Hash Cond: ("outer".?column1? = "inner".listing_fid)
  ->  Append  (cost=0.00..1290709.94 rows=18487347 width=252)
        ->  Subquery Scan "*SELECT* 1"  (cost=0.00..1285922.80 rows=18384890 width=251)
              ->  Seq Scan on company  (cost=0.00..1102073.90 rows=18384890 width=251)
        ->  Subquery Scan "*SELECT* 2"  (cost=0.00..4787.14 rows=102457 width=252)
              ->  Seq Scan on school  (cost=0.00..3762.57 rows=102457 width=252)
  ->  Hash  (cost=49042.64..49042.64 rows=16130 width=8)
        ->  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64 rows=16130 width=8)
              Recheck Cond: (node_fid = 173204537)
              ->  Bitmap Index Scan on idx_listing_node_xref_node_fid  (cost=0.00..102.45 rows=16130 width=0)
                    Index Cond: (node_fid = 173204537)

Nested Loop  (cost=102.45..98564.97 rows=11349 width=517)
  ->  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64 rows=16130 width=8)
        Recheck Cond: (node_fid = 173204537)
        ->  Bitmap Index Scan on idx_listing_node_xref_node_fid  (cost=0.00..102.45 rows=16130 width=0)
              Index Cond: (node_fid = 173204537)
  ->  Index Scan using idx_pki_company_id on company c  (cost=0.00..3.06 rows=1 width=517)
        Index Cond: (c.id = "outer".listing_fid)

On Thu, Apr 3, 2008 at 7:19 PM, Tom Lane [EMAIL PROTECTED] wrote: Matt Klinker [EMAIL PROTECTED] writes: I knew I'd forget something! I've tried this on both 8.2 and 8.3 with the same results.

Then you're going to have to provide more details ...

regards, tom lane
[PERFORM] db performance/design question
I'm designing a system that will be doing over a million inserts/deletes on a single table every hour. Rather than using a single table, it is possible for me to partition the data into multiple tables if I wanted to, which would be nice because I can just truncate them when I don't need them. I could even use tablespaces to split the I/O load over multiple filers. The application does not require all this data to be in the same table. The data is fairly temporary; it might last 5 seconds, it might last 2 days, but it will all be deleted eventually and different data will be created. Considering a single table would grow to 10mil+ rows at max, and this machine will sustain about 25mbps of insert/update/delete traffic 24/7 - 365, will I be saving much by partitioning data like that? -- -Matt http://twiki.spimageworks.com/twiki/bin/view/Software/CueDevelopment
Re: [PERFORM] Sunfire X4500 recommendations
Hi David, Thanks for your feedback! I'm rather a newbie at this, and I do appreciate the critique. First, let me correct myself: the formulas for the risk of losing data when you lose 2 and 3 disks shouldn't have included the first term (g/n). I'll give the corrected formulas and tables at the end of the email.

> please explain why you are saying that the risk of losing any 1 disk is 1/n. shouldn't it be probability of failure * n instead?

1/n represents the assumption that all disks have an equal probability of being the next one to fail. This seems like a fair assumption in general for the active members of a stripe (not including hot spares). A possible exception would be the parity disks (because reads always skip them and writes always hit them), but that's only a consideration if the RAID configuration used dedicated disks for parity instead of distributing it across the RAID 5/6 group members. Apart from that, whether the workload is write-heavy or read-heavy, sequential or scattered, the disks in the stripe ought to handle a roughly equivalent number of iops over their lifetime.

> following this logic the risk of losing all 48 disks in a single group of 48 would be 100%

Exactly. Putting all disks in one group is RAID 0 -- no data protection. If you lose even 1 active member of the stripe, the probability of losing your data is 100%.

> also what you are looking for is the probability of the second (and third) disks failing in time X (where X is the time necessary to notice the failure, get a replacement, and rebuild the disk)

Yep, that's exactly what I'm looking for. That's why I said these probabilities are only describing the case where we don't have enough time between disk failures to recover the array. My goal wasn't to estimate how long time X is. (It doesn't seem like a generalizable quantity; due partly to logistical and human factors, it's unique to each operating environment.)
Instead, I start with the assumption that time X has been exceeded, and we've lost a 2nd (or 3rd) disk in the array. Given that assumption, I wanted to show the probability that the loss of the 2nd disk has caused the stripe to become unrecoverable. We know that RAID 10 and 50 can tolerate the loss of anywhere between 1 and n/g disks, depending on how lucky you are. I wanted to quantify the amount of luck required, as a risk management tool. The duration of time X can be minimized with hot spares and attentive administrators, but the risk after exceeding time X can only be minimized (as far as I know) by configuring the RAID stripe with small enough underlying failure groups.

> the killer is the time needed to rebuild the disk, with multi-TB arrays it's sometimes faster to re-initialize the array and reload from backup than it is to do a live rebuild (the kernel.org servers had a raid failure recently and HPA mentioned that it took a week to rebuild the array, but it would have only taken a couple days to do a restore from backup)

That's very interesting. I guess the rebuild time also would depend on how large the damaged failure group was. Under RAID 10, for example, I think you'd still only have to rebuild 1 disk from its mirror, regardless of how many other disks were in the stripe, right? So shortening the rebuild time may be another good motivation to keep the failure groups small.

> add to this the fact that disk failures do not appear to be truly independent from each other statistically (see the recent studies released by google and cmu), and I wouldn't bother with single-parity for a

I don't think I've seen the studies you mentioned. Would you cite them please? This may not be typical of everyone's experience, but what I've seen during in-house load tests is an equal I/O rate for each disk in my stripe, using short-duration sampling intervals to avoid long-term averaging effects. This is what I expected to find, so I didn't delve deeper.
Certainly it's true that some disks may be more heavily burdened than others for hours or days, but I wouldn't expect any bias from an application-driven access pattern to persist for a significant fraction of a disk's lifespan. The only influence I'd expect to bias the cumulative I/O handled by a disk over its entire life would be its role in the RAID configuration. Hot spares will have minimal wear-and-tear until they're activated. Dedicated parity disks will probably live longer than data disks, unless the workload is very heavily oriented towards small writes (e.g. logging).

> multi-TB array. If the data is easy to recreate (including from backup) or short lived (say a database of log data that cycles every month or so) I would just do RAID-0 and plan on losing the data on drive failure (this assumes that you can afford the loss of service when this happens). if the data is more important then I'd do dual-parity or more, along with a hot spare so that the
Re: [PERFORM] Sunfire X4500 recommendations
Hi Dimitri, First of all, thanks again for the great feedback! Yes, my I/O load is mostly read operations. There are some bulk writes done in the background periodically throughout the day, but these are not as time-sensitive. I'll have to do some testing to find the best balance of read vs. write speed and tolerance of disk failure vs. usable diskspace. I'm looking forward to seeing the results of your OLTP tests! Good luck! Since I won't be doing that myself, it'll be all new to me. About disk failure, I certainly agree that increasing the number of disks will decrease the average time between disk failures. Apart from any performance considerations, I wanted to get a clear idea of the risk of data loss under various RAID configurations. It's a handy reference, so I thought I'd share it:

The goal is to calculate the probability of data loss when we lose a certain number of disks within a short timespan (e.g. losing a 2nd disk before replacing+rebuilding the 1st one). For RAID 10, 50, and Z, we will lose data if any disk group (i.e. mirror or parity-group) loses 2 disks. For RAID 60 and Z2, we will lose data if 3 disks die in the same parity group. The parity groups can include arbitrarily many disks. Having larger groups gives us more usable diskspace but less protection. (Naturally we're more likely to lose 2 disks in a group of 50 than in a group of 5.)

g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or more; dual-parity = 4 or more)
n = total number of disks

risk of losing any 1 disk = 1/n
risk of losing 1 disk from a particular group = g/n
risk of losing 2 disks in the same group = g/n * (g-1)/(n-1)
risk of losing 3 disks in the same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)

For the x4500, we have 48 disks.
If we stripe our data across all those disks, then these are our configuration options:

RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same group to lose data:

disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48            24              0.09%
              3          16           48            32              0.27%
              4          12           48            36              0.53%
              6           8           48            40              1.33%
              8           6           48            42              2.48%
             12           4           48            44              5.85%
             24           2           48            46             24.47%
             48           1           48            47            100.00%

RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose data:

disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48           n/a                n/a
              3          16           48            16              0.01%
              4          12           48            24              0.02%
              6           8           48            32              0.12%
              8           6           48            36              0.32%
             12           4           48            40              1.27%
             24           2           48            44             11.70%
             48           1           48            46            100.00%

So, in terms of fault tolerance:
- RAID 60 and Z2 always beat RAID 10, since they never risk data loss when only 2 disks fail.
- RAID 10 always beats RAID 50 and Z, since it has the largest number of disk groups across which to spread the risk.
- Having more parity groups increases fault tolerance but decreases usable diskspace.

That's all assuming each disk has an equal chance of failure, which is probably true since striping should distribute the workload evenly. And again, these probabilities are only describing the case where we don't have enough time between disk failures to recover the array. In terms of performance, I think RAID 10 should always be best for write speed. (Since it doesn't calculate parity, writing a new block doesn't require reading the rest of the RAID stripe just to recalculate the parity bits.) I think it's also normally just as fast for reading, since the controller can load-balance the pending read requests to both sides of each mirror.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
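The formulas above can be checked with a short script (a sketch under the same assumption the email makes: every remaining disk is equally likely to be the next to fail); it reproduces the percentages in both tables.

```python
from fractions import Fraction

def risk_two_disks_same_group(g, n):
    """P(the 2nd failed disk lands in the same g-disk group as the
    1st), assuming each disk is equally likely to be the next to
    fail: g/n * (g-1)/(n-1)."""
    return Fraction(g, n) * Fraction(g - 1, n - 1)

def risk_three_disks_same_group(g, n):
    """Extends the same argument to a 3rd failure in the group."""
    return risk_two_disks_same_group(g, n) * Fraction(g - 2, n - 2)

n = 48  # disk count on the X4500

# RAID 10/50: data loss when one group loses 2 disks.
assert round(float(risk_two_disks_same_group(2, n)) * 100, 2) == 0.09
assert round(float(risk_two_disks_same_group(12, n)) * 100, 2) == 5.85
assert float(risk_two_disks_same_group(48, n)) == 1.0  # RAID 0 case

# RAID 60/Z2: data loss when one group loses 3 disks.
assert round(float(risk_three_disks_same_group(6, n)) * 100, 2) == 0.12
assert round(float(risk_three_disks_same_group(24, n)) * 100, 2) == 11.70
```

Using exact Fractions avoids float rounding in the intermediate products; only the final display is rounded to two decimals, matching the tables.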
Re: [PERFORM] Sunfire X4500 recommendations
Thanks Dimitri! That was very educational material! I'm going to think out loud here, so please correct me if you see any errors. The section on tuning for OLTP transactions was interesting, although my OLAP workload will be predominantly bulk I/O over large datasets of mostly-sequential blocks. The NFS+ZFS section talked about the zil_disable control for making zfs ignore commits/fsyncs. Given that Postgres' executor does single-threaded synchronous I/O like the tar example, it seems like it might benefit significantly from setting zil_disable=1, at least in the case of frequently flushed/committed writes. However, zil_disable=1 sounds unsafe for the datafiles' filesystem, and would probably only be acceptable for the xlogs if they're stored on a separate filesystem and you're willing to lose recently committed transactions. This sounds pretty similar to just setting fsync=off in postgresql.conf, which is easier to change later, so I'll skip the zil_disable control. The RAID-Z section was a little surprising. It made RAID-Z sound just like RAID 50, in that you can customize the trade-off between iops versus usable diskspace and fault-tolerance by adjusting the number/size of parity-protected disk groups. The only difference I noticed was that RAID-Z will apparently set the stripe size across vdevs (RAID-5s) to be as close as possible to the filesystem's block size, to maximize the number of disks involved in concurrently fetching each block. Does that sound about right? So now I'm wondering what RAID-Z offers that RAID-50 doesn't. I came up with 2 things: an alleged affinity for full-stripe writes and (under RAID-Z2) the added fault-tolerance of RAID-6's 2nd parity bit (allowing 2 disks to fail per zpool). It wasn't mentioned in this blog, but I've heard that under certain circumstances, RAID-Z will magically decide to mirror a block instead of calculating parity on it.
I'm not sure how this would happen, and I don't know the circumstances that would trigger this behavior, but I think the goal (if it really happens) is to avoid the performance penalty of having to read the rest of the stripe required to calculate parity. As far as I know, this is only an issue affecting small writes (e.g. single-row updates in an OLTP workload), but not large writes (compared to the RAID's stripe size). Anyway, when I saw the filesystem's intent log mentioned, I thought maybe the small writes are converted to full-stripe writes by deferring their commit until a full stripe's worth of data had been accumulated. Does that sound plausible? Are there any other noteworthy perks to RAID-Z, rather than RAID-50? If not, I'm inclined to go with your suggestion, Dimitri, and use zfs like RAID-10 to stripe a zpool over a bunch of RAID-1 vdevs. Even though many of our queries do mostly sequential I/O, getting higher seeks/second is more important to us than the sacrificed diskspace. For the record, those blogs also included a link to a very helpful ZFS Best Practices Guide: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide To sum up, so far the short list of tuning suggestions for ZFS includes: - Use a separate zpool and filesystem for xlogs if your apps write often. - Consider setting zil_disable=1 on the xlogs' dedicated filesystem. ZIL is the intent log, and it sounds like disabling it may be like disabling journaling. Previous message threads in the Postgres archives debate whether this is safe for the xlogs, but it didn't seem like a conclusive answer was reached. - Make filesystem block size (zfs record size) match the Postgres block size. - Manually adjust vdev_cache. I think this sets the read-ahead size. It defaults to 64 KB. For OLTP workload, reduce it; for DW/OLAP maybe increase it. - Test various settings for vq_max_pending (until zfs can auto-tune it). 
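As a concrete sketch of the record-size suggestion in the list above (pool and filesystem names are hypothetical; the 32 KB figure matches the Postgres block size used on this system):

```shell
# Hypothetical pool/filesystem names. Set the ZFS record size to match
# the 32 KB Postgres block size before loading any data -- the
# property only affects files written after it is set.
zfs set recordsize=32K zpool1/pgdata
zfs get recordsize zpool1/pgdata
```

This is a config fragment, not a benchmark claim; whether it helps depends on the workload, which is why the list suggests testing it.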
See http://blogs.sun.com/erickustarz/entry/vq_max_pending
- A zpool of mirrored disks should support more seeks/second than RAID-Z, just like RAID 10 vs. RAID 50. However, no single Postgres backend will see better than a single disk's seek rate, because the executor currently dispatches only 1 logical I/O request at a time.

Dimitri [EMAIL PROTECTED] 03/23/07 2:28 AM On Friday 23 March 2007 03:20, Matt Smiley wrote: My company is purchasing a Sunfire x4500 to run our most I/O-bound databases, and I'd like to get some advice on configuration and tuning. We're currently looking at: - Solaris 10 + zfs + RAID Z - CentOS 4 + xfs + RAID 10 - CentOS 4 + ext3 + RAID 10 but we're open to other suggestions. Matt, for Solaris + ZFS you may find answers to all your questions here: http://blogs.sun.com/roch/category/ZFS http://blogs.sun.com/realneel/entry/zfs_and_databases Think to measure log (WAL) activity and use a separate pool for logs if needed. Also, RAID-Z is more security-oriented rather than performance-oriented; RAID-10 should be a better choice... Rgds
[PERFORM] Sunfire X4500 recommendations
My company is purchasing a Sunfire x4500 to run our most I/O-bound databases, and I'd like to get some advice on configuration and tuning. We're currently looking at: - Solaris 10 + zfs + RAID Z - CentOS 4 + xfs + RAID 10 - CentOS 4 + ext3 + RAID 10 but we're open to other suggestions. From previous message threads, it looks like some of you have achieved stellar performance under both Solaris 10 U2/U3 with zfs and CentOS 4.4 with xfs. Would those of you who posted such results please describe how you tuned the OS/fs to yield those figures (e.g. patches, special drivers, read-ahead, checksumming, write-through cache settings, etc.)? Most of our servers currently run CentOS/RedHat, and we have little experience with Solaris, but we're not opposed to Solaris if there's a compelling reason to switch. For example, it sounds like zfs snapshots may have a lighter performance penalty than LVM snapshots. We've heard that just using LVM (even without active snapshots) imposes a maximum sequential I/O rate of around 600 MB/s (although we haven't yet reached this limit experimentally). By the way, we've also heard that Solaris is more stable under heavy I/O load than Linux. Have any of you experienced this? It's hard to put much stock in such a blanket statement, but naturally we don't want to introduce instabilities. Thanks in advance for your thoughts! For reference: Our database cluster will be 3-6 TB in size. The Postgres installation will be 8.1 (at least initially), compiled to use 32 KB blocks (rather than 8 KB). The workload will be predominantly OLAP. The Sunfire X4500 has 2 dual-core Opterons, 16 GB RAM, 48 SATA disks (500 GB/disk * 48 = 24 TB raw - 12 TB usable under RAID 10). So far, we've seen the X4500 deliver impressive but suboptimal results using the out-of-the-box installation of Solaris + zfs. The Linux testing is in the early stages (no xfs, yet), but so far it yields comparatively modest write rates and very poor read and rewrite rates.
=== Results under Solaris with zfs: ===

Four concurrent writers:
% time dd if=/dev/zero of=/zpool1/test/50GB-zero1 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero2 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero3 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero4 bs=1024k count=51200 ; time sync
Seq Write (bs = 1 MB): 128 + 122 + 131 + 124 = 505 MB/s

Four concurrent readers:
% time dd if=/zpool1/test/50GB-zero1 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero2 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero3 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero4 of=/dev/null bs=1024k
Seq Read (bs = 1 MB): 181 + 177 + 180 + 178 = 716 MB/s

One bonnie++ process:
% bonnie++ -r 16384 -s 32g:32k -f -n0 -d /zpool1/test/bonnie_scratch
Version 1.03   --Sequential Output--   --Sequential Input-   --Random-
               -Per Chr- --Block-- -Rewrite-   -Per Chr- --Block--   --Seeks--
Machine   Size:chnk   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   /sec %CP
thumper1  32G:32k     604173  98   268893  43   543389  59   519.2   3
thumper1,32G:32k,,,604173,98,268893,43,,,543389,59,519.2,3,

4 concurrent synchronized bonnie++ processes:
% bonnie++ -p4
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -p-1
Combined results of 4 sessions:
Seq Output:  124 + 124 + 124 + 140 = 512 MB/s
Rewrite:      93 +  94 +  93 +  96 = 376 MB/s
Seq Input:   192 + 194 + 193 + 197 = 776 MB/s
Random Seek: 327 + 327 + 335 + 332 = 1321 seeks/s

= Results under CentOS 4 with ext3 and LVM: =

% bonnie++ -s 32g:32k -f -n0 -d /large_lvm_stripe/test/bonnie_scratch
Version 1.03   --Sequential Output--   --Sequential Input-   --Random-
               -Per Chr- --Block-- -Rewrite-   -Per Chr- --Block--   --Seeks--
Machine
Size:chnk   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   K/sec %CP   /sec %CP
thumper1.rt  32G:32k   346595  94   59448  11   132471  12   479.4   2
thumper1.rtkinternal,32G:32k,,,346595,94,59448,11,,,132471,12,479.4,2,

Summary of bonnie++ results:

                        sequential   sequential    sequential   scattered
Test case               write MB/s   rewrite MB/s  read MB/s    seeks/s
----------------------  -----------  ------------  -----------  ---------
Sol10+zfs, 1 process    604          269           543          519
Sol10+zfs, 4 processes  512          376           776
[PERFORM] pgsql upgrade
Hi all, I've run into an issue with a Pg 7.4.6 to 8.1.5 upgrade along with a hardware upgrade. I moved the database from a 2x 3.0GHz Xeon (512KB cache, w/HT) to a 2x Opteron 250. The database is in memory on a tmpfs partition. (The application can rebuild the db during total system failure.) When I first switched it over, the results were exactly what I expected. I was sustaining about a 2.2 load on the Xeon, and with the Opteron, about a 1.5 with the same traffic. The box is high-load; it outputs about 15mbps. After a couple hours working perfectly, the 'system' (vs. user) load jumped from 3% of total CPU usage to 30%, a 10x increase, and postgres started to write out to data/base at a fairly fast rate. The CPU context switch rate doubled at the same time. Iowait, which was historically 0 on the 7.4 box, went to 0.08. Strangely enough, a vacuum (not full or analyze) stopped postgres from writing to data/base, but the strange load pattern remains (system is ~30% of the overall load, vs. 3% before). So, my question is: what happened, and how can I get it back to the same load pattern 7.4.6 had, and the same pattern I had for 4 hours before it went crazy? Matt
Re: [PERFORM] Loading the entire DB into RAM
If memory serves me correctly, I have seen several posts about this in the past. I'll try to recall the highlights.
1. Create a md in linux sufficiently large enough to handle the data set you are wanting to store.
2. Create a HD based copy somewhere as your permanent storage mechanism.
3. Start up your PostgreSQL instance with the MD as the data store.
4. Load your data to the MD instance.
5. Figure out how you will change indexes _and_ ensure that your disk storage is consistent with your MD instance.
I haven't done so, but it would be interesting to have a secondary database somewhere that is your primary storage. It needn't be especially powerful, or even available. It serves as the place to generate your indexing data. You could then use SLONY to propagate the data to the MD production system. Of course, if you are updating your system that resides in ram, you should be thinking the other way: have SLONY replicate changes to the other, permanent-storage system. Either way you do it, I can't think of an out-of-the-box method of doing it. Somehow one has to transfer data from permanent storage to the md instance, and, likewise, back to permanent storage. Out of curiosity, what are you using as the search engine? Charles A. Landemaine wrote: I have a web server with PostgreSQL and RHEL. It hosts a search engine, and each time some one makes a query, it uses the HDD Raid array. The DB is not very big, it is less than a GB. I plan to add more RAM anyway. What I'd like to do is find out how to keep the whole DB in RAM so that each time some one does a query, it doesn't use the HDD. Is it possible, if so, how? Thanks, Charles.
Re: [PERFORM] Storing Digital Video
Rodrigo Madera wrote: I am concerned with performance issues involving the storage of DV on a database. I thought of some options; which would be the most advised for speed?
1) Pack N frames inside a container and store the container to the db.
2) Store each frame in a separate record in the table frames.
3) (type something here)
Thanks for the help,

My experience has been that this is a very bad idea. Many people want to store all sorts of data in a database, such as email messages, pictures, etc... The idea of a relational database is to perform queries against data. If you are needing to just store data, then store it on a disk and use the database as the indexer of the data. Keep in mind, the larger the database, the slower some operations become. Unless you are operating on the frame data (which you would either store as blobs or hex-encoded data), I'd recommend you store the data on a hard drive and let the database store metadata about the video, such as path information, run time, author, etc... We do this on an application storing close to a million images and the performance is impressive. 1. We don't have to do any sort of data manipulation storing the data in or retrieving the data out of the database. 2. Our database is compact and extremely fast - it is using the database for what it was designed for: relational queries. My $0.02 Rodrigo
[PERFORM] Prefetch
I wanted to get some opinions about row prefetching. AFAIK, there is no prefetching done by PostgreSQL; all prefetching is delegated to the operating system. The hardware (can't say enough good things about it): Athlon 64, dual channel, 4GB ram, 240GB usable 4 disk raid5 (ATA133), Fedora Core 3, PostgreSQL 7.4.7. I have what is essentially a data warehouse of stock data. Each day has around 30,000 records (tickers). A typical operation is to get the 200 day simple moving average (of price) for each ticker and write the result to a summary table. In running this process (Perl/DBI), it is typical to see 70-80% I/O wait time with postgres running at about 8-9%. If I run the next day's date, the postgres cache and file cache are now populated with 199 days of the needed data; postgres runs 80-90% of CPU and total run time is greatly reduced. My conclusion is that this is a high cache hit rate in action. I've done other things that make sense, like using indexes, playing with the planner constants and turning up the postgres cache buffers. Even playing with extreme hdparm read-ahead numbers (i.e. 64738), there is no apparent difference in database performance. The random nature of the I/O drops disk reads down to about 1MB/sec for the array. A linear table scan can easily yield 70-80MB/sec on this system. Total table size is usually around 1GB and with indexes should be able to fit completely in main memory. Other databases like Oracle and DB2 implement some sort of row prefetch. Has there been serious consideration of implementing something like a prefetch subsystem? Does anyone have any opinions as to why this would be a bad idea for postgres? Postgres is great for a multiuser environment and OLTP applications. However, in this setup, a data warehouse, the observed performance is not what I would hope for. Regards, Matt Olson Ocean Consulting http://www.oceanconsulting.com/
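For reference alongside the prefetch discussion, the per-ticker moving-average step itself is a single streaming pass over each ticker's price history. A sketch (made-up prices, and a 3-day window standing in for the 200-day one):

```python
from collections import deque

def moving_averages(prices, window):
    """Yield the simple moving average over the trailing `window`
    prices, starting once a full window has been seen."""
    buf = deque(maxlen=window)
    total = 0.0
    for p in prices:
        if len(buf) == window:
            total -= buf[0]   # drop the oldest price from the running sum
        buf.append(p)         # deque evicts the oldest element itself
        total += p
        if len(buf) == window:
            yield total / window

prices = [10.0, 11.0, 12.0, 13.0, 14.0]
print(list(moving_averages(prices, 3)))  # [11.0, 12.0, 13.0]
```

Keeping a running sum makes each day O(1) instead of re-summing the window, so the job stays I/O-bound rather than CPU-bound, which is consistent with the 70-80% I/O wait observed above.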
Re: [PERFORM] Prefetch
My postgres binaries and WAL are on a separate disk from the raid array. The table I'm doing the selects from is probably about 4GB in size and 18-20 million records. No concurrent or dependent inserts or deletes are going on. Tom's point and your points about optimizing the application are well taken. I know my approach is sub optimal and prone to getting caught by latency issues (seek times, cache hit rates, etc.). However, the question of prefetch in my mind is all about eliminating latencies, so, I thought my problem would be good for the sake of discussing prefetching. The two approaches I'm in the process of testing are Rod and Greg's suggestion of using 'CLUSTER'. And for the sake of not letting a good idea get away, I'll probably spend time on doing a parallel query approach which Tom suggested. I'll report back to the list what I find and maybe do some _rough_ benchmarking. This is a production app, so I can't get too much in the way of the daily batches. -- Matt Olson Ocean Consulting http://www.oceanconsulting.com/ On Tuesday 10 May 2005 11:13 am, Greg Stark wrote: Matt Olson writes: I've done other things that make sense, like using indexes, playing with the planner constants and turning up the postgres cache buffers. Even playing with extream hdparm read-ahead numbers (i.e. 64738) yields no apparent difference in database performance. The random nature of the I/O drops disk reads down to about 1MB/sec for the array. A linear table scan can easily yield 70-80MB/sec on this system. Total table size is usually around 1GB and with indexes should be able to fit completely in main memory. Actually forcing things to use indexes is the wrong direction to go if you're trying to process lots of data and want to stream it off disk as rapidly as possible. I would think about whether you can structure your data such that you can use sequential scans. 
That might mean partitioning your raw data into separate tables and then accessing only the partitions that are relevant to the query. In your application that might be hard. It sounds like you would need more or less one table per stock ticker which would really be hard to manage. One thing you might look into is using the CLUSTER command. But postgres doesn't maintain the cluster ordering so it would require periodically rerunning it. I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s even for completely random reads. Is it possible you're seeing something else interfering? Do you have INSERT/UPDATE/DELETE transactions happening concurrently with this select scan? If so you should strongly look into separating the transaction log from the data files.
Re: [PERFORM]
On Thu, Jan 20, 2005 at 11:31:29 -0500, Alex Turner [EMAIL PROTECTED] wrote: I am curious - I wasn't aware that postgresql supported partitioned tables. Could someone point me to the docs on this? Some people have been doing it using a union view. There isn't actually a partition feature.

Actually, there is. I found this example on pgsql-performance:

CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
ANALYZE super_foo;

CREATE TABLE sub_foo1 () INHERITS ( super_foo );
INSERT INTO sub_foo1 VALUES ( 1, 1 );
-- repeat insert until sub_foo1 has 1,000,000 rows
CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
ANALYZE sub_foo1;

CREATE TABLE sub_foo2 () INHERITS ( super_foo );
INSERT INTO sub_foo2 VALUES ( 2, 1 );
-- repeat insert until sub_foo2 has 1,000,000 rows
CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
ANALYZE sub_foo2;

I think that in certain cases this system even beats Oracle, as it stores less information in the table partitions (and in doing so causes less disk IO). BTW, internally, Oracle sees partitions as tables too. Even the UNION ALL system that MS SQL Server uses works fine, as long as the optimiser supports it to prune correctly. Cheers, Matt -- Matt Casters [EMAIL PROTECTED] i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Presumably it can't _ever_ know without being explicitly told, because even for a plain SELECT there might be triggers involved that update tables, or it might be a select of a stored proc, etc. So in the general case, you can't assume that a select doesn't cause an update, and you can't be sure that the table list in an update is a complete list of the tables that might be updated. Tatsuo Ishii wrote: Can I ask a question? Suppose table A gets updated on the master at time 00:00. Until 00:03 pgpool needs to send all queries regarding A to the master only. My question is, how can pgpool know a query is related to A? -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM]
Some people have been doing it using a union view. There isn't actually a partition feature. Actually, there is. I found this example on pgsql-performance: CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC ); ANALYZE super_foo ; CREATE TABLE sub_foo1 () INHERITS ( super_foo ); [...] CREATE TABLE sub_foo2 () INHERITS ( super_foo ); [...] Yes, this could be used instead of a view. But there is one thing missing: you can't just insert into super_foo and acquire the correct partition. You will still have to insert into the correct underlying table. Real partitioning will take care of correct partition selection. This IS bad news. It would mean a serious change in the ETL. I think I can solve the other problems, but I don't know about this one... Regards, Matt
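In the 8.0 era the routing can be faked with rewrite rules on the parent, so the ETL keeps inserting into super_foo. A hedged sketch only (real partitioning would do this automatically, and one rule per child table is needed):

```sql
-- Rewrite inserts on the parent into the matching child.
-- Rows matching no rule still land in super_foo itself.
CREATE RULE route_foo1 AS ON INSERT TO super_foo
    WHERE NEW.partition = 1
    DO INSTEAD INSERT INTO sub_foo1 VALUES (NEW.partition, NEW.bar);

CREATE RULE route_foo2 AS ON INSERT TO super_foo
    WHERE NEW.partition = 2
    DO INSTEAD INSERT INTO sub_foo2 VALUES (NEW.partition, NEW.bar);
```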
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Yes, I wasn't really choosing my examples particularly carefully, but I think the conclusion stands: pgpool (or anyone/thing except for the server) cannot in general tell from the SQL it is handed by the client whether an update will occur, nor which tables might be affected. That's not to say that pgpool couldn't make a good guess in the majority of cases! M Joshua D. Drake wrote: Matt Clark wrote: Presumably it can't _ever_ know without being explicitly told, because even for a plain SELECT there might be triggers involved that update tables, or it might be a select of a stored proc, etc. So in the general case, you can't assume that a select doesn't cause an update, and you can't be sure that the table list in an update is a complete list of the tables that might be updated. Uhmmm no :) There is no such thing as a select trigger. The closest you would get is a function that is called via select which could be detected by making sure you are prepending with a BEGIN or START Transaction. Thus yes pgPool can be made to do this. Sincerely, Joshua D. Drake ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
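Joshua's caveat is easy to demonstrate: a statement that looks like a pure SELECT to a proxy can still write, if it calls a function. A hedged sketch (the function and table names are invented):

```sql
-- A SQL-language function whose body INSERTs; any SELECT that calls
-- it performs a write the proxy cannot see from the SQL text alone.
CREATE FUNCTION get_config(text) RETURNS text AS '
    INSERT INTO audit_log VALUES (now(), $1);
    SELECT value FROM config WHERE key = $1;
' LANGUAGE sql;

-- Looks read-only, but writes a row into audit_log:
SELECT get_config('timeout');
```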
[PERFORM]
Hi, I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) My questions to the list are: has this sort of thing been attempted before? If so, what were the performance results compared to Oracle? I've been reading up on partitioned tables on pgsql; will the performance benefit be comparable to Oracle partitioned tables? What are the gotchas? Should I be testing on the 8 or the 7 version? While I didn't find any documents immediately, are there any fine manuals to read on data warehouse performance tuning on PostgreSQL? Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Best regards, Matt -- Matt Casters [EMAIL PROTECTED] i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37
Re: [PERFORM]
Thanks Stephen, My main concern is to get as much read performance out of the disks as possible on this given system. CPU is rarely a problem on a typical data warehouse system, and this one's not any different. We basically have 2 RAID5 disk sets (300Gb and 150Gb) with a third one coming along (around 350Gb). I was kind of hoping that the new PGSQL tablespaces would allow me to create a storage container spanning multiple file-systems, but unfortunately, that seems not to be the case. Is this correct? That tells me that I probably need to do a full reconfiguration of the disks on the Solaris level to get maximum performance out of the system. Mmmm. This is going to be a tough one to crack. Perhaps it will be possible to get some extra juice out of placing the indexes on the smaller disks (150G) and the data on the bigger ones? Thanks! Matt -Oorspronkelijk bericht- Van: Stephen Frost [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 15:26 Aan: Matt Casters CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] * Matt Casters ([EMAIL PROTECTED]) wrote: I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) Rather familiar, yes... :) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) Quite a bit of data. There's one big thing to note here I think- Postgres will not take advantage of multiple CPUs for a given query, Oracle will. So, it depends on your workload as to how that may impact you. Situations where this will be unlikely to affect you: Your main bottle-neck is IO/disk and not CPU. You run multiple queries in parallel frequently. 
There are other processes on the system which chew up CPU time anyway. Situations where you're likely to be affected would be: You periodically run one big query. You run a set of queries in sequential order. My questions to the list are: has this sort of thing been attempted before? If so, what were the performance results compared to Oracle? I'm pretty sure it's been attempted before but unfortunately I don't have any numbers on it myself. My data sets aren't that large (couple million rows) but I've found PostgreSQL at least as fast as Oracle for what we do, and much easier to work with. I've been reading up on partitioned tables on pgsql, will the performance benefit be comparable to Oracle partitioned tables? In this case I would think so, except that PostgreSQL still won't use multiple CPUs for a given query, even against partitioned tables, aiui. What are the gotchas? See above? :) Other issues are things having to do w/ your specific SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it, something like select x,y from a,b where x=%y; to do a right-join, iirc). Should I be testing on 8 or the 7 version? Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x releases before you go production, if you have time before you have to go into production with the new solution (sounds like you do- changing databases takes time anyway). Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Hope that helps. Others on here will correct me if I misspoke. :) Stephen
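Re the tablespace question above: a tablespace indeed cannot span file-systems, but with one tablespace per disk set, 8.0 can still split data and indexes as suggested. A sketch only (paths and table names invented):

```sql
-- One tablespace per filesystem/RAID set:
CREATE TABLESPACE bigdisk LOCATION '/data300/pgdata';
CREATE TABLESPACE smalldisk LOCATION '/data150/pgdata';

-- Fact data on the big array, its index on the smaller one:
CREATE TABLE fact_sales (sale_date date, amount numeric) TABLESPACE bigdisk;
CREATE INDEX idx_fact_sales_date ON fact_sales (sale_date) TABLESPACE smalldisk;
```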
Re: [PERFORM]
Joshua, Actually that's a great idea! I'll have to check if Solaris wants to play ball though. We'll have to see as we don't have the new disks yet, ETA is next week. Cheers, Matt -Oorspronkelijk bericht- Van: Joshua D. Drake [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 21:26 Aan: [EMAIL PROTECTED] CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] Matt Casters wrote: Thanks Stephen, My main concern is to get as much read performance on the disks as possible on this given system. CPU is rarely a problem on a typical data warehouse system, this one's not any different. We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third one coming along.(around 350Gb) Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for your database? Push all of your extra drives into the RAID 10. Sincerely, Joshua D. Drake I was kind of hoping that the new PGSQL tablespaces would allow me to create a storage container spanning multiple file-systems, but unfortunately, that seems to be not the case. Is this correct? That tells me that I probably need to do a full reconfiguration of the disks on the Solaris level to get maximum performance out of the system. Mmmm. This is going to be a though one to crack. Perhaps it will be possible to get some extra juice out of placing the indexes on the smaller disks (150G) and the data on the bigger ones? Thanks! Matt -Oorspronkelijk bericht- Van: Stephen Frost [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 15:26 Aan: Matt Casters CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] * Matt Casters ([EMAIL PROTECTED]) wrote: I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) Rather familiar, yes... 
[Stephen Frost's earlier reply, quoted in full above, trimmed] -- Command Prompt, Inc., your source for PostgreSQL replication, professional support, programming, managed services, shared and dedicated hosting. Home of the Open Source Projects plPHP, plPerlNG, pgManage, and pgPHPtoolkit. Contact us now at: +1-503-667-4564 - http://www.commandprompt.com
[PERFORM] DWH on Postgresql
Hi, I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) My questions to the list are: has this sort of thing been attempted before? If so, what were the results? I've been reading up on partitioned tables on pgsql, will the performance benefit be comparable to Oracle partitioned tables? What are the gotchas? Should I be testing on 8 or the 7 version? Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Best regards, Matt ___ Matt Casters i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 OKEGEM, Belgium Tel. 054/25.01.37 GSM 0486/97.29.37
Re: [PERFORM] Swapping on Solaris
Kevin Schroeder wrote: It looks to me like you are using no (device or file) swap at all, and have 1.3G of real memory free, so could in fact give Postgres more of it :-) Indeed. If you DO run into trouble after giving Postgres more RAM, use the vmstat command. You can run it as vmstat 10 (ignore the first line of output). Keep an eye on the pi and po columns (kilobytes paged in and out). HTH, Matt -- Matt Casters [EMAIL PROTECTED] i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37
Re: [PERFORM] Swapping on Solaris
This page may be of use: http://www.serverworldmagazine.com/monthly/2003/02/solaris.shtml From personal experience, for god's sake don't think Solaris' VM/swap implementation is easy - it's damn good, but it ain't easy! Matt Kevin Schroeder wrote: I think it's probably just reserving them. I can't think of anything else. Also, when I run swap activity with sar I don't see any activity, which also points to reserved swap space, not used swap space. swap -s reports total: 358336k bytes allocated + 181144k reserved = 539480k used, 2988840k available Kevin - Original Message - From: Alan Stange [EMAIL PROTECTED] To: Kevin Schroeder [EMAIL PROTECTED] Cc: pgsql-performance@postgresql.org Sent: Wednesday, January 19, 2005 11:04 AM Subject: Re: [PERFORM] Swapping on Solaris Kevin Schroeder wrote: I may be asking the question the wrong way, but when I start up PostgreSQL swap is what gets used the most of. I've got 1282MB free RAM right now and and 515MB swap in use. Granted, swap file usage probably wouldn't be zero, but I would guess that it should be a lot lower so something must be keeping PostgreSQL from using the free RAM that my system is reporting. For example, one of my postgres processes is 201M in size but on 72M is resident in RAM. That extra 130M is available in RAM, according to top, but postgres isn't using it. The test you're doing doesn't measure what you think you're measuring. First, what else is running on the machine?Note that some shared memory allocations do reserve backing pages in swap, even though the pages aren't currently in use. Perhaps this is what you're measuring? swap -s has better numbers than top. You'd be better by trying a reboot then starting pgsql and seeing what memory is used. Just because you start a process and see the swap number increase doesn't mean that the new process is in swap. It means some anonymous pages had to be evicted to swap to make room for the new process or some pages had to be reserved in swap for future use. 
Typically a new process won't be paged out unless something else is causing enormous memory pressure... -- Alan
Re: [PERFORM] Speed in V8.0
Another man working to the bitter end this Christmas! There could be many reasons, but maybe first you should look at the amount of RAM available? If the tables fit in RAM on the production server but not on the dev server, then that will easily defeat the improvement due to using the native DB version. Why don't you install cygwin on the dev box and do the comparison using the same hardware? M -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Wegner Sent: 21 December 2004 23:03 To: pgsql-performance@postgresql.org Subject: [PERFORM] Speed in V8.0 Hello, I have a problem between V7.4.3 Cygwin and V8.0RC2 W2K. I have 2 systems: 1. Production Machine - Dual P4 3000MHz - 2 GB RAM - W2K - PostgreSQL 7.4.3 under Cygwin - I connect to it over a DSL line 2. Develop Machine - P4 1800MHz - 760 MB RAM - PostgreSQL Native Windows - local connection 100MB/FD Both systems use the default postgresql.conf. Now the problem: I have an (unoptimized, dynamic) query which executes on the production machine over DSL in 2 seconds, and on my develop machine, connected over local LAN, in 119 seconds! What's this? I cannot post the query details here publicly, it's a commercial project. Any first idea? I execute the same query on both machines, with the same database design! - Thomas Wegner CabrioMeter - The Weather Plugin for Trillian http://www.wegner24.de/cabriometer
Re: [PERFORM] ext3 journalling type
Am I right to assume that writeback is both fastest and at the same time as safe to use as ordered? Maybe any of you did some benchmarks? It should be fastest because it has the least overhead, and safe because postgres does its own write-order guaranteeing through fsync(). You should also mount the FS with the 'noatime' option. But for some workloads, there are tests showing that 'data=journal' can be the fastest! This is because although the data is written twice (once to the journal, and then to its real location on disk), in this mode data is written _sequentially_ to the journal, and later written out to its destination, which may be at a quieter time. There's a discussion (based around 7.2) here: http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.txt M
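Putting the two recommendations together, the mount might look like this. A sketch only - the device and mount point are invented placeholders, and data=writeback is reasonable for the PG data directory precisely because postgres fsyncs in its own order:

```
# /etc/fstab sketch - device and mount point are placeholders
/dev/sdb1  /var/lib/pgsql  ext3  noatime,data=writeback  1 2
```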
Re: [PERFORM] Strange (?) Index behavior?
For some reason it's a requirement that partial wildcard searches are done on this field, such as SELECT ... WHERE field LIKE 'A%' I thought an interesting way to do this would be to simply create partial indexes for each letter on that field, and it works when the query matches the WHERE clause in the index exactly like above. The problem is thus: I thought PG could use an ordinary index for 'like' conditions with just a terminating '%'? My other thought is that like 'A%' should grab about 1/26th of the table anyway (if the initial character distribution is random), and so a sequential scan might be the best plan anyway... M
[PERFORM] What is the difference between these?
To me, these three queries seem identical... why doesn't the first one (simplest to understand and write) go the same speed as the third one? All I'm trying to do is get statistics for one day (in this case, today) summarized. Table has ~25M rows. I'm using postgres 7.3.? on rh linux 7.3 (note that I think the difference between the first two might just be related to the data being in memory for the second query).

EXPLAIN ANALYZE select count(distinct sessionid) from usage_access where atime > date_trunc('day', now());

 Aggregate (cost=933439.69..933439.69 rows=1 width=4) (actual time=580350.43..580350.43 rows=1 loops=1)
   -> Seq Scan on usage_access (cost=0.00..912400.11 rows=8415831 width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
        Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 580350.65 msec
(4 rows)

EXPLAIN ANALYZE select count(distinct sessionid) from (select * from usage_access where atime > date_trunc('day', now())) as temp;

 Aggregate (cost=933439.69..933439.69 rows=1 width=4) (actual time=348012.85..348012.85 rows=1 loops=1)
   -> Seq Scan on usage_access (cost=0.00..912400.11 rows=8415831 width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
        Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 348013.10 msec
(4 rows)

EXPLAIN ANALYZE select count(distinct sessionid) from usage_access where atime between date_trunc('day', now()) and date_trunc('day', now()) + '1 day'::interval;

 Aggregate (cost=89324.98..89324.98 rows=1 width=4) (actual time=27.84..27.84 rows=1 loops=1)
   -> Index Scan using usage_access_atime on usage_access (cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37 rows=2964 loops=1)
        Index Cond: ((atime >= date_trunc('day'::text, now())) AND (atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
 Total runtime: 28.11 msec
(4 rows)

-- Matthew Nuzum | Makers of Elite Content Management System www.followers.net | View samples of Elite CMS in action [EMAIL PROTECTED] | http://www.followers.net/portfolio/
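One nit on the fast BETWEEN version: BETWEEN is inclusive at both ends, so a row stamped exactly at the next midnight would be counted in two days. A half-open range keeps the index scan and avoids that (same table as above):

```sql
SELECT count(DISTINCT sessionid)
FROM usage_access
WHERE atime >= date_trunc('day', now())
  AND atime <  date_trunc('day', now()) + '1 day'::interval;
```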
Re: [PERFORM] Strange (?) Index behavior?
With that many rows, and a normal index on the field, postgres figures the best option for say I% is not an index scan, but a sequential scan on the table, with a filter -- quite obviously this is slow as heck, and yes, I've run analyze several times and in fact have the vacuum analyze automated. Ah, so like 'I%' uses a very slow seq scan, but like 'ABC%' uses an ordinary index OK? If so then... The planner would usually assume (from what Tom usually says) that 1/26 selectivity isn't worth doing an index scan for, but in your case it's wrong (maybe because the rows are very big?) You may be able to get the planner to go for an index scan on like 'I%' by tweaking the foo_cost variables in postgresql.conf Or you could have the app rewrite like 'I%' to like 'IA%' or like 'IB%' ... , or do that as a stored proc. With the partial index the index scan is used and the cost drops from 0..2million to 0..9000 -- a vast improvement. So there are really only 9000 rows out of 76 million starting with 'I'? How about combining some techniques - you could create an index on the first two chars of the field (should be selective enough to give an index scan), select from that, and select the actual data with the like clause. CREATE INDEX idx_firstletters ON table (substr(field, 1, 2)); CREATE INDEX idx_all ON table (field); SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') AS approx WHERE field LIKE 'DE%'; Any good? ---(end of broadcast)--- TIP 8: explain analyze is your friend
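Another angle worth testing (hedged - whether it helps depends on locale): in a non-C locale an ordinary btree index can't serve LIKE prefix matches at all, which would explain the seq scans; the pattern_ops operator classes index plain character order so a prefix LIKE can use them. Table and column names are invented:

```sql
-- Only needed when the database locale is not C; in the C locale a
-- plain index already handles LIKE 'I%' as a range scan.
CREATE INDEX idx_field_pattern ON big_table (field varchar_pattern_ops);

-- Can now be planned as an index range scan on the prefix:
SELECT field FROM big_table WHERE field LIKE 'I%';
```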
Re: [PERFORM] Restricting Postgres
I have a dual processor system that can support over 150 concurrent connections handling normal traffic and load. Now suppose I setup Apache to spawn all of its children instantly, what will ... This will spawn 150 children in a short order of time and as this takes Doctor, it hurts when I do this! Well, don't do that then... Sorry, couldn't resist ;-) Our Apache/PG driven website also needs to be able to deal with occasional large peaks, so what we do is: StartServers 15 # Don't create too many children initially MinSpareServers 10 # Always have at least 10 spares lying around MaxSpareServers 20 # But no more than 20 MaxClients 150 # Up to 150 - the default 256 is too much for our RAM So on server restart 15 Apache children are created, then one new child every second up to a maximum of 150. Apache's 'ListenBackLog' is around 500 by default, so there's plenty of scope for queuing inbound requests while we wait for sufficient children to be spawned. In addition we (as _every_ high load site should) run Squid as an accelerator, which dramatically increases the number of client connections that can be handled. Across 2 webservers at peak times we've had 50,000 concurrently open http/https client connections to Squid, with 150 Apache children doing the work that squid can't (i.e. all the dynamic stuff), and PG (on a separate box of course) whipping through nearly 800 mixed selects, inserts and updates per second - and then had to restart Apache on one of the servers for a config change... Not a problem :-) One little tip - if you run squid on the same machine as apache, and use a dual-proc box, then because squid is single-threaded it will _never_ take more than half the CPU - nicely self balancing in a way. M
Re: [PERFORM] Restricting Postgres
Apache::DBI overall works better for what I require, even if it is not a pool per se. Now if pgpool supported variable rate pooling like Apache does with its children, it might help to even things out. That and you'd still get the spike if you have to start the webserver and database server at or around the same time. I still don't quite get it though - you shouldn't be getting more than one child per second being launched by Apache, so that's only one PG postmaster per second, which is really a trivial load. That is unless you have 'StartServers' set high, in which case the 'obvious' answer is to lower it. Are you launching multiple DB connections per Apache process as well?
Re: [PERFORM] Restricting Postgres
Case in point: A first time visitor hits your home page. A dynamic page is generated (in about 1 second) and served (taking 2 more seconds) which contains links to 20 additional The gain from an accelerator is actually even more that that, as it takes essentially zero seconds for Apache to return the generated content (which in the case of a message board could be quite large) to Squid, which can then feed it slowly to the user, leaving Apache free again to generate another page. When serving dialup users large dynamic pages this can be a _huge_ gain. I think Martin's pages (dimly recalling another thread) take a pretty long time to generate though, so he may not see quite such a significant gain. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Restricting Postgres
Correct, 75% of all hits are on a script that can take anywhere from a few seconds to half an hour to complete. The script essentially auto-flushes to the browser so they get new information as it arrives, creating the illusion of on-demand generation. This is more like a streaming data server, which is a very different beast from a webserver, and probably better suited to the job. Usually either multithreaded or single-process using select() (just like Squid). You could probably build one pretty easily. Using a 30MB Apache process to serve one client for half an hour seems like a hell of a waste of RAM. A squid proxy would probably cause severe problems when dealing with a script that does not complete output for a variable length of time. No, it's fine, squid gives it to the client as it gets it, but can receive from the server faster.
Re: [PERFORM] Restricting Postgres
1- You have a query that runs for half an hour and you spoon feed the results to the client ? (argh) 2- Your script looks for new data every few seconds, sends a packet, then sleeps, and loops ? If it's 2 I have a readymade solution for you, just ask. I'm guessing (2) - PG doesn't give the results of a query in a stream.
Re: [PERFORM] Restricting Postgres
These are CGI scripts at the lowest level, nothing more and nothing less. While I could probably embed a small webserver directly into the perl scripts and run that as a daemon, it would take away the portability that the scripts currently offer. If they're CGI *scripts* then they just use the CGI environment, not Apache, so a daemon that accepts the inbound connections, then compiles the scripts a-la Apache::Registry, but puts each in a separate thread would be, er, relatively easy for someone better at multithreaded stuff than me. This should be my last question on the matter: does squid report the proper IP address of the clients themselves? That's a critical requirement for the scripts. In the X-Forwarded-For header. Not that you can be sure you're seeing the true client IP anyway if they've gone through an ISP proxy beforehand.
Re: [PERFORM] Restricting Postgres
In your webpage include an iframe with a Javascript to refresh it every five seconds. The iframe fetches a page from the server which brings in the new data in form of generated JavaScript which writes in the parent window. Thus, you get a very short request every 5 seconds to fetch new data, and it is displayed in the client's window very naturally. ... Yup. If you go the JS route then you can do even better by using JS to load data into JS objects in the background and manipulate the page content directly, no need for even an iframe. Ignore the dullards who have JS turned off - it's essential for modern web apps, and refusing JS conflicts absolutely with proper semantic markup. http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good starting point. It's clear that this discussion has moved way away from PG! Although in the context of DB backed web apps I guess it remains a bit on-topic... M
Re: [PERFORM] appropriate indexing
- ITEM table will, grow, grow, grow (sold items are not deleted) WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and (ITEM.KIND=2 or ITEM.KIND=3) Partial index on item.status ?
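A hedged sketch of that suggestion, using the column names from the query above (index name invented; the query's WHERE clause must imply the index predicate for the planner to use it):

```sql
-- Index only the live rows; the predicate matches the query's filter,
-- and the index stays small even as sold items accumulate.
CREATE INDEX idx_item_live ON item (kind) WHERE status = 1;
```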
Re: [PERFORM] Better Hardware, worst Results
All 3 plans have crappy estimates. Run ANALYZE in production, then send another explain analyze (as an attachment please, to avoid linewrap). Er, no other possible answer except Rod's :-)
Re: [PERFORM] Restricting Postgres
Javascript is too powerful to turn on for any random web page. It is only essential for web pages because people write their web pages to only work with javascript. Hmm... I respectfully disagree. It is so powerful that it is impossible to ignore when implementing a sophisticated app. And it is not dangerous to the user so long as they have a popup blocker. Commercially, I can ignore the people who turn it off, and I can gain a huge benefit from knowing that 95% of people have it turned on, because it gives my users a hugely better experience than the equivalent XHTML only page (which I deliver, and which works, but which is a fairly depressing experience compared to the JS enabled version). It is _amazing_ how much crud you can take out of a page if you let JS do the dynamic stuff (with CSS still in full control of the styling). Nice, clean, semantically sensible XHTML, that can be transformed for multiple devices - it's great. An example: <a class="preview_link" href="/previews/foo.wmv">...</a> But we want it to appear in a popup when viewed in certain devices. Easy - attach an 'onclick' event handler (or just set the target attribute) when the device has a suitable screen media player, but leave the markup clean for the rest of the world. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Restricting Postgres
A note though : you'll have to turn off HTTP persistent connections in your server (not in your proxy) or you're back to square one. I hadn't considered that. On the client side it would seem to be up to the client whether to use a persistent connection or not. If it does, then yeah, a request every 5 seconds would still just hold open a server. One more reason to use a proxy I s'pose. It's clear that this discussion has moved way away from PG! Although in the context of DB backed web apps I guess it remains a bit on-topic... I find it very on-topic as - it's a way to help this guy solve his pg problem which was in fact a design problem - it's the future of database driven web apps (no more reloading the whole page!) I think in the future there will be a good bit of presentation logic in the client... Not if Bruno has his way ;-) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Restricting Postgres
Pierre-Frédéric Caillaud wrote: check this marvellous piece of 5 minutes of work : http://boutiquenumerique.com/test/iframe_feed.html cela m'a fait sourire :-) (apologies for bad french) M ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] preloading indexes
The best way to get all the stuff needed by a query into RAM is to run the query. Is it more that you want to 'pin' the data in RAM so it doesn't get overwritten by other queries?

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 03 November 2004 17:31
To: [EMAIL PROTECTED]
Subject: [PERFORM] preloading indexes

I am working with some pretty convoluted queries that work very slowly the first time they're called but perform fine on the second call. I am fairly certain that these differences are due to the caching. Can someone point me in a direction that would allow me to pre-cache the critical indexes?
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
and certainly anyone who's been around a computer more than a week or two knows which direction in and out are customarily seen from. regards, tom lane

Apparently not whoever wrote the man page that everyone copied ;-) Interesting. I checked this on several machines. They actually say different things:

Redhat 9      - bi: Blocks sent to a block device (blocks/s).
Latest Cygwin - bi: Blocks sent to a block device (blocks/s).
Redhat 7.x    - bi: Blocks sent to a block device (blocks/s).
Redhat AS3    - bi: blocks sent out to a block device (in blocks/s)

I would say that I probably agree, things should be relative to the cpu. However, it doesn't seem to be something that was universally agreed upon. Or maybe the man-pages were all wrong, and only got updated recently. Looks like the man pages are wrong, for RH7.3 at least. It says bi is 'blocks written', but an actual test like 'dd if=/dev/zero of=/tmp/test bs=1024 count=16384' on an otherwise nearly idle RH7.3 box gives:

   procs                     memory     swap         io     system        cpu
 r  b  w  swpd   free   buff   cache  si  so  bi    bo   in    cs  us sy  id
 0  0  0 75936 474704 230452 953580    0   0   0     0  106  2527   0  0  99
 0  0  0 75936 474704 230452 953580    0   0   0 16512  376  2572   0  2  98
 0  0  0 75936 474704 230452 953580    0   0   0     0  105  2537   0  0 100

Which is in line with bo being 'blocks written'. M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
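For anyone wanting to sanity-check their own vmstat output the same way, a quick Python sketch (the field order here is assumed from 2.4-era procps vmstat; check your local man page before trusting it):

```python
# Field order of a 2.4-era 'vmstat' data line (an assumption - verify
# against the header your version prints).
VMSTAT_FIELDS = ["r", "b", "w", "swpd", "free", "buff", "cache",
                 "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]

def parse_vmstat(line):
    """Map one vmstat data line to named fields."""
    return dict(zip(VMSTAT_FIELDS, (int(tok) for tok in line.split())))

row = parse_vmstat("0 0 0 75936 474704 230452 953580 0 0 0 16512 376 2572 0 2 98")
# bo = 16512 one-KB blocks in that second, matching the ~16 MB file dd
# just wrote, while bi stays 0 - so writes show up under 'bo'.
```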
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
I don't have iostat on that machine, but vmstat shows a lot of writes to the drives, and the runnable processes are more than 1:

 6 1 0 3617652 292936 2791928 0 0 0 52430 1347 4681 25 19 20 37

Assuming that's the output of 'vmstat 1' and not some other delay, 50MB/second of sustained writes is usually considered 'a lot'. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Anything to be gained from a 'Postgres Filesystem'?
I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out. I'm thinking along the lines of an FS that's aware of PG's strategies and requirements and therefore optimised to make those activities as efficient as possible - possibly even being aware of PG's disk layout and treating files differently on that basis. Not being an FS guru I'm not really clear on whether this would help much (enough to be worth it anyway) or not - any thoughts? And if there were useful gains to be had, would it need a whole new FS or could an existing one be modified? So there might be (as I said, I'm not an FS guru...):

* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* knowledge that fsync is being used for preserving ordering a lot of the time, rather than requiring actual writes to disk (so long as the writes eventually happen in order...)?

Matt

Matt Clark Ymogen Ltd P: 0845 130 4531 W: https://ymogen.net/ M: 0774 870 1584 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] OS desicion
You are asking the wrong question. The best OS is the OS you (and/or the customer) knows and can administer competently. The real performance differences between unices are so small as to be ignorable in this context. The context switching bug is not OS-dependent, but varies in severity across machine architectures (I understand it to be mostly P4/Athlon related, but don't take my word for it). M Tom Fischer wrote: Hi List, I have a Dual-Xeon 3Ghz System with GB RAM and an Adaptec 212ß SCSI RAID with 4 SCA Harddiscs. Our customer wants to have the Machine tuned for best Database performance. Which OS should we use? We are tending between Linux 2.6 or FreeBSD. The Database Size is 5GB and ascending. Most SQL-Queries are Selects, the Tablesizes are between 300k and up to 10 MB. I've read the Hardware Performance Guide and the result was to take FreeBSD in the Decision too :) And what is on this Context Switching Bug i have read in the Archive? Hope you can help me Regards Tom ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] how much mem to give postgres?
How would I turn that off? In the kernel config? Not too familiar with that. I have a 2 proc xeon with 4 gigs of mem on the way for postgres, so I hope HT isn't a problem. If HT is turned off, does it just not use the other half of the processor? Or does the processor just work as one unit? You turn it off in the BIOS. There is no 'other half', the processor is just pretending to have two cores by shuffling registers around, which gives maybe a 5-10% performance gain in certain multithreaded situations. <opinion>A hack to overcome marchitectural limitations due to the overly long pipeline in the Prescott core.</opinion> Really of most use for desktop interactivity rather than actual throughput. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] OS desicion
The real performance differences between unices are so small as to be ignorable in this context. Well, at least the difference between Linux and BSD. There are substantial tradeoffs should you choose to use Solaris or UnixWare. Yes, quite right, I should have said 'popular x86-based unices'.
Re: [PERFORM] how much mem to give postgres?
<OT>Hyperthreading is actually an excellent architectural feature that can give significant performance gains when implemented well and used for an appropriate workload under a decently HT-aware OS. IMO, typical RDBMS streams are not an obviously appropriate workload, Intel didn't implement it particularly well and I don't think there are any OSes that support it particularly well.</OT> But don't write off using it in the future, when it's been improved at both the OS and the silicon levels. You are quite right of course - unfortunately the current Intel implementation meets nearly none of these criteria! As Rod Taylor pointed out off-list, IBM's SMT implementation on the Power5 is vastly superior. Though he's also just told me that Sun is beating IBM on price/performance for his workload, so who knows how reliable a chap he is... ;-) M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[PERFORM] Speeding up this function
Hello, I've thought it would be nice to index certain aspects of my apache log files for analysis. I've used several different techniques and have something usable now, but I'd like to tweak it one step further. My first performance optimization was to change the log format into a CSV format. I processed the logfiles with PHP and plpgsql stored procedures. Unfortunately, it took more than 24 hours to process one day's worth of log files. I've now switched to using C# (using mono) to create hash-tables to do almost all of the pre-processing. This has brought the time down to about 3 hours. Actually, if I take out one step it brought the process down to about 6 minutes, which is a tremendous improvement. The one step that is adding 2.5+ hours to the job is not easily done in C#, as far as I know. Once the mostly-normalized data has been put into a table called usage_raw_access I then use this query: insert into usage_access select *, usage_normalize_session(accountid, client, atime) as sessionid from usage_raw_access; All it does is try to link pageviews together into a session.
here's the function:

create or replace function usage_normalize_session (varchar(12), inet, timestamptz)
returns integer as '
DECLARE
    -- $1 = Account ID, $2 = IP Address, $3 = Time
    RecordSet record;
BEGIN
    SELECT INTO RecordSet DISTINCT sessionid
      FROM usage_access ua
     WHERE ua.accountid = $1
       AND ua.client = $2
       AND ua.atime >= ($3 - ''20 min''::interval)::timestamptz;

    IF FOUND THEN
        RETURN RecordSet.sessionid;
    END IF;
    RETURN nextval(''usage_session_ids'');
END;' language plpgsql;

And the table usage_access looks like this:

 Table public.usage_access
   Column  |           Type           | Modifiers
-----------+--------------------------+-----------
[snip]
 client    | inet                     |
 atime     | timestamp with time zone |
 accountid | character varying(12)    |
 sessionid | integer                  |
Indexes: usage_acccess_req_url btree (req_url),
         usage_access_accountid btree (accountid),
         usage_access_atime btree (atime),
         usage_access_hostid btree (hostid),
         usage_access_sessionid btree (sessionid),
         usage_access_sessionlookup btree (accountid, client, atime)

As you can see, this looks for clients who have visited the same site within 20 min. If there is no match, a unique sessionid is assigned from a sequence. If there is a visit, the session id assigned to them is used. I'm only able to process about 25 records per second with my setup. My window to do this job is 3-4 hours and the shorter the better.
Here is an explain analyze of the query I do (note I limited it to 1000):

EXPLAIN ANALYZE insert into usage_access
select *, usage_normalize_session(accountid, client, atime) as sessionid
from usage_raw_access limit 1000;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Subquery Scan "*SELECT*" (cost=0.00..20.00 rows=1000 width=196) (actual time=51.63..47634.22 rows=1000 loops=1)
   -> Limit (cost=0.00..20.00 rows=1000 width=196) (actual time=51.59..47610.23 rows=1000 loops=1)
      -> Seq Scan on usage_raw_access (cost=0.00..20.00 rows=1000 width=196) (actual time=51.58..47606.14 rows=1001 loops=1)
 Total runtime: 48980.54 msec

I also did an explain of the query that's performed inside the function:

EXPLAIN ANALYZE select sessionid from usage_access ua
where ua.accountid = 'XYZ'
  and ua.client = '64.68.88.45'::inet
  and ua.atime >= '2003-11-02 04:50:01-05'::timestamptz;

                                   QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using usage_access_sessionlookup on usage_access ua (cost=0.00..6.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=1)
   Index Cond: ((accountid = 'XYZ'::character varying) AND (client = '64.68.88.45'::inet) AND (atime >= '2003-11-02 04:50:01-05'::timestamp with time zone))
 Total runtime: 0.35 msec
(3 rows)

What I'd really like to know is if someone knows a way to do any of the following:

a: Make the INSERT into ... SELECT *, usage_access_sessionlookup() ... work faster.
b: Make the usage_access_sessionlookup() smarter, better, etc.
c: Do this in C# using a hash-table or some other procedure that would be quicker.
d: Find an algorithm to create the sessionid without having to do any database or hash-table lookups. As the dataset gets bigger, it won't fit in RAM and the lookup queries will become I/O bound, drastically slowing things down.

d: is my first choice. For some reason I just can't seem to get my mind around the data. I wonder if there's someway
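For what it's worth, the hash-table pass of option (c) can be sketched in a few lines - shown here in Python rather than C#, with the column names taken from usage_access above and the 20-minute gap rule from the function (a sketch, not the poster's actual code):

```python
from datetime import datetime, timedelta
from itertools import count

SESSION_GAP = timedelta(minutes=20)

def assign_sessions(rows, gap=SESSION_GAP):
    """Assign session ids to (accountid, client, atime) rows.

    Rows must be sorted by atime. A visitor gets a fresh session id
    whenever the gap since their previous hit exceeds `gap`,
    mirroring the usage_normalize_session() logic without any
    database lookup.
    """
    next_id = count(1)               # stands in for the sequence
    last_seen = {}                   # (accountid, client) -> (atime, sessionid)
    out = []
    for accountid, client, atime in rows:
        key = (accountid, client)
        prev = last_seen.get(key)
        if prev is not None and atime - prev[0] <= gap:
            sid = prev[1]            # same session continues
        else:
            sid = next(next_id)      # new session
        last_seen[key] = (atime, sid)
        out.append((accountid, client, atime, sid))
    return out
```

Because the input is time-ordered, state is one dict entry per visitor, and entries whose last atime is older than the gap can be dropped as you go - which also speaks to the concern in (d) about the lookup set outgrowing RAM.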
Re: [PERFORM] Speeding up this function
On Tue, 19 Oct 2004 15:49:45 -0400, Jeremy Dunn [EMAIL PROTECTED] wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Matt Nuzum Sent: Tuesday, October 19, 2004 3:35 PM To: pgsql-performance Subject: [PERFORM] Speeding up this function snip snip This is probably a stupid question, but why are you trying to create sessions after the fact? Since it appears that users of your site must login, why not just assign a sessionID to them at login time, and keep it in the URL for the duration of the session? Then it would be easy to track where they've been. - Jeremy You don't have to log in to visit the sites. These log files are actually for many domains. Right now, we do logging with a web-bug and it does handle the sessions, but it relies on javascript and we want to track a lot more than we are now. Plus, that code is in JavaScript and one of our primary motivations is to ditch MySQL completely. -- Matthew Nuzum | Makers of Elite Content Management System www.followers.net | View samples of Elite CMS in action [EMAIL PROTECTED] | http://www.followers.net/portfolio/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Select with qualified join condition / Batch inserts
SELECT cmp.WELL_INDEX, cmp.COMPOUND, con.CONCENTRATION FROM SCR_WELL_COMPOUND cmp, SCR_WELL_CONCENTRATION con WHERE cmp.BARCODE=con.BARCODE AND cmp.WELL_INDEX=con.WELL_INDEX AND cmp.MAT_ID=con.MAT_ID AND cmp.MAT_ID = 3 AND cmp.BARCODE='910125864' AND cmp.ID_LEVEL = 1; Quick guess - type mismatch forcing sequential scan. Try some quotes: AND cmp.MAT_ID = '3' AND cmp.BARCODE='910125864' AND cmp.ID_LEVEL = '1'; M ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Opteron vs RHAT
trainwreck... If you're going through IBM, then they won't want to respond to any issues if you're not running a bog-standard RHAS/RHES release from Red Hat. ... To be fair, we keep on actually running into things that _can't_ be backported, like fibrechannel drivers that were written to take advantage of changes in the SCSI support in 2.6. I thought IBM had good support for SUSE? I don't know why I thought that... ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] IBM P-series machines
As for vendor support for Opteron, that sure looks like a trainwreck... If you're going through IBM, then they won't want to respond to any issues if you're not running a bog-standard RHAS/RHES release from Red Hat. And that, on Opteron, is preposterous, because there's plenty of the bits of Opteron support that only ever got put in Linux 2.6, whilst RHAT is still back in the 2.4 days. To be fair, they have backported a boatload of 2.6 features to their kernel: http://www.redhat.com/software/rhel/kernel26/ And that page certainly isn't an exhaustive list... M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Which plattform do you recommend I run PostgreSQL for
In the MySQL manual it says that MySQL performs best with Linux 2.4 with ReiserFS on x86. Can anyone official, or in the know, give similar information regarding PostgreSQL? I'm neither official, nor in the know, but I do have a spare moment! I can tell you that any *NIX variant on any modern hardware platform will give you good performance, except for Cygwin/x86. Any differences between OSes on the same hardware are completely swamped by far more direct concerns like IO systems, database design, OS tuning etc. Picking the OS you're most familiar with is usually a good recommendation (and not just for Postgres). ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
OK, that'd work too... the point is if you're re-connecting all the time it doesn't really matter what else you do for performance. Yeah, although there is the chap who was asking questions on the list recently who had some very long-running code on his app servers, so was best off closing the connection because he had far too many postmaster processes just sitting there idle all the time! But you're right, it's a killer usually. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
Basically you set a default in seconds for the HTML results to be cached, and then have triggers set that force the cache to regenerate (whenever CRUD happens to the content, for example). Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a believer out of me! Nice to have it in a library, but if you want to be that simplistic then it's easy in any language. What if a process on server B modifies an important value that server A has cached though? Coherency (albeit that the client may choose to not use it) is a must for a general solution. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
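To make the pattern concrete, here's a minimal sketch of that scheme in Python (Cache_Lite itself is PHP; the names here are made up): a TTL default, plus explicit invalidation from the CRUD path. Note that a per-process cache like this is exactly what raises the server-A/server-B coherency question above.

```python
import time

class PageCache:
    """Tiny TTL page cache: results are kept for a default number of
    seconds, and writes force regeneration by invalidating entries."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable for testing
        self._store = {}            # key -> (expires_at, html)

    def get(self, key, regenerate):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # fresh cached copy
        html = regenerate()         # miss or expired: rebuild the page
        self._store[key] = (now + self.ttl, html)
        return html

    def invalidate(self, key):
        # call this from the CRUD path (or a trigger-driven listener)
        self._store.pop(key, None)
```

Usage is just `cache.get("/products", render_products)` in the request handler and `cache.invalidate("/products")` wherever the underlying rows change.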
Re: [PERFORM] Caching of Queries
More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. How does putting it into PGPool make it any less trivial for people to use? The answers are at http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html . Specifically, it's a separate application that needs configuration, the homepage has no real discussion of the potential pitfalls of pooling and what this implementation does to get around them, you get the idea. I'm sure it's great software, but it doesn't come as part of the DB server, so 95% of people who would benefit from query caching being implemented in it never will. If it shipped with and was turned on by default in SUSE or RedHat that would be a different matter. Which I realise makes me look like one of those people who doesn't appreciate code unless it's 'popular', but I hope I'm not *that* bad... Oh OK, I'll say it, this is a perfect example of why My*** has so much more mindshare. It's not better, but it sure makes the average Joe _feel_ better. Sorry, I've got my corporate hat on today, I'm sure I'll feel a little less cynical tomorrow. M
Re: [PERFORM] Caching of Queries
Any competently written application where caching results would be a suitable performance boost can already implement application or middleware caching fairly easily, and increase performance much more than putting result caching into the database would. I guess the performance increase is that you can spend $10,000 on a developer, or $10,000 on hardware, and for the most part get a more reliable result the second way. MemcacheD is fine(ish), but it's not a panacea, and it's more than easy to shoot yourself in the foot with it. Caching is hard enough that lots of people do it badly - I'd rather use an implementation from the PG team than almost anywhere else. I don't see caching results in the database as much of a win for most well written applications. Toy benchmarks, sure, but for real apps it seems it would add a lot of complexity, and violate the whole point of using an ACID database. Well the point surely is to _remove_ complexity from the application, which is written by God Knows Who, and put it in the DB, which is written by God And You. And you can still have ACID (cached data is not the same as stale data, although once you have the former, the latter can begin to look tempting sometimes). M ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
I've looked at PREPARE, but apparently it only lasts per-session - that's worthless in our case (web based service, one connection per data-requiring connection). That's a non-sequitur. Most 'normal' high volume web apps have persistent DB connections, one per http server process. Are you really dropping DB connections and reconnecting each time a new HTTP request comes in? M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
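The persistent-connection setup being described boils down to a small pool per server process. A generic sketch in Python (the `connect` factory stands in for whatever your driver provides; this is illustrative, not any particular library's API):

```python
import queue

class ConnectionPool:
    """Minimal blocking connection pool: a web-server process keeps a
    few long-lived DB connections instead of reconnecting per request."""

    def __init__(self, connect, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())   # open all connections up front

    def acquire(self):
        return self._pool.get()         # blocks while all are in use

    def release(self, conn):
        self._pool.put(conn)            # hand the connection back
```

Each HTTP worker acquires a connection per request and releases it afterwards, so N workers never hold more than N connections and no per-request connection setup cost is paid - which is the point being made about PREPARE surviving across requests too.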
Re: [PERFORM] insert
It is likely that you are missing an index on one of those foreign key'd items. I don't think that is too likely as a foreign key reference must be a unique key which would have an index. I think you must be thinking of primary keys, not foreign keys. All one-to-many relationships have non-unique foreign keys. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Performance Bottleneck
Squid also takes away the work of doing SSL (presuming you're running it on a different machine). Unfortunately it doesn't support HTTP/1.1 which means that most generated pages (those that don't set Content-length) end up forcing squid to close and then reopen the connection to the web server. It is true that it doesn't support http/1.1, but 'most generated pages'? Unless they are actually emitted progressively they should have a perfectly good content-length header. I've also had some problems when Squid had a large number of connections open (several thousand); though that may have been because of my half_closed_clients setting. Squid 3 coped a lot better when I tried it (quite a few months ago now - and using FreeBSD and the special kqueue system call) but crashed under some (admittedly synthetic) conditions. It runs out of the box with a very conservative setting for max open file descriptors - this may or may not be the cause of the problems you have seen. Certainly I ran squid with 16,000 connections back in 1999... You still have periods of time when the web servers are busy using their CPUs to generate HTML rather than waiting for database queries. This is especially true if you cache a lot of data somewhere on the web servers themselves (which, in my experience, reduces the database load a great deal). If you REALLY need to reduce the number of connections (because you have a large number of web servers doing a lot of computation, say) then it might still be useful. Aha, a postgres related topic in this thread! What you say is very true, but then given that the connection overhead is so vanishingly small, why not simply run without a persistent DB connection in this case? I would maintain that if your webservers are holding open idle DB connections for so long that it's a problem, then simply close the connections! M ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Performance Bottleneck
And this is exactly where the pgpool advantage lies. Especially with the TPC-W, the Apache is serving a mix of PHP (or whatever CGI technique is used) and static content like images. Since the 200+ Apache kids serve any of that content at random and the emulated browsers very much encourage it to ramp up MaxClients children by using up to 4 concurrent image connections, one does end up with MaxClients DB connections that are all used relatively infrequently. In contrast to that the real pgpool results in fewer, more active DB connections, which is better for performance. There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per apache process, both of which work extremely well and attack the issue at its source.

1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer apache processes needed.

2) Serve static content off an entirely separate apache server than the dynamic content, by using separate domains (e.g. 'static.foo.com').

Personally I favour number 1. Our last biggish peak saw 6000 open HTTP and HTTPS connections and only 200 apache children, all of them nice and busy, not hanging around on street corners looking bored. During quiet times Apache drops back to its configured minimum of 40 kids. Option 2 has the advantage that you can use a leaner build for the 'dynamic' apache server, but with RAM so plentiful these days that's a less useful property. Basically this puts the 'pooling' back in the stateless HTTP area where it truly belongs and can be proven not to have any peculiar side effects (especially when it comes to transaction safety).
Even better, so long as you use URL parameters for searches and the like, you can have the accelerator cache those pages for a certain time too so long as slightly stale results are OK. I'm sure pgpool and the like have their place, but being band-aids for poorly configured websites probably isn't the best use for them. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Swapping in 7.4.3
This is normal. My personal workstation has been up for 16 days, and it shows 65 megs used for swap. The linux kernel looks for things that haven't been accessed in quite a while and tosses them into swap to free up the memory for other uses. This isn't PostgreSQL's fault, or anything elses. It's how a typical Unix kernel works. I.e. you're seeing a problem that simply isn't there. Actually it (and other OSes) does slightly better than that. It _copies_ the least recently used pages into swap, but leaves them in memory. Then when there really is a need to swap stuff out there is no need to actually write to swap because it's already been done, and conversely if those pages are wanted then they don't have to be read from disk because they were never removed from memory. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
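A toy model of that pre-cleaning behaviour, in Python (purely illustrative, not kernel code; the names are made up):

```python
class Page:
    """One page of memory in the toy model."""
    def __init__(self):
        self.in_memory = True
        self.written_to_swap = False

def pre_clean(pages):
    """Background step: copy least-recently-used pages to swap while
    leaving them resident. Returns the number of copy-outs done."""
    writes = 0
    for p in pages:
        if p.in_memory and not p.written_to_swap:
            p.written_to_swap = True    # the copy-out to swap
            writes += 1
    return writes

def evict(page):
    """Under memory pressure: eviction needs no write if the page was
    already pre-cleaned. Returns the I/O cost (0 or 1)."""
    io_needed = 0 if page.written_to_swap else 1
    page.in_memory = False
    return io_needed

def fault_in(page):
    """Touching a page again: no read needed if it was never actually
    removed from memory. Returns the I/O cost (0 or 1)."""
    if page.in_memory:
        return 0
    page.in_memory = True
    return 1
```

`evict()` is free when the page was pre-cleaned, and `fault_in()` is free when the page was never really evicted - the two savings described in the paragraph above.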
Re: [PERFORM] Postgres over Linux NBD or NFS
How about iSCSI? This is exactly what it's for - presenting a bunch of remote SCSI hardware as if it were local. There are several reference implementations on SourceForge from Intel, Cisco others. I've never tried it myself, but I would if I had the need. And let's face it there are some very big players selling very pricey kit that uses it, so you should have pretty high confidence that the fundamentals are strong. M The other goal is to be able to stick LOTS of disk into one box, and dole it out to multiple servers. It's more expensive to set up and manage 3 RAID arrays than it is to set up and manage just 1, because you have to manage 3 sets of disk hardware rather than 1. [snip] The poor man's approach involves trying to fake this by building a disk box running Linux that exports the storage either as a filesystem (using NFS) or as disk blocks (NBD). NFS clearly doesn't provide the filesystem semantics needed to get decent reliability; with NBD, it's not clear what happens :-(. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] RAID or manual split?
It seems that if I know the type and frequency of the queries a database will be seeing, I could split the database by hand over multiple disks and get better performance than I would with a RAID array with similar hardware. Unlikely, but possible if you had radically different hardware for different tables. Six large (3-7 Mrow) 'summary' tables, each being updated continuously by 5-20 processes with about 0.5 transactions/second/process. Well you should get close to an order of magnitude better performance from a RAID controller with write-back cache on those queries. Periodically (currently every two weeks), join queries are performed between one of the 'summary' tables (same one each time) and each of the other five. Each join touches most rows of both tables, indexes aren't used. Results are written into a separate group of 'inventory' tables (about 500 Krow each), one for each join. The more disks the data is spread over the better (the RAID controller will help here with striping). There are frequent (100-1000/day) queries of both the inventory and summary tables using the primary key -- always using the index and returning 10 rows. RAM is what you need, to cache the data and indexes, and then as much CPU power as you can get. We're currently getting (barely) acceptable performance from a single 15k U160 SCSI disk, but db size and activity are growing quickly. I've got more disks and a battery-backed LSI card on order. 3 or more disks in a stripe set, with write-back caching, will almost certainly give a huge performance boost. Try that first, and only if you have issues should you think about futzing with symlinks etc. M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Wierd context-switching issue on Xeon
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), Quad Xeon 700MHz/1MB L2 cache, 3GB RAM.

Idle-ish (it's a production server): cs/sec ~5000

3 test queries running:

 r  b  w   swpd   free   buff   cache  si  so  bi  bo   in      cs  us  sy  id
 3  0  0  23380 577680 105912 2145140   0   0   0   0  107  116890  50  14  35
 2  0  0  23380 577680 105912 2145140   0   0   0   0  114  118583  50  15  34
 2  0  0  23380 577680 105912 2145140   0   0   0   0  107  115842  54  14  32
 2  1  0  23380 577680 105920 2145140   0   0   0  32  156  117549  50  16  35

HTH

Matt

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Tom Lane
Sent: 20 April 2004 01:02
To: [EMAIL PROTECTED]
Cc: Joe Conway; scott.marlowe; Bruce Momjian; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Neil Conway
Subject: Re: [PERFORM] Wierd context-switching issue on Xeon

Here is a test case. To set up, run the test_setup.sql script once; then launch two copies of the test_run.sql script. (For those of you with more than two CPUs, see whether you need one per CPU to make trouble, or whether two test_runs are enough.) Check that you get a nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

In isolation, test_run.sql should do essentially no syscalls at all once it's past the initial ramp-up. On a machine that's functioning per expectations, multiple copies of test_run show a relatively low rate of semop() calls --- a few per second, at most --- and maybe a delaying select() here and there.

What I actually see on Josh's client's machine is a context swap storm: vmstat 1 shows CS rates around 170K/sec. strace'ing the backends shows a corresponding rate of semop() syscalls, with a few delaying select()s sprinkled in. top(1) shows system CPU percent of 25-30 and idle CPU percent of 16-20.

I haven't bothered to check how long the test_run query takes, but if it ends while you're still examining the behavior, just start it again.
Note the test case assumes you've got shared_buffers set to at least 1000; with smaller values, you may get some I/O syscalls, which will probably skew the results.

regards, tom lane
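The reproduction steps above can be condensed into a small driver. A sketch under stated assumptions - the database name is hypothetical, the script paths come from the post, and since it needs a running postmaster the commands are printed rather than executed here:

```shell
# Hypothetical harness for the test case: one setup run, two concurrent
# test_run copies, and vmstat watching the context-switch ("cs") column.
DB=testdb   # placeholder database name
printf '%s\n' \
  "psql -d $DB -f test_setup.sql     # run once" \
  "psql -d $DB -f test_run.sql &     # concurrent copy 1" \
  "psql -d $DB -f test_run.sql &     # concurrent copy 2" \
  "vmstat 1                          # a cs storm shows up in the cs column"
```

On an affected Xeon box the cs column should jump from a few hundred per second to the ~170K/sec range Tom describes.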
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> > Now if these vendors could somehow eliminate downtime due to human error
> > we'd be talking *serious* reliability.

> You mean making the OS smart enough to know when clearing the arp cache is a
> bonehead operation, or just making the hardware smart enough to realise that
> the keyswitch really shouldn't be turned while 40 people are logged in?
> (Either way, I agree this'd be an improvement. It'd sure make colocation a
> lot less painful.)

Well, I was joking really, but those are two very good examples! Yes, machines should require extra confirmation for operations like those. Hell, even a simple 'init 0' would be well served by a prompt that says "There are currently 400 network sockets open, 50 remote users logged in, and 25 disk IOs per second. What's more, there's nobody logged in at the console to boot me up again afterwards - are you _sure_ you want to shut the machine down?". It's also crazy that there's no prompt after an 'rm -rf' (we could have 'rm -rf --iacceptfullresponsibility' for an unprompted version). Stuff like that would have saved me from a few embarrassments in the past, for sure ;-)

It drives me absolutely nuts every time I see a $staggeringly_expensive clustered server whose sysadmins are scared to do a failover test in case something goes wrong! Or which has worse uptime than my desktop PC because the cluster software's poorly set up or administered. Or which has both machines on the same circuit breaker. I could go on, but it's depressing me.

Favourite anecdote: a project manager friend of mine had a new 'lights out' datacenter to set up. The engineers, admins and operators swore blind that everything had been tested in every possible way, and that incredible uptime was guaranteed. 'So if I just pull this disk out everything will keep working?' he asked, and then pulled the disk out without waiting for an answer...

Ever since he told me that story I've done exactly that with every piece of so-called 'redundant' hardware a vendor tries to flog me.
Ask them to set it up, then just do nasty things to it without asking for permission. Less than half the gear makes it through that filter, and actually you can almost tell from the look on the technical sales rep's face as you reach for the drive/cable/card/whatever whether it will or won't.

M
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
If it's going to be write intensive then the RAID controller will be the most important thing. A dual P3/500 with a write-back cache will smoke either of the boxes you mention using software RAID on write performance.

As for the compute-intensive side (complex joins, sorts etc), the Dell will most likely beat the Sun by some distance, although what the Sun lacks in CPU power it may make up a bit in memory bandwidth/latency.

Matt

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Subbiah, Stalin
Sent: 23 March 2004 18:41
To: 'Andrew Sullivan'; '[EMAIL PROTECTED]'
Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux

We are looking into Sun V210 (2 x 1 GHz cpu, 2 gig ram, Solaris 5.8) vs. Dell 1750 (2 x 2.4 GHz xeon, 2 gig ram, RH3.0). Database will mostly be write intensive and disks will be on raid 10. Wondering if 64bit 1 GHz vs. 32bit 2.4 GHz makes a big difference here. Thanks!

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Andrew Sullivan
Sent: Tuesday, March 23, 2004 9:37 AM
To: '[EMAIL PROTECTED]'
Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux

On Mon, Mar 22, 2004 at 04:05:45PM -0800, Subbiah, Stalin wrote:
> being the key performance booster for postgres. what is the preferred OS
> for postgres deployment if given an option between linux and solaris. As

One thing this very much depends on is what you're trying to do. Suns have a reputation for greater reliability. While my own experience with Sun hardware has been rather shy of sterling, I _can_ say that it stands head and shoulders above a lot of the x86 gear you can get.

If you're planning to use Solaris on x86, don't bother. Solaris is a slow, bloated pig compared to Linux, at least when it comes to managing the largish number of processes that Postgres requires.

If pure speed is what you're after, I have found that 2-way, 32 bit Linux on P-IIIs compares very favourably to 4-way, 64 bit Ultra SPARC IIs.
A

--
Andrew Sullivan | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace. --Philip Greenspun
Re: [PERFORM] Disappointing performance in db migrated from MS SQL
> Josh, the disks in the new system should be substantially faster than the
> old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has 15k RPM
> disks, as opposed to the 10k RPM disks in the old system.

Spindle speed does not correlate with 'throughput' in any easy way. What controllers are you using for these disks?
Re: [PERFORM] insert speed - Mac OSX vs Redhat
> On a variety of hardware with Redhat, and versions of postgres, we're not
> getting much better than 50 inserts per second. This is prior to moving WAL
> to another disk, and fsync is on. However, with postgres 7.4 on Mac OSX
> 10.2.3, we're getting an amazing 500 inserts per second. We can only put
> this down to the OS.

You haven't really produced much evidence to support that statement. Given that the differences in performance between Postgres running on *BSD and Linux on Intel hardware are not large at all, it seems to be almost certainly false in fact. It may of course be due to some settings of the different OSes, but not the OSes themselves.

It would help if you gave a straight PG7.4 comparison with hardware specs as well, and config file differences if any.

One thought: assuming the Apple has IDE disks, then the disks probably have write caching turned on, which is good for speed, but not crash-safe.

matt
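One way to check that write-caching theory on the Linux boxes is hdparm's -W flag, which reads and sets the IDE write-cache bit. A hedged sketch - /dev/hda is a placeholder device name and the real commands need root on a machine with an actual IDE drive, so they are printed rather than executed here:

```shell
# Checking/toggling the IDE write cache with hdparm; /dev/hda is a
# placeholder. With the cache on, fsync() returns before data hits the
# platter - fast, but not crash-safe.
DEV=/dev/hda
printf '%s\n' \
  "hdparm -W  $DEV   # show the current write-caching setting" \
  "hdparm -W0 $DEV   # disable write caching (crash-safe, slower commits)" \
  "hdparm -W1 $DEV   # re-enable write caching"
```

If the OSX box's drive has the cache on and the Redhat boxes have it off, that alone could explain a 10x gap in fsync-bound insert rates.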
Re: [PERFORM] Adding RAM: seeking advice warnings of hidden gotchas
If you have 3 1.5GB tables then you might as well go for 4GB while you're at it. Make sure you've got a bigmem kernel either running or available, and boost effective_cache_size by whatever amount you increase the RAM by. We run a Quad Xeon/4GB server on Redhat 7.3 and it's solid as a rock.

There is no way I know of to get indexes preferentially cached over data though.

Matt

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nick Fankhauser
Sent: 17 December 2003 19:57
To: [EMAIL PROTECTED] Org
Subject: [PERFORM] Adding RAM: seeking advice warnings of hidden gotchas

Hi-

After having done my best to squeeze better performance out of our application by tuning within our existing resources, I'm falling back on adding memory as a short-term solution while we get creative for a long-term fix. I'm curious about what experiences others have had with the process of adding big chunks of RAM. In particular, if I'm trying to encourage the OS to cache more of my index information in RAM, what sort of configuration should I do at both the PostgreSQL and OS level?

In a slightly off-topic vein, I'd also like to hear about it if anyone knows about any gotchas at the OS level that might become a problem.

The server is a dual processor Athlon 1.2GHz box with hardware SCSI RAID. It currently has 1 GB RAM, and we're planning to add one GB more for a total of 2GB. The OS is Debian Linux Kernel 2.4.x, and we're on PostgreSQL v7.3.2.

My current memory-related settings are:

SHMMAX and SHMALL set to 128MB (OS setting)
shared_buffers 8192 (64MB)
sort_mem 16384 (16MB)
effective_cache_size 65536 (512MB)

We support up to 70 active users, sharing a connection pool of 16 connections. Most of the queries center around 3 tables that are about 1.5 GB each.

Thanks.
-Nick

---
Nick Fankhauser [EMAIL PROTECTED]
Phone 1.765.965.7363 Fax 1.765.962.9788
doxpop - Court records at your fingertips - http://www.doxpop.com/
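For the settings quoted above, the effective_cache_size arithmetic is worth spelling out, since the value is given in 8KB buffer pages rather than bytes. A sketch of the recalculation for the planned 2GB box - the ~1.5GB figure for RAM left over as OS cache is an assumption, not a rule:

```shell
# effective_cache_size is expressed in 8KB buffer pages.
# 1536MB of assumed free-for-cache RAM after the upgrade to 2GB is a guess;
# substitute whatever "free" reports as buffers+cached on the real box.
cache_mb=1536
pages=$((cache_mb * 1024 / 8))   # MB -> KB, then KB -> 8KB pages
echo "effective_cache_size = $pages   # (~${cache_mb}MB)"
```

That triples the 65536 (512MB) value currently in the config, matching Matt's advice to grow the setting by roughly the amount of RAM added.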
Re: [PERFORM] tuning questions
> I ended up going back to a default postgresql.conf and reapplying the
> various tunings one-by-one. Turns out that while setting fsync = false had
> little effect on the slow IDE box, it had a drastic effect on this faster
> SCSI box and performance is quite acceptable now (aside from the expected
> falloff of about 30% after the first twenty minutes, which I believe comes
> from growing and shrinking tables without vacuumdb --analyzing).

Hmm. I wonder if that could be related to the issue where many IDE drives have write-caching enabled. With the write cache enabled fsyncs are nearly immediate, so setting fsync=false makes little difference...