[PERFORM] Proposal for unlogged tables
I've recently been doing some performance testing with unlogged tables vs logged tables on 9.5-rc1. Basically we're trying to do big loads of data into the database periodically. If on the very rare occasion the server crashes half way through the import it's no big deal so I've been looking specifically at unlogged tables with transactions having synchronous_commit set to OFF. When we do the inserts on a logged table with default WAL configuration settings we get a *lot* of disk IO generated (500mb/sec of pretty random IO - we have a nice ssd raid array but even so this maxes it out). Tweaking WAL settings (commit_delay, max_wal_size, min_wal_size) improves the situation quite a bit (50-100mb/sec of writes), but still we have no need to log the inserts into the WAL at the moment. Doing the imports to unlogged tables we get virtually no IO until the insert process has finished when the table gets flushed to disk which is great for us. However I read in the manuals that if the server ever has an unclean shutdown all unlogged tables will be truncated. Obviously with 9.5 we can now alter tables to be logged/unlogged after insert but this will still write all the inserts into the WAL. I can understand the requirement to truncate tables with active IO at the point of unclean shutdown where you may get corrupted data; but I'm interested to find out how easy it would be to not perform the truncate for historical unlogged tables. If the last data modification statement was run more than eg 30 seconds or 1 minute before an unclean shutdown (or the data was otherwise flushed to disk and there was no IO since then) can we not assume that the data is not corrupted and hence not truncate the unlogged tables? Thanks Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Plan differences
Hello Anton, Changing the locale to anything other than C or POSIX will have a performance overhead. I’m pretty sure that just declaring the locale on the indexes is just like plastering over the cracks. Is it possible to reload the database with the same locale as the original database server? Regards, Adam From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Anton Melser Sent: 01 January 2016 5:13 PM To: Tom Lane Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Plan differences Declaring new indexes with COLLATE "C" and removing the old indexes fixed the like problem but it created a another - the > and < queries need a sort before passing off the the new index. Having two indexes seems to give me the best of both worlds, though obviously it's taking up (much) more space. As space isn't ever likely to be a problem, and there are no updates (only copy) to these tables, I'll keep it like this to avoid having to reload the entire DB. I spoke a little soon - while many of the simple queries are now hitting the indexes, some of the more complicated ones are still producing substantially inferior plans, even after reloading the whole DB with an identical lc_collate and lc_ctype. Here are the plans on the original server and the new server (identical collations, lctypes and index types - btree C). I have been experimenting (accepted = accepted2, idx_accepted2_mid = idx_accepted_mid, etc.) and the tables no longer have exactly the same data but there is nothing substantially different (a few days of data more with about a year total). The oldserver query is actually working on about 3x the amount of data - I tried reducing the amounts on the new server to get done in memory but it didn't seem to help the plan. HashAggregate (cost=3488512.43..3496556.16 rows=536249 width=143) (actual time=228467.924..229026.799 rows=1426351 loops=1) Group Key: to_char(timezone('UTC'::text, a.tstamp), '-MM-DD'::text), a.column1, CASE WHEN (d.column1 IS NOT NULL) THEN d.column1 ELSE fff.column1 END, a.column2 -> Merge Left Join (cost=110018.15..3072358.66 rows=23780215 width=143) (actual time=3281.993..200563.177 rows=23554638 loops=1) Merge Cond: ((a.message_id)::text = (fff.message_id)::text) -> Merge Left Join (cost=110017.58..2781199.04 rows=23780215 width=136) (actual time=3281.942..157385.338 rows=23554636 loops=1) Merge Cond: ((a.message_id)::text = (d.message_id)::text) -> Index Scan using idx_accepted2_mid on accepted a (cost=0.70..2226690.13 rows=23780215 width=83) (actual time=3.690..73048.662 rows=23554632 loops=1) Index Cond: ((message_id)::text > '20151130'::text) Filter: (((mrid)::text <> '----'::text) AND ((mrid)::text <> 'BAT'::text) AND ((column2)::text <> 'text1'::text) AND ((column2)::text !~~ 'text2.%'::text)) Rows Removed by Filter: 342947 -> Index Scan using idx_delivered2_mid on delivered d (cost=110016.89..482842.01 rows=3459461 width=53) (actual time=3278.245..64031.033 rows=23666434 loops=1) Index Cond: ((message_id)::text > '20151130'::text) Filter: (NOT (hashed SubPlan 1)) Rows Removed by Filter: 443 SubPlan 1 -> Index Scan using idx_failed2_mid on failed ff (cost=0.57..109953.48 rows=25083 width=46) (actual time=0.041..3124.642 rows=237026 loops=1) Index Cond: ((message_id)::text > '20151130'::text) Filter: ((severity)::text = 'permanent'::text) Rows Removed by Filter: 5080519 -> Index Scan using idx_failed2_mid on failed fff (cost=0.57..112718.27 rows=25083 width=53) (actual time=0.034..4861.762 rows=236676 loops=1) Index Cond: ((message_id)::text > '20151130'::text) Filter: ((severity)::text = 'permanent'::text) Rows Removed by Filter: 5080519 Planning time: 2.039 ms Execution time: 229076.361 ms HashAggregate (cost=7636055.05..7640148.23 rows=272879 width=143) (actual time=488739.376..488915.545 rows=403741 loops=1) Group Key: to_char(timezone('UTC'::text, a.tstamp), '-MM-DD'::text), a.column1, CASE WHEN (d.column1 IS NOT NULL) THEN d.column1 ELSE fff.column1 END, a.column2 -> Hash Right Join (cost=5119277.32..7528101.45 rows=6168777 width=143) (actual time=271256.212..480958.460 rows=6516196 loops=1) Hash Cond: ((d.message_id)::text = (a.message_id)::text) -> Bitmap Heap Scan on delivered2 d (cost=808012.86..3063311.98 rows=3117499 width=53) (actual time=7012.487..194557.307 rows=6604970 loops=1) Recheck Cond: ((message_id)::text > '20151225'::text) Rows Removed by Index Recheck:
Re: [PERFORM] Plan differences
Hi, >Changing the locale to anything other than C or POSIX will > have a performance overhead. I’m pretty sure that just declaring the > locale on the indexes is just like plastering over the cracks. > > > > Is it possible to reload the database with the same locale as the original > database server? > Sorry, I wasn't clear - I did end up recreating the DB with lc_collate = "C" and lc_ctype = "C" and loading all data and the plans are for this situation (i.e., both are now the same, "C" everywhere) Maybe it is just a case of optimisations being removed in the RC? Cheers, Anton
Re: [PERFORM] Proposal for unlogged tables
On 2016-01-04 02:59, Mark Zealey wrote: > shutdown all unlogged tables will be truncated. Obviously with 9.5 we can now > alter tables to be logged/unlogged after insert but this will still write all > the inserts into the WAL. I haven't tried, but won't converting an unlogged table into a logged table write all the inserts at once instead of once per insert? Or are you wanting to do more bulk insert into that table later? > I can understand the requirement to truncate tables > with active IO at the point of unclean shutdown where you may get corrupted > data; but I'm interested to find out how easy it would be to not perform the > truncate for historical unlogged tables. Are you trying to avoid running a CHECKPOINT? Are you afraid the activity on the other tables will create too much I/O? > If the last data modification > statement was run more than eg 30 seconds or 1 minute before an unclean > shutdown (or the data was otherwise flushed to disk and there was no IO since > then) can we not assume that the data is not corrupted and hence not truncate > the unlogged tables? I have to admit that I have been surprised by this, it feels like unlogged tables are never written properly unless you do an explicit CHECKSUM. -- http://yves.zioup.com gpg: 4096R/32B0F416 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal for unlogged tables
On 04/01/16 16:27, Yves Dorfsman wrote: I haven't tried, but won't converting an unlogged table into a logged table write all the inserts at once instead of once per insert? Or are you wanting to do more bulk insert into that table later? Are you trying to avoid running a CHECKPOINT? Are you afraid the activity on the other tables will create too much I/O? Setting a table to logged still pushes all the inserts into the WAL which we don't need and causes a lot of extra IO. It also takes quite a long time as it is basically rewriting the table and all indexes (eg 60 seconds for 2m rows on one of my test tables). We can do this but a) it generates lots of additional IO which isn't really required for us, and b) it acquires an exclusive lock on the table which is also not nice for us. If the last data modification statement was run more than eg 30 seconds or 1 minute before an unclean shutdown (or the data was otherwise flushed to disk and there was no IO since then) can we not assume that the data is not corrupted and hence not truncate the unlogged tables? I have to admit that I have been surprised by this, it feels like unlogged tables are never written properly unless you do an explicit CHECKSUM. I don't know how the internals work but unlogged tables definitely flushed to disk and persist through normal server restarts. It is just according to the docs if the server ever has an unclean shutdown the tables are truncated even if they have not been updated in a year. I can't understand why it has to be like this and it seems that it would be much nicer to not automatically truncate if it doesn't have to. This would be great in the situation where you can tolerate a low chance of data-loss but want very quick upserts. Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal for unlogged tables
On 04/01/16 18:12, Andres Freund wrote: Pages containing data of unlogged tables aren't ever flushed to disk unless a) a shutdown checkpoint is performed b) a buffer containing data from an unlogged table is used for something else c) the database being copied is the the source of a CREATE DATABASE .. TEMPLATE Hence, if there's an unclean shutdown, there's absolutely no guarantee about the on-disk state of unlogged tables. Even if they haven't been modified in ages - there could have been many many dirty pages in shared buffers when crashing. Always flushing dirty pages of unlogged tables at checkpoint would greatly increase the overhead for memory resident, write heavy workloads that use unlogged tables. If there was a command to flush a specific unlogged table to disk it would work around all these issues no? Perhaps if you marked the table as read only at the same time it would flush it to disk and ensure no more data could be written to it eg (ALTER TABLE ... SET READ ONLY on an unlogged table would flush + not truncate after crash). In our case this would be great as we want to use these as permanent tables for speed; but after an initial data dump we don't change the data again so we could just do this at the end of the import process. Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal for unlogged tables
On 2016-01-04 19:12:22 +0200, Mark Zealey wrote: > If there was a command to flush a specific unlogged table to disk it would > work around all these issues no? Perhaps if you marked the table as read > only at the same time it would flush it to disk and ensure no more data > could be written to it eg (ALTER TABLE ... SET READ ONLY on an unlogged > table would flush + not truncate after crash). In our case this would be > great as we want to use these as permanent tables for speed; but after an > initial data dump we don't change the data again so we could just do this at > the end of the import process. It's more complex than that, even unmodified tables need to be processed by vacuum every now and then (xid wraparound handling). It probably possible to work around such things, but it's not a line or ten. Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal for unlogged tables
On 2016-01-04 16:38:40 +0200, Mark Zealey wrote: > I don't know how the internals work but unlogged tables definitely flushed > to disk and persist through normal server restarts. It is just according to > the docs if the server ever has an unclean shutdown the tables are truncated > even if they have not been updated in a year. I can't understand why it has > to be like this and it seems that it would be much nicer to not > automatically truncate if it doesn't have to. Pages containing data of unlogged tables aren't ever flushed to disk unless a) a shutdown checkpoint is performed b) a buffer containing data from an unlogged table is used for something else c) the database being copied is the the source of a CREATE DATABASE .. TEMPLATE Hence, if there's an unclean shutdown, there's absolutely no guarantee about the on-disk state of unlogged tables. Even if they haven't been modified in ages - there could have been many many dirty pages in shared buffers when crashing. Always flushing dirty pages of unlogged tables at checkpoint would greatly increase the overhead for memory resident, write heavy workloads that use unlogged tables. Andres -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance