Re: [PERFORM] [GENERAL] [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

2015-03-20 Thread Vladimir Borodin

 On 19 March 2015, at 20:30, Sergey Shchukin shchukin@gmail.com wrote:
 
 On 17.03.2015 13:22, Sergey Shchukin wrote:
 On 05.03.2015 11:25, Jim Nasby wrote:
 On 2/27/15 5:11 AM, Sergey Shchukin wrote: 
 
 show max_standby_streaming_delay;
  max_standby_streaming_delay
 -----------------------------
  30s
 
 We both need to be more clear about which server we're talking about 
 (master or replica). 
 
 What are max_standby_streaming_delay and max_standby_archive_delay set to 
 *on the replica*? 
 
 My hope is that one or both of those is set to somewhere around 8 minutes 
 on the replica. That would explain everything. 
 
 If that's not the case then I suspect what's happening is there's something 
 running on the replica that isn't checking for interrupts frequently 
 enough. That would also explain it. 
 
 When replication hangs, is the replication process using a lot of CPU? Or 
 is it just sitting there? What does the process status for the replay process 
 show? 
 
 Can you get a trace of the replay process on the replica when this is 
 happening to see where it's spending all its time? 
 
 How are you generating these log lines? 
  Tue Feb 24 15:05:07 MSK 2015 Stream: MASTER-masterdb:79607161592048 
 SLAVE:79607161550576 Replay:79607160986064 :: REPLAY 592 KBytes 
 (00:00:00.398376 seconds) 
 
 Do you see the confl_* fields in pg_stat_database_conflicts on the 
 *replica* increasing? 
 
 Hi Jim,
 
 max_standby_streaming_delay and max_standby_archive_delay are both 30s on 
 the master and the replica DBs.
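 
 (For reference, this is easy to double-check on the replica, together with the 
 conflict counters you mentioned - generic queries, not output from our hosts:)
 
 SHOW max_standby_streaming_delay;
 SHOW max_standby_archive_delay;
 -- conflict counters per database on the replica
 SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_deadlock
 FROM pg_stat_database_conflicts;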
 
 I don't see any specific or heavy workload during this issue with a hanging 
 apply process. Just normal queries, as usual. 
 
 But I do see increased disk activity while the apply issue is ongoing:
 
 DSK | sdc | busy 61% | read 11511 | write 4534 | KiB/r 46 | KiB/w 4 | MBr/s 52.78 | MBw/s 1.88 | avq 1.45 | avio 0.38 ms |
 DSK | sde | busy 60% | read 11457 | write 4398 | KiB/r 46 | KiB/w 4 | MBr/s 51.97 | MBw/s 1.83 | avq 1.47 | avio 0.38 ms |
 DSK | sdd | busy 60% | read  9673 | write 4538 | KiB/r 61 | KiB/w 4 | MBr/s 58.24 | MBw/s 1.88 | avq 1.47 | avio 0.42 ms |
 DSK | sdj | busy 59% | read  9576 | write 4177 | KiB/r 63 | KiB/w 4 | MBr/s 59.30 | MBw/s 1.75 | avq 1.48 | avio 0.43 ms |
 DSK | sdh | busy 59% | read  9615 | write 4305 | KiB/r 63 | KiB/w 4 | MBr/s 59.23 | MBw/s 1.80 | avq 1.48 | avio 0.42 ms |
 DSK | sdf | busy 59% | read  9483 | write 4404 | KiB/r 63 | KiB/w 4 | MBr/s 59.11 | MBw/s 1.83 | avq 1.47 | avio 0.42 ms |
 DSK | sdi | busy 59% | read 11273 | write 4173 | KiB/r 46 | KiB/w 4 | MBr/s 51.50 | MBw/s 1.75 | avq 1.43 | avio 0.38 ms |
 DSK | sdg | busy 59% | read 11406 | write 4297 | KiB/r 46 | KiB/w 4 | MBr/s 51.66 | MBw/s 1.80 | avq 1.46 | avio 0.37 ms |
 
 Although it does not seem to be hitting an upper I/O limit.
 
 Normally the disks are 20-45% busy:
 
 DSK | sde | busy 29% | read 6524 | write 14426 | KiB/r 26 | KiB/w 5 | MBr/s 17.08 | MBw/s 7.78 | avq 10.46 | avio 0.14 ms |
 DSK | sdi | busy 29% | read 6590 | write 14391 | KiB/r 26 | KiB/w 5 | MBr/s 17.19 | MBw/s 7.76 | avq  8.75 | avio 0.14 ms |
 DSK | sdg | busy 29% | read 6547 | write 14401 | KiB/r 26 | KiB/w 5 | MBr/s 16.94 | MBw/s 7.60 | avq  7.28 | avio 0.14 ms |
 DSK | sdc | busy 29% | read 6835 | write 14283 | KiB/r 27 | KiB/w 5 | MBr/s 18.08 | MBw/s 7.74 | avq  8.77 | avio 0.14 ms |
 DSK | sdf | busy 23% | read 3808 | write 14391 | KiB/r 36 | KiB/w 5 | MBr/s 13.49 | MBw/s 7.78 |

[PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

2016-05-25 Thread Vladimir Borodin
Hi all.

We have found that queries through PgBouncer 1.7.2 (with transaction pooling) to local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on RHEL 6 hosts (all packages are updated to the latest versions). Meanwhile the problem can't be reproduced on, e.g., Ubuntu 14.04 (also fully updated).

Here is how the results look for 9.4, 9.5 and 9.6. All are built from the latest commits as of yesterday in
	* REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
	* REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
	* master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).

All of them are built on the host where the testing is done (with stock gcc versions). Sysctls, the pgbouncer config and everything else we checked are the same, postgres configs are default, PGDATA is in tmpfs. All numbers are reproducible and stable between runs.

Shortly:

OS            PostgreSQL version  TPS    Avg. latency
RHEL 6        9.4                 44898  1.425 ms
RHEL 6        9.5                 26199  2.443 ms
RHEL 6        9.6                 43027  1.487 ms
Ubuntu 14.04  9.4                 67458  0.949 ms
Ubuntu 14.04  9.5                 64065  0.999 ms
Ubuntu 14.04  9.6                 64350  0.995 ms

You can see that the difference between major versions on Ubuntu is not significant, but on RHEL 6 both 9.4 and 9.6 are about 70% faster than 9.5.

Below are more details.

RHEL 6:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2693962
latency average: 1.425 ms
tps = 44897.461518 (including connections establishing)
tps = 44898.763258 (excluding connections establishing)

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 1572014
latency average: 2.443 ms
tps = 26198.928627 (including connections establishing)
tps = 26199.803363 (excluding connections establishing)

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2581645
latency average: 1.487 ms
tps = 43025.676995 (including connections establishing)
tps = 43027.038275 (excluding connections establishing)

postgres@pgload05g ~ $

Ubuntu 14.04 (the same hardware):

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4047653
latency average: 0.949 ms
tps = 67458.361515 (including connections establishing)
tps = 67459.983480 (excluding connections establishing)

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3844084
latency average: 0.999 ms
tps = 64065.447458 (including connections establishing)
tps = 64066.943627 (excluding connections establishing)

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3861088
latency average: 0.995 ms
tps = 64348.573126 (including connections establishing)
tps = 64350.195750 (excluding connections establishing)

postgres@pgloadpublic02:~$

In both tests (RHEL and Ubuntu) the bottleneck is the performance of a single CPU core, which is 100% consumed by PgBouncer. If pgbench connects to postgres directly I get the following (expected) numbers:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5432'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 10010710
latency average: 0.384 ms
tps = 166835.937859 (including connections establishing)
tps = 166849.730224 (excluding connections establishing)

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5433'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 13373890
latency average: 0.287 ms
tps = 222888.311289 (including connections establishing)
tps = 

Re: [PERFORM] DIsk I/O from pg_stat_activity

2016-03-13 Thread Vladimir Borodin

> On 13 March 2016, at 20:39, Artem Tomyuk wrote:
> 
> Hi all.
> 
> Is there any way to retrieve information from pg_stat_activity about which 
> query generates or consumes the most I/O load or time? (It's not very 
> comfortable to get this from iotop, because it doesn't show the full text of 
> the query.)

This can probably be done with pg_stat_kcache. Installing it together with 
pg_stat_statements and querying them with something like the query below will 
give per-query stats:

rpopdb01d/postgres R # SELECT rolname, queryid, round(total_time::numeric, 2) AS total_time, calls,
    pg_size_pretty(shared_blks_hit*8192) AS shared_hit,
    pg_size_pretty(int8larger(0, (shared_blks_read*8192 - reads))) AS page_cache_hit,
    pg_size_pretty(reads) AS physical_read,
    round(blk_read_time::numeric, 2) AS blk_read_time,
    round(user_time::numeric, 2) AS user_time,
    round(system_time::numeric, 2) AS system_time
FROM pg_stat_statements s
JOIN pg_stat_kcache() k USING (userid, dbid, queryid)
JOIN pg_database d ON s.dbid = d.oid
JOIN pg_roles r ON r.oid = userid
WHERE datname != 'postgres' AND datname NOT LIKE 'template%'
ORDER BY reads DESC LIMIT 1;

 rolname |  queryid   |  total_time  |  calls   | shared_hit | page_cache_hit | physical_read | blk_read_time | user_time | system_time
---------+------------+--------------+----------+------------+----------------+---------------+---------------+-----------+-------------
 rpop    | 3183006759 | 309049021.97 | 38098195 | 276 TB     | 27 TB          | 22 TB         |   75485646.81 | 269508.98 |    35635.96
(1 row)

Time: 18.605 ms
rpopdb01d/postgres R #
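
(For reference, the setup is roughly the following; both extensions also have 
to be listed in shared_preload_libraries, which requires a restart:)

CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION pg_stat_kcache;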

The query text may be resolved by queryid with something like SELECT query FROM 
pg_stat_statements WHERE queryid = 3183006759.
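
If you want the query text directly in the report, a variation of the query 
above can join it in (text truncated for readability):

SELECT rolname, left(s.query, 60) AS query, calls, pg_size_pretty(reads) AS physical_read
FROM pg_stat_statements s
JOIN pg_stat_kcache() k USING (userid, dbid, queryid)
JOIN pg_roles r ON r.oid = s.userid
ORDER BY reads DESC LIMIT 5;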

This works only with 9.4+, and it gives you cumulative statistics per query for 
all time, not the current state.

> 
> Thanks for any advice.


--
May the force be with you…
https://simply.name



Re: [PERFORM] Planner do seq scan on empty master partitioned table

2016-08-11 Thread Vladimir Borodin

> On 11 August 2016, at 13:46, Andrey Zhidenkov wrote:
> 
> I have a table (registry.entry) which has ~ 100 inherited tables. This
> is a master table and it's empty:
> 
> postgres@db=# select count(*) from only registry.entry;
> count
> ---
> 0
> (1 row)
> 
> The master table has rules, the inherited tables have check constraints.
> Data is partitioned by the value of area_id. But when I run a query with
> area_id in the WHERE clause, the planner does a seq scan on the master table
> if the master table has no indexes, or an index scan if it has one:
> 
> Append  (cost=0.12..1750.11 rows=670 width=256)
>  ->  Index Scan using MASTER_TABLE_INDEX on entry e  (cost=0.12..6.15
> rows=1 width=253)
>Index Cond: (((cadastral_number)::text ~>=~
> '61:44:0030502'::text) AND ((cadastral_number)::text ~<~
> '61:44:0030503'::text))
>Filter: (((cadastral_number)::text ~~ '61:44:0030502%'::text)
> AND (area_id = 1381) AND (quarter_id = 1368779))
>  ->  Bitmap Heap Scan on entry_61_44 e_1  (cost=1381.62..1743.95
> rows=669 width=256)
>Recheck Cond: (quarter_id = 1368779)
>Filter: (((cadastral_number)::text ~~ '61:44:0030502%'::text)
> AND (area_id = 1381))
>->  BitmapAnd  (cost=1381.62..1381.62 rows=122 width=0)
>  ->  Bitmap Index Scan on
> entry_61_44_cadastral_number_idx  (cost=0.00..321.57 rows=12901
> width=0)
>Index Cond: (((cadastral_number)::text ~>=~
> '61:44:0030502'::text) AND ((cadastral_number)::text ~<~
> '61:44:0030503'::text))
>  ->  Bitmap Index Scan on entry_61_44_quarter_id_idx
> (cost=0.00..1059.47 rows=67205 width=0)
>Index Cond: (quarter_id = 1368779)
> 
> As you can see, postgres scans only the one needed partition and (!) an
> index from the master table. In this example I have an index on the master
> table because it's a production server, and when I drop it the query time
> is too long.
> In the past (before partitioning) the master table had many rows. I ran
> VACUUM and VACUUM ANALYZE on registry.entry, but it didn't help.
> pgAdmin says that the table size is 21Gb, live tuples: 0, dead tuples: 0.

You can do TRUNCATE ONLY master_table. But don't forget the ONLY keyword, 
because without it all child tables will be truncated as well :)
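
A minimal example with the table name from the message above (the ONLY keyword 
keeps all inherited partitions intact):

-- reclaims the bloated heap of the empty master table only
TRUNCATE ONLY registry.entry;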

> 
> What am I doing wrong?
> 
> -- 
> Andrey Zhidenkov
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


--
May the force be with you…
https://simply.name



Re: [HACKERS] [PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

2016-07-04 Thread Vladimir Borodin

> On 13 June 2016, at 21:58, Vladimir Borodin <r...@simply.name> wrote:
> 
>> 
>> On 13 June 2016, at 0:51, Andres Freund <and...@anarazel.de> wrote:
>> 
>> Hi Vladimir,
>> 
>> Thanks for these reports.
>> 
>> On 2016-06-13 00:42:19 +0300, Vladimir Borodin wrote:
>>> perf report -g -i pg9?_all.data >/tmp/pg9?_perf_report.txt
>> 
>> Any chance you could redo the reports with --no-children --call-graph=fractal
>> added? The mode that includes child overheads unfortunately makes the
>> output hard to interpret/compare.
> 
> Of course. Not sure if that is important but I upgraded perf for that 
> (because --no-children option was introduced in ~3.16), so perf record and 
> perf report were done with different perf versions.
> 
> 
> 
> 
> 
> Also I’ve done the same test on the same host (RHEL 6) but with a 4.6 
> kernel/perf and writing perf data to /dev/shm so as not to lose events. The 
> perf report output is also attached, but the important thing is that the 
> regression is not so significant:
> 
> root@pgload05g ~ # uname -r
> 4.6.0-1.el6.elrepo.x86_64
> root@pgload05g ~ # cat /proc/sys/kernel/sched_autogroup_enabled
> 1
> root@pgload05g ~ # /tmp/run.sh
> RHEL 6    9.4    71634    0.893
> RHEL 6    9.5    54005    1.185
> RHEL 6    9.6    65550    0.976
> root@pgload05g ~ # echo 0 >/proc/sys/kernel/sched_autogroup_enabled
> root@pgload05g ~ # /tmp/run.sh
> RHEL 6    9.4    73041    0.876
> RHEL 6    9.5    60105    1.065
> RHEL 6    9.6    67984    0.941
> root@pgload05g ~ #
> 
> 
> 
> 

Andres, is there any chance that you would find time to look at those results? 
Are they actually useful?

> 
> 
>> 
>>> The results from pg9?_perf_report.txt are attached. Note that in all cases 
>>> some events were lost, i.e.:
>>> 
>>> root@pgload05g ~ # perf report -g -i pg94_all.data 
>>> >/tmp/pg94_perf_report.txt
>>> Failed to open [vsyscall], continuing without symbols
>>> Warning:
>>> Processed 537137 events and lost 7846 chunks!
>> 
>> You can reduce the overhead by reducing the sampling frequency, e.g. by
>> specifying -F 300.
>> 
>> Greetings,
>> 
>> Andres Freund
>> 
>> 
>> -- 
>> Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
> 
> 
> --
> May the force be with you…
> https://simply.name

--
May the force be with you…
https://simply.name



Re: [PERFORM] Backup taking long time !!!

2017-01-22 Thread Vladimir Borodin

> On 20 January 2017, at 19:59, Stephen Frost wrote:
> 
>>> How are you testing your backups..?  Do you have page-level checksums
>>> enabled on your database?  
>> 
>> Yep, we use checksums. We restore the latest backup with recovery_target = 
>> 'immediate' and do COPY tablename TO '/dev/null', checking the exit code for 
>> each table in each database (in several threads, of course).
> 
> Right, unfortunately that only checks the heap pages, it won't help with
> corruption happening in an index file or other files which have a
> checksum.

That’s fine for us because indexes can be rebuilt. The main point is the 
guarantee that data will not be lost.

--
May the force be with you…
https://simply.name



Re: [PERFORM] Backup taking long time !!!

2017-01-20 Thread Vladimir Borodin

> On 20 January 2017, at 15:22, Stephen Frost wrote:
>> 
>> This process can be automated by some applications like barman
>> http://www.pgbarman.org/
> 
> Last I checked, barman is still single-threaded.
> 
> If the database is large enough that you need multi-process backup, I'd
> suggest looking at pgbackrest- http://www.pgbackrest.org.
> 
> pgbackrest has parallel backup, incremental/differential/full backup
> support, supports compression, CRC checking, and a whole ton of other
> good stuff.

Increments in pgbackrest are done at the file level, which is not really 
efficient. We have implemented parallelism, compression and page-level 
increments (9.3+) in our barman fork [1], but unfortunately the guys from 
2ndquadrant-it are in no hurry to work on it.

And actually it would be much better to make a good backup and recovery manager 
part of core postgres.

[1] https://github.com/secwall/barman
[2] https://github.com/2ndquadrant-it/barman/issues/21


--
May the force be with you…
https://simply.name



Re: [PERFORM] Backup taking long time !!!

2017-01-20 Thread Vladimir Borodin

> On 20 January 2017, at 16:40, Stephen Frost wrote:
> 
> Vladimir,
> 
>> Increments in pgbackrest are done at the file level, which is not really 
>> efficient. We have implemented parallelism, compression and page-level 
>> increments (9.3+) in our barman fork [1], but unfortunately the guys from 
>> 2ndquadrant-it are in no hurry to work on it.
> 
> We're looking at page-level incremental backup in pgbackrest also.  For
> larger systems, we've not heard too much complaining about it being
> file-based though, which is why it hasn't been a priority.  Of course,
> the OP is on 9.1 too, so.

Well, we forked barman and implemented all of the above just because we needed 
~ 2 PB of disk space for storing backups of our ~ 300 TB of data (our recovery 
window is 7 days). And on a 5 TB database it took a lot of time to make/restore 
a backup.

> 
> As for your fork, well, I can't say I really blame the barman folks for
> being cautious- that's usually a good thing in your backup software. :)

The reason seems to be not caution but a lack of time to work on it. But yep, 
it took us half a year to deploy our fork everywhere. And it would have taken 
much more time if we didn’t have a system for checking backup consistency.

> 
> I'm curious how you're handling compressed page-level incremental
> backups though.  I looked through barman-incr and it wasn't obvious to
> me what was going on wrt how the incrementals are stored - are they ending
> up as sparse files, or are you actually copying/overwriting the prior
> file in the backup repository?

No, we store each file in the following way: first we write a map of the 
changed pages, then we write the changed pages themselves. The compression is 
streaming, so you don’t need much memory for it, but the downside of this 
approach is that you read each datafile twice (we rely on the page cache here).

>  Apologies, python isn't my first
> language, but the lack of any comment anywhere in that file doesn't
> really help.

Not a problem. Actually, it would be much easier to understand if it were a 
series of commits rather than one commit that we amend and force-push after 
each rebase on vanilla barman. We should add comments.

--
May the force be with you…
https://simply.name



Re: [PERFORM] Backup taking long time !!!

2017-01-20 Thread Vladimir Borodin

> On 20 January 2017, at 18:06, Stephen Frost wrote:
> 
> Right, without incremental or compressed backups, you'd have to have
> room for 7 full copies of your database.  Have you looked at what your
> incrementals would be like with file-level incrementals and compression?

Most of our DBs can’t use partitioning over time-series fields, so we have a 
lot of datafiles in which only a few pages have been modified. So file-level 
increments didn’t really work for us. And we didn’t use compression in barman 
before patching it because single-threaded compression sucks.

> How are you testing your backups..?  Do you have page-level checksums
> enabled on your database?  

Yep, we use checksums. We restore the latest backup with recovery_target = 
'immediate' and do COPY tablename TO '/dev/null', checking the exit code for 
each table in each database (in several threads, of course).
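
(For illustration, a minimal per-database sketch of that check - not our actual 
tooling, which also runs it in several threads:)

-- Run in each database of the restored backup (as a superuser, since COPY TO a
-- file requires it). Any checksum or heap error aborts the block, which we
-- would treat as a failed verification.
DO $$
DECLARE
    t record;
BEGIN
    FOR t IN SELECT schemaname, tablename
             FROM pg_tables
             WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        EXECUTE format('COPY %I.%I TO ''/dev/null''', t.schemaname, t.tablename);
    END LOOP;
END;
$$;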

> pgbackrest recently added the ability to
> check PG page-level checksums during a backup and report issues.

Sounds interesting, should take a look.

--
May the force be with you…
https://simply.name



Re: [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-25 Thread Vladimir Borodin
Hi Andres.

> On 25 April 2017, at 7:17, Andres Freund wrote:
> 
> Hi,
> 
> I've lately seen more and more installations where the generation of
> write-ahead-log (WAL) is one of the primary bottlenecks.  I'm curious
> whether that's primarily a "sampling error" of mine, or whether that's
> indeed more common.
> 
> The primary reason I'm curious is that I'm pondering a few potential
> optimizations, and would like to have some guidance which are more and
> which are less important.
> 
> Questions (answer as many you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?

Nearly one WAL segment (16 MB) per second most of the time, and 3 WALs per 
second at the beginning of a checkpoint (due to full_page_writes).
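
(For context, a rough way to sample that rate on a pre-10 master is something 
like the following - just a sketch, not how we actually collect these numbers:)

SELECT pg_current_xlog_location() AS start_lsn \gset
SELECT pg_sleep(60);
-- WAL bytes generated during the 60-second window, pretty-printed
SELECT pg_size_pretty(
         pg_xlog_location_diff(pg_current_xlog_location(), :'start_lsn')
       ) AS wal_written_last_minute;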

> - What generates the bulk of WAL on your servers (9.5+ can use
>  pg_xlogdump --stats to compute that)?

Here is the output from a couple of our masters (and that is actually two hours 
before peak load):

$ pg_xlogdump --stats 000100012B280089 000100012B300088 | fgrep -v 0.00

Type     N          (%)       Record size  (%)       FPI size     (%)       Combined size  (%)
----     -          ---       -----------  ---       --------     ---       -------------  ---
Heap2    55820638   ( 21.31)  1730485085   ( 22.27)  1385795249   ( 13.28)  3116280334     ( 17.12)
Heap     74366993   ( 28.39)  2288644932   ( 29.46)  5880717650   ( 56.34)  8169362582     ( 44.87)
Btree    84655827   ( 32.32)  2243526276   ( 28.88)  3170518879   ( 30.38)  5414045155     ( 29.74)
         --------             -----------            --------               -------------
Total    261933790            7769663301 [42.67%]    10437031778 [57.33%]   18206695079 [100%]
$

$ pg_xlogdump --stats 0001D17F00A5 0001D1910004 | fgrep -v 0.00

Type     N          (%)       Record size  (%)       FPI size     (%)       Combined size  (%)
----     -          ---       -----------  ---       --------     ---       -------------  ---
Heap2    13676881   ( 18.95)  422289539    ( 19.97)  15319927851  ( 25.63)  15742217390    ( 25.44)
Heap     22284283   ( 30.88)  715293050    ( 33.83)  17119265188  ( 28.64)  17834558238    ( 28.82)
Btree    27640155   ( 38.30)  725674896    ( 34.32)  19244109632  ( 32.19)  19969784528    ( 32.27)
Gin      6580760    (  9.12)  172246586    (  8.15)  8091332009   ( 13.54)  8263578595     ( 13.35)
         --------             -----------            --------               -------------
Total    72172983             2114133847 [3.42%]     59774634680 [96.58%]   61888768527 [100%]
$

> - Are you seeing WAL writes being a bottleneck?

We do sometimes see WALWriteLock in pg_stat_activity.wait_event, but not too 
often.

> - What kind of backup methods are you using and is the WAL volume a
>  problem?

We use a fork of the barman project. In most cases that’s not a problem.

> - What kind of replication are you using and is the WAL volume a
>  problem?

Physical streaming replication. We used to have problems with network bandwidth 
(1 Gbit/s was consumed by transferring WAL to two replicas and one archive), but 
that got better after 1) upgrading to 9.5 and turning wal_compression on, 
2) changing the archive command to do parallel compression and send WALs to the 
archive, and 3) increasing checkpoint_timeout.

> - What are your settings for wal_compression, max_wal_size (9.5+) /
>  checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

xdb301e/postgres M # SELECT name, current_setting(name) FROM pg_settings
WHERE name IN ('max_wal_size', 'checkpoint_timeout', 'wal_compression', 'wal_buffers');
        name        | current_setting
--------------------+-----------------
 checkpoint_timeout | 1h
 max_wal_size       | 128GB
 wal_buffers        | 16MB
 wal_compression    | on
(4 rows)

Time: 0.938 ms
xdb301e/postgres M #

> - Could you quickly describe your workload?

OLTP workload with 80% reads and 20% writes.

> 
> Feel free to add any information you think is pertinent ;)

Well, we actually worked around the WAL write rate issues by increasing 
checkpoint_timeout to the maximum possible value (in 9.6 it can be even higher). 
The downside of this change is recovery time. Thanks postgres