With these settings autovacuum started running on most tables more frequently (as expected), but still not on the big tables with lots of inserts (and absolutely no updates). I've set up a cron script that runs "vacuum analyze" on all databases every hour, and it seems that right now I don't have any performance problems.
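For anyone who wants to do the same, one way to get an hourly "vacuum analyze" of all databases is a plain vacuumdb call from cron. A minimal sketch in /etc/cron.d format (the binary path assumes the PGDG 9.3 RPM layout and the log location is just an example):

# run VACUUM ANALYZE on every database once an hour as the postgres OS user
0 * * * *  postgres  /usr/pgsql-9.3/bin/vacuumdb --all --analyze --quiet >> /var/log/pgsql-vacuumdb.log 2>&1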
Ilya and Emre, thank you for your help.

On 13.02.2014, at 16:20, Ilya Kosmodemiansky <ilya.kosmodemian...@postgresql-consulting.com> wrote:

> Vladimir,
>
> pgbouncer works with pl/proxy in transaction pooling mode. The widespread
> claim that statement mode is for plproxy does not mean any limitation for
> transaction pooling mode as long as you have autocommit on the client.
> Anyway, try to reduce connections.
>
> Try to make your autovacuum a bit more aggressive:
>
> autovacuum_analyze_scale_factor=0.05  # or like that
> autovacuum_analyze_threshold=5
> autovacuum_freeze_max_age=200000000
> autovacuum_max_workers=20             # that is fine for slow disks
> autovacuum_naptime=1
> autovacuum_vacuum_cost_delay=5        # or at least 10
> autovacuum_vacuum_cost_limit=-1
> autovacuum_vacuum_scale_factor=0.01   # this setting has to be really
>                                       # aggressive, otherwise you simply postpone
>                                       # huge vacuums and the related disk io;
>                                       # smaller portions are better
> autovacuum_vacuum_threshold=20
>
> Probably you will also need some ionice for the autovacuum workers.
>
> On Thu, Feb 13, 2014 at 11:26 AM, Бородин Владимир <r...@simply.name> wrote:
>
> On 13.02.2014, at 13:29, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:
>
>> Vladimir,
>>
>> And, any effect on your problem?
>
> It worked without problems for longer than with the previous configuration,
> but the problem repeated again several minutes ago :(
>
>> On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир <r...@simply.name> wrote:
>>> I have limited max connections to 1000, reduced shared buffers to 8G and
>>> restarted postgres.
>>
>> 1000 is still too much in most cases. With pgbouncer in transaction
>> pooling mode, a pool size of 8-32, max_connections = 100 (the default
>> value) and 500-1500 client connections look more reasonable.
>
> Clients for this db are plproxy hosts. As far as I know, plproxy can work
> only with statement pooling.
>
>>> I have also noticed that these big tables stopped being vacuumed
>>> automatically a couple of weeks ago. It could be the reason for the
>>> problem, so I will now try to tune the autovacuum parameters to bring it
>>> back. But yesterday I ran "vacuum analyze" for all relations manually and
>>> that did not help.
>>
>> What do your autovacuum parameters look like now?
>
> They were all at their defaults except for vacuum_defer_cleanup_age = 100000.
> I have increased autovacuum_max_workers to 20 because I have 10 databases
> with about 10 tables each. That did not make things better (I haven't seen
> more than two autovacuum workers running simultaneously).
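A note for the archives: a quick way to see whether autovacuum is actually touching the big tables is pg_stat_user_tables in each database; since the autovacuum vacuum threshold is based on dead tuples, insert-only tables will normally only show up in last_autoanalyze. Something along these lines (run per database):

SELECT schemaname, relname, n_tup_ins, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC
LIMIT 10;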
> Then I tried to set vacuum_cost_limit = 1000. It still does not vacuum the
> big tables. Right now the parameters look like this:
>
> root@rpopdb01e ~ # fgrep vacuum /var/lib/pgsql/9.3/data/conf.d/postgresql.conf
> #vacuum_cost_delay = 0                  # 0-100 milliseconds
> #vacuum_cost_page_hit = 1               # 0-10000 credits
> #vacuum_cost_page_miss = 10             # 0-10000 credits
> #vacuum_cost_page_dirty = 20            # 0-10000 credits
> vacuum_cost_limit = 1000                # 1-10000 credits
> vacuum_defer_cleanup_age = 100000       # number of xacts by which cleanup is delayed
> autovacuum = on                         # Enable autovacuum subprocess?  'on'
> log_autovacuum_min_duration = 0         # -1 disables, 0 logs all actions and
> autovacuum_max_workers = 20             # max number of autovacuum subprocesses
> #autovacuum_naptime = 1min              # time between autovacuum runs
> #autovacuum_vacuum_threshold = 50       # min number of row updates before
>                                         # vacuum
> #autovacuum_analyze_threshold = 50      # min number of row updates before
> #autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
> #autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
> #autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
> #autovacuum_vacuum_cost_delay = 20ms    # default vacuum cost delay for
>                                         # autovacuum, in milliseconds;
>                                         # -1 means use vacuum_cost_delay
> #autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
>                                         # autovacuum, -1 means use
>                                         # vacuum_cost_limit
> #vacuum_freeze_min_age = 50000000
> #vacuum_freeze_table_age = 150000000
> root@rpopdb01e ~ #
>
>>> On 13.02.2014, at 0:14, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:
>>>
>>> On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир <r...@simply.name> wrote:
>>>
>>> Yes, this is legacy, I will fix it. We had lots of inactive connections,
>>> but right now we use pgbouncer for that. When the workload is normal we
>>> have around 80-120 backends, and fewer than 10 of them are in active state.
>>> When we hit the locking problem we get lots of sessions (sometimes more
>>> than 1000 of them are in active state). According to vmstat the number of
>>> context switches is not that big (less than 20k), so I don't think it is
>>> the main reason. Yes, it can aggravate the problem, but IMHO it does not
>>> create it.
>>>
>>> I'm afraid that is the problem. More than 1000 backends, most of them
>>> simply waiting.
>>>
>>> I don't understand the correlation between shared buffers size and
>>> synchronous_commit. Could you please explain your statement?
>>>
>>> You need to fsync your huge shared buffers every time the database
>>> performs a checkpoint. By default that usually happens too often because
>>> checkpoint_timeout is 5min by default. Without a BBU, on software raid,
>>> that leads to an io spike and your commits wait for WAL.
>>>
>>> On 12.02.2014, at 23:37, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:
>>>
>>> Another thing which is arguable is the concurrency degree. How many of
>>> your max_connections = 4000 are actually running? 4000 definitely looks
>>> like overkill and they could be a serious source of contention, especially
>>> when you have barriers enabled and software raid.
>>>
>>> Plus, for 32Gb of shared buffers with synchronous_commit = on, especially
>>> on a heavy workload, one should definitely have a BBU, otherwise
>>> performance will be poor.
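Another note for the archives: a quick way to answer the "how many are actually running" question is pg_stat_activity; the state and waiting columns are both available in 9.3. Something like:

SELECT state, waiting, count(*)
FROM pg_stat_activity
GROUP BY state, waiting
ORDER BY count(*) DESC;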
>>> On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир <r...@simply.name> wrote:
>>>
>>> Oh, I hadn't thought about barriers, sorry. Although I use software raid
>>> without a BBU, I have turned barriers off on one cluster shard to try it out.
>>>
>>> root@rpopdb01e ~ # mount | fgrep data
>>> /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
>>> root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
>>> root@rpopdb01e ~ # mount | fgrep data
>>> /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
>>> root@rpopdb01e ~ #
>>>
>>> On 12.02.2014, at 21:56, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:
>>>
>>> My question was actually about the barrier option: by default it is
>>> enabled on RHEL6/ext4 and can cause a serious io bottleneck before the
>>> disks are actually involved. What does mount say without arguments?
>>>
>>> On Feb 12, 2014, at 18:43, Бородин Владимир <r...@simply.name> wrote:
>>>
>>> root@rpopdb01e ~ # fgrep data /etc/fstab
>>> UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime 0 1
>>> root@rpopdb01e ~ #
>>>
>>> According to iostat the disks are not the bottleneck.
>>>
>>> On 12.02.2014, at 21:30, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:
>>>
>>> Hi Vladimir,
>>>
>>> Just in case: how is your ext4 mounted?
>>>
>>> Best regards,
>>> Ilya
>>>
>>> On Feb 12, 2014, at 17:59, Бородин Владимир <r...@simply.name> wrote:
>>>
>>> Hi all.
>>>
>>> Today I started getting errors like the one below in the logs (it seems I
>>> have not changed anything in the last week). When it happens, the db gets
>>> lots of connections in active state, eats 100% cpu and clients get errors
>>> (due to timeouts).
>>>
>>> 2014-02-12 15:44:24.562 MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 30061 still waiting for ExclusiveLock on extension of relation 26118 of database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
>>>
>>> I have read several threads [1, 2, 3, 4] about similar problems but
>>> haven't found a good solution. Below is some more diagnostics.
>>>
>>> I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. The
>>> host has the following CPU (32 cores) and memory:
>>>
>>> root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
>>> model name      : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
>>> root@rpopdb01e ~ # free -m
>>>              total       used       free     shared    buffers     cached
>>> Mem:        129028     123558       5469          0        135     119504
>>> -/+ buffers/cache:       3918     125110
>>> Swap:        16378          0      16378
>>> root@rpopdb01e ~ #
>>>
>>> PGDATA lives on a RAID6 array of 8 ssd disks with ext4; iostat and atop
>>> say the disks are mostly idle. Right now PGDATA takes only 95G.
>>> The settings changed in postgresql.conf are here [5].
>>>
>>> When it happens, the last query from here [6] shows that almost all
>>> queries are waiting for ExclusiveLock, although they do a simple insert.
>>>
>>> (extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
>>> (extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
>>> (extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
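A note for anyone hitting the same thing: the relation-extension waits can also be seen directly in pg_locks, where they appear with locktype = 'extend'. A simple check could be:

SELECT locktype, database, relation, pid, mode, granted
FROM pg_locks
WHERE locktype = 'extend'
ORDER BY relation, granted DESC;

The pid whose row has granted = true is the backend currently extending that relation; the rows with granted = false are the backends queued behind it.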
>>> I have several databases running on one host with one postmaster process,
>>> and the ExclusiveLock waits happen on many different oids. I suppose the
>>> only thing all of these relations have in common is that they are bigger
>>> than the others and they get almost no updates or deletes (only inserts
>>> and reads). Some more info about one such table is here [7].
>>>
>>> I have tried to look at the source code (src/backend/access/heap/hio.c) to
>>> understand when the exclusive lock can be taken, but I could only read the
>>> comments :) I have also examined the FSM of these tables and their indexes
>>> and found that most of them have free pages, but there are, for example,
>>> cases like this:
>>>
>>> rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
>>>  count
>>> --------
>>>  115953
>>> (1 row)
>>>
>>> rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
>>>  count
>>> -------
>>>      0
>>> (1 row)
>>>
>>> rpopdb_p0=# \dS+ rpop.rpop_uidl
>>>                                  Table "rpop.rpop_uidl"
>>>  Column |          Type          | Modifiers | Storage  | Stats target | Description
>>> --------+------------------------+-----------+----------+--------------+-------------
>>>  popid  | bigint                 | not null  | plain    |              |
>>>  uidl   | character varying(200) | not null  | extended |              |
>>> Indexes:
>>>     "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
>>> Has OIDs: no
>>>
>>> rpopdb_p0=#
>>>
>>> My questions are:
>>> 1. Do we consume 100% cpu (in system) while trying to get a page from the
>>> FSM, or does it happen while acquiring the exclusive lock? How can I dig
>>> into it?
>>> 2. By how much is the relation extended once the exclusive lock on it has
>>> been acquired?
>>> 3. Why is the newly extended page not visible to other backends?
>>> 4. Is the following situation possible: backend A takes the exclusive lock
>>> on some relation to extend it, then the OS CPU scheduler switches to
>>> backend B, which is waiting for the exclusive lock on the same relation,
>>> and so on for many backends?
>>> 5. (and the main question) What can I do to get rid of such situations? It
>>> is a production cluster and I do not have any ideas what to do about this
>>> situation :( Any help would be really appreciated.
>>>
>>> [1] http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc...@mail.gmail.com
>>> [2] http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
>>> [3] http://www.postgresql.org/message-id/50a2c93e.9070...@dalibo.com
>>> [4] http://www.postgresql.org/message-id/cal_0b1sypyeoynkynv95nnv2d+4jxtug3hkkf6fahfw7gvg...@mail.gmail.com
>>> [5] http://pastebin.com/raw.php?i=Bd40Vn6h
>>> [6] http://wiki.postgresql.org/wiki/Lock_dependency_information
>>> [7] http://pastebin.com/raw.php?i=eGrtG524
>>>
>>> --
>>> Vladimir
>
> --
> May the force be with you...
> http://simply.name
>
> --
> Ilya Kosmodemiansky
>
> Database consultant,
> PostgreSQL-Consulting.com
>
> tel. +14084142500
> cell. +4915144336040
> i...@postgresql-consulting.com

--
May the force be with you...
http://simply.name