Oh, I haven't thought about barriers, sorry. Although I use soft raid without batteries I have turned barriers off on one cluster shard to try.
root@rpopdb01e ~ # mount | fgrep data /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime) root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2 root@rpopdb01e ~ # mount | fgrep data /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier) root@rpopdb01e ~ # 12.02.2014, в 21:56, Ilya Kosmodemiansky <hydrobi...@gmail.com> написал(а): > My question was actually about barrier option, by default it is enabled on > RHEL6/ext4 and could cause serious bottleneck on io before disks are actually > involved. What says mount without arguments? > > On Feb 12, 2014, at 18:43, Бородин Владимир <r...@simply.name> wrote: > >> root@rpopdb01e ~ # fgrep data /etc/fstab >> UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 >> noatime,nodiratime 0 1 >> root@rpopdb01e ~ # >> >> According to iostat the disks are not the bottleneck. >> >> 12.02.2014, в 21:30, Ilya Kosmodemiansky <hydrobi...@gmail.com> написал(а): >> >>> Hi Vladimir, >>> >>> Just in case: how is your ext4 mount? >>> >>> Best regards, >>> Ilya >>> >>> On Feb 12, 2014, at 17:59, Бородин Владимир <r...@simply.name> wrote: >>> >>>> Hi all. >>>> >>>> Today I have started getting errors like below in logs (seems that I have >>>> not changed anything for last week). When it happens the db gets lots of >>>> connections in state active, eats 100% cpu and clients get errors (due to >>>> timeout). >>>> >>>> 2014-02-12 15:44:24.562 >>>> MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT >>>> waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process >>>> 30061 still waiting for ExclusiveLock on extension of relation 26118 of >>>> database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into >>>> rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)"" >>>> >>>> I have read several topics [1, 2, 3, 4] with similar problems but haven't >>>> find a good solution. Below is some more diagnostics. >>>> >>>> I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. >>>> Host is running with the following CPU (32 cores) and memory: >>>> >>>> root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo >>>> model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz >>>> root@rpopdb01e ~ # free -m >>>> total used free shared buffers cached >>>> Mem: 129028 123558 5469 0 135 119504 >>>> -/+ buffers/cache: 3918 125110 >>>> Swap: 16378 0 16378 >>>> root@rpopdb01e ~ # >>>> >>>> PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say >>>> the disks are really free. Right now PGDATA takes only 95G. >>>> The settings changed in postgresql.conf are here [5]. >>>> >>>> When it happens the last query from here [6] shows that almost all queries >>>> are waiting for ExclusiveLock, but they do a simple insert. >>>> >>>> (extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | >>>> (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053 >>>> (extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | >>>> (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490 >>>> (extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | >>>> (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963 >>>> >>>> I have several databases running on one host with one postmaster process >>>> and ExclusiveLock is being waited by many oids. I suppose the only common >>>> thing for all of them is that they are bigger than others and they almost >>>> do not get updates and deletes (only inserts and reads). Some more info >>>> about one of such tables is here [7]. >>>> >>>> I have tried to look at the source code (src/backend/access/heap/hio.c) to >>>> understand when the exclusive lock can be taken, but I could only read >>>> comments :) I have also examined FSM for this tables and their indexes and >>>> found that for most of them there are free pages but there are, for >>>> example, such cases: >>>> >>>> rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where >>>> avail != 0; >>>> count >>>> -------- >>>> 115953 >>>> (1 row) >>>> >>>> rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where >>>> avail != 0; >>>> count >>>> ------- >>>> 0 >>>> (1 row) >>>> >>>> rpopdb_p0=# \dS+ rpop.rpop_uidl >>>> Table "rpop.rpop_uidl" >>>> Column | Type | Modifiers | Storage | Stats target | >>>> Description >>>> --------+------------------------+-----------+----------+--------------+------------- >>>> popid | bigint | not null | plain | | >>>> uidl | character varying(200) | not null | extended | | >>>> Indexes: >>>> "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl) >>>> Has OIDs: no >>>> >>>> rpopdb_p0=# >>>> >>>> >>>> My questions are: >>>> 1. Do we consume 100% cpu (in system) trying to get page from FSM? Or does >>>> it happen during exclusive lock acquiring? How can I dig it? >>>> 2. How much space do we extend to the relation when we get exclusive lock >>>> on it? >>>> 3. Why extended page is not visible for other backends? >>>> 4. Is there any possibility of situation where backend A got exclusive >>>> lock on some relation to extend it. Then OS CPU scheduler made a context >>>> switch to backend B while backend B is waiting for exclusive lock on the >>>> same relation. And so on for many backends. >>>> 5. (and the main question) what can I do to get rid of such situations? It >>>> is a production cluster and I do not have any ideas what to do with this >>>> situation :( Any help would be really appropriate. >>>> >>>> [1] >>>> http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc...@mail.gmail.com >>>> [2] >>>> http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks >>>> [3] http://www.postgresql.org/message-id/50a2c93e.9070...@dalibo.com >>>> [4] >>>> http://www.postgresql.org/message-id/cal_0b1sypyeoynkynv95nnv2d+4jxtug3hkkf6fahfw7gvg...@mail.gmail.com >>>> [5] http://pastebin.com/raw.php?i=Bd40Vn6h >>>> [6] http://wiki.postgresql.org/wiki/Lock_dependency_information >>>> [7 http://pastebin.com/raw.php?i=eGrtG524] >>>> >>>> -- >>>> Vladimir >>>> >>>> >>>> >>>> >> >> >> -- >> Vladimir >> >> >> >> -- Да пребудет с вами сила... http://simply.name