Oh, I hadn't thought about barriers, sorry. Although I use software RAID without a
battery-backed cache, I have turned barriers off on one cluster shard to try it.

root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
root@rpopdb01e ~ # mount | fgrep data
/dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime,nobarrier)
root@rpopdb01e ~ #
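
If that helps, I will also have to put the option into /etc/fstab so it survives a
reboot; a minimal sketch, assuming the same UUID as in the entry quoted below:

UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime,nobarrier 0 1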

On 12.02.2014, at 21:56, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:

> My question was actually about the barrier option: it is enabled by default on 
> RHEL6/ext4 and can cause a serious I/O bottleneck before the disks are actually 
> involved. What does mount without arguments say? 
> 
> On Feb 12, 2014, at 18:43, Бородин Владимир <r...@simply.name> wrote:
> 
>> root@rpopdb01e ~ # fgrep data /etc/fstab
>> UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 
>> noatime,nodiratime 0 1
>> root@rpopdb01e ~ #
>> 
>> According to iostat the disks are not the bottleneck.
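>> 
>> In case it is useful, this is roughly how I check that (a sketch; I watch %util, 
>> await, r/s and w/s for the underlying sd* devices rather than the md2 array 
>> itself):
>> 
>> root@rpopdb01e ~ # iostat -x 1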
>> 
>> On 12.02.2014, at 21:30, Ilya Kosmodemiansky <hydrobi...@gmail.com> wrote:
>> 
>>> Hi Vladimir,
>>> 
>>> Just in case: how is your ext4 mount? 
>>> 
>>> Best regards, 
>>> Ilya
>>> 
>>> On Feb 12, 2014, at 17:59, Бородин Владимир <r...@simply.name> wrote:
>>> 
>>>> Hi all.
>>>> 
>>>> Today I started getting errors like the one below in the logs (it seems I have 
>>>> not changed anything in the last week). When it happens, the DB accumulates lots 
>>>> of connections in the active state, eats 100% CPU, and clients get errors (due 
>>>> to timeouts). 
>>>> 
>>>> 2014-02-12 15:44:24.562 
>>>> MSK,"rpop","rpopdb_p6",30061,"localhost:58350",52fb5e53.756d,1,"SELECT 
>>>> waiting",2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,00000,"process 
>>>> 30061 still waiting for ExclusiveLock on extension of relation 26118 of 
>>>> database 24590 after 1000.082 ms",,,,,"SQL statement ""insert into 
>>>> rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)""
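>>>> 
>>>> To map the oids from such a log line to names, I use something like the 
>>>> following (a sketch; run while connected to the database with oid 24590, and 
>>>> 26118 is just the relation oid from the message above):
>>>> 
>>>> select datname from pg_database where oid = 24590;
>>>> select relname from pg_class where oid = 26118;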
>>>> 
>>>> I have read several threads [1, 2, 3, 4] about similar problems but haven't 
>>>> found a good solution. Below are some more diagnostics.
>>>> 
>>>> I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. The host 
>>>> has the following CPU (32 cores) and memory:
>>>> 
>>>> root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
>>>> model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
>>>> root@rpopdb01e ~ # free -m
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:        129028     123558       5469          0        135     119504
>>>> -/+ buffers/cache:       3918     125110
>>>> Swap:        16378          0      16378
>>>> root@rpopdb01e ~ #
>>>> 
>>>> PGDATA lives on a RAID6 array of 8 SSDs with ext4; iostat and atop say the 
>>>> disks are mostly idle. Right now PGDATA takes only 95 GB.
>>>> The settings changed in postgresql.conf are here [5].
>>>> 
>>>> When it happens, the last query from [6] shows that almost all queries are 
>>>> waiting for an ExclusiveLock, although they are doing simple inserts.
>>>> 
>>>>  (extend,26647,26825,,,,,,,) | 5459 | ExclusiveLock | 1 | (extend,26647,26825,,,,,,,) | 8053 | ExclusiveLock | 5459,8053
>>>>  (extend,26647,26828,,,,,,,) | 5567 | ExclusiveLock | 1 | (extend,26647,26828,,,,,,,) | 5490 | ExclusiveLock | 5567,5490
>>>>  (extend,24584,25626,,,,,,,) | 5611 | ExclusiveLock | 1 | (extend,24584,25626,,,,,,,) | 3963 | ExclusiveLock | 5611,3963
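>>>> 
>>>> The same picture can be seen directly in pg_locks; a minimal sketch (locktype 
>>>> 'extend' is the relation extension lock, and relation::regclass only resolves 
>>>> names for the database you are connected to):
>>>> 
>>>> select locktype, relation::regclass, pid, granted
>>>>   from pg_locks where locktype = 'extend' order by relation;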
>>>> 
>>>> I have several databases running on one host under a single postmaster process, 
>>>> and the ExclusiveLock is being waited on for many different relation oids. I 
>>>> suppose the only thing these relations have in common is that they are bigger 
>>>> than the others and almost never get updates or deletes (only inserts and 
>>>> reads). Some more info about one such table is here [7].
>>>> 
>>>> I have tried to look at the source code (src/backend/access/heap/hio.c) to 
>>>> understand when the exclusive lock is taken, but I could only follow the 
>>>> comments :) I have also examined the FSM for these tables and their indexes and 
>>>> found that most of them have free pages, but there are, for example, cases like 
>>>> this:
>>>> 
>>>> rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where 
>>>> avail != 0;
>>>>  count
>>>> --------
>>>>  115953
>>>> (1 row)
>>>> 
>>>> rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where 
>>>> avail != 0;
>>>>  count
>>>> -------
>>>>      0
>>>> (1 row)
>>>> 
>>>> rpopdb_p0=# \dS+ rpop.rpop_uidl
>>>>                                Table "rpop.rpop_uidl"
>>>>  Column |          Type          | Modifiers | Storage  | Stats target | 
>>>> Description
>>>> --------+------------------------+-----------+----------+--------------+-------------
>>>>  popid  | bigint                 | not null  | plain    |              |
>>>>  uidl   | character varying(200) | not null  | extended |              |
>>>> Indexes:
>>>>     "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
>>>> Has OIDs: no
>>>> 
>>>> rpopdb_p0=#
>>>> 
>>>> 
>>>> My questions are:
>>>> 1. Do we consume 100% CPU (in system) trying to get a page from the FSM? Or 
>>>> does it happen while acquiring the exclusive lock? How can I dig into this (one 
>>>> possible approach is sketched after this list)?
>>>> 2. By how much is the relation extended when we get the exclusive lock on it?
>>>> 3. Why is the newly extended page not visible to other backends?
>>>> 4. Is the following situation possible: backend A takes the exclusive lock on 
>>>> some relation to extend it, then the OS scheduler switches to backend B, which 
>>>> is waiting for the exclusive lock on the same relation, and so on for many 
>>>> backends?
>>>> 5. (and the main question) What can I do to get rid of such situations? This is 
>>>> a production cluster and I have no idea what to do about it :( Any help would 
>>>> be really appreciated.
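>>>> 
>>>> For question 1, the only idea I have so far is to profile one of the spinning 
>>>> backends (a sketch, assuming perf is installed and <pid> is one of the backends 
>>>> eating 100% CPU):
>>>> 
>>>> root@rpopdb01e ~ # perf top -p <pid>
>>>> root@rpopdb01e ~ # perf record -g -p <pid> -- sleep 10 && perf report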
>>>> 
>>>> [1] 
>>>> http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc...@mail.gmail.com
>>>> [2] 
>>>> http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
>>>> [3] http://www.postgresql.org/message-id/50a2c93e.9070...@dalibo.com
>>>> [4] 
>>>> http://www.postgresql.org/message-id/cal_0b1sypyeoynkynv95nnv2d+4jxtug3hkkf6fahfw7gvg...@mail.gmail.com
>>>> [5] http://pastebin.com/raw.php?i=Bd40Vn6h
>>>> [6] http://wiki.postgresql.org/wiki/Lock_dependency_information
>>>> [7] http://pastebin.com/raw.php?i=eGrtG524
>>>> 
>>>> --
>>>> Vladimir
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> --
>> Vladimir
>> 
>> 
>> 
>> 


--
May the force be with you...
http://simply.name




