Hello again, some more data on the subject:
You are right, delete/re-insert didn't solve the problem (I haven't yet tried dump/restore; I might tonight, when traffic is low). Here is a short snippet from the logs, where I log all queries that take longer than 1 second:

2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select easysms_jb_pay($1, $2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989

Should you deem it helpful, then yes, I can enable full query logging and send you the listing.
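For reference, this is roughly how the 1-second threshold is set today, and how I would temporarily switch to logging everything without a restart. The statements below are only a sketch with illustrative values, not copied verbatim from this server's configuration:

-- current threshold (anything slower than 1 second gets logged):
SHOW log_min_duration_statement;

-- temporarily log every statement's duration for the prosms database
-- (requires superuser; takes effect for newly started sessions):
ALTER DATABASE prosms SET log_min_duration_statement = 0;

-- and to go back to the normal threshold afterwards:
ALTER DATABASE prosms RESET log_min_duration_statement;

Just say the word and I will turn that on during the next slow period.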
A typical vmstat:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff    cache si so  bi   bo   in   cs us sy id wa st
 1  0 363476 325852 294064 30520736  0  0  26  347    0    0  1  0 99  0  0
 1  0 363476 328400 294068 30520732  0  0   0   28 4930 8014  6  1 94  0  0
 2  0 363476 331756 294068 30520732  0  0   0 4384 4950 7980  6  1 93  0  0
 0  0 363476 334016 294068 30520756  0  0   0 4384 4961 7981  7  1 92  0  0
 0  0 363476 334700 294068 30520756  0  0   0 4424 4012 6467  4  1 95  0  0
 1  0 363476 330852 294068 30520788  0  0   0    0 2559 3861  5  1 95  0  0
 1  0 363476 331316 294072 30520788  0  0   0 4408 5013 8127  6  1 94  0  0
 1  0 363476 330788 294072 30520788  0  0   0 4384 5535 9055  6  1 93  0  0
 0  0 363476 331496 294072 30520804  0  0   0 4384 5031 8092  7  1 92  0  0
 2  0 363476 331268 294072 30520804  0  0   0 4428 5052 8246  6  1 94  0  0
 1  0 363476 330848 294080 30520812  0  0   0   32 4892 7950  5  1 94  0  0
 1  0 363476 330480 294080 30520812  0  0   0 4388 4935 8036  6  1 94  0  0
 2  0 363476 332616 294084 30521092  0  0   0 4408 5064 8194  6  1 93  0  0
 0  0 363476 333596 294084 30521008  0  0   0 4384 5205 8463  8  1 91  0  0
 1  0 363476 333324 294084 30521008  0  0   0   40 5014 8151  6  1 94  0  0
 0  0 363476 332740 294084 30521016  0  0   0 4384 5047 8163  6  1 93  0  0
 1  0 363476 336052 294084 30520888  0  0   0 4384 4849 7780  6  1 94  0  0
 1  0 363476 334732 294088 30520892  0  0   8 4400 5520 9012  6  1 93  0  0
 0  0 363476 334064 294096 30520884  0  0   0  220 3903 6193  6  1 94  0  0
 0  0 363476 333124 294096 30520916  0  0   0 2232 4088 6462  6  1 93  0  0

The process that is constantly writing the majority of the data is "postgres: stats collector process"; its write rate varies from 2.5 MB/s up to 5.7 MB/s. All the other postgres (and non-postgres) processes write very little data, and only rarely. The checkpointer process is at roughly 78.6 KB/s (for the few seconds I have been watching while writing this email); no other process shows a constant rate of I/O.
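To put the checkpointer and background-writer numbers in context, I can also sample the cluster-wide counters right before and right after a problem window. A minimal sketch, using only the standard pg_stat_bgwriter view (the before/after-sampling idea is mine, nothing custom on the server):

SELECT now() AS sampled_at,
       checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean,
       buffers_backend, buffers_backend_fsync, buffers_alloc
FROM pg_stat_bgwriter;

Comparing two samples taken a few minutes apart should show whether checkpoints or backend writes spike while the slow calls happen.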
Also, _while having the problem_, the results of the following queries (taken from http://wiki.postgresql.org/wiki/Lock_Monitoring) are:

SELECT relation::regclass, * FROM pg_locks WHERE NOT granted;

(0 rows)

SELECT bl.pid AS blocked_pid,
       a.usename AS blocked_user,
       ka.query AS blocking_statement,
       now() - ka.query_start AS blocking_duration,
       kl.pid AS blocking_pid,
       ka.usename AS blocking_user,
       a.query AS blocked_statement,
       now() - a.query_start AS blocked_duration
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid
JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;

(0 rows)

SELECT waiting.locktype AS waiting_locktype,
       waiting.relation::regclass AS waiting_table,
       waiting_stm.query AS waiting_query,
       waiting.mode AS waiting_mode,
       waiting.pid AS waiting_pid,
       other.locktype AS other_locktype,
       other.relation::regclass AS other_table,
       other_stm.query AS other_query,
       other.mode AS other_mode,
       other.pid AS other_pid,
       other.granted AS other_granted
FROM pg_catalog.pg_locks AS waiting
JOIN pg_catalog.pg_stat_activity AS waiting_stm ON (waiting_stm.pid = waiting.pid)
JOIN pg_catalog.pg_locks AS other ON ((waiting."database" = other."database" AND waiting.relation = other.relation)
                                      OR waiting.transactionid = other.transactionid)
JOIN pg_catalog.pg_stat_activity AS other_stm ON (other_stm.pid = other.pid)
WHERE NOT waiting.granted
AND waiting.pid <> other.pid;

(0 rows)

WITH RECURSIVE
c(requested, current) AS (
  VALUES
    ('AccessShareLock'::text, 'AccessExclusiveLock'::text),
    ('RowShareLock'::text, 'ExclusiveLock'::text), ('RowShareLock'::text, 'AccessExclusiveLock'::text),
    ('RowExclusiveLock'::text, 'ShareLock'::text), ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
    ('RowExclusiveLock'::text, 'ExclusiveLock'::text), ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),
    ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),
    ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('ShareLock'::text, 'RowExclusiveLock'::text), ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),
    ('ShareLock'::text, 'ShareRowExclusiveLock'::text), ('ShareLock'::text, 'ExclusiveLock'::text),
    ('ShareLock'::text, 'AccessExclusiveLock'::text),
    ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
    ('ShareRowExclusiveLock'::text, 'ShareLock'::text), ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
    ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('ExclusiveLock'::text, 'RowShareLock'::text), ('ExclusiveLock'::text, 'RowExclusiveLock'::text),
    ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ExclusiveLock'::text, 'ShareLock'::text),
    ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ExclusiveLock'::text, 'ExclusiveLock'::text),
    ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('AccessExclusiveLock'::text, 'AccessShareLock'::text), ('AccessExclusiveLock'::text, 'RowShareLock'::text),
    ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
    ('AccessExclusiveLock'::text, 'ShareLock'::text), ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
    ('AccessExclusiveLock'::text, 'ExclusiveLock'::text), ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)
),
l AS (
  SELECT (locktype, DATABASE, relation::regclass::text, page, tuple, virtualxid,
          transactionid, classid, objid, objsubid) AS target,
         virtualtransaction, pid, mode, granted
  FROM pg_catalog.pg_locks
),
t AS (
  SELECT blocker.target AS blocker_target, blocker.pid AS blocker_pid, blocker.mode AS blocker_mode,
         blocked.target AS target, blocked.pid AS pid, blocked.mode AS mode
  FROM l blocker
  JOIN l blocked ON (NOT blocked.granted
                     AND blocker.granted
                     AND blocked.pid != blocker.pid
                     AND blocked.target IS NOT DISTINCT FROM blocker.target)
  JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)
),
r AS (
  SELECT blocker_target, blocker_pid, blocker_mode, '1'::int AS depth, target, pid, mode,
         blocker_pid::text || ',' || pid::text AS seq
  FROM t
  UNION ALL
  SELECT blocker.blocker_target, blocker.blocker_pid, blocker.blocker_mode, blocker.depth + 1,
         blocked.target, blocked.pid, blocked.mode, blocker.seq || ',' || blocked.pid::text
  FROM r blocker
  JOIN t blocked ON (blocked.blocker_pid = blocker.pid)
  WHERE blocker.depth < 1000
)
SELECT * FROM r ORDER BY seq;

(0 rows)

No finding here either :(
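If it would help, the next time the slowdown hits I will also grab a plain snapshot of everything that is active or waiting at that moment. A minimal sketch, assuming the 9.2-style pg_stat_activity columns (pid/state/query and the boolean waiting):

SELECT pid, state, waiting,
       now() - xact_start  AS xact_age,
       now() - query_start AS query_age,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_age DESC;

That should at least show whether the slow easysms_jb_pay() calls coincide with anything else running or waiting.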
Best,
/mstelios

Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios


On Tue, May 6, 2014 at 2:54 AM, David G Johnston <david.g.johns...@gmail.com> wrote:

> Stelios Mavromichalis wrote
> > as a prior step to dump/restore i am thinking of deleting and
> > re-inserting that particular row. that might share some light you think?
>
> I still dislike the randomness of the unresponsiveness...
>
> Every time you perform an update you "delete and insert" that row - that is
> how an update works in MVCC - so doing so explicitly is unlikely to provide
> any benefit.  Since that row is continually being inserted, and no other
> rows are having this issue, I'm seriously doubting that a dump/restore is
> going to have any effect either.  Note that the index scan took twice as
> long in the bad case - but still reasonable and you didn't notice any
> benefit from a REINDEX.  This is what I would expect.
>
> The only other difference, if concurrency has been ruled out, is the 4 vs
> 18 buffers that had to be read.  I cannot imagine that, since all 22 were
> in cache, that simply reading that much more data would account for the
> difference (we're talking a 10,000-fold increase, not 2-to-4-fold).  The
> reason for this particular difference, IIUC, is how many candidate tuples
> are present whose visibility has to be accounted for (assuming 1 buffer
> per tuple, you needed to scan 4 vs 18 for visibility in the two queries).
>
> Is there any log file information you can share?  Especially if you can
> set log_min_statement_duration (or whatever that GUC is named) so that
> whenever one of these gets adversely delayed it appears in the log along
> with whatever other system messages are being sent.  Checkpoints are a
> typical culprit though that should be affecting a great deal more than
> what you indicate you are seeing.
>
> I'm pretty certain you are seeing this here largely because of the
> frequency of activity on this particular user; not because the data itself
> is corrupted.  It could be some kind of symptom of internal concurrency
> that you just haven't observed yet but it could also be I/O or other
> system contention that you also haven't properly instrumented.
> Unfortunately that is beyond my current help-providing skill-set.
>
> A dump-restore likely would not make anything worse though I'd be
> surprised if it were to improve matters.  It also doesn't seem like
> hardware - unless the RAM is bad.  Software bugs are unlikely if this had
> been working well before 5 days ago.  So, you need to observe the system
> during both periods (good and bad) and observe something that is different
> - probably not within PostgreSQL if indeed you've minimized concurrency.
> And also see if you can see if any other queries, executed during both
> these times, exhibit a performance decrease.  Logging all statements would
> help matters greatly if you can afford it in your production environment -
> it would make looking for internal concurrency much easier.
>
> David J.
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Re-recently-and-selectively-slow-but-very-simple-update-query-tp5802553p5802579.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
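P.S. Following up on the candidate-tuples/visibility point above: next time the call is slow I can also check how many dead rows the table holding that hot row is carrying, and when autovacuum last touched it. A rough sketch -- 'balances' below is only a placeholder, not the real table name:

SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'balances';   -- placeholder table name

If n_dead_tup piles up between autovacuum runs on that table, it would fit the extra-buffers-read-for-visibility explanation.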