Hello again, some more data on the subject:
You are right, delete/re-insert didn't solve the problem (I haven't yet tried dump/restore; I might tonight, when traffic is low). Here is a short snippet from the logs, where I log all queries that take longer than 1 second:

2014-05-06T15:57:46.303880+03:00 s10 postgres[46220]: [1891-1] prosms prosms LOG: 00000: duration: 1947.172 ms execute <unnamed>: select easysms_jb_pay($1, $2)
2014-05-06T15:57:46.304230+03:00 s10 postgres[46220]: [1891-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:46.304439+03:00 s10 postgres[46220]: [1891-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:56.199005+03:00 s10 postgres[58002]: [2570-1] prosms prosms LOG: 00000: duration: 6869.886 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:56.199349+03:00 s10 postgres[58002]: [2570-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:56.199567+03:00 s10 postgres[58002]: [2570-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:57:59.134982+03:00 s10 postgres[58002]: [2571-1] prosms prosms LOG: 00000: duration: 1797.747 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:57:59.135334+03:00 s10 postgres[58002]: [2571-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:57:59.135562+03:00 s10 postgres[58002]: [2571-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T15:58:07.149477+03:00 s10 postgres[46220]: [1892-1] prosms prosms LOG: 00000: duration: 3938.979 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T15:58:07.149830+03:00 s10 postgres[46220]: [1892-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '2'
2014-05-06T15:58:07.150067+03:00 s10 postgres[46220]: [1892-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989
2014-05-06T16:01:33.784422+03:00 s10 postgres[58002]: [2572-1] prosms prosms LOG: 00000: duration: 2921.212 ms execute <unnamed>: select easysms_jb_pay($1,$2)
2014-05-06T16:01:33.784842+03:00 s10 postgres[58002]: [2572-2] prosms prosms DETAIL: parameters: $1 = '10808', $2 = '4'
2014-05-06T16:01:33.785037+03:00 s10 postgres[58002]: [2572-3] prosms prosms LOCATION: exec_execute_message, postgres.c:1989

Should you deem it helpful, then yes, I can enable full query logging and send you the listing.
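For reference, this is roughly how the 1-second threshold is set today, and how I would temporarily switch to logging everything without a restart. The statements below are only a sketch with illustrative values, not copied verbatim from this server's configuration:

-- current threshold (anything slower than 1 second gets logged):
SHOW log_min_duration_statement;

-- temporarily log every statement's duration for the prosms database
-- (requires superuser; takes effect for newly started sessions):
ALTER DATABASE prosms SET log_min_duration_statement = 0;

-- and to go back to the normal threshold afterwards:
ALTER DATABASE prosms RESET log_min_duration_statement;

Just say the word and I will turn that on during the next slow period.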
A typical vmstat:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd   free   buff    cache si so  bi   bo   in   cs us sy id wa st
 1  0 363476 325852 294064 30520736  0  0  26  347    0    0  1  0 99  0  0
 1  0 363476 328400 294068 30520732  0  0   0   28 4930 8014  6  1 94  0  0
 2  0 363476 331756 294068 30520732  0  0   0 4384 4950 7980  6  1 93  0  0
 0  0 363476 334016 294068 30520756  0  0   0 4384 4961 7981  7  1 92  0  0
 0  0 363476 334700 294068 30520756  0  0   0 4424 4012 6467  4  1 95  0  0
 1  0 363476 330852 294068 30520788  0  0   0    0 2559 3861  5  1 95  0  0
 1  0 363476 331316 294072 30520788  0  0   0 4408 5013 8127  6  1 94  0  0
 1  0 363476 330788 294072 30520788  0  0   0 4384 5535 9055  6  1 93  0  0
 0  0 363476 331496 294072 30520804  0  0   0 4384 5031 8092  7  1 92  0  0
 2  0 363476 331268 294072 30520804  0  0   0 4428 5052 8246  6  1 94  0  0
 1  0 363476 330848 294080 30520812  0  0   0   32 4892 7950  5  1 94  0  0
 1  0 363476 330480 294080 30520812  0  0   0 4388 4935 8036  6  1 94  0  0
 2  0 363476 332616 294084 30521092  0  0   0 4408 5064 8194  6  1 93  0  0
 0  0 363476 333596 294084 30521008  0  0   0 4384 5205 8463  8  1 91  0  0
 1  0 363476 333324 294084 30521008  0  0   0   40 5014 8151  6  1 94  0  0
 0  0 363476 332740 294084 30521016  0  0   0 4384 5047 8163  6  1 93  0  0
 1  0 363476 336052 294084 30520888  0  0   0 4384 4849 7780  6  1 94  0  0
 1  0 363476 334732 294088 30520892  0  0   8 4400 5520 9012  6  1 93  0  0
 0  0 363476 334064 294096 30520884  0  0   0  220 3903 6193  6  1 94  0  0
 0  0 363476 333124 294096 30520916  0  0   0 2232 4088 6462  6  1 93  0  0

The process that is constantly writing the majority of the data is "postgres: stats collector process"; its write rate varies from 2.5 MB/s up to 5.7 MB/s. All the other postgres (and non-postgres) processes write very little data, and only rarely. The checkpointer process is at roughly 78.6 KB/s (for the few seconds I have been watching while writing this email); no other process shows a constant rate of I/O.
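To put the checkpointer and background-writer numbers in context, I can also sample the cluster-wide counters right before and right after a problem window. A minimal sketch, using only the standard pg_stat_bgwriter view (the before/after-sampling idea is mine, nothing custom on the server):

SELECT now() AS sampled_at,
       checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean,
       buffers_backend, buffers_backend_fsync, buffers_alloc
FROM pg_stat_bgwriter;

Comparing two samples taken a few minutes apart should show whether checkpoints or backend writes spike while the slow calls happen.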
Also, _while having the problem_, the results of the following queries (taken from http://wiki.postgresql.org/wiki/Lock_Monitoring) are:

SELECT relation::regclass, * FROM pg_locks WHERE NOT granted;

(0 rows)

SELECT bl.pid AS blocked_pid,
       a.usename AS blocked_user,
       ka.query AS blocking_statement,
       now() - ka.query_start AS blocking_duration,
       kl.pid AS blocking_pid,
       ka.usename AS blocking_user,
       a.query AS blocked_statement,
       now() - a.query_start AS blocked_duration
FROM pg_catalog.pg_locks bl
JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid
JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
WHERE NOT bl.granted;

(0 rows)

SELECT waiting.locktype AS waiting_locktype,
       waiting.relation::regclass AS waiting_table,
       waiting_stm.query AS waiting_query,
       waiting.mode AS waiting_mode,
       waiting.pid AS waiting_pid,
       other.locktype AS other_locktype,
       other.relation::regclass AS other_table,
       other_stm.query AS other_query,
       other.mode AS other_mode,
       other.pid AS other_pid,
       other.granted AS other_granted
FROM pg_catalog.pg_locks AS waiting
JOIN pg_catalog.pg_stat_activity AS waiting_stm ON (waiting_stm.pid = waiting.pid)
JOIN pg_catalog.pg_locks AS other ON ((waiting."database" = other."database" AND waiting.relation = other.relation)
                                      OR waiting.transactionid = other.transactionid)
JOIN pg_catalog.pg_stat_activity AS other_stm ON (other_stm.pid = other.pid)
WHERE NOT waiting.granted
AND waiting.pid <> other.pid;

(0 rows)

WITH RECURSIVE
c(requested, current) AS (
  VALUES
    ('AccessShareLock'::text, 'AccessExclusiveLock'::text),
    ('RowShareLock'::text, 'ExclusiveLock'::text), ('RowShareLock'::text, 'AccessExclusiveLock'::text),
    ('RowExclusiveLock'::text, 'ShareLock'::text), ('RowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
    ('RowExclusiveLock'::text, 'ExclusiveLock'::text), ('RowExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('ShareUpdateExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ShareLock'::text),
    ('ShareUpdateExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ShareUpdateExclusiveLock'::text, 'ExclusiveLock'::text),
    ('ShareUpdateExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('ShareLock'::text, 'RowExclusiveLock'::text), ('ShareLock'::text, 'ShareUpdateExclusiveLock'::text),
    ('ShareLock'::text, 'ShareRowExclusiveLock'::text), ('ShareLock'::text, 'ExclusiveLock'::text),
    ('ShareLock'::text, 'AccessExclusiveLock'::text),
    ('ShareRowExclusiveLock'::text, 'RowExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
    ('ShareRowExclusiveLock'::text, 'ShareLock'::text), ('ShareRowExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
    ('ShareRowExclusiveLock'::text, 'ExclusiveLock'::text), ('ShareRowExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('ExclusiveLock'::text, 'RowShareLock'::text), ('ExclusiveLock'::text, 'RowExclusiveLock'::text),
    ('ExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text), ('ExclusiveLock'::text, 'ShareLock'::text),
    ('ExclusiveLock'::text, 'ShareRowExclusiveLock'::text), ('ExclusiveLock'::text, 'ExclusiveLock'::text),
    ('ExclusiveLock'::text, 'AccessExclusiveLock'::text),
    ('AccessExclusiveLock'::text, 'AccessShareLock'::text), ('AccessExclusiveLock'::text, 'RowShareLock'::text),
    ('AccessExclusiveLock'::text, 'RowExclusiveLock'::text), ('AccessExclusiveLock'::text, 'ShareUpdateExclusiveLock'::text),
    ('AccessExclusiveLock'::text, 'ShareLock'::text), ('AccessExclusiveLock'::text, 'ShareRowExclusiveLock'::text),
    ('AccessExclusiveLock'::text, 'ExclusiveLock'::text), ('AccessExclusiveLock'::text, 'AccessExclusiveLock'::text)
),
l AS (
  SELECT (locktype, DATABASE, relation::regclass::text, page, tuple, virtualxid,
          transactionid, classid, objid, objsubid) AS target,
         virtualtransaction, pid, mode, granted
  FROM pg_catalog.pg_locks
),
t AS (
  SELECT blocker.target AS blocker_target, blocker.pid AS blocker_pid, blocker.mode AS blocker_mode,
         blocked.target AS target, blocked.pid AS pid, blocked.mode AS mode
  FROM l blocker
  JOIN l blocked ON (NOT blocked.granted
                     AND blocker.granted
                     AND blocked.pid != blocker.pid
                     AND blocked.target IS NOT DISTINCT FROM blocker.target)
  JOIN c ON (c.requested = blocked.mode AND c.current = blocker.mode)
),
r AS (
  SELECT blocker_target, blocker_pid, blocker_mode, '1'::int AS depth, target, pid, mode,
         blocker_pid::text || ',' || pid::text AS seq
  FROM t
  UNION ALL
  SELECT blocker.blocker_target, blocker.blocker_pid, blocker.blocker_mode, blocker.depth + 1,
         blocked.target, blocked.pid, blocked.mode, blocker.seq || ',' || blocked.pid::text
  FROM r blocker
  JOIN t blocked ON (blocked.blocker_pid = blocker.pid)
  WHERE blocker.depth < 1000
)
SELECT * FROM r ORDER BY seq;

(0 rows)

No finding here either :(
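If it would help, the next time the slowdown hits I will also grab a plain snapshot of everything that is active or waiting at that moment. A minimal sketch, assuming the 9.2-style pg_stat_activity columns (pid/state/query and the boolean waiting):

SELECT pid, state, waiting,
       now() - xact_start  AS xact_age,
       now() - query_start AS query_age,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_age DESC;

That should at least show whether the slow easysms_jb_pay() calls coincide with anything else running or waiting.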
Best,
/mstelios

Stelios Mavromichalis
Cytech Ltd. - http://www.cytech.gr/
Science & Technology Park of Crete
fax: +30 2810 31 1045
tel.: +30 2810 31 4127
mob.: +30 697 7078013
skype: mstelios


On Tue, May 6, 2014 at 2:54 AM, David G Johnston <david.g.johns...@gmail.com> wrote:

> Stelios Mavromichalis wrote
> > as a prior step to dump/restore i am thinking of deleting and
> > re-inserting that particular row. that might share some light you think?
>
> I still dislike the randomness of the unresponsiveness...
>
> Every time you perform an update you "delete and insert" that row - that is
> how an update works in MVCC - so doing so explicitly is unlikely to provide
> any benefit.  Since that row is continually being inserted, and no other
> rows are having this issue, I'm seriously doubting that a dump/restore is
> going to have any effect either.  Note that the index scan took twice as
> long in the bad case - but still reasonable and you didn't notice any
> benefit from a REINDEX.  This is what I would expect.
>
> The only other difference, if concurrency has been ruled out, is the 4 vs
> 18 buffers that had to be read.  I cannot imagine that, since all 22 were
> in cache, that simply reading that much more data would account for the
> difference (we're talking a 10,000-fold increase, not 2-to-4-fold).  The
> reason for this particular difference, IIUC, is how many candidate tuples
> are present whose visibility has to be accounted for (assuming 1 buffer
> per tuple, you needed to scan 4 vs 18 for visibility in the two queries).
>
> Is there any log file information you can share?  Especially if you can
> set log_min_statement_duration (or whatever that GUC is named) so that
> whenever one of these gets adversely delayed it appears in the log along
> with whatever other system messages are being sent.  Checkpoints are a
> typical culprit though that should be affecting a great deal more than
> what you indicate you are seeing.
>
> I'm pretty certain you are seeing this here largely because of the
> frequency of activity on this particular user; not because the data itself
> is corrupted.  It could be some kind of symptom of internal concurrency
> that you just haven't observed yet but it could also be I/O or other
> system contention that you also haven't properly instrumented.
> Unfortunately that is beyond my current help-providing skill-set.
>
> A dump-restore likely would not make anything worse though I'd be
> surprised if it were to improve matters.  It also doesn't seem like
> hardware - unless the RAM is bad.  Software bugs are unlikely if this had
> been working well before 5 days ago.  So, you need to observe the system
> during both periods (good and bad) and observe something that is different
> - probably not within PostgreSQL if indeed you've minimized concurrency.
> And also see if you can see if any other queries, executed during both
> these times, exhibit a performance decrease.  Logging all statements would
> help matters greatly if you can afford it in your production environment -
> it would make looking for internal concurrency much easier.
>
> David J.
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Re-recently-and-selectively-slow-but-very-simple-update-query-tp5802553p5802579.html
> Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
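P.S. Following up on the candidate-tuples/visibility point above: next time the call is slow I can also check how many dead rows the table holding that hot row is carrying, and when autovacuum last touched it. A rough sketch -- 'balances' below is only a placeholder, not the real table name:

SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'balances';   -- placeholder table name

If n_dead_tup piles up between autovacuum runs on that table, it would fit the extra-buffers-read-for-visibility explanation.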