On Mon, Sep 23, 2019 at 8:57 AM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Maybe check for waiting on a lock?
> It'd be useful to look in pg_stat_activity and/or top(1) while the
> initial query is running, to see if it seems to be eating CPU or
> is blocked on some condition.
>

I think this will provide information that makes it seem less weird.

With your suggestion, I monitored postgres via top and pg_stat_activity in
the various scenarios and found that an autovacuum was triggering which
covered tables used in the insert into select.  What seems to be happening
is the autovacuum takes just about the same time as I give the query to run
before giving up on it.  The next time I run the query, the autovacuum is
complete and the query runs normally.

Of course, I'd like to understand why the query never finishes.

when autovacuum is running:
PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   38     1 postgres R     158m   8%   1  50% postgres: revwaste
portal-local 172.19.0.5(39956) SELECT
   36     1 postgres S     171m   9%   0   0% postgres: autovacuum worker
process   portal-local
   34     1 postgres S     186m   9%   0   0% postgres: autovacuum worker
process   portal-local

after autovacuum finishes and during remaining soft-lock:
   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
   45     1 postgres R     259m  13%   0  50% postgres: revwaste
portal-local 172.19.0.5(39962) SELECT
   20     1 postgres S     153m   8%   1   0% postgres: writer process
   22     1 postgres S     153m   8%   0   0% postgres: autovacuum launcher
process

Same for pg_stat_activity:

|         backend_start         |          xact_start           |
 query_start          |         state_change          | wait_event_type |
wait_event | state  | backend_xid | backend_xmin |               query
-------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+------------------------------
| 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00  |
2019-09-23 20:29:45.12886+00  | 2019-09-23 20:29:45.128861+00 |
    |            | active |         808 |          808 | select import_wss()
| 2019-09-23 20:30:01.624853+00 | 2019-09-23 20:30:58.047317+00 |
2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047318+00 |
    |            | active |             |          808 | autovacuum:
ANALYZE wss.RowCoding
| 2019-09-23 20:31:01.644824+00 | 2019-09-23 20:31:01.666711+00 |
2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666712+00 |
    |            | active |             |          808 | autovacuum:
ANALYZE wss.WSSData
| 2019-09-23 20:31:14.101808+00 | 2019-09-23 20:31:14.103306+00 |
2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103307+00 |
    |            | active |             |          808 | select * from
pg_stat_activity
(4 rows)

|         backend_start         |          xact_start           |
 query_start          |         state_change          | wait_event_type |
wait_event | state  | backend_xid | backend_xmin |             query
-------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
| 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00  |
2019-09-23 20:29:45.12886+00  | 2019-09-23 20:29:45.128861+00 |
    |            | active |         808 |          808 | select import_wss()
| 2019-09-23 20:34:21.01283+00  | 2019-09-23 20:34:21.014473+00 |
2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014475+00 |
    |            | active |             |          808 | select * from
pg_stat_activity
(2 rows)


> Can you create a self-contained test case that acts like this?
>

I can try to duplicate it if this doesn't provide the details needed.

corey

Reply via email to