On Mon, Sep 23, 2019 at 8:57 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > Maybe check for waiting on a lock? > It'd be useful to look in pg_stat_activity and/or top(1) while the > initial query is running, to see if it seems to be eating CPU or > is blocked on some condition. >
I think this will provide information that makes it seem less weird. With your suggestion, I monitored postgres via top and pg_stat_activity in the various scenarios and found that an autovacuum was triggering which covered tables used in the insert into select. What seems to be happening is the autovacuum takes just about the same time as I give the query to run before giving up on it. The next time I run the query, the autovacuum is complete and the query runs normally. Of course, I'd like to understand why the query never finishes. when autovacuum is running: PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND 38 1 postgres R 158m 8% 1 50% postgres: revwaste portal-local 172.19.0.5(39956) SELECT 36 1 postgres S 171m 9% 0 0% postgres: autovacuum worker process portal-local 34 1 postgres S 186m 9% 0 0% postgres: autovacuum worker process portal-local after autovacuum finishes and during remaining soft-lock: PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND 45 1 postgres R 259m 13% 0 50% postgres: revwaste portal-local 172.19.0.5(39962) SELECT 20 1 postgres S 153m 8% 1 0% postgres: writer process 22 1 postgres S 153m 8% 0 0% postgres: autovacuum launcher process Same for pg_stat_activity: | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query -------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+------------------------------ | 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.128861+00 | | | active | 808 | 808 | select import_wss() | 2019-09-23 20:30:01.624853+00 | 2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047317+00 | 2019-09-23 20:30:58.047318+00 | | | active | | 808 | autovacuum: ANALYZE wss.RowCoding | 2019-09-23 20:31:01.644824+00 | 2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666711+00 | 2019-09-23 20:31:01.666712+00 | | | active | | 808 | autovacuum: ANALYZE wss.WSSData | 2019-09-23 20:31:14.101808+00 | 2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103306+00 | 2019-09-23 20:31:14.103307+00 | | | active | | 808 | select * from pg_stat_activity (4 rows) | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query -------+--------------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+------------------------------- | 2019-09-23 20:29:45.127527+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.12886+00 | 2019-09-23 20:29:45.128861+00 | | | active | 808 | 808 | select import_wss() | 2019-09-23 20:34:21.01283+00 | 2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014473+00 | 2019-09-23 20:34:21.014475+00 | | | active | | 808 | select * from pg_stat_activity (2 rows) > Can you create a self-contained test case that acts like this? > I can try to duplicate it if this doesn't provide the details needed. corey