Hi,

I am running into a strange issue with Postgres 10 when using pg_dump with
the directory format and jobs parameter set it intermittently hangs.  Seems
to occur less frequently the lower I set the jobs parameter but does happen
eventually even when set to 2 (could not reproduce when jobs=1).  I've
tested with Postgres 11 and no matter how high I set the jobs parameter it
never hangs (tested all the way up to jobs=8).  Also with Postgres 10 and
other formats I don't run into the issue.  Here are my server specs and
software versions (OS and Postgres)

Windows Server 2016 DataCenter  10.0.14393
2 cores  4 logical processors
26GB memory

Postgres 11.4 (port 5433)
Postgres 10.9 (port 5432)

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
And here are the psql statements and powershell script to recreate the
issue.

CREATE DATABASE pg_dump_hang;
\c pg_dump_hang
CREATE TABLE series1 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series2 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series3 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series4 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series5 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series6 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series7 AS SELECT i FROM generate_series(1,1000000) i;
CREATE TABLE series8 AS SELECT i FROM generate_series(1,1000000) i;


$env:PGPASSWORD='#######'
$env:PGUSER='postgres'
$env:PGPORT=5433
set-location "C:\program files\postgresql\11\bin"
$a = 1
DO
{
   remove-item -Recurse D:\dumphangtest
   .\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest
pg_dump_hang
   $a++
   "Completed loop $a at -" + (date)
} While ($a -le 25)

$env:PGPASSWORD='#######'
$env:PGUSER='postgres'
$env:PGPORT=5432
set-location "c:\program files\postgresql\10\bin"
$a = 1
DO
{
   remove-item -Recurse D:\dumphangtest
   .\pg_dump.exe --format=directory --jobs=4 --file=D:\dumphangtest
pg_dump_hang
   $a++
   "Completed loop $a at -" + (date)
} While ($a -le 25)

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Here is the output from pg_stat_activity and the filesystem listing when
the hang occurs.

--jobs=4
postgres=# select
pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35)
from pg_stat_activity where datname like 'pg_dump%';
 pid  |        state        |         state_change          |
 query_start          | wait_event  | wait_event_type |
 substring
------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+-------------------------------------
 5324 | idle in transaction | 2019-06-27 15:29:13.03416+01  | 2019-06-27
15:29:13.03154+01  | ClientRead  | Client          | SELECT at.attname,
(SELECT pg_catal
 4144 | idle in transaction | 2019-06-27 15:29:16.222315+01 | 2019-06-27
15:29:15.070106+01 | ClientRead  | Client          | COPY public.series6
(i) TO stdout;
 5576 | active              | 2019-06-27 15:29:13.642069+01 | 2019-06-27
15:29:13.642068+01 | ClientWrite | Client          | COPY public.series2
(i) TO stdout;
 3860 | idle in transaction | 2019-06-27 15:29:17.109549+01 | 2019-06-27
15:29:16.198842+01 | ClientRead  | Client          | COPY public.series8
(i) TO stdout;
 3120 | idle in transaction | 2019-06-27 15:29:16.350121+01 | 2019-06-27
15:29:15.089817+01 | ClientRead  | Client          | COPY public.series7
(i) TO stdout;
(5 rows)

D:\>dir dumphangtest
 Volume in drive D has no label.
 Volume Serial Number is C635-CDEF

 Directory of D:\dumphangtest

06/27/2019  10:29 AM    <DIR>          .
06/27/2019  10:29 AM    <DIR>          ..
06/27/2019  10:29 AM         2,114,899 2863.dat.gz
06/27/2019  10:29 AM           376,832 2864.dat.gz
06/27/2019  10:29 AM         2,114,899 2865.dat.gz
06/27/2019  10:29 AM         2,114,899 2866.dat.gz
06/27/2019  10:29 AM         2,114,899 2867.dat.gz
06/27/2019  10:29 AM         2,114,899 2868.dat.gz
06/27/2019  10:29 AM         2,114,899 2869.dat.gz
06/27/2019  10:29 AM         2,114,899 2870.dat.gz
06/27/2019  10:29 AM             8,452 toc.dat
               9 File(s)     15,189,577 bytes
               2 Dir(s)  102,021,439,488 bytes free

--jobs=2
postgres=# select
pid,state,state_change,query_start,wait_event,wait_event_type,substring(query,1,35)
from pg_stat_activity where datname like 'pg_dump%';
 pid  |        state        |         state_change          |
 query_start          | wait_event  | wait_event_type |
 substring
------+---------------------+-------------------------------+-------------------------------+-------------+-----------------+-------------------------------------
 3420 | idle in transaction | 2019-06-27 14:37:55.067559+01 | 2019-06-27
14:37:54.448198+01 | ClientRead  | Client          | COPY public.series8
(i) TO stdout;
 3180 | idle in transaction | 2019-06-27 14:37:49.537396+01 | 2019-06-27
14:37:49.53608+01  | ClientRead  | Client          | SELECT at.attname,
(SELECT pg_catal
 5364 | active              | 2019-06-27 14:37:50.151002+01 | 2019-06-27
14:37:50.151001+01 | ClientWrite | Client          | COPY public.series2
(i) TO stdout;
(3 rows)

D:\>dir dumphangtest
 Volume in drive D has no label.
 Volume Serial Number is C635-CDEF

 Directory of D:\dumphangtest

06/27/2019  09:37 AM    <DIR>          .
06/27/2019  09:37 AM    <DIR>          ..
06/27/2019  09:37 AM         2,114,899 2863.dat.gz
06/27/2019  09:37 AM            81,920 2864.dat.gz
06/27/2019  09:37 AM         2,114,899 2865.dat.gz
06/27/2019  09:37 AM         2,114,899 2866.dat.gz
06/27/2019  09:37 AM         2,114,899 2867.dat.gz
06/27/2019  09:37 AM         2,114,899 2868.dat.gz
06/27/2019  09:37 AM         2,114,899 2869.dat.gz
06/27/2019  09:37 AM         2,114,899 2870.dat.gz
06/27/2019  09:37 AM             8,452 toc.dat
               9 File(s)     14,894,665 bytes
               2 Dir(s)  102,056,497,152 bytes free

----------------------------------------------------------------------------------------------------------

And here is the stack trace from the thread of the hung process.

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x17d
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x377
ntoskrnl.exe!KiSchedulerApc+0x231
ntoskrnl.exe!KiDeliverApc+0x22a
ntoskrnl.exe!KiSwapThread+0x364
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForMultipleObjects+0x1fe
ntoskrnl.exe!ObWaitForMultipleObjects+0x2c7
ntoskrnl.exe!NtWaitForMultipleObjects+0xf9
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!NtWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xef
KERNELBASE.dll!WaitForMultipleObjects+0xe
postgres.exe!WaitLatchOrSocket+0x456
postgres.exe!WaitEventSetWait+0x8e
postgres.exe!secure_write+0xeb
postgres.exe!pq_putbytes+0x4e5
postgres.exe!pq_putbytes+0x41b
postgres.exe!pq_putbytes+0x256
postgres.exe!CreateCopyDestReceiver+0x3784
postgres.exe!CreateCopyDestReceiver+0x17b2
postgres.exe!CreateCopyDestReceiver+0x144c
postgres.exe!CreateCopyDestReceiver+0x1098
postgres.exe!DoCopy+0x50a
postgres.exe!standard_ProcessUtility+0x37b
pg_stat_statements.dll!pg_finfo_pg_stat_statements+0xde1
postgres.exe!ProcessUtility+0x34
postgres.exe!PortalRunFetch+0x891
postgres.exe!PortalRunFetch+0xbae
postgres.exe!PortalRun+0x1a0
postgres.exe!get_stats_option_name+0x1335
postgres.exe!PostgresMain+0x635
postgres.exe!ShmemBackendArrayAllocation+0x2b3a
postgres.exe!SubPostmasterMain+0x275
postgres.exe!main+0x480
postgres.exe!pgwin32_popen+0x144f
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

Any ideas on changes in Postgres 10 that would cause this?  We were
previously running 9.6.7 and didn't encounter this issue on that version
either.

Thanks
Ben.

Ben Snaidero
*Geotab*
Senior Database Specialist
Direct +1 (289) 230-7749
Toll-free +1 (877) 436-8221
Visit www.geotab.com
Twitter <https://twitter.com/geotab> | Facebook
<https://www.facebook.com/Geotab> | YouTube
<https://www.youtube.com/user/MyGeotab> | LinkedIn
<https://www.linkedin.com/company/geotab/>

Reply via email to