Re: [HACKERS] Freeze on Cygwin w/ concurrency
On Mon, Mar 20, 2017 at 11:47:03PM -0400, Noah Misch wrote:
> "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
> Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
> I've pinged[1] the Cygwin bug thread with some additional detail.

The problem was cygserver thread exhaustion; cygserver needs a thread per
simultaneous waiter.  With "cygserver -r 40" or the equivalent config file
setting, this test does not freeze.  Cygwin 2.8.0 introduced a change to
dynamically grow the thread count:
https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=commitdiff;h=0b73dba4de3fdadde499edfbc7ca9d9a01c11487

However, Cygwin 2.8.0 introduced another source of cygserver freezes:
https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=commitdiff;h=b80b2c011936f7f075b76b6e59f9e8a5ec49caa1

The 2.8.0-specific freezes have no known workaround.  Cygwin 2.8.1 works,
having reverted the problem commit.  Do not use PostgreSQL with Cygwin 2.8.0.

> If a Cygwin
> buildfarm member starts using --enable-tap-tests, you may see failures in the
> pgbench test suite.  (lorikeet used --enable-tap-tests from 2017-03-18 to
> 2017-03-20, but it failed before reaching the pgbench test suite.)  Curious
> that "make check" has too little concurrency to see more effects from this.

I now understand the bug required eleven concurrent lock waiters, and it's
plausible that "make check" doesn't experience that.  The pgbench test suite
uses -c5, so I expect it to be stable on almost any Cygwin.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
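For readers who want to see the failure mode in isolation, here is a toy
model of "a thread per simultaneous waiter".  It is not cygserver code and
makes no attempt to reproduce the exact threshold of eleven waiters; the
function name run_scenario, the pool sizes, and the timeout are all made up
for illustration.  A fixed-size thread pool stands in for cygserver's request
threads, each blocked wait holds one thread, and the request that would wake
the waiters needs a thread of its own.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_scenario(server_threads: int, waiters: int, timeout: float = 0.5) -> bool:
    """Return True if every waiter was woken, False if the toy server froze.

    Hypothetical model: 'server_threads' plays the role of the request-thread
    count, 'waiters' the number of clients blocked on a semaphore.
    """
    released = threading.Event()  # stands in for the semaphore being posted
    pool = ThreadPoolExecutor(max_workers=server_threads)

    def wait_request() -> bool:
        # A blocked wait: occupies its server thread until the post arrives.
        return released.wait()

    def post_request() -> None:
        # The request that wakes the waiters; it also needs a server thread.
        released.set()

    waits = [pool.submit(wait_request) for _ in range(waiters)]
    post = pool.submit(post_request)

    try:
        post.result(timeout=timeout)
        ok = all(w.result(timeout=timeout) for w in waits)
    except FutureTimeout:
        # The post never got a thread, so nobody was woken: a freeze.
        ok = False
    finally:
        # Unblock everything so the demo itself always terminates.
        released.set()
        pool.shutdown(wait=True)
    return ok


if __name__ == "__main__":
    print("4 threads, 3 waiters:", run_scenario(4, 3))
    print("4 threads, 4 waiters:", run_scenario(4, 4))
    print("8 threads, 4 waiters:", run_scenario(8, 4))
```

In this model, raising the pool size above the number of waiters plays the
same role as the "cygserver -r 40" workaround: the waking request always
finds a free thread.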
Re: [HACKERS] Freeze on Cygwin w/ concurrency
On 03/20/2017 11:47 PM, Noah Misch wrote:
> "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
> Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
> I've pinged[1] the Cygwin bug thread with some additional detail.  If a Cygwin
> buildfarm member starts using --enable-tap-tests, you may see failures in the
> pgbench test suite.  (lorikeet used --enable-tap-tests from 2017-03-18 to
> 2017-03-20, but it failed before reaching the pgbench test suite.)  Curious
> that "make check" has too little concurrency to see more effects from this.

Yeah, I abandoned --enable-tap-tests on lorikeet; I didn't have time to get to
the bottom of the problems.  Glad I'm not totally alone keeping this alive.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Freeze on Cygwin w/ concurrency
On Mon, Mar 20, 2017 at 11:47 PM, Noah Misch wrote:
> "pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
> Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
> I've pinged[1] the Cygwin bug thread with some additional detail.

Ouch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Freeze on Cygwin w/ concurrency
"pgbench -i -s 50; pgbench -S -j2 -c16 -T900 -P5" freezes consistently on
Cygwin 2.2.1 and Cygwin 2.6.0.  (I suspect most other versions are affected.)
I've pinged[1] the Cygwin bug thread with some additional detail.  If a Cygwin
buildfarm member starts using --enable-tap-tests, you may see failures in the
pgbench test suite.  (lorikeet used --enable-tap-tests from 2017-03-18 to
2017-03-20, but it failed before reaching the pgbench test suite.)  Curious
that "make check" has too little concurrency to see more effects from this.

Frozen backends show a stack trace like this:

#0  0x7710139a in ntdll!ZwWriteFile () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
#1  0x07fefd851b2b in WriteFile () from /cygdrive/c/Windows/system32/KERNELBASE.dll
#2  0x76fb3576 in WriteFile () from /cygdrive/c/Windows/system32/kernel32.dll
#3  0x000180160c6c in transport_layer_pipes::write (this=, buf=, len=)
    at /usr/src/debug/cygwin-2.6.0-1/winsup/cygserver/transport_pipes.cc:224
#4  0x00018015feb6 in client_request::send (this=0xa930, conn=0x6000e8290)
    at /usr/src/debug/cygwin-2.6.0-1/winsup/cygserver/client.cc:134
#5  0x000180160591 in client_request::make_request (this=this@entry=0xa930)
    at /usr/src/debug/cygwin-2.6.0-1/winsup/cygserver/client.cc:473
#6  0x000180114f79 in semop (semid=65540, sops=0xaa00, nsops=1)
    at /usr/src/debug/cygwin-2.6.0-1/winsup/cygwin/sem.cc:125
#7  0x000180117a4b in _sigfe () at sigfe.s:35
#8  0x00010063c81a in PGSemaphoreLock (sema=sema@entry=0x6e06a18) at pg_sema.c:387
#9  0x0001006a962b in LWLockAcquire (lock=lock@entry=0x6fff6774d80, mode=mode@entry=LW_SHARED)
    at lwlock.c:1286
#10 0x000100687d46 in BufferAlloc (foundPtr=0xab0b , strategy=0x0, blockNum=290,
    forkNum=MAIN_FORKNUM, relpersistence=112 'p', smgr=0x6000ea588) at bufmgr.c:1012

The postmaster, also frozen, shows a stack trace like this:

#0  0x771018ca in ntdll!ZwWaitForMultipleObjects () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
#1  0x07fefd851420 in KERNELBASE!GetCurrentProcess () from /cygdrive/c/Windows/system32/KERNELBASE.dll
#2  0x76fa1220 in WaitForMultipleObjects () from /cygdrive/c/Windows/system32/kernel32.dll
#3  0x000180120173 in child_info::sync (this=this@entry=0xc008, pid=4692, hProcess=@0xc1b0: 0x4b8,
    howlong=howlong@entry=30) at /usr/src/debug/cygwin-2.6.0-1/winsup/cygwin/sigproc.cc:1010
#4  0x0001800aa163 in frok::parent (this=0xc000, stack_here=0xbfa0 "")
    at /usr/src/debug/cygwin-2.6.0-1/winsup/cygwin/fork.cc:501
#5  0x0001800aaa05 in fork () at /usr/src/debug/cygwin-2.6.0-1/winsup/cygwin/fork.cc:607
#6  0x000180117a4b in _sigfe () at sigfe.s:35
#7  0x000100641618 in fork_process () at fork_process.c:61
#8  0x00010063e80a in StartAutoVacWorker () at autovacuum.c:1436

The postmaster log eventually has:

     28 [main] postgres 4408 child_info::sync: wait failed, pid 4692, Win32 error 183
    292 [main] postgres 4408 fork: child 4692 - died waiting for dll loading, errno 11

[1] https://cygwin.com/ml/cygwin/2017-03/msg00218.html