[RESULT][VOTE] Release httpd-2.4.41
Hi, all; It is my pleasure to confirm that we have received enough votes to PASS the release of httpd-2.4.41! As always, I thank everyone for their diligence in testing this release and am very happy we caught the minor regression during 2.4.40 validation cycles to prevent a potentially impacting situation for users. Awesome, y'all! I detected the following +1 votes: PMC jailletc36, druggeri, humbedooh, icing, steffenal, jim, covener, jorton, rjung Community Dennis Clarke, Noel Butler No -1 votes were recorded. I will begin the process of promoting the release to the dist locations and will prep for announcement in 48-ish hours. -- Daniel Ruggeri
Re: [PATCH 63503] - Reverse proxy server - SIGSEGV
On Mon, Aug 12, 2019 at 4:33 PM Don Poitras wrote: > > Eric, > I'm not sure what you're asking. The global mutex will cover the call to > init_conn_pool(), so there isn't a problem with worker->cp getting overlayed > if that's what you're thinking. I am thinking the opposite. Two threads interleaved in ap_proxy_initialize_worker could both think they're initializing the worker. When the 2nd one gets blocked halfway through in the global mutex, the first will have created worker->cp->pool, left, and begin using it (for the DNS calls). > The problem is that ap_proxy_initialize_worker() gets called repeatedly with > the same worker pointer. The first time it's called for a particular worker, > it creates a connection pool and the worker thread mutex. > But _every_ use of worker->cp->pool must be serialized. Yes, minus the direct constructor call that only creates a subpool. > Subsequent calls to ap_proxy_initialize_worker() with the same worker pointer > will use the pool and mutex created in the first call. It's only these calls > after the first that are at risk for concurrent use of the pool (hence my use > of the 'notfirst' flag). In the first call, I don't even bother getting the > proxy mutex as there isn't any way for other threads to be using the pool > before it's created. I think the first thread has to grab the thread mutex it just created if the global lock is not held earlier than it is today (to make checking the initialization state safer). Another route to safety is to create the mutex when the worker is defined since this is at startup rather than this lazy load but it is probably a riskier change.
RE: [PATCH 63503] - Reverse proxy server - SIGSEGV
Eric, I'm not sure what you're asking. The global mutex will cover the call to init_conn_pool(), so there isn't a problem with worker->cp getting overlayed if that's what you're thinking. The problem is that ap_proxy_initialize_worker() gets called repeatedly with the same worker pointer. The first time it's called for a particular worker, it creates a connection pool and the worker thread mutex. This mutex appears to only be used in ap_proxy_determine_connection() to protect setting of worker->cp->addr by apr_sockaddr_info_get() (it also protects use of worker->cp->pool, but it's not clear that was recognized as a requirement). But _every_ use of worker->cp->pool must be serialized. Subsequent calls to ap_proxy_initialize_worker() with the same worker pointer will use the pool and mutex created in the first call. It's only these calls after the first that are at risk for concurrent use of the pool (hence my use of the 'notfirst' flag). In the first call, I don't even bother getting the proxy mutex as there isn't any way for other threads to be using the pool before it's created. -- Don Poitras - Host R&DSAS Institute Inc.SAS Campus Drive mailto:sas...@sas.com (919)531-5637 Fax:677- Cary, NC 27513
Re: [PATCH 63503] - Reverse proxy server - SIGSEGV
Via inspection this looks quite sane. > On Aug 12, 2019, at 3:24 PM, Eric Covener wrote: > > Hi Don, can you try this very similar patch? I applied yours to my > sandbox to get more context and made a few minor changes (including > pre-existing stuff that looked misleading) > > http://people.apache.org/~covener/trunk-proxy-segv.diff > >
Re: [PATCH 63503] - Reverse proxy server - SIGSEGV
Hi Don, can you try this very similar patch? I applied yours to my sandbox to get more context and made a few minor changes (including pre-existing stuff that looked misleading) http://people.apache.org/~covener/trunk-proxy-segv.diff On Mon, Aug 12, 2019 at 2:43 PM Eric Covener wrote: > > On Mon, Aug 12, 2019 at 2:37 PM Eric Covener wrote: > > > > On Mon, Aug 12, 2019 at 12:32 PM Don Poitras wrote: > > > > > > Eric, > > > The global mutex only serializes concurrent calls to > > > ap_proxy_initialize_worker(). The worker pool is also used when the > > > proxy_handler() is called from a thread kicked off from a _previous_ call > > > to ap_proxy_initialize_worker() . Turning on the pool concurrency check > > > shows this happening: > > > > > > pool concurrency check: pool 0xa04348(proxy_worker_cp), thread cur > > > 7f25a21fc700 in use by 7f2598bed700, state in use -> in use > > > > > > 172256 . Thawed libc:gsignal (+0x37) > > ><--- abort > > > 1 libc:gsignal (+0x37) > > > 2 libc:abort (+0x143) > > > 3 libapr-1:\apr_pools\pool_concurrency_abort 768 > > > 4 libapr-1:\apr_pools\apr_palloc 778 (+0xA) > > > 5 libapr-1:\thread_cond\apr_thread_cond_create 44 > > > 6 libaprutil-1:\apr_reslist\apr_reslist_create 299 (+0x9) > > > 7 mod_proxy:\proxy_util\ap_proxy_initialize_worker 2013 (+0x2F) > > > 8 mod_proxy:\mod_proxy\proxy_handler 1176 (+0xE) > > > 9 httpd:UNKNOWN at 0x004543A0 > > > > > > Here's the thead the diagnostic said had the pool 'in use' when > > > apr_palloc() was called: > > > > > > 172271Thawed libc:__GI_strncmp (+0x1680) > > > <-- in use > > > 1 libc:__GI_strncmp (+0x1680) > > > 2 libc:getenv (+0xBD) > > > 3 libc:__nscd_getai (+0x3D3) > > > 4 libc:gaih_inet.constprop.8 (+0x15F2) > > > 5 libc:getaddrinfo (+0x11F) > > > 6 libapr-1:\sockaddr\call_resolver 397 (+0x10) > > > 7 mod_proxy:\proxy_util\ap_proxy_determine_connection 2506 (+0x15) > > > 8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D) > > > 9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18) > > > 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16) > > > 11 httpd:UNKNOWN at 0x004543A0 > > > > Thanks, I feel like I understand a bit more now at least about the > > apr_reslist_acquire() vs. DNS lookups cases. > > > > I guess in the above case ap_proxy_initialize_worker() has some funny > > business where the state is checked outside of the global mutex and if > > a 2nd thread sneaks in they will reuse worker->cp->pool > > Whoops, gmail ate a bunch of keystrokes and sent prematurely when I > tried to paste.: > > if (worker->cp == NULL) > init_conn_pool(p, worker); > > But then will create a new reslist with the old pool that is already > being used in thread 1 that has gotten farther into the handler. > I don't totally follow how the patch in BZ is fixing this one if > that's what's happening. > > -- > Eric Covener > cove...@gmail.com -- Eric Covener cove...@gmail.com
Re: [PATCH 63503] - Reverse proxy server - SIGSEGV
On Mon, Aug 12, 2019 at 2:37 PM Eric Covener wrote: > > On Mon, Aug 12, 2019 at 12:32 PM Don Poitras wrote: > > > > Eric, > > The global mutex only serializes concurrent calls to > > ap_proxy_initialize_worker(). The worker pool is also used when the > > proxy_handler() is called from a thread kicked off from a _previous_ call > > to ap_proxy_initialize_worker() . Turning on the pool concurrency check > > shows this happening: > > > > pool concurrency check: pool 0xa04348(proxy_worker_cp), thread cur > > 7f25a21fc700 in use by 7f2598bed700, state in use -> in use > > > > 172256 . Thawed libc:gsignal (+0x37) > > <--- abort > > 1 libc:gsignal (+0x37) > > 2 libc:abort (+0x143) > > 3 libapr-1:\apr_pools\pool_concurrency_abort 768 > > 4 libapr-1:\apr_pools\apr_palloc 778 (+0xA) > > 5 libapr-1:\thread_cond\apr_thread_cond_create 44 > > 6 libaprutil-1:\apr_reslist\apr_reslist_create 299 (+0x9) > > 7 mod_proxy:\proxy_util\ap_proxy_initialize_worker 2013 (+0x2F) > > 8 mod_proxy:\mod_proxy\proxy_handler 1176 (+0xE) > > 9 httpd:UNKNOWN at 0x004543A0 > > > > Here's the thead the diagnostic said had the pool 'in use' when > > apr_palloc() was called: > > > > 172271Thawed libc:__GI_strncmp (+0x1680) > > <-- in use > > 1 libc:__GI_strncmp (+0x1680) > > 2 libc:getenv (+0xBD) > > 3 libc:__nscd_getai (+0x3D3) > > 4 libc:gaih_inet.constprop.8 (+0x15F2) > > 5 libc:getaddrinfo (+0x11F) > > 6 libapr-1:\sockaddr\call_resolver 397 (+0x10) > > 7 mod_proxy:\proxy_util\ap_proxy_determine_connection 2506 (+0x15) > > 8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D) > > 9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18) > > 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16) > > 11 httpd:UNKNOWN at 0x004543A0 > > Thanks, I feel like I understand a bit more now at least about the > apr_reslist_acquire() vs. DNS lookups cases. > > I guess in the above case ap_proxy_initialize_worker() has some funny > business where the state is checked outside of the global mutex and if > a 2nd thread sneaks in they will reuse worker->cp->pool Whoops, gmail ate a bunch of keystrokes and sent prematurely when I tried to paste.: if (worker->cp == NULL) init_conn_pool(p, worker); But then will create a new reslist with the old pool that is already being used in thread 1 that has gotten farther into the handler. I don't totally follow how the patch in BZ is fixing this one if that's what's happening. -- Eric Covener cove...@gmail.com
Re: [PATCH 63503] - Reverse proxy server - SIGSEGV
On Mon, Aug 12, 2019 at 12:32 PM Don Poitras wrote: > > Eric, > The global mutex only serializes concurrent calls to > ap_proxy_initialize_worker(). The worker pool is also used when the > proxy_handler() is called from a thread kicked off from a _previous_ call to > ap_proxy_initialize_worker() . Turning on the pool concurrency check shows > this happening: > > pool concurrency check: pool 0xa04348(proxy_worker_cp), thread cur > 7f25a21fc700 in use by 7f2598bed700, state in use -> in use > > 172256 . Thawed libc:gsignal (+0x37) ><--- abort > 1 libc:gsignal (+0x37) > 2 libc:abort (+0x143) > 3 libapr-1:\apr_pools\pool_concurrency_abort 768 > 4 libapr-1:\apr_pools\apr_palloc 778 (+0xA) > 5 libapr-1:\thread_cond\apr_thread_cond_create 44 > 6 libaprutil-1:\apr_reslist\apr_reslist_create 299 (+0x9) > 7 mod_proxy:\proxy_util\ap_proxy_initialize_worker 2013 (+0x2F) > 8 mod_proxy:\mod_proxy\proxy_handler 1176 (+0xE) > 9 httpd:UNKNOWN at 0x004543A0 > > Here's the thead the diagnostic said had the pool 'in use' when apr_palloc() > was called: > > 172271Thawed libc:__GI_strncmp (+0x1680) > <-- in use > 1 libc:__GI_strncmp (+0x1680) > 2 libc:getenv (+0xBD) > 3 libc:__nscd_getai (+0x3D3) > 4 libc:gaih_inet.constprop.8 (+0x15F2) > 5 libc:getaddrinfo (+0x11F) > 6 libapr-1:\sockaddr\call_resolver 397 (+0x10) > 7 mod_proxy:\proxy_util\ap_proxy_determine_connection 2506 (+0x15) > 8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D) > 9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18) > 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16) > 11 httpd:UNKNOWN at 0x004543A0 Thanks, I feel like I understand a bit more now at least about the apr_reslist_acquire() vs. DNS lookups cases. I guess in the above case ap_proxy_initialize_worker() has some funny business where the state is checked outside of the global mutex and if a 2nd thread sneaks in they will reuse worker->cp->pool
Re: [VOTE] Release httpd-2.4.41
Am 09.08.2019 um 15:40 schrieb Daniel Ruggeri: Hi, all; Please find below the proposed release tarball and signatures: https://dist.apache.org/repos/dist/dev/httpd/ I would like to call a VOTE over the next few days to release this candidate tarball as 2.4.41: [X] +1: It's not just good, it's good enough! [ ] +0: Let's have a talk. [ ] -1: There's trouble in paradise. Here's what's wrong. The computed digests of the tarball up for vote are: sha1: b713e835aa7cde823a4b7f8e3463164f3d9fe63e *httpd-2.4.41.tar.gz sha256: 3c0f9663240beb0f008acf3b4501c4f339d7467ee345a36c86c46b4d6f3a5461 *httpd-2.4.41.tar.gz -- Daniel Ruggeri +1 to release and thanks a bunch for RM! Summary: all OK except for - very few shutdown crashes on Solaris (already observed in 2.4.37, but then with MPM event when statically linked, now 2 times both with prefork and shared linking) - not tested but still expected to happen: problems with prefork plus mod_ext_filter plus LimitRequestBody on Solaris (not a regression), plus indicatons that I also saw it on SLES 11 (and then disabled mod_ext_filter there to let the tests proceed). Detailed report: - Sigs and hashes OK - contents of tarballs identical - contents of tag and tarballs identical except for expected deltas Built on - Solaris 10 Sparc as 32 Bit Binaries - SLES 11+12 (64 Bits) - RHEL 6+7 (64 Bits) For all platforms built - with default (shared) and static modules - with module set reallyall - using --enable-load-all-modules - against external APR/APU 1.7.0/1.6.1 plus for SLES 12 and RHEL 7 also against 1.6.5/1.6.1 plus for SLES 12 and RHEL 7 built but not tested using APR trunk head - using external libraries - expat 2.2.7 - pcre 8.43 - lua 5.3.5 (compiled with LUA_COMPAT_MODULE) - distcache 1.5.1 - libxml2 2.9.9 - libnghttp2 1.39.1 - brotli 1.0.7 - curl 7.65.3 - jansson 2.12 and - openssl 0.9.8zh, 1.0.2, 1.0.2s, 1.0.1e, 1.0.1k, 1.1.1, 1.1.1c plus patches (head of a few days ago) - Tool chain: - platform gcc except on Solaris (gcc 8.3.0 Solaris 10) - CFLAGS: -O2 -g -Wall -fno-strict-aliasing - on Solaris additionally -mpcu=v9, -D_XOPEN_SOURCE, -D_XOPEN_SOURCE_EXTENDED=1, -D__EXTENSIONS__ and -D_XPG6 All of the 246 builds succeeded. - compiler warnings: none Tested for - Solaris 10, SLES 11+12, RHEL 6+7 - MPMs prefork, worker, event - default and static module builds - log level trace8 - module set reallyall - for "reallyall" 129 modules plus MPMs - Perl client bundle build against OpenSSL 1.1.1, 1.1.0i, 1.0.2p and 0.9.8zh - OpenSSL once linked statically and once as a shared library Every OpenSSL version in the client tested with every version in the server, not just the same version. The total number of test suite runs was 1926 (plus some on Solaris and RHEL 6 still running, the whole suite hasn't finished yet, but enough to come up with a clear result). The following test failures were seen: a Crashes only on Solaris, this time only with prefork MPM and dynamically linked builds. The crash seems to happen only at the end of a process during pchild clean up and it might be problematic, that the watchdog thread at that time still exists. gdb info see at end. b Tests 4, 8 and 12 of t/modules/buffer.t Not a regression Tests 4, 8 and sometimes 12, always line 37 Relatively frequent (645) failures on platforms Solaris 10 and old SLES11, RHEL 6, but not on more modern (and here faster) SLES 12 and RHEL 7. Happens for all OpenSSL client and server versions and all link types. c Various tests in t/apache/expr_string.t Not a regression. 18 times, Test numbers : 11, 14, 20, 23, 26, 29 Happens for 18 out of 1926 runs (3 times on Solaris 10, otherwise always on RHEL6). The failure is once on line 101, all others are on line 87. They happen, when the error_log contents are checked. Could be due to logs written to NFS. d Test 5 in t/modules/dav.t line 69: Not a regression. 5 times: once RHEL 6, twice RHEL 7 and twice SLES 12 Creation, modified and now times not in the correct order. This seems to be a system issue, all tests done on NFS, many tested on virtualized guests. e Test 54 in t/modules/cgi.t line 232: Not a regression 3 times once Solaris Test checks log contents. Could be false positive due to logs written to NFS. Regards, Rainer GDB info (sporadic) Solaris shutdown crahes: #0 0xff07f17c in allocator_free (node=0x1bd8, allocator=0x333530) at /path/to/sources/apr/1.7.x/1.7.0/apr-1.7.0/memory/unix/apr_pools.c:486 freelist = 0x0 max_index = 1574824 max_free_index = 512 next = 0x1bd8 index = 5201872 current_free_index = 4284842517 #1 apr_pool_destroy (pool=0x338068) at /path/to/sources/apr/1.7.x/1.7.0/apr-1.7.0/memory/unix/apr_pools.c:1043 active = allocator = 0x333530 #2 0xfed629d0 in clean_child_exit (code=7) at /shared/build/dev/httpd/sources/http
RE: [PATCH 63503] - Reverse proxy server - SIGSEGV
Eric, The global mutex only serializes concurrent calls to ap_proxy_initialize_worker(). The worker pool is also used when the proxy_handler() is called from a thread kicked off from a _previous_ call to ap_proxy_initialize_worker() . Turning on the pool concurrency check shows this happening: pool concurrency check: pool 0xa04348(proxy_worker_cp), thread cur 7f25a21fc700 in use by 7f2598bed700, state in use -> in use 172256 . Thawed libc:gsignal (+0x37) <--- abort 1 libc:gsignal (+0x37) 2 libc:abort (+0x143) 3 libapr-1:\apr_pools\pool_concurrency_abort 768 4 libapr-1:\apr_pools\apr_palloc 778 (+0xA) 5 libapr-1:\thread_cond\apr_thread_cond_create 44 6 libaprutil-1:\apr_reslist\apr_reslist_create 299 (+0x9) 7 mod_proxy:\proxy_util\ap_proxy_initialize_worker 2013 (+0x2F) 8 mod_proxy:\mod_proxy\proxy_handler 1176 (+0xE) 9 httpd:UNKNOWN at 0x004543A0 Here's the thead the diagnostic said had the pool 'in use' when apr_palloc() was called: 172271Thawed libc:__GI_strncmp (+0x1680) <-- in use 1 libc:__GI_strncmp (+0x1680) 2 libc:getenv (+0xBD) 3 libc:__nscd_getai (+0x3D3) 4 libc:gaih_inet.constprop.8 (+0x15F2) 5 libc:getaddrinfo (+0x11F) 6 libapr-1:\sockaddr\call_resolver 397 (+0x10) 7 mod_proxy:\proxy_util\ap_proxy_determine_connection 2506 (+0x15) 8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D) 9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18) 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16) 11 httpd:UNKNOWN at 0x004543A0 Another possible fix would be to use the global mutex everywhere it's needed and remove the call to create and use the worker thread lock altogether, but that would mean changing the code that currently uses the worker proxy thread lock and it seemed cleaner to just change proxy_util.c rather than two source modules. The two functions in ap_proxy_acquire_connection() that I serialized were found from other runs. Both apr_reslist_acquire() and connection_constructor() end up using the same worker pool that is being used by ap_proxy_determine_connection(). e.g. pool concurrency check: pool 0x1d93198(proxy_worker_cp), thread cur 7f019bfff700 in use by 7f019abfd700, state in use -> in use aborting thread: 1Thread 0x7f019bfff700 (LWP 106622) 0x7f01a735c207 in raise () from /lib64/libc.so.6 3 libapr-1:\apr_pools\pool_concurrency_abort 768 4 libapr-1:\apr_pools\apr_palloc 778 (+0xA) 5 libaprutil-1:\apr_reslist\get_container 100 (+0x8) 6 libaprutil-1:\apr_reslist\apr_reslist_acquire 121 (+0x3) 7 mod_proxy:\proxy_util\ap_proxy_acquire_connection 2327 (+0x3) 8 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1933 (+0x19) 9 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18) 10 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16) 11 httpd:UNKNOWN at 0x004543A0 And the "in-use" thread: 5Thread 0x7f019abfd700 (LWP 106624) 0x7f01a7424bed in connect () from /lib64/libc.so.6 8 mod_proxy:\proxy_util\ap_proxy_determine_connection 2537 (+0x15) 9 mod_proxy_http:\mod_proxy_http\proxy_http_handler 1956 (+0x1D) 10 mod_proxy:\mod_proxy\proxy_run_scheme_handler 3063 (+0x18) 11 mod_proxy:\mod_proxy\proxy_handler 1250 (+0x16) Once the workers are up and running, they use thread-specific storage, but all the code that starts them up uses a pool that needs to be serialized. worker->cp as used in ap_proxy_acquire_connection() is currently not serialized when running with the worker mpm. It can run at the same time as ap_proxy_initialize_worker() with the same pool used in both. The problem doesn't show up when running the same test with pre-fork or event mpms. -- Don Poitras - Host R&DSAS Institute Inc.SAS Campus Drive mailto:sas...@sas.com (919)531-5637 Fax:677- Cary, NC 27513