On Sat, Jul 16, 2022 at 10:29 AM Ivan Zhakov <i...@apache.org> wrote: > > On Wed, 13 Jul 2022 at 18:06, Yann Ylavic <ylavic....@gmail.com> wrote: >> >> The flexible way for pcre2 users (and httpd maintainers) would be to >> let them opt-out (at runtime) of stack allocation from pcre2_match(), >> which is the patch I proposed. >> > I see several problems in this approach:
I meant the PR/patch I proposed to the pcre team, not the patch to util_pcre which you seem to refer to below. > 1. Every apr_pool_t consumes at least one page, i.e., 4kb or 8kb depending of > the platform. > 2. The implementation may introduce unbounded memory usage, if PCRE2 performs > multiple malloc/free during pattern matching. It seems that currently it > doesn't, but we cannot rely on the implementation details. Stackbuf code > doesn't have this problem, because it's limited to a fixed-size buffer. The memory usage with r1902731 is bounded by the maximum size ever asked by pcre2_match() AND the subpool's max_free setting. This looks quite optimal to me: the pool adapts to the actual needs up to ap_max_mem_free, if ~20K are what most regexes need this just mostly reuse 20K of apr_memnode_t(s) of different sizes (number of system pages) for all the ap_regexec() calls of each thread (memory management would be in the noise of pcre_exec/match, whereas system's malloc/free would/could not). Your stack proposal stops where pcre2 starts calling private_malloc() by itself (and r1902731 leaved this alone), but should that happen because either the regex is more complex, or pcre2 reduces START_FRAMES_SIZE, or pcre2 starts calling pcre2_ctx->malloc() unconditionally for its frames, or pcre2 optionally lets users choose between 20K of stack space and full/20K-initial pcre2_ctx->malloc(). I think that httpd should (still) care about 20K stack allocated anywhere in the call stack, we don't do that elsewhere (besides some "low in call stack" exceptional cases in the MPMs), that's why I raised a PR to the prcre team. > 3. As I said before, the proposed approach effectively increases per thread > memory usage: 4 or 8 kb for match data stored in thread state + stack size. > Which is something I don't really understand, because if we are fine with > that, then we can just increase the thread stack size. On the one hand you ask some admin to increase thread stack size if needed, on the other hand you prevent some admin from reducing it if possible. Why is that? If one has/buys a docker slot to run httpd (usually only that), say with MaxRequestWorkers 1000, would you want it to consume 8GB of stack (Linux default) or e.g. 128MB (with ulimit -s or by default with the docker-fitted Alpine Linux for instance) or anything in between for one's needs. The only thing we (as devs) can do about it is good stack usage practices (20K is not IMHO), because stack exhaustion is fatal and hard to debug. > > With the above in mind, I'm -1 for this approach and r1902731. If (as currently being discussed) the pcre2 team decides to reduce stack usage (optionally or not) and that the match_data should preserve the allocated memory between the calls (i.e. until an explicit pcre2_match_data_free), I think the initial implementation is all we need (the one in 2.4.x, where I thought match_data was doing that exactly). We could replace the hash lookup with an AP_THREAD_LOCAL since you disliked it, but having a pcre_match_data_t per thread would do (we could still track memory usage through private_malloc() to free the memory at some point..). Your approach with 256 bytes of stack space would help allocate the pcre2_general_context and pcre2_match_data structs only, but not the very first frame(s). Until something changes on the pcre2 side (if ever), r1902731 does not hurt your use case anyway (it takes the hand where you leave it to malloc/free, that is pcre2_match() needs more than 20KB of frames), so I'm wondering why this -1. I hear that it complexifies the code a bit (not much more than the stack usage by the way), but it helps some existing/pcre-patched/future cases, so I'm hoping that your -1 comes from my poor explanation on how it actually works. Whether apr_thead_current() belongs to apr(-1.8) or not is irrelevant here, but if ap_thread_current() is useful to httpd I think that the apr_ version is worth it for other software too. I mean, your reluctance seems to be against ap(r)_thread_current(), I don't grok why you deny the advantages (at least) in the util_pcre case though. Regards; Yann.