On Sat, Jul 16, 2022 at 10:29 AM Ivan Zhakov <i...@apache.org> wrote:
>
> On Wed, 13 Jul 2022 at 18:06, Yann Ylavic <ylavic....@gmail.com> wrote:
>>
>> The flexible way for pcre2 users (and httpd maintainers) would be to
>> let them opt-out (at runtime) of stack allocation from pcre2_match(),
>> which is the patch I proposed.
>>
> I see several problems in this approach:

I meant the PR/patch I proposed to the pcre team, not the patch to
util_pcre which you seem to refer to below.

> 1. Every apr_pool_t consumes at least one page, i.e., 4kb or 8kb depending of 
> the platform.
> 2. The implementation may introduce unbounded memory usage, if PCRE2 performs 
> multiple malloc/free during pattern matching. It seems that currently it 
> doesn't, but we cannot rely on the implementation details. Stackbuf code 
> doesn't have this problem, because it's limited to a fixed-size buffer.

The memory usage with r1902731 is bounded by the maximum size ever
asked by pcre2_match() AND the subpool's max_free setting.
This looks quite optimal to me: the pool adapts to the actual needs up
to ap_max_mem_free, if ~20K are what most regexes need this just
mostly reuse 20K of apr_memnode_t(s) of different sizes (number of
system pages) for all the ap_regexec() calls of each thread (memory
management would be in the noise of pcre_exec/match, whereas system's
malloc/free would/could not).

Your stack proposal stops where pcre2 starts calling private_malloc()
by itself (and r1902731 leaved this alone), but should that happen
because either the regex is more complex, or pcre2 reduces
START_FRAMES_SIZE, or pcre2 starts calling pcre2_ctx->malloc()
unconditionally for its frames, or pcre2 optionally lets users choose
between 20K of stack space and full/20K-initial pcre2_ctx->malloc().
I think that httpd should (still) care about 20K stack allocated
anywhere in the call stack, we don't do that elsewhere (besides some
"low in call stack" exceptional cases in the MPMs), that's why I
raised a PR to the prcre team.

> 3. As I said before, the proposed approach effectively increases per thread 
> memory usage: 4 or 8 kb for match data stored in thread state + stack size. 
> Which is something I don't really understand, because if we are fine with 
> that, then we can just increase the thread stack size.

On the one hand you ask some admin to increase thread stack size if
needed, on the other hand you prevent some admin from reducing it if
possible. Why is that?
If one has/buys a docker slot to run httpd (usually only that), say
with MaxRequestWorkers 1000, would you want it to consume 8GB of stack
(Linux default) or e.g. 128MB (with ulimit -s or by default with the
docker-fitted Alpine Linux for instance) or anything in between for
one's needs.
The only thing we (as devs) can do about it is good stack usage
practices (20K is not IMHO), because stack exhaustion is fatal and
hard to debug.

>
> With the above in mind, I'm -1 for this approach and r1902731.

If (as currently being discussed) the pcre2 team decides to reduce
stack usage (optionally or not) and that the match_data should
preserve the allocated memory between the calls (i.e. until an
explicit pcre2_match_data_free), I think the initial implementation is
all we need (the one in 2.4.x, where I thought match_data was doing
that exactly). We could replace the hash lookup with an
AP_THREAD_LOCAL since you disliked it, but having a pcre_match_data_t
per thread would do (we could still track memory usage through
private_malloc() to free the memory at some point..).
Your approach with 256 bytes of stack space would help allocate the
pcre2_general_context and pcre2_match_data structs only, but not the
very first frame(s).

Until something changes on the pcre2 side (if ever), r1902731 does not
hurt your use case anyway (it takes the hand where you leave it to
malloc/free, that is pcre2_match() needs more than 20KB of frames), so
I'm wondering why this -1.
I hear that it complexifies the code a bit (not much more than the
stack usage by the way), but it helps some
existing/pcre-patched/future cases, so I'm hoping that your -1 comes
from my poor explanation on how it actually works.

Whether apr_thead_current() belongs to apr(-1.8) or not is irrelevant
here, but if ap_thread_current() is useful to httpd I think that the
apr_ version is worth it for other software too. I mean, your
reluctance seems to be against ap(r)_thread_current(), I don't grok
why you deny the advantages (at least) in the util_pcre case though.


Regards;
Yann.

Reply via email to