Hi,

On 2025-07-16 15:25:14 -0700, Jacob Champion wrote:
> On Wed, Jul 16, 2025 at 2:34 PM Andres Freund <and...@anarazel.de> wrote:
> > > Based on my understanding of [1], readahead makes this overall problem
> > > much worse by opportunistically slurping bytes off the wire and doing
> > > absolutely nothing with them until you call SSL_read() enough times to
> > > finally get to them.
> >
> > Right - but it also substantially reduces the number of syscalls :(. I'm not
> > sure that it's wise to close that door permanently.
>
> I think it's very likely that OpenSSL's implementation of readahead is
> fundamentally incompatible with our async implementation. For one, the
> documented way to call SSL_read() without accidentally hitting the
> socket is via SSL_pending(), which readahead is documented to break.

Why do we care about not hitting the socket? We always operate the socket in
non-blocking mode anyway?
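
To illustrate: with the socket in non-blocking mode, an SSL_read() that
would need the network just fails with SSL_ERROR_WANT_READ, which we can
treat like EWOULDBLOCK. Rough sketch, not actual libpq code, error
handling elided:

#include <sys/types.h>
#include <openssl/ssl.h>

/* sketch: non-blocking read through openssl; returns >0 bytes read,
 * 0 if the socket would block, -1 on hard error */
static ssize_t
nb_ssl_read(SSL *ssl, void *buf, size_t len)
{
    int     n = SSL_read(ssl, buf, (int) len);

    if (n > 0)
        return n;

    switch (SSL_get_error(ssl, n))
    {
        case SSL_ERROR_WANT_READ:
        case SSL_ERROR_WANT_WRITE:
            return 0;           /* would block, retry once readable */
        default:
            return -1;
    }
}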


FWIW, I have seen too many folks move away from encrypted connections for
backups, due to openssl being the bottleneck, to just accept that we'll never
do better than the current worst-case performance.  Any decently performing
networking will need to divorce when socket IO happens from when openssl sees
the data, much more heavily than today. Relying ever more on extremely tight
coupling is a dead end.
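
To make "divorcing" concrete: the direction I have in mind is to stop
handing openssl the socket fd at all and drive it through memory BIOs
instead, so that we decide when and how much network IO happens. Just a
sketch of the shape (the helper names are made up):

#include <openssl/bio.h>
#include <openssl/ssl.h>

static void
use_memory_bios(SSL *ssl)
{
    BIO    *rbio = BIO_new(BIO_s_mem());
    BIO    *wbio = BIO_new(BIO_s_mem());

    /* SSL_set_bio() takes ownership of both BIOs */
    SSL_set_bio(ssl, rbio, wbio);
}

/* the caller recv()s from the socket on its own schedule, then: */
static int
feed_ciphertext(SSL *ssl, const void *buf, int len)
{
    return BIO_write(SSL_get_rbio(ssl), buf, len);
}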


> For two, if you're worried about how much data we could potentially
> have to hold during a "drain all pending" operation, I think readahead
> changes the upper bound from "the size of one TLS record" to
> "SO_RCVBUF", doesn't it?

It seems to be a configurable limit, with
SSL_set_default_read_buffer_len(). But the default is to just read up to a
whole record's worth of data, instead of reading the length separately. I.e.
an irrelevant amount of memory.
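
If we did want readahead with a bound, it'd presumably just be something
like this (the 8k is arbitrary):

    SSL_set_read_ahead(ssl, 1);
    SSL_set_default_read_buffer_len(ssl, 8192);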


> > Without it openssl will do one read for the record length, and another read
> > for the record contents. I.e. it'll often double the number of syscalls.
>
> That is unfortunate... but if we're talking about a
> correctness/performance tradeoff then I'm somewhat less sympathetic to
> the performance side.

I'm obviously not advocating for incorrectness.


> > Luckily there seems to be SSL_has_pending():
> >
> > > SSL_has_pending() returns 1 if s has buffered data (whether processed or
> > > unprocessed) and 0 otherwise
>
> IIUC, we can't use SSL_has_pending() either. I need to know exactly
> how many bytes to drain out of the userspace buffer without
> introducing more bytes into it.

Why?  The only case where we ought to care about whether pending data exists
inside openssl is if our internal buffer is either empty or doesn't contain
the entire message we're trying to consume. In either of those two cases we
can just consume some data from openssl, without caring precisely how much
it is.
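
Sketched out, with libpq-ish pseudo-code (message_complete() and
pq_append_inbuf() are placeholders, not real functions):

while (!message_complete(conn))
{
    char    buf[8192];
    int     n = SSL_read(conn->ssl, buf, sizeof(buf));

    if (n > 0)
    {
        pq_append_inbuf(conn, buf, n);  /* got some bytes, reparse */
        continue;
    }

    if (SSL_get_error(conn->ssl, n) != SSL_ERROR_WANT_READ)
        return -1;

    /*
     * WANT_READ: openssl cannot produce plaintext without more input
     * from the kernel, partially buffered records included, so only
     * now is waiting on the socket known to be safe.
     */
    if (pqWait(1, 0, conn))
        return -1;
}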


> > To me the pattern in our code seems bogus, even without readahead, no?
> >
> > getCopyDataMessage() can return 0 because our internal buffer is empty,
> > without ever reading from the socket or openssl. But if the SSL record
> > contained two messages (or parts of two messages), all the required data may
> be in openssl. In which case the pqWait() that pqGetCopyData3() does
> will wait forever.
>
> I think the "cheating" in pqSocketCheck() that I mentioned upthread
> ensures that pqWait() will see the OpenSSL data and return
> immediately. (That's a major architectural smell IMO, but it is there.
> Just not for GSS, which might explain some hangs discussed on the list
> a while back.)

I guess so, but it's hard to know without a patch.
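
For reference, I assume the bit you mean is the short-circuit in
pqSocketCheck(), roughly (paraphrasing from memory, so caveat emptor):

    /* data already buffered inside openssl counts as readable */
    if (forRead && conn->ssl_in_use && pgtls_read_pending(conn))
        return 1;

    /* ... otherwise fall through to poll()/select() on the fd ... */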


> > I.e. the extra read actually ensures that we're not doing a pqWait() without
> > knowing whether we need one.
>
> I don't think so, because (without my WIP patch) you still can't
> guarantee that the single call to pqReadData() got all of the
> readahead data.

It wouldn't need to get all the data in one pqReadData() in this case, as we'd
just loop around and do the whole thing again, until the entire message is
read in (regardless of whether there's remaining data in a partially consumed
record, or in the "unprocessed buffer").
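
I.e. keep roughly the existing pqGetCopyData3() shape (schematic, and
assuming pqWait() does return immediately when openssl has buffered data,
per the pqSocketCheck() short-circuit above):

for (;;)
{
    int     msgLength = getCopyDataMessage(conn);

    if (msgLength != 0)
        return msgLength;   /* complete message, or end-of-copy/error */

    /*
     * Need more input. Whether the bytes come out of openssl's buffers
     * or off the wire doesn't matter here; we just go around again.
     */
    if (pqWait(1, 0, conn) || pqReadData(conn) < 0)
        return -2;
}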

Greetings,

Andres Freund

