Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-05-19 Thread Daniel Gustafsson
> On 19 May 2024, at 22:21, Peter Eisentraut  wrote:

> Whoever commits these should be sure to coordinate this.

Thanks for the reminder.  I have this patchset, the OCSP stapling patchset and
the multiple-cert one on my radar for when I do the 1.1.0 removal work in the
July CF. Will do another scan for SSL related patches in flight at the time.

--
Daniel Gustafsson





Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-05-19 Thread Peter Eisentraut

On 19.05.24 20:07, David Benjamin wrote:
On Sun, May 19, 2024 at 12:21 PM Tom Lane > wrote:


Per the cfbot [1], this patch needs a rebase over the ALPN-related
commits.  It still isn't likely to get human attention before the
July commitfest, but you can save some time by making sure it's
in a reviewable state before that.


Rebased version attached. (The conflict was pretty trivial. Both patches 
add a field to some struct.)


Note that there is a concurrent plan to drop support for OpenSSL older 
than 1.1.1 [0], which would obsolete your configure test for 
BIO_get_data.  Whoever commits these should be sure to coordinate this.



[0]: https://www.postgresql.org/message-id/flat/zg3jnursg69dz...@paquier.xyz





Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-05-19 Thread David Benjamin
On Sun, May 19, 2024 at 12:21 PM Tom Lane  wrote:

> Per the cfbot [1], this patch needs a rebase over the ALPN-related
> commits.  It still isn't likely to get human attention before the
> July commitfest, but you can save some time by making sure it's
> in a reviewable state before that.
>

Rebased version attached. (The conflict was pretty trivial. Both patches
add a field to some struct.)

David
From d949ff826aed2a7a9107be4b166fd48bcae38227 Mon Sep 17 00:00:00 2001
From: David Benjamin 
Date: Sun, 11 Feb 2024 10:42:25 -0500
Subject: [PATCH] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

As this is no longer related to BIO_s_socket or calling SSL_set_fd, I've
renamed the methods to reference the PGconn and Port types instead.
---
 configure|   2 +-
 configure.ac |   2 +-
 meson.build  |   1 +
 src/backend/libpq/be-secure-openssl.c| 106 +++
 src/include/libpq/libpq-be.h |   1 +
 src/include/pg_config.h.in   |   3 +
 src/interfaces/libpq/fe-secure-openssl.c |  94 +---
 src/interfaces/libpq/libpq-int.h |   1 +
 8 files changed, 140 insertions(+), 70 deletions(-)

diff --git a/configure b/configure
index 8e7704d54b..538975530f 100755
--- a/configure
+++ b/configure
@@ -12564,7 +12564,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index c7322e292c..4e34539ea2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1352,7 +1352,7 @@ if test "$with_ssl" = openssl ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/meson.build b/meson.build
index 1c0579d5a6..853e1aa9f8 100644
--- a/meson.build
+++ b/meson.build
@@ -1281,6 +1281,7 @@ if sslopt in ['auto', 'openssl']
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
   ['OPENSSL_init_ssl'],
+  ['BIO_get_data'],
   ['BIO_meth_new'],
   ['ASN1_STRING_get0_data'],
   ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/b

Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-05-19 Thread Tom Lane
Daniel Gustafsson  writes:
> On 19 Apr 2024, at 15:13, David Benjamin  wrote:
>> Circling back here, anything else needed from my end on this patch?

> Patience mostly, v17 very recently entered feature freeze so a lof of the
> developers are busy finding and fixing bugs to stabilize the
> release.

Per the cfbot [1], this patch needs a rebase over the ALPN-related
commits.  It still isn't likely to get human attention before the
July commitfest, but you can save some time by making sure it's
in a reviewable state before that.

regards, tom lane

[1] http://commitfest.cputube.org/david-benjamin.html




Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-04-19 Thread Daniel Gustafsson
> On 19 Apr 2024, at 15:13, David Benjamin  wrote:
> 
> Circling back here, anything else needed from my end on this patch?

Patience mostly, v17 very recently entered feature freeze so a lof of the
developers are busy finding and fixing bugs to stabilize the release.  When the
window for new features opens again I am sure this patch will get reviews (I
have it on my personal TODO and hope to get to it).

--
Daniel Gustafsson





Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-04-19 Thread David Benjamin
Circling back here, anything else needed from my end on this patch?

On Wed, Feb 21, 2024, 17:04 David Benjamin  wrote:

> Thanks for the very thorough comments! I've attached a new version of the
> patch.
>
> On Thu, Feb 15, 2024 at 4:17 PM Andres Freund  wrote:
>
>> Hi,
>>
>> On 2024-02-11 13:19:00 -0500, David Benjamin wrote:
>> > I've attached a patch for the master branch to fix up the custom BIOs
>> used
>> > by PostgreSQL, in light of the issues with the OpenSSL update recently.
>> > While c82207a548db47623a2bfa2447babdaa630302b9 (switching from
>> BIO_get_data
>> > to BIO_get_app_data) resolved the immediate conflict, I don't think it
>> > addressed the root cause, which is that PostgreSQL was mixing up two BIO
>> > implementations, each of which expected to be driving the BIO.
>>
>> Yea, that's certainly not nice - and I think we've been somewhat lucky it
>> hasn't caused more issues. There's some nasty possibilities, e.g.
>> sock_ctrl()
>> partially enabling ktls without our initialization having called
>> ktls_enable(). Right now that just means ktls is unusable, but it's not
>> hard
>> to imagine accidentally ending up sending unencrypted data.
>>
>
> Yeah. Even if, say, the ktls bits work, given you all care enough about
> I/O to have wanted to wrap the BIO, I assume you'd want to pick up those
> features on your own terms, e.g. by implementing the BIO_CTRLs yourself.
>
>
>> I've in the past looked into not using a custom BIO [1], but I have my
>> doubts
>> about that being a good idea. I think medium term we want to be able to do
>> network IO asynchronously, which seems quite hard to do when using
>> openssl's
>> socket BIO.
>
>
>
> > Once we've done that, we're free to use BIO_set_data. While
>> BIO_set_app_data
>> > works fine, I've reverted back to BIO_set_data because it's more
>> commonly used.
>> > app_data depends on OpenSSL's "ex_data" mechanism, which is a tad
>> heavier under
>> > the hood.
>>
>> At first I was a bit wary of that, because it requires us to bring back
>> the
>> fallback implementation. But you're right, it's noticeably heavier than
>> BIO_get_data(), and we do call BIO_get_app_data() fairly frequently.
>>
>
> TBH, I doubt it makes any real difference perf-wise. But I think
> BIO_get_data is a bit more common in this context.
>
>
>> > That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
>> > operation that matters is BIO_CTRL_FLUSH, which is implemented as a
>> no-op. All
>> > other operations are unused. It's once again good that they're unused
>> because
>> > otherwise OpenSSL might mess with postgres's socket, break the
>> assumptions
>> > around interrupt handling, etc.
>>
>> How did you determine that only FLUSH is required? I didn't even really
>> find
>> documentation about what the intended semantics actually are.
>>
>
> The unhelpful answer is that my day job is working on BoringSSL, so I've
> spent a lot of time with this mess. :-) But, yeah, it's not well-documented
> at all. OpenSSL ends up calling BIO_flush at the end of each batch of
> writes in libssl. TBH, I suspect that was less intentional and more an
> emergent property of them internally layering a buffer BIO at one point in
> the process, but it's long been part of the (undocumented) API contract.
> Conversely, I don't think OpenSSL can possibly make libssl *require* a
> new BIO_CTRL because they'd break every custom BIO anyone has ever used
> with the library.
>
>
>> E.g. should we implement BIO_CTRL_EOF? Sure, it wasn't really supported so
>> far, because we never set it, but is that right?
>
>
> Ah hmm, BIO_CTRL_EOF is... a bit of a mess. OpenSSL kind of messed things
> up. So, up until recently, I would have said that BIO_CTRL_EOF was not part
> of the interface here. OpenSSL 1.0.x did not implement it for sockets, and
> the BIO_read API *already* had a way to signal EOF: did you return zero
> or -1?
>
> Then OpenSSL 1.1.x introduced size_t-based BIO_read_ex APIs. However, in
> the process, they *forgot that EOF and error are different things* and
> made it impossible to detect EOFs if you use BIO_read_ex! They never
> noticed this, because they didn't actually convert their own code to their
> new API. See this discussion, which alas ended with OpenSSL deciding to
> ignore the problem and not even document their current interface.
> https://github.com/openssl/openssl/issues/8208
>
> Though they never responded, they seem to have tacitly settled using the
> out-of-band BIO_eof function (which is BIO_CTRL_EOF) as the way to signal
> EOF for BIO_read_ex. This is kind of fiddly, but is at least a well-defined
> option. But the problem is no one's BIO_METHODs, including their own, are
> read_ex-based. They all implement the old read callback. But someone might
> call BIO_read_ex on a read-based BIO_METHOD. IMO, BIO_read_ex should be
> responsible for translating between the two EOF conventions. For example,
> it could automatically set a flag when the read callback returns 0 

Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-22 Thread David Benjamin
tually call set_fd() anymore, which sure seems odd.
>

Done. I wasn't sure what to name them, or even what the naming convention
was, but I opted to name them after the Port and PGconn objects they wrap.
Happy to switch to another name if you have a better idea though.

David
From e29b816dfb7ee5ed9f3389d7f06408f83783258a Mon Sep 17 00:00:00 2001
From: David Benjamin 
Date: Sun, 11 Feb 2024 10:42:25 -0500
Subject: [PATCH] Avoid mixing custom and OpenSSL BIO functions

This addresses the root cause of the BIO_set_data conflict that
c82207a548db47623a2bfa2447babdaa630302b9 attempted to address. That fix was
really just a workaround. The real root cause was that postgres was mixing up
two BIO implementations, each of which expected to be driving the BIO.

Mixing them up did not actually do any good. The Port and PGconn structures
already maintained the file descriptors. The socket BIO's copy, configured via
BIO_set_fd, wasn't even being used because my_BIO_s_socket replaced read and
write anyway. We've been essentially keeping extra state around, and relying
on it being unused:

- gets - Not implemented by sockets and not used by libssl.
- puts - Not used by libssl. If it were, it would break the special SIGPIPE and
  interrupt handling postgres aiming for.
- ctrl - (More on this later)
- create - This is just setting up state that we don't use.
- destroy - This is a no-op because we use BIO_NOCLOSE. In fact it's important
  that it's a no-op because otherwise OpenSSL would close the socket under
  postgres' feet!
- callback_ctrl - Not implemented by sockets.

That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
other operations are unused. It's once again good that they're unused because
otherwise OpenSSL might mess with postgres's socket, break the assumptions
around interrupt handling, etc.

Instead, simply implement a very basic ctrl ourselves and drop the other
functions. This avoids the risk that future OpenSSL upgrades add some feature
to BIO_s_socket's ctrl which conflicts with postgres.

Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
works fine, I've reverted back to BIO_set_data because it's more commonly used.
app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier under
the hood.

As this is no longer related to BIO_s_socket or calling SSL_set_fd, I've
renamed the methods to reference the PGconn and Port types instead.
---
 configure|   2 +-
 configure.ac |   2 +-
 meson.build  |   1 +
 src/backend/libpq/be-secure-openssl.c| 106 +++
 src/include/libpq/libpq-be.h |   1 +
 src/include/pg_config.h.in   |   3 +
 src/interfaces/libpq/fe-secure-openssl.c |  96 +---
 src/interfaces/libpq/libpq-int.h |   1 +
 8 files changed, 141 insertions(+), 71 deletions(-)

diff --git a/configure b/configure
index 6b87e5c9a8..cacda708b0 100755
--- a/configure
+++ b/configure
@@ -12870,7 +12870,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
+  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 6e64ece11d..083bbabfbf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1374,7 +1374,7 @@ if test "$with_ssl" = openssl ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/meson.build b/meson.build
index 8ed51b6aae..ba133b6e11 100644
--- a/meson.build
+++ b/meson.build
@@ -1290,6 +1290,7 @@ if sslopt in ['auto', 'openssl']
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
   ['OPENSSL_init_ssl'],
+  ['BIO_get_data'],
   ['BIO_meth_new'],
   ['ASN1_STRING_get0_data'],
   ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index e12b1cc9e3..05cfb8c5d3 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b

Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-15 Thread Andres Freund
Hi,

On 2024-02-11 13:19:00 -0500, David Benjamin wrote:
> I've attached a patch for the master branch to fix up the custom BIOs used
> by PostgreSQL, in light of the issues with the OpenSSL update recently.
> While c82207a548db47623a2bfa2447babdaa630302b9 (switching from BIO_get_data
> to BIO_get_app_data) resolved the immediate conflict, I don't think it
> addressed the root cause, which is that PostgreSQL was mixing up two BIO
> implementations, each of which expected to be driving the BIO.

Yea, that's certainly not nice - and I think we've been somewhat lucky it
hasn't caused more issues. There's some nasty possibilities, e.g. sock_ctrl()
partially enabling ktls without our initialization having called
ktls_enable(). Right now that just means ktls is unusable, but it's not hard
to imagine accidentally ending up sending unencrypted data.



I've in the past looked into not using a custom BIO [1], but I have my doubts
about that being a good idea. I think medium term we want to be able to do
network IO asynchronously, which seems quite hard to do when using openssl's
socket BIO.



> Once we've done that, we're free to use BIO_set_data. While BIO_set_app_data
> works fine, I've reverted back to BIO_set_data because it's more commonly 
> used.
> app_data depends on OpenSSL's "ex_data" mechanism, which is a tad heavier 
> under
> the hood.

At first I was a bit wary of that, because it requires us to bring back the
fallback implementation. But you're right, it's noticeably heavier than
BIO_get_data(), and we do call BIO_get_app_data() fairly frequently.



> That leaves ctrl. ctrl is a bunch of operations (it's ioctl). The only
> operation that matters is BIO_CTRL_FLUSH, which is implemented as a no-op. All
> other operations are unused. It's once again good that they're unused because
> otherwise OpenSSL might mess with postgres's socket, break the assumptions
> around interrupt handling, etc.

How did you determine that only FLUSH is required? I didn't even really find
documentation about what the intended semantics actually are.

E.g. should we implement BIO_CTRL_EOF? Sure, it wasn't really supported so
far, because we never set it, but is that right? What about
BIO_CTRL_GET_CLOSE/BIO_CTRL_SET_CLOSE?


Another issue is that 0 doesn't actually seem like the universal error return
- e.g. BIO_C_GET_FD seems to return -1, because 0 is a valid fd.


As of your patch the bio doesn't actually have an FD anymore, should it still
set BIO_TYPE_DESCRIPTOR?



> +static long
> +my_sock_ctrl(BIO *h, int cmd, long num, void *ptr)
> +{
> + longres = 0;
> +
> + switch (cmd)
> + {
> + case BIO_CTRL_FLUSH:
> + /* libssl expects all BIOs to support BIO_flush. */
> + res = 1;
> + break;
> + }
> +
> + return res;
> +}

I'd move the res = 0 into a default: block. That way the compiler can warn if
some case doesn't set it in all branches.


>  static BIO_METHOD *
>  my_BIO_s_socket(void)
>  {

Wonder if we should rename this. It's pretty odd that we still call it's not
really related to s_socket anymore, and doesn't even really implement the same
interface (e.g. get_fd doesn't work anymore).  Similarly, my_SSL_set_fd()
doesn't actually call set_fd() anymore, which sure seems odd.


Greetings,

Andres Freund

[1] 
https://www.postgresql.org/message-id/20210715021747.h2hih7mw56ivyntt%40alap3.anarazel.de




Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-15 Thread Daniel Gustafsson
> On 15 Feb 2024, at 04:58, David Benjamin  wrote:
> 
> By the way, I'm unable to add the patch to the next commitfest due to the 
> cool off period for new accounts. How long is that period? I don't suppose 
> there's a way to avoid it?

There is a way to expedite the cooling-off period (it's a SPAM prevention
measure), but I don't have access to it.  In the meantime I've added the patch
for you, and once the cooling off is over we can add your name as author.

https://commitfest.postgresql.org/47/4835/

I'm currently reviewing it and will get back to you soon on that.

--
Daniel Gustafsson





Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-15 Thread David Benjamin
By the way, I'm unable to add the patch to the next commitfest due to the
cool off period for new accounts. How long is that period? I don't suppose
there's a way to avoid it?

On Mon, Feb 12, 2024 at 11:31 AM David Benjamin  wrote:

> On Mon, Feb 12, 2024 at 9:38 AM Daniel Gustafsson  wrote:
>
>> > On 11 Feb 2024, at 19:19, David Benjamin  wrote:
>> > It turns out the parts that came from the OpenSSL socket BIO were a
>> no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's
>> cleaner to just define a custom BIO the normal way, which then leaves the
>> more standard BIO_get_data mechanism usable. This also avoids the risk that
>> a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
>> calling into it, and then break some assumptions made by PostgreSQL.
>>
>> +   case BIO_CTRL_FLUSH:
>> +   /* libssl expects all BIOs to support BIO_flush.
>> */
>> +   res = 1;
>> +   break;
>>
>> Will this always be true?  Returning 1 implies that we have flushed all
>> data on
>> the socket, but what if we just before called BIO_set_retry..XX()?
>>
>
> The real one is also just a no-op. :-)
> https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c#L215
>
> This is used in places like buffer BIO or the FILE* BIO, where BIO_write
> might accept data, but stage it into a buffer, to be flushed later. libssl
> ends up calling BIO_flush at the end of every flight, which in turn means
> that BIOs used with libssl need to support it, even if to return true
> because there's nothing to flush. (Arguably TCP sockets could have used a
> flush concept, to help control Nagle's algorithm, but for better or worse,
> that's a socket-wide TCP_NODELAY option, rather than an explicit flush
> call.)
>
> BIO_set_retry.. behaves like POSIX I/O, where a failed EWOULDBLOCK write
> is as if you never wrote to the socket at all and doesn't impact
> socket state. That is, the data hasn't been accepted yet. It's not expected
> for BIO_flush to care about the rejected write data. (Also I don't believe
> libssl will ever trigger this case.) It's confusing because unlike an
> EWOULDBLOCK errno, BIO_set_retry.. *is* itself BIO state, but that's just
> because the BIO calling convention is goofy and didn't just return the
> error out of the return value. So OpenSSL just stashes the bit on the BIO
> itself, for you to query out immediately afterwards.
>
>
>> > I've attached a patch which does that. The existing SSL tests pass with
>> it, tested on Debian stable. (Though it took me a few iterations to figure
>> out how to run the SSL tests, so it's possible I've missed something.)
>>
>> We've done a fair bit of work on making them easier to run, so I'm
>> curious if
>> you saw any room for improvements there as someone coming to them for the
>> first
>> time?
>>
>
>  A lot of my time was just trying to figure out how to run the tests in
> the first place, so perhaps documentation? But I may just have been looking
> in the wrong spot and honestly didn't really know what I was doing. I can
> try to summarize what I did (from memory), and perhaps that can point to
> possible improvements?
>
> - I looked in the repository for instructions on running the tests and
> couldn't find any. At this point, I hadn't found src/test/README.
> - I found
> https://wiki.postgresql.org/wiki/Developer_FAQ#How_do_I_test_my_changes.3F,
> linked from https://www.postgresql.org/developer/
> - I ran the configure build with --enable-cassert, ran make check, tests
> passed.
> - I wrote my patch and then spent a while intentionally adding bugs to see
> if the tests would catch it (I wasn't sure whether there was ssl test
> coverage), finally concluding that I wasn't running any ssl tests
> - I looked some more and found src/test/ssl/README
> - I reconfigured with --enable-tap-tests and ran make check
> PG_TEST_EXTRA=ssl per those instructions, but the SSL tests still weren't
> running
> - I grepped for PG_TEST_EXTRA and found references in the CI config, but
> using the meson build
> - I installed meson, mimicked a few commands from the CI. That seemed to
> work.
> - I tried running only the ssl tests, looking up how you specify
> individual tests in meson, to make my compile/test cycles a bit faster, but
> they failed.
> - I noticed that the first couple "tests" were named like setup tasks, and
> guessed that the ssl tests depended on this setup to run. But by then I
> just gave up and waited out the whole test suite per run. :-)
>
> Once I got it running, it was quite smooth. I just wasn't sure how to do
> it.
>
> David
>


Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-13 Thread David Benjamin
On Mon, Feb 12, 2024 at 9:38 AM Daniel Gustafsson  wrote:

> > On 11 Feb 2024, at 19:19, David Benjamin  wrote:
> > It turns out the parts that came from the OpenSSL socket BIO were a
> no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's
> cleaner to just define a custom BIO the normal way, which then leaves the
> more standard BIO_get_data mechanism usable. This also avoids the risk that
> a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
> calling into it, and then break some assumptions made by PostgreSQL.
>
> +   case BIO_CTRL_FLUSH:
> +   /* libssl expects all BIOs to support BIO_flush. */
> +   res = 1;
> +   break;
>
> Will this always be true?  Returning 1 implies that we have flushed all
> data on
> the socket, but what if we just before called BIO_set_retry..XX()?
>

The real one is also just a no-op. :-)
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c#L215

This is used in places like buffer BIO or the FILE* BIO, where BIO_write
might accept data, but stage it into a buffer, to be flushed later. libssl
ends up calling BIO_flush at the end of every flight, which in turn means
that BIOs used with libssl need to support it, even if to return true
because there's nothing to flush. (Arguably TCP sockets could have used a
flush concept, to help control Nagle's algorithm, but for better or worse,
that's a socket-wide TCP_NODELAY option, rather than an explicit flush
call.)

BIO_set_retry.. behaves like POSIX I/O, where a failed EWOULDBLOCK write is
as if you never wrote to the socket at all and doesn't impact socket state.
That is, the data hasn't been accepted yet. It's not expected for BIO_flush
to care about the rejected write data. (Also I don't believe libssl will
ever trigger this case.) It's confusing because unlike an EWOULDBLOCK
errno, BIO_set_retry.. *is* itself BIO state, but that's just because the
BIO calling convention is goofy and didn't just return the error out of the
return value. So OpenSSL just stashes the bit on the BIO itself, for you to
query out immediately afterwards.


> > I've attached a patch which does that. The existing SSL tests pass with
> it, tested on Debian stable. (Though it took me a few iterations to figure
> out how to run the SSL tests, so it's possible I've missed something.)
>
> We've done a fair bit of work on making them easier to run, so I'm curious
> if
> you saw any room for improvements there as someone coming to them for the
> first
> time?
>

 A lot of my time was just trying to figure out how to run the tests in the
first place, so perhaps documentation? But I may just have been looking in
the wrong spot and honestly didn't really know what I was doing. I can try
to summarize what I did (from memory), and perhaps that can point to
possible improvements?

- I looked in the repository for instructions on running the tests and
couldn't find any. At this point, I hadn't found src/test/README.
- I found
https://wiki.postgresql.org/wiki/Developer_FAQ#How_do_I_test_my_changes.3F,
linked from https://www.postgresql.org/developer/
- I ran the configure build with --enable-cassert, ran make check, tests
passed.
- I wrote my patch and then spent a while intentionally adding bugs to see
if the tests would catch it (I wasn't sure whether there was ssl test
coverage), finally concluding that I wasn't running any ssl tests
- I looked some more and found src/test/ssl/README
- I reconfigured with --enable-tap-tests and ran make check
PG_TEST_EXTRA=ssl per those instructions, but the SSL tests still weren't
running
- I grepped for PG_TEST_EXTRA and found references in the CI config, but
using the meson build
- I installed meson, mimicked a few commands from the CI. That seemed to
work.
- I tried running only the ssl tests, looking up how you specify individual
tests in meson, to make my compile/test cycles a bit faster, but they
failed.
- I noticed that the first couple "tests" were named like setup tasks, and
guessed that the ssl tests depended on this setup to run. But by then I
just gave up and waited out the whole test suite per run. :-)

Once I got it running, it was quite smooth. I just wasn't sure how to do it.

David


Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-12 Thread Daniel Gustafsson
> On 11 Feb 2024, at 19:19, David Benjamin  wrote:
> 
> Hi all,
> 
> I've attached a patch for the master branch to fix up the custom BIOs used by 
> PostgreSQL, in light of the issues with the OpenSSL update recently. While 
> c82207a548db47623a2bfa2447babdaa630302b9 (switching from BIO_get_data to 
> BIO_get_app_data) resolved the immediate conflict, I don't think it addressed 
> the root cause, which is that PostgreSQL was mixing up two BIO 
> implementations, each of which expected to be driving the BIO.

Thanks for your patch, I'm still digesting it so will provide more of a review
later.

> It turns out the parts that came from the OpenSSL socket BIO were a no-op, 
> and in fact PostgreSQL is relying on it being a no-op. Instead, it's cleaner 
> to just define a custom BIO the normal way, which then leaves the more 
> standard BIO_get_data mechanism usable. This also avoids the risk that a 
> future OpenSSL will add a now BIO_ctrl to the socket type, with libssl 
> calling into it, and then break some assumptions made by PostgreSQL.

+   case BIO_CTRL_FLUSH:
+   /* libssl expects all BIOs to support BIO_flush. */
+   res = 1;
+   break;

Will this always be true?  Returning 1 implies that we have flushed all data on
the socket, but what if we just before called BIO_set_retry..XX()?

> I've attached a patch which does that. The existing SSL tests pass with it, 
> tested on Debian stable. (Though it took me a few iterations to figure out 
> how to run the SSL tests, so it's possible I've missed something.)

We've done a fair bit of work on making them easier to run, so I'm curious if
you saw any room for improvements there as someone coming to them for the first
time?

--
Daniel Gustafsson





[PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-12 Thread David Benjamin
Hi all,

I've attached a patch for the master branch to fix up the custom BIOs used
by PostgreSQL, in light of the issues with the OpenSSL update recently.
While c82207a548db47623a2bfa2447babdaa630302b9 (switching from BIO_get_data
to BIO_get_app_data) resolved the immediate conflict, I don't think it
addressed the root cause, which is that PostgreSQL was mixing up two BIO
implementations, each of which expected to be driving the BIO.

It turns out the parts that came from the OpenSSL socket BIO were a no-op,
and in fact PostgreSQL is relying on it being a no-op. Instead, it's
cleaner to just define a custom BIO the normal way, which then leaves the
more standard BIO_get_data mechanism usable. This also avoids the risk that
a future OpenSSL will add a now BIO_ctrl to the socket type, with libssl
calling into it, and then break some assumptions made by PostgreSQL.

I've attached a patch which does that. The existing SSL tests pass with
it, tested on Debian stable. (Though it took me a few iterations to figure
out how to run the SSL tests, so it's possible I've missed something.)

The patch is not expected to change behavior, so nothing new to document,
nor any expected performance impact.

David


avoid-mixing-custom-and-openssl-bios-v1.patch
Description: Binary data