Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-22 Thread Thorsten Habich


On 8/22/2020 7:02 AM, Viktor Dukhovni wrote:
> On Fri, Aug 21, 2020 at 05:38:42PM -0400, Wietse Venema wrote:
>
>> thorsten.hab...@findichgut.net:
>>> Any chance to backport the patch to 3.4/3.5?
>> This is more change than is allowed in a stable release. Postfix
>> 3.6 drops support for OpenSSL < 1.1.1, deletes o(thousand) lines
>> of DANE support from the Postfix TLS library, and replaces it with
>> o(hundred) lines to use instead the DANE support in OpenSSL.
> The backport request was just for the one-liner fix in posttls-finger,
> where "-X" no longer falsely conflicts with "-r" (when no "-r" is
> in fact specified).  This should/will likely be backported.

Yes, sorry.  I removed the patch I posted earlier in my message. Because
I am not sure that it's correct:

--- a/src/posttls-finger/posttls-finger.c   2019-02-12
14:17:45.0 +0100
+++ b/src/posttls-finger/posttls-finger.c.new   2020-08-21
09:15:04.256945675 +0200
@@ -1988,7 +1988,7 @@
    msg_fatal("bad '-a' option value: %s", state->options.addr_pref);

 #ifdef USE_TLS
-    if (state->tlsproxy_mode && state->reconnect)
+    if (state->tlsproxy_mode && state->reconnect > 0)
    msg_fatal("The -X and -r options are mutually exclusive");
 #endif

while it's  state->reconnect >= 0  in the 3.6 snapshot. I checked the source of 
the 3.5.6 posttls-finger.c again and I am still not sure which version should 
be used for 3.5.




Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 05:38:42PM -0400, Wietse Venema wrote:

> thorsten.hab...@findichgut.net:
> > Any chance to backport the patch to 3.4/3.5?
> 
> This is more change than is allowed in a stable release. Postfix
> 3.6 drops support for OpenSSL < 1.1.1, deletes o(thousand) lines
> of DANE support from the Postfix TLS library, and replaces it with
> o(hundred) lines to use instead the DANE support in OpenSSL.

The backport request was just for the one-liner fix in posttls-finger,
where "-X" no longer falsely conflicts with "-r" (when no "-r" is
in fact specified).  This should/will likely be backported.

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Wietse Venema
Viktor Dukhovni:
> On Fri, Aug 21, 2020 at 03:11:50PM -0400, Wietse Venema wrote:
> 
> > Viktor Dukhovni:
> > > On Fri, Aug 21, 2020 at 10:59:11AM -0400, Wietse Venema wrote:
> > > 
> > > > > Viktor Dukhovni:
> > > > > > -   &_DANE_BASED(state->client_start_props->tls_level))
> > > > > > +   && TLS_DANE_HASTA(state->client_start_props->dane))
> > > > > > msg_warn("%s: DANE requested, but not available",
> > > > > >  state->client_start_props->namaddr);
> > > > 
> > > > Should there be a warning when tls_dane_avail() fails AND the
> > > > TLS_DANE_BASED is true?
> > > 
> > > Not needed if TLS_DANE_HASTA is not true, because:
> > 
> > In that case, can you can suggest a more appropriate warning message?
> > The text no longer matches the error condition.
> 
> Fair point.  The warning message could/should read:
> 
>   msg_warn("%s: DANE or local trust anchor based chain"
>  " verification requested, but not available",
>state->client_start_props->namaddr);

DANE verification requested? This condition triggers when 
the SMTP client (or posttls-finger) specifies an explicit trust
anchor. They do not request DANE.

Are you saying that the condition can also trigger when the
SMTP client (or posttls-finger) tries to enforce DANE?

Wietse

Wietse


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
> On Aug 21, 2020, at 5:21 PM, thorsten.hab...@findichgut.net wrote:
> 
> By the way I already applied your last patch on the testing environment.
> No problems found so far. tafile and CApath based mandatory TLS delivery
> work just fine.

Thanks for the confirmation.  Fortunately, the good news is not surprising,
the reason for the intermittent (more failure than success) problem you
were having, and only in tlsproxy(8) is clear from the patch.  The wrong
TLS SSL_CTX was selected for "tafile" connections, it was shared with
normal WebPKI connections which raced the "tafile" connections to set
the correct verification callback.

With the symptoms fitting the bug so well, the confirmation is more of
a formality, but still good to have.   Sorry it took a while to get here,
but the early messages in the thread had me focused on resumption, rather
than the initial verification failure, which was the real problem.

-- 
Viktor.



Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Wietse Venema
thorsten.hab...@findichgut.net:
> Any chance to backport the patch to 3.4/3.5?

This is more change than is allowed in a stable release. Postfix
3.6 drops support for OpenSSL < 1.1.1, deletes o(thousand) lines
of DANE support from the Postfix TLS library, and replaces it with
o(hundred) lines to use instead the DANE support in OpenSSL.

> By the way I already applied your last patch on the testing
> environment. No problems found so far. tafile and CApath based
> mandatory TLS delivery work just fine.

The bug was a race condition between concurrent TLS handshakes that
were updating the same callback hook in global state, therefore it
only happened with concurrent deliveries to different sites.

Wietse


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 03:11:50PM -0400, Wietse Venema wrote:

> Viktor Dukhovni:
> > On Fri, Aug 21, 2020 at 10:59:11AM -0400, Wietse Venema wrote:
> > 
> > > > Viktor Dukhovni:
> > > > > - &_DANE_BASED(state->client_start_props->tls_level))
> > > > > + && TLS_DANE_HASTA(state->client_start_props->dane))
> > > > >   msg_warn("%s: DANE requested, but not available",
> > > > >state->client_start_props->namaddr);
> > > 
> > > Should there be a warning when tls_dane_avail() fails AND the
> > > TLS_DANE_BASED is true?
> > 
> > Not needed if TLS_DANE_HASTA is not true, because:
> 
> In that case, can you can suggest a more appropriate warning message?
> The text no longer matches the error condition.

Fair point.  The warning message could/should read:

msg_warn("%s: DANE or local trust anchor based chain"
 " verification requested, but not available",
 state->client_start_props->namaddr);

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Wietse Venema
Viktor Dukhovni:
> On Fri, Aug 21, 2020 at 10:59:11AM -0400, Wietse Venema wrote:
> 
> > > Viktor Dukhovni:
> > > > -   &_DANE_BASED(state->client_start_props->tls_level))
> > > > +   && TLS_DANE_HASTA(state->client_start_props->dane))
> > > > msg_warn("%s: DANE requested, but not available",
> > > >  state->client_start_props->namaddr);
> > 
> > Should there be a warning when tls_dane_avail() fails AND the
> > TLS_DANE_BASED is true?
> 
> Not needed if TLS_DANE_HASTA is not true, because:

In that case, can you can suggest a more appropriate warning message?
The text no longer matches the error condition.

Wietse


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 10:59:11AM -0400, Wietse Venema wrote:

> > Viktor Dukhovni:
> > > - &_DANE_BASED(state->client_start_props->tls_level))
> > > + && TLS_DANE_HASTA(state->client_start_props->dane))
> > >   msg_warn("%s: DANE requested, but not available",
> > >state->client_start_props->namaddr);
> 
> Should there be a warning when tls_dane_avail() fails AND the
> TLS_DANE_BASED is true?

Not needed if TLS_DANE_HASTA is not true, because:

- For a DANE-based policy without DANE-TA TLSA RRs to have
  made it to tlsproxy(8), all the DNS preconditions have
  already been satisfied, and DANE-EE TLSA records have
  been provided to tlsproxy(8).

- In Postfix 2.11–3.5, DANE-EE checks are performed post-handshake.
  Specifically, in particular the DANE-style X.509 verification
  callback is not needed and is not enabled (is set back to
  WebPKI default) on the shared SSL_CTX.

> Would the following be more correct:
> 
>int missing_infrastructure = 0;
> if (!tls_dane_avail()) {  /* mandatory side effects!! */
>   /* True DANE request. */
>   if (TLS_DANE_BASED(state->client_start_props->tls_level)) {
>   msg_warn("%s: DANE requested, but not available",
>state->client_start_props->namaddr);
>   missing_infrastructure = 1;
>   }
>   /* Not DANE, but TA support implicitly dependss on the DANE stack. */
>   else if (TLS_DANE_HASTA(state->client_start_props->dane)) {
>   msg_warn("%s: TA support requested, but DANE is not available",
>state->client_start_props->namaddr);
>   missing_infrastructure = 1;
>   }
> )
> if (missing_infrastructure == 0)
> state->tls_context = tls_client_start(state->client_start_props);

No, because DANE-EE works without tls_dane_avail.  We just check the
certificate fingerprint post-handshake.

> But wait, there is more...
> 
> > >   state->appl_state = tlsp_client_init(state->tls_params,
> > >state->client_init_props,
> > > -   TLS_DANE_BASED(state->client_start_props->tls_level));
> > > +   TLS_DANE_HASTA(state->client_start_props->dane));
> 
> Will this also use the right verify callback function pointer when
> real DANE is requested? Or does real DANE not use those same
> callbacks?

In Postfix 2.11–3.5, the real DANE-EE does not use the custom
DANE-specific X.509 verification callbacks, they're only needed
for DANE-TA verification.

The dichotomy was between WebPKI-style chain verification and DANE-TA
(or "tafile") chain verification.  When all the TLSA records are
DANE-EE, no chain verification is performed, we just do a post-handshake
fingerprint check (supporting both DANE-EE and the "fingerprint"
security level).

All this simplified in Postfix 3.6.  Speaking of which, upgrades from
3.5 to 3.6 need to perform a "reload", to flush the TLS session cache.
Otherwise, some cached sessions returned by the "old" tlsmgr(8) to
new smtp(8) may have the wrong properties.  One way to avoid this is
to include "mail_version" in the session lookup key hash.  We should
probably do that...

-- 
VIktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Wietse Venema
I have more questions.

Wietse Venema:
> Viktor Dukhovni:
> >  state->client_start_props->fd = state->ciphertext_fd;
> >  /* These predicates and warning belong inside tls_client_start(). */
> >  if (!tls_dane_avail()  /* mandatory side effects!! */
> > -   &_DANE_BASED(state->client_start_props->tls_level))
> > +   && TLS_DANE_HASTA(state->client_start_props->dane))
> > msg_warn("%s: DANE requested, but not available",
> >  state->client_start_props->namaddr);

Should there be a warning when tls_dane_avail() fails AND the
TLS_DANE_BASED is true?

Would the following be more correct:

   int missing_infrastructure = 0;
if (!tls_dane_avail()) {/* mandatory side effects!! */
/* True DANE request. */
if (TLS_DANE_BASED(state->client_start_props->tls_level)) {
msg_warn("%s: DANE requested, but not available",
 state->client_start_props->namaddr);
missing_infrastructure = 1;
}
/* Not DANE, but TA support implicitly dependss on the DANE stack. */
else if (TLS_DANE_HASTA(state->client_start_props->dane)) {
msg_warn("%s: TA support requested, but DANE is not available",
 state->client_start_props->namaddr);
missing_infrastructure = 1;
}
)
if (missing_infrastructure == 0)
state->tls_context = tls_client_start(state->client_start_props);

Sorry that the above looks like "my first program" but it is the
best I can do given the libtls API semantics.

But wait, there is more...

> > state->appl_state = tlsp_client_init(state->tls_params,
> >  state->client_init_props,
> > - TLS_DANE_BASED(state->client_start_props->tls_level));
> > + TLS_DANE_HASTA(state->client_start_props->dane));

Will this also use the right verify callback function pointer when
real DANE is requested? Or does real DANE not use those same
callbacks?

Wietse


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-21 Thread Viktor Dukhovni
On Fri, Aug 21, 2020 at 10:32:10AM +0300, Thorsten Habich wrote:

> > This is relevant, but probably not 100% accurate, likely some domains
> > also intermittently failed routine CAfile-based validation.
> 
> Thanks for the patch.  There was no higher number of certificate
> verification failures since I updated to Postfix 3.5.4.
> 
> Checking the logs of the last 8 days, there is only one "Server
> certificate not trusted" error for a CApath-based configuration and 348
> errors for TAfile based configurations.
> The number of *CApath* (if that's relevant) based mandatory TLS
> configurations per destination is currently FAR higher than the tafile
> based configurations.

This is expected, given that most destinations are not using "tafile",
the probability of failing a CApath validation during a concurrent
successful "tafile" delivery is low, but not zero.  If that one
validation failure was transient (the destination otherwise verified
before and after) then that could be the one low probability case.

> Hope this patch is suitable:
> 
> --- a/src/posttls-finger/posttls-finger.c   2019-02-12
> 14:17:45.0 +0100
> +++ b/src/posttls-finger/posttls-finger.c.new   2020-08-21
> 09:15:04.256945675 +0200
> @@ -1988,7 +1988,7 @@
>     msg_fatal("bad '-a' option value: %s", state->options.addr_pref);
> 
>  #ifdef USE_TLS
> -    if (state->tlsproxy_mode && state->reconnect)
> +    if (state->tlsproxy_mode && state->reconnect > 0)
>     msg_fatal("The -X and -r options are mutually exclusive");
>  #endif

This has already been fixed in the 3.6 snapshots.  Don't recall whether
that's been backported to 3.4/3.5.

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-20 Thread Wietse Venema
Viktor Dukhovni:
> On Thu, Aug 20, 2020 at 01:20:00PM -0400, Wietse Venema wrote:
> 
> > Viktor Dukhovni:
> >
> > > - &_DANE_BASED(state->client_start_props->tls_level))
> > > + && TLS_DANE_HASTA(state->client_start_props->dane))
> > > @@ -1427,7 +1427,7 @@ static void tlsp_get_request_event(int event, void 
> > > *context)
> > > -   TLS_DANE_BASED(state->client_start_props->tls_level));
> > > +   TLS_DANE_HASTA(state->client_start_props->dane));
> > 
> > This looks weird. I thought that the problem was with trust anchors, not 
> > DANE?
> 
> Yes, the problem is with trust anchors, but DANE is the general case of:
> 
> * Policy-based end-entity cert matching:
> 
> - DANE "_25._tcp.example.net. IN TLSA 3 ? ? ..." 
> 
> - The Postfix "fingerprint" security level
> 
> * Policy-based issuer CA cert matching:
> 
> - DANE "_25._tcp.example.net. IN TLSA 2 ? ? ..." 
> 
> - The Postfix verify/secure levels with a custom per-site
>   "tafile" .
> 
> Actual DANE TLSA RRsets can have either or both DANE-EE or DANE-TA
> records, with verification ultimately matching either or both.  The
> "fingerprint" level is mapped to DANE-EE, while "tafile" support is
> mapped to DANE-TA.
> 
> Thus actual DANE, fingerprint and secure/verify with a "tafile" are all
> handled via the "general case" of "some sort of DANE-like policy".
> 
> In Postfix 3.6, the job of validating "some sort DANE-like policy" is
> entirely delegated to OpenSSL.  You'll be pleased to know, that in
> Postfix 3.6 the TLS_DANE_HASTA() and TLS_DANE_HASEE() macros are gone.
> We no longer need to treat the various DANE-like matching differently.

As discussed offlist, I would have structured the code in a different
manner, such that trust-anchor support does not call into the DANE
stack, but DANE and trust anchors are entirely separate features that
call into a common infrastructure.

In any case most of that code is gone with Postfix 3.6.

Wietse


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-20 Thread Viktor Dukhovni
On Thu, Aug 20, 2020 at 01:20:00PM -0400, Wietse Venema wrote:

> Viktor Dukhovni:
>
> > -   &_DANE_BASED(state->client_start_props->tls_level))
> > +   && TLS_DANE_HASTA(state->client_start_props->dane))
> > @@ -1427,7 +1427,7 @@ static void tlsp_get_request_event(int event, void 
> > *context)
> > - TLS_DANE_BASED(state->client_start_props->tls_level));
> > + TLS_DANE_HASTA(state->client_start_props->dane));
> 
> This looks weird. I thought that the problem was with trust anchors, not DANE?

Yes, the problem is with trust anchors, but DANE is the general case of:

* Policy-based end-entity cert matching:

- DANE "_25._tcp.example.net. IN TLSA 3 ? ? ..." 

- The Postfix "fingerprint" security level

* Policy-based issuer CA cert matching:

- DANE "_25._tcp.example.net. IN TLSA 2 ? ? ..." 

- The Postfix verify/secure levels with a custom per-site
  "tafile" .

Actual DANE TLSA RRsets can have either or both DANE-EE or DANE-TA
records, with verification ultimately matching either or both.  The
"fingerprint" level is mapped to DANE-EE, while "tafile" support is
mapped to DANE-TA.

Thus actual DANE, fingerprint and secure/verify with a "tafile" are all
handled via the "general case" of "some sort of DANE-like policy".

In Postfix 3.6, the job of validating "some sort DANE-like policy" is
entirely delegated to OpenSSL.  You'll be pleased to know, that in
Postfix 3.6 the TLS_DANE_HASTA() and TLS_DANE_HASEE() macros are gone.
We no longer need to treat the various DANE-like matching differently.

-- 
Viktor.


Re: PATCH #3 (Postfix 3.4 + 3.5): TLS connection_reuse with "tafile"

2020-08-20 Thread Wietse Venema
Viktor Dukhovni:
>  state->client_start_props->fd = state->ciphertext_fd;
>  /* These predicates and warning belong inside tls_client_start(). */
>  if (!tls_dane_avail()/* mandatory side effects!! */
> - &_DANE_BASED(state->client_start_props->tls_level))
> + && TLS_DANE_HASTA(state->client_start_props->dane))
>   msg_warn("%s: DANE requested, but not available",
>state->client_start_props->namaddr);
>  else
> @@ -1427,7 +1427,7 @@ static void tlsp_get_request_event(int event, void 
> *context)
>   }
>   state->appl_state = tlsp_client_init(state->tls_params,
>state->client_init_props,
> -   TLS_DANE_BASED(state->client_start_props->tls_level));
> +   TLS_DANE_HASTA(state->client_start_props->dane));
>   ready = state->appl_state != 0;
>   break;
>  case TLS_PROXY_FLAG_ROLE_SERVER:

This looks weird. I thought that the problem was with trust anchors, not DANE?

Wietse