On Jun 19, 2018, at 2:38 PM, Wietse Venema <wie...@porcupine.org> wrote: > > It would not crash, but I don't think it would help. > > First, the scache is indexed with keys that include the TLS security > level for a connection, so that we will never reuse a low-security > connection to deliver mail for a high-security destination.
The problem is that with DANE we don't know the security level until we're working with a particular MX host. Or alternatively, we should save the original "DANE candidate" level for recording in the session cache for nexthop entries, and then use the actual level and TLS properties for the per-address entries. IIRC the cache key also includes the delivery agent name, so we never mix delivery agents with different CA stores, TLS policy tables, ... In which case when looking for some connection to the nexthop, I think it could be simpler to assume that if the connection match the policy when it was recently created (we have a connection re-use TTL) then it should still be good enough without salting in the TLS details. Salting in TLS details is of course needed for the per-peer address cache lookups, where the same host might serve multiple domains. > Thus > the lookups need to specify the security level that was in effect > when a connection was stored in the cache. This is not possible with DANE without some further changes, but I am not convinced it is necessary. > Second, it wants to look up the scache for the nexthop and primary > MXes to avoid contacting hosts that we have no cached connection > for. I think that these are the lookups that are failing because > state->iter->rr is not set. Would the patch below help? It would not always result in correct behaviour, the cached nexthop connection might not be associated with the first MX host. So this too would not crash, but may not produce correct results. If, for example, the first MX host has DANE TLSA records that don't match, we presumably (really must) close the connection without caching it, and may make a connection to a backup MX without DANE TLSA RRs (insecure, but that's what we have). But then the nexthop connection's security level does not match the security level of the initial MX. Note also that DANE policy lookup may fail for the initial MX (bad DNSSEC...), but we should still be able to re-use cached connections via the second MX, ... So I don't think the below is correct. One might argue that my patch should simply leave the TLS level alone (rather than set it to "may"), and that the cache entry for the nexthop should reflect that preliminary nexthop security level, before it is potentially modified for MX hosts that lack TLSA records... > *** ./smtp_connect.c- 2018-06-04 19:21:21.000000000 -0400 > --- ./smtp_connect.c 2018-06-19 14:36:32.000000000 -0400 > *************** > *** 672,677 **** > --- 672,678 ---- > * for connection-cache lookup by request nexthop only. > */ > #ifdef USE_TLS > + iter->rr = *addr_list; > if (!smtp_tls_policy_cache_query(why, state->tls, iter)) { > msg_warn("TLS policy lookup error for %s/%s: %s", > STR(iter->dest), STR(iter->host), STR(why->reason)); -- Viktor.