Re: [exim-dev] DANE work
On 03/09/14 05:13, Viktor Dukhovni wrote: On Wed, Sep 03, 2014 at 12:27:16AM +, Phil Pennock wrote: My jaw dropped when I returned to my list index view and saw just how many patches came in from work this weekend by Todd and Jeremy to implement DANE in Exim. Seriously good work which makes me very happy and feeling that I owe drinks to you both. Thank you for picking up where I totally slacked off and working on implementing this. (Thank you for removing one of my guilt layers of I need to find time to work on X too). Yes, thanks a bunch. I have a few questions about the code. Specifically: verify_callback_client_dane(),(lines 469--471) tlsa_lookup(),(lines 1691--1713) dane_tlsa_load(), (all lines) transports/smtp.c:(lines 3297--3306, tempfail_try_clear logic) Since I only read the commit diff, I am missing some context, so please pardon my ignorance... I don't know where the logic is that tracks the DNSSEC status of the MX lookup to determine whether the subsequent per MX-host lookups can be considered secure. There's a flag dnssec on a host struct, filled in by host_find_bydns() called (usually) from the dnslookup router, and checked by tls_client_start() [ in the version you have, but I'm about to pull it up a layer to smtp_deliver() ] in deciding whether to try DANE. I am not sure that DNS lookup errors are handled correctly. I could not find code that distinguishes between lookups failure and NXDOMAIN (which is not a lookup error). Nor me. What action do you suggest for which? I don't at first glance see code that enforces STARTTLS, without authentication, when a secure TLSA RRset exists, but all records are unusable. I did not find any code to coerce PKIX-EE(1) and PKIX-TA(0) to unusable. I did notice that unsupported matching types seem to be needlessly treated as errors, rather than as an unusable record among possibly usable other records. Those are all whatever your library does. I am confused by the interaction with tempfail try clear. That'll be dealt with by the change above; intent is to disable the try-in-clear possibility if dane was requested and the tlsa lookup succeeded. The DANE-specific verify callback seems to assume that any call in which the verification status is OK is sufficient to make the peer's chain valid, but that need not be the case when the chain length is greater than 1. It may be best to not infer any final status in that callback, and check the verification state after the connection completes, sending a QUIT (to the MiTM perhaps) and gracefully closing the connection if verification failed. My understanding was that any fail return code to the sequence of callbacks was enough to tell the OpenSSL library to fail the TLS startup, and that there would be at least one callback per chain element. Is this not the case? -- Cheers, Jeremy -- ## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
Re: [exim-dev] DANE work
On Thu, Sep 04, 2014 at 11:02:39PM +0100, Jeremy Harris wrote: Since I only read the commit diff, I am missing some context, so please pardon my ignorance... I don't know where the logic is that tracks the DNSSEC status of the MX lookup to determine whether the subsequent per MX-host lookups can be considered secure. There's a flag dnssec on a host struct, filled in by host_find_bydns() called (usually) from the dnslookup router, and checked by tls_client_start() [ in the version you have, but I'm about to pull it up a layer to smtp_deliver() ] in deciding whether to try DANE. So DANE is only attempted when both the MX and the host's A or RRsets are secure, right? Also, a long time ago when I looked at the base dnssec lookup code, I was not confident that chains of CNAMEs were necessarily handled correctly in terms of making sure that all the links are secure (typically recursive resolvers do some of the chaining for you, but you need to be prepared to recurse beyond that in some cases, and in that case track the security status along each CNAME hop). I am not sure that DNS lookup errors are handled correctly. I could not find code that distinguishes between lookups failure and NXDOMAIN (which is not a lookup error). Nor me. What action do you suggest for which? This is section 2.1 of the SMTP with DANE draft. NXDOMAIN is not an error, just a finding that the record does not exist. When TLSA RRs don't exist, that's fine, DANE does not apply. Any other type of lookup error (timeout, servfail, ...) needs to be considered as a potential active attack, and the MX host in question skipped. I don't at first glance see code that enforces STARTTLS, without authentication, when a secure TLSA RRset exists, but all records are unusable. I did not find any code to coerce PKIX-EE(1) and PKIX-TA(0) to unusable. I did notice that unsupported matching types seem to be needlessly treated as errors, rather than as an unusable record among possibly usable other records. Those are all whatever your library does. Postfix applies additional logic above that library to process just the supported TLSA RRs. Unusable TLSA RRs lead to mandatory unauthenticated TLS if no usable RRs are found. The DANE-specific verify callback seems to assume that any call in which the verification status is OK is sufficient to make the peer's chain valid, but that need not be the case when the chain length is greater than 1. It may be best to not infer any final status in that callback, and check the verification state after the connection completes, sending a QUIT (to the MiTM perhaps) and gracefully closing the connection if verification failed. My understanding was that any fail return code to the sequence of callbacks was enough to tell the OpenSSL library to fail the TLS startup, and that there would be at least one callback per chain element. Is this not the case? Oh I see, you terminate connections early... Postfix allows the connection to continue, gathers information and enforces policy after the handshake. If you never return a non-zero value from the callback when the input status is zero, you might be OK. So the immediate issues to address (not necessarily exhaustive) are: * DNS error handling * Handling of unsupported TLSA RR values (unusable records), and in particular PKIX-TA(0)/PKIX-EE(1) which the library supports, but MTAs should not. Later: * Digest agility, either as new code in my library, or additional code in Exim. Plus of course a lot more testing and a more thorough code review. You might also consider ways to refactor the code that make it more obviously complete/correct. For PKIX-TA(2), the library needs to be told which hostnames are valid in the leaf certificate, you need to pass the original nexthop domain, its CNAME expansion (if different) and the TLSA base domain (MX hostname). If you support MX hosts that are CNAMEs (illegal, but often tolerated) then the TLSA base domain should be the CNAME expansion of the MX hostname and only failing that the original MX hostname. I don't recall running into the code that handles this. -- Viktor. -- ## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
Re: [exim-dev] DANE work
On 03/09/14 01:27, Phil Pennock wrote: My jaw dropped when I returned to my list index view and saw just how many patches came in from work this weekend by Todd and Jeremy to implement DANE in Exim. I did wonder if I should have squashed to a single commit to avoid the mess in the timeline. Do we have a preference, in git usage on the Exim master? Too late now for this one, I guess. -- Cheers, Jeremy -- ## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
Re: [exim-dev] DANE work
On Wed, Sep 3, 2014 at 1:05 AM, Jeremy Harris j...@wizmail.org wrote: On 03/09/14 01:27, Phil Pennock wrote: My jaw dropped when I returned to my list index view and saw just how many patches came in from work this weekend by Todd and Jeremy to implement DANE in Exim. Jeremy gets to take the bow for this one. My contributions are the initial build plumbing and incorporating Viktor's OpenSSL based DANE framework so that it built. Jeremy did all the hard work of doing DNS lookups, certificate handling, actually calling into the DANE framework, analyzing results, etc. How do we do licensing, since that entire file (dane-openssl.c) is Viktor's original code? Viktor, how is that code licensed? I did wonder if I should have squashed to a single commit to avoid the mess in the timeline. Do we have a preference, in git usage on the Exim master? Back when I committed the DMARC patches, Phil had me target a rebase for 2-5 distinct patches. I think that's probably what we should use as a goal in the future. I do like your numerous individual patches too though since I can see and better understand incremental steps in the thought process. ...Todd -- The total budget at all receivers for solving senders' problems is $0. If you want them to accept your mail and manage it the way you want, send it the way the spec says to. --John Levine -- ## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[exim-dev] DANE work
My jaw dropped when I returned to my list index view and saw just how many patches came in from work this weekend by Todd and Jeremy to implement DANE in Exim. Seriously good work which makes me very happy and feeling that I owe drinks to you both. Thank you for picking up where I totally slacked off and working on implementing this. (Thank you for removing one of my guilt layers of I need to find time to work on X too). Happiness. :) -Phil -- ## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
Re: [exim-dev] DANE work
On Wed, Sep 03, 2014 at 12:27:16AM +, Phil Pennock wrote: My jaw dropped when I returned to my list index view and saw just how many patches came in from work this weekend by Todd and Jeremy to implement DANE in Exim. Seriously good work which makes me very happy and feeling that I owe drinks to you both. Thank you for picking up where I totally slacked off and working on implementing this. (Thank you for removing one of my guilt layers of I need to find time to work on X too). Yes, thanks a bunch. I have a few questions about the code. Specifically: verify_callback_client_dane(), (lines 469--471) tlsa_lookup(), (lines 1691--1713) dane_tlsa_load(), (all lines) transports/smtp.c: (lines 3297--3306, tempfail_try_clear logic) Since I only read the commit diff, I am missing some context, so please pardon my ignorance... I don't know where the logic is that tracks the DNSSEC status of the MX lookup to determine whether the subsequent per MX-host lookups can be considered secure. I am not sure that DNS lookup errors are handled correctly. I could not find code that distinguishes between lookups failure and NXDOMAIN (which is not a lookup error). I don't at first glance see code that enforces STARTTLS, without authentication, when a secure TLSA RRset exists, but all records are unusable. I did not find any code to coerce PKIX-EE(1) and PKIX-TA(0) to unusable. I did notice that unsupported matching types seem to be needlessly treated as errors, rather than as an unusable record among possibly usable other records. I am confused by the interaction with tempfail try clear. The DANE-specific verify callback seems to assume that any call in which the verification status is OK is sufficient to make the peer's chain valid, but that need not be the case when the chain length is greater than 1. It may be best to not infer any final status in that callback, and check the verification state after the connection completes, sending a QUIT (to the MiTM perhaps) and gracefully closing the connection if verification failed. -- Viktor. -- ## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##