Hi Travis,

Notes inline.

On Monday, 8 January 2018 23:19:36 CET Travis Burtrum wrote:
> First, what do docs say:
>
> RFC-6120[2] Section-3.2.1 #7 says:
> > 7. If the initiating entity fails to connect using all resolved IP
> >    addresses for a given FQDN, then it repeats the process of
> >    resolution and connection for the next FQDN returned by the SRV
> >    lookup based on the priority and weight as defined in [DNS-SRV].
>
> 'fails to connect' does this mean the TCP connection fails, or the XMPP
> connection fails?
>
> #8 might leave a hint:
> > 8. If the initiating entity receives a response to its SRV query but
> >    it is not able to establish an XMPP connection using the data
> >    received in the response, it SHOULD NOT attempt the fallback
> >    process described in the next section (this helps to prevent a
> >    state mismatch between inbound and outbound connections).
>
> This clearly says XMPP connection, but does it apply to #7 ?
>
> It is also clear I didn't think about this too hard when writing
> XEP-0368, because I clearly (to me) assume SRV fallback

The text you quote is *not* about SRV fallback. It refers to the
fallback to A/AAAA records in the next section (3.2.2). (Which, holy
cow, we should really not ever ever do if we got SRV records.)

The only wording in the RFC for SRV iteration is the #7 you quoted. So
it is all about the definition of "fails to connect". The elders may
have information on what was originally meant by that and whether there
is some more wisdom on the reasoning for this.

> will happen if a
> complete XMPP connection is not successful, because under Implementation
> Notes I say:
> > Server operators should not expect multiplexing (via ALPN) to work in
> > all scenarios and therefore should provide additional SRV record(s)
> > that do not require multiplexing (either standard STARTTLS or
> > dedicated direct XMPP-over-TLS).
> > This is a result of relying on ALPN
> > for multiplexing, where ALPN might not be supported by all devices or
> > may be disabled by a user due to privacy reasons.
>
> While I don't explicitly say it, if a port required ALPN to multiplex,
> it will generally end up connecting you to a non-XMPP server without
> ALPN, meaning you will get back invalid XML, other junk, and/or an
> invalid TLS cert.

This definitely could use wording in '368.

> RFC-2782[3], defining SRV records, makes no mention of this. Which
> actually makes sense because it doesn't even define possible protocols,
> UDP for example has no connection concept.

Mostly it makes sense because SRV records are meant to *solve* the
multiplexing issue, not to make it worse.

While I’m at it, I am really uncomfortable with further supporting the
"put everything behind SSL on 443" approach and moving the
Deep-Packet-Inspection war behind TLS, driving us to a world where
we’ll have everything on port 443, with ALPN-based multiplexing. But
that’s kinda OT. (But this is why I’m hesitant about making ALPN a
MUST.)

> Now that the docs are out of the way, on to the discussion:
>
> In my opinion, at least all of cannot-connect-to-port, non-XML,
> not-proper-stream and invalid TLS cert should trigger a fallback to the
> next highest priority SRV record.

Is there a guarantee or requirement that servers in two different SRV
priorities can be used at the same time? If not, it seems a bad idea to
fall back on them for purely application-layer reasons.

> Everyone in the MUC seemed to agree
> if authentication fails a fallback would be a bad idea.
>
> Sam Whited said that if a TCP connection is established fallback should
> cease, that it shouldn't have anything to do with or any knowledge of
> XMPP, and that it might have security implications to do otherwise.
> (please correct and forgive me if I misunderstood) I disagree with
> this, I think if Eve has control over DNS (and no DNSSEC) she can return
> arbitrary records anyway so SRV fallback doesn't matter.

That’s not true. As soon as one of the SRV records points to another
(possibly unsigned) zone, Mallory could forge DNS replies there even
without the ability to forge the whole SRV RRset (a low TTL on one of
the SRV target host names, compared to the SRV records themselves,
could also make an attack on those host names easier than one on the
SRV records). The other servers could be taken down by (D)DoS, or if
you’re on-path, by messing with the TLS handshake.

Now this is irrelevant to the current discussion insofar as this is a
vector already present in RFC6120 behaviour where "failed to connect"
is interpreted at the TCP level, but it shows that there are cases we
haven’t thought of yet. I can’t think of any example which is only
allowed by the "new" fallback rules right now, but that doesn’t,
unfortunately, mean that none exist.

One argument which could be made is that we assume that certificate
validation is safe. In that case, anything post-TLS is (I think) safe
to use as a cause for fallback, because if an attacker is able to play
Mallory (Dolev-Yao) on the post-TLS (inside TLS) stream, it’s kind of
game-over anyways. At least I can’t think of anything which can be
gained from diverting traffic (by deliberately causing SRV fallback
with e.g. invalid-XML post-TLS) to another host here (the attacker
already has full control of the traffic). (If they can only eavesdrop
and not manipulate, I don’t see how they could divert the traffic with
things happening post-TLS which couldn’t be applied pre-TLS too, e.g.
DoS of the connection.)

So if this argument holds, we only need to take special care (with
respect to security issues) for pre-TLS fallback rules. Since we’re
talking about connecting to xmpps-server, there is no pre-TLS as far as
the XML stream is concerned (e.g.
invalid-XML would be safe to fall back on). Now things become tricky if
we look at how to handle invalid certificates and other TLS issues.

(FWIW, when I asked about this years ago in, I think, jdev@, it was
suggested to me to fall back on about anything which isn’t authn
failure. Unfortunately, I can’t recall who said that.)

So the worst an attacker could do (assuming that we do strict
certificate validation, don’t allow non-TLS and that TLS is safe) is
DoS, I think. Any modification of the TLS (and pre-TLS) handshake would
lead the client to either fail to set up TLS or succeed in setting up
TLS with the target host, at which point the post-TLS argument from
above takes hold. (If a client can be tricked into using a non-TLS
stream, that’s a problem all by itself I guess.) Being able to cause a
failure to set up TLS (e.g. when stripping the <starttls/> feature with
a client who doesn’t attempt starttls independent of the presence of
the feature; or by actively MitM-ing the TLS exchange in an attempt to
impersonate the target host with an invalid certificate
(#corporatefirewall)) is a DoS vector, which can, indeed, be
circumvented by simply trying the next SRV record (assuming that the
attacker cannot influence that path, too).

Do these arguments make sense?

Now one case where this could be a problem, I imagine, is where
different SRV priorities are used to group primary and hot-standby
servers respectively, with the hot-standby servers being unusable while
the primaries are being used. If the hot-standbys cannot reject
connections while the primaries are being used, a client could be
tricked into connecting to the hot-standbys, potentially getting
out-of-sync with the rest of the domain and isolating them on a
seemingly empty server with no s2s connectivity. But that can easily
happen with DNS connectivity issues already and I’d argue that this is
then an issue with the zone which set this up.

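The fallback behaviour argued for so far (move on to the next SRV record
on TCP, TLS and post-TLS stream failures, but not on authentication
failure) could be sketched roughly as below. This is only an
illustration under the assumptions stated above (strict certificate
validation, no non-TLS streams); the names `Stage`, `SrvRecord`,
`connect_with_fallback` and the `attempt` callback are hypothetical and
not taken from any RFC, XEP or library. As a simplification, records
within one priority are ordered by descending weight instead of the
weighted random selection RFC 2782 describes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class Stage(Enum):
    """Where a connection attempt to one SRV target ended (hypothetical)."""
    TCP = 1      # the TCP connection could not be opened
    TLS = 2      # handshake failed or the certificate did not validate
    STREAM = 3   # inside TLS: invalid XML or not a proper XMPP stream
    AUTH = 4     # authentication was attempted and failed
    OK = 5       # stream established and authenticated


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


# Assumption of this sketch: failures at these stages trigger fallback.
FALLBACK_STAGES = {Stage.TCP, Stage.TLS, Stage.STREAM}


def connect_with_fallback(
    records: Iterable[SrvRecord],
    attempt: Callable[[SrvRecord], Stage],
) -> Optional[Tuple[SrvRecord, Stage]]:
    """Try records in order; return the last (record, stage), or None."""
    # Lowest priority value first; ties ordered by descending weight
    # (deterministic stand-in for RFC 2782 weighted random selection).
    ordered = sorted(records, key=lambda r: (r.priority, -r.weight))
    last = None
    for record in ordered:
        stage = attempt(record)
        last = (record, stage)
        if stage not in FALLBACK_STAGES:
            break  # success or authentication failure: no further fallback
    return last


if __name__ == "__main__":
    demo = [
        SrvRecord(10, 0, 5223, "standby.example"),
        SrvRecord(5, 0, 443, "alpn.example"),
    ]
    # Pretend the ALPN-multiplexed port answers with non-XMPP junk.
    outcome = {"alpn.example": Stage.STREAM, "standby.example": Stage.OK}
    print(connect_with_fallback(demo, lambda r: outcome[r.target]))
```

The `attempt` callback hides all networking, so the ordering and the
stop condition can be exercised without DNS or sockets.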
> I think my proposal is even more generic than the above, I think
> authentication-response should be the point when fallback ceases.

I disagree. I think the point where authentication is about to start
(i.e. the point right before selection of the SASL mechanism) should be
where fallback ceases.

In addition, no fallback should be made if a required stream feature is
not offered. I think it is reasonable to assume that all servers which
can be used interchangeably will have identical or equivalent stream
features and other features. Thus fallback should not be attempted if
there is a problem with the offered stream features. Examples: (a)
client requires starttls, server doesn’t offer; (b) client does not
allow DIGEST-MD5 or PLAIN for policy reasons, server only offers those.

Stream errors which happen before authentication are more difficult.
(<internal-server-error/> would be a good candidate for "try the next
host".) But I can see how "try the next host" could be a reasonable
course of action here.

> […] after authentication, whether it's successful or not,
> you no longer fall back anymore.

I wholeheartedly agree on this one. While failed authn can be an issue
on the server-side affecting only a single host, I think it will in
most cases simply be a typo in the password or a changed password. In
both cases, early user feedback is important. (Now, a clever client
could ask the user for the password and also try the other SRV options
in the background to rule out server config issues, but that’s nothing
we should specify.)

(I would argue that it is good practice to block (e.g. with a proper
stream error) new connection attempts entirely if you know you won’t be
able to handle authentication currently.)

> Depending what we decide, I plan to set up various domain/SRV record
> combinations for testing, probably clients and servers both need this
> type of testing, and I doubt it is done often.

Setting up test domains sounds like a great thing to do.
I’d like to integrate that in my test suite.

kind regards,
Jonas
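
For reference, the cut-off suggested in this mail (stop falling back
when a required stream feature is missing or no acceptable SASL
mechanism is offered) could be checked roughly like this. It is a
minimal sketch, not a specification: `feature_mismatch` and the plain
string names used for features and mechanisms are hypothetical.

```python
from typing import AbstractSet, Optional


def feature_mismatch(
    offered_features: AbstractSet[str],
    required_features: AbstractSet[str],
    offered_mechanisms: AbstractSet[str],
    allowed_mechanisms: AbstractSet[str],
) -> Optional[str]:
    """Return a reason to stop SRV fallback, or None if none applies."""
    # Case (a): the client requires a feature the server does not offer.
    missing = required_features - offered_features
    if missing:
        return "required stream feature not offered: " + ", ".join(sorted(missing))
    # Case (b): every offered SASL mechanism is forbidden by client policy.
    if not (offered_mechanisms & allowed_mechanisms):
        return "no acceptable SASL mechanism offered"
    return None


if __name__ == "__main__":
    # The server only offers mechanisms the client policy rejects.
    print(feature_mismatch(
        {"starttls", "sasl"}, {"starttls"},
        {"PLAIN", "DIGEST-MD5"}, {"SCRAM-SHA-1"},
    ))
```

A non-None result would be reported to the user instead of trying the
next SRV record, on the assumption that interchangeable servers offer
equivalent features.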
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: [email protected]
_______________________________________________
