On 12 January 2018 at 04:05, Travis Burtrum <[email protected]> wrote:
> Hello,
>
> My replies in-line as well.
>
> On 01/10/2018 03:20 AM, Jonas Wielicki wrote:
>> Hi Travis,
>>
>> Notes inline.
>>
>> On Montag, 8. Januar 2018 23:19:36 CET Travis Burtrum wrote:
>>> First, what do docs say:
>>>
>>> RFC-6120[2] Section-3.2.1 #7 says:
>>>> 7. If the initiating entity fails to connect using all resolved IP
>>>>    addresses for a given FDQN, then it repeats the process of
>>>>    resolution and connection for the next FQDN returned by the SRV
>>>>    lookup based on the priority and weight as defined in [DNS-SRV].
>>>
>>> 'fails to connect' does this mean the TCP connection fails, or the XMPP
>>> connection fails?
>>>
>>> #8 might leave a hint:
>>>> 8. If the initiating entity receives a response to its SRV query but
>>>>    it is not able to establish an XMPP connection using the data
>>>>    received in the response, it SHOULD NOT attempt the fallback
>>>>    process described in the next section (this helps to prevent a
>>>>    state mismatch between inbound and outbound connections).
>>>
>>> This clearly says XMPP connection, but does it apply to #7 ?
>>>
>>> It is also clear I didn't think about this too hard when writing
>>> XEP-0368, because I clearly (to me) assume SRV fallback
>>
>> The text you quote is *not* about SRV fallback. It refers to the fallback to
>> A/AAAA records in the next section (3.2.2). (which, holy cow, we should really
>> not ever ever do if we got SRV records.)
>>
>> The only wording in the RFC for SRV iteration is #7 you quoted. So it is all
>> about the definition of "fails to connect". The elders may have information on
>> what was originally meant by that and if there is some more wisdom on the
>> reasoning for this.
>
> Yes I fully agree RFC-wise this all hinges on what 'fails to connect'
> means, I quoted 8 because it was the only one that didn't use those
> exact words and said 'XMPP connection' instead. Whether that means
> anything or not is up for interpretation. I'd also vote since it's at
> best ambiguous we just decide what the 'right' thing is anyway.
>
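To make the ambiguity concrete, here is a minimal sketch of how the two
readings of 'fails to connect' differ in a client's SRV loop. It is only an
illustration under stated assumptions: the `Stage` names, the `SrvTarget`
type, the `attempt` callback and the two policy constants are all
hypothetical and come from neither RFC 6120 nor XEP-0368, and the
weight-based ordering of [DNS-SRV] within one priority is left out.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Iterable, Optional


class Stage(IntEnum):
    """Hypothetical points at which an attempt on one SRV target can fail."""
    TCP = 1       # TCP connection could not be opened
    TLS = 2       # TLS handshake or certificate validation failed
    STREAM = 3    # non-XML, improper stream, or pre-auth stream error
    FEATURES = 4  # a required stream feature was not offered
    AUTH = 5      # authentication was attempted and failed


@dataclass(frozen=True)
class SrvTarget:
    priority: int
    weight: int
    host: str
    port: int


# Two readings of "fails to connect" (assumed names, for illustration only).
TCP_ONLY = Stage.TCP        # only a TCP-level failure moves to the next record
BEFORE_AUTH = Stage.STREAM  # anything up to a broken stream moves on


def should_try_next(failed_at: Stage, last_fallback_stage: Stage) -> bool:
    """True if a failure at `failed_at` should move on to the next SRV record."""
    return failed_at <= last_fallback_stage


def connect(
    targets: Iterable[SrvTarget],
    attempt: Callable[[SrvTarget], Optional[Stage]],
    last_fallback_stage: Stage,
) -> Optional[SrvTarget]:
    """Try targets in priority order; `attempt` returns None on success,
    or the Stage at which the attempt failed."""
    # Lower priority value is tried first; weights are ignored in this sketch.
    for target in sorted(targets, key=lambda t: t.priority):
        failed_at = attempt(target)
        if failed_at is None:
            return target
        if not should_try_next(failed_at, last_fallback_stage):
            return None  # treated as a failure for the domain as a whole
    return None
```

Under `TCP_ONLY`, a first target that accepts TCP but presents an invalid
certificate ends the attempt for the whole domain; under `BEFORE_AUTH` the
client moves on to the next record. In both cases a failure at `Stage.AUTH`
stops the loop.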
The question is at what point we declare a connection "complete", so
that a subsequent failure is considered a connection failure for the
domain as a whole. In most code I can immediately see, that point has
typically been the TCP connection. I can see an argument that we should
make it an XMLStream instead (even one that immediately gives an
error). I can just about go along with adding a TLS-protected XMLStream
to the mix, but quite honestly I'd be uncomfortable here.

>>> will happen if a
>>> complete XMPP connection is not successful, because under Implementation
>>> Notes I say:
>>>> Server operators should not expect multiplexing (via ALPN) to work in
>>>> all scenarios and therefore should provide additional SRV record(s)
>>>> that do not require multiplexing (either standard STARTTLS or
>>>> dedicated direct XMPP-over-TLS). This is a result of relying on ALPN
>>>> for multiplexing, where ALPN might not be supported by all devices or
>>>> may be disabled by a user due to privacy reasons.
>>>
>>> While I don't explicitly say it, if a port required ALPN to multiplex,
>>> it will generally end up connecting you to a non-XMPP server without
>>> ALPN, meaning you will get back invalid XML, other junk, and/or an
>>> invalid TLS cert.
>>
>> This definitely could use wording in '368.
>
> Absolutely agree, however I'd like to wait to update it so I can also
> note what we decide here, I think.
>
>> While I’m at it, I am really uncomfortable with further supporting the "put
>> everything behind SSL on 443" and move the Deep-Packet-Inspection-war behind
>> the TLS, driving us to a world where we’ll run everything on port 443, with
>> ALPN-based multiplexing. But that’s kinda OT. (but this is why I’m hesitant
>> with making ALPN a MUST.)
>
> Yea this has been addressed a few times over the course of this XEP and
> while I agree with the sentiment, I'd prefer to connect by any means
> possible than to stay unconnected knowing it's more 'pure' that way or
> something. :) (though by all means, when you find evil networks, try to
> get them to change)
>
>>> Now that the docs are out of the way, on to the discussion:
>>>
>>> In my opinion, at least all of cannot-connect-to-port, non-XML,
>>> not-proper-stream and invalid TLS cert should trigger a fallback to the
>>> next highest priority SRV record.
>>
>> Is there a guarantee or requirement that servers in two different SRV
>> priorities can be used at the same time? If not, it seems a bad idea to fall
>> back on them for purely application-layer reasons.
>
> I believe so? At least my understanding of SRV is that clients can end
> up connecting to any at any time.
>

I don't think a lower priority SRV record can be used if a higher
priority one is available, but that's not quite the same thing.
However, this doesn't mean that a lower priority server instance can't
be in use at the same time as a higher priority one - networking
failures might cause the higher priority ones to be unreachable to some
clients (but not others).

>>> Everyone in the MUC seemed to agree
>>> if authentication fails a fallback would be a bad idea.
>>>
>>> Sam Whited said that if a TCP connection is established fallback should
>>> cease, that it shouldn't have anything to do with or any knowledge of
>>> XMPP, and that it might have security implications to do otherwise.
>>> (please correct and forgive me if I misunderstood) I disagree with
>>> this, I think if Eve has control over DNS (and no DNSSEC) she can return
>>> arbitrary records anyway so SRV fallback doesn't matter.
>>
>> That’s not true. As soon as one of the SRV records points to another (possibly
>> unsigned) zone, Mallory could forge DNS replies there even without the ability
>> to forge the whole SRV RRset (a low TTL on one of the SRV target host names
>> (compared to the SRV records themselves) could also ease an attack on those
>> host names compared to the SRV records). The other servers could be taken down
>> by (D)DoS, or if you’re on-path, by messing with the TLS handshake.
>>
>> Now this is irrelevant to the current discussion insofar that this is a vector
>> already present in RFC6120 behaviour where "failed to connect" is interpreted
>> at the TCP level, but it shows that there are cases we haven’t thought of yet.
>>
>> I can’t think of any example which is only allowed by the "new" fallback rules
>> right now, but that doesn’t--unfortunately--mean that none exist.
>
> Right DNSSEC only protects if all domains are equally protected, both
> SRV and A/AAAA. My only point is with interpreting 'connects to TCP' as
> 'stop trying other SRV records' then someone only has to DOS the highest
> priority server, instead of all of them.
>

Well, DNSSEC only provides protection where it's deployed. But
protecting the SRV record provides some protection even if the address
records are unprotected.

> I also kind of object to calling these "new" rules, it's how I've always
> interpreted how it should work, how Conversations works, and quite
> possibly many others work this way too. I'm just looking for consensus
> since there doesn't seem to be one. :)
>
>> One argument which could be made is that we assume that certificate validation
>> is safe. In that case, anything post-TLS is (I think) safe to use as a cause
>> for fallback, because if an attacker is able to play Mallory (Dolev-Yao) on
>> the post-TLS (inside TLS) stream, it’s kind of game-over anyways. At least I
>> can’t think of anything which can be gained from diverting traffic (by
>> deliberately causing SRV fallback with e.g. invalid-XML post-TLS) to another
>> host here (the attacker already has full control of the traffic). (If they can
>> only Eavesdrop and not manipulate, I don’t see how they could divert the
>> traffic with things happening post-TLS which couldn’t be applied pre-TLS too
>> (e.g. DoS of the connection)).
>>
>> So if this argument holds, we only need to take special care (with respect to
>> security issues) for pre-TLS fallback rules. Since we’re talking about
>> connecting to xmpps-server, there is no pre-TLS as far as the XML stream is
>> concerned (e.g. invalid-XML would be safe to fall-back on).
>>
>>
>> Now things become tricky if we look at how to handle invalid certificates and
>> other TLS issues. (FWIW, when I asked about this years ago in, I think, jdev@,
>> it was suggested to me to fall back on about anything which isn’t authn
>> failure. Unfortunately, I can’t recall who said that.)
>>
>> So the worst an attacker could do (assuming that we do strict certificate
>> validation, don’t allow non-TLS and that TLS is safe), is DoS, I think. Any
>> modification of the TLS (and pre-TLS) handshake would lead the client to
>> either fail to set up TLS or succeed to set up TLS with the target host, at
>> which point the post-TLS argument from above takes hold. (If a client can be
>> tricked to use a non-TLS stream, that’s a problem all by itself I guess.)
>>
>> Being able to cause a failure to set up TLS (e.g. when stripping the
>> <starttls/> feature with a client who doesn’t attempt starttls independent of
>> the presence of the feature; or by actively MitM-ing the TLS exchange in an
>> attempt to impersonate the target host with an invalid certificate
>> (#corporatefirewall)) is a DoS vector, which can, indeed, be circumvented by
>> simply trying the next SRV record (assuming that the attacker cannot influence
>> that path, too).
>>
>>
>> Do these arguments make sense?
>
> I think so, TLS failure surely shouldn't stop fallback since an attacker
> can easily set that up. But also I don't think that's the cut-off, as
> you said next, if the *right* (tls-authed) server sends
> <internal-server-error/> you'd want to continue fallback too.
>

What about if the right server responds perfectly, but has high latency
(or packet loss)? What about if the right server responds perfectly but
you keep losing the connection?

>> Now one case where this could be a problem, I imagine, is where different SRV
>> priorities are used to group primary and hot-standby servers respectively,
>> with the hot-standby servers being unusable while the primaries are being
>> used. If the hot-standbys cannot reject connections while the primaries are
>> being used, a client could be tricked to connecting to the hot-standbys,
>> potentially getting out-of-sync with the rest of the domain and isolating them
>> on a seemingly empty server with no s2s connectivity.
>>
>> But that can easily happen with DNS connectivity issues already and I’d argue
>> that this is then an issue with the zone which set this up.
>
> Yes I think this is improper SRV use, all servers in your SRV records
> should be usable at any time.
>
>>> I think my proposal is even more generic than the above, I think
>>> authentication-response should be the point when fallback ceases.
>>
>> I disagree. I think the point where authentication is about to start (i.e. the
>> point right before selection of the SASL mechanism) should be where fallback
>> ceases. In addition, no fallback should be made if a required stream feature
>> is not offered.
>>
>> I think it is reasonable to assume that all servers which can be used
>> interchangeably will have identical or equivalent stream- and other features.
>> Thus fallback should not be attempted if there is a problem with the offered
>> stream features. Examples: (a) client requires starttls, server doesn’t offer;
>> (b) client does not allow DIGEST-MD5 or PLAIN for policy reasons, server only
>> offers those.
>
> I think that's equally sensible, I think either one of these would solve
> 99% of the problem. I suppose it's *possible* in a migration sense if
> servers are different versions or something to offer different SASL
> mechanisms or digests, but I can't imagine it would be common enough to
> worry about in the wild.
>
>> Stream errors which happen before authentication are more difficult.
>> (<internal-server-error/> would be a good candidate for "try the next host".)
>> But I can see how "try the next host" could be a reasonable course of action
>> here.
>>
>>
>>> […] after authentication, whether it's successful or not,
>>> you no longer fall back anymore.
>>
>> I wholeheartedly agree on this one. While failed authn can be an issue on the
>> server-side affecting only a single host, I think it will in most cases simply
>> be a typo in the password or a changed password. In both cases, early user
>> feedback is important (now a clever client could ask the user for the password
>> and also try the other SRV options in the background to rule out server config
>> issues, but that’s nothing we should specify.)
>>
>> (I would argue that it is good practice to block (e.g. with a proper stream
>> error) new connection attempts entirely if you know you won’t be able to
>> handle authentication currently.)
>
> You read minds, I was going to say UI wouldn't have to show a dumb
> 'connecting' the entire time, it could say 'trying server 1', 'trying
> server 2', 'are you sure password is correct? trying server 3' etc etc
>
> It's possible falling back would fix bad username/password too (database
> replication on primary down or something), but at this point we are
> going down the rabbit hole zinid mentioned, what about bookmarks, mam
> sync, etc etc. This seems like one of those sensible 99% fix points to me.
>
>>> Depending what we decide, I plan to set up various domain/SRV record
>>> combinations for testing, probably clients and servers both need this
>>> type of testing, and I doubt it is done often.
>>
>> Setting up test domains sounds like a great thing to do. I’d like to integrate
>> that in my test suite.
>
> Still going to hold off a bit to try to reach consensus, but sounds
> great, I'll talk to you about it. :)
>
>> kind regards,
>> Jonas
>
> Thanks much!
> Travis
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: [email protected]
_______________________________________________
