Re: [Standards] Proper SRV Record Fallback

Jonas Wielicki Wed, 10 Jan 2018 00:22:37 -0800

Hi Travis,

Notes inline.

On Monday, 8 January 2018 23:19:36 CET Travis Burtrum wrote:
> First, what do docs say:
> 
> RFC-6120[2] Section-3.2.1 #7 says:
> > 7. If the initiating entity fails to connect using all resolved IP
> >    addresses for a given FDQN, then it repeats the process of
> >    resolution and connection for the next FQDN returned by the SRV
> >    lookup based on the priority and weight as defined in [DNS-SRV].
> 
> 'fails to connect' does this mean the TCP connection fails, or the XMPP
> connection fails?
> 
> #8 might leave a hint:
> > 8. If the initiating entity receives a response to its SRV query but
> >    it is not able to establish an XMPP connection using the data
> >    received in the response, it SHOULD NOT attempt the fallback
> >    process described in the next section (this helps to prevent a
> >    state mismatch between inbound and outbound connections).
> 
> This clearly says XMPP connection, but does it apply to #7 ?
> 
> It is also clear I didn't think about this too hard when writing
> XEP-0368, because I clearly (to me) assume SRV fallback 

The text you quote is *not* about SRV fallback. It refers to the fallback to 
A/AAAA records in the next section (3.2.2). (which, holy cow, we should really 
not ever ever do if we got SRV records.)

The only wording in the RFC for SRV iteration is #7 you quoted. So it is all 
about the definition of "fails to connect". The elders may have information on 
what was originally meant by that and if there is some more wisdom on the 
reasoning for this.
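
To make the iteration in #7 concrete, here is a minimal Python sketch of the 
ordering RFC 2782 describes: lowest priority value first, and weighted random 
selection within one priority. The names SrvRecord and order_srv are made up 
for illustration, and the sketch says nothing about what "fails to connect" 
means; it only produces the order in which targets would be tried.

```python
import random
from typing import List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    # Hypothetical container for one SRV answer; not from any library.
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records: List[SrvRecord],
              rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return the records in the order a client would try them."""
    if rng is None:
        rng = random.Random()
    ordered: List[SrvRecord] = []
    # Lower priority values are tried first.
    for priority in sorted({r.priority for r in records}):
        # Weight-0 records are placed first in the group, as RFC 2782 suggests.
        group = sorted((r for r in records if r.priority == priority),
                       key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    # First record whose running sum reaches the pick wins.
                    ordered.append(group.pop(index))
                    break
    return ordered


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    answers = [
        SrvRecord(20, 0, 5222, "standby.example.net."),
        SrvRecord(10, 60, 5222, "a.example.net."),
        SrvRecord(10, 40, 5222, "b.example.net."),
    ]
    for record in order_srv(answers):
        print(record.priority, record.target, record.port)
```

Whatever definition of "fails to connect" we settle on would then decide 
whether the client moves on to the next entry of that list.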

> will happen if a
> complete XMPP connection is not successful, because under Implementation
> Notes I say:
> > Server operators should not expect multiplexing (via ALPN) to work in
> > all scenarios and therefore should provide additional SRV record(s)
> > that do not require multiplexing (either standard STARTTLS or
> > dedicated direct XMPP-over-TLS). This is a result of relying on ALPN
> > for multiplexing, where ALPN might not be supported by all devices or
> > may be disabled by a user due to privacy reasons.
> 
> While I don't explicitly say it, if a port required ALPN to multiplex,
> it will generally end up connecting you to a non-XMPP server without
> ALPN, meaning you will get back invalid XML, other junk, and/or an
> invalid TLS cert.

This definitely could use wording in '368.
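
As a rough illustration of the client side, here is a minimal Python sketch 
that offers ALPN on a direct-TLS connection and checks what was negotiated. 
The protocol ID "xmpp-client" is the one XEP-0368 uses; the function names are 
hypothetical, and using the XMPP domain for SNI and certificate checking is an 
assumption of this sketch. Note that a missing ALPN result is inconclusive, 
since a dedicated XMPP-over-TLS port may simply ignore ALPN.

```python
import socket
import ssl
from typing import Optional

XMPP_CLIENT_ALPN = "xmpp-client"  # ALPN protocol ID used by XEP-0368


def make_context() -> ssl.SSLContext:
    """TLS context with default certificate validation, offering XMPP via ALPN."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols([XMPP_CLIENT_ALPN])
    return ctx


def alpn_confirms_xmpp(selected: Optional[str]) -> bool:
    """True only if the server explicitly selected the XMPP protocol ID.

    None means the server did not negotiate ALPN at all, which does not
    prove that the peer is not an XMPP server.
    """
    return selected == XMPP_CLIENT_ALPN


def connect_direct_tls(domain: str, host: str, port: int,
                       timeout: float = 10.0) -> ssl.SSLSocket:
    """Connect to one SRV target and perform the TLS handshake.

    Assumption: SNI and the certificate check use the XMPP domain,
    not the SRV target host name.
    """
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return make_context().wrap_socket(raw, server_hostname=domain)
    except Exception:
        raw.close()
        raise
```

A caller would inspect selected_alpn_protocol() on the returned socket with 
alpn_confirms_xmpp() before deciding how much to trust what comes back on the 
stream.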

> RFC-2782[3], defining SRV records, makes no mention of this.  Which
> actually makes sense because it doesn't even define possible protocols,
> UDP for example has no connection concept.

Mostly it makes sense because SRV records are meant to *solve* the multiplexing 
issue, not to make it worse.

While I’m at it, I am really uncomfortable with further supporting the "put 
everything behind SSL on 443" approach, which moves the Deep-Packet-Inspection 
war behind TLS and drives us to a world where we’ll run everything on port 443 
with ALPN-based multiplexing. But that’s kinda OT (though it is why I’m 
hesitant about making ALPN a MUST).

> Now that the docs are out of the way, on to the discussion:
> 
> In my opinion, at least all of cannot-connect-to-port, non-XML,
> not-proper-stream and invalid TLS cert should trigger a fallback to the
> next highest priority SRV record.

Is there a guarantee or requirement that servers in two different SRV 
priorities can be used at the same time? If not, it seems a bad idea to fall 
back to them for purely application-layer reasons.

> Everyone in the MUC seemed to agree
> if authentication fails a fallback would be a bad idea.
> 
> Sam Whited said that if a TCP connection is established fallback should
> cease, that it shouldn't have anything to do with or any knowledge of
> XMPP, and that it might have security implications to do otherwise.
> (please correct and forgive me if I misunderstood)  I disagree with
> this, I think if Eve has control over DNS (and no DNSSEC) she can return
> arbitrary records anyway so SRV fallback doesn't matter. 

That’s not true. As soon as one of the SRV records points to another (possibly 
unsigned) zone, Mallory could forge DNS replies there even without the ability 
to forge the whole SRV RRset. A TTL on one of the SRV target host names that is 
lower than the TTL of the SRV records themselves could also make those host 
names easier to attack than the SRV records. The other servers could be taken 
down by (D)DoS or, if you’re on-path, by messing with the TLS handshake.

Now this is irrelevant to the current discussion insofar as this vector is 
already present in RFC 6120 behaviour where "failed to connect" is interpreted 
at the TCP level, but it shows that there are cases we haven’t thought of yet.

I can’t think of any example which is only allowed by the "new" fallback rules 
right now, but that doesn’t--unfortunately--mean that none exist.

One argument which could be made is that we assume that certificate validation 
is safe. In that case, anything post-TLS is (I think) safe to use as a cause 
for fallback, because if an attacker is able to play Mallory (Dolev-Yao) on 
the post-TLS (inside TLS) stream, it’s kind of game over anyway. At least I 
can’t think of anything which can be gained from diverting traffic to another 
host here (by deliberately causing SRV fallback with e.g. invalid XML 
post-TLS), since the attacker already has full control of the traffic. If they 
can only eavesdrop and not manipulate, I don’t see how they could divert the 
traffic with anything post-TLS which couldn’t be applied pre-TLS too (e.g. DoS 
of the connection).

So if this argument holds, we only need to take special care (with respect to 
security issues) for pre-TLS fallback rules. Since we’re talking about 
connecting to xmpps-server, there is no pre-TLS as far as the XML stream is 
concerned (e.g. invalid XML would be safe to fall back on).

Now things become tricky if we look at how to handle invalid certificates and 
other TLS issues. (FWIW, when I asked about this years ago in, I think, jdev@, 
it was suggested to me to fall back on just about anything which isn’t an authn 
failure. Unfortunately, I can’t recall who said that.)

So the worst an attacker could do (assuming that we do strict certificate 
validation, don’t allow non-TLS and that TLS is safe) is DoS, I think. Any 
modification of the TLS (and pre-TLS) handshake would lead the client to 
either fail to set up TLS or succeed in setting up TLS with the target host, at 
which point the post-TLS argument from above takes hold. (If a client can be 
tricked into using a non-TLS stream, that’s a problem all by itself I guess.)

Being able to cause a failure to set up TLS is a DoS vector, e.g. by stripping 
the <starttls/> feature from a client that doesn’t attempt STARTTLS regardless 
of whether the feature is present, or by actively MitM-ing the TLS exchange in 
an attempt to impersonate the target host with an invalid certificate 
(#corporatefirewall). That vector can, indeed, be circumvented by simply trying 
the next SRV record (assuming that the attacker cannot influence that path, 
too).

Do these arguments make sense?

Now one case where this could be a problem, I imagine, is where different SRV 
priorities are used to group primary and hot-standby servers respectively, 
with the hot-standby servers being unusable while the primaries are being 
used. If the hot-standbys cannot reject connections while the primaries are 
being used, a client could be tricked into connecting to the hot-standbys, 
potentially getting out of sync with the rest of the domain and isolating it 
on a seemingly empty server with no s2s connectivity.

But that can easily happen with DNS connectivity issues already and I’d argue 
that this is then an issue with the zone which set this up.

> I think my proposal is even more generic than the above, I think
> authentication-response should be the point when fallback ceases.

I disagree. I think the point where authentication is about to start (i.e. the 
point right before selection of the SASL mechanism) should be where fallback 
ceases. In addition, no fallback should be made if a required stream feature 
is not offered.

I think it is reasonable to assume that all servers which can be used 
interchangeably will have identical or equivalent stream and other features. 
Thus fallback should not be attempted if there is a problem with the offered 
stream features. Examples: (a) client requires starttls, server doesn’t offer 
it; (b) client does not allow DIGEST-MD5 or PLAIN for policy reasons, server 
only offers those.

Stream errors which happen before authentication are more difficult, but I can 
see how "try the next host" could be a reasonable course of action here 
(<internal-server-error/> would be a good candidate).

> […] after authentication, whether it's successful or not,
> you no longer fall back anymore.

I wholeheartedly agree on this one. While failed authn can be an issue on the 
server side affecting only a single host, I think it will in most cases simply 
be a typo in the password or a changed password. In both cases, early user 
feedback is important. (A clever client could ask the user for the password 
and also try the other SRV options in the background to rule out server config 
issues, but that’s nothing we should specify.)
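
Put together, the rule argued for in this mail could be sketched as a small 
decision function. This is only one reading of the discussion, not agreed 
behaviour; the Failure names are made up for illustration, and treating 
pre-authentication stream errors as "try the next host" is the least certain 
entry.

```python
from enum import Enum, auto


class Failure(Enum):
    # Hypothetical failure categories, in the order a connection would hit them.
    TCP_CONNECT_FAILED = auto()
    TLS_HANDSHAKE_FAILED = auto()
    INVALID_CERTIFICATE = auto()
    INVALID_XML = auto()
    STREAM_ERROR_BEFORE_AUTH = auto()
    REQUIRED_FEATURE_MISSING = auto()
    NO_ACCEPTABLE_SASL_MECHANISM = auto()
    AUTHENTICATION_FAILED = auto()


# Failures after which the next SRV target would be tried. Everything up to
# the point right before SASL mechanism selection is in this set, except
# problems with the offered stream features.
_TRY_NEXT = frozenset({
    Failure.TCP_CONNECT_FAILED,
    Failure.TLS_HANDSHAKE_FAILED,
    Failure.INVALID_CERTIFICATE,
    Failure.INVALID_XML,
    Failure.STREAM_ERROR_BEFORE_AUTH,  # debatable, see above
})


def should_try_next_target(failure: Failure) -> bool:
    """Return True if the client would move on to the next SRV record."""
    return failure in _TRY_NEXT


if __name__ == "__main__":
    for failure in Failure:
        action = "try next" if should_try_next_target(failure) else "stop"
        print(f"{failure.name}: {action}")
```

Keeping the policy in a single set makes it easy to move one category from 
"try next" to "stop" once the list has settled on the details.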

(I would argue that it is good practice to block (e.g. with a proper stream 
error) new connection attempts entirely if you know you won’t be able to 
handle authentication currently.)

> Depending what we decide, I plan to set up various domain/SRV record
> combinations for testing, probably clients and servers both need this
> type of testing, and I doubt it is done often.

Setting up test domains sounds like a great thing to do. I’d like to integrate 
that in my test suite.

kind regards,
Jonas
