I am happy to report that we have upgraded all our relays to Tor 0.4.8.0-alpha-dev, and in the 8 days since the upgrade the bind conflict has not recurred. No firewall rules are being used. No sysctl settings helped.
--
Christopher Sheats (yawnbox)
Executive Director
Emerald Onion
Signal: +1 206.739.3390
Website: https://emeraldonion.org/
Mastodon: https://digitalcourage.social/@EmeraldOnion/

> On Dec 12, 2022, at 1:18 PM, Anders Trier Olesen <[email protected]> wrote:
>
> > It is surprising, isn't it? It certainly feels like calling connect
> > without first binding to an address should have the same effect as
> > manually binding to an address and then calling connect, especially if
> > the address you bind to is the same as the kernel would have chosen
> > automatically. It seems like it might be a bug, but I'm not qualified to
> > judge that.
>
> Yes, I'm starting to think so too. And strange that Cloudflare doesn't
> mention stumbling upon this problem in their blogpost on running out of
> ephemeral ports. [1]
> If I find the time, I'll make an attempt at understanding exactly what is
> going on in the kernel.
>
> > If I am interpreting your results correctly, it means that either of the
> > two extremes is safe
>
> Yes. That is what I think too.
>
> > Anyway, thank you for the insight. I apologize if I was inconsiderate
> > in my prior reply.
>
> Likewise!
>
> Best regards
> Anders Trier Olesen
>
> [1] https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/
>
> On Mon, Dec 12, 2022 at 4:16 PM David Fifield <[email protected]> wrote:
>> On Mon, Dec 12, 2022 at 12:39:50AM +0100, Anders Trier Olesen wrote:
>> > I wrote some tests[1] which showed behaviour I did not expect.
>> > IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind
>> > without it enabled turns out to be even worse than I thought.
>> >
>> > This is what I think is happening: A successful bind() on a socket
>> > without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit
>> > port configured, makes the assigned (or supplied) port unavailable for
>> > new connect()s (on different sockets), no matter the destination. I.e.,
>> > if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no
>> > matter what IP you bind to!), connect() will stop working - no matter
>> > what IP you attempt to connect to. You can work around this by manually
>> > doing a bind() (with or without an explicit port, but without
>> > IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
>> >
>> > What blows my mind is that after running test2, you cannot connect to
>> > anything without manually doing a bind() beforehand (as shown by test1
>> > and test3 above)!
>> > This also means that after running test2, software like ssh stops
>> > working:
>> >
>> > When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (1 5 6
>> > can be run in any order):
>>
>> Thank you for preparing that experiment. It's really valuable, and it
>> looks a lot like what I was seeing on the Snowflake bridge: calls to
>> connect would fail with EADDRNOTAVAIL unless first bound concretely to a
>> port number. IP_BIND_ADDRESS_NO_PORT causes bind not to set a concrete
>> port number, so in that respect it's the same as calling connect without
>> calling bind first.
>>
>> It is surprising, isn't it? It certainly feels like calling connect
>> without first binding to an address should have the same effect as
>> manually binding to an address and then calling connect, especially if
>> the address you bind to is the same as the kernel would have chosen
>> automatically. It seems like it might be a bug, but I'm not qualified to
>> judge that.
>>
>> If I am interpreting your results correctly, it means that either of the
>> two extremes is safe: either everything that needs to bind to a source
>> address should call bind with IP_BIND_ADDRESS_NO_PORT, or else
>> everything (whether it needs a specific source address or not) should
>> call bind *without* IP_BIND_ADDRESS_NO_PORT. (The latter situation is
>> what we've arrived at on the Snowflake bridge.) The middle ground, where
>> some connections use IP_BIND_ADDRESS_NO_PORT and some do not, is what
>> causes trouble, because connections that do not use
>> IP_BIND_ADDRESS_NO_PORT somehow "poison" the ephemeral port pool for
>> connections that do use IP_BIND_ADDRESS_NO_PORT (and for connections
>> that do not bind at all). It would explain why causing HAProxy not to
>> use IP_BIND_ADDRESS_NO_PORT resolved errors in my case.
>>
>> > > Removing the IP_BIND_ADDRESS_NO_PORT option from Haproxy and
>> > > *doing nothing else* is sufficient to resolve the problem.
>> >
>> > Maybe there are other processes on the same host which call bind()
>> > without IP_BIND_ADDRESS_NO_PORT, and block the ports? E.g.
>> > OutboundBindAddress or similar in torrc?
>>
>> OutboundBindAddress is a likely culprit. We did end up setting
>> OutboundBindAddress on the bridge during the period of intense
>> performance debugging at the end of September.
>>
>> One thing doesn't quite add up, though. The earliest EADDRNOTAVAIL log
>> messages started at 2022-09-28 10:57:26:
>> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40198
>> Whereas according to the change history of /etc on the bridge,
>> OutboundBindAddress was first set some time between 2022-09-29 21:38:37
>> and 2022-09-29 22:37:06, over 30 hours later. I would be tempted to say
>> this is a case of what you initially suspected, simple tuple exhaustion
>> between two static IP addresses, if not for the fact that pre-binding an
>> address resolved the problem in that case as well ("I get EADDRNOTAVAIL
>> sometimes even with netcat, making a connection to the haproxy port—but
>> not if I specify a source address in netcat"). But I only ran that
>> netcat test after OutboundBindAddress had been set, so there may have
>> been many factors being conflated.
>>
>> Anyway, thank you for the insight. I apologize if I was inconsiderate
>> in my prior reply.
>> _______________________________________________
>> tor-relays mailing list
>> [email protected]
>> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
> _______________________________________________
> tor-relays mailing list
> [email protected]
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
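
For anyone reading the archive later, here is a rough sketch of the two bind patterns the quoted thread compares: a plain bind() to a source address with port 0, and the same bind() with IP_BIND_ADDRESS_NO_PORT set so the kernel chooses the port at connect() time. It is not the test code referenced in the thread. The helper names (connect_from, demo) and the loopback listener are made up for illustration, and the numeric fallback 24 is the Linux value from <linux/in.h>, used only if the Python socket module does not export the constant.

```python
import socket

# Linux value from <linux/in.h>; used only if the socket module lacks the name.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def connect_from(source_ip, dest, defer_port):
    """Bind to source_ip with port 0, connect to dest, and return the local
    port reported right after bind() and right after connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if defer_port:
            try:
                # Ask the kernel to pick the source port at connect() time.
                s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
            except OSError:
                # Option not supported here; this degrades to a plain bind().
                pass
        s.bind((source_ip, 0))
        port_after_bind = s.getsockname()[1]
        s.connect(dest)
        port_after_connect = s.getsockname()[1]
    return port_after_bind, port_after_connect


def demo():
    # Hypothetical loopback listener, only so the sketch is self-contained.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(8)
        dest = listener.getsockname()
        return {
            "plain_bind": connect_from("127.0.0.1", dest, defer_port=False),
            "no_port_bind": connect_from("127.0.0.1", dest, defer_port=True),
        }


if __name__ == "__main__":
    print(demo())
```

With a plain bind() the port is reserved as soon as bind() returns, so both values in "plain_bind" should match. On a Linux kernel that supports the option (4.2 or later), the "no_port_bind" pair should show port 0 after bind() and a real port only after connect(), which matches the description in the thread of bind not setting a concrete port number.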
