#27841: Surprise race: Intro point closes circuit after NACK, at the same time
as client tries to extend circuit to new intro point
-------------------------------------------------+-------------------------
 Reporter:  asn                                  |          Owner:  neel
     Type:  defect                               |         Status:  reopened
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  0.3.5.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tor-hs dos 035-backport              |  Actual Points:
  040-backport 041-backport                      |
Parent ID:                                       |         Points:
 Reviewer:  dgoulet                              |        Sponsor:
                                                 |  Sponsor27
-------------------------------------------------+-------------------------
Changes (by dgoulet):
 * status:  closed => reopened
 * keywords:  tor-hs dos => tor-hs dos 035-backport 040-backport
     041-backport
 * resolution:  fixed =>

Comment:

 I would like us to strongly reconsider backporting this down to 035. The
 reason is that it badly affects tor clients and thus HS reachability.
 Here is how/why:

 (The following assumes that every time the client reaches the intro
 point, it gets NACKed because it has an old descriptor.)

 1. The obvious issue is that the tor client gets its intro circuit
 destroyed while it is trying to re-extend to a new IP. This alone
 requires the client to do many round trips before noticing, and then to
 open a new circuit only to see the same thing again, until all 3 IPs
 have failed.

 2. This one is more serious. During my testing, I have seen a client
 loop over all its IPs trying to establish an introduction, but instead
 getting its circuit `TRUNCATED` for an internal reason just _after_
 sending the `INTRODUCE1` cell. The client does not count this as an
 "intro failure", so the intro point will be retried. But how can we get
 a `TRUNCATED` _before_ the `INTRODUCE_ACK` NACKing our request? I have
 seen this behavior a lot: a client can make 20-30 tries before it
 finally gets a NACK.

 I believe the reason is in our cell scheduler. When selecting an active
 channel, we ask the cmux subsystem for the first active circuit queue;
 this is done in `circuitmux_get_first_active_circuit()`. There, we
 alternate between the `DESTROY` cell queue and the relay cell queue.
 When the intro point sends a NACK, it first queues the `INTRODUCE_ACK`
 cell and then, because of this bug which is still everywhere in the
 network, it queues a `DESTROY` just after. Our scheduler, at that point
 in time, can decide to send the `DESTROY` _before_ the ack, so our
 client receives a truncated circuit, never notices the NACK, and thus
 retries the same intro point afterwards. If no `DESTROY` cell has been
 sent on the channel's cmux yet, the `DESTROY` queue is prioritized over
 the relay cell queue.
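 The alternation above can be illustrated with a small, self-contained C
 sketch. This is a simplified model, not Tor's actual circuitmux code:
 the type and function names (`cmux_model_t`, `cmux_pick_next`) are
 hypothetical, and the real logic lives around
 `circuitmux_get_first_active_circuit()`.

 ```c
 #include <stdio.h>
 #include <string.h>

 #define QLEN 8

 /* Toy model of a per-channel cell multiplexer: one relay-cell queue,
  * one DESTROY-cell queue, and a flag used to alternate between them.
  * Hypothetical names; not Tor's real data structures. */
 typedef struct {
     const char *relay[QLEN];   /* queued relay cells, FIFO */
     int relay_n;
     const char *destroy[QLEN]; /* queued DESTROY cells, FIFO */
     int destroy_n;
     int last_was_destroy;      /* was the last emitted cell a DESTROY? */
 } cmux_model_t;

 /* Pick the next cell to put on the wire. A pending DESTROY wins unless
  * the previous cell sent was already a DESTROY (the alternation). */
 static const char *cmux_pick_next(cmux_model_t *m) {
     if (m->destroy_n > 0 && (!m->last_was_destroy || m->relay_n == 0)) {
         const char *c = m->destroy[0];
         m->destroy_n--;
         memmove(&m->destroy[0], &m->destroy[1],
                 m->destroy_n * sizeof m->destroy[0]);
         m->last_was_destroy = 1;
         return c;
     }
     if (m->relay_n > 0) {
         const char *c = m->relay[0];
         m->relay_n--;
         memmove(&m->relay[0], &m->relay[1],
                 m->relay_n * sizeof m->relay[0]);
         m->last_was_destroy = 0;
         return c;
     }
     return NULL;
 }

 int main(void) {
     cmux_model_t m = { 0 };
     /* The intro point NACKs: it queues the INTRODUCE_ACK first, then
      * (because of the bug) a DESTROY right after. */
     m.relay[m.relay_n++] = "INTRODUCE_ACK";
     m.destroy[m.destroy_n++] = "DESTROY";
     /* No DESTROY has been sent on this channel yet, so the scheduler
      * emits the DESTROY first: the client sees the circuit torn down
      * before the NACK ever reaches it. */
     const char *c;
     while ((c = cmux_pick_next(&m)) != NULL)
         printf("%s\n", c);
     return 0;
 }
 ```

 Running the model prints `DESTROY` before `INTRODUCE_ACK`, which is
 exactly the ordering the client observes in the bug.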
 So it is not actually a 1/2 chance of hitting this; I believe the
 probability of hitting the issue I just described is much higher,
 especially on smaller relays.

 Considering the above, I'm strongly asking for 035, 040 and 041
 backports.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/27841#comment:26>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs