On Thu, 29 Sep 2016, Lou Berger wrote:
> Looks like the NHT code is generating an event into the state machine that it shouldn't be. I removed this and checked it into my work area. See https://github.com/LabNConsulting/quagga-vnc/commit/7162337f9261b91056b95a673a54ad595aef3e5f
>
> Kudos to Martin for logs/pcaps/discussion and the system that id'ed this issue and will verify the fix.
Yeah, I'd come to the same conclusion yesterday with my own test-case (again based on logs and pcaps from Martin) and emailed him. :)
The issue is that NHT is calling connect_check arbitrarily, without sync'ing with the rest of the state machine, which otherwise ensures connect_check is only called after the fd becomes writeable (and hence that the non-blocking connect() has completed). As SO_ERROR on an fd whose non-blocking connect() is still in progress appears to be undefined (or at least, not useful) prior to then, connect_check happens to get 0, thinks the connect has succeeded and goes into OpenSent, despite the socket not having connected.
Martin's test tool deliberately ignores the incoming connection (it sends neither SYN|ACK nor RST), hence the main peer stays in OpenSent. The inbound connection from the test tool then of course goes into the collision_detect code (as the main peer is in OpenSent) and is NOTIFIED when it should not be.
The reason this happens reliably is that NHT calls nht_update and hence connect_check early in start, but /after/ bgpd has started up the session.
regards,
-- 
Paul Jakma | @pjakma | Key ID: 0xD86BF79464A2FF6A
Fortune:
I retain the right to change my mind, as always. Le Linus e mobile.
- Linus Torvalds
