On Thu, 29 Sep 2016, Lou Berger wrote:

> looks like the NHT code is generating an event into the state machine
> that it shouldn't be.  I removed this and checked it into my work
> area.  See
> https://github.com/LabNConsulting/quagga-vnc/commit/7162337f9261b91056b95a673a54ad595aef3e5f
>
> Kudos to Martin for logs/pcaps/discussion and the system that id'ed this issue and will verify the fix.

Yeah, I'd come to the same conclusion yesterday with my own test-case (again based on logs and pcaps from Martin) and emailed him. :)

The issue is that NHT calls connect_check arbitrarily, without sync'ing with the rest of the state machine, which otherwise ensures connect_check is only called after the fd becomes writeable (and hence, that the non-blocking connect() has completed). As SO_ERROR on an fd whose non-blocking connect() is still in progress appears to be undefined (or at least, not useful) prior to then, connect_check happens to read 0, thinks the connect has succeeded, and goes into OpenSent despite the socket not having connected.
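
To illustrate the ordering that matters here, a minimal sketch in C (not the bgpd code; start_connect, check_connect and the enum are hypothetical names of mine). It assumes the usual sockets pattern: start the non-blocking connect(), wait for the fd to become writeable, and only then read SO_ERROR.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result type, for illustration only. */
enum connect_status { CONNECT_PENDING, CONNECT_OK, CONNECT_FAILED };

/* Start a non-blocking connect. Returns the fd, or -1 on immediate failure.
 * EINPROGRESS is the normal "still connecting" result and is not an error. */
static int start_connect(const struct sockaddr *sa, socklen_t salen)
{
    int fd = socket(sa->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    if (connect(fd, sa, salen) < 0 && errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Consult SO_ERROR only once the fd is writeable. If poll() reports that
 * it is not writeable yet, the connect is still pending and SO_ERROR is
 * deliberately not read, since a 0 there would not mean "connected". */
static enum connect_status check_connect(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return CONNECT_FAILED;
    if (n == 0)
        return CONNECT_PENDING;

    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return CONNECT_FAILED;
    return err == 0 ? CONNECT_OK : CONNECT_FAILED;
}
```

The point is the CONNECT_PENDING branch: a caller that skips the writeable check and goes straight to getsockopt() can read 0 while the SYN is still unanswered, which is what the NHT path appears to be doing.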

Martin's test tool deliberately ignores the incoming connection (it sends neither a SYN|ACK nor an RST), so the main peer stays in OpenSent. The inbound connection from the test tool then of course goes into the collision_detect code (because the main peer is in OpenSent) and is sent a NOTIFICATION when it should not be.

The reason this happens reliably is that NHT calls nht_update, and hence connect_check, early in startup, but /after/ bgpd has started up the session.

regards,
--
Paul Jakma | [email protected] | @pjakma | Key ID: 0xD86BF79464A2FF6A
Fortune:
I retain the right to change my mind, as always. Le Linus e mobile.

        - Linus Torvalds
