Hi,

Thanks for finding the problem.

A fix is now in CVS using two timers.

--- Begin Message ---
CVSROOT:        /usr/local/www/data/cvs
Module name:    xorp
Changes by:     [EMAIL PROTECTED]       2007-10-23 08:59:55 UTC

XORP CVS repository


Modified files:
        ospf          peer.cc peer.hh 

Log message:
        The neighbour data structure contained a single timer to handle
        retransmissions the assumption being that only one was required as the
        states transitioned towards the full state. The problem is that if an
        adjacency is in the Exchange or Loading states it is perfectly legal
        for an LSA arriving from another neighbour to be sent to this peer. If
        the router was the master for the database exchange sending a LSA
        cancelled the database exchange timer and replaced it with the LSA
        retransmit (and LSA requests) timer. Leaving the adjacency stuck in
        state Exchange.
        
        The neighbour structure now has two timers one to handle the database
        exchange and another to handle LSA requests and retransmissions.
        
        Bug found by:   Ben Greear <greearb AT candelatech.com>

Revision  Changes                                 Path
1.300     +26 -22;  commitid: 13177471db80a7ea6;  xorp/ospf/peer.cc
1.147     +25 -8;  commitid: 13177471db80a7ea6;   xorp/ospf/peer.hh

_______________________________________________
Xorp-cvs mailing list
[EMAIL PROTECTED]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-cvs

--- End Message ---
       Atanu.

>>>>> "Ben" == Ben Greear <[EMAIL PROTECTED]> writes:

    Ben> Ben Greear wrote:
    >> I think this snippet shows the problem.  We are disabling the old timer 
and starting
    >> a new timer.
    >> [ 2007/10/22 19:38:04 TRACE xorp_ospfv2 OSPF ]
    >> Event(NegotiationDone) Interface(12.16.12/12.16.12)
    >> Neighbour(99.1.1.9) State(ExStart)
    >> [ 2007/10/22 19:38:04 TRACE xorp_ospfv2 OSPF ] stop_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  NegotiationDone (master)
    >> [ 2007/10/22 19:38:04 TRACE xorp_ospfv2 OSPF ] start_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  send_data_description from 
NegotiationDone
    >> [ 2007/10/22 19:38:04 TRACE xorp_ospfv2 OSPF ] 
send_data_description_packet, Interface(12.16.12/12.16.12) Neighbour(99.1.1.9) 
State(Exchange)
    >> [ 2007/10/22 19:38:05 TRACE xorp_ospfv2 OSPF ] stop_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  push_lsas
    >> [ 2007/10/22 19:38:05 TRACE xorp_ospfv2 OSPF ] start_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  push_lsas
    >> [ 2007/10/22 19:38:05 TRACE xorp_ospfv2 OSPF ] stop_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  push_lsas
    >> [ 2007/10/22 19:38:05 TRACE xorp_ospfv2 OSPF ] start_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  push_lsas
    >> even though we are not out of the Exchange state
    >> ...
    >> [ 2007/10/22 19:38:06 TRACE xorp_ospfv2 OSPF ] start_rxmt_timer: 
0x838ab08 12.16.12/12.16.12 Neighbour: 99.1.1.9  push_lsas
    >> [ 2007/10/22 19:38:06 TRACE xorp_ospfv2 OSPF ] Event(LoadingDone) 
Interface(12.16.12/12.16.12) Neighbour(99.1.1.9) State(Exchange)

    Ben> So, it seems push_lsas causes the send_data_description timer to be 
over-written.  This code seems to do the trick
    Ben> (ignore all the extra debugging..I want to keep it in longer to help 
find other bugs more
    Ben> quickly.)

    Ben> Another fix might be to have multiple timers..as this problem could 
exist elsewhere.

    Ben> template <typename A>
    Ben> bool
    Ben> Neighbour<A>::push_lsas(const char* message)
    Ben> {

    Ben> if (get_state() == Exchange) {
    Ben> XLOG_TRACE(true, "WARNING:  cannot do a push_lsas in Exchange state 
because that will"
    Ben> " over-write our one and only timer that needs to finish with 
send_data_description"
    Ben> " state first, neighbour: %p %s Neighbour: %s  State: %s  %s\n",
    Ben> this, _peer.get_if_name().c_str(),
    Ben> pr_id(get_candidate_id()).c_str(),
    Ben> pp_state(get_state()).c_str(),
    Ben> message);
    Ben> // Can't return false here...calling code will assert in 
area_router.cc --Ben
    Ben> return true;
    Ben> }


    >> 
    >>> Atanu.
    >> 


    Ben> -- 
    Ben> Ben Greear <[EMAIL PROTECTED]>
    Ben> Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers

Reply via email to