On 08/22/2012 08:19 PM, Chan, Anthony wrote:
> Hi,
>
> We are running XORP with a custom routing protocol and are able to run into
> the error situation where a SEND_FAILED error code is generated
> (“XrlPFSTCPSender died: Keepalive timeout”). After this occurs, XORP
> basically becomes non-operational, but all the processes are still around
> and in a running state. Our platform is resource constrained, so it is
> fairly easy for us to reproduce once we inject enough routes.
>
> I believe what is happening is that the RIB process becomes too busy to
> acknowledge the IPC keepalive between it and the routing process. However,
> the rtrmgr/Finder does not restart the RIB, because the RIB became
> responsive again and acknowledged the keepalive from the rtrmgr/Finder
> process. Since the IPC between the routing and RIB processes is now down,
> and the rtrmgr/Finder cannot detect any process issues, nothing can be done
> to recover from this state. Do you believe this is a possible scenario with
> the current XORP error handling?
>
> We are using 1.8.5 without setting the environment variable
> “XORP_SENDER_KEEPALIVE_TIME”, and are therefore using the default of
> 10 seconds as the keepalive interval.
Does the problem go away if you set the keep-alive higher?

You could also throttle the routes you are sending to the RIB to keep from
overworking it.

In general, restarting XORP processes never works right anyway, so if one
dies (or times out), you usually just have to restart XORP completely.

Thanks,
Ben

-- 
Ben Greear <[email protected]>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
