On 08/22/2012 08:19 PM, Chan, Anthony wrote:
> Hi,
>
> We are running XORP with a custom routing protocol and can reliably trigger
> a SEND_FAILED error (“XrlPFSTCPSender died: Keepalive timeout”).  After this
> occurs, XORP becomes effectively non-operational, although all of its
> processes are still present and in a running state.  Our platform is
> resource-constrained, so the problem is fairly easy to reproduce once we
> inject enough routes.
>
> I believe what is happening is that the RIB process becomes too busy to
> acknowledge the IPC keepalive between it and the routing process.  However,
> the rtrmgr/Finder does not restart the RIB, because by then the RIB has
> become responsive again and acknowledged the keepalive from the
> rtrmgr/Finder process.  Since the IPC between the routing process and the
> RIB is now down, and the rtrmgr/Finder cannot detect any process issue,
> nothing can be done to recover from this state.  Do you believe this is a
> possible scenario with the current XORP error handling?
>
> We are using 1.8.5 without setting the environment variable
> “XORP_SENDER_KEEPALIVE_TIME”, therefore using the default 10 seconds as the
> keepalive interval.

Does the problem go away if you set the keep-alive higher?

You could also throttle the routes you are sending to the RIB to keep
from overworking it.
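
A rough sketch of what I mean by throttling, in Python just for brevity.
Everything named here is an assumption, not XORP API: send_route is a
hypothetical stand-in for whatever call hands one route to the RIB, and
the batch size and pause are made-up numbers you would have to tune.

```python
import time
from typing import Callable, Iterable, TypeVar

Route = TypeVar("Route")


def send_throttled(
    routes: Iterable[Route],
    send_route: Callable[[Route], None],  # hypothetical: hands one route to the RIB
    batch_size: int = 100,                # assumed value, tune for your platform
    pause_s: float = 0.05,                # assumed value, tune for your platform
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send routes in batches, pausing after each full batch so the
    receiver gets time to service its keepalives.

    Returns the number of routes sent.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    sent = 0
    for route in routes:
        send_route(route)
        sent += 1
        # After each full batch, back off before sending more.
        if sent % batch_size == 0:
            sleep(pause_s)
    return sent
```

Inside a real XORP process you would presumably schedule the next batch
from an event-loop timer instead of sleeping, since a blocking sleep
would stall your own keepalive handling too.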

In general, restarting individual xorp processes never works right anyway,
so if one dies (or times out), you usually just have to restart xorp
completely.

Thanks,
Ben

-- 
Ben Greear <[email protected]>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
Xorp-hackers mailing list
[email protected]
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-hackers
