Ben Greear <[EMAIL PROTECTED]> wrote:

> I am trying to start 30 xorp instances on a system with
> around 600 interfaces (vlans and such).  My hardware has lots
> of RAM and a quad-core CPU, but it still takes more than the 10
> second keep-alive timeout to get xorp_fea initialized (it seems
> to be reading large numbers of netlink messages).  This causes
> continual xorp restarts since the timeout fails.

What kind of netlink messages does the FEA see from the kernel?
Are they asynchronous notifications/upcalls about some events,
or replies to something the FEA explicitly requested (e.g., to read
the set of interfaces and IP addresses)?
Also, if you run "ip monitor all" in parallel, do you see
all those netlink messages?
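
One way to answer that last question is to tally the monitor output by
message type. The sketch below is a rough Python helper, not part of XORP.
It assumes an iproute2 version where "ip monitor all" prefixes each event
with a bracketed label such as [LINK] or [ADDR], and that indented lines
continue the previous event; the name tally_monitor_lines is made up for
illustration.

```python
import re
import sys
from collections import Counter

# Assumption: each event starts with a bracketed label, e.g. "[LINK]".
_LABEL = re.compile(r"^\[([A-Z0-9_]+)\]")


def tally_monitor_lines(lines):
    """Count monitor events by their leading bracketed label."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # blank line
        if line[0].isspace():
            continue  # indented continuation of the previous event
        match = _LABEL.match(line)
        counts[match.group(1) if match else "UNLABELED"] += 1
    return counts


# Only read stdin when something is piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    for label, count in tally_monitor_lines(sys.stdin).most_common():
        print(f"{label:12s} {count}")
```

Saved under a hypothetical name such as tally.py, it could be fed with
"timeout 15 ip monitor all | python3 tally.py" while one xorp instance
starts, which would show whether link, address, or route events dominate.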

> I tried throttling so that I only started one xorp per 5
> seconds, and it still times out.

As a start, try tweaking the XRL-related timeouts I suggested
in another thread:

* DEFAULT_SENDER_KEEPALIVE_MS inside libxipc/xrl_pf_stcp.cc
  (current value of 10000ms, i.e., 10s)
* RESPONSE_TIMEOUT_MS inside libxipc/finder_messenger.hh
  (current value of 30000ms, i.e., 30s)
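
For a rough sense of scale, the numbers in this thread can be turned into
a per-interface time budget. This is only back-of-the-envelope arithmetic:
it assumes FEA startup has to finish within one keepalive interval and
that the work grows linearly with the number of interfaces, neither of
which has been established. The helper name and the raised 60000ms value
are hypothetical.

```python
def per_interface_budget_ms(timeout_ms, num_interfaces):
    """Average time available per interface if startup must fit in timeout_ms."""
    return timeout_ms / num_interfaces


INTERFACES = 600  # approximate interface count reported in this thread

# Current keepalive value vs. a hypothetical raised value.
for timeout_ms in (10000, 60000):
    budget = per_interface_budget_ms(timeout_ms, INTERFACES)
    print(f"{timeout_ms} ms keepalive: {budget:.1f} ms per interface")
```

If the measured per-interface cost is well above the first figure, raising
the keepalive alone may only hide the problem.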

> I'm going to experiment with increasing the keep-alive timer,
> but I am curious if there are better alternatives.
> 
> *  Maybe don't start keep-alive polling until fea finalizes its
>    initialization?

On startup, this is what is supposed to happen.
However, there are different types of keepalives (some sent by the
underlying XRL mechanism, others by the rtrmgr itself).
On top of that, there are things the FEA does before it gets to
initializing the XRL mechanism, things during/after
(re)configuration, etc.

All those events need to be analyzed during heavy load to identify
the bottleneck.

> *  Maybe have fea answer keep-alives *while* it's initializing itself?
> 
> *  Optimize fea to only probe info for devices it is configured to
>     care about?

On startup, the FEA queries info about all interfaces in the system.
This info is needed for various reasons. For example, if the FEA is
reconfigured later, it is supposed to restore the original state on
shutdown.
Hence, selective probing might be quite complicated (if possible at
all), and without further analysis it is not yet clear whether this
is the bottleneck.

Regards,
Pavlin
