I've been asked to give specific examples, so I'll give two that I'm
working on in the background and that are not ready for prime time yet:
(A) lib/thread.c:

static int
thread_process_fd (struct thread_list *list, fd_set *fdset, fd_set *mfdset)
{
  struct thread *thread;
  struct thread *next;
  int ready = 0;

  assert (list);

  for (thread = list->head; thread; thread = next)
    {
      next = thread->next;

      if (FD_ISSET (THREAD_FD (thread), fdset))
        {
          assert (FD_ISSET (THREAD_FD (thread), mfdset));
          FD_CLR (THREAD_FD (thread), mfdset);
          thread_list_delete (list, thread);
          thread_list_add (&thread->master->ready, thread);
          thread->type = THREAD_READY;
          ready++;
        }
    }
  return ready;
}
Running perf with ~8k peers while attempting to converge showed that
significant time was spent in thread_process_fd. Iterating over the
thread list to find the thread associated with a given fd is very
expensive, and doubly so because we do it twice: once for read fd's and
once for write fd's. Switching to an array indexed by fd to find the
thread, and using poll semantics instead of select, completely removed
thread_process_fd from perf reports. Convergence time dropped from ~15
minutes to ~5 minutes with this change. (I no longer have the actual
perf report; I did not save it for this case.)
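The fd-indexed lookup can be sketched roughly as follows. This is
illustrative only, not the actual Quagga change: struct thread is
stripped down to two fields, and thread_fd_table, thread_fd_add, and
thread_fd_lookup are hypothetical names. The point is that resolving an
fd to its thread becomes an O(1) array index instead of an O(n) list
walk.

```c
#include <assert.h>
#include <stddef.h>

#define FD_TABLE_SIZE 1024  /* sized to the process fd limit; illustrative */

/* Simplified stand-in for struct thread: just the fd and type. */
struct thread
{
  int fd;
  int type;
};

/* Hypothetical fd-indexed table: thread_fd_table[fd] maps directly to
   the thread waiting on that fd. */
static struct thread *thread_fd_table[FD_TABLE_SIZE];

static void
thread_fd_add (struct thread *t)
{
  assert (t->fd >= 0 && t->fd < FD_TABLE_SIZE);
  thread_fd_table[t->fd] = t;
}

static struct thread *
thread_fd_lookup (int fd)
{
  assert (fd >= 0 && fd < FD_TABLE_SIZE);
  return thread_fd_table[fd];
}
```

With poll semantics, the event loop then walks only the pollfd entries
that have revents set and calls thread_fd_lookup on each ready fd, so
the cost scales with ready fds rather than with the total thread count.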
(B) Reading in, via the cli, a bgp config that has 8k neighbor statements
takes ~5 minutes to load. The perf report over this is:
#
# Overhead  Command  Shared Object         Symbol
# ........  .......  ....................  ....................................
#
30.61% bgpd bgpd [.] peer_lookup
|
--- peer_lookup
|
|--10.21%-- peer_and_group_lookup_vty
| neighbor_update_source
| cmd_execute_command_real
| config_from_file
| vty_read_config
| main
| __libc_start_main
|
|--10.21%-- neighbor_set_peer_group
| cmd_execute_command_real
| config_from_file
| vty_read_config
| main
| __libc_start_main
|
--10.18%-- peer_remote_as
neighbor_remote_as
cmd_execute_command_real
config_from_file
vty_read_config
main
__libc_start_main
21.85% bgpd libzebra.so.0.0.0 [.] sockunion_same
|
--- sockunion_same
|
--21.57%-- peer_lookup
|
|--7.30%-- peer_remote_as
| neighbor_remote_as
| cmd_execute_command_real
| config_from_file
| vty_read_config
| main
| __libc_start_main
|
|--7.18%-- neighbor_set_peer_group
| cmd_execute_command_real
| config_from_file
| vty_read_config
| main
| __libc_start_main
|
--7.09%-- peer_and_group_lookup_vty
neighbor_update_source
cmd_execute_command_real
config_from_file
vty_read_config
main
__libc_start_main
So peer_lookup is taking approximately 41% of the cpu time. It's nothing
more than a linked list traversal to find the interesting peer. Switching
to a hash table instead of a list in struct bgp should be a significant
saving in cpu time alone, and well worth it imo.
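A minimal sketch of what that hash could look like, for illustration
only: struct peer is reduced to just an IPv4 key here (the real one
would key on the full sockunion), and peer_hash, peer_hash_add, and
peer_hash_lookup are made-up names, not the actual bgpd code. Lookup
drops from O(number of peers) to O(bucket chain length).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PEER_HASH_SIZE 4096  /* illustrative bucket count */

/* Simplified stand-in for struct peer: key is the IPv4 neighbor address. */
struct peer
{
  uint32_t addr;
  struct peer *hash_next;  /* chain for bucket collisions */
};

static struct peer *peer_hash[PEER_HASH_SIZE];

static unsigned int
peer_hash_key (uint32_t addr)
{
  /* Knuth multiplicative hash; any decent integer hash works here. */
  return (addr * 2654435761u) % PEER_HASH_SIZE;
}

static void
peer_hash_add (struct peer *p)
{
  unsigned int key = peer_hash_key (p->addr);

  p->hash_next = peer_hash[key];
  peer_hash[key] = p;
}

static struct peer *
peer_hash_lookup (uint32_t addr)
{
  struct peer *p;

  for (p = peer_hash[peer_hash_key (addr)]; p; p = p->hash_next)
    if (p->addr == addr)
      return p;
  return NULL;
}
```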
There are other examples. In BGP we create events for everything,
including timer-related events. With a large # of peers these events
start to overwhelm the system. Switching to a timer wheel implementation,
or glomming events together, would be another source of significant
performance improvement. Though before I make any changes I would want
to prove these are true problems by running perf and spending time
examining the code.
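To illustrate the timer wheel idea, here is a bare-bones single-level
wheel. All names and the 60-slot sizing are made up for this sketch;
a production version would want hierarchical wheels or at least a
tick-batching strategy. Insertion is O(1), and each tick scans only one
slot, so per-tick cost is proportional to the timers actually due rather
than to the total number of armed timers.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SLOTS 60  /* one slot per tick of a 60-tick wheel; illustrative */

struct wtimer
{
  long expire;          /* absolute expiry tick */
  struct wtimer *next;  /* chain within a slot */
};

static struct wtimer *wheel[WHEEL_SLOTS];
static long wheel_now;  /* current tick */

/* Insert is O(1): the slot is just the expiry tick modulo the wheel size. */
static void
wheel_add (struct wtimer *t, long delay)
{
  int slot;

  t->expire = wheel_now + delay;
  slot = t->expire % WHEEL_SLOTS;
  t->next = wheel[slot];
  wheel[slot] = t;
}

/* Advance one tick and return the number of timers fired. Timers whose
   delay exceeds one lap of the wheel stay chained in their slot until a
   later lap, because we re-check the absolute expiry here. */
static int
wheel_tick (void)
{
  int fired = 0;
  int slot;
  struct wtimer **tp;

  wheel_now++;
  slot = wheel_now % WHEEL_SLOTS;
  tp = &wheel[slot];
  while (*tp)
    {
      if ((*tp)->expire <= wheel_now)
        {
          *tp = (*tp)->next;  /* unlink: this timer has fired */
          fired++;
        }
      else
        tp = &(*tp)->next;    /* due on a later lap: leave it chained */
    }
  return fired;
}
```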
In a past life, I spent significant time improving the performance of EIGRP
across all platforms, and BGP and OSPF for NX/OS. My rules of thumb have
become:
(A) Network Time > algorithm time > memory time > cpu time
What do I mean by this?
Fix the code to not be stupid about how it puts packets out on the
network. The minute I have to wait for a response before I can proceed,
that wait outweighs everything else.
Then fix the bad algorithms chosen, found via perf or a related tool.
Then fix time spent accessing memory, found via perf or a related tool.
Then fix cpu-level issues like branch prediction, found via perf or a
related tool.
I generally never had to go beyond algorithm improvements; things sorted
themselves out from there, as network communication takes several orders
of magnitude more time than anything else. Since we are implementing
routing protocol(s) we have to communicate over the network, and those
times dwarf everything else.
(B) Measure, Measure, Measure
As reported elsewhere by other people in this thread, I've run across
multiple attempts at performance improvements and the improvement made
things worse because no actual measurements were taken.
(C) Write code first and then fix the performance issues. Code cleanliness
and maintainability, in general, take higher precedence over performance.
donald
On Fri, May 29, 2015 at 9:45 AM, Donald Sharp <[email protected]>
wrote:
> To continue with Paul's point, I believe that there are a lot of
> performance gains to be had by running perf and looking at the output to
> see where our algorithmic/data structure choices are wrong and fixing those
> first before diving in and rewriting for multi-threaded code.
>
> donald
>
> On Fri, May 29, 2015 at 9:41 AM, Paul Jakma <[email protected]> wrote:
>
>> On Fri, 29 May 2015, Michael H Lambert wrote:
>>
>> I just want to make sure I've been following this email chain correctly.
>>> It seems that there is at least some consensus for a single daemon for all
>>> VRFs. Does this preclude a thread per VRF within the daemon?
>>>
>>
>> generally, no.
>>
>> For existing Quagga libzebra daemons, probably that's going to be such a
>> long effort to achieve, that there will be significant risks as to whether
>> it can succeed. Multi-daemon would be much easier I suspect (it's not like
>> VRF instances need to share any data, so there's no memory/performance
>> trade-off issues that would generally give threads any benefit over
>> multi-process - unlike multi-processing bgpd).
>>
>> regards,
>> --
>> Paul Jakma [email protected] @pjakma Key ID: 64A2FF6A
>> Fortune:
>> You can fool all the people all of the time if the advertising is right
>> and the budget is big enough.
>> -- Joseph E. Levine
>>
>>
>> _______________________________________________
>> Quagga-dev mailing list
>> [email protected]
>> https://lists.quagga.net/mailman/listinfo/quagga-dev
>>
>
>