Re: [vpp-dev] mheap performance

Dave Barach (dbarach) Fri, 08 Sep 2017 09:02:52 -0700

Dear Jacek,


Oh, heck, we don’t need to use a sledgehammer to kill a fly. It will take five 
minutes to fix this problem. Copying Ole Troan for his input, and / or to 
simply fix the problem as follows:

 

Make a set of pools whose elements are n * CLIB_CACHE_LINE_BYTES in size. It’s 
easy enough to dynamically create a fresh pool if [all of a sudden] you need k 
* CLIB_CACHE_LINE_BYTES.

 

Allocate d->rules from the appropriate pool by rounding 1<<d->psid_length to a 
multiple of a cache line.

 

At that point, the memory allocator will instantly behave itself. If necessary, 
you can preallocate the rule pools, see also pool.h.
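
A minimal sketch of the size-class arithmetic described above, in plain C with no vppinfra dependency. The names CACHE_LINE_BYTES and rules_cache_lines are hypothetical stand-ins (VPP itself uses CLIB_CACHE_LINE_BYTES), and the 16-byte rule size is an assumption taken from the 16B objects mentioned in the quoted mail below:

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. VPP code would use CLIB_CACHE_LINE_BYTES. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper: number of cache lines needed to hold
   (1 << psid_length) rules of rule_size bytes each. The result n selects
   the pool whose elements are n * CACHE_LINE_BYTES bytes long. */
static size_t rules_cache_lines(unsigned psid_length, size_t rule_size)
{
    size_t bytes = ((size_t)1 << psid_length) * rule_size;
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;
}
```

With 16-byte rules, psid_length 0 through 2 all land in the one-cache-line pool, and psid_length 3 needs the two-cache-line pool. Because every element in a given pool has the same size, a freed element can always be reused by the next request of that class.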

 

Absent data to the contrary, it’s reasonably likely that cache-line alignment 
of d->rules is unnecessary in the first place. Have you tried dropping the 
alignment constraint? 

 

Thanks… Dave

 

From: Jacek Siuda [mailto:[email protected]] 
Sent: Friday, September 8, 2017 10:39 AM
To: Dave Barach (dbarach) <[email protected]>
Cc: [email protected]; Michał Dubiel <[email protected]>
Subject: Re: [vpp-dev] mheap performance

 

Hi Dave,

The perf backtrace (taken from "control-only" lcore 0) is as follows:
-  91.87%     vpp_main  libvppinfra.so.0.0.0    [.] mheap_get_aligned
   - mheap_get_aligned
      - 99.48% map_add_del_psid
           vl_api_map_add_del_rule_t_handler
           vl_msg_api_handler_with_vm_node
           memclnt_process
           vlib_process_bootstrap
           clib_calljmp

Using DPDK's rte_malloc_socket(), CPU consumption drops to around 0.5%.

From my (somewhat brief) mheap code analysis, it looks like mheap might not 
take alignment into account when looking for free space to allocate a structure. 
So, in my case, when I keep allocating 16B objects with 64B alignment, it 
starts to examine each hole left by the alignment of previous allocations 
and only then realizes it cannot be used because of alignment. But of course I 
might be wrong and the root cause is entirely elsewhere...
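
To make the arithmetic behind that hypothesis concrete, here is a toy model in C. It is not mheap code and makes no claim about how mheap actually searches. It only assumes that each earlier 16B/64B-aligned allocation left a 48-byte free fragment behind it, and that a first-fit scan checks size before alignment. The function name is hypothetical:

```c
#include <stdint.h>

#define OBJ_BYTES   16u  /* object size from the test described above */
#define ALIGN_BYTES 64u  /* requested alignment */

/* Toy model only, NOT mheap code: allocation k is assumed to have left one
   free fragment [k*64 + 16, (k+1)*64) on the free list. Returns how many
   fragments are examined in total while making n_allocs allocations. */
static uint64_t toy_fragments_examined(uint32_t n_allocs)
{
    uint64_t examined = 0;
    for (uint32_t i = 0; i < n_allocs; i++) {
        int reused = 0;
        for (uint32_t k = 0; k < i && !reused; k++) {
            uint64_t start = (uint64_t)k * ALIGN_BYTES + OBJ_BYTES;
            uint64_t end = (uint64_t)(k + 1) * ALIGN_BYTES;
            /* first 64-byte-aligned address at or after the fragment start */
            uint64_t aligned =
                (start + ALIGN_BYTES - 1) & ~(uint64_t)(ALIGN_BYTES - 1);
            examined++;                     /* size check passes: 48 >= 16 */
            if (aligned + OBJ_BYTES <= end) /* alignment check: never true */
                reused = 1;
        }
        /* no fragment fits, so the object is placed at the end of the heap */
    }
    return examined;
}
```

In this model the i-th allocation examines i useless fragments, so n allocations cost n(n-1)/2 examinations: roughly 4.5 * 10^10 for 300,000 allocations, which would be consistent with the non-linear slowdown reported.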

In my test, I'm just adding 300,000 tunnels (one domain+one rule).

Unfortunately, rte_malloc() provides only aligned memory allocation, not 
aligned-at-offset. Theoretically we could provide wrapper around it, but that 
would need some careful coding and a lot of testing. I made an attempt to 
quickly replace mheap globally, but of course it ended up in utter failure.
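
For illustration, a wrapper of that kind could look roughly like the sketch below. It uses plain malloc() as a stand-in for the underlying allocator, the function names are hypothetical, and it has not had the careful testing mentioned above:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns p such that (p + offset) is a multiple of
   align. align must be a power of two. malloc() stands in for the real
   underlying allocator. */
static void *alloc_aligned_at_offset(size_t size, size_t align, size_t offset)
{
    unsigned char *raw = malloc(size + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* leave room in front of the user pointer to remember 'raw' */
    uintptr_t lo = (uintptr_t)raw + sizeof(void *);
    /* smallest p >= lo with (p + offset) % align == 0 */
    uintptr_t p = ((lo + offset + align - 1) & ~(uintptr_t)(align - 1)) - offset;

    /* p itself may be unaligned for a pointer store, so use memcpy */
    void *raw_ptr = raw;
    memcpy((unsigned char *)p - sizeof(void *), &raw_ptr, sizeof(raw_ptr));
    return (void *)p;
}

/* Hypothetical counterpart: recovers the original pointer and frees it. */
static void free_aligned_at_offset(void *p)
{
    void *raw_ptr;
    if (p == NULL)
        return;
    memcpy(&raw_ptr, (unsigned char *)p - sizeof(void *), sizeof(raw_ptr));
    free(raw_ptr);
}
```

The cost is up to align + sizeof(void *) bytes of overhead per object, which matters for small objects such as a single 16B rule.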

 

Right now, I have added the concept of an external allocator to clib (via 
function pointers), which I enable only upon DPDK plugin initialization. 
However, this approach requires calling it directly instead of clib alloc 
(e.g. I did so in rule addition). While it does not add a dependency on DPDK, 
I'm not fully satisfied, because it would need manual replacement of all 
allocation calls. If you want, I can share the patch.
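
A rough sketch of what such installable function pointers might look like. This is not the patch itself: all names are hypothetical, and the default here falls back to the C library rather than to clib:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator hook table. */
typedef struct {
    void *(*alloc)(size_t size, size_t align);
    void (*free)(void *p);
} ext_allocator_t;

/* Default implementation: C11 aligned_alloc(), which requires the size to
   be a multiple of the alignment, so round up first. */
static void *default_alloc(size_t size, size_t align)
{
    size_t rounded = (size + align - 1) / align * align;
    return aligned_alloc(align, rounded);
}

static void default_free(void *p)
{
    free(p);
}

static ext_allocator_t ext_allocator = { default_alloc, default_free };

/* A plugin would call this once at initialization to take over. */
static void ext_allocator_install(const ext_allocator_t *a)
{
    ext_allocator = *a;
}

/* Call sites use these instead of the built-in allocator. */
static void *ext_alloc(size_t size, size_t align)
{
    return ext_allocator.alloc(size, align);
}

static void ext_free(void *p)
{
    ext_allocator.free(p);
}
```

The drawback described above is visible here: only call sites that were changed to use ext_alloc() and ext_free() benefit from an installed allocator.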

Best Regards,

Jacek.

 

2017-09-05 15:30 GMT+02:00 Dave Barach (dbarach) <[email protected]>:

Dear Jacek,

 

Use of the clib memory allocator is mainly historical. It’s elegant in a couple 
of ways - including built-in leak-finding - but it has been known to backfire 
in terms of performance. Individual mheaps are limited to 4gb in a [typical] 
32-bit vector length image. 

 

Note that the idiosyncratic mheap API functions “tell me how long this object 
really is” and “allocate N bytes aligned to a boundary at a certain offset” are 
used all over the place.

 

I wouldn’t mind replacing it - so long as we don’t create a hard dependency on 
the dpdk - but before we go there...: Tell me a bit about the scenario at hand. 
What are we repeatedly allocating / freeing? That’s almost never necessary...

 

Can you easily share the offending backtrace?  

 

Thanks… Dave

 

From: [email protected] [mailto:[email protected]] On Behalf Of Jacek Siuda
Sent: Tuesday, September 5, 2017 9:08 AM
To: [email protected]
Subject: [vpp-dev] mheap performance

 

Hi,

I'm conducting a tunnel test using VPP (vnet) map with the following parameters:

ea_bits_len=0, psid_offset=16, psid=length, single rule for each domain; total 
number of tunnels: 300000, total number of control messages: 600k.

My problem is with simply adding tunnels. After adding more than ~150k-200k, 
performance drops significantly: the first 100k are added in ~3s (on an 
asynchronous C client), the next 100k in another ~5s, but the last 100k take 
~37s to add; in total: ~45s. Python clients perform even worse: 32 minutes(!) 
for 300k tunnels with the synchronous (blocking) version and ~95s with the 
asynchronous one. The Python clients are expected to perform a bit worse 
according to the vpp docs, but I was worried by the non-linear time per tunnel 
addition, which is visible even on the C client.

While investigating this using perf, I found the culprit: it is the memory 
allocation done for the IP address on each rule addition request. 
The memory is allocated by clib, which uses the mheap library (~98% of CPU 
consumption). I looked into mheap and it looks a bit complicated for allocating 
a short object.
I've done a short experiment by replacing (in vnet/map/ only) clib allocation 
with DPDK rte_malloc() and achieved a way better performance: 300k tunnels in 
~5-6s with the same C-client, and respectively ~70s and ~30-40s with Python 
clients. Also, I haven't noticed any negative impact on packet throughput with 
my experimental allocator.

So, here are my questions:

1) Has anyone else reported performance penalties when using the mheap library? 
I've searched the list archive and could not find any related questions.

2) Why was the mheap library chosen for use in clib? Are there any performance 
benefits in some scenarios?

3) Are there any (long- or short-term) plans to replace memory management in 
clib with some other library?

4) I wonder, if I'd like to upstream my solution, how should I approach 
customization of memory allocation, so it would be accepted by community. 
Installable function pointers defaulting to clib?

 

Best Regards,

Jacek Siuda.


_______________________________________________
vpp-dev mailing list
[email protected]
https://lists.fd.io/mailman/listinfo/vpp-dev
