Hi,

I changed the strongSwan (5.0.4) code to implement the diffie_hellman_t interface on top of the Octeon Core Crypto Library APIs. I then ran an IPsec scenario under high load with DH group 1 (encryption algorithm: AES, integrity algorithm: SHA1) on Wind River Linux (Octeon platform) and found the tunnel setup rate to be 165-170. With the gmp library and the same set of parameters, the setup rate was 120-125. I profiled the code to find the hotspots at both ends (IKE Initiator and IKE Responder); the overall CPU utilization at both ends was below 10%. Here are the profiling results.
  
IKE Initiator

-------------------------------------------------------------------------------
   PerfTop: 2508649 irqs/sec  kernel:96.0% [1000Hz cpu-clock-msecs],  (all, 16 CPUs)
-------------------------------------------------------------------------------

      samples  pcnt function                 DSO
      _______ _____ ________________________ _____________________

   3874309.00 89.5% r4k_wait                 [kernel.kallsyms]
     73973.00  1.7% dso__find_symbol         /usr/bin/perf
     39131.00  0.9% pthread_mutex_lock       libpthread-2.11.1.so
     37251.00  0.9% event__preprocess_sample /usr/bin/perf
     17659.00  0.4% pthread_rwlock_rdlock    libpthread-2.11.1.so
     17010.00  0.4% __pthread_rwlock_unlock  libpthread-2.11.1.so
     15596.00  0.4% __libc_malloc            /lib64/libc-2.11.1.so
     13918.00  0.3% maps__find               /usr/bin/perf
     13066.00  0.3% cfree                    /lib64/libc-2.11.1.so
     12733.00  0.3% vfprintf                 /lib64/libc-2.11.1.so
     12684.00  0.3% POSTLOOP1                libstrongswan-gmp.so
     11746.00  0.3% dump_printf              /usr/bin/perf
     11257.00  0.3% MM$L2                    libstrongswan-gmp.so
     10115.00  0.2% LOOP1                    libstrongswan-gmp.so
      8556.00  0.2% SHA1Transform            libstrongswan-sha1.so


IKE Responder

-------------------------------------------------------------------------------
   PerfTop:  859932 irqs/sec  kernel:88.4% [1000Hz cpu-clock-msecs],  (all, 16 CPUs)
-------------------------------------------------------------------------------

      samples  pcnt function                 DSO
      _______ _____ ________________________ ____________________

   2783544.00 89.8% r4k_wait                 [kernel.kallsyms]
     49292.00  1.6% dso__find_symbol         /usr/bin/perf
     27989.00  0.9% pthread_mutex_lock       libpthread-2.11.1.so
     25644.00  0.8% event__preprocess_sample /usr/bin/perf
     14973.00  0.5% pthread_rwlock_rdlock    libpthread-2.11.1.so
     14370.00  0.5% __pthread_rwlock_unlock  libpthread-2.11.1.so
     10068.00  0.3% __libc_malloc            libc-2.11.1.so
      9905.00  0.3% maps__find               /usr/bin/perf
      9845.00  0.3% vfprintf                 libc-2.11.1.so
      9244.00  0.3% POSTLOOP1                libstrongswan-gmp.so
      8480.00  0.3% cfree                    libc-2.11.1.so
      8279.00  0.3% MM$L2                    libstrongswan-gmp.so
      7978.00  0.3% dump_printf              /usr/bin/perf
      7297.00  0.2% LOOP1                    libstrongswan-gmp.so
      6902.00  0.2% perf_session__findnew    /usr/bin/perf
 
Apart from r4k_wait (the MIPS idle loop, which matches the sub-10% overall CPU utilization) and perf's own symbols, pthread_mutex_lock() is at the top of the profile and thus consumes the most user-space CPU cycles at both ends. Is there any option to find out the exact piece of code which causes this performance issue?
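One way to attribute the pthread_mutex_lock() samples to concrete code is to record call graphs with perf and then break the report down by caller. The sketch below is illustrative only (the flags and the short duration are assumptions, not verified on the Octeon/Wind River target), and it is guarded so it does nothing on a machine without perf:

```shell
# Hypothetical perf invocation: record stack traces (-g) on all CPUs (-a)
# for a few seconds while the IKE load is running, then expand the report
# by call chain so the callers of pthread_mutex_lock become visible.
if command -v perf >/dev/null 2>&1; then
    perf record -g -a -F 1000 -- sleep 5 || true
    perf report --stdio -g || true
fi
status="done"
echo "profile sketch $status"
```

In the resulting report, expanding the pthread_mutex_lock entry shows the call chains leading into it (i.e. which part of the charon daemon takes the lock). Where available, the dedicated `perf lock` subcommand can also give per-lock contention statistics.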
Regards,
Chinmaya
_______________________________________________
Users mailing list
[email protected]
https://lists.strongswan.org/mailman/listinfo/users
