Hi,

I modified the strongSwan (5.0.4) code to implement the diffie_hellman_t interface on top of the Octeon Core Crypto Library APIs. I then ran an IPsec scenario under high load with DH group 1 (encryption algorithm: AES, integrity algorithm: SHA1) on Wind River Linux (Octeon platform) and measured a tunnel setup rate of 165-170. With the gmp library and the same set of parameters, the setup rate was 120-125.

I profiled both ends (IKE initiator and IKE responder) to find the hotspots. Overall CPU utilization at both ends was below 10%. Here are the profiling results.

IKE Initiator
-------------------------------------------------------------------------------
   PerfTop: 2508649 irqs/sec  kernel:96.0%  [1000Hz cpu-clock-msecs],  (all, 16 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt  function                  DSO
             _______  _____ ________________________  _____________________

          3874309.00  89.5% r4k_wait                  [kernel.kallsyms]
            73973.00   1.7% dso__find_symbol          /usr/bin/perf
            39131.00   0.9% pthread_mutex_lock        libpthread-2.11.1.so
            37251.00   0.9% event__preprocess_sample  /usr/bin/perf
            17659.00   0.4% pthread_rwlock_rdlock     libpthread-2.11.1.so
            17010.00   0.4% __pthread_rwlock_unlock   libpthread-2.11.1.so
            15596.00   0.4% __libc_malloc             /lib64/libc-2.11.1.so
            13918.00   0.3% maps__find                /usr/bin/perf
            13066.00   0.3% cfree                     /lib64/libc-2.11.1.so
            12733.00   0.3% vfprintf                  /lib64/libc-2.11.1.so
            12684.00   0.3% POSTLOOP1                 libstrongswan-gmp.so
            11746.00   0.3% dump_printf               /usr/bin/perf
            11257.00   0.3% MM$L2                     libstrongswan-gmp.so
            10115.00   0.2% LOOP1                     libstrongswan-gmp.so
             8556.00   0.2% SHA1Transform             libstrongswan-sha1.so

IKE Responder
-------------------------------------------------------------------------------
   PerfTop: 859932 irqs/sec  kernel:88.4%  [1000Hz cpu-clock-msecs],  (all, 16 CPUs)
-------------------------------------------------------------------------------

             samples  pcnt  function                  DSO
             _______  _____ ________________________  ____________________

          2783544.00  89.8% r4k_wait                  [kernel.kallsyms]
            49292.00   1.6% dso__find_symbol          /usr/bin/perf
            27989.00   0.9% pthread_mutex_lock        libpthread-2.11.1.so
            25644.00   0.8% event__preprocess_sample  /usr/bin/perf
            14973.00   0.5% pthread_rwlock_rdlock     libpthread-2.11.1.so
            14370.00   0.5% __pthread_rwlock_unlock   libpthread-2.11.1.so
            10068.00   0.3% __libc_malloc             libc-2.11.1.so
             9905.00   0.3% maps__find                /usr/bin/perf
             9845.00   0.3% vfprintf                  libc-2.11.1.so
             9244.00   0.3% POSTLOOP1                 libstrongswan-gmp.so
             8480.00   0.3% cfree                     libc-2.11.1.so
             8279.00   0.3% MM$L2                     libstrongswan-gmp.so
             7978.00   0.3% dump_printf               /usr/bin/perf
             7297.00   0.2% LOOP1                     libstrongswan-gmp.so
             6902.00   0.2% perf_session__findnew     /usr/bin/perf

It seems that pthread_mutex_lock() is the top userspace hotspot on both ends (once the kernel idle loop r4k_wait and perf's own symbols are discounted), so it is consuming a noticeable share of the CPU cycles. Is there any way to find the exact piece of code that causes this performance issue?

Regards,
Chinmaya
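(For reference, one way to attribute the pthread_mutex_lock() samples to their callers is to record with call graphs and then expand that symbol in the report. This is only a sketch; the 30-second window is arbitrary, and strongSwan/libpthread need debug symbols for the caller chains to resolve usefully.)

```shell
# Record 30 seconds of system-wide samples with call graphs (-g), so each
# pthread_mutex_lock() sample carries the call chain that reached it.
perf record -a -g sleep 30

# Interactive report: navigate to pthread_mutex_lock and expand it to see
# which strongSwan call sites account for the samples.
perf report

# Alternatively, attach to the IKE daemon only instead of sampling system-wide:
# perf record -g -p $(pidof charon) sleep 30
```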
_______________________________________________
Users mailing list
[email protected]
https://lists.strongswan.org/mailman/listinfo/users
