Hello all,

I have been running strongSwan for a while on some of my networks and have been having a few stability issues. I am working on getting to the root cause of a few of them and was wondering if other people are seeing the same issues:
1.) DPD'd connections with dpdaction=restart sometimes stop and never come back. The most common form of this is the CHILD_SA going away and never being re-established. I am working on getting better debug messages from charon and figuring out whether charon is missing kernel notifications or just isn't establishing CHILD_SAs correctly. This problem seems to be worse over lower-bandwidth connections. Most of the time this bug takes a while to hit. The first time I saw it was after ~57 hours of a tunnel working. The fastest I have hit it so far is ~19 hours. Some of my connections haven't hit this problem in the weeks they have been up. Some of these problems may be documented in:
https://lists.strongswan.org/pipermail/users/2009-June/003516.html

2.) Sometimes connections get into rekeying wars where both ends start displaying:

    deleting duplicate IKE_SA for peer 'w.x.y.z' due to uniqueness policy

which causes a rekey, which causes a duplicate, which causes a rekey, ... Note that only one end is configured to initiate the connection (auto=start, dpdaction=restart). The other end is (auto=add, dpdaction=clear). This bug can also take hours or days to hit. It is pretty rare; I have only hit it twice in all my testing.

3.) Sometimes charon locks up. I have seen this happen in many different forms. I hit this style of bug maybe once a week. Unfortunately this bug family is really nasty, as I have to kill the process, restart processes, etc. Here is one such trace:

    (gdb) thread apply all bt 15

    Thread 6 (Thread 0x488304d0 (LWP 15785)):
    #0  0x0ff8df60 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #1  0x0ffd7360 in ?? () from /usr/lib/libstrongswan.so.0
    #2  0x10023884 in schedule (this=0x10067f28) at processing/scheduler.c:223
    #3  0x100219d8 in execute (this=0x100680e0) at processing/jobs/callback_job.c:145
    #4  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
    #5  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
    #6  0x0fdf8b94 in clone () from /lib/libc.so.6
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    Thread 5 (Thread 0x490304d0 (LWP 15786)):
    #0  0x0ff92594 in recvfrom () from /lib/libpthread.so.0
    #1  0x0f811720 in receive_events (this=<value optimized out>) at kernel_netlink_ipsec.c:748
    #2  0x100219d8 in execute (this=0x1006e550) at processing/jobs/callback_job.c:145
    #3  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
    #4  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
    #5  0x0fdf8b94 in clone () from /lib/libc.so.6
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    Thread 4 (Thread 0x498304d0 (LWP 15787)):
    #0  0x0ff92594 in recvfrom () from /lib/libpthread.so.0
    #1  0x0f8175c0 in receive_events (this=0x1006e620) at kernel_netlink_net.c:498
    #2  0x100219d8 in execute (this=0x1006e7a8) at processing/jobs/callback_job.c:145
    #3  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
    #4  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
    #5  0x0fdf8b94 in clone () from /lib/libc.so.6
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    Thread 3 (Thread 0x4a0304d0 (LWP 15788)):
    #0  0x0ff8d930 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #1  0x0ffd74ec in ?? () from /usr/lib/libstrongswan.so.0
    #2  0x100213f4 in send_packets (this=0x10070888) at network/sender.c:97
    #3  0x100219d8 in execute (this=0x100709d8) at processing/jobs/callback_job.c:145
    #4  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
    #5  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
    #6  0x0fdf8b94 in clone () from /lib/libc.so.6
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    Thread 2 (Thread 0x4a8304d0 (LWP 15791)):
    #0  0x0fdf0798 in select () from /lib/libc.so.6
    #1  0x1004b5f8 in receiver (this=0x10069020, packet=0x4a82f93c) at network/socket-raw.c:148
    #2  0x10020b7c in receive_packets (this=0x10070aa8) at network/receiver.c:266
    #3  0x100219d8 in execute (this=0x10070b88) at processing/jobs/callback_job.c:145
    #4  0x100242e4 in process_jobs (this=0x1006a0b8) at processing/processor.c:123
    #5  0x0ff87b34 in start_thread () from /lib/libpthread.so.0
    #6  0x0fdf8b94 in clone () from /lib/libc.so.6
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    Thread 1 (Thread 0x48022110 (LWP 15780)):
    #0  0x0ff8d930 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #1  0x0ffd74b8 in ?? () from /usr/lib/libstrongswan.so.0
    #2  0x10030a5c in flush (this=0xfff701c) at sa/ike_sa_manager.c:1552
    #3  0x10011270 in destroy (this=0x10067970) at daemon.c:177
    #4  0x100125fc in main (argc=<value optimized out>, argv=<value optimized out>) at daemon.c:790
    #0  0x0ff8d930 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0

All of my ipsec.conf files look very similar to:

    version 2.0

    config setup
        plutostart=no
        charonstart=yes
        strictcrlpolicy=no

    conn host-host-1
        ike=aes256-sha2_256-modp1536,aes256-sha1-modp1536,aes128-sha2_256-modp1536,aes128-sha1-modp1536,3des-sha2_256-modp1536,3des-sha1-modp1536
        esp=aes256-sha2_256-modp1536,aes256-sha1-modp1536,aes128-sha2_256-modp1536,aes128-sha1-modp1536,3des-sha2_256-modp1536,3des-sha1-modp1536
        mobike=no
        pfs=yes
        pfsgroup=modp1536
        leftupdown=/usr/lib/ipsec/my_updown
        keyingtries=%forever
        dpdaction=restart
        dpddelay=60
        left=192.166.1.1
        right=192.166.1.2
        auto=start
        authby=secret
        keyexchange=ikev2

    conn net-net-1-2-2
        leftsubnet=10.201.0.0/16
        rightsubnet=192.167.1.0/24
        also=host-host-1

    conn net-net-1-2-1
        leftsubnet=10.201.0.0/16
        rightsubnet=192.168.2.0/24
        also=host-host-1

    conn net-host-1-2
        leftsubnet=10.201.0.0/16
        also=host-host-1

    conn host-net-1-2
        rightsubnet=192.167.1.0/24
        also=host-host-1

    conn host-net-1-1
        rightsubnet=192.168.2.0/24
        also=host-host-1

I think strongSwan is great. It is one of the easiest IKE daemons to configure, but I would like better stability. I have no problem working on the source code to try to solve some of these problems, but I don't want to duplicate work or fight against known issues. If you know anything about any of these bugs, please let me know.

I am currently running 4.3.4 on Linux 2.6.29.3 (powerpc).

Thanks,
Barry
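P.S. For anyone comparing lock-up traces like the one in 3.), here is a minimal Python sketch that boils a `thread apply all bt` dump down to one line per thread: the innermost function and the first frame that has source information. The function name, the regular expressions, and the shortened sample are my own assumptions for illustration, not part of strongSwan or gdb, and the parsing only handles frames shaped like the ones in the trace above.

```python
import re

# Thread header, e.g. "Thread 6 (Thread 0x488304d0 (LWP 15785)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
# Frame start, e.g. "#2  0x10023884 in schedule (" (the address is optional)
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \(")
# Source location at the end of a frame, e.g. " at processing/scheduler.c:223"
SRC_RE = re.compile(r" at (\S+):(\d+)$")

# Shortened excerpt of the trace in this mail, used only as sample input.
SAMPLE_TRACE = """\
Thread 5 (Thread 0x490304d0 (LWP 15786)):
#0  0x0ff92594 in recvfrom () from /lib/libpthread.so.0
#1  0x0f811720 in receive_events (this=<value optimized out>) at kernel_netlink_ipsec.c:748
Thread 1 (Thread 0x48022110 (LWP 15780)):
#0  0x0ff8d930 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0ffd74b8 in ?? () from /usr/lib/libstrongswan.so.0
#2  0x10030a5c in flush (this=0xfff701c) at sa/ike_sa_manager.c:1552
"""


def summarize_backtrace(text):
    """Map thread number -> (innermost function, first frame with source)."""
    summary = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            summary[current] = [None, None]
            continue
        frame = FRAME_RE.match(line)
        if frame is None or current is None:
            continue  # not a frame line, or a frame before any thread header
        func = frame.group(1)
        if summary[current][0] is None:
            summary[current][0] = func  # frame #0 is where the thread sits
        src = SRC_RE.search(line)
        if src and summary[current][1] is None:
            summary[current][1] = f"{func} at {src.group(1)}:{src.group(2)}"
    return {tid: tuple(entry) for tid, entry in summary.items()}


if __name__ == "__main__":
    for tid, (inner, located) in sorted(summarize_backtrace(SAMPLE_TRACE).items()):
        print(f"thread {tid}: in {inner}, first located frame: {located}")
```

Run against the full trace, this makes it easier to see that thread 1 is waiting in flush (sa/ike_sa_manager.c:1552), called from destroy in daemon.c, while the other threads sit in their usual wait calls.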