Re: [strongSwan] Best practices regarding monitoring
Hi Peter

> So, am I correct to assume that you guys usually evaluate the output
> of `ipsec statusall`

Preferably I'd do that over vici [1], as it provides a much better interface for various languages to query tunnel status or re-initiate tunnels.

> Do you simply send pings to remote systems "behind" the VPN?

Actually, out-of-sync state is quite uncommon, at least with IKEv2. If your peer loses CHILD_SAs but happily answers DPD/liveness checks on IKE, there is probably a bug somewhere. If a peer deletes a CHILD_SA, it must signal that over IKE, hence its peer should notice that. Even complex rekey collisions are actually defined, but probably not all implementations handle them correctly.

Also, you might consider updating to 5.5.x, which brings some additional improvements regarding collision exchanges.

> If there is no DPD that uses CHILD_SAs, there might be nothing else
> that you can do.

There isn't, as from a protocol level this is not needed in IKEv2 due to the strict state synchronization it provides. Of course you could use a short CHILD_SA rekeying interval to check its liveness, but that isn't an optimal solution, either.

Regards
Martin

[1] https://wiki.strongswan.org/projects/strongswan/wiki/Vici
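
To illustrate the vici suggestion, here is a minimal Python sketch. It assumes the data shape yielded by list_sas() in strongSwan's vici Python bindings: one dict per IKE_SA, with a "child-sas" section whose entries carry "state" and "use-in" (seconds since the last inbound packet, absent if none was seen). The helper names and the 300-second threshold are hypothetical, and the field names should be checked against the vici documentation for the installed version.

```python
def _num(value):
    """vici values usually arrive as byte strings in Python 3; accept str/int too."""
    if isinstance(value, bytes):
        value = value.decode()
    return int(value)


def silent_children(ike_sas, max_in_age=300):
    """Yield (ike_name, child_name) for installed CHILD_SAs whose last
    inbound packet is older than max_in_age seconds (or was never seen)."""
    for entry in ike_sas:                      # one dict per IKE_SA
        for ike_name, ike in entry.items():
            for child_name, child in ike.get("child-sas", {}).items():
                if child.get("state") not in (b"INSTALLED", "INSTALLED"):
                    continue
                use_in = child.get("use-in")   # assumed absent if no packet yet
                if use_in is None or _num(use_in) > max_in_age:
                    yield ike_name, child_name


# Usage against a running charon (needs the `vici` Python package and
# access to the vici socket):
#   import vici
#   for ike, child in silent_children(vici.Session().list_sas()):
#       print("no recent inbound traffic on", ike, child)
```

A CHILD_SA reported this way is only a hint: an idle tunnel or a remote host that is down looks the same from this side.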
Re: [strongSwan] Best practices regarding monitoring
Hi,

On Fri, Jun 09, 2017 at 09:11:27PM +0200, Noel Kuntze wrote:
> Besides DPD, there's no standard that charon implements for that. I am
> also not aware of any that uses CHILD_SAs.

alright, too bad. :-/

So, am I correct to assume that you guys usually evaluate the output of `ipsec statusall` and maybe `ip xfrm {state,policy}` to implement monitoring? Do you simply send pings to remote systems "behind" the VPN? (If there is no DPD that uses CHILD_SAs, there might be nothing else that you can do.)

> Huh? Check `ip xfrm state` and `ip xfrm policy`, they give you the SAD and
> SPD.
> Also check if you receive any ESP packets and what their SPIs are.

`ip xfrm state` shows the same SPIs as `ipsec statusall` does. Policies look fine, too. With tcpdump, I can see outgoing encrypted traffic that uses the correct SPIs (and we can decrypt that traffic using Wireshark and the keys shown by `ip xfrm state`). No incoming ESP traffic, though.

> I think the much more plausible cases are the following:
> 1) Kernel does not send expiration messages to charon when an SA soft or hard
>    expires
> 2) Something in between drops the ESP traffic. Maybe there's a problem with a
>    stateful firewall? iptables rules?

As for #1: How can I check that? I assume that `ip xfrm state` would not show any SAs but `ipsec statusall` still shows them, right?

As for #2: Totally possible. We always check our firewalls, but traffic may still be dropped on the remote end.

Don't get me wrong, though. I only posted one exemplary scenario that we see with one of our IPSec peers. It illustrates nicely that our strongswan/charon/kernel looks like it's working fine, but still, no response from the remote peer until we do a "service strongswan restart". I understand that I may not have posted all required information to debug this particular issue, simply because that's not what I'm after. :-)

At the end of the day, we have to work closely with the admins of our remote peers to fix the individual issues. We're not able to reliably *detect* them, though.

Any suggestions are highly appreciated.

Thanks!
Peter
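
The check assumed for case #1 above (SAs still listed by `ipsec statusall` but gone from `ip xfrm state`) could be sketched in Python as follows. This is only an illustration: it parses the "ESP SPIs: ..._i ..._o" form shown in this thread and the "proto esp spi 0x..." lines that `ip xfrm state` prints, and all function names are hypothetical. Whether such a mismatch really is what case #1 looks like remains the assumption stated above.

```python
import re


def spis_from_statusall(text):
    """ESP SPIs as charon reports them, e.g. 'ESP SPIs: c112233_i c445566_o'."""
    spis = set()
    for m in re.finditer(r"ESP SPIs: ([0-9a-f]+)_i ([0-9a-f]+)_o", text):
        spis.update(int(g, 16) for g in m.groups())
    return spis


def spis_from_xfrm(text):
    """ESP SPIs installed in the kernel SAD, e.g. 'proto esp spi 0x0c112233'."""
    return {int(m.group(1), 16)
            for m in re.finditer(r"proto esp spi 0x([0-9a-f]+)", text)}


def missing_in_kernel(statusall_text, xfrm_text):
    """SPIs charon believes are installed but the kernel no longer has."""
    return spis_from_statusall(statusall_text) - spis_from_xfrm(xfrm_text)


# Typical use (run as root), left as comments so the sketch has no side effects:
#   import subprocess
#   status = subprocess.run(["ipsec", "statusall"], capture_output=True, text=True).stdout
#   xfrm = subprocess.run(["ip", "xfrm", "state"], capture_output=True, text=True).stdout
#   print(missing_in_kernel(status, xfrm))
```

SPIs are compared as integers so that leading zeros in the kernel's output do not matter. An empty result only rules out this one failure mode; it says nothing about what the remote peer has installed.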
Re: [strongSwan] Best practices regarding monitoring
Hello Peter,

On 09.06.2017 11:46, Peter Hofmann wrote:
> Hi,
>
> we're running various Ubuntu systems with StrongSwan 5.1 or 5.3. Each
> system connects to exactly one IPSec/IKE peer. We usually don't know
> what kind of peer that is -- is it also running StrongSwan, is it a
> hardware firewall, does it run OpenBSD, ... ? No idea. No way of
> retrieving log files. They're all black boxes to us. Okay.
>
> Now, the big question is: How to monitor IPSec connectivity?

Ask the administrator of the remote peer for some service that you can use to check connectivity. Besides DPD, there's no standard that charon implements for that. I am also not aware of any that uses CHILD_SAs.

> It's easy to check if there are IKE SAs. It's also not a big deal to
> check if there are CHILD SAs. We can do that. However, checking that is
> not enough.
>
> Let me give you an example.
>
> Here's some output of "ipsec statusall":
>
> Status of IKE charon daemon (strongSwan 5.1.2, Linux 3.13.0-67-generic, x86_64):
>   uptime: 5 days, since Jun 02 11:51:14 2017
>   malloc: sbrk 1511424, mmap 0, used 343856, free 1167568
>   worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 84
>   loaded plugins: charon test-vectors aes rc2 sha1 sha2 md4 md5 rdrand random nonce x509 revocation constraints pkcs1 pkcs7 pkcs8 pkcs12 pem openssl xcbc cmac hmac ctr ccm gcm attr kernel-netlink resolve socket-default stroke updown eap-identity addrblock
> Listening IP addresses:
>   10.1.2.3
>   $public_IP
> Connections:
>   peer_1: $public_IP...$peer_IP IKEv2
>   peer_1: local: [$public_IP] uses pre-shared key authentication
>   peer_1: remote: [$peer_IP] uses pre-shared key authentication
>   peer_1: child: 192.168.23.24/32 === 192.168.100.200/32 TUNNEL
> Routed Connections:
>   peer_1{1}: ROUTED, TUNNEL
>   peer_1{1}: 192.168.23.24/32 === 192.168.100.200/32
> Security Associations (1 up, 0 connecting):
>   peer_1[79]: ESTABLISHED 82 minutes ago, $public_IP[$public_IP]...$peer_IP[$peer_IP]
>   peer_1[79]: IKEv2 SPIs: 1234567890_i abcdefghi_r*, rekeying disabled
>   peer_1[79]: IKE proposal: AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_8192
>   peer_1{1}: INSTALLED, TUNNEL, ESP SPIs: c112233_i c445566_o
>   peer_1{1}: AES_CBC_256/HMAC_SHA2_256_128, 49208 bytes_i (239 pkts, 1145s ago), 59836 bytes_o (491 pkts, 14s ago), rekeying disabled
>   peer_1{1}: 192.168.23.24/32 === 192.168.100.200/32
>
> Looks fine, doesn't it? Except 192.168.100.200 does not respond.
> tcpdump shows that we properly encrypt our traffic using those exact
> SPIs and everything. On our end, everything looks fine. But our peer
> simply ignores our encrypted traffic. It's as if our peer has
> "forgotten" about those SPIs.

Huh? Check `ip xfrm state` and `ip xfrm policy`, they give you the SAD and SPD. Also check if you receive any ESP packets and what their SPIs are.

I think the much more plausible cases are the following:
1) Kernel does not send expiration messages to charon when an SA soft or hard expires
2) Something in between drops the ESP traffic. Maybe there's a problem with a stateful firewall? iptables rules?

See above.

Kind regards
Noel
[strongSwan] Best practices regarding monitoring
Hi,

we're running various Ubuntu systems with StrongSwan 5.1 or 5.3. Each system connects to exactly one IPSec/IKE peer. We usually don't know what kind of peer that is -- is it also running StrongSwan, is it a hardware firewall, does it run OpenBSD, ... ? No idea. No way of retrieving log files. They're all black boxes to us. Okay.

Now, the big question is: How to monitor IPSec connectivity?

It's easy to check if there are IKE SAs. It's also not a big deal to check if there are CHILD SAs. We can do that. However, checking that is not enough.

Let me give you an example.

Here's some output of "ipsec statusall":

Status of IKE charon daemon (strongSwan 5.1.2, Linux 3.13.0-67-generic, x86_64):
  uptime: 5 days, since Jun 02 11:51:14 2017
  malloc: sbrk 1511424, mmap 0, used 343856, free 1167568
  worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 84
  loaded plugins: charon test-vectors aes rc2 sha1 sha2 md4 md5 rdrand random nonce x509 revocation constraints pkcs1 pkcs7 pkcs8 pkcs12 pem openssl xcbc cmac hmac ctr ccm gcm attr kernel-netlink resolve socket-default stroke updown eap-identity addrblock
Listening IP addresses:
  10.1.2.3
  $public_IP
Connections:
  peer_1: $public_IP...$peer_IP IKEv2
  peer_1: local: [$public_IP] uses pre-shared key authentication
  peer_1: remote: [$peer_IP] uses pre-shared key authentication
  peer_1: child: 192.168.23.24/32 === 192.168.100.200/32 TUNNEL
Routed Connections:
  peer_1{1}: ROUTED, TUNNEL
  peer_1{1}: 192.168.23.24/32 === 192.168.100.200/32
Security Associations (1 up, 0 connecting):
  peer_1[79]: ESTABLISHED 82 minutes ago, $public_IP[$public_IP]...$peer_IP[$peer_IP]
  peer_1[79]: IKEv2 SPIs: 1234567890_i abcdefghi_r*, rekeying disabled
  peer_1[79]: IKE proposal: AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_8192
  peer_1{1}: INSTALLED, TUNNEL, ESP SPIs: c112233_i c445566_o
  peer_1{1}: AES_CBC_256/HMAC_SHA2_256_128, 49208 bytes_i (239 pkts, 1145s ago), 59836 bytes_o (491 pkts, 14s ago), rekeying disabled
  peer_1{1}: 192.168.23.24/32 === 192.168.100.200/32

Looks fine, doesn't it? Except 192.168.100.200 does not respond. tcpdump shows that we properly encrypt our traffic using those exact SPIs and everything. On our end, everything looks fine. But our peer simply ignores our encrypted traffic. It's as if our peer has "forgotten" about those SPIs.

If you look closely, you can see that there's outgoing traffic, but no incoming traffic:

  peer_1{1}: ... 49208 bytes_i (239 pkts, 1145s ago), 59836 bytes_o (491 pkts, 14s ago)

Reinitiating the entire connection (essentially, doing "service strongswan restart") fixes the problem and we can immediately reach 192.168.100.200.

(Yes, in this specific case, it might be worth a try to reenable rekeying on our end. Still, my question is not about fixing this problem at hand. :-))

What do you guys do in such situations? What are best practices for monitoring? How do you detect "dead" CHILD SAs? Is that even possible?

There's the obvious idea: Try to ping a system "behind" the VPN. In the example above, we could issue pings to 192.168.100.200 and, if that system does not respond, consider the IPSec connection to be "down".

We would like to avoid that, though. Ideally, we could find a way to directly check whether all CHILD SAs are "healthy". Pinging 192.168.100.200 would be "indirect" monitoring: It's a different system and *that* system could be down, not the IPSec connection.

In other words, maybe there's something like DPD in IKEv2, but operating on the level of CHILD SAs?

Thank you very much in advance!
Peter
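
The "outgoing but no incoming traffic" observation above can be turned into a rough heuristic. The following Python sketch parses the counter line in exactly the form shown in this post; the function name and both thresholds are hypothetical, and the optional parenthesised part is handled defensively on the assumption that it may be missing when no packet has been seen yet.

```python
import re

# Matches the CHILD_SA counter line of `ipsec statusall`, e.g.
#   peer_1{1}: ..., 49208 bytes_i (239 pkts, 1145s ago), 59836 bytes_o (491 pkts, 14s ago), ...
COUNTERS = re.compile(
    r"^\s*(?P<name>\S+)\{(?P<id>\d+)\}:.*?"
    r"(?P<bytes_i>\d+) bytes_i(?: \((?P<pkts_i>\d+) pkts?, (?P<age_i>\d+)s ago\))?, "
    r"(?P<bytes_o>\d+) bytes_o(?: \((?P<pkts_o>\d+) pkts?, (?P<age_o>\d+)s ago\))?",
    re.MULTILINE,
)


def one_way_children(status_text, max_in_age=300, max_out_age=60):
    """Return CHILD_SAs that sent a packet within max_out_age seconds but
    received nothing within max_in_age seconds."""
    suspects = []
    for m in COUNTERS.finditer(status_text):
        age_i = int(m["age_i"]) if m["age_i"] is not None else None
        age_o = int(m["age_o"]) if m["age_o"] is not None else None
        sending = age_o is not None and age_o <= max_out_age
        silent = age_i is None or age_i > max_in_age
        if sending and silent:
            suspects.append(f"{m['name']}{{{m['id']}}}")
    return suspects
```

This only flags tunnels that are actively sending, so an idle tunnel is never reported, and it shares the limitation of the ping approach: a remote host that is down produces the same picture as a peer that has dropped its SAs.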