I looked a bit into the ipsec trafficstatus fluctuations we see during test runs. Here is what I found so far. And this is a long e-mail!
Here is an example from a test run, just a diff for the purpose of discussion: https://testing.libreswan.org/v3.28-520-g0ca419163-master/xauth-pluto-14/OUTPUT/road.console.diff This is a random example, not the specific one I used; while debugging manually I used xauth-pluto-26, which has no "ipsec stop" in it!

There may be more than one pattern of traffic counter differences; the recent diffs are similar to the one above. I know there is another pattern in the dpd/liveness tests, which I haven't looked into. In the past there may also have been a case where trafficstatus went below the clear text bytes sent or received, which would suggest a leak of clear text traffic. That is more worrying than extra encrypted bytes, but do we still see it? I don't notice it any more.

I think the diff shown above is more likely to happen in an xauth test with a 0/0 tunnel. There is some other traffic between the hosts; this extra traffic gets encrypted and gets counted. It is likely host-to-host traffic and it occurs intermittently, possibly related to host load (either CPU or network). Maybe ICMP port unreachable or something like that. With xauth, or IKEv2 with CP, in a client/server model, we cannot depend on 'ipsec trafficstatus' for 0/0 tunnels. Currently there are about 20-30 test cases that show such diffs.

Typically the ping in a test sends and receives 336 bytes: 4 echo requests and 4 echo replies, each an 84-byte packet. One would expect ipsec trafficstatus to show 336 bytes in and out. However, sometimes it shows weird numbers such as inBytes=1584, outBytes=1374. From my manual debugging session:

    ipsec trafficstatus
    006 #2: "east-any"[1] 192.1.3.209, username=xroad, type=ESP, add_time=1564437448, inBytes=1584, outBytes=1374, lease=192.0.2.201/32

Clearly some extra traffic arrived on east.

Here is how I looked into the issue. I built kvm-install with 22 prefixes; the host is a 32 thread/core system, and load on the host is important. Then I changed the prefixes to 21 and started make kvm-test, and using the last set, t22, I ran xauth-pluto-26 manually. When I saw the difference I logged into the console and looked around. After a few tries, 10-15, I got t22.east and t22.road into this weird situation where they show more traffic than the ping sent. I logged in to east, north and nic, and from north sent ping -c 4. Most of the time I would see an increase of 336 bytes as expected, and 8 ESP packets on nic. However, sometimes there was more traffic in ipsec trafficstatus, and correspondingly more ESP packets on nic! I could not correlate the clear text on east to the extra ESP packets yet; there was a lot of noise traffic going around, and I need a better tcpdump filter rule (a rough sketch follows below). Also, while idle, I occasionally saw extra ESP packets going by on nic, even when I was not pinging. I have yet to figure out exactly what those extra ESP packets were. My guess is some host-to-host traffic getting encrypted because of the 0/0 tunnel, maybe some ICMP unreachable or something.

Now that we can sort of reproduce it, we could look into it further; it takes a bit of time and focus. So far I don't have the complete story, but I thought sharing this would help. My suggestions are based on a hunch that trafficstatus alone is not enough for a test.

In the meanwhile Tuomo keeps insisting that we switch to fping! A +1 on that! I will add fping to the kvm packages and we should move to it.
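For the record, here is roughly the tcpdump filter rule I was missing above. This is only a sketch: I am assuming east is still 192.1.2.23, road is the 192.1.3.209 seen above, and eth1 is the interface facing the 192.1.2.0/24 network; adjust as needed.

    # on nic: only ESP between east and road, so the idle-time extra ESP
    # packets stand out from the noise
    tcpdump -n -i eth1 -w /tmp/esp.pcap 'esp and host 192.1.2.23 and host 192.1.3.209'

    # on east: clear text involving road (outer address or its lease), with
    # IKE, ESP and the test ping filtered out, to cut down the noise
    tcpdump -n -i eth1 '(host 192.1.3.209 or host 192.0.2.201) and not esp and not port 500 and not port 4500 and not (icmp[icmptype] = icmp-echo or icmp[icmptype] = icmp-echoreply)'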
While at this, I will throw out a few more ideas, for discussion and to record them.

To debug this further we could install some iptables counter rules on nic and east to see what else is going on between these hosts (a rough sketch of such counter rules is further down, after the ideas), and maybe run tcpdump on east or road to see the extra traffic. It should be easy to capture; we don't have to sanitize it, we just need a packet capture.

Another observation is that this extra traffic appears to be related to load on the host, or possibly a leak from the network? I wonder why/how. While I was debugging the single test manually, the test run finished, and after that everything appeared to be very stable, with no more extra traffic for the next 30 minutes. Then I gave up and went to bed! If the theory holds, non-parallel test runs would see fewer trafficstatus diffs. Has anyone running tests without KVM_PREFIXES= specified in Makefile.inc.local noticed these trafficstatus diffs?

Another solution floated around is more iptables rules to block clear text and log it, either via an iptables LOG or to the console. For this we need more iptables block + log rules. The current, libreswan-specific, iptables target LOGDROP, created in swan-prep, is not easily portable to docker or namespace testing, because it sends information to the 'console', which does not exist (at least so far) in namespace or docker testing. Also, when running tests in parallel, namespaces will blow up with too many iptables rules. A hint about the scaling issue is the iptables error: "Another app is currently holding the xtables lock. Perhaps you want to use the -w option?" Using -w180 does not seem to solve it completely! Maybe nftables would solve it... Or, as Tuomo suggested, use iptables-restore? iptables-restore does not easily fit into our model? Paul thinks LOGDROP is the best way? AFAIK he came up with the idea of LOGDROP. (There is an iptables-restore sketch below as well.)

Maybe a simple alternative is to wrap fping + "ipsec trafficstatus" into a shell script. The script would process the output and compare it to what the ping sent; if inBytes and/or outBytes are more than what the ping sent, it is OK? This would fix many of the cases we see now. Please use fping here! And investigate the liveness/dpd test cases before fixing things. (A rough sketch of such a wrapper is below.)

Another possibility is to send fping with a specific clear text pattern in the payload and use a tcpdump rule to detect a leak of that specific traffic. The pattern would be set like 'ping -p'; I am not sure if fping supports this. However, the pattern can't be the same for all traffic in all tests, or it could leak between tests. The hard part is using a dynamic pattern; maybe just the test name, or a checksum of the test name! (Also sketched below.)
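To make the counter-rule idea concrete, here is the kind of thing I have in mind on east; a sketch only, using the addresses from my t22 session. Rules without a -j target just count and block nothing (on nic the same rules would go into FORWARD instead of INPUT):

    # count ESP from road, and separately anything else arriving from
    # road's outer address (IKE, ICMP errors, whatever else is out there)
    iptables -I INPUT 1 -p esp -s 192.1.3.209
    iptables -I INPUT 2 ! -p esp -s 192.1.3.209

    # read the exact packet/byte counters after the ping and compare them
    # with trafficstatus and the 8 ESP packets we expect on the wire
    iptables -L INPUT -v -n -x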
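To make the iptables-restore suggestion concrete too: it loads a whole ruleset in one invocation, so it should take the xtables lock once instead of once per rule. The sketch below is not what swan-prep does today; it uses a plain LOG+DROP chain instead of the console-based LOGDROP, the road address is from my session, and note that without --noflush it replaces the whole filter table:

    iptables-restore <<'EOF'
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :CLEARTEXT - [0:0]
    # log and drop anything that should have been encrypted
    -A CLEARTEXT -j LOG --log-prefix "cleartext: "
    -A CLEARTEXT -j DROP
    # let IKE and ESP through, send remaining clear text from road to the log chain
    -A INPUT -p udp --dport 500 -j ACCEPT
    -A INPUT -p udp --dport 4500 -j ACCEPT
    -A INPUT -p esp -j ACCEPT
    -A INPUT -s 192.1.3.209 -j CLEARTEXT
    COMMIT
    EOF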
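For the wrapper idea, something along these lines; completely untested, and the connection name, the peer address and the assumption that there is exactly one matching state are made up for the sketch:

    #!/bin/sh
    # ping-and-check: send 4 pings through the tunnel, then verify that
    # trafficstatus grew by at least the ping traffic. Extra encrypted
    # bytes are tolerated; fewer bytes than the ping sent is a failure.
    conn=east-any          # connection to look for in trafficstatus
    peer=192.0.2.254       # address to ping; both are just examples
    expected=336           # 4 x 84-byte echo packets

    inbytes() {
        ipsec trafficstatus | sed -n "s/.*\"$conn\".*inBytes=\([0-9]*\).*/\1/p" | head -1
    }

    before=$(inbytes)
    fping -c 4 -q "$peer"
    after=$(inbytes)

    if [ $((after - before)) -ge "$expected" ]; then
        echo "ping-and-check: ok, inBytes grew by $((after - before))"
    else
        echo "ping-and-check: FAILED, inBytes grew by only $((after - before))"
    fi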
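And for the payload-pattern idea, shown with plain ping since I don't know whether fping can do it; the pattern, the address and the grep are only illustrative, and the grep is crude since the hex dump grouping can split the pattern:

    # stamp the ping payload with a recognizable pattern
    ping -c 4 -p a1b2c3d4 192.0.2.254

    # on nic: capture everything that is not ESP or IKE, then look for the
    # pattern in the hex dump; if it shows up here, clear text leaked
    tcpdump -n -i eth1 -w /tmp/clear.pcap 'not esp and not port 500 and not port 4500'
    tcpdump -n -r /tmp/clear.pcap -X | grep -i 'a1b2'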
Recently I feel we are sprinkling more "ipsec trafficstatus" into many tests, including the old tests; be careful when adding these, it introduces instability to the test run.

-antony

_______________________________________________
Swan-dev mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan-dev