Re: [Swan-dev] status of failing tests
On Tue, 6 Mar 2018, Andrew Cagney wrote:

> testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail failed
>
> There's something special about that test preventing "addconn" from
> exiting (infinite loop)?

It's nothing related to the specific test... I just ran it against git
master on my laptop and it passes fine.

Paul
___
Swan-dev mailing list
Swan-dev@lists.libreswan.org
https://lists.libreswan.org/mailman/listinfo/swan-dev
Re: [Swan-dev] status of failing tests
>> testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail failed
>> west:output-different
>> -003 "westnet-eastnet-ipv4-psk-ppk" #1: connection requires PPK,
>> but PPK_ID did not match any loaded PPK
>> +003 "westnet-eastnet-ipv4-psk-ppk" #1: connection requires PPK,
>> but we didn't find one
>> +leak: fork pid, item size: 64
>> +leak detective found 1 leaks, total size 64
>
> I'll double check with Vukasin. Although fork pid is not likely his
> fault :)

There's something special about that test preventing "addconn" from
exiting (infinite loop)? A quick grep shows the process being created:

$ grep addconn testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail/OUTPUT/west.pluto.log
| created addconn helper (pid:2127) using fork+execve
| pid table: inserting object 0x7f4ac6f79fa8 (addconn pid 2127) entry 0x7f4ac6f79fb0 into list 0x7f4ad1a10f60 (older 0x7f4ad1a10f60 newer 0x7f4ad1a10f60)
| pid table: inserted object 0x7f4ac6f79fa8 (addconn pid 2127) entry 0x7f4ac6f79fb0 (older 0x7f4ad1a10f60 newer 0x7f4ad1a10f60)

What's missing is lines like the below showing it exiting:

| reaped addconn helper child (status 0)
| pid table: removing object 0x7f23f5238fa8 (addconn pid 2406) entry 0x7f23f5238fb0 (older 0x7f23fff48fc0 newer 0x7f23fff48fc0)
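The manual grep above can be turned into a small check. The following is only a sketch: it assumes pluto's debug lines keep the exact "pid table: inserted object" / "pid table: removing object" wording quoted in this message, and the function name is made up for illustration. It reports every addconn pid that entered the pid table but was never removed from it.

```python
import re

# Patterns copied from the log lines quoted in this message; the wording of
# pluto's debug output is an assumption and may differ between versions.
INSERTED = re.compile(r"pid table: inserted object \S+ \(addconn pid (\d+)\)")
REMOVED = re.compile(r"pid table: removing object \S+ \(addconn pid (\d+)\)")


def unreaped_addconn_pids(log_lines):
    """Return the sorted pids that were inserted but never removed."""
    pending = set()
    for line in log_lines:
        match = INSERTED.search(line)
        if match:
            pending.add(int(match.group(1)))
            continue
        match = REMOVED.search(line)
        if match:
            pending.discard(int(match.group(1)))
    return sorted(pending)


# Sample input: the "inserted" line from the failing run quoted above.
sample = [
    "| created addconn helper (pid:2127) using fork+execve",
    "| pid table: inserted object 0x7f4ac6f79fa8 (addconn pid 2127) "
    "entry 0x7f4ac6f79fb0 (older 0x7f4ad1a10f60 newer 0x7f4ad1a10f60)",
]
print(unreaped_addconn_pids(sample))
```

Run against a whole west.pluto.log, a non-empty result would point at a helper that was started but never reaped, which matches the "fork pid" leak in the diff.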
Re: [Swan-dev] status of failing tests
On Tue, 13 Feb 2018, Andrew Cagney wrote:

> I'm left wondering if it would be easier to have a separate script
> (kvmrun.py?) that always stops at final.sh and requires/allows only
> one test.

That would work for me.

Paul
Re: [Swan-dev] status of failing tests
On 4 February 2018 at 16:19, Paul Wouters wrote:
> On Fri, 2 Feb 2018, Andrew Cagney wrote:
>
>>>> - early stop?
>>>> testing/pluto/klips-netkey-pluto-06 failed east:output-different
>>>> west:output-different
>>>
>>> if final.sh runs a status or trafficstatus and also shuts down pluto for
>>> a leak report, there is a race between nodes. If one shuts down fast,
>>> the other won't see the proper status because it will have processed the
>>> deletes from the other peer's shutdown. The rule is to not have status
>>> and shutdown in final.sh.
>>
>> This one is a no win. We've too often missed core dumps because pluto
>> wasn't being shutdown.
>>
>> What about wrapping the inconsistent output in --cut-- --tuc--?
>
> I'd say the best fix would be to have a flag that would shutdown pluto
> after all ends have send their "done" for the final.sh. Then only grab
> the shutdown leaks/cores.

Right, but don't tie it to "done" in final.sh. Currently a core dump
when:

- eastinit.sh runs ok
- westinit.sh runs ok
- east dumps core
- westrun.sh times out

doesn't get logged because final.sh (which contains scripts to look for
core files) gets skipped and the core file is missed. This should be
detected but isn't.

I know of two equivalent ways of handling this:

- even when an earlier script hangs, try to run final.sh
- add a new generic script testing/pluto/bin/teardown.sh and always run
  that (we could hack final.sh to invoke teardown.sh)

> But it would have to be a flag, because often we run a test case, so we
> can login to the hosts and look at the state manually, so we wouldn't
> want pluto to be shutdown in those cases.

There is "kvmrunner --stop-at final.sh ...". But its behaviour is more
orthogonal than usable, in that:

  kvmrunner testing/pluto/basic-pluto-01
  kvmrunner [--stop-at final.sh] testing/pluto/basic-pluto-01
  kvmrunner testing/pluto/basic*
  kvmrunner [--stop-at final.sh] testing/pluto/basic*

all do what you would expect but probably not what you want.

I'm left wondering if it would be easier to have a separate script
(kvmrun.py?) that always stops at final.sh and requires/allows only one
test.

Andrew
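The "always run a teardown script" option can be sketched as a try/finally around the script loop. This is not kvmrunner code: run_test, the run_script callback and the convention that a hung script raises TimeoutError are all assumptions made for illustration, and teardown.sh is the hypothetical script proposed in this message.

```python
def run_test(scripts, teardown, run_script):
    """Run scripts in order; run the teardown script no matter what.

    run_script(name) is a caller-supplied function (hypothetical) that
    raises TimeoutError when a script hangs.
    """
    outcome = []
    try:
        for name in scripts:
            try:
                run_script(name)
            except TimeoutError:
                outcome.append((name, "timed out"))
                break  # skip the remaining scripts, but still tear down
            outcome.append((name, "ok"))
    finally:
        # Reached whether the loop finished, broke out early, or raised,
        # so the core-file and leak checks are never skipped.
        run_script(teardown)
        outcome.append((teardown, "ran"))
    return outcome


def fake_run_script(name):
    """Stand-in runner that simulates westrun.sh hanging."""
    if name == "westrun.sh":
        raise TimeoutError(name)


for step in run_test(
    ["eastinit.sh", "westinit.sh", "westrun.sh", "final.sh"],
    "teardown.sh",
    fake_run_script,
):
    print(step)
```

With this shape final.sh can still be skipped after a timeout, but the teardown step (where the core-file checks would live) always runs; a teardown that itself hangs would still need its own handling.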
Re: [Swan-dev] status of failing tests
On Fri, 2 Feb 2018, Andrew Cagney wrote:

>>> - early stop?
>>> testing/pluto/klips-netkey-pluto-06 failed east:output-different
>>> west:output-different
>>
>> if final.sh runs a status or trafficstatus and also shuts down pluto for
>> a leak report, there is a race between nodes. If one shuts down fast,
>> the other won't see the proper status because it will have processed the
>> deletes from the other peer's shutdown. The rule is to not have status
>> and shutdown in final.sh.
>
> This one is a no win. We've too often missed core dumps because pluto
> wasn't being shutdown.
>
> What about wrapping the inconsistent output in --cut-- --tuc--?

I'd say the best fix would be to have a flag that would shut down pluto
after all ends have sent their "done" for the final.sh. Then only grab
the shutdown leaks/cores.

But it would have to be a flag, because often we run a test case so that
we can log in to the hosts and look at the state manually, and we
wouldn't want pluto to be shut down in those cases.

Paul
Re: [Swan-dev] status of failing tests
>> - early stop?
>> testing/pluto/klips-netkey-pluto-06 failed east:output-different
>> west:output-different
>
> if final.sh runs a status or trafficstatus and also shuts down pluto for
> a leak report, there is a race between nodes. If one shuts down fast,
> the other won't see the proper status because it will have processed the
> deletes from the other peer's shutdown. The rule is to not have status
> and shutdown in final.sh.

This one is a no win. We've too often missed core dumps because pluto
wasn't being shut down.

What about wrapping the inconsistent output in --cut-- --tuc--?

Andrew
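To illustrate the --cut-- / --tuc-- idea: output that is known to be racy is bracketed by two marker lines, and the comparison step drops everything between them before diffing against the reference output. The sketch below is a stand-in, not the test suite's actual sanitizer; the marker spellings and the choice to drop the marker lines themselves are assumptions.

```python
def strip_cut_regions(lines, start="--- cut ---", end="--- tuc ---"):
    """Drop every line from a start marker through the matching end marker.

    The marker strings are assumed defaults; an unterminated region
    swallows the rest of the input.
    """
    kept = []
    skipping = False
    for line in lines:
        if not skipping and start in line:
            skipping = True
            continue
        if skipping:
            if end in line:
                skipping = False
            continue
        kept.append(line)
    return kept


console = [
    "west # ipsec whack --trafficstatus",
    "--- cut ---",
    "output that depends on which peer shut down first",
    "--- tuc ---",
    "west # ipsec stop",
]
for line in strip_cut_regions(console):
    print(line)
```

The trade-off is that anything inside the markers is no longer checked at all, so the racy status output would stop producing false failures but would also stop catching real regressions.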
Re: [Swan-dev] status of failing tests
FYI ...

On 1 February 2018 at 11:37, Paul Wouters wrote:
> On Thu, 1 Feb 2018, D. Hugh Redelmeier wrote:
>
>> - several failures that were only IKE retransmissions. Just ignore them.
>> But a bit weird when IMPAIR_RETRANSMITS is set.
>
> can happen in IKEv1 were both ends retransmit?

There seem to be two reasons for suppressing re-transmits:

- the connection is expected to fail, so speed things up with a quick
  timeout.
  --impair retransmits does this by aborting the connection when the
  first re-transmit timeout expires

- the connection is expected to succeed but crypto might make it slow;
  hence wait the full timeout but don't send intervening re-transmits.
  --impair send-no-retransmits (new) does this (up until now we've used
  'retransmit-interval=15000 # slow retransmits')

Tests often use the former when the intent seems to be the latter.

Andrew
Re: [Swan-dev] status of failing tests
On Thu, 1 Feb 2018, D. Hugh Redelmeier wrote:

> - several failures that were only IKE retransmissions. Just ignore them.
>   But a bit weird when IMPAIR_RETRANSMITS is set.

Can that happen in IKEv1, where both ends retransmit?

> - not sure: maybe retransmission needed but suppressed?
>
>   testing/pluto/ikev1-x509-12-san-dn-match failed west:output-different
>   -004 "san" #2: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0xESPESP <0xESPESP xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none NATD=none DPD=passive}
>   +002 "san" #2: suppressing retransmit because IMPAIR_RETRANSMITS is set
>   +002 "san" #2: IMPAIR RETRANSMITS: suppressing re-key
>   +002 "san" #2: deleting state (STATE_QUICK_I1)

So if the packet is lost, you get the suppressing message when it needs
to retransmit. That is one reason why the output differs even if we
have impair set.

> - state numbers differ, but everything else seems OK

I do see these and worry a bit. It seems to indicate a rekey has
happened that we did not expect.

>   testing/pluto/ikev2-delete-05-sa-start failed west:output-different
>   testing/pluto/ikev2-delete-06-start-both failed west:output-different

Some of this is because of changed behaviour, and it needs looking at
more carefully, which is why I haven't updated the reference output yet.

> - sometimes "ping -c N' sends more than N packets. Huh?

We should migrate to using fping.

>   testing/pluto/ikev2-62-host-ondemand failed north:output-different
>   testing/pluto/ikev2-62-host-ondemand-instance failed north:output-different
>   testing/pluto/klips-passthrough-00 failed west:output-different
>
> - one dropped packet

Again, for the ones where we just send 4 pings to confirm, we should
just use fping.

>   testing/pluto/ikev1-algo-esp-sha2-01-netkey-klips failed west:output-different
>   testing/pluto/ikev1-algo-esp-sha2-02-netkey-klips failed west:output-different
>   testing/pluto/newoe-07-ike-replace-initiator failed road:output-different
>   testing/pluto/certoe-09-packet-host failed road:output-different
>   testing/pluto/klips-algo-twofish-01 failed west:output-different
>   testing/pluto/klips-algo-serpent-01 failed west:output-different
>   testing/pluto/klips-algo-cast-01 failed west:output-different
>   testing/pluto/ah-pluto-07-klips-netkey failed west:output-different
>   testing/pluto/l2tp-01 failed north:output-different
>   testing/pluto/l2tp-02 failed north:output-different
>
> - more than one dropped packet

I wish we could get this more stable.

>   testing/pluto/interop-ikev2-strongswan-38-mobike-initiator failed north:output-different
>   -2 packets transmitted, 2 received, 0% packet loss, time
>   -rtt min/avg/max/mdev = 0.XXX/0.XXX/0.XXX/0.XXX ms
>   +2 packets transmitted, 0 received, 100% packet loss, time
>
> - something funny with MOBIKE
>
>   testing/pluto/interop-ikev2-strongswan-38-mobike-pool failed east:output-different road:output-different
>   west: +peer supports MOBIKE

That message seems strongswan version specific. I think newer versions
have removed this?

> - script changed and odd message change
>
>   - ip xfrm state
>   + ipsec look

This is a change I made. ipsec look runs both ip xfrm pol and ip xfrm
state but adds a sorting wrapper. This prevents false positives where
you see flipping of 192.1.2.23 with 192.1.2.45.

>   testing/pluto/interop-ikev2-strongswan-39-mobike-responder failed east:output-different road:output-different
>   - proto esp spi 0xSPISPIXX reqid REQID mode tunnel
>   + proto esp spi 0xSPISPIXX reqid REQIDREQID mode tunnel

I think this is related to ipsec look somehow double fixing the REQID.

>   testing/pluto/newoe-21-liveness-clear failed east:output-different road:output-different
>   road: script changed!

Likely something else still went wrong, so the output was not yet
updated.

> - tcpdump didn't terminate properly

Not sure about that.

>   testing/pluto/nflog-03-conns failed west:output-different
>
> - who knows
>
>   testing/pluto/newoe-06-prio failed east:output-different road:output-different
>   -000 #1: "road-east-ikev2" --- cut ---

I don't know what those cut lines are and why they appear. It's a test
anomaly.

> - bonus xfrms

If you see "transport mode" bonus xfrms, those are the larval states.
They seem to linger sometimes for a bit and then we see them. It's
harmless (but annoying).

>   testing/pluto/newoe-15-portpass failed road:output-different
>   testing/pluto/newoe-18-private-clearall failed road:output-different
>   testing/pluto/newoe-19-poc-poc-clear failed road:output-different
>   testing/pluto/certoe-07-nat-2-clients failed road:output-different
>
> - state serial numbers differ

Again, I'm a little concerned why this happens.

>   testing/pluto/delete-sa-04 failed east:output-different
>   testing/pluto/delete-sa-05 failed east:output-different west:output-different
>   (also different conn chosen?)
>   testing/pluto/klips-basic-pluto-01 failed east:output-different
>
> - mysteries

Needs looking at.

>   testing/pluto/n
Re: [Swan-dev] status of failing tests
| From: Paul Wouters

| The tests seem to have gotten fairly consistent. There is a still a
| group of problems:

Thanks for a summary.

I've got some fairly safe changes that I wanted to test. I ran these
tests on the weekend. I started with master, as of
d765c891ceb150bc65019d8b7dcbf96337c7333b.

Here's what I observed. I think that I've removed the cases that you
mentioned. I don't think that these errors are my fault (I could be
wrong). Could you see if your experiences are the same?

- several failures that were only IKE retransmissions. Just ignore them.
  But a bit weird when IMPAIR_RETRANSMITS is set.

- not sure: maybe retransmission needed but suppressed?

  testing/pluto/ikev1-x509-12-san-dn-match failed west:output-different
  -004 "san" #2: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode {ESP=>0xESPESP <0xESPESP xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none NATD=none DPD=passive}
  +002 "san" #2: suppressing retransmit because IMPAIR_RETRANSMITS is set
  +002 "san" #2: IMPAIR RETRANSMITS: suppressing re-key
  +002 "san" #2: deleting state (STATE_QUICK_I1)

- state numbers differ, but everything else seems OK

  testing/pluto/ikev2-delete-05-sa-start failed west:output-different
  testing/pluto/ikev2-delete-06-start-both failed west:output-different

- sometimes "ping -c N' sends more than N packets. Huh?

  testing/pluto/ikev2-62-host-ondemand failed north:output-different
  testing/pluto/ikev2-62-host-ondemand-instance failed north:output-different
  testing/pluto/klips-passthrough-00 failed west:output-different

- one dropped packet

  testing/pluto/ikev1-algo-esp-sha2-01-netkey-klips failed west:output-different
  testing/pluto/ikev1-algo-esp-sha2-02-netkey-klips failed west:output-different
  testing/pluto/newoe-07-ike-replace-initiator failed road:output-different
  testing/pluto/certoe-09-packet-host failed road:output-different
  testing/pluto/klips-algo-twofish-01 failed west:output-different
  testing/pluto/klips-algo-serpent-01 failed west:output-different
  testing/pluto/klips-algo-cast-01 failed west:output-different
  testing/pluto/ah-pluto-07-klips-netkey failed west:output-different
  testing/pluto/l2tp-01 failed north:output-different
  testing/pluto/l2tp-02 failed north:output-different

- more than one dropped packet

  testing/pluto/interop-ikev2-strongswan-38-mobike-initiator failed north:output-different
  -2 packets transmitted, 2 received, 0% packet loss, time
  -rtt min/avg/max/mdev = 0.XXX/0.XXX/0.XXX/0.XXX ms
  +2 packets transmitted, 0 received, 100% packet loss, time

- something funny with MOBIKE

  testing/pluto/interop-ikev2-strongswan-38-mobike-pool failed east:output-different road:output-different
  west: +peer supports MOBIKE

- script changed and odd message change

  - ip xfrm state
  + ipsec look
  testing/pluto/interop-ikev2-strongswan-39-mobike-responder failed east:output-different road:output-different
  - proto esp spi 0xSPISPIXX reqid REQID mode tunnel
  + proto esp spi 0xSPISPIXX reqid REQIDREQID mode tunnel

  testing/pluto/newoe-21-liveness-clear failed east:output-different road:output-different
  road: script changed!

- tcpdump didn't terminate properly

  testing/pluto/nflog-03-conns failed west:output-different

- who knows

  testing/pluto/newoe-06-prio failed east:output-different road:output-different
  -000 #1: "road-east-ikev2" --- cut ---

- bonus xfrms

  testing/pluto/newoe-15-portpass failed road:output-different
  testing/pluto/newoe-18-private-clearall failed road:output-different
  testing/pluto/newoe-19-poc-poc-clear failed road:output-different
  testing/pluto/certoe-07-nat-2-clients failed road:output-different

- state serial numbers differ

  testing/pluto/delete-sa-04 failed east:output-different
  testing/pluto/delete-sa-05 failed east:output-different west:output-different
  (also different conn chosen?)
  testing/pluto/klips-basic-pluto-01 failed east:output-different

- mysteries

  testing/pluto/nat-pluto-04 failed east:output-different
  -? (192.1.2.254) at 12:00:00:de:ad:ba [ether] on eth1

  testing/pluto/interop-ikev1-strongswan-11-ah-initiator-sha512 failed west:output-different
  -generating QUICK_MODE request 0123456789 [ HASH ]
  -sending packet: from 192.1.2.45[500] to 192.1.2.23[500] (XXX bytes)

  testing/pluto/interop-ikev2-strongswan-35-ipsec-rekey failed west:output-different
  -westnet-eastnet-ikev2{6}: DELETING, TUNNEL, reqid 1
  +westnet-eastnet-ikev2{6}: REKEYING, TUNNEL, reqid 1, expires in 10 seconds
  -westnet-eastnet-ikev2{7}: INSTALLED, TUNNEL, reqid 1, ESP SPIs: SPISPI_i SPISPI_o
  -westnet-eastnet-ikev2{7}: 192.0.1.0/24 === 192.0.2.0/24

  testing/pluto/dnssec-pluto-01 failed west:output-different
  -000 "westnet-eastnet-etc-hosts-auto-add": 192.0.1.0/24===192.1.2.45<192.1.2.45>[@west]...192.1.2.23[@east]===192.0.2.
Re: [Swan-dev] status of failing tests
On Fri, 26 Jan 2018, Antony Antony wrote:

> There is an odd thing in master, I suspect latest tag commit is not yet
> merged to master.
>
> make showversion
> 3.22-773-gaafe4eba2-master

Ahh. We tried to leave the tag on the release-3.23 branch so we did not
have to commit, tag, uncommit the version number. I guess this is a side
effect of that. Perhaps we can change IPSECBASEVERSION to something
like 3.24dr- and have git fill in the rest from the short commit id.

> I a noticed a coredump interop-ikev2-strongswan-22-cp-responder-psk

It seems that test didn't run properly on the last run on
testing.libreswan.org:

http://testing.libreswan.org/results/testing/v3.22-763-gc3545f3-master/interop-ikev2-strongswan-22-cp-responder-psk/OUTPUT/

So it didn't produce a core yet. I guess it is something that needs to
be looked into.

> it is probably during shutdown. Coredump first appeared
> 2018-01-22-swantest-3.22-739-g4e9bc8ed3-master
> and was not there in 2018-01-16-swantest-3.22-695-g5e1f56d86-master

git rev-list --ancestry-path 5e1f56d86..4e9bc8ed3 | wc -l
44

Might be easier to git bisect it if it happens consistently.

Paul
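For a sense of scale, 44 candidate commits need at most about six test runs to bisect, provided the crash reproduces reliably. The sketch below is a plain binary search over a linear list of candidates, which is a simplification: real git history in that range may contain merges, and git bisect picks its own midpoints. The function name and the is_bad callback are illustrative only.

```python
import math


def first_bad(n, is_bad):
    """Return (index, tests) for the first bad commit among n candidates.

    Candidates are ordered oldest to newest, index n - 1 is known bad,
    and is_bad(i) must be monotonic: once a commit is bad, all later
    ones are bad too.
    """
    lo, hi = 0, n - 1  # invariant: the first bad commit lies in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, tests


# Worst case over every possible position of the first bad commit.
worst = max(first_bad(44, lambda i, k=k: i >= k)[1] for k in range(44))
print(worst, math.ceil(math.log2(44)))
```

If the coredump only shows up intermittently, each step would need several runs before a commit can be called good, which is why the "if it happens consistently" caveat matters.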
Re: [Swan-dev] status of failing tests
There is an odd thing in master; I suspect the latest tag commit is not
yet merged to master.

make showversion
3.22-773-gaafe4eba2-master

I noticed a coredump in interop-ikev2-strongswan-22-cp-responder-psk.
It is probably during shutdown. The coredump first appeared in
2018-01-22-swantest-3.22-739-g4e9bc8ed3-master
and was not there in 2018-01-16-swantest-3.22-695-g5e1f56d86-master.

Core was generated by `/usr/local/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --nofo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x7f8bcf8c4b46 in free_event_entry (evp=0x7f8bcfc11168 ) at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:467
467 struct pluto_event *next = e->next;
#0 0x7f8bcf8c4b46 in free_event_entry (evp=0x7f8bcfc11168 ) at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:467
#1 0x7f8bcf8c4cb1 in free_pluto_event_list () at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:501
#2 0x7f8bcf8c285a in exit_pluto (status=0) at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/plutomain.c:1789
#3 0x7f8bcf92c936 in whack_handle (whackctlfd=4) at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/rcv_whack.c:733
#4 0x7f8bcf92c6ae in whack_handle_cb (fd=4, event=2, arg=0x0) at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/rcv_whack.c:662
#5 0x7f8bcd8203cc in event_process_active_single_queue (activeq=0x7f8bc7d96ff0, base=0x7f8bc7d90d80) at event.c:1350
#6 event_process_active (base=) at event.c:1420
#7 event_base_loop (base=0x7f8bc7d90d80, flags=0) at event.c:1621
#8 0x7f8bcf8c6d38 in call_server () at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:1121
#9 0x7f8bcf8c27b5 in main (argc=5, argv=0x7ffd906c0538) at /usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/plutomain.c:1749

On Thu, Jan 25, 2018 at 07:52:27PM -0500, Paul Wouters wrote:
>
> The tests seem to have gotten fairly consistent. There is a still a
> group of problems:
>
> - dpd / liveness tests need a redesign to avoid constant false positives
> - ipv6 tests have unexplained packet flow issues
> - some traffic counters are off by a byte or 3. seems our kernels might
>   differ?
> - not yet valid tests are failing for unknown reasons
> - "MS+S=C" string mismatch due to auto-detect modecfgserver based on
>   addresspool
>   (printing of that string needs fixing, but its a rat hole)
>
> Some specific test cases of interest:
>
> xauth-pluto-24-static-addresspool  shows the group ID with PSK lease
>                                    problem
> xauth-pluto-22                     same
> nss-cert-crl-03                    missing issuer line?
> newoe-08-ike-replace-responder     additional unexplained policies
> some strongswan psk tests          they gained a cert despite swan-prep
>                                    wiping them?
> ikev2-10-2behind-nat               not sure, lots of things going wrong?
>
> Paul