Re: [Swan-dev] status of failing tests

2018-03-06 Thread Paul Wouters

On Tue, 6 Mar 2018, Andrew Cagney wrote:


 testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail failed



There's something special about that test preventing "addconn" from
exiting (infinite loop)?


It's nothing related to the specific test...

I just ran it against git master on my laptop and it passes fine.

Paul
___
Swan-dev mailing list
Swan-dev@lists.libreswan.org
https://lists.libreswan.org/mailman/listinfo/swan-dev


Re: [Swan-dev] status of failing tests

2018-03-06 Thread Andrew Cagney
>>  testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail failed
>> west:output-different
>> -003 "westnet-eastnet-ipv4-psk-ppk" #1: connection requires PPK,
>> but PPK_ID did not match any loaded PPK
>> +003 "westnet-eastnet-ipv4-psk-ppk" #1: connection requires PPK,
>> but we didn't find one
>> +leak: fork pid, item size: 64
>> +leak detective found 1 leaks, total size 64
>
>
> I'll double check with Vukasin. Although fork pid is not likely his
> fault :)

There's something special about that test preventing "addconn" from
exiting (infinite loop)?
A quick grep shows the process being created:

$ grep addconn testing/pluto/ikev2-ppk-static-05-insist-nokey-insist-fail/OUTPUT/west.pluto.log
| created addconn helper (pid:2127) using fork+execve
| pid table: inserting object 0x7f4ac6f79fa8 (addconn pid 2127) entry
0x7f4ac6f79fb0 into list 0x7f4ad1a10f60 (older 0x7f4ad1a10f60 newer
0x7f4ad1a10f60)
| pid table: inserted  object 0x7f4ac6f79fa8 (addconn pid 2127) entry
0x7f4ac6f79fb0 (older 0x7f4ad1a10f60 newer 0x7f4ad1a10f60)

What's missing is lines like the below showing it exiting:

| reaped addconn helper child (status 0)
| pid table: removing  object 0x7f23f5238fa8 (addconn pid 2406) entry
0x7f23f5238fb0 (older 0x7f23fff48fc0 newer 0x7f23fff48fc0)
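
The manual grep above could be automated along these lines. This is only a sketch: the function name is made up for illustration, and the regular expressions assume each log entry sits on one line and keeps the exact wording of the excerpts quoted here.

```python
import re

# Patterns copied from the log excerpts above; the exact wording is assumed stable.
CREATED = re.compile(r"created addconn helper \(pid:(\d+)\)")
REMOVED = re.compile(r"pid table: removing\s+object \S+ \(addconn pid (\d+)\)")


def unreaped_addconn_pids(log_lines):
    """Return the pids of addconn helpers that were created but never removed."""
    pending = set()
    for line in log_lines:
        match = CREATED.search(line)
        if match:
            pending.add(int(match.group(1)))
            continue
        match = REMOVED.search(line)
        if match:
            pending.discard(int(match.group(1)))
    return sorted(pending)


# Hypothetical one-line versions of the entries shown above.
hung = [
    "| created addconn helper (pid:2127) using fork+execve",
    "| pid table: inserting object 0x7f4ac6f79fa8 (addconn pid 2127) entry 0x7f4ac6f79fb0",
]
print(unreaped_addconn_pids(hung))
```

Feeding it the lines of a test's OUTPUT/west.pluto.log would list any helper that is still in the pid table when the log ends.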


Re: [Swan-dev] status of failing tests

2018-02-13 Thread Paul Wouters

On Tue, 13 Feb 2018, Andrew Cagney wrote:


I'm left wondering if it would be easier to have a separate script
(kvmrun.py?) that always stops at final.sh and requires/allows only
one test.


That would work for me.

Paul


Re: [Swan-dev] status of failing tests

2018-02-13 Thread Andrew Cagney
On 4 February 2018 at 16:19, Paul Wouters  wrote:
> On Fri, 2 Feb 2018, Andrew Cagney wrote:
>
 - early stop?
  testing/pluto/klips-netkey-pluto-06 failed east:output-different
 west:output-different
>>>
>>>
>>>
>>> if final.sh runs a status or trafficstatus and also shuts down pluto for
>>> a leak report, there is a race between nodes. If one shuts down fast,
>>> the other won't see the proper status because it will have processed the
>>> deletes from the other peer's shutdown. The rule is to not have status
>>> and shutdown in final.sh.
>>
>>
>> This one is a no-win.  We've too often missed core dumps because pluto
>> wasn't being shut down.
>>
>> What about wrapping the inconsistent output in --cut-- --tuc--?
>
>
> I'd say the best fix would be to have a flag that would shut down pluto
> after all ends have sent their "done" for the final.sh. Then only grab
> the shutdown leaks/cores.

Right, but don't tie it to "done" in final.sh.  Currently a core dump when:

- eastinit.sh runs ok
- westinit.sh runs ok
- east dumps core
- westrun.sh times out

doesn't get logged because final.sh (which contains scripts to look
for core files) gets skipped and the core file is missed.

This should be detected but isn't.

I know of two equivalent ways of handling this:

- even when an earlier script hangs, try to run final.sh
- add a new generic script testing/pluto/bin/teardown.sh and always run that

(we could hack final.sh to invoke teardown.sh)
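
Either option amounts to a try/finally around the per-test scripts. A minimal sketch, assuming hypothetical run_script and teardown callables (neither is an existing kvmrunner function) and assuming a hung script surfaces as TimeoutError:

```python
def run_test(scripts, run_script, teardown):
    """Run the scripts in order; always run teardown, even after a timeout.

    run_script and teardown are hypothetical callables standing in for
    whatever actually drives the guests.
    """
    completed = []
    timed_out = None
    try:
        for name in scripts:
            run_script(name)  # assumed to raise TimeoutError when a script hangs
            completed.append(name)
    except TimeoutError:
        timed_out = name
    finally:
        # This is where the core-file and leak checks would live.
        teardown()
    return completed, timed_out
```

With the sequence above (westrun.sh timing out), teardown still runs, so the core file from east would be picked up.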

> But it would have to be a flag, because often we run a test case, so we
> can log in to the hosts and look at the state manually, so we wouldn't
> want pluto to be shut down in those cases.

There is "kvmrunner --stop-at final.sh ...".  But its behaviour is
more orthogonal than usable, in that:
  kvmrunner testing/pluto/basic-pluto-01
  kvmrunner [--stop-at final.sh] testing/pluto/basic-pluto-01
  kvmrunner testing/pluto/basic*
  kvmrunner [--stop-at final.sh] testing/pluto/basic*
all do what you would expect but probably not what you want.

I'm left wondering if it would be easier to have a separate script
(kvmrun.py?) that always stops at final.sh and requires/allows only
one test.
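
A rough sketch of what such a script's command line might look like. Only the name kvmrun.py comes from the idea above; everything else is a guess, and the stop_at attribute is just a placeholder for however the runner would be told where to stop:

```python
import argparse


def parse_args(argv):
    """Parse the command line of a hypothetical kvmrun.py.

    Exactly one test directory is accepted, and the stop point is fixed
    at final.sh rather than being an option.
    """
    parser = argparse.ArgumentParser(
        prog="kvmrun.py",
        description="Run one test and stop at final.sh so the guests can be inspected.",
    )
    # A single positional argument: argparse rejects both zero and several tests.
    parser.add_argument("test", help="test directory, e.g. testing/pluto/basic-pluto-01")
    args = parser.parse_args(argv)
    args.stop_at = "final.sh"
    return args


args = parse_args(["testing/pluto/basic-pluto-01"])
print(args.test, args.stop_at)
```

A glob such as testing/pluto/basic* that expands to several directories would then be rejected up front instead of doing something surprising.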

Andrew


Re: [Swan-dev] status of failing tests

2018-02-04 Thread Paul Wouters

On Fri, 2 Feb 2018, Andrew Cagney wrote:


- early stop?
 testing/pluto/klips-netkey-pluto-06 failed east:output-different
west:output-different



if final.sh runs a status or trafficstatus and also shuts down pluto for
a leak report, there is a race between nodes. If one shuts down fast,
the other won't see the proper status because it will have processed the
deletes from the other peer's shutdown. The rule is to not have status
and shutdown in final.sh.


This one is a no-win.  We've too often missed core dumps because pluto
wasn't being shut down.

What about wrapping the inconsistent output in --cut-- --tuc--?


I'd say the best fix would be to have a flag that would shut down pluto
after all ends have sent their "done" for the final.sh. Then only grab
the shutdown leaks/cores.

But it would have to be a flag, because often we run a test case, so we
can log in to the hosts and look at the state manually, so we wouldn't
want pluto to be shut down in those cases.

Paul


Re: [Swan-dev] status of failing tests

2018-02-02 Thread Andrew Cagney
>> - early stop?
>>  testing/pluto/klips-netkey-pluto-06 failed east:output-different
>> west:output-different
>
>
> if final.sh runs a status or trafficstatus and also shuts down pluto for
> a leak report, there is a race between nodes. If one shuts down fast,
> the other won't see the proper status because it will have processed the
> deletes from the other peer's shutdown. The rule is to not have status
> and shutdown in final.sh.

This one is a no-win.  We've too often missed core dumps because pluto
wasn't being shut down.

What about wrapping the inconsistent output in --cut-- --tuc--?
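
To illustrate the idea, the comparison step would drop everything between the two markers before diffing. A minimal sketch, assuming the markers appear on lines of their own; the marker spelling and the function name are assumptions, not the existing sanitizer code:

```python
def strip_cut_sections(lines, start="--cut--", end="--tuc--"):
    """Drop every line from a start marker through the matching end marker."""
    kept = []
    skipping = False
    for line in lines:
        if not skipping and start in line:
            skipping = True  # drop the start marker itself
            continue
        if skipping:
            if end in line:
                skipping = False  # drop the end marker, then resume keeping lines
            continue
        kept.append(line)
    return kept


console = ["ipsec whack --trafficstatus", "--cut--", "racy status output", "--tuc--", "ipsec stop"]
print(strip_cut_sections(console))
```

The status command would still run and still be able to trigger a crash, but its racy output would no longer count as a difference.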

Andrew


Re: [Swan-dev] status of failing tests

2018-02-02 Thread Andrew Cagney
FYI ...

On 1 February 2018 at 11:37, Paul Wouters  wrote:
> On Thu, 1 Feb 2018, D. Hugh Redelmeier wrote:
>
>> - several failures that were only IKE retransmissions.  Just ignore them.
>>  But a bit weird when IMPAIR_RETRANSMITS is set.
>
>
> can happen in IKEv1 where both ends retransmit?

There seem to be two reasons for suppressing re-transmits:

- the connection is expected to fail, so speed things up with a quick
timeout.   --impair retransmits does this by aborting the connection
when the first re-transmit timeout expires

- the connection is expected to succeed but crypto might make it slow;
hence wait the full timeout but don't send intervening re-transmits.
--impair send-no-retransmits (new) does this (up until now we've used
'retransmit-interval=15000 # slow retransmits')

Tests often use the former when the intent seems to be the latter.
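
The difference between the two can be shown with a toy model. This is only an illustration of the description above, not pluto's timer code: the function name is invented, and the fixed retransmit interval is a simplification (the real timers may back off).

```python
def retransmit_plan(mode, interval, timeout):
    """Return (times at which retransmits are sent, time the exchange is given up).

    mode is "none" (no impair), "retransmits" or "send-no-retransmits".
    Times are in seconds since the original packet was sent.
    """
    if mode == "retransmits":
        # Expected to fail: abort as soon as the first retransmit timer expires.
        return [], interval
    if mode == "send-no-retransmits":
        # Expected to succeed slowly: wait the full timeout, send nothing extra.
        return [], timeout
    if mode == "none":
        return list(range(interval, timeout, interval)), timeout
    raise ValueError(f"unknown mode: {mode}")


for mode in ("none", "retransmits", "send-no-retransmits"):
    print(mode, retransmit_plan(mode, interval=15, timeout=60))
```

A test that expects the exchange to succeed but uses the first mode gives up after one interval, which would look like the spurious failures described above.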

Andrew


Re: [Swan-dev] status of failing tests

2018-02-01 Thread Paul Wouters

On Thu, 1 Feb 2018, D. Hugh Redelmeier wrote:


- several failures that were only IKE retransmissions.  Just ignore them.
 But a bit weird when IMPAIR_RETRANSMITS is set.


can happen in IKEv1 where both ends retransmit?


- not sure: maybe retransmission needed but suppressed?
 testing/pluto/ikev1-x509-12-san-dn-match failed west:output-different
-004 "san" #2: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel mode 
{ESP=>0xESPESP <0xESPESP xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none NATD=none DPD=passive}
+002 "san" #2: suppressing retransmit because IMPAIR_RETRANSMITS is set
+002 "san" #2: IMPAIR RETRANSMITS: suppressing re-key
+002 "san" #2: deleting state (STATE_QUICK_I1)


So if the packet is lost, you get the suppressing message when it needs
to retransmit. So that is one reason why the output differs even if
we have impair set.


- state numbers differ, but everything else seems OK


I do see these and worry a bit. It seems to indicate a rekey has
happened that we did not expect.


 testing/pluto/ikev2-delete-05-sa-start failed west:output-different
 testing/pluto/ikev2-delete-06-start-both failed west:output-different


Some of this is because of changed behaviour, and it needs looking at
more carefully, which is why I haven't updated the reference output yet.


- sometimes "ping -c N" sends more than N packets.  Huh?


We should migrate to using fping.


 testing/pluto/ikev2-62-host-ondemand failed north:output-different
 testing/pluto/ikev2-62-host-ondemand-instance failed north:output-different
 testing/pluto/klips-passthrough-00 failed west:output-different
- one dropped packet


again, for the ones where we just send 4 pings to confirm, we should
just use fping.


 testing/pluto/ikev1-algo-esp-sha2-01-netkey-klips failed west:output-different
 testing/pluto/ikev1-algo-esp-sha2-02-netkey-klips failed west:output-different
 testing/pluto/newoe-07-ike-replace-initiator failed road:output-different
 testing/pluto/certoe-09-packet-host failed road:output-different
 testing/pluto/klips-algo-twofish-01 failed west:output-different
 testing/pluto/klips-algo-serpent-01 failed west:output-different
 testing/pluto/klips-algo-cast-01 failed west:output-different
 testing/pluto/ah-pluto-07-klips-netkey failed west:output-different
 testing/pluto/l2tp-01 failed north:output-different
 testing/pluto/l2tp-02 failed north:output-different

- more than one dropped packet


I wish we could get this more stable


 testing/pluto/interop-ikev2-strongswan-38-mobike-initiator failed 
north:output-different
-2 packets transmitted, 2 received, 0% packet loss, time 
-rtt min/avg/max/mdev = 0.XXX/0.XXX/0.XXX/0.XXX ms
+2 packets transmitted, 0 received, 100% packet loss, time 

- something funny with MOBIKE
 testing/pluto/interop-ikev2-strongswan-38-mobike-pool failed 
east:output-different road:output-different
west: +peer supports MOBIKE


That message seems strongswan version specific. I think newer versions
have removed this?


- script changed and odd message change
- ip xfrm state
+ ipsec look


This is a change I made. ipsec look runs both ip xfrm pol and ip xfrm
state but adds a sorting wrapper. This prevents false positives where
you see flipping of 192.1.2.23 with 192.1.2.45.
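
Why sorting helps can be seen with a small sketch. This is not how ipsec look is implemented; it only assumes that each record starts on an unindented line with its details indented underneath, and the function name is made up:

```python
def sort_records(text):
    """Sort multi-line records so that kernel listing order no longer matters.

    A record is an unindented line plus the indented lines that follow it.
    """
    records = []
    for line in text.splitlines():
        if line[:1].isspace() and records:
            records[-1].append(line)  # continuation of the current record
        else:
            records.append([line])  # start of a new record
    records.sort()
    return "\n".join("\n".join(record) for record in records)


run_a = "src 192.1.2.23 dst 192.1.2.45\n\tproto esp\nsrc 192.1.2.45 dst 192.1.2.23\n\tproto esp"
run_b = "src 192.1.2.45 dst 192.1.2.23\n\tproto esp\nsrc 192.1.2.23 dst 192.1.2.45\n\tproto esp"
print(sort_records(run_a) == sort_records(run_b))
```

The two runs list the same states in a different order, so a plain diff flags them while the sorted form compares equal.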


 testing/pluto/interop-ikev2-strongswan-39-mobike-responder failed 
east:output-different road:output-different
-   proto esp spi 0xSPISPIXX reqid REQID mode tunnel
+   proto esp spi 0xSPISPIXX reqid REQIDREQID mode tunnel


I think related to the ipsec look somehow double fixing the REQID.


 testing/pluto/newoe-21-liveness-clear failed east:output-different 
road:output-different
road: script changed!


likely something else still went wrong, so output was not yet updated.


- tcpdump didn't terminate properly


not sure about that.


 testing/pluto/nflog-03-conns failed west:output-different

- who knows
 testing/pluto/newoe-06-prio failed east:output-different road:output-different
-000 #1: "road-east-ikev2" --- cut ---


I don't know what those cut lines are and why they appear. It's a test
anomaly.


- bonus xfrms


If you see "transport mode" bonus xfrms, those are the larval states.
They seem to linger sometimes for a bit and then we see them. It's
harmless (but annoying)


 testing/pluto/newoe-15-portpass failed road:output-different
 testing/pluto/newoe-18-private-clearall failed road:output-different
 testing/pluto/newoe-19-poc-poc-clear failed road:output-different
 testing/pluto/certoe-07-nat-2-clients failed road:output-different

- state serial numbers differ


Again, I'm a little concerned why this happens.


 testing/pluto/delete-sa-04 failed east:output-different
 testing/pluto/delete-sa-05 failed east:output-different west:output-different
(also different conn chosen?)
 testing/pluto/klips-basic-pluto-01 failed east:output-different

- mysteries


Needs looking at.


 testing/pluto/n

Re: [Swan-dev] status of failing tests

2018-01-31 Thread D. Hugh Redelmeier
| From: Paul Wouters 

| The tests seem to have gotten fairly consistent. There is still a
| group of problems:

Thanks for a summary.

I've got some fairly safe changes that I wanted to test.  I ran these
tests on the weekend.  I started with master, as of
d765c891ceb150bc65019d8b7dcbf96337c7333b.

Here's what I observed.  I think that I've removed the cases that you
mentioned.  I don't think that these errors are my fault (I could be
wrong).

Could you see if your experiences are the same?

- several failures that were only IKE retransmissions.  Just ignore them.
  But a bit weird when IMPAIR_RETRANSMITS is set.

- not sure: maybe retransmission needed but suppressed?
  testing/pluto/ikev1-x509-12-san-dn-match failed west:output-different
-004 "san" #2: STATE_QUICK_I2: sent QI2, IPsec SA established tunnel 
mode {ESP=>0xESPESP <0xESPESP xfrm=AES_CBC_128-HMAC_SHA1_96 NATOA=none 
NATD=none DPD=passive}
+002 "san" #2: suppressing retransmit because IMPAIR_RETRANSMITS is set
+002 "san" #2: IMPAIR RETRANSMITS: suppressing re-key
+002 "san" #2: deleting state (STATE_QUICK_I1)

- state numbers differ, but everything else seems OK

  testing/pluto/ikev2-delete-05-sa-start failed west:output-different
  testing/pluto/ikev2-delete-06-start-both failed west:output-different

- sometimes "ping -c N" sends more than N packets.  Huh?

  testing/pluto/ikev2-62-host-ondemand failed north:output-different
  testing/pluto/ikev2-62-host-ondemand-instance failed north:output-different
  testing/pluto/klips-passthrough-00 failed west:output-different

- one dropped packet
  testing/pluto/ikev1-algo-esp-sha2-01-netkey-klips failed west:output-different
  testing/pluto/ikev1-algo-esp-sha2-02-netkey-klips failed west:output-different
  testing/pluto/newoe-07-ike-replace-initiator failed road:output-different
  testing/pluto/certoe-09-packet-host failed road:output-different
  testing/pluto/klips-algo-twofish-01 failed west:output-different
  testing/pluto/klips-algo-serpent-01 failed west:output-different
  testing/pluto/klips-algo-cast-01 failed west:output-different
  testing/pluto/ah-pluto-07-klips-netkey failed west:output-different
  testing/pluto/l2tp-01 failed north:output-different
  testing/pluto/l2tp-02 failed north:output-different

- more than one dropped packet
  testing/pluto/interop-ikev2-strongswan-38-mobike-initiator failed 
north:output-different
-2 packets transmitted, 2 received, 0% packet loss, time 
-rtt min/avg/max/mdev = 0.XXX/0.XXX/0.XXX/0.XXX ms
+2 packets transmitted, 0 received, 100% packet loss, time 

- something funny with MOBIKE
  testing/pluto/interop-ikev2-strongswan-38-mobike-pool failed 
east:output-different road:output-different
west: +peer supports MOBIKE

- script changed and odd message change
- ip xfrm state
+ ipsec look

  testing/pluto/interop-ikev2-strongswan-39-mobike-responder failed 
east:output-different road:output-different
-   proto esp spi 0xSPISPIXX reqid REQID mode tunnel
+   proto esp spi 0xSPISPIXX reqid REQIDREQID mode tunnel

  testing/pluto/newoe-21-liveness-clear failed east:output-different 
road:output-different
road: script changed!

- tcpdump didn't terminate properly
  testing/pluto/nflog-03-conns failed west:output-different

- who knows
  testing/pluto/newoe-06-prio failed east:output-different road:output-different
-000 #1: "road-east-ikev2" --- cut ---

- bonus xfrms
  testing/pluto/newoe-15-portpass failed road:output-different
  testing/pluto/newoe-18-private-clearall failed road:output-different
  testing/pluto/newoe-19-poc-poc-clear failed road:output-different
  testing/pluto/certoe-07-nat-2-clients failed road:output-different

- state serial numbers differ
  testing/pluto/delete-sa-04 failed east:output-different
  testing/pluto/delete-sa-05 failed east:output-different west:output-different
(also different conn chosen?)
  testing/pluto/klips-basic-pluto-01 failed east:output-different

- mysteries

  testing/pluto/nat-pluto-04 failed east:output-different
-? (192.1.2.254) at 12:00:00:de:ad:ba [ether] on eth1

  testing/pluto/interop-ikev1-strongswan-11-ah-initiator-sha512 failed 
west:output-different
-generating QUICK_MODE request 0123456789 [ HASH ]
-sending packet: from 192.1.2.45[500] to 192.1.2.23[500] (XXX bytes)

  testing/pluto/interop-ikev2-strongswan-35-ipsec-rekey failed 
west:output-different
-westnet-eastnet-ikev2{6}:  DELETING, TUNNEL, reqid 1
+westnet-eastnet-ikev2{6}:  REKEYING, TUNNEL, reqid 1, expires in 10 
seconds
-westnet-eastnet-ikev2{7}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: 
SPISPI_i SPISPI_o
-westnet-eastnet-ikev2{7}:   192.0.1.0/24 === 192.0.2.0/24

  testing/pluto/dnssec-pluto-01 failed west:output-different
-000 "westnet-eastnet-etc-hosts-auto-add": 
192.0.1.0/24===192.1.2.45<192.1.2.45>[@west]...192.1.2.23[@east]===192.0.2.

Re: [Swan-dev] status of failing tests

2018-01-26 Thread Paul Wouters

On Fri, 26 Jan 2018, Antony Antony wrote:


There is an odd thing in master: I suspect the latest tag commit is not yet merged
to master.
make showversion
3.22-773-gaafe4eba2-master


Ahh. We tried to leave the tag on the release-3.23 branch so we did not
have to commit, tag, uncommit the version number. I guess this is a
side effect of that. Perhaps we can change the IPSECBASEVERSION to
something where we say 3.24dr- and have git then append the rest,
i.e. the short commit id.


I noticed a coredump in interop-ikev2-strongswan-22-cp-responder-psk


Seems that test didn't run properly on the last run on testing.libreswan.org

http://testing.libreswan.org/results/testing/v3.22-763-gc3545f3-master/interop-ikev2-strongswan-22-cp-responder-psk/OUTPUT/

So it didn't produce a core yet. I guess it is something that needs to
be looked into.


it is probably during shutdown. Coredump first appeared 
2018-01-22-swantest-3.22-739-g4e9bc8ed3-master and was not there in 
2018-01-16-swantest-3.22-695-g5e1f56d86-master


git   rev-list --ancestry-path 5e1f56d86..4e9bc8ed3 | wc -l
44

Might be easier to git bisect it if it happens consistently.
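
For a sense of the cost: bisection halves the candidate range at each step, so the 44 commits counted above need about log2(44) builds. A quick back-of-the-envelope check (the helper name is just for illustration):

```python
import math


def bisect_steps(commits):
    """Upper bound on the build-and-test rounds a binary search over commits needs."""
    return math.ceil(math.log2(commits)) if commits > 1 else 0


print(bisect_steps(44))  # 6
```

So roughly six build-and-test rounds, provided the crash reproduces on every run.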

Paul


Re: [Swan-dev] status of failing tests

2018-01-26 Thread Antony Antony
There is an odd thing in master: I suspect the latest tag commit is not yet merged
to master.
make showversion
3.22-773-gaafe4eba2-master

I noticed a coredump in interop-ikev2-strongswan-22-cp-responder-psk

it is probably during shutdown. Coredump first appeared 
2018-01-22-swantest-3.22-739-g4e9bc8ed3-master and was not there in 
2018-01-16-swantest-3.22-695-g5e1f56d86-master

Core was generated by `/usr/local/libexec/ipsec/pluto --leak-detective --config 
/etc/ipsec.conf --nofo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x7f8bcf8c4b46 in free_event_entry (evp=0x7f8bcfc11168 
) at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:467
467 struct pluto_event *next = e->next;
#0  0x7f8bcf8c4b46 in free_event_entry (evp=0x7f8bcfc11168 
) at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:467
#1  0x7f8bcf8c4cb1 in free_pluto_event_list () at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:501
#2  0x7f8bcf8c285a in exit_pluto (status=0) at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/plutomain.c:1789
#3  0x7f8bcf92c936 in whack_handle (whackctlfd=4) at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/rcv_whack.c:733
#4  0x7f8bcf92c6ae in whack_handle_cb (fd=4, event=2, arg=0x0) at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/rcv_whack.c:662
#5  0x7f8bcd8203cc in event_process_active_single_queue 
(activeq=0x7f8bc7d96ff0, base=0x7f8bc7d90d80) at event.c:1350
#6  event_process_active (base=) at event.c:1420
#7  event_base_loop (base=0x7f8bc7d90d80, flags=0) at event.c:1621
#8  0x7f8bcf8c6d38 in call_server () at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/server.c:1121
#9  0x7f8bcf8c27b5 in main (argc=5, argv=0x7ffd906c0538) at 
/usr/src/debug/libreswan-3.23_17510eef70f0c2c18fcd5f1bb4eb447aa931733e/programs/pluto/plutomain.c:1749


On Thu, Jan 25, 2018 at 07:52:27PM -0500, Paul Wouters wrote:
> 
> The tests seem to have gotten fairly consistent. There is still a
> group of problems:
> 
> - dpd / liveness tests need a redesign to avoid constant false positives
> - ipv6 tests have unexplained packet flow issues
> - some traffic counters are off by a byte or 3. seems our kernels might
>   differ?
> - not yet valid tests are failing for unknown reasons
> - "MS+S=C" string mismatch due to auto-detect modecfgserver based on 
> addresspool
>(printing of that string needs fixing, but its a rat hole)
> 
> Some specific test cases of interest:
> 
> xauth-pluto-24-static-addresspool   shows the group ID with PSK lease problem
> xauth-pluto-22                      same
> nss-cert-crl-03                     missing issuer line?
> newoe-08-ike-replace-responder      additional unexplained policies
> some strongswan psk tests           they gained a cert despite swan-prep wiping them?
> ikev2-10-2behind-nat                not sure, lots of things going wrong?
> Paul
> 