On Sat, 11 Apr 2015, Herbert Xu wrote:
This is wrong because some updates do not
contain keying material.
I don't understand this. Can you explain what the problem is for those
SAs?
Updates are used in two places in pluto. They're used for inbound
SAs as part of the get_spi + update procedure, and they are used
for NAT-T updates. In the latter case there is no keying material
so you must not replace the update with an add.
The kernel will never delete any live SAs installed by pluto since
pluto does not set hard life times on them. So the NAT-T update
should never fail anyway unless some third party is deleting SAs.
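A minimal C sketch of the point above, assuming a simplified, hypothetical model of pluto's two update paths. `enum update_kind`, `struct sa_msg` and `can_rewrite_as_add()` are illustrative names, not pluto's actual code. An add has to describe a complete SA, so a NAT-T update, which carries no keys, cannot be turned into one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the two places pluto sends an SA update (XFRM_MSG_UPDSA). */
enum update_kind {
    UPDATE_INBOUND_AFTER_GETSPI, /* fills in the SA reserved by get_spi */
    UPDATE_NATT                  /* NAT-T encapsulation change, no keys */
};

/* Hypothetical, simplified update message; not pluto's real structure. */
struct sa_msg {
    enum update_kind kind;
    const unsigned char *key; /* NULL when the message carries no keys */
    size_t key_len;
};

/* Per the description above, only the get_spi + update path carries keys. */
static bool kind_carries_keys(enum update_kind kind)
{
    return kind == UPDATE_INBOUND_AFTER_GETSPI;
}

/* An add (XFRM_MSG_NEWSA) creates a complete SA from scratch, so it needs
 * keying material. Returns true only if this update actually has keys. */
static bool can_rewrite_as_add(const struct sa_msg *m)
{
    assert(m != NULL);
    return kind_carries_keys(m->kind) && m->key != NULL && m->key_len > 0;
}
```

Even on the inbound path, where the rewrite is mechanically possible, the rest of this thread explains why retrying with an add is still the wrong fix.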
So the patch that switched this between add and update has quite a history
behind it, including a revert that was itself reverted.
Part of the problem is https://bugs.libreswan.org/show_bug.cgi?id=75
Some history can be seen in the git commits:
https://github.com/libreswan/libreswan/commit/b5fa5eb1033ee3b73f7121a8ba3e593be21f8226
https://github.com/libreswan/libreswan/commit/f81203faff29490157c6ef1cbc75d476a902bb63
https://github.com/libreswan/libreswan/commit/15d27b8ad4a2f0d1fb252e608cfeafe6b7121773
https://github.com/libreswan/libreswan/commit/39b7891e50fae053e8acebdc1f55af6408f8fdad
So first, the b5fa5eb commit changed it from add to update:
errors on roadwarriors switching between internal IP's and reconnecting,
where NETKEY says a policy already exists (possibly because we do not
properly delete the policy when we delete the phase1, and the XP clients
delete their phase1 after 1 minute of idle time)
I reverted that, but sadly I didn't log why.
It was then reverted again by me, with the comment:
* NEW will fail when an existing policy, UPD always works.
* This seems to happen in cases with NAT'ed XP clients, or
* quick recycling/resurfacing of roadwarriors on the same IP.
* req.n.nlmsg_type = XFRM_MSG_NEWPOLICY;
So that does relate to your NAT update comment.
But note that Tuomo also ran into a problem with connecting tunnels as
explained in bug 75:
On configuration where a talks with c via b eg. a == b == c where tunnels are
defined as a-c on both a = b and b = c we are missing tunnels.
This is bug introduced by commit:
15d27b8ad4a2f0d1fb252e608cfeafe6b7121773
With that patch applied I get this error when starting ipsec:
#31: ERROR: netlink XFRM_MSG_NEWPOLICY response for flow
[email protected] included errno 17: File exists
With patch reverted there are no errors and tunnels work as they should.
When you do a get_spi the kernel generates a temporary SA to keep
hold of the SPI so that nobody else gets it. But this SA only
lives until xfrm_acq_expires.
Oh, I did not realise that! That's good to know.
Therefore redoing the add after update might work but is simply
wrong. You might as well just pluck some random number out of
thin air and use that as your SPI.
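To make the timing concrete, here is a toy C model of the SPI reservation created by get_spi, assuming the kernel default of 30 seconds for net.core.xfrm_acq_expires. The names are hypothetical and this is not kernel code. Once the temporary SA has expired, the update has nothing left to fill in, and an add would install an SA under an SPI the kernel no longer reserves:

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel default for net.core.xfrm_acq_expires, in seconds. */
#define XFRM_ACQ_EXPIRES_DEFAULT 30L

/* Hypothetical outcomes of sending an update for an SPI from get_spi. */
enum update_outcome {
    UPDATE_FILLS_LARVAL_SA, /* temporary SA still present: keys are filled in */
    UPDATE_FINDS_NO_SA      /* temporary SA expired: nothing left to update */
};

/* True while the temporary SA from get_spi still holds the SPI.
 * 'elapsed' is seconds since get_spi; the exact boundary is approximate. */
static bool larval_sa_present(long elapsed, long acq_expires)
{
    assert(elapsed >= 0 && acq_expires > 0);
    return elapsed < acq_expires;
}

/* What an update sent 'elapsed' seconds after get_spi runs into. */
static enum update_outcome update_outcome_after(long elapsed, long acq_expires)
{
    return larval_sa_present(elapsed, acq_expires) ? UPDATE_FILLS_LARVAL_SA
                                                   : UPDATE_FINDS_NO_SA;
}
```

With the default, an update 45 seconds after get_spi finds no SA; with the value of 165 discussed below, the same delay would still be covered.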
I understand now. I guess we need to look into the two tunnel problem
listed above and how to deal with the Win XP / NAT issue, and figure
out what is going wrong there.
Yes, current git has switched to libevent and subsecond retransmits
and timeouts, so we will fall within that 30-second window as
well.
OK, if you can guarantee that you will never call update more than 30
seconds after the get_spi, then you should be fine. In that case you can
also revert the patch that retries the add after update, because it is
just papering over the xfrm_acq_expires problem and is no longer needed.
Right. I'll do that.
For libreswan, I suggest that you increase this parameter
(xfrm_acq_expires) to a more appropriate value. I haven't done the
calculations, but strongSwan sets it to 165, which seems to be appropriate.
Almost 3 minutes? That seems very long.
Well it just has to be longer than the maximum interval between
pluto doing get_spi and calling update_sa.
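The parameter is exposed as /proc/sys/net/core/xfrm_acq_expires (sysctl net.core.xfrm_acq_expires). Below is a hedged C sketch of how a daemon might raise it at startup; the helper names are hypothetical, 165 is simply the strongSwan value mentioned above, and writing the real /proc file requires root. The helper only ever raises the value, so it will not lower a setting an administrator deliberately made larger:

```c
#include <assert.h>
#include <stdio.h>

#define XFRM_ACQ_EXPIRES_PATH "/proc/sys/net/core/xfrm_acq_expires"

/* Hypothetical helper: write 'seconds' to the sysctl file at 'path'.
 * Returns 0 on success, -1 on error. */
static int set_acq_expires(const char *path, unsigned int seconds)
{
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%u\n", seconds) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the current value. Returns 0 on success. */
static int get_acq_expires(const char *path, unsigned int *seconds)
{
    FILE *f;
    int rc;

    assert(path != NULL && seconds != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    rc = fscanf(f, "%u", seconds) == 1 ? 0 : -1;
    fclose(f);
    return rc;
}

/* Raise the timeout to at least 'minimum' seconds; never lower it. */
static int ensure_acq_expires_at_least(const char *path, unsigned int minimum)
{
    unsigned int current;

    if (get_acq_expires(path, &current) == 0 && current >= minimum)
        return 0;
    return set_acq_expires(path, minimum);
}
```

A startup call would look like `ensure_acq_expires_at_least(XFRM_ACQ_EXPIRES_PATH, 165)`. The path parameter is there mainly so the logic can be exercised against an ordinary file.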
Maybe pluto should explicitly track this timer and simply fail when it
notices the time has expired.
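One way pluto could track that timer, sketched in C under stated assumptions: all names are hypothetical, the deadline is taken from CLOCK_MONOTONIC at get_spi time, and a small safety margin is subtracted so pluto gives up slightly before the kernel drops the temporary SA instead of racing it:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-SPI bookkeeping kept after get_spi. */
struct spi_reservation {
    unsigned int spi;
    struct timespec deadline; /* monotonic time after which we stop trusting the SPI */
};

/* Record a reservation made at 'reserved_at' (CLOCK_MONOTONIC), given the
 * kernel's acq_expires and a safety margin, both in seconds. */
static void reservation_init(struct spi_reservation *r, unsigned int spi,
                             struct timespec reserved_at,
                             unsigned int acq_expires, unsigned int margin)
{
    assert(r != NULL && margin < acq_expires);
    r->spi = spi;
    r->deadline = reserved_at;
    r->deadline.tv_sec += (time_t)(acq_expires - margin);
}

static bool timespec_before(struct timespec a, struct timespec b)
{
    return a.tv_sec < b.tv_sec || (a.tv_sec == b.tv_sec && a.tv_nsec < b.tv_nsec);
}

/* Returns 0 if the update may still be sent at 'now', or -ETIMEDOUT if the
 * reservation has probably lapsed and the negotiation should fail rather
 * than retry with an add. */
static int reservation_check(const struct spi_reservation *r, struct timespec now)
{
    assert(r != NULL);
    return timespec_before(now, r->deadline) ? 0 : -ETIMEDOUT;
}

/* Convenience wrapper that reads the monotonic clock itself. */
static int reservation_check_now(const struct spi_reservation *r)
{
    struct timespec now;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -errno;
    return reservation_check(r, now);
}
```

A monotonic clock is used so that wall-clock adjustments cannot stretch or shrink the window.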
Paul
_______________________________________________
Swan-dev mailing list
[email protected]
https://lists.libreswan.org/mailman/listinfo/swan-dev