Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-03 Thread David Woodhouse
On Tue, 2015-02-03 at 16:19 -0800, David Miller wrote:
 From: David Woodhouse dw...@infradead.org
 Date: Mon, 02 Feb 2015 07:27:10 +
 
  I'm guessing you don't want to push the *whole* management of the TLS
  control connection *and* the UDP transport, and probing the latter with
  keepalives, into the kernel? I certainly don't :)
 
 Whilst Herbert Xu and I have discussed in the past supporting
 automatic SSL handling of socket data during socket writes in the
 kernel, doing TLS stuff would be a bit of a stretch :-)

Right. For the DTLS I was thinking we'd do the handshake in userspace
and then hand the UDP socket down. At that point it's basically the same
as ESP with the bytes in a slightly different place.

So I really am looking at an option for here's a UDP socket to send
those tun packets out on, with this encryption setup as the sanest
plan I can come up with.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-03 Thread David Miller
From: David Woodhouse dw...@infradead.org
Date: Mon, 02 Feb 2015 07:27:10 +

 I'm guessing you don't want to push the *whole* management of the TLS
 control connection *and* the UDP transport, and probing the latter with
 keepalives, into the kernel? I certainly don't :)

Whilst Herbert Xu and I have discussed in the past supporting
automatic SSL handling of socket data during socket writes in the
kernel, doing TLS stuff would be a bit of a stretch :-)



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread Steffen Klassert
On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote:
 On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote:
 
  We might as well have not have implemented the IPSEC stack at all,
  because as a result of the userland VPN stuff our IPSEC stack is
  largely unused except by a very narrow group of users.
 
 Well, I'd love to make better use of it if I can. I do suspect it makes
 most sense for userspace to continue to manage the probing of UDP
 connectivity, and the fallback to TCP mode — and I suspect it also makes
 sense to continue to use tun for passing packets up to the VPN client
 when it's using the TCP transport.
 
 So the question would be how we handle redirecting the packet flow to
 the optional UDP transport, when the VPN client determines that it's
 available. For the sake of the user setting up firewall and routing
 rules, I do think it's important that it continues to appear to
 userspace as the *same* device for the entire lifetime of the session,
 regardless of which transport the packets happen to be using at a given
 moment in time. It doesn't *have* to be tun, though. 
 
 You don't seem to like my suggestion of somehow pushing down an XFRM
 state to the tun device to direct the packets out there instead of up to
 userspace. Do you have an alternative suggestion... or a specific
 concern that would help me come up with something you like better?

Maybe you want to use a virtual tunnel interface (vti) what we have
already. Everything that is routed through such an interface is
guaranteed to be either encrypted if a matching xfrm state is present
or dropped. Same on the rceive side, everything that is received by
this interface is guaranteed to be IPsec processed. So you can do
a routing based decision about the IPsec processing.

While I'm sure it could handle the ESP in UDP encapsulation, I'm not that
sure about your TCP fallback because this requires a valid xfrm state
to allow packets to pass. Using the same interface for both is probably
not possible.



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread David Woodhouse
On Mon, 2015-02-02 at 16:23 +0100, Phil Sutter wrote:
 Since you want to provide connectivity over HTTPS which is not possible
 in kernel space, you are stuck with keeping the tun device. So the
 packet flow in that case is identical to how e.g. OpenVPN does it:
 
 - tunX holds default route
 - OpenConnect then:
   - receives packets on /dev/tun
   - holds TCP socket to VPN concentrator
   - does encapsulation into TLS
 
 Speaking of optimisation, the interesting part is the alternative flow
 via IPsec in UDP.

Right. The packet flow you describe is what we already have. Except of
course we already *do* establish the UDP connection (which is DTLS when
we're talking to a Cisco AnyConnect server, and ESP in UDP when we're
talking to Juniper). If we get responses to keepalive packets, we'll
send outbound packets over the UDP connection. If the UDP connectivity
goes AWOL, we'll fall back to sending on TCP. Rekeying of the UDP
connection is handled over the TCP control connection too. Even in the
DTLS case, the master secret and session ID are exchanged over TCP and
the DTLS is actually done as a 'session resume', without the normal DTLS
handshake ever happening.

As you say, I'm stuck with keeping the tun device (or something very
much like it). This *isn't* like vpnc where I can set up an IPSec config
and just let it run.

  AFAICT, it should be possible to setup an ESP in UDP
 tunnel using XFRM (see ip-xfrm(8) for reference), although I didn't try
 that myself. The funny thing with XFRM is, it applies before the routing
 decision does: If my IPsec policy matches, the packet goes that way no
 matter what the routing table says about the original destination. This
 can be used to override the default route provided via tun0 in the above
 case.

Except it isn't even the default route. We get given a bunch of split
includes or split excludes from the VPN server. We pass them to
vpnc-script or NetworkManager to actually set the routes up, and those
tools may make their own tweaks to what the server requested — denying
the default route and setting up explicit routes, or adding firewall
rules or NAT to incoming/outgoing packets on the tun device.

If it is no longer *just* the single tun device, everything gets really
complicated. Even *before* we talk about changing it on the fly during
normal operation.

 Of course, OpenConnect has to manage all the XFRM/policy stuff on it's
 own, since switching from ESP in UDP back to TLS would mean to tear down
 the XFRM tunnel. OpenConnect would have to setup (a limited) XFRM and
 send test traffic to decide whether to set it up fully (if limited) or
 tear it down (if unlimited) again so traffic arrives at tunX again.

Right. And ideally without CAP_NET_ADMIN.

 In my opinion, this might work. The whole setup is probably about as
 intuitive as the fact that kernel IPsec tunnel mode does not naturally
 provide an own interface. Firewall setup on top of that might become a
 matter of try-and-error. Maybe having a VTI interface and merely moving
 the default route instead of fiddling with policies all the time might
 make things a little easier to comprehend, but surely adds some
 performance overhead.

I think even the latter is sufficiently complex to manage that it's not
worth pursuing. I may throw together my suggested hack using
tun_get_socket() and see how much it makes *me* barf before deciding
whether to show it here for more feedback :)

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread David Woodhouse
On Mon, 2015-02-02 at 09:24 +0100, Steffen Klassert wrote:
 
 Maybe you want to use a virtual tunnel interface (vti) what we have
 already. Everything that is routed through such an interface is
 guaranteed to be either encrypted if a matching xfrm state is present
 or dropped. Same on the rceive side, everything that is received by
 this interface is guaranteed to be IPsec processed. So you can do
 a routing based decision about the IPsec processing.
 
 While I'm sure it could handle the ESP in UDP encapsulation, I'm not that
 sure about your TCP fallback because this requires a valid xfrm state
 to allow packets to pass. Using the same interface for both is probably
 not possible.

I'm trying to imagine how we could make it work in practice if we end up
exposing two *different* interfaces and having to change the kernel's
routing according to whether we have UDP connectivity at any given
moment in time.

Given how painful it already is to maintain vpnc-script and make it do
the right thing for split-include and split-exclude routing, I'm not
really sure I want to go there.

Even if we could get such a scheme to work, it would probably also
require retaining root privileges to make the changes — and one of the
security benefits over the proprietary VPN clients is that we don't
*need* to run as root. We can either drop privs after running
vpnc-script to do the initial routing setup, or in the NetworkManager
case we *never* run with elevated privileges; we just pass the
IP/routing information back over DBus to NetworkManager.

It occurs to me that for the approach I was thinking about, I wouldn't
even need to touch the internals of the tun driver. It could be a
separate driver which just uses tun_get_socket(). Userspace could hand
it the file descriptors of the tun device and the connected UDP socket,
along with the encryption parameters — and then just stop reading
packets from the tun device for itself.

-- 
dwmw2


smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-02 Thread Phil Sutter
Hi,

On Mon, Feb 02, 2015 at 07:27:10AM +, David Woodhouse wrote:
 On Sun, 2015-02-01 at 21:07 -0800, David Miller wrote:
  From: David Woodhouse dw...@infradead.org
  Date: Sun, 01 Feb 2015 21:29:43 +
  
   I really was looking for some way to push down something like an XFRM
   state into the tun device and just say shove them out here until I tell
   you otherwise.
  
  People decided to use TUN and push VPN stuff back into userspace,
  and there are repercussions for that decision.
  
  I'm not saying this to be mean or whatever, but I was very
  disappointed when userland IPSEC solutions using TUN started showing
  up.
 
 Yeah. That's a valid criticism of vpnc, certainly. I never did
 understand why it reimplemented the IPSec stack.
 
 For my OpenConnect client it's somewhat more justified though — the
 initial data transport there is over TLS, which the kernel doesn't
 support. And if we *can* establish UDP communication, that's over DTLS
 which the kernel doesn't support either. It's not even the *standard*
 version of DTLS because Cisco are still using a pre-RFC4347 version of
 the protocol. And we also need to probe the UDP connectivity and do
 keepalives and manage the fallback to using the TCP data transport.
 
 It's not like vpnc where it really is just a case of setting up the ESP
 context and letting it run.
 
 It's only now I've added Juniper support, which uses ESP-in-UDP for the
 data transport, that I'm doing something that the kernel supports at
 all. And now I'm looking at how to make use of that.
 
  We might as well have not have implemented the IPSEC stack at all,
  because as a result of the userland VPN stuff our IPSEC stack is
  largely unused except by a very narrow group of users.
 
 Well, I'd love to make better use of it if I can. I do suspect it makes
 most sense for userspace to continue to manage the probing of UDP
 connectivity, and the fallback to TCP mode — and I suspect it also makes
 sense to continue to use tun for passing packets up to the VPN client
 when it's using the TCP transport.
 
 So the question would be how we handle redirecting the packet flow to
 the optional UDP transport, when the VPN client determines that it's
 available. For the sake of the user setting up firewall and routing
 rules, I do think it's important that it continues to appear to
 userspace as the *same* device for the entire lifetime of the session,
 regardless of which transport the packets happen to be using at a given
 moment in time. It doesn't *have* to be tun, though. 

Since you want to provide connectivity over HTTPS which is not possible
in kernel space, you are stuck with keeping the tun device. So the
packet flow in that case is identical to how e.g. OpenVPN does it:

- tunX holds default route
- OpenConnect then:
  - receives packets on /dev/tun
  - holds TCP socket to VPN concentrator
  - does encapsulation into TLS

Speaking of optimisation, the interesting part is the alternative flow
via IPsec in UDP. AFAICT, it should be possible to setup an ESP in UDP
tunnel using XFRM (see ip-xfrm(8) for reference), although I didn't try
that myself. The funny thing with XFRM is, it applies before the routing
decision does: If my IPsec policy matches, the packet goes that way no
matter what the routing table says about the original destination. This
can be used to override the default route provided via tun0 in the above
case.

Of course, OpenConnect has to manage all the XFRM/policy stuff on it's
own, since switching from ESP in UDP back to TLS would mean to tear down
the XFRM tunnel. OpenConnect would have to setup (a limited) XFRM and
send test traffic to decide whether to set it up fully (if limited) or
tear it down (if unlimited) again so traffic arrives at tunX again.

In my opinion, this might work. The whole setup is probably about as
intuitive as the fact that kernel IPsec tunnel mode does not naturally
provide an own interface. Firewall setup on top of that might become a
matter of try-and-error. Maybe having a VTI interface and merely moving
the default route instead of fiddling with policies all the time might
make things a little easier to comprehend, but surely adds some
performance overhead.

Cheers, Phil



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Wed, 2010-04-14 at 08:58 +0800, Herbert Xu wrote:
 On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
 
  Herbert Acked your patch, so I guess its OK, but I think it can be
  dangerous.
 
 The tun socket accounting was never designed to stop it from
 flooding another tun interface.  It's there to stop it from
 transmitting above a destination interface TX bandwidth and
 cause unnecessary packet drops.  It also limits the total amount
 of kernel memory that can be pinned down by a single tun interface.
 
 In this case, all we're doing is shifting the accounting from the
 hardware queue to the qdisc queue.
 
 So your ability to flood a tun interface is essentially unchanged.

I've just been looking at VPN performance, using netperf to flood an
openconnect/ocserv connection over GigE and profiling my VPN client.

If I run netperf over the *unencrypted* link, it only sends 1Gb/s of
packets — because the packets are correctly accounted to netperf's UDP
socket until the moment they're actually transmitted on the wire, and
the backpressure works correctly.

When I run over the VPN, netperf thinks it sent 2½ times the amount of
TX traffic. Packets are being dropped by the tun device before even
feeding them up to the VPN client to be sent — presumably because of
this skb_orphan() call. (The client itself should do the right thing,
and only suck packets out of the tun at the rate it can shove them out
*its* UDP socket.)

Did we ever look at the alternative solution of taking ownership only
after a timeout, or on demand when we need to shut down the device?

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Sun, 2015-02-01 at 14:26 +0200, Michael S. Tsirkin wrote:
  When I run over the VPN, netperf thinks it sent 2½ times the amount of
  TX traffic.
 
 At some level, it's expected: netperf's manual actually says:
   A UDP_STREAM test has no end-to-end flow control - UDP provides none and
   neither does netperf. However, if you wish, you can configure netperf
   with --enable-intervals=yes to enable the global command-line -b and -w
   options to pace bursts of traffic onto the network.

True, but UDP is just a canary for other protocols here. We *should* be
able to keep track of the packets before they even leave the box, and
know that we haven't even managed to send them. Even if we know it's a
datagram protocol and it's potentially going to be dropped in transit
later.

Of course, now I'm looking closely at the path these packets take to
leave the box, it starts to offend me that they're being passed up to
userspace just to encrypt them (as DTLS or ESP) and then send them back
down to the kernel on a UDP socket. The kernel already knows how to
{en,de}crypt ESP, and do the sequence number checking on incoming
packets.

I'm wondering if we bypass userspace in that case somehow — let
userspace negotiate the encryption and connect the UDP socket, then just
pass the socket fd and the parameters to the kernel so that incoming
packets are decrypted and 'received' on the tun device, and outgoing
packets on the tun device are encrypted and sent out on the UDP socket. 

The performance isn't too much of an issue for a VPN *client* in
practice, but we have a server implementation too which would probably
benefit quite well from such an offload facility.

If I were to look at such a thing, would it provoke screams of horror? 

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread Michael S. Tsirkin
On Sun, Feb 01, 2015 at 11:20:33AM +, David Woodhouse wrote:
 On Wed, 2010-04-14 at 08:58 +0800, Herbert Xu wrote:
  On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
  
   Herbert Acked your patch, so I guess its OK, but I think it can be
   dangerous.
  
  The tun socket accounting was never designed to stop it from
  flooding another tun interface.  It's there to stop it from
  transmitting above a destination interface TX bandwidth and
  cause unnecessary packet drops.  It also limits the total amount
  of kernel memory that can be pinned down by a single tun interface.
  
  In this case, all we're doing is shifting the accounting from the
  hardware queue to the qdisc queue.
  
  So your ability to flood a tun interface is essentially unchanged.
 
 I've just been looking at VPN performance, using netperf to flood an
 openconnect/ocserv connection over GigE and profiling my VPN client.
 
 If I run netperf over the *unencrypted* link, it only sends 1Gb/s of
 packets — because the packets are correctly accounted to netperf's UDP
 socket until the moment they're actually transmitted on the wire, and
 the backpressure works correctly.
 
 When I run over the VPN, netperf thinks it sent 2½ times the amount of
 TX traffic.

At some level, it's expected: netperf's manual actually says:
A UDP_STREAM test has no end-to-end flow control - UDP provides none and
neither does netperf. However, if you wish, you can configure netperf
with --enable-intervals=yes to enable the global command-line -b and -w
options to pace bursts of traffic onto the network.

 Packets are being dropped by the tun device before even
 feeding them up to the VPN client to be sent — presumably because of
 this skb_orphan() call. (The client itself should do the right thing,
 and only suck packets out of the tun at the rate it can shove them out
 *its* UDP socket.)

A simple work-around is to limit the rate using a non work conservig qdisc.

 Did we ever look at the alternative solution of taking ownership only
 after a timeout, or on demand when we need to shut down the device?

I've been thinking about this on and off, but didn't find a good
safe solution yet.

For timeout, the difficulty is to find a good timer value,
low enough to avoid DOS attacks but high enough to avoid
spurious packet drops (and expensive timer interrupts).

-- 
MST



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Woodhouse
On Sun, 2015-02-01 at 12:19 -0800, David Miller wrote:
 From: David Woodhouse dw...@infradead.org
 Date: Sun, 01 Feb 2015 13:33:50 +
 
  Of course, now I'm looking closely at the path these packets take to
  leave the box, it starts to offend me that they're being passed up to
  userspace just to encrypt them (as DTLS or ESP) and then send them back
  down to the kernel on a UDP socket. The kernel already knows how to
  {en,de}crypt ESP, and do the sequence number checking on incoming
  packets.
 
 It's funny, I thought we had an IPSEC stack

Right. But I'm trying to work out how we can sanely *use* that from a
VPN client.

The client normally sets up a tun device, configuring it with
appropriate IP addresses and routes by invoking vpnc-script or passing
the information back to NetworkManager. The client itself might not even
have root privs, in the NetworkManager case.

The initial authentication and connection are done over HTTPS, and
packets *can* be passed that way if they need to be. But obviously the
client *also* tries to set up a UDP data transport too — which is DTLS
in the case of Cisco AnyConnect, and ESP in UDP for Juniper.

If it *can* get communication over UDP, it'll use it. Otherwise it just
passes packets over the TCP connection. So it needs to dynamically set
up and tear down the ESP/DTLS tunnels as and when they are working.

Ideally we want it such that that packets routed to the tun device get
transparently encrypted and sent out on the UDP socket, and packets
received from UDP and successfully decrypted will appear to have arrived
on the tun device. The user may be manually tweaking the routing, or
setting up firewall/NAT/etc. on the tun device.

I can see how to set up an ESP in UDP tunnel such that it looks like the
packets are actually departing on the *physical* interface (which in
practice I suppose they are). But that's going to be fairly complex to
set up, and extremely non-intuitive and hard to manage for the user. To
the extent that I don't think it's actually deployable.

I really was looking for some way to push down something like an XFRM
state into the tun device and just say shove them out here until I tell
you otherwise.

-- 
dwmw2



smime.p7s
Description: S/MIME cryptographic signature


Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Miller
From: David Woodhouse dw...@infradead.org
Date: Sun, 01 Feb 2015 13:33:50 +

 Of course, now I'm looking closely at the path these packets take to
 leave the box, it starts to offend me that they're being passed up to
 userspace just to encrypt them (as DTLS or ESP) and then send them back
 down to the kernel on a UDP socket. The kernel already knows how to
 {en,de}crypt ESP, and do the sequence number checking on incoming
 packets.

It's funny, I thought we had an IPSEC stack



Re: [Qemu-devel] [PATCH] tun: orphan an skb on tx

2015-02-01 Thread David Miller
From: David Woodhouse dw...@infradead.org
Date: Sun, 01 Feb 2015 21:29:43 +

 I really was looking for some way to push down something like an XFRM
 state into the tun device and just say shove them out here until I tell
 you otherwise.

People decided to use TUN and push VPN stuff back into userspace,
and there are repercussions for that decision.

I'm not saying this to be mean or whatever, but I was very
disappointed when userland IPSEC solutions using TUN started showing
up.

We might as well have not have implemented the IPSEC stack at all,
because as a result of the userland VPN stuff our IPSEC stack is
largely unused except by a very narrow group of users.



[Qemu-devel] [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Michael S. Tsirkin
The following situation was observed in the field:
tap1 sends packets, tap2 does not consume them, as a result
tap1 can not be closed. This happens because
tun/tap devices can hang on to skbs undefinitely.

As noted by Herbert, possible solutions include a timeout followed by a
copy/change of ownership of the skb, or always copying/changing
ownership if we're going into a hostile device.

This patch implements the second approach.

Note: one issue still remaining is that since skbs
keep reference to tun socket and tun socket has a
reference to tun device, we won't flush backlog,
instead simply waiting for all skbs to get transmitted.
At least this is not user-triggerable, and
this was not reported in practice, my assumption is
other devices besides tap complete an skb
within finite time after it has been queued.

A possible solution for the second issue
would not to have socket reference the device,
instead, implement dev-destructor for tun, and
wait for all skbs to complete there, but this
needs some thought, probably too risky for 2.6.34.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
Tested-by: Yan Vugenfirer yvuge...@redhat.com

---

Please review the below, and consider for 2.6.34,
and stable trees.

 drivers/net/tun.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 96c39bd..4326520 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, 
struct net_device *dev)
}
}
 
+   /* Orphan the skb - required as we might hang on to it
+* for indefinite time. */
+   skb_orphan(skb);
+
/* Enqueue packet */
skb_queue_tail(tun-socket.sk-sk_receive_queue, skb);
dev-trans_start = jiffies;
-- 
1.7.0.2.280.gc6f05