Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/30/2016 03:01 PM, Vijay Pandurangan wrote: On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear wrote: Good point, so if you had: eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <-> userspace-stub <->eth1 and user-space hub enabled this elide flag, things would work, right? Then, it seems like what we need is a way to tell the kernel router/bridge logic to follow elide signals in packets coming from veth. I'm not sure what the best way to do this is because I'm less familiar with conventions in that part of the kernel, but assuming there's a way to do this, would it be acceptable? You cannot receive on one veth without transmitting on the other, so I think the elide csum logic can go on the raw-socket, and apply to packets in the transmit-from-user-space direction. Just allowing the socket to make the veth behave like it used to before this patch in question should be good enough, since that worked for us for years. So, just an option to modify the ip_summed for pkts sent on a socket is probably sufficient. I don't think this is right. Consider: - App A sends out corrupt packets 50% of the time and discards inbound data. - App B doesn't care about corrupt packets and is happy to receive them and has some way of dealing with them (special case) - App C is a regular app, say nc or something. In your world, where A decides what happens to data it transmits, then A<--veth-->B and A<---wire-->B will have the same behaviour but A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C will behave incorrectly if it's connected over veth but correctly if connected with a wire. That is a bug. Since A cannot know what the app it's talking to will desire, I argue that both sides of a message must be opted in to this optimization. How can you make a generic app C know how to do this? The path could be, for instance: eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC There are no sockets on vethB, but it does need to have special behaviour to elide csums. Even if appC is hacked to know how to twiddle some thing on it's veth port, mucking with vethD will have no effect on vethB. With regard to your example above, why would A corrupt packets? My guess: 1) It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums, so just enabling checksumming adds no useful protection.) 2) It means to corrupt frames. In that case, someone must expect that C should receive incorrect frames, otherwise why bother making App-A corrupt them in the first place? 3) You are explicitly trying to test the kernel checksum logic, so you want the kernel to detect the bad checksum and throw away the packet. In this case, just don't set the socket option in appA to elide checksums and the packet will be thrown away. Any other cases you can think of? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Sat, Apr 30, 2016 at 1:59 PM, Ben Greear wrote: > > On 04/30/2016 12:54 PM, Tom Herbert wrote: >> >> We've put considerable effort into cleaning up the checksum interface >> to make it as unambiguous as possible, please be very careful to >> follow it. Broken checksum processing is really hard to detect and >> debug. >> >> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums >> (indicated by csum_level) have been verified to be correct in a >> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is >> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the >> checksum it would refer to has not been verified and is incorrect this >> is a major bug. > > > Suppose I know that the packet received on a packet-socket has > already been verified by a NIC that supports hardware checksumming. > > Then, I want to transmit it on a veth interface using a second > packet socket. I do not want veth to recalculate the checksum on > transmit, nor to validate it on the peer veth on receive, because I do > not want to waste the CPU cycles. I am assuming that my app is not > accidentally corrupting frames, so the checksum can never be bad. > > How should the checksumming be configured for the packets going into > the packet-socket from user-space? > Checksum unnecessary conversion to checksum complete should be done and then the computed value can be sent to user space. If you want to take advantage of the value on transmit then we would to change the interface. But I'm not sure why recalculating the the checksum in the host is even a problem. With all the advances in checksum offload, LCO, RCO it seems like that shouldn't be a common case anyway. > Also, I might want to send raw frames that do have > broken checksums (lets assume a real NIC, not veth), and I want them > to hit the wire with those bad checksums. > > How do I configure the checksumming in this case? Just send raw packets with bad checksums (no checksum offload). Tom > > > Thanks, > Ben > > > >> >> Tom >> >> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear >> wrote: >>> >>> >>> >>> On 04/30/2016 11:33 AM, Ben Hutchings wrote: On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote: > > > Hello, >>> >>> >>> >> >> >> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a > > > Actually, no, this is not really a regression. [...] It really is. Even though the old behaviour was a bug (raw packets should not be changed), if there are real applications that depend on that then we have to keep those applications working somehow. >>> >>> >>> >>> To be honest, I fail to see why the old behaviour is a bug when sending >>> raw packets from user-space. If raw packets should not be changed, then >>> we need some way to specify what the checksum setting is to begin with, >>> otherwise, user-space has not enough control. >>> >>> A socket option for new programs, and sysctl configurable defaults for >>> raw >>> sockets >>> for old binary programs would be sufficient I think. >>> >>> >>> Thanks, >>> Ben >>> >>> -- >>> Ben Greear >>> Candela Technologies Inc http://www.candelatech.com >> >> > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear wrote: >> >> Good point, so if you had: >> >> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <-> >> userspace-stub <->eth1 >> >> and user-space hub enabled this elide flag, things would work, right? >> Then, it seems like what we need is a way to tell the kernel >> router/bridge logic to follow elide signals in packets coming from >> veth. I'm not sure what the best way to do this is because I'm less >> familiar with conventions in that part of the kernel, but assuming >> there's a way to do this, would it be acceptable? > > > You cannot receive on one veth without transmitting on the other, so > I think the elide csum logic can go on the raw-socket, and apply to packets > in the transmit-from-user-space direction. Just allowing the socket to make > the veth behave like it used to before this patch in question should be good > enough, since that worked for us for years. So, just an option to modify > the > ip_summed for pkts sent on a socket is probably sufficient. I don't think this is right. Consider: - App A sends out corrupt packets 50% of the time and discards inbound data. - App B doesn't care about corrupt packets and is happy to receive them and has some way of dealing with them (special case) - App C is a regular app, say nc or something. In your world, where A decides what happens to data it transmits, then A<--veth-->B and A<---wire-->B will have the same behaviour but A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C will behave incorrectly if it's connected over veth but correctly if connected with a wire. That is a bug. Since A cannot know what the app it's talking to will desire, I argue that both sides of a message must be opted in to this optimization. > >>> There may be no sockets on the vethB port. And reader/writer is not >>> a good way to look at it since I am implementing a bi-directional bridge >>> in >>> user-space and each packet-socket is for both rx and tx. >> >> >> Sure, but we could model a bidrectional connection as two >> unidirectional sockets for our discussions here, right? > > > Best not to I think, you want to make sure that one socket can > correctly handle tx and rx. As long as that works, then using > uni-directional sockets should work too. > > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/30/2016 02:36 PM, Vijay Pandurangan wrote: On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear wrote: On 04/30/2016 02:13 PM, Vijay Pandurangan wrote: On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear wrote: On 04/30/2016 12:54 PM, Tom Herbert wrote: We've put considerable effort into cleaning up the checksum interface to make it as unambiguous as possible, please be very careful to follow it. Broken checksum processing is really hard to detect and debug. CHECKSUM_UNNECESSARY means that some number of _specific_ checksums (indicated by csum_level) have been verified to be correct in a packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is never right. If CHECKSUM_UNNECESSARY is set in such a manner but the checksum it would refer to has not been verified and is incorrect this is a major bug. Suppose I know that the packet received on a packet-socket has already been verified by a NIC that supports hardware checksumming. Then, I want to transmit it on a veth interface using a second packet socket. I do not want veth to recalculate the checksum on transmit, nor to validate it on the peer veth on receive, because I do not want to waste the CPU cycles. I am assuming that my app is not accidentally corrupting frames, so the checksum can never be bad. How should the checksumming be configured for the packets going into the packet-socket from user-space? It seems like that only the receiver should decide whether or not to checksum packets on the veth, not the sender. How about: We could add a receiving socket option for "don't checksum packets received from a veth when the other side has marked them as elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending socket option for "mark all data sent via this socket to a veth as elide-checksum-suggested". So the process would be: Writer: 1. open read socket 2. open write socket, with option elide-checksum-for-veth-suggested 3. write data Reader: 1. open read socket with "follow-elide-checksum-suggestions-on-veth" 2. read data The kernel / module would then need to persist the flag on all packets that traverse a veth, and drop these data when they leave the veth module. I'm not sure this works completely. In my app, the packet flow might be: eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB <-> [kernel router/bridge logic ...] <-> eth1 Good point, so if you had: eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <-> userspace-stub <->eth1 and user-space hub enabled this elide flag, things would work, right? Then, it seems like what we need is a way to tell the kernel router/bridge logic to follow elide signals in packets coming from veth. I'm not sure what the best way to do this is because I'm less familiar with conventions in that part of the kernel, but assuming there's a way to do this, would it be acceptable? You cannot receive on one veth without transmitting on the other, so I think the elide csum logic can go on the raw-socket, and apply to packets in the transmit-from-user-space direction. Just allowing the socket to make the veth behave like it used to before this patch in question should be good enough, since that worked for us for years. So, just an option to modify the ip_summed for pkts sent on a socket is probably sufficient. There may be no sockets on the vethB port. And reader/writer is not a good way to look at it since I am implementing a bi-directional bridge in user-space and each packet-socket is for both rx and tx. Sure, but we could model a bidrectional connection as two unidirectional sockets for our discussions here, right? Best not to I think, you want to make sure that one socket can correctly handle tx and rx. As long as that works, then using uni-directional sockets should work too. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear wrote: > > > On 04/30/2016 02:13 PM, Vijay Pandurangan wrote: >> >> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear >> wrote: >>> >>> >>> >>> On 04/30/2016 12:54 PM, Tom Herbert wrote: We've put considerable effort into cleaning up the checksum interface to make it as unambiguous as possible, please be very careful to follow it. Broken checksum processing is really hard to detect and debug. CHECKSUM_UNNECESSARY means that some number of _specific_ checksums (indicated by csum_level) have been verified to be correct in a packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is never right. If CHECKSUM_UNNECESSARY is set in such a manner but the checksum it would refer to has not been verified and is incorrect this is a major bug. >>> >>> >>> >>> Suppose I know that the packet received on a packet-socket has >>> already been verified by a NIC that supports hardware checksumming. >>> >>> Then, I want to transmit it on a veth interface using a second >>> packet socket. I do not want veth to recalculate the checksum on >>> transmit, nor to validate it on the peer veth on receive, because I do >>> not want to waste the CPU cycles. I am assuming that my app is not >>> accidentally corrupting frames, so the checksum can never be bad. >>> >>> How should the checksumming be configured for the packets going into >>> the packet-socket from user-space? >> >> >> >> It seems like that only the receiver should decide whether or not to >> checksum packets on the veth, not the sender. >> >> How about: >> >> We could add a receiving socket option for "don't checksum packets >> received from a veth when the other side has marked them as >> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending >> socket option for "mark all data sent via this socket to a veth as >> elide-checksum-suggested". >> >> So the process would be: >> >> Writer: >> 1. open read socket >> 2. open write socket, with option elide-checksum-for-veth-suggested >> 3. write data >> >> Reader: >> 1. open read socket with "follow-elide-checksum-suggestions-on-veth" >> 2. read data >> >> The kernel / module would then need to persist the flag on all packets >> that traverse a veth, and drop these data when they leave the veth >> module. > > > I'm not sure this works completely. In my app, the packet flow might be: > > eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB > <-> [kernel router/bridge logic ...] <-> eth1 Good point, so if you had: eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <-> userspace-stub <->eth1 and user-space hub enabled this elide flag, things would work, right? Then, it seems like what we need is a way to tell the kernel router/bridge logic to follow elide signals in packets coming from veth. I'm not sure what the best way to do this is because I'm less familiar with conventions in that part of the kernel, but assuming there's a way to do this, would it be acceptable? > > There may be no sockets on the vethB port. And reader/writer is not > a good way to look at it since I am implementing a bi-directional bridge in > user-space and each packet-socket is for both rx and tx. Sure, but we could model a bidrectional connection as two unidirectional sockets for our discussions here, right? > >>> Also, I might want to send raw frames that do have >>> broken checksums (lets assume a real NIC, not veth), and I want them >>> to hit the wire with those bad checksums. >>> >>> >>> How do I configure the checksumming in this case? >> >> >> >> Correct me if I'm wrong but I think this is already possible now. You >> can have packets with incorrect checksum hitting the wire as is. What >> you cannot do is instruct the receiving end to ignore the checksum >> from the sending end when using a physical device (and something I >> think we should mimic on the sending device). > > > Yes, it does work currently (or, last I checked)...I just want to make sure > it keeps working. Yeah, good point. It would be great if we could write a test, or at the very least, a list of invariants about what kinds of things should work somewhere. > > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/30/2016 02:13 PM, Vijay Pandurangan wrote: On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear wrote: On 04/30/2016 12:54 PM, Tom Herbert wrote: We've put considerable effort into cleaning up the checksum interface to make it as unambiguous as possible, please be very careful to follow it. Broken checksum processing is really hard to detect and debug. CHECKSUM_UNNECESSARY means that some number of _specific_ checksums (indicated by csum_level) have been verified to be correct in a packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is never right. If CHECKSUM_UNNECESSARY is set in such a manner but the checksum it would refer to has not been verified and is incorrect this is a major bug. Suppose I know that the packet received on a packet-socket has already been verified by a NIC that supports hardware checksumming. Then, I want to transmit it on a veth interface using a second packet socket. I do not want veth to recalculate the checksum on transmit, nor to validate it on the peer veth on receive, because I do not want to waste the CPU cycles. I am assuming that my app is not accidentally corrupting frames, so the checksum can never be bad. How should the checksumming be configured for the packets going into the packet-socket from user-space? It seems like that only the receiver should decide whether or not to checksum packets on the veth, not the sender. How about: We could add a receiving socket option for "don't checksum packets received from a veth when the other side has marked them as elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending socket option for "mark all data sent via this socket to a veth as elide-checksum-suggested". So the process would be: Writer: 1. open read socket 2. open write socket, with option elide-checksum-for-veth-suggested 3. write data Reader: 1. open read socket with "follow-elide-checksum-suggestions-on-veth" 2. read data The kernel / module would then need to persist the flag on all packets that traverse a veth, and drop these data when they leave the veth module. I'm not sure this works completely. In my app, the packet flow might be: eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB <-> [kernel router/bridge logic ...] <-> eth1 There may be no sockets on the vethB port. And reader/writer is not a good way to look at it since I am implementing a bi-directional bridge in user-space and each packet-socket is for both rx and tx. Also, I might want to send raw frames that do have broken checksums (lets assume a real NIC, not veth), and I want them to hit the wire with those bad checksums. How do I configure the checksumming in this case? Correct me if I'm wrong but I think this is already possible now. You can have packets with incorrect checksum hitting the wire as is. What you cannot do is instruct the receiving end to ignore the checksum from the sending end when using a physical device (and something I think we should mimic on the sending device). Yes, it does work currently (or, last I checked)...I just want to make sure it keeps working. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear wrote: > > > On 04/30/2016 12:54 PM, Tom Herbert wrote: >> >> We've put considerable effort into cleaning up the checksum interface >> to make it as unambiguous as possible, please be very careful to >> follow it. Broken checksum processing is really hard to detect and >> debug. >> >> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums >> (indicated by csum_level) have been verified to be correct in a >> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is >> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the >> checksum it would refer to has not been verified and is incorrect this >> is a major bug. > > > Suppose I know that the packet received on a packet-socket has > already been verified by a NIC that supports hardware checksumming. > > Then, I want to transmit it on a veth interface using a second > packet socket. I do not want veth to recalculate the checksum on > transmit, nor to validate it on the peer veth on receive, because I do > not want to waste the CPU cycles. I am assuming that my app is not > accidentally corrupting frames, so the checksum can never be bad. > > How should the checksumming be configured for the packets going into > the packet-socket from user-space? It seems like that only the receiver should decide whether or not to checksum packets on the veth, not the sender. How about: We could add a receiving socket option for "don't checksum packets received from a veth when the other side has marked them as elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending socket option for "mark all data sent via this socket to a veth as elide-checksum-suggested". So the process would be: Writer: 1. open read socket 2. open write socket, with option elide-checksum-for-veth-suggested 3. write data Reader: 1. open read socket with "follow-elide-checksum-suggestions-on-veth" 2. read data The kernel / module would then need to persist the flag on all packets that traverse a veth, and drop these data when they leave the veth module. > > > Also, I might want to send raw frames that do have > broken checksums (lets assume a real NIC, not veth), and I want them > to hit the wire with those bad checksums. > > > How do I configure the checksumming in this case? Correct me if I'm wrong but I think this is already possible now. You can have packets with incorrect checksum hitting the wire as is. What you cannot do is instruct the receiving end to ignore the checksum from the sending end when using a physical device (and something I think we should mimic on the sending device). > > > > Thanks, > Ben > > > >> >> Tom >> >> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear wrote: >>> >>> >>> >>> On 04/30/2016 11:33 AM, Ben Hutchings wrote: On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote: > > > Hello, >>> >>> >>> >> >> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a > > > Actually, no, this is not really a regression. [...] It really is. Even though the old behaviour was a bug (raw packets should not be changed), if there are real applications that depend on that then we have to keep those applications working somehow. >>> >>> >>> >>> To be honest, I fail to see why the old behaviour is a bug when sending >>> raw packets from user-space. If raw packets should not be changed, then >>> we need some way to specify what the checksum setting is to begin with, >>> otherwise, user-space has not enough control. >>> >>> A socket option for new programs, and sysctl configurable defaults for raw >>> sockets >>> for old binary programs would be sufficient I think. >>> >>> >>> Thanks, >>> Ben >>> >>> -- >>> Ben Greear >>> Candela Technologies Inc http://www.candelatech.com >> >> > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/30/2016 12:54 PM, Tom Herbert wrote: We've put considerable effort into cleaning up the checksum interface to make it as unambiguous as possible, please be very careful to follow it. Broken checksum processing is really hard to detect and debug. CHECKSUM_UNNECESSARY means that some number of _specific_ checksums (indicated by csum_level) have been verified to be correct in a packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is never right. If CHECKSUM_UNNECESSARY is set in such a manner but the checksum it would refer to has not been verified and is incorrect this is a major bug. Suppose I know that the packet received on a packet-socket has already been verified by a NIC that supports hardware checksumming. Then, I want to transmit it on a veth interface using a second packet socket. I do not want veth to recalculate the checksum on transmit, nor to validate it on the peer veth on receive, because I do not want to waste the CPU cycles. I am assuming that my app is not accidentally corrupting frames, so the checksum can never be bad. How should the checksumming be configured for the packets going into the packet-socket from user-space? Also, I might want to send raw frames that do have broken checksums (lets assume a real NIC, not veth), and I want them to hit the wire with those bad checksums. How do I configure the checksumming in this case? Thanks, Ben Tom On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear wrote: On 04/30/2016 11:33 AM, Ben Hutchings wrote: On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote: Hello, http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a Actually, no, this is not really a regression. [...] It really is. Even though the old behaviour was a bug (raw packets should not be changed), if there are real applications that depend on that then we have to keep those applications working somehow. To be honest, I fail to see why the old behaviour is a bug when sending raw packets from user-space. If raw packets should not be changed, then we need some way to specify what the checksum setting is to begin with, otherwise, user-space has not enough control. A socket option for new programs, and sysctl configurable defaults for raw sockets for old binary programs would be sufficient I think. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
[oops – resending this because I was using gmail in HTML mode before by accident] There was a discussion on a separate thread about this. I agree with Sabrina fully. I believe veth should provide an abstraction layer that correctly emulates a physical network in all ways. Consider an environment where we have multiple physical computers. Each computer runs one or more containers, each of which has a publicly routable ip address. When adding a new app to the cluster, a scheduler might decide to run this container on any physical machine of its choice, assuming that apps have a way of routing traffic to their backends (we did something similar Google >10 years ago). This is something we might imagine happening with docker and ipv6 for instance. If you have an app, A, which sends raw ethernet frames (the simplest case I could imagine) with TCP data that has invalid checksums to app B, which is receiving it, the behaviour of the system _will be different_ depending upon whether app B is scheduled to run on the same machine as app A or not. This seems like a clear bug and a broken abstraction (especially as the default case), and something we should endeavour to avoid. I do see Ben's point about enabling sw checksum verification as potentially incurring a huge performance penalty (I haven't had a chance to measure it) that is completely wasteful in the vast majority of cases. Unfortunately I just don't see how we can solve this problem in a way that preserves a correct abstraction layer while also avoiding excess work. I guess the first piece of data that would be helpful is to determine just how big of a performance penalty this is. If it's small, then maybe it doesn't matter. On Thu, Apr 28, 2016 at 6:29 AM, Sabrina Dubroca wrote: > Hello, > > 2016-04-27, 17:14:44 -0700, Ben Greear wrote: >> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote: >> > Hi Ben, >> > >> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote: >> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: >> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote: >> > > > > >> > > > > 3.2.80-rc1 review patch. If anyone has any objections, please let >> > > > > me know. >> > > > I would be careful about this. It causes regressions when sending >> > > > PACKET_SOCKET buffers from user-space to veth devices. >> > > > >> > > > There was a proposed upstream fix for the regression, but it has not >> > > > gone >> > > > into the tree as far as I know. >> > > > >> > > > http://www.spinics.net/lists/netdev/msg370436.html >> > > [...] >> > > >> > > OK, I'll drop this for now. >> > >> > The fall out from not having this patch is in my opinion a bigger >> > fallout than not having this patch. This patch fixes silent data >> > corruption vs. the problem Ben Greear is talking about, which might not >> > be that a common usage. >> > >> > What do others think? >> > >> > Bye, >> > Hannes >> > >> >> This patch from Cong Wang seems to fix the regression for me, I think it >> should be added and >> tested in the main tree, and then apply them to stable as a pair. >> >> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a > > Actually, no, this is not really a regression. > > If you capture packets on a device with checksum offloading enabled, > the TCP/UDP checksum isn't filled. veth also behaves that way. What > the "veth: don't modify ip_summed" patch does is enable proper > checksum validation on veth. This really was a bug in veth. > > Cong's patch would also break cases where we choose to inject packets > with invalid checksums, and they would now be accepted as correct. > > Your use case is invalid, it just happened to work because of a > bug. If you want the stack to fill checksums so that you want capture > and reinject packets, you have to disable checksum offloading (or > compute the checksum yourself in userspace). > > Thanks. > > -- > Sabrina
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
We've put considerable effort into cleaning up the checksum interface to make it as unambiguous as possible, please be very careful to follow it. Broken checksum processing is really hard to detect and debug. CHECKSUM_UNNECESSARY means that some number of _specific_ checksums (indicated by csum_level) have been verified to be correct in a packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is never right. If CHECKSUM_UNNECESSARY is set in such a manner but the checksum it would refer to has not been verified and is incorrect this is a major bug. Tom On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear wrote: > > > On 04/30/2016 11:33 AM, Ben Hutchings wrote: >> >> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote: >>> >>> Hello, > > http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a >>> >>> Actually, no, this is not really a regression. >> >> [...] >> >> It really is. Even though the old behaviour was a bug (raw packets >> should not be changed), if there are real applications that depend on >> that then we have to keep those applications working somehow. > > > To be honest, I fail to see why the old behaviour is a bug when sending > raw packets from user-space. If raw packets should not be changed, then > we need some way to specify what the checksum setting is to begin with, > otherwise, user-space has not enough control. > > A socket option for new programs, and sysctl configurable defaults for raw > sockets > for old binary programs would be sufficient I think. > > > Thanks, > Ben > > -- > Ben Greear > Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/30/2016 11:33 AM, Ben Hutchings wrote: On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote: Hello, http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a Actually, no, this is not really a regression. [...] It really is. Even though the old behaviour was a bug (raw packets should not be changed), if there are real applications that depend on that then we have to keep those applications working somehow. To be honest, I fail to see why the old behaviour is a bug when sending raw packets from user-space. If raw packets should not be changed, then we need some way to specify what the checksum setting is to begin with, otherwise, user-space has not enough control. A socket option for new programs, and sysctl configurable defaults for raw sockets for old binary programs would be sufficient I think. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Thu, 2016-04-28 at 06:45 -0700, Ben Greear wrote: > > On 04/28/2016 03:29 AM, Sabrina Dubroca wrote: [...] > > Your use case is invalid, it just happened to work because of a > > bug. If you want the stack to fill checksums so that you want capture > > and reinject packets, you have to disable checksum offloading (or > > compute the checksum yourself in userspace). > Disabling checksum offloading or computing in user-space (and then > recomputing in veth to verify the checksum?) is a huge performance loss. > > Maybe we could add a socket option to enable Cong's patch on a per-socket > basis? That way my use-case can still work and you can have this new > behaviour by default? It does sound like a useful option to have. If there are other applications that depend on veth's checksum-fixing behaviour and are being distributed in binary form, then a per-device option might be necessary so users can keep those applications working before they're updated. Ben. -- Ben Hutchings Tomorrow will be cancelled due to lack of interest. signature.asc Description: This is a digitally signed message part
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote: > Hello, > > 2016-04-27, 17:14:44 -0700, Ben Greear wrote: > > > > On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote: > > > > > > Hi Ben, > > > > > > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote: > > > > > > > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: > > > > > > > > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote: > > > > > > > > > > > > > > > > > > 3.2.80-rc1 review patch. If anyone has any objections, please let > > > > > > me know. > > > > > I would be careful about this. It causes regressions when sending > > > > > PACKET_SOCKET buffers from user-space to veth devices. > > > > > > > > > > There was a proposed upstream fix for the regression, but it has not > > > > > gone > > > > > into the tree as far as I know. > > > > > > > > > > http://www.spinics.net/lists/netdev/msg370436.html > > > > [...] > > > > > > > > OK, I'll drop this for now. > > > The fall out from not having this patch is in my opinion a bigger > > > fallout than not having this patch. This patch fixes silent data > > > corruption vs. the problem Ben Greear is talking about, which might not > > > be that a common usage. > > > > > > What do others think? > > > > > > Bye, > > > Hannes > > > > > This patch from Cong Wang seems to fix the regression for me, I think it > > should be added and > > tested in the main tree, and then apply them to stable as a pair. > > > > http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a > Actually, no, this is not really a regression. [...] It really is. Even though the old behaviour was a bug (raw packets should not be changed), if there are real applications that depend on that then we have to keep those applications working somehow. Ben. -- Ben Hutchings Tomorrow will be cancelled due to lack of interest. signature.asc Description: This is a digitally signed message part
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/28/2016 03:29 AM, Sabrina Dubroca wrote: Hello, 2016-04-27, 17:14:44 -0700, Ben Greear wrote: On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote: Hi Ben, On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote: On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: On 04/26/2016 04:02 PM, Ben Hutchings wrote: 3.2.80-rc1 review patch. If anyone has any objections, please let me know. I would be careful about this. It causes regressions when sending PACKET_SOCKET buffers from user-space to veth devices. There was a proposed upstream fix for the regression, but it has not gone into the tree as far as I know. http://www.spinics.net/lists/netdev/msg370436.html [...] OK, I'll drop this for now. The fall out from not having this patch is in my opinion a bigger fallout than not having this patch. This patch fixes silent data corruption vs. the problem Ben Greear is talking about, which might not be that a common usage. What do others think? Bye, Hannes This patch from Cong Wang seems to fix the regression for me, I think it should be added and tested in the main tree, and then apply them to stable as a pair. http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a Actually, no, this is not really a regression. If you capture packets on a device with checksum offloading enabled, the TCP/UDP checksum isn't filled. veth also behaves that way. What the "veth: don't modify ip_summed" patch does is enable proper checksum validation on veth. This really was a bug in veth. Cong's patch would also break cases where we choose to inject packets with invalid checksums, and they would now be accepted as correct. Your use case is invalid, it just happened to work because of a bug. If you want the stack to fill checksums so that you want capture and reinject packets, you have to disable checksum offloading (or compute the checksum yourself in userspace). Disabling checksum offloading or computing in user-space (and then recomputing in veth to verify the checksum?) is a huge performance loss. Maybe we could add a socket option to enable Cong's patch on a per-socket basis? That way my use-case can still work and you can have this new behaviour by default? Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
Hello, 2016-04-27, 17:14:44 -0700, Ben Greear wrote: > On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote: > > Hi Ben, > > > > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote: > > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: > > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote: > > > > > > > > > > 3.2.80-rc1 review patch. If anyone has any objections, please let me > > > > > know. > > > > I would be careful about this. It causes regressions when sending > > > > PACKET_SOCKET buffers from user-space to veth devices. > > > > > > > > There was a proposed upstream fix for the regression, but it has not > > > > gone > > > > into the tree as far as I know. > > > > > > > > http://www.spinics.net/lists/netdev/msg370436.html > > > [...] > > > > > > OK, I'll drop this for now. > > > > The fall out from not having this patch is in my opinion a bigger > > fallout than not having this patch. This patch fixes silent data > > corruption vs. the problem Ben Greear is talking about, which might not > > be that a common usage. > > > > What do others think? > > > > Bye, > > Hannes > > > > This patch from Cong Wang seems to fix the regression for me, I think it > should be added and > tested in the main tree, and then apply them to stable as a pair. > > http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a Actually, no, this is not really a regression. If you capture packets on a device with checksum offloading enabled, the TCP/UDP checksum isn't filled. veth also behaves that way. What the "veth: don't modify ip_summed" patch does is enable proper checksum validation on veth. This really was a bug in veth. Cong's patch would also break cases where we choose to inject packets with invalid checksums, and they would now be accepted as correct. Your use case is invalid, it just happened to work because of a bug. If you want the stack to fill checksums so that you want capture and reinject packets, you have to disable checksum offloading (or compute the checksum yourself in userspace). Thanks. -- Sabrina
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote: Hi Ben, On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote: On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: On 04/26/2016 04:02 PM, Ben Hutchings wrote: 3.2.80-rc1 review patch. If anyone has any objections, please let me know. I would be careful about this. It causes regressions when sending PACKET_SOCKET buffers from user-space to veth devices. There was a proposed upstream fix for the regression, but it has not gone into the tree as far as I know. http://www.spinics.net/lists/netdev/msg370436.html [...] OK, I'll drop this for now. The fall out from not having this patch is in my opinion a bigger fallout than not having this patch. This patch fixes silent data corruption vs. the problem Ben Greear is talking about, which might not be that a common usage. What do others think? Bye, Hannes This patch from Cong Wang seems to fix the regression for me, I think it should be added and tested in the main tree, and then apply them to stable as a pair. http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index da1ae0e..f8cc758 100644 (file) --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1926,6 +1926,7 @@ retry: goto out_unlock; } + skb->ip_summed = CHECKSUM_UNNECESSARY; skb->protocol = proto; skb->dev = dev; skb->priority = sk->sk_priority; @@ -2352,6 +2353,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, ph.raw = frame; + skb->ip_summed = CHECKSUM_UNNECESSARY; skb->protocol = proto; skb->dev = dev; skb->priority = po->sk.sk_priority; @@ -2776,6 +2778,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) goto out_free; } + skb->ip_summed = CHECKSUM_UNNECESSARY; skb->protocol = proto; skb->dev = dev; skb->priority = sk->sk_priority; Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
Hi Ben, On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote: > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: > > On 04/26/2016 04:02 PM, Ben Hutchings wrote: > > > > > > 3.2.80-rc1 review patch. If anyone has any objections, please let me > > > know. > > I would be careful about this. It causes regressions when sending > > PACKET_SOCKET buffers from user-space to veth devices. > > > > There was a proposed upstream fix for the regression, but it has not gone > > into the tree as far as I know. > > > > http://www.spinics.net/lists/netdev/msg370436.html > [...] > > OK, I'll drop this for now. The fall out from not having this patch is in my opinion a bigger fallout than not having this patch. This patch fixes silent data corruption vs. the problem Ben Greear is talking about, which might not be that a common usage. What do others think? Bye, Hannes
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote: > On 04/26/2016 04:02 PM, Ben Hutchings wrote: > > > > 3.2.80-rc1 review patch. If anyone has any objections, please let me know. > I would be careful about this. It causes regressions when sending > PACKET_SOCKET buffers from user-space to veth devices. > > There was a proposed upstream fix for the regression, but it has not gone > into the tree as far as I know. > > http://www.spinics.net/lists/netdev/msg370436.html [...] OK, I'll drop this for now. Ben. -- Ben Hutchings Once a job is fouled up, anything done to improve it makes it worse. signature.asc Description: This is a digitally signed message part
Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
On 04/26/2016 04:02 PM, Ben Hutchings wrote: 3.2.80-rc1 review patch. If anyone has any objections, please let me know. I would be careful about this. It causes regressions when sending PACKET_SOCKET buffers from user-space to veth devices. There was a proposed upstream fix for the regression, but it has not gone into the tree as far as I know. http://www.spinics.net/lists/netdev/msg370436.html Thanks, Ben -- From: Vijay Pandurangan [ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ] Packets that arrive from real hardware devices have ip_summed == CHECKSUM_UNNECESSARY if the hardware verified the checksums, or CHECKSUM_NONE if the packet is bad or it was unable to verify it. The current version of veth will replace CHECKSUM_NONE with CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to a veth device to be delivered to the application. This caused applications at Twitter to receive corrupt data when network hardware was corrupting packets. We believe this was added as an optimization to skip computing and verifying checksums for communication between containers. However, locally generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as written does nothing for them. As far as we can tell, after removing this code, these packets are transmitted from one stack to another unmodified (tcpdump shows invalid checksums on both sides, as expected), and they are delivered correctly to applications. We didn’t test every possible network configuration, but we tried a few common ones such as bridging containers, using NAT between the host and a container, and routing from hardware devices to containers. We have effectively deployed this in production at Twitter (by disabling RX checksum offloading on veth devices). This code dates back to the first version of the driver, commit ("[NET]: Virtual ethernet device driver"), so I suspect this bug occurred mostly because the driver API has evolved significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix packet checksumming") (in December 2010) fixed this for packets that get created locally and sent to hardware devices, by not changing CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming in from hardware devices. Co-authored-by: Evan Jones Signed-off-by: Evan Jones Cc: Nicolas Dichtel Cc: Phil Sutter Cc: Toshiaki Makita Cc: netdev@vger.kernel.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Vijay Pandurangan Acked-by: Cong Wang Signed-off-by: David S. Miller [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings --- drivers/net/veth.c | 6 -- 1 file changed, 6 deletions(-) --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b stats = this_cpu_ptr(priv->stats); rcv_stats = this_cpu_ptr(rcv_priv->stats); - /* don't change ip_summed == CHECKSUM_PARTIAL, as that - will cause bad checksum on forwarded packets */ - if (skb->ip_summed == CHECKSUM_NONE && - rcv->features & NETIF_F_RXCSUM) - skb->ip_summed = CHECKSUM_UNNECESSARY; length = skb->len; if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS) -- Ben Greear Candela Technologies Inc http://www.candelatech.com
[PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
3.2.80-rc1 review patch. If anyone has any objections, please let me know. -- From: Vijay Pandurangan [ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ] Packets that arrive from real hardware devices have ip_summed == CHECKSUM_UNNECESSARY if the hardware verified the checksums, or CHECKSUM_NONE if the packet is bad or it was unable to verify it. The current version of veth will replace CHECKSUM_NONE with CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to a veth device to be delivered to the application. This caused applications at Twitter to receive corrupt data when network hardware was corrupting packets. We believe this was added as an optimization to skip computing and verifying checksums for communication between containers. However, locally generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as written does nothing for them. As far as we can tell, after removing this code, these packets are transmitted from one stack to another unmodified (tcpdump shows invalid checksums on both sides, as expected), and they are delivered correctly to applications. We didn’t test every possible network configuration, but we tried a few common ones such as bridging containers, using NAT between the host and a container, and routing from hardware devices to containers. We have effectively deployed this in production at Twitter (by disabling RX checksum offloading on veth devices). This code dates back to the first version of the driver, commit ("[NET]: Virtual ethernet device driver"), so I suspect this bug occurred mostly because the driver API has evolved significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix packet checksumming") (in December 2010) fixed this for packets that get created locally and sent to hardware devices, by not changing CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming in from hardware devices. Co-authored-by: Evan Jones Signed-off-by: Evan Jones Cc: Nicolas Dichtel Cc: Phil Sutter Cc: Toshiaki Makita Cc: netdev@vger.kernel.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Vijay Pandurangan Acked-by: Cong Wang Signed-off-by: David S. Miller [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings --- drivers/net/veth.c | 6 -- 1 file changed, 6 deletions(-) --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b stats = this_cpu_ptr(priv->stats); rcv_stats = this_cpu_ptr(rcv_priv->stats); - /* don't change ip_summed == CHECKSUM_PARTIAL, as that - will cause bad checksum on forwarded packets */ - if (skb->ip_summed == CHECKSUM_NONE && - rcv->features & NETIF_F_RXCSUM) - skb->ip_summed = CHECKSUM_UNNECESSARY; length = skb->len; if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)