Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Greear



On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:

On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear  wrote:


Good point, so if you had:

eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
userspace-stub <->eth1

and user-space hub enabled this elide flag, things would work, right?
Then, it seems like what we need is a way to tell the kernel
router/bridge logic to follow elide signals in packets coming from
veth. I'm not sure what the best way to do this is because I'm less
familiar with conventions in that part of the kernel, but assuming
there's a way to do this, would it be acceptable?



You cannot receive on one veth without transmitting on the other, so
I think the elide csum logic can go on the raw-socket, and apply to packets
in the transmit-from-user-space direction.  Just allowing the socket to make
the veth behave like it used to before this patch in question should be good
enough, since that worked for us for years.  So, just an option to modify
the
ip_summed for pkts sent on a socket is probably sufficient.


I don't think this is right. Consider:

- App A  sends out corrupt packets 50% of the time and discards inbound data.
- App B doesn't care about corrupt packets and is happy to receive
them and has some way of dealing with them (special case)
- App C is a regular app, say nc or something.

In your world, where A decides what happens to data it transmits,
then
A<--veth-->B and A<---wire-->B will have the same behaviour

but

A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C
will behave incorrectly if it's connected over veth but correctly if
connected with a wire. That is a bug.

Since A cannot know what the app it's talking to will desire, I argue
that both sides of a message must be opted in to this optimization.


How can you make a generic app C know how to do this?  The path could be,
for instance:

eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> 
vethD <-> appC

There are no sockets on vethB, but it does need to have special behaviour to 
elide
csums.  Even if appC is hacked to know how to twiddle some thing on it's veth 
port,
mucking with vethD will have no effect on vethB.

With regard to your example above, why would A corrupt packets?  My guess:

1)  It has bugs (so, fix the bugs, it could equally create incorrect data with 
proper checksums,
so just enabling checksumming adds no useful protection.)

2)  It means to corrupt frames.  In that case, someone must expect that C 
should receive incorrect
frames, otherwise why bother making App-A corrupt them in the first place?

3)  You are explicitly trying to test the kernel checksum logic, so you want 
the kernel to
detect the bad checksum and throw away the packet.  In this case, just 
don't set the socket
option in appA to elide checksums and the packet will be thrown away.

Any other cases you can think of?

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Tom Herbert
On Sat, Apr 30, 2016 at 1:59 PM, Ben Greear  wrote:
>
> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>
>> We've put considerable effort into cleaning up the checksum interface
>> to make it as unambiguous as possible, please be very careful to
>> follow it. Broken checksum processing is really hard to detect and
>> debug.
>>
>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>> (indicated by csum_level) have been verified to be correct in a
>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>> checksum it would refer to has not been verified and is incorrect this
>> is a major bug.
>
>
> Suppose I know that the packet received on a packet-socket has
> already been verified by a NIC that supports hardware checksumming.
>
> Then, I want to transmit it on a veth interface using a second
> packet socket.  I do not want veth to recalculate the checksum on
> transmit, nor to validate it on the peer veth on receive, because I do
> not want to waste the CPU cycles.  I am assuming that my app is not
> accidentally corrupting frames, so the checksum can never be bad.
>
> How should the checksumming be configured for the packets going into
> the packet-socket from user-space?
>
Checksum unnecessary conversion to checksum complete should be done
and then the computed value can be sent to user space. If you want to
take advantage of the value on transmit then we would to change the
interface. But I'm not sure why recalculating the the checksum in the
host is even a problem. With all the advances in checksum offload,
LCO, RCO it seems like that shouldn't be a common case anyway.

> Also, I might want to send raw frames that do have
> broken checksums (lets assume a real NIC, not veth), and I want them
> to hit the wire with those bad checksums.
>
> How do I configure the checksumming in this case?

Just send raw packets with bad checksums (no checksum offload).

Tom

>
>
> Thanks,
> Ben
>
>
>
>>
>> Tom
>>
>> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear 
>> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:


 On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>
>
> Hello,
>>>
>>>
>>>
>>
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
>
> Actually, no, this is not really a regression.


 [...]

 It really is.  Even though the old behaviour was a bug (raw packets
 should not be changed), if there are real applications that depend on
 that then we have to keep those applications working somehow.
>>>
>>>
>>>
>>> To be honest, I fail to see why the old behaviour is a bug when sending
>>> raw packets from user-space.  If raw packets should not be changed, then
>>> we need some way to specify what the checksum setting is to begin with,
>>> otherwise, user-space has not enough control.
>>>
>>> A socket option for new programs, and sysctl configurable defaults for
>>> raw
>>> sockets
>>> for old binary programs would be sufficient I think.
>>>
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear 
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Vijay Pandurangan
On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear  wrote:
>>
>> Good point, so if you had:
>>
>> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
>> userspace-stub <->eth1
>>
>> and user-space hub enabled this elide flag, things would work, right?
>> Then, it seems like what we need is a way to tell the kernel
>> router/bridge logic to follow elide signals in packets coming from
>> veth. I'm not sure what the best way to do this is because I'm less
>> familiar with conventions in that part of the kernel, but assuming
>> there's a way to do this, would it be acceptable?
>
>
> You cannot receive on one veth without transmitting on the other, so
> I think the elide csum logic can go on the raw-socket, and apply to packets
> in the transmit-from-user-space direction.  Just allowing the socket to make
> the veth behave like it used to before this patch in question should be good
> enough, since that worked for us for years.  So, just an option to modify
> the
> ip_summed for pkts sent on a socket is probably sufficient.

I don't think this is right. Consider:

- App A  sends out corrupt packets 50% of the time and discards inbound data.
- App B doesn't care about corrupt packets and is happy to receive
them and has some way of dealing with them (special case)
- App C is a regular app, say nc or something.

In your world, where A decides what happens to data it transmits,
then
A<--veth-->B and A<---wire-->B will have the same behaviour

but

A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C
will behave incorrectly if it's connected over veth but correctly if
connected with a wire. That is a bug.

Since A cannot know what the app it's talking to will desire, I argue
that both sides of a message must be opted in to this optimization.






>
>>> There may be no sockets on the vethB port.  And reader/writer is not
>>> a good way to look at it since I am implementing a bi-directional bridge
>>> in
>>> user-space and each packet-socket is for both rx and tx.
>>
>>
>> Sure, but we could model a bidrectional connection as two
>> unidirectional sockets for our discussions here, right?
>
>
> Best not to I think, you want to make sure that one socket can
> correctly handle tx and rx.  As long as that works, then using
> uni-directional sockets should work too.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Greear



On 04/30/2016 02:36 PM, Vijay Pandurangan wrote:

On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear  wrote:



On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:


On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear 
wrote:




On 04/30/2016 12:54 PM, Tom Herbert wrote:



We've put considerable effort into cleaning up the checksum interface
to make it as unambiguous as possible, please be very careful to
follow it. Broken checksum processing is really hard to detect and
debug.

CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
(indicated by csum_level) have been verified to be correct in a
packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
checksum it would refer to has not been verified and is incorrect this
is a major bug.




Suppose I know that the packet received on a packet-socket has
already been verified by a NIC that supports hardware checksumming.

Then, I want to transmit it on a veth interface using a second
packet socket.  I do not want veth to recalculate the checksum on
transmit, nor to validate it on the peer veth on receive, because I do
not want to waste the CPU cycles.  I am assuming that my app is not
accidentally corrupting frames, so the checksum can never be bad.

How should the checksumming be configured for the packets going into
the packet-socket from user-space?




It seems like that only the receiver should decide whether or not to
checksum packets on the veth, not the sender.

How about:

We could add a receiving socket option for "don't checksum packets
received from a veth when the other side has marked them as
elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
socket option for "mark all data sent via this socket to a veth as
elide-checksum-suggested".

So the process would be:

Writer:
1. open read socket
2. open write socket, with option elide-checksum-for-veth-suggested
3. write data

Reader:
1. open read socket with "follow-elide-checksum-suggestions-on-veth"
2. read data

The kernel / module would then need to persist the flag on all packets
that traverse a veth, and drop these data when they leave the veth
module.



I'm not sure this works completely.  In my app, the packet flow might be:

eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB
<-> [kernel router/bridge logic ...] <-> eth1


Good point, so if you had:

eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
userspace-stub <->eth1

and user-space hub enabled this elide flag, things would work, right?
Then, it seems like what we need is a way to tell the kernel
router/bridge logic to follow elide signals in packets coming from
veth. I'm not sure what the best way to do this is because I'm less
familiar with conventions in that part of the kernel, but assuming
there's a way to do this, would it be acceptable?


You cannot receive on one veth without transmitting on the other, so
I think the elide csum logic can go on the raw-socket, and apply to packets
in the transmit-from-user-space direction.  Just allowing the socket to make
the veth behave like it used to before this patch in question should be good
enough, since that worked for us for years.  So, just an option to modify the
ip_summed for pkts sent on a socket is probably sufficient.


There may be no sockets on the vethB port.  And reader/writer is not
a good way to look at it since I am implementing a bi-directional bridge in
user-space and each packet-socket is for both rx and tx.


Sure, but we could model a bidrectional connection as two
unidirectional sockets for our discussions here, right?


Best not to I think, you want to make sure that one socket can
correctly handle tx and rx.  As long as that works, then using
uni-directional sockets should work too.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Vijay Pandurangan
On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear  wrote:
>
>
> On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
>>
>> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear 
>> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 12:54 PM, Tom Herbert wrote:


 We've put considerable effort into cleaning up the checksum interface
 to make it as unambiguous as possible, please be very careful to
 follow it. Broken checksum processing is really hard to detect and
 debug.

 CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
 (indicated by csum_level) have been verified to be correct in a
 packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
 never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
 checksum it would refer to has not been verified and is incorrect this
 is a major bug.
>>>
>>>
>>>
>>> Suppose I know that the packet received on a packet-socket has
>>> already been verified by a NIC that supports hardware checksumming.
>>>
>>> Then, I want to transmit it on a veth interface using a second
>>> packet socket.  I do not want veth to recalculate the checksum on
>>> transmit, nor to validate it on the peer veth on receive, because I do
>>> not want to waste the CPU cycles.  I am assuming that my app is not
>>> accidentally corrupting frames, so the checksum can never be bad.
>>>
>>> How should the checksumming be configured for the packets going into
>>> the packet-socket from user-space?
>>
>>
>>
>> It seems like that only the receiver should decide whether or not to
>> checksum packets on the veth, not the sender.
>>
>> How about:
>>
>> We could add a receiving socket option for "don't checksum packets
>> received from a veth when the other side has marked them as
>> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
>> socket option for "mark all data sent via this socket to a veth as
>> elide-checksum-suggested".
>>
>> So the process would be:
>>
>> Writer:
>> 1. open read socket
>> 2. open write socket, with option elide-checksum-for-veth-suggested
>> 3. write data
>>
>> Reader:
>> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
>> 2. read data
>>
>> The kernel / module would then need to persist the flag on all packets
>> that traverse a veth, and drop these data when they leave the veth
>> module.
>
>
> I'm not sure this works completely.  In my app, the packet flow might be:
>
> eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB
> <-> [kernel router/bridge logic ...] <-> eth1

Good point, so if you had:

eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
userspace-stub <->eth1

and user-space hub enabled this elide flag, things would work, right?
Then, it seems like what we need is a way to tell the kernel
router/bridge logic to follow elide signals in packets coming from
veth. I'm not sure what the best way to do this is because I'm less
familiar with conventions in that part of the kernel, but assuming
there's a way to do this, would it be acceptable?



>
> There may be no sockets on the vethB port.  And reader/writer is not
> a good way to look at it since I am implementing a bi-directional bridge in
> user-space and each packet-socket is for both rx and tx.

Sure, but we could model a bidrectional connection as two
unidirectional sockets for our discussions here, right?



>
>>> Also, I might want to send raw frames that do have
>>> broken checksums (lets assume a real NIC, not veth), and I want them
>>> to hit the wire with those bad checksums.
>>>
>>>
>>> How do I configure the checksumming in this case?
>>
>>
>>
>> Correct me if I'm wrong but I think this is already possible now. You
>> can have packets with incorrect checksum hitting the wire as is. What
>> you cannot do is instruct the receiving end to ignore the checksum
>> from the sending end when using a physical device (and something I
>> think we should mimic on the sending device).
>
>
> Yes, it does work currently (or, last I checked)...I just want to make sure
> it keeps working.

Yeah, good point. It would be great if we could write a test, or at
the very least, a list of invariants about what kinds of things should
work somewhere.

>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Greear



On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:

On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear  wrote:



On 04/30/2016 12:54 PM, Tom Herbert wrote:


We've put considerable effort into cleaning up the checksum interface
to make it as unambiguous as possible, please be very careful to
follow it. Broken checksum processing is really hard to detect and
debug.

CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
(indicated by csum_level) have been verified to be correct in a
packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
checksum it would refer to has not been verified and is incorrect this
is a major bug.



Suppose I know that the packet received on a packet-socket has
already been verified by a NIC that supports hardware checksumming.

Then, I want to transmit it on a veth interface using a second
packet socket.  I do not want veth to recalculate the checksum on
transmit, nor to validate it on the peer veth on receive, because I do
not want to waste the CPU cycles.  I am assuming that my app is not
accidentally corrupting frames, so the checksum can never be bad.

How should the checksumming be configured for the packets going into
the packet-socket from user-space?



It seems like that only the receiver should decide whether or not to
checksum packets on the veth, not the sender.

How about:

We could add a receiving socket option for "don't checksum packets
received from a veth when the other side has marked them as
elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
socket option for "mark all data sent via this socket to a veth as
elide-checksum-suggested".

So the process would be:

Writer:
1. open read socket
2. open write socket, with option elide-checksum-for-veth-suggested
3. write data

Reader:
1. open read socket with "follow-elide-checksum-suggestions-on-veth"
2. read data

The kernel / module would then need to persist the flag on all packets
that traverse a veth, and drop these data when they leave the veth
module.


I'm not sure this works completely.  In my app, the packet flow might be:

eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB <-> 
[kernel router/bridge logic ...] <-> eth1

There may be no sockets on the vethB port.  And reader/writer is not
a good way to look at it since I am implementing a bi-directional bridge in
user-space and each packet-socket is for both rx and tx.


Also, I might want to send raw frames that do have
broken checksums (lets assume a real NIC, not veth), and I want them
to hit the wire with those bad checksums.


How do I configure the checksumming in this case?



Correct me if I'm wrong but I think this is already possible now. You
can have packets with incorrect checksum hitting the wire as is. What
you cannot do is instruct the receiving end to ignore the checksum
from the sending end when using a physical device (and something I
think we should mimic on the sending device).


Yes, it does work currently (or, last I checked)...I just want to make sure it 
keeps working.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Vijay Pandurangan
On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear  wrote:
>
>
> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>
>> We've put considerable effort into cleaning up the checksum interface
>> to make it as unambiguous as possible, please be very careful to
>> follow it. Broken checksum processing is really hard to detect and
>> debug.
>>
>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>> (indicated by csum_level) have been verified to be correct in a
>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>> checksum it would refer to has not been verified and is incorrect this
>> is a major bug.
>
>
> Suppose I know that the packet received on a packet-socket has
> already been verified by a NIC that supports hardware checksumming.
>
> Then, I want to transmit it on a veth interface using a second
> packet socket.  I do not want veth to recalculate the checksum on
> transmit, nor to validate it on the peer veth on receive, because I do
> not want to waste the CPU cycles.  I am assuming that my app is not
> accidentally corrupting frames, so the checksum can never be bad.
>
> How should the checksumming be configured for the packets going into
> the packet-socket from user-space?


It seems like that only the receiver should decide whether or not to
checksum packets on the veth, not the sender.

How about:

We could add a receiving socket option for "don't checksum packets
received from a veth when the other side has marked them as
elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
socket option for "mark all data sent via this socket to a veth as
elide-checksum-suggested".

So the process would be:

Writer:
1. open read socket
2. open write socket, with option elide-checksum-for-veth-suggested
3. write data

Reader:
1. open read socket with "follow-elide-checksum-suggestions-on-veth"
2. read data

The kernel / module would then need to persist the flag on all packets
that traverse a veth, and drop these data when they leave the veth
module.


>
>
> Also, I might want to send raw frames that do have
> broken checksums (lets assume a real NIC, not veth), and I want them
> to hit the wire with those bad checksums.
>
>
> How do I configure the checksumming in this case?


Correct me if I'm wrong but I think this is already possible now. You
can have packets with incorrect checksum hitting the wire as is. What
you cannot do is instruct the receiving end to ignore the checksum
from the sending end when using a physical device (and something I
think we should mimic on the sending device).


>
>
>
> Thanks,
> Ben
>
>
>
>>
>> Tom
>>
>> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear  wrote:
>>>
>>>
>>>
>>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:


 On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>
>
> Hello,
>>>
>>>
>>>
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
>
> Actually, no, this is not really a regression.


 [...]

 It really is.  Even though the old behaviour was a bug (raw packets
 should not be changed), if there are real applications that depend on
 that then we have to keep those applications working somehow.
>>>
>>>
>>>
>>> To be honest, I fail to see why the old behaviour is a bug when sending
>>> raw packets from user-space.  If raw packets should not be changed, then
>>> we need some way to specify what the checksum setting is to begin with,
>>> otherwise, user-space has not enough control.
>>>
>>> A socket option for new programs, and sysctl configurable defaults for raw
>>> sockets
>>> for old binary programs would be sufficient I think.
>>>
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear 
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Greear


On 04/30/2016 12:54 PM, Tom Herbert wrote:

We've put considerable effort into cleaning up the checksum interface
to make it as unambiguous as possible, please be very careful to
follow it. Broken checksum processing is really hard to detect and
debug.

CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
(indicated by csum_level) have been verified to be correct in a
packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
checksum it would refer to has not been verified and is incorrect this
is a major bug.


Suppose I know that the packet received on a packet-socket has
already been verified by a NIC that supports hardware checksumming.

Then, I want to transmit it on a veth interface using a second
packet socket.  I do not want veth to recalculate the checksum on
transmit, nor to validate it on the peer veth on receive, because I do
not want to waste the CPU cycles.  I am assuming that my app is not
accidentally corrupting frames, so the checksum can never be bad.

How should the checksumming be configured for the packets going into
the packet-socket from user-space?

Also, I might want to send raw frames that do have
broken checksums (lets assume a real NIC, not veth), and I want them
to hit the wire with those bad checksums.

How do I configure the checksumming in this case?


Thanks,
Ben




Tom

On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear  wrote:



On 04/30/2016 11:33 AM, Ben Hutchings wrote:


On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:


Hello,





http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a


Actually, no, this is not really a regression.


[...]

It really is.  Even though the old behaviour was a bug (raw packets
should not be changed), if there are real applications that depend on
that then we have to keep those applications working somehow.



To be honest, I fail to see why the old behaviour is a bug when sending
raw packets from user-space.  If raw packets should not be changed, then
we need some way to specify what the checksum setting is to begin with,
otherwise, user-space has not enough control.

A socket option for new programs, and sysctl configurable defaults for raw
sockets
for old binary programs would be sufficient I think.


Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Vijay Pandurangan
[oops – resending this because I was using gmail in HTML mode before
by accident]

There was a discussion on a separate thread about this. I agree with
Sabrina fully. I believe veth should provide an abstraction layer that
correctly emulates a physical network in all ways.

Consider an environment where we have multiple physical computers.
Each computer runs one or more containers, each of which has a
publicly routable ip address.  When adding a new app to the cluster, a
scheduler might decide to run this container on any physical machine
of its choice, assuming that apps have a way of routing traffic to
their backends (we did something similar Google >10 years ago). This
is something we might imagine happening with docker and ipv6 for
instance.

If you have an app, A, which sends raw ethernet frames (the simplest
case I could imagine) with TCP data that has invalid checksums to app
B, which is receiving it, the behaviour of the system _will be
different_ depending upon whether app B is scheduled to run on the
same machine as app A or not. This seems like a clear bug and a broken
abstraction (especially as the default case), and something we should
endeavour to avoid.

I do see Ben's point about enabling sw checksum verification as
potentially incurring a huge performance penalty (I haven't had a
chance to measure it) that is completely wasteful in the vast majority
of cases.

Unfortunately I just don't see how we can solve this problem in a way
that preserves a correct abstraction layer while also avoiding excess
work. I guess the first piece of data that would be helpful is to
determine just how big of a performance penalty this is. If it's
small, then maybe it doesn't matter.




On Thu, Apr 28, 2016 at 6:29 AM, Sabrina Dubroca  wrote:
> Hello,
>
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
>> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
>> > Hi Ben,
>> >
>> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>> > > > >
>> > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let 
>> > > > > me know.
>> > > > I would be careful about this.  It causes regressions when sending
>> > > > PACKET_SOCKET buffers from user-space to veth devices.
>> > > >
>> > > > There was a proposed upstream fix for the regression, but it has not 
>> > > > gone
>> > > > into the tree as far as I know.
>> > > >
>> > > > http://www.spinics.net/lists/netdev/msg370436.html
>> > > [...]
>> > >
>> > > OK, I'll drop this for now.
>> >
>> > The fall out from not having this patch is in my opinion a bigger
>> > fallout than not having this patch. This patch fixes silent data
>> > corruption vs. the problem Ben Greear is talking about, which might not
>> > be that a common usage.
>> >
>> > What do others think?
>> >
>> > Bye,
>> > Hannes
>> >
>>
>> This patch from Cong Wang seems to fix the regression for me, I think it 
>> should be added and
>> tested in the main tree, and then apply them to stable as a pair.
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
> Actually, no, this is not really a regression.
>
> If you capture packets on a device with checksum offloading enabled,
> the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
> the "veth: don't modify ip_summed" patch does is enable proper
> checksum validation on veth.  This really was a bug in veth.
>
> Cong's patch would also break cases where we choose to inject packets
> with invalid checksums, and they would now be accepted as correct.
>
> Your use case is invalid, it just happened to work because of a
> bug.  If you want the stack to fill checksums so that you want capture
> and reinject packets, you have to disable checksum offloading (or
> compute the checksum yourself in userspace).
>
> Thanks.
>
> --
> Sabrina


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Tom Herbert
We've put considerable effort into cleaning up the checksum interface
to make it as unambiguous as possible, please be very careful to
follow it. Broken checksum processing is really hard to detect and
debug.

CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
(indicated by csum_level) have been verified to be correct in a
packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
checksum it would refer to has not been verified and is incorrect this
is a major bug.

Tom

On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear  wrote:
>
>
> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>
>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>
>>> Hello,
>
>

 http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>
>>> Actually, no, this is not really a regression.
>>
>> [...]
>>
>> It really is.  Even though the old behaviour was a bug (raw packets
>> should not be changed), if there are real applications that depend on
>> that then we have to keep those applications working somehow.
>
>
> To be honest, I fail to see why the old behaviour is a bug when sending
> raw packets from user-space.  If raw packets should not be changed, then
> we need some way to specify what the checksum setting is to begin with,
> otherwise, user-space has not enough control.
>
> A socket option for new programs, and sysctl configurable defaults for raw
> sockets
> for old binary programs would be sufficient I think.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Greear



On 04/30/2016 11:33 AM, Ben Hutchings wrote:

On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:

Hello,



http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a

Actually, no, this is not really a regression.

[...]

It really is.  Even though the old behaviour was a bug (raw packets
should not be changed), if there are real applications that depend on
that then we have to keep those applications working somehow.


To be honest, I fail to see why the old behaviour is a bug when sending
raw packets from user-space.  If raw packets should not be changed, then
we need some way to specify what the checksum setting is to begin with,
otherwise, user-space has not enough control.

A socket option for new programs, and sysctl configurable defaults for raw 
sockets
for old binary programs would be sufficient I think.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Hutchings
On Thu, 2016-04-28 at 06:45 -0700, Ben Greear wrote:
> 
> On 04/28/2016 03:29 AM, Sabrina Dubroca wrote:
[...]
> > Your use case is invalid, it just happened to work because of a
> > bug.  If you want the stack to fill checksums so that you want capture
> > and reinject packets, you have to disable checksum offloading (or
> > compute the checksum yourself in userspace).
> Disabling checksum offloading or computing in user-space (and then
> recomputing in veth to verify the checksum?) is a huge performance loss.
> 
> Maybe we could add a socket option to enable Cong's patch on a per-socket
> basis?  That way my use-case can still work and you can have this new
> behaviour by default?

It does sound like a useful option to have.  If there are other
applications that depend on veth's checksum-fixing behaviour and are
being distributed in binary form, then a per-device option might be
necessary so users can keep those applications working before they're
updated.

Ben.

-- 
Ben Hutchings
Tomorrow will be cancelled due to lack of interest.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-30 Thread Ben Hutchings
On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
> Hello,
> 
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
> > 
> > On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> > > 
> > > Hi Ben,
> > > 
> > > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> > > > 
> > > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > > > > 
> > > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > > > > 
> > > > > > 
> > > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let 
> > > > > > me know.
> > > > > I would be careful about this.  It causes regressions when sending
> > > > > PACKET_SOCKET buffers from user-space to veth devices.
> > > > > 
> > > > > There was a proposed upstream fix for the regression, but it has not 
> > > > > gone
> > > > > into the tree as far as I know.
> > > > > 
> > > > > http://www.spinics.net/lists/netdev/msg370436.html
> > > > [...]
> > > > 
> > > > OK, I'll drop this for now.
> > > The fall out from not having this patch is in my opinion a bigger
> > > fallout than not having this patch. This patch fixes silent data
> > > corruption vs. the problem Ben Greear is talking about, which might not
> > > be that a common usage.
> > > 
> > > What do others think?
> > > 
> > > Bye,
> > > Hannes
> > > 
> > This patch from Cong Wang seems to fix the regression for me, I think it 
> > should be added and
> > tested in the main tree, and then apply them to stable as a pair.
> > 
> > http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
> Actually, no, this is not really a regression.
[...]

It really is.  Even though the old behaviour was a bug (raw packets
should not be changed), if there are real applications that depend on
that then we have to keep those applications working somehow.

Ben.

-- 
Ben Hutchings
Tomorrow will be cancelled due to lack of interest.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-28 Thread Ben Greear



On 04/28/2016 03:29 AM, Sabrina Dubroca wrote:

Hello,

2016-04-27, 17:14:44 -0700, Ben Greear wrote:

On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:

Hi Ben,

On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:

On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:

On 04/26/2016 04:02 PM, Ben Hutchings wrote:


3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

I would be careful about this.  It causes regressions when sending
PACKET_SOCKET buffers from user-space to veth devices.

There was a proposed upstream fix for the regression, but it has not gone
into the tree as far as I know.

http://www.spinics.net/lists/netdev/msg370436.html

[...]

OK, I'll drop this for now.


The fall out from not having this patch is in my opinion a bigger
fallout than not having this patch. This patch fixes silent data
corruption vs. the problem Ben Greear is talking about, which might not
be that a common usage.

What do others think?

Bye,
Hannes



This patch from Cong Wang seems to fix the regression for me, I think it should 
be added and
tested in the main tree, and then apply them to stable as a pair.

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a


Actually, no, this is not really a regression.

If you capture packets on a device with checksum offloading enabled,
the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
the "veth: don't modify ip_summed" patch does is enable proper
checksum validation on veth.  This really was a bug in veth.

Cong's patch would also break cases where we choose to inject packets
with invalid checksums, and they would now be accepted as correct.

Your use case is invalid, it just happened to work because of a
bug.  If you want the stack to fill checksums so that you want capture
and reinject packets, you have to disable checksum offloading (or
compute the checksum yourself in userspace).


Disabling checksum offloading or computing in user-space (and then
recomputing in veth to verify the checksum?) is a huge performance loss.

Maybe we could add a socket option to enable Cong's patch on a per-socket
basis?  That way my use-case can still work and you can have this new
behaviour by default?

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-28 Thread Sabrina Dubroca
Hello,

2016-04-27, 17:14:44 -0700, Ben Greear wrote:
> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> > Hi Ben,
> > 
> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > > > 
> > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me 
> > > > > know.
> > > > I would be careful about this.  It causes regressions when sending
> > > > PACKET_SOCKET buffers from user-space to veth devices.
> > > > 
> > > > There was a proposed upstream fix for the regression, but it has not 
> > > > gone
> > > > into the tree as far as I know.
> > > > 
> > > > http://www.spinics.net/lists/netdev/msg370436.html
> > > [...]
> > > 
> > > OK, I'll drop this for now.
> > 
> > The fall out from not having this patch is in my opinion a bigger
> > fallout than not having this patch. This patch fixes silent data
> > corruption vs. the problem Ben Greear is talking about, which might not
> > be that a common usage.
> > 
> > What do others think?
> > 
> > Bye,
> > Hannes
> > 
> 
> This patch from Cong Wang seems to fix the regression for me, I think it 
> should be added and
> tested in the main tree, and then apply them to stable as a pair.
> 
> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a

Actually, no, this is not really a regression.

If you capture packets on a device with checksum offloading enabled,
the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
the "veth: don't modify ip_summed" patch does is enable proper
checksum validation on veth.  This really was a bug in veth.

Cong's patch would also break cases where we choose to inject packets
with invalid checksums, and they would now be accepted as correct.

Your use case is invalid, it just happened to work because of a
bug.  If you want the stack to fill checksums so that you want capture
and reinject packets, you have to disable checksum offloading (or
compute the checksum yourself in userspace).

Thanks.

-- 
Sabrina


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-27 Thread Ben Greear

On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:

Hi Ben,

On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:

On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:

On 04/26/2016 04:02 PM, Ben Hutchings wrote:


3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

I would be careful about this.  It causes regressions when sending
PACKET_SOCKET buffers from user-space to veth devices.

There was a proposed upstream fix for the regression, but it has not gone
into the tree as far as I know.

http://www.spinics.net/lists/netdev/msg370436.html

[...]

OK, I'll drop this for now.


The fall out from not having this patch is in my opinion a bigger
fallout than not having this patch. This patch fixes silent data
corruption vs. the problem Ben Greear is talking about, which might not
be that a common usage.

What do others think?

Bye,
Hannes



This patch from Cong Wang seems to fix the regression for me, I think it should 
be added and
tested in the main tree, and then apply them to stable as a pair.

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a



diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index da1ae0e..f8cc758 100644 (file)
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1926,6 +1926,7 @@ retry:
goto out_unlock;
}

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = sk->sk_priority;
@@ -2352,6 +2353,7 @@ static int tpacket_fill_skb(struct packet_sock *po, 
struct sk_buff *skb,

ph.raw = frame;

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = po->sk.sk_priority;
@@ -2776,6 +2778,7 @@ static int packet_snd(struct socket *sock, struct msghdr 
*msg, size_t len)
goto out_free;
}

+   skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = proto;
skb->dev = dev;
skb->priority = sk->sk_priority;

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-27 Thread Hannes Frederic Sowa
Hi Ben,

On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > 
> > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me 
> > > know.
> > I would be careful about this.  It causes regressions when sending
> > PACKET_SOCKET buffers from user-space to veth devices.
> > 
> > There was a proposed upstream fix for the regression, but it has not gone
> > into the tree as far as I know.
> > 
> > http://www.spinics.net/lists/netdev/msg370436.html
> [...]
> 
> OK, I'll drop this for now.

The fall out from not having this patch is in my opinion a bigger
fallout than not having this patch. This patch fixes silent data
corruption vs. the problem Ben Greear is talking about, which might not
be that a common usage.

What do others think?

Bye,
Hannes


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-27 Thread Ben Hutchings
On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > 
> > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> I would be careful about this.  It causes regressions when sending
> PACKET_SOCKET buffers from user-space to veth devices.
> 
> There was a proposed upstream fix for the regression, but it has not gone
> into the tree as far as I know.
> 
> http://www.spinics.net/lists/netdev/msg370436.html
[...]

OK, I'll drop this for now.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-27 Thread Ben Greear

On 04/26/2016 04:02 PM, Ben Hutchings wrote:

3.2.80-rc1 review patch.  If anyone has any objections, please let me know.


I would be careful about this.  It causes regressions when sending
PACKET_SOCKET buffers from user-space to veth devices.

There was a proposed upstream fix for the regression, but it has not gone
into the tree as far as I know.

http://www.spinics.net/lists/netdev/msg370436.html

Thanks,
Ben



--

From: Vijay Pandurangan 

[ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]

Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.

We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).

This code dates back to the first version of the driver, commit
 ("[NET]: Virtual ethernet device driver"), so I
suspect this bug occurred mostly because the driver API has evolved
significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
packet checksumming") (in December 2010) fixed this for packets that get
created locally and sent to hardware devices, by not changing
CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
in from hardware devices.

Co-authored-by: Evan Jones 
Signed-off-by: Evan Jones 
Cc: Nicolas Dichtel 
Cc: Phil Sutter 
Cc: Toshiaki Makita 
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Vijay Pandurangan 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
  drivers/net/veth.c | 6 --
  1 file changed, 6 deletions(-)

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b
stats = this_cpu_ptr(priv->stats);
rcv_stats = this_cpu_ptr(rcv_priv->stats);

-   /* don't change ip_summed == CHECKSUM_PARTIAL, as that
-  will cause bad checksum on forwarded packets */
-   if (skb->ip_summed == CHECKSUM_NONE &&
-   rcv->features & NETIF_F_RXCSUM)
-   skb->ip_summed = CHECKSUM_UNNECESSARY;

length = skb->len;
if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



[PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

2016-04-26 Thread Ben Hutchings
3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Vijay Pandurangan 

[ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]

Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.

We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).

This code dates back to the first version of the driver, commit
 ("[NET]: Virtual ethernet device driver"), so I
suspect this bug occurred mostly because the driver API has evolved
significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
packet checksumming") (in December 2010) fixed this for packets that get
created locally and sent to hardware devices, by not changing
CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
in from hardware devices.

Co-authored-by: Evan Jones 
Signed-off-by: Evan Jones 
Cc: Nicolas Dichtel 
Cc: Phil Sutter 
Cc: Toshiaki Makita 
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Vijay Pandurangan 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
 drivers/net/veth.c | 6 --
 1 file changed, 6 deletions(-)

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b
stats = this_cpu_ptr(priv->stats);
rcv_stats = this_cpu_ptr(rcv_priv->stats);
 
-   /* don't change ip_summed == CHECKSUM_PARTIAL, as that
-  will cause bad checksum on forwarded packets */
-   if (skb->ip_summed == CHECKSUM_NONE &&
-   rcv->features & NETIF_F_RXCSUM)
-   skb->ip_summed = CHECKSUM_UNNECESSARY;
 
length = skb->len;
if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)