Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On 04/01/2016 07:21 PM, Eric Dumazet wrote: On Fri, 2016-04-01 at 22:16 -0400, David Miller wrote: From: Alexander DuyckDate: Fri, 1 Apr 2016 12:58:41 -0700 RFC 6864 is pretty explicit about this, IPv4 ID used only for fragmentation. https://tools.ietf.org/html/rfc6864#section-4.1 The goal with this change is to try and keep most of the existing behavior in tact without violating this rule? I would think the sequence number should give you the ability to infer a drop in the case of TCP. In the case of UDP tunnels we are now getting a bit more data since we were ignoring the outer IP header ID before. When retransmits happen, the sequence numbers are the same. But you can then use the IP ID to see exactly what happened. You can even tell if multiple retransmits got reordered. Eric's use case is extremely useful, and flat out eliminates ambiguity when analyzing TCP traces. Yes, our team (including Van Jacobson ;) ) would be sad to not have sequential IP ID (but then we don't have them for IPv6 ;) ) Your team would not be the only one sad to see that go away. rick jones Since the cost of generating them is pretty small (inet->inet_id counter), we probably should keep them in linux. Their usage will phase out as IPv6 wins the Internet war...
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Fri, Apr 1, 2016 at 7:16 PM, David Millerwrote: > From: Alexander Duyck > Date: Fri, 1 Apr 2016 12:58:41 -0700 > >> RFC 6864 is pretty explicit about this, IPv4 ID used only for >> fragmentation. https://tools.ietf.org/html/rfc6864#section-4.1 >> >> The goal with this change is to try and keep most of the existing >> behavior in tact without violating this rule? I would think the >> sequence number should give you the ability to infer a drop in the >> case of TCP. In the case of UDP tunnels we are now getting a bit more >> data since we were ignoring the outer IP header ID before. > > When retransmits happen, the sequence numbers are the same. But you > can then use the IP ID to see exactly what happened. You can even > tell if multiple retransmits got reordered. > > Eric's use case is extremely useful, and flat out eliminates ambiguity > when analyzing TCP traces. I'm not really sure the IP ID is really all that useful. On a 10G link it takes about 80ms for it to wrap using an MTU of 1500. That value is going to keep dropping as we move up to 40G and 100G. That was one of the motivations behind RFC 6864 because it starts becoming a bottle-neck if you want to keep the IP IDs truly unique. In addition while this change would allow you to combined disjointed IP IDs I don't think you would lose the re-transmission as there would likely be a gap in sequence numbers that would cause the flow to be flushed from GRO, and it isn't as if we can prepend the retransmit to the aggregated frame, we are always adding to the tail. I would think the most likely case where this change would foul up any IP IDs would be the garbage-in/garbage-out case like the IPv6 to IPv4 translation that is using the fixed IP ID of 0. I agree that it isn't desirable, however at the same time per RFC 6864 there is nothing there to say we cannot do that. In addition it would likely help to mitigate some of the impact of IP ID on things like SLHC compression since the resegmented frames would be incrementing so it would reduce the number of times the IP ID would have to be updated. - Alex
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Sat, 2016-04-02 at 10:19 +0800, Herbert Xu wrote: > On Fri, Apr 01, 2016 at 07:15:33PM -0700, Eric Dumazet wrote: > > On Sat, 2016-04-02 at 09:57 +0800, Herbert Xu wrote: > > > > > > We could easily fix that by adding a feature bit to control this, > > > something like SKB_GSO_TCP_FIXEDID. > > > > I understood the patch allowed to aggregate 4 segments having > > > > ID=12 ID=40 ID=80 ID=1000 > > Right. But I haven't seen any justification that we should aggregate > such packets at all. The only valid case that I have seen is for > the case of unchanging IDs, no? Presumably repeats of "DF=1 ID=0" should be what we really need to handle. On my wish list, having some reordering logic in GRO would be far more interesting than these minor details ;)
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Fri, 2016-04-01 at 22:16 -0400, David Miller wrote: > From: Alexander Duyck> Date: Fri, 1 Apr 2016 12:58:41 -0700 > > > RFC 6864 is pretty explicit about this, IPv4 ID used only for > > fragmentation. https://tools.ietf.org/html/rfc6864#section-4.1 > > > > The goal with this change is to try and keep most of the existing > > behavior in tact without violating this rule? I would think the > > sequence number should give you the ability to infer a drop in the > > case of TCP. In the case of UDP tunnels we are now getting a bit more > > data since we were ignoring the outer IP header ID before. > > When retransmits happen, the sequence numbers are the same. But you > can then use the IP ID to see exactly what happened. You can even > tell if multiple retransmits got reordered. > > Eric's use case is extremely useful, and flat out eliminates ambiguity > when analyzing TCP traces. Yes, our team (including Van Jacobson ;) ) would be sad to not have sequential IP ID (but then we don't have them for IPv6 ;) ) Since the cost of generating them is pretty small (inet->inet_id counter), we probably should keep them in linux. Their usage will phase out as IPv6 wins the Internet war...
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Fri, Apr 01, 2016 at 07:15:33PM -0700, Eric Dumazet wrote: > On Sat, 2016-04-02 at 09:57 +0800, Herbert Xu wrote: > > > > We could easily fix that by adding a feature bit to control this, > > something like SKB_GSO_TCP_FIXEDID. > > I understood the patch allowed to aggregate 4 segments having > > ID=12 ID=40 ID=80 ID=1000 Right. But I haven't seen any justification that we should aggregate such packets at all. The only valid case that I have seen is for the case of unchanging IDs, no? Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Alexander DuyckDate: Fri, 1 Apr 2016 12:58:41 -0700 > RFC 6864 is pretty explicit about this, IPv4 ID used only for > fragmentation. https://tools.ietf.org/html/rfc6864#section-4.1 > > The goal with this change is to try and keep most of the existing > behavior in tact without violating this rule? I would think the > sequence number should give you the ability to infer a drop in the > case of TCP. In the case of UDP tunnels we are now getting a bit more > data since we were ignoring the outer IP header ID before. When retransmits happen, the sequence numbers are the same. But you can then use the IP ID to see exactly what happened. You can even tell if multiple retransmits got reordered. Eric's use case is extremely useful, and flat out eliminates ambiguity when analyzing TCP traces.
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Sat, 2016-04-02 at 09:57 +0800, Herbert Xu wrote: > Eric Dumazetwrote: > > > > I do not particularly care, but it is worth mentioning that GRO+TSO > > would not be idempotent anymore. > > We could easily fix that by adding a feature bit to control this, > something like SKB_GSO_TCP_FIXEDID. I understood the patch allowed to aggregate 4 segments having ID=12 ID=40 ID=80 ID=1000 -> resulting GRO packet with 4 segments and ID=12. TSO would generate 12,13,14,15 or 12,12,12,12 with your flag ? (Before the patch no aggregation took place and exact same packets were forwarded with 12, 40, 80, 1000) As I said, I am not sure we should care :)
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
Eric Dumazetwrote: > > I do not particularly care, but it is worth mentioning that GRO+TSO > would not be idempotent anymore. We could easily fix that by adding a feature bit to control this, something like SKB_GSO_TCP_FIXEDID. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
RE: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
| For transmit we can leave the IP ID code as is. For receive we should not be | snooping into the IP ID for any frames that have the DF bit set as devices | that have adopted RFC 6864 on their transmit path will end up causing issues. Currently, GRO does not coalesce TCP packets originating from nodes which do IPv6 to IPv4 translation. These packets have DF set to 1 and IP ID set 0 according to https://tools.ietf.org/html/rfc6145#page-17 With this patch, we should be able to see coalescing happen for those cases as well.
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Fri, Apr 1, 2016 at 12:24 PM, David Millerwrote: > From: Eric Dumazet > Date: Fri, 01 Apr 2016 11:49:03 -0700 > >> For example, TCP stack tracks per socket ID generation, even if it >> sends DF=1 frames. Damn useful for tcpdump analysis and drop >> inference. > > Thanks for mentioning this, I never considered this use case. RFC 6864 is pretty explicit about this, IPv4 ID used only for fragmentation. https://tools.ietf.org/html/rfc6864#section-4.1 The goal with this change is to try and keep most of the existing behavior in tact without violating this rule? I would think the sequence number should give you the ability to infer a drop in the case of TCP. In the case of UDP tunnels we are now getting a bit more data since we were ignoring the outer IP header ID before. >> With your change, the resulting GRO packet would propagate the ID of >> first frag. Most GSO/GSO engines will then provide a ID sequence, >> which might not match original packets. Right. But that is only in the case where the IP IDs did not already increment or were left uninitialized meaning the transmitter was probably already following RFC 6864 and chose a fixed value. Odds are in such a case we end up improving the performance if anything as there are plenty of legacy systems out there that still require the IPv4 ID increment in order to get LRO/GRO. >> I do not particularly care, but it is worth mentioning that GRO+TSO >> would not be idempotent anymore. In the patch I mentioned we had already broken that. I'm basically just going through and fixing the cases for tunnels where we were doing the outer header wrong while at the same time relaxing the requirements for the inner header if DF is set. I'll probably add some documentation do the Documentation folder about it as well. I'm currently in the process of writing up documentation for GSO and GSO partial for the upcoming patchset. I can pretty easily throw in a few comments about GRO as well. > Our eventual plan was to start emitting zero in the ID field for > outgoing TCP datagrams with DF set, since the issue that caused us to > generate incrementing IDs in the first place (buggy Microsoft SLHC > compression) we decided is not relevant and important enough to > accommodate any more. For the GSO partial stuff I was probably just going to have the IP ID on the inner headers lurch forward in chunks equal to gso_segs when we are doing the segmentation. I didn't want to use a fixed value just because that would likely make it easy to identify Linux devices being a bump in the wire. I figure if there are already sources that weren't updating IP ID for their segmentation offloads then if we just take that approach odds are we will blend in with the other devices and be more difficult to single out. Another reason for doing it this way is that different devices are going to have different behaviors with GSO partial. In the case of the i40e driver it recognizes both inner and outer network headers so it can increment both correctly. In the case of igb and ixgbe they only can support the outer header so the inner IP ID value would be lurching by gso_size every time we move from one GSO frame to the next. > So outside of your TCP behavior analysis case, there isn't a > compelling argument to keeping that code around any more, rather than > just put zero in the ID field. > > I suppose we could keep the counter code around and allow it to be > enabled using a sysctl or socket option, but how strongly do you > really feel about this? I'm not suggesting we drop the counter code for transmit. What RFC 6864 says is "Originating sources MAY set the IPv4 ID field of atomic datagrams to any value." For transmit we can leave the IP ID code as is. For receive we should not be snooping into the IP ID for any frames that have the DF bit set as devices that have adopted RFC 6864 on their transmit path will end up causing issues. - Alex
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
From: Eric DumazetDate: Fri, 01 Apr 2016 11:49:03 -0700 > For example, TCP stack tracks per socket ID generation, even if it > sends DF=1 frames. Damn useful for tcpdump analysis and drop > inference. Thanks for mentioning this, I never considered this use case. > With your change, the resulting GRO packet would propagate the ID of > first frag. Most GSO/GSO engines will then provide a ID sequence, > which might not match original packets. > > I do not particularly care, but it is worth mentioning that GRO+TSO > would not be idempotent anymore. Our eventual plan was to start emitting zero in the ID field for outgoing TCP datagrams with DF set, since the issue that caused us to generate incrementing IDs in the first place (buggy Microsoft SLHC compression) we decided is not relevant and important enough to accommodate any more. So outside of your TCP behavior analysis case, there isn't a compelling argument to keeping that code around any more, rather than just put zero in the ID field. I suppose we could keep the counter code around and allow it to be enabled using a sysctl or socket option, but how strongly do you really feel about this?
Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
On Fri, 2016-04-01 at 11:05 -0700, Alexander Duyck wrote: > RFC 6864 states that the IPv4 ID field MUST NOT be used for purposes other > than fragmentation and reassembly. Currently we are looking at this field > as a way of identifying what frames can be aggregated and which cannot for > GRO. While this is valid for frames that do not have DF set, it is invalid > to do so if the bit is set. > > In addition we were generating IPv4 ID collisions when 2 or more flows were > interleaved over the same tunnel. To prevent that we store the result of > all IP ID checks via a "|=" instead of overwriting previous values. Note that for atomic datagrams (DF = 1), since fragmentation and reassembly can not occur, maybe some people use ID field for other purposes. For example, TCP stack tracks per socket ID generation, even if it sends DF=1 frames. Damn useful for tcpdump analysis and drop inference. With your change, the resulting GRO packet would propagate the ID of first frag. Most GSO/GSO engines will then provide a ID sequence, which might not match original packets. I do not particularly care, but it is worth mentioning that GRO+TSO would not be idempotent anymore.