On Mon, Mar 24, 2014 at 10:34 AM, Erik Nordmark <[email protected]> wrote:
> On 3/14/14 8:52 PM, Zhou, Han wrote:
>>
>>
>> Tom, the point of this draft is that the "last possible point in the
>> stack" can be pushed to the remote end-point of the VXLAN tunnel. If the
>> remote is a hypervisor, this GSO is terminated without actual work: the
>> receiving hypervisor simply delivers the large packet to the receiving
>> guest.
>
> Han,
>
> Are you saying that the sending VM and vswitch will send a large packet
> (e.g., 32k) over UDP, and this will be delivered to the receiving vSwitch
> and VM as one large packet? That certainly makes the work on the VMs a lot
> less, hence I can understand that you see performance improvements.
>
> However, that would result in IP fragmentation of that large UDP/VXLAN
> packet AFAICT.
>
That would be UFO. It seems more likely that you'd want to do TSO/LRO-style
L4 segmentation/reassembly.
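TSO/GSO-style L4 segmentation, mentioned above, comes down to cutting one large TCP payload into MSS-sized pieces with advancing sequence numbers. The same applies to the "real segmentation" Han says a gateway must perform before forwarding to a physical network. The sketch below is minimal and assumes IPv4 and TCP headers without options. Rebuilding per-segment headers, checksums and flags is left out. The function names are hypothetical and do not come from the draft or from any kernel API.

```python
from typing import List, Tuple

IPV4_HEADER = 20  # bytes; assumes no IP options
TCP_HEADER = 20   # bytes; assumes no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment for a given link MTU (no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


def segment_tcp_payload(payload: bytes, seq: int,
                        mss: int) -> List[Tuple[int, bytes]]:
    """Split one large TCP payload into (sequence number, chunk) pairs of at
    most mss bytes. A real segmenter would also rebuild the IP/TCP headers,
    checksums and flags for each segment; that part is omitted here."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        # TCP sequence numbers wrap modulo 2**32.
        segments.append(((seq + offset) % 2**32, chunk))
    return segments


if __name__ == "__main__":
    mss = mss_for_mtu(1500)
    segs = segment_tcp_payload(b"x" * 32768, seq=1000, mss=mss)
    print(mss, len(segs), segs[1][0], len(segs[-1][1]))
```

With a 1500-byte MTU, a 32 KiB payload becomes 23 segments, and the last one carries the 648-byte remainder. LRO/GRO on the receiver performs the inverse, merging consecutive segments back into one large payload.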
It's still not clear to me why the MTU would need to be on the wire;
offload mechanisms already work without that information. Also, for
packets going from one VM to another within a host, jumbo packets could be
linked directly into the TSO/LRO mechanisms.

> The IETF has some experience with protocols where the loss unit is smaller
> than the retransmission unit, and this results in very poor performance
> under packet loss due to the loss of a single, small unit resulting in the
> retransmission of a large unit (the 32k packet, which might be a TCP, SCTP,
> etc. packet, i.e., a reliable protocol with retransmissions).
>
> Regards,
> Erik
>
>
>
>
>> This results in a huge performance gain, especially on the receiving
>> side, because the number of packets being handled is much smaller. The
>> speedup is 2x-3x even when the physical network still transmits small
>> (1514-byte) packets with IP fragmentation. If jumbo frames are used in the
>> physical network, the speedup is even higher. I would like to share more
>> data from the prototype if it is of interest.
>>
>> So this is not a local mechanism, but one that needs agreement between
>> end-points. In a setup like:
>>
>>     Hypervisor A <-- Hypervisor B --> Gateway
>>
>> on hypervisor B the VTEP treats both remote VTEPs the same way: it fills
>> in segmentation-offloading information for GSO packets. Hypervisor A
>> optionally checks this information to recognize that this is a valid
>> large packet and does not drop it even if its size exceeds the MTU of
>> the guest's virtual interface. But this information is critical to the
>> Gateway: it has to perform the real segmentation if the packet is being
>> forwarded to physical networks. And this is why we need the metadata in
>> the on-the-wire protocol.
>>
>>> - I believe this would conflict with the proposal to add a protocol
>>> field to the VXLAN header. Overloading one field in a fixed header is
>>> not an adequate substitute for a truly extensible header.
>>> In the best case we could only use one or the other functionality in a
>>> given packet. In the worst case, overloading opens the door to backwards
>>> compatibility issues and the potential for misinterpretation of
>>> fields.
>>>
>> I agree that we'd better avoid field overloading. But as stated in
>> section 3, it does not conflict with VXLAN-gpe, because when segmentation
>> offloading is enabled the encapsulated packet should always be Ethernet,
>> and in that case the protocol type field is not needed. But you reminded
>> me that it should be clearly defined that the P bit specified by
>> VXLAN-gpe MUST be set to 0 when the S bit is 1. We will address that in
>> version 01.
>>
>> Or if you know of any real (or potential) scenarios of conflict between
>> VXLAN-soe and VXLAN-gpe, please kindly point them out and we can consider
>> using the remaining space in the header instead of overloading it.
>>
>>> Tom
>>>
>>>> So this is a practical yet generic proposal, which extends the
>>>> offloading concept from kernel stacks to the remote end-points of
>>>> overlay networks.
>>>>
>>>> The metadata for offloading is very similar to STT's. The differences
>>>> are:
>>>> 1. It doesn't add a fake TCP header to utilize NIC TSO.
>>>> 2. It doesn't include helper fields, to save the limited VXLAN header
>>>>    space for other possible purposes in the future.
>>>> 3. VXLAN is widely adopted, and this is only a minor, backward-compatible
>>>>    extension.
>>>>
>>>> Based on this, it is highly recommended to add segmentation metadata to
>>>> the VXLAN header as proposed in this draft.
>>>>
>>>> Any comments are appreciated!
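A receiver could enforce the rule Han proposes for version 01 ("the P bit ... MUST be set to 0 when the S bit is 1") with a simple check during header parsing. The sketch below uses the 8-byte VXLAN header layout from RFC 7348: a flags byte with the I flag at 0x08, 24 reserved bits, a 24-bit VNI, and 8 reserved bits. The P bit position is an assumption based on the VXLAN-gpe draft. The S bit position is a hypothetical placeholder, because the thread does not give it. The helper names are also hypothetical.

```python
import struct
from dataclasses import dataclass

VXLAN_HEADER_LEN = 8
FLAG_I = 0x08  # "VNI is valid" flag (RFC 7348)
# ASSUMPTION: the two positions below are illustrative placeholders, not
# values from this thread; check the VXLAN-gpe and VXLAN-soe drafts.
FLAG_P = 0x04  # VXLAN-gpe "next protocol present" flag (assumed position)
FLAG_S = 0x40  # hypothetical VXLAN-soe "segmentation offload" flag


@dataclass
class VxlanHeader:
    flags: int
    vni: int


def parse_vxlan(data: bytes) -> VxlanHeader:
    """Parse an 8-byte VXLAN header and enforce 'P MUST be 0 when S is 1'."""
    if len(data) < VXLAN_HEADER_LEN:
        raise ValueError("truncated VXLAN header")
    # Word 0: flags (8 bits) + reserved (24); word 1: VNI (24) + reserved (8).
    word0, word1 = struct.unpack("!II", data[:VXLAN_HEADER_LEN])
    flags = word0 >> 24
    vni = word1 >> 8
    if not flags & FLAG_I:
        raise ValueError("I flag not set: VNI is not valid")
    if flags & FLAG_S and flags & FLAG_P:
        # A segmentation-offloaded payload is always Ethernet, so a
        # next-protocol field should never accompany the S flag.
        raise ValueError("P flag must be 0 when S flag is 1")
    return VxlanHeader(flags=flags, vni=vni)


def build_vxlan(vni: int, flags: int = FLAG_I) -> bytes:
    """Build an 8-byte VXLAN header with the given flags and 24-bit VNI."""
    return struct.pack("!II", flags << 24, (vni & 0xFFFFFF) << 8)


if __name__ == "__main__":
    hdr = build_vxlan(5000, FLAG_I | FLAG_S)
    print(parse_vxlan(hdr).vni)
```

Rejecting the S+P combination outright, instead of silently ignoring one of the two bits, reduces the misinterpretation risk Tom raises about overloading.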
>>>>
>>>> Best regards,
>>>> Han Zhou
>>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:[email protected]]
>>>> Sent: Thursday, March 13, 2014 10:29 PM
>>>> To: Zhou, Han; Li, Chengyuan; Li, Chengyuan; Zhou, Han
>>>> Subject: New Version Notification for draft-zhou-li-vxlan-soe-00.txt
>>>>
>>>>
>>>> A new version of I-D, draft-zhou-li-vxlan-soe-00.txt
>>>> has been successfully submitted by Han Zhou and posted to the
>>>> IETF repository.
>>>>
>>>> Name:          draft-zhou-li-vxlan-soe
>>>> Revision:      00
>>>> Title:         Segmentation Offloading Extension for VxLAN
>>>> Document date: 2014-03-13
>>>> Group:         Individual Submission
>>>> Pages:         7
>>>> URL:           http://www.ietf.org/internet-drafts/draft-zhou-li-vxlan-soe-00.txt
>>>> Status:        https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
>>>> Htmlized:      http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-00
>>>>
>>>>
>>>> Abstract:
>>>>    Segmentation offloading is nowadays common in network stack
>>>>    implementation and well supported by para-virtualized network device
>>>>    drivers for virtual machine (VM)s. This draft describes an extension
>>>>    to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
>>>>    can be decoupled from physical/underlay networks and offloaded
>>>>    further to the remote end-point thus improving data-plane performance
>>>>    for VMs running on top of overlay networks.
>>>>
>>>>
>>>>
>>>>
>>>> Please note that it may take a couple of minutes from the time of
>>>> submission until the htmlized version and diff are available at
>>>> tools.ietf.org.
>>>>
>>>> The IETF Secretariat
>>>>
>>>> _______________________________________________
>>>> nvo3 mailing list
>>>> [email protected]
>>>> https://www.ietf.org/mailman/listinfo/nvo3
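A rough calculation puts numbers on Erik's point about a loss unit being smaller than the retransmission unit. If a large UDP/VXLAN datagram is split into n IP fragments and each fragment is lost independently with probability p, the whole datagram arrives with probability (1 - p)^n. The sketch below is illustrative and does not use data from the thread. It assumes IPv4 without options, a 1500-byte underlay MTU, independent losses, and a hypothetical 32 KiB inner payload carried with inner Ethernet, VXLAN and UDP headers.

```python
import math

IPV4_HEADER = 20  # bytes; assumes no IP options


def fragment_count(ip_payload_len: int, mtu: int = 1500) -> int:
    """IPv4 fragments needed for a datagram with ip_payload_len bytes of IP payload."""
    # Every fragment but the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload_len / per_fragment)


def delivery_probability(fragments: int, loss: float) -> float:
    """Probability that every fragment arrives, assuming independent losses."""
    return (1.0 - loss) ** fragments


def expected_bytes_on_wire(ip_payload_len: int, mtu: int, loss: float) -> float:
    """Expected payload bytes sent per delivered datagram when losing any
    fragment forces the whole datagram to be retransmitted."""
    q = delivery_probability(fragment_count(ip_payload_len, mtu), loss)
    return ip_payload_len / q  # attempts are geometric with mean 1/q


if __name__ == "__main__":
    # Hypothetical datagram: 32 KiB inner payload + 14 B inner Ethernet,
    # 8 B VXLAN and 8 B UDP headers.
    size = 32768 + 14 + 8 + 8
    n = fragment_count(size)
    print(n, round(delivery_probability(n, 0.01), 3))
    print(round(expected_bytes_on_wire(size, 1500, 0.01)))
```

Under these assumptions the datagram needs 23 fragments. At 1% fragment loss, roughly one datagram in five is lost, compared with one in a hundred for a single 1500-byte packet. Every one of those losses triggers a full 32k retransmission, which is why per-segment (TSO-style) loss units behave better.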
