Hi Tom,

On Mon, 2014-03-24 at 14:33 -0700, Tom Herbert wrote:
> On Mon, Mar 24, 2014 at 10:34 AM, Erik Nordmark <[email protected]> wrote:
> > On 3/14/14 8:52 PM, Zhou, Han wrote:
> >>
> >>
> >> Tom, the point of this draft is that the "last possible point in the
> >> stack" can be pushed to the remote end-point of the VXLAN tunnel. If
> >> the remote is a hypervisor, this GSO is terminated without actual work:
> >> the receiving hypervisor simply delivers the large packet to the
> >> receiving guest.
> >
> > Han,
> >
> > Are you saying that the sending VM and vswitch will send a large packet
> > (e.g., 32k) over UDP, and this will be delivered to the receiving vSwitch
> > and VM as one large packet? That certainly makes the work on the VMs a lot
> > less, hence I can understand that you see performance improvements.
> >
> > However, that would result in IP fragmentation of that large UDP/VXLAN
> > packet AFAICT.
> >
> That would be UFO. It seems more likely that you'd want to do TSO/LRO
> style L4 segmentation/reassembly.
> 

UFO would be ideal, but it is not available in most current NICs, and it
is not the mechanism described in this draft. This proposal is a pure
software solution that requires no hardware offloading. An
implementation can choose to use hardware UFO if available, and
otherwise fall back to the mechanism proposed here (VXLAN-SOE).

> It's still not clear to me why the MTU would need to be on the
> wire; offload mechanisms already work now without that information.
> Also, for packets going from one VM to another within a host, moving
> jumbo packets could be directly linked into the TSO/LRO mechanisms.
> 

If the remote VTEP is a gateway rather than a hypervisor, it requires
the overlay MTU information for re-segmentation so that the correct MTU
(which may be the result of path-MTU negotiation) is applied. As
mentioned in the draft (section 2.4), there are 3 choices on the gateway:
1. re-segment in gateway software
2. offload to NIC hardware
3. offload to the next hop if it is a tunnel transport (e.g. VXLAN-GPE)
and the tunnel protocol also supports offloading (e.g. also VXLAN-SOE).
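
To make choice 1 concrete, here is a minimal sketch in Python of what
re-segmentation in gateway software amounts to. It is an illustration
only and not the encoding from the draft: the function name resegment,
its parameters, and the 40-byte inner header length (IPv4 plus TCP
without options) are assumptions, and a real gateway would also have to
rewrite sequence numbers, lengths and checksums for every segment.

```python
from typing import List


def resegment(l4_payload: bytes, overlay_mtu: int, inner_header_len: int) -> List[bytes]:
    """Split an oversized inner payload into chunks that fit the overlay MTU.

    overlay_mtu is the MTU carried as metadata with the large packet
    (hypothetical parameter); inner_header_len is the size of the inner
    IP + L4 headers that each resulting segment must also carry.
    """
    max_chunk = overlay_mtu - inner_header_len
    if max_chunk <= 0:
        raise ValueError("overlay MTU is too small for the inner headers")
    # Slice the payload into consecutive chunks of at most max_chunk bytes.
    return [l4_payload[i:i + max_chunk]
            for i in range(0, len(l4_payload), max_chunk)]


# Example: a 32k payload, overlay MTU 1500, assumed 40 bytes of inner headers.
segments = resegment(bytes(32 * 1024), overlay_mtu=1500, inner_header_len=40)
print(len(segments), len(segments[0]), len(segments[-1]))
```

Without the overlay MTU in the packet, the gateway would have no value
to pass as overlay_mtu and would have to guess the segment size.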

Best regards,
Han

> 
> > The IETF has some experience with protocols where the loss unit is smaller
> > than the retransmission unit, and this results in very poor performance
> > under packet loss due to the loss of a single, small unit resulting in the
> > retransmission of a large unit (the 32k packet, which might be a TCP, SCTP,
> > etc packet i.e. a reliable protocol with retransmissions.)
> >
> > Regards,
> >     Erik
> >
> >
> >
> >
> >> This results in a huge performance gain, especially on the receiving
> >> side, because the number of packets being handled is much smaller. The
> >> speedup is 2x - 3x even when the physical network still transmits small
> >> (1514 bytes) packets with IP fragmentation. If jumbo frames are used in
> >> the physical network, the speedup will be even higher. I would like
> >> to share more data from the prototype if it is of interest.
> >>
> >> So this is not a local mechanism, but needs agreement between end-points.
> >> In a setup like:
> >>     Hypervisor A <-- Hypervisor B --> Gateway
> >> the VTEP on hypervisor B treats both remote VTEPs the same way: it fills
> >> in segmentation-offloading information for GSO packets. Hypervisor A
> >> optionally checks this information to understand that this is a valid
> >> large packet and doesn't drop it even if its size is bigger than the
> >> guest's virtual interface MTU. But this information is critical to the
> >> Gateway: it has to perform the real segmentation if the packet is being
> >> forwarded to physical networks.
> >> And this is why we need the metadata in the on-the-wire protocol.
> >>
> >>> - I believe this would conflict with the proposal to add a protocol
> >>> field to the VXLAN header. Overloading one field in a fixed header is
> >>> not an adequate substitute for a truly extensible header. In the best
> >>> case we could only use one or the other functionality in a given
> >>> packet. In the worst case, overloading opens the door to backwards
> >>> compatibility issues and the potential for misinterpretation of
> >>> fields.
> >>>
> >> I agree that we'd better avoid field overloading. But as stated in
> >> section 3, it does not conflict with VXLAN-gpe, because when
> >> segmentation offloading is enabled the encapsulated packet should always
> >> be Ethernet, and in such a case the protocol type is not needed. But you
> >> reminded me that it should be defined clearly that the P bit specified
> >> by VXLAN-gpe MUST be set to 0 when the S bit is 1. We will address that
> >> in version 01.
> >>
> >> Or if you know of any real (or potential) scenarios of conflict between
> >> VXLAN-soe and VXLAN-gpe, please kindly point them out and we can consider
> >> using the remaining space in the header instead of overloading it.
> >>
> >>> Tom
> >>>
> >>>> So this is a practical yet generic proposal, which extends the
> >>>> offloading concept from kernel stacks to remote end-points of
> >>>> overlay networks.
> >>>>
> >>>> The metadata for offloading is very similar to STT. The differences
> >>>> are that:
> >>>> 1. it doesn't add a fake TCP header to utilize NIC TSO.
> >>>> 2. it doesn't include helper fields, just to save the limited VXLAN
> >>>> header space for other possible purposes in the future.
> >>>> 3. VXLAN is widely adopted and this is only a minor, backward
> >>>> compatible extension.
> >>>>
> >>>> Based on this, it is highly recommended to add segmentation metadata
> >>>> in the VXLAN header as proposed in this draft.
> >>>>
> >>>> Any comments are appreciated!
> >>>>
> >>>> Best regards,
> >>>> Han Zhou
> >>>>
> >>>> -----Original Message-----
> >>>> From: [email protected] [mailto:[email protected]]
> >>>> Sent: Thursday, March 13, 2014 10:29 PM
> >>>> To: Zhou, Han; Li, Chengyuan; Li, Chengyuan; Zhou, Han
> >>>> Subject: New Version Notification for draft-zhou-li-vxlan-soe-00.txt
> >>>>
> >>>>
> >>>> A new version of I-D, draft-zhou-li-vxlan-soe-00.txt
> >>>> has been successfully submitted by Han Zhou and posted to the
> >>>> IETF repository.
> >>>>
> >>>> Name:           draft-zhou-li-vxlan-soe
> >>>> Revision:       00
> >>>> Title:          Segmentation Offloading Extension for VxLAN
> >>>> Document date:  2014-03-13
> >>>> Group:          Individual Submission
> >>>> Pages:          7
> >>>> URL:            http://www.ietf.org/internet-drafts/draft-zhou-li-vxlan-soe-00.txt
> >>>> Status:         https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
> >>>> Htmlized:       http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-00
> >>>>
> >>>>
> >>>> Abstract:
> >>>>     Segmentation offloading is nowadays common in network stack
> >>>>     implementation and well supported by para-virtualized network device
> >>>>     drivers for virtual machine (VM)s. This draft describes an extension
> >>>>     to Virtual eXtensible Local Area Network (VXLAN) so that
> >>>>     segmentation can be decoupled from physical/underlay networks and
> >>>>     offloaded further to the remote end-point thus improving data-plane
> >>>>     performance for VMs running on top of overlay networks.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Please note that it may take a couple of minutes from the time of
> >>>> submission until the htmlized version and diff are available at
> >>>> tools.ietf.org.
> >>>>
> >>>> The IETF Secretariat
> >>>>
> >>>> _______________________________________________
> >>>> nvo3 mailing list
> >>>> [email protected]
> >>>> https://www.ietf.org/mailman/listinfo/nvo3
> >>
> >>
> >

