Hi Tom,

Thanks for your comments! Please see my replies inline.

> -----Original Message-----
> From: Tom Herbert [mailto:[email protected]]
> Sent: Friday, March 14, 2014 11:13 PM
> To: Zhou, Han
> Cc: [email protected]; [email protected]
> Subject: Re: [nvo3] FW: New Version Notification for
> draft-zhou-li-vxlan-soe-00.txt
> 
> On Thu, Mar 13, 2014 at 7:25 PM, Zhou, Han <[email protected]> wrote:
> > Hi folks,
> >
> > We posted a draft as an extension to VXLAN. Please take a look.
> >
> > The motivation came from our experiments on VXLAN optimization. There seem
> > to be many ongoing discussions about the necessity of adding metadata to
> > transport headers, and it is also controversial whether we should take
> > offloading into consideration in the headers. However, our test results
> > show significant performance gains even without any help from hardware
> > offloading. The performance of a single TCP session improved from
> > 1.5 Gbits/sec to 3.5~4 Gbits/sec: more than doubled!
> >
> Hi Han,
> 
> - The mechanisms you're using are local within a host so this should
> be accomplished by software API as opposed to changing the on-the-wire
> protocol. The API should be generic to other encaps. In the most general
> case, we'd want to provide a TSO device interface to the guest and
> only do the segmentation at last possible point in the stack (either
> GSO from the host's physical driver, or TSO if device has support).
> Most of this is supported now in Linux kernel.

Tom, the point of this draft is that the "last possible point in the stack" can
be pushed to the remote end-point of the VXLAN tunnel. If the remote end-point
is a hypervisor, GSO terminates there without any actual segmentation work: the
receiving hypervisor simply delivers the large packet to the receiving guest.
This results in a huge performance gain, especially on the receiving side,
because the number of packets being handled is much smaller. The
speedup is 2x - 3x even when the physical network still transmits small
(1514-byte) packets with IP fragmentation. If jumbo frames are used in the
physical network, the speedup will be even higher. I would be happy
to share more data from the prototype if it is of interest.
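
A rough sketch of the packet-count argument above. The 64 KB large-send size
and the 1448-byte MSS are assumptions chosen for illustration, not figures
from the prototype, and the function name is hypothetical.

```python
import math


def packets_seen_by_receiver(payload_bytes: int, mss: int, segmented: bool) -> int:
    """Count the packets the receiving side handles for one large send."""
    if not segmented:
        # Segmentation is deferred to the remote end-point: one large packet.
        return 1
    # Segmentation done on the sending side: one packet per MSS-sized chunk.
    return math.ceil(payload_bytes / mss)


LARGE_SEND = 64 * 1024  # assumed size of one GSO packet, in bytes
MSS = 1448              # assumed TCP payload carried by one 1514-byte frame

print(packets_seen_by_receiver(LARGE_SEND, MSS, segmented=True))
print(packets_seen_by_receiver(LARGE_SEND, MSS, segmented=False))
```

This only counts packets per send; it does not model the per-packet cost and
makes no claim about how the count maps to the measured speedup.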

So this is not a local mechanism; it needs agreement between end-points.
In a setup like:
   Hypervisor A <-- Hypervisor B --> Gateway
the VTEP on hypervisor B treats both remote VTEPs the same way: it fills in
segmentation-offloading information for GSO packets. Hypervisor A
optionally checks this information to confirm that this is a valid large
packet, and does not drop it even if its size is bigger than the MTU of the
guest's virtual interface. The information is critical to the Gateway,
however: it has to perform the real segmentation if the packet is being
forwarded to physical networks.
This is why we need the metadata in the on-the-wire protocol.
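
The two receiver roles described above can be sketched as follows. This is a
minimal illustration under stated assumptions: `SoeMetadata`, its `mss` field,
and both function names are hypothetical and are not the header layout defined
in the draft. Real segmentation would also replicate and fix up the inner
headers for every segment; here only the payload is sliced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoeMetadata:
    """Hypothetical view of the segmentation metadata carried on the wire."""
    mss: int  # segment payload size the sender would otherwise have used


def deliver_to_guest(payload: bytes, meta: Optional[SoeMetadata],
                     guest_mtu: int) -> List[bytes]:
    """Hypervisor end-point: pass a valid large packet through unsegmented."""
    if len(payload) > guest_mtu and meta is None:
        return []  # oversized and not marked as an offloaded packet: drop
    return [payload]


def forward_to_physical(payload: bytes,
                        meta: Optional[SoeMetadata]) -> List[bytes]:
    """Gateway end-point: perform the real segmentation using the metadata."""
    if meta is None:
        return [payload]
    return [payload[i:i + meta.mss] for i in range(0, len(payload), meta.mss)]


large = bytes(3000)
meta = SoeMetadata(mss=1448)
print(len(deliver_to_guest(large, meta, guest_mtu=1500)))      # one packet
print([len(s) for s in forward_to_physical(large, meta)])      # MSS-sized pieces
```

The sender does the same thing in both cases; only the receiver's role decides
whether the metadata is merely checked or actually acted on.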

> - I believe this would conflict with the proposal to add a protocol
> field to the VXLAN header. Overloading one field in a fixed header is
> not an adequate substitute for a truly extensible header. In the best
> case we could only use one or the other functionality in a given
> packet. In the worst case, overloading opens the door to backwards
> compatibility issues and the potential for misinterpretation of
> fields.
> 

I agree that we had better avoid field overloading. But as stated in section 3,
it does not conflict with VXLAN-gpe, because when segmentation offloading
is enabled the encapsulated packet should always be Ethernet, and in
that case the protocol type is not needed. But you reminded me that it should
be defined clearly that the P bit specified by VXLAN-gpe MUST be set to 0
when the S bit is 1. We will address that in version 01.
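
A receiver-side check for the S/P rule above might look like this minimal
sketch. The bit masks are placeholders assumed for illustration only; the
actual bit positions are whatever the VXLAN-soe and VXLAN-gpe drafts assign,
and the function name is hypothetical.

```python
S_BIT = 0x80  # placeholder mask for the S (segmentation offloading) flag
P_BIT = 0x04  # placeholder mask for the VXLAN-gpe P (next protocol) flag


def flags_are_consistent(flags: int) -> bool:
    """Return False when S and P are both set, which the rule forbids."""
    return not (flags & S_BIT and flags & P_BIT)


print(flags_are_consistent(S_BIT))          # S alone is allowed
print(flags_are_consistent(S_BIT | P_BIT))  # S together with P is rejected
```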

Or, if you know of any real (or potential) scenarios where VXLAN-soe and
VXLAN-gpe conflict, please kindly point them out and we can consider
using the remaining space in the header instead of overloading the field.

> Tom
> 
> > So this is a practical yet generic proposal, which extends the offloading
> > concept from kernel stacks to remote end-points of overlay networks.
> >
> > The metadata for offloading is very similar to STT. The differences are:
> > 1. it doesn't add a fake TCP header to utilize NIC TSO.
> > 2. it doesn't include helper fields, just to save the limited VXLAN header
> > space for other possible purposes in the future.
> > 3. VXLAN is widely adopted and this is only a minor, backward-compatible
> > extension.
> >
> > Based on this, it is highly recommended to add segmentation metadata in the
> > VXLAN header as proposed in this draft.
> >
> > Any comments are appreciated!
> >
> > Best regards,
> > Han Zhou
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > Sent: Thursday, March 13, 2014 10:29 PM
> > To: Zhou, Han; Li, Chengyuan; Li, Chengyuan; Zhou, Han
> > Subject: New Version Notification for draft-zhou-li-vxlan-soe-00.txt
> >
> >
> > A new version of I-D, draft-zhou-li-vxlan-soe-00.txt
> > has been successfully submitted by Han Zhou and posted to the
> > IETF repository.
> >
> > Name:           draft-zhou-li-vxlan-soe
> > Revision:       00
> > Title:          Segmentation Offloading Extension for VxLAN
> > Document date:  2014-03-13
> > Group:          Individual Submission
> > Pages:          7
> > URL:            http://www.ietf.org/internet-drafts/draft-zhou-li-vxlan-soe-00.txt
> > Status:         https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
> > Htmlized:       http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-00
> >
> >
> > Abstract:
> >    Segmentation offloading is nowadays common in network stack
> >    implementation and well supported by para-virtualized network device
> >    drivers for virtual machine (VM)s. This draft describes an extension
> >    to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
> >    can be decoupled from physical/underlay networks and offloaded
> >    further to the remote end-point thus improving data-plane performance
> >    for VMs running on top of overlay networks.
> >
> >
> >
> >
> > Please note that it may take a couple of minutes from the time of submission
> > until the htmlized version and diff are available at tools.ietf.org.
> >
> > The IETF Secretariat
> >
> > _______________________________________________
> > nvo3 mailing list
> > [email protected]
> > https://www.ietf.org/mailman/listinfo/nvo3
