from:"Tom Herbert"

Re: [nvo3] Questions to your draft-herbert-nvo3-ila-02

2016-04-15 Thread Tom Herbert

On Fri, Apr 15, 2016 at 3:15 PM, Linda Dunbar  wrote:
> Tom,
>
>
>
> Your draft 3.1 suggests that the “locator” is encoded in the address. Does
> it mean if the application is moved to a different “container”, the
> application will have a different address (as the locator has changed)?
>
The address on the wire would change, not the addresses that an
application sees. The basic sequence of ILA translation is something
like: SIR:ID->Loc:ID->SIR:ID. So applications on both sides of the
network only see the SIR addresses. Translation from SIR to ILA
address and back can be done at end hosts or in the network, which I
believe adheres to the NVE model (the latter might imply some L2
forwarding to the final end host).
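
A minimal sketch of the SIR:ID->Loc:ID->SIR:ID sequence, assuming a
64-bit prefix (SIR or locator) in the upper half of the IPv6 address and a
64-bit identifier in the lower half. The function names and the example
addresses are made up for illustration, and anything else a real
translator needs (e.g. keeping transport checksums valid) is left out:

```python
import ipaddress

ID_MASK = (1 << 64) - 1  # low 64 bits carry the identifier (assumed split)


def swap_prefix(addr: str, prefix: str) -> str:
    """Replace the upper 64 bits of addr with the upper 64 bits of prefix."""
    ident = int(ipaddress.IPv6Address(addr)) & ID_MASK
    upper = int(ipaddress.IPv6Address(prefix)) >> 64
    return str(ipaddress.IPv6Address((upper << 64) | ident))


def sir_to_ila(sir_addr: str, locator: str) -> str:
    """SIR:ID -> Loc:ID, done on the sending side."""
    return swap_prefix(sir_addr, locator)


def ila_to_sir(ila_addr: str, sir_prefix: str) -> str:
    """Loc:ID -> SIR:ID, done before delivery to the application."""
    return swap_prefix(ila_addr, sir_prefix)


# Hypothetical addresses from the documentation prefix.
sir_prefix = "2001:db8:0:1::"
locator = "2001:db8:aaaa:bbbb::"
app_addr = "2001:db8:0:1::5"

on_wire = sir_to_ila(app_addr, locator)
restored = ila_to_sir(on_wire, sir_prefix)
```

If the application moves, only the locator passed to sir_to_ila changes;
app_addr and restored stay the same.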

Tom

>
>
> Thanks,
>
>
>
> Linda
>
>

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

[nvo3] Fwd: New Version Notification for draft-herbert-vxlan-rco-01.txt

2016-03-01 Thread Tom Herbert

Hi,

I have uploaded a new version of Remote Checksum Offload for VXLAN
(and also draft-herbert-remotecsumoffload-02.txt that contains
background discussion).

The RCO for VXLAN feature has been implemented for a while and is being
deployed as an optional feature in the Linux implementation of VXLAN.
We are currently adding support to Linux for VXLAN-GPE, so I would
like to include RCO as a feature from the start.

Accordingly, I have some questions for VXLAN protocol experts:

1) How can we "officially" allocate a flag bit to indicate the RCO option
in VXLAN? Is this the correct use and choice of a bit? (currently we use
bit #10 in the header)
2) Is using the low order bits of the VNID for the option data
architecturally "correct"? If not, what is the suggested alternative?
3) Presumably we want this for VXLAN-GPE also. Should we just use the
same flag and field for that? Does this need its own draft? I don't
believe there are any conflicts (however we do have one for GPB that
will need to be resolved)
4) More of an FYI: we recently changed the default configuration of
VXLAN to enable the UDP checksum for IPv4 and IPv6 (zero TX checksum can
still be configured). There is now a lot of data showing that enabling
the UDP checksum yields much better performance on hosts, since we can
leverage the commonly supported TX/RX UDP checksum offload in NICs.
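
To make questions 1 and 2 concrete, here is a rough sketch of a VXLAN
header carrying the RCO flag and option data. Bit #10 is counted from the
most significant bit of the first 32-bit word. The split of the option
data (a 7-bit checksum start in 2-byte units plus one bit selecting the
UDP or TCP checksum offset, placed in the low-order byte of the word that
carries the VNI) is an assumption for illustration, as are the function
and constant names:

```python
import struct

VXLAN_FLAG_VNI = 1 << (31 - 4)   # "I" flag (valid VNI) from RFC 7348
VXLAN_FLAG_RCO = 1 << (31 - 10)  # bit #10, counting the MSB as bit 0

RCO_START_MASK = 0x7F  # assumed: checksum start, in units of 2 bytes
RCO_OFFSET_UDP = 0x80  # assumed: set means UDP checksum offset, clear means TCP


def build_vxlan_rco_header(vni: int, csum_start: int, is_udp: bool) -> bytes:
    """Pack an 8-byte VXLAN header with the RCO flag and option data."""
    if csum_start % 2 or not 0 <= csum_start // 2 <= RCO_START_MASK:
        raise ValueError("checksum start must be even and fit in 7 bits")
    option = (csum_start // 2) | (RCO_OFFSET_UDP if is_udp else 0)
    flags_word = VXLAN_FLAG_VNI | VXLAN_FLAG_RCO
    vni_word = ((vni & 0xFFFFFF) << 8) | option
    return struct.pack("!II", flags_word, vni_word)


def parse_vxlan_rco_header(header: bytes) -> dict:
    """Unpack the fields written by build_vxlan_rco_header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    has_rco = bool(flags_word & VXLAN_FLAG_RCO)
    option = vni_word & 0xFF if has_rco else 0
    return {
        "vni": vni_word >> 8,
        "rco": has_rco,
        "csum_start": (option & RCO_START_MASK) * 2,
        "is_udp": bool(option & RCO_OFFSET_UDP),
    }


header = build_vxlan_rco_header(vni=42, csum_start=34, is_udp=False)
fields = parse_vxlan_rco_header(header)
```

A receiver that does not know the flag would ignore it and treat the low
byte as reserved, which is part of why the choice of bit and field needs
agreement.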

Thanks,
Tom


-- Forwarded message --
From:  <internet-dra...@ietf.org>
Date: Tue, Mar 1, 2016 at 8:43 AM
Subject: New Version Notification for draft-herbert-vxlan-rco-01.txt
To: Tom Herbert <t...@herbertland.com>



A new version of I-D, draft-herbert-vxlan-rco-01.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-vxlan-rco
Revision:   01
Title:  Remote checksum offload for VXLAN
Document date:  2016-02-29
Group:  Individual Submission
Pages:  7
URL:
https://www.ietf.org/internet-drafts/draft-herbert-vxlan-rco-01.txt
Status: https://datatracker.ietf.org/doc/draft-herbert-vxlan-rco/
Htmlized:   https://tools.ietf.org/html/draft-herbert-vxlan-rco-01
Diff:   https://www.ietf.org/rfcdiff?url2=draft-herbert-vxlan-rco-01

Abstract:
   This specification describes remote checksum offload for VXLAN.
   Remote checksum offload is a mechanism that provides checksum offload
   of transport checksums in encapsulated packets using rudimentary
   offload capabilities found in most Network Interface Card (NIC)
   devices. The outer UDP checksum is enabled on transmit and, with some
   additional meta data, a receiver is able to deduce the checksum to be
   set in an encapsulated packet. Effectively this offloads the
   computation of the inner checksum which can be a significant
   performance optimization. Enabling the UDP checksum has the
   additional advantage that it covers more of the packet including the
   IP pseudo header and virtual network identifier.
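
A simplified sketch of the receive-side step described in the abstract,
under stated assumptions: the sender has left the pseudo header checksum
in the inner checksum field (as for ordinary TX checksum offload), and the
meta data gives the checksum start and the offset of the checksum field.
For clarity this sketch re-sums the bytes from the start; a receiver would
instead deduce that sum from the outer UDP checksum it has already
verified. All names and values are illustrative:

```python
def ones_complement_sum(data: bytes, initial: int = 0) -> int:
    """16-bit one's complement sum over data, folded to 16 bits."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = initial
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def rco_fixup(payload: bytearray, start: int, offset: int) -> None:
    """Write the inner checksum at start + offset, summing from start."""
    total = ones_complement_sum(bytes(payload[start:]))
    field = start + offset
    payload[field:field + 2] = (~total & 0xFFFF).to_bytes(2, "big")


# Hypothetical inner packet: 14 bytes of inner headers, then a 20-byte
# transport header with its checksum field at offset 16, then data.
pseudo_header_sum = 0x1234
segment = bytearray(20) + b"hello"
segment[16:18] = pseudo_header_sum.to_bytes(2, "big")  # seed left by sender
payload = bytearray(14) + segment

rco_fixup(payload, start=14, offset=16)

# A normal transport checksum check now passes (sum folds to 0xFFFF).
check = ones_complement_sum(bytes(payload[14:]), initial=pseudo_header_sum)
```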




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


Re: [nvo3] I-D Action: draft-ietf-nvo3-geneve-01.txt

2016-01-18 Thread Tom Herbert

On Thu, Jan 14, 2016 at 12:27 PM,   wrote:
>
> A New Internet-Draft is available from the on-line Internet-Drafts 
> directories.
>  This draft is a work item of the Network Virtualization Overlays Working 
> Group of the IETF.
>
> Title   : Geneve: Generic Network Virtualization Encapsulation
> Authors : Jesse Gross
>   Ilango Ganga
> Filename: draft-ietf-nvo3-geneve-01.txt
> Pages   : 26
> Date: 2016-01-14
>
> Abstract:
>Network virtualization involves the cooperation of devices with a
>wide variety of capabilities such as software and hardware tunnel
>endpoints, transit fabrics, and centralized control clusters.  As a
>result of their role in tying together different elements in the
>system, the requirements on tunnels are influenced by all of these
>components.  Flexibility is therefore the most important aspect of a
>tunnel protocol if it is to keep pace with the evolution of the
>system.  This draft describes Geneve, a protocol designed to
>recognize and accommodate these changing capabilities and needs.
>
A couple of comments...

The discussion of efficient implementation (section 2.2.1) seems vague
to me. For example, from the draft:

"As new functionality becomes sufficiently well defined to add to
endpoints, supporting options can be designed using ordering
restrictions and other techniques to ease parsing."

What are these ordering restrictions? Does this mean TLVs have (or
might eventually have) some defined ordering as to how they can appear
in the packet? Would this contradict section 4.3: "Option ordering is
not significant"?

What are "other techniques to ease parsing"?

Thanks,
Tom


>
> The IETF datatracker status page for this draft is:
> https://datatracker.ietf.org/doc/draft-ietf-nvo3-geneve/
>
> There's also a htmlized version available at:
> https://tools.ietf.org/html/draft-ietf-nvo3-geneve-01
>
> A diff from the previous version is available at:
> https://www.ietf.org/rfcdiff?url2=draft-ietf-nvo3-geneve-01
>
>
> Please note that it may take a couple of minutes from the time of submission
> until the htmlized version and diff are available at tools.ietf.org.
>
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/
>

Re: [nvo3] Requesting Next Protocol = 0 for Ethernet [ draft-ietf-nvo3-vxlan-gpe-01.txt ]

2015-11-16 Thread Tom Herbert

On Mon, Nov 16, 2015 at 1:45 PM, Surendra Kumar (smkumar)
 wrote:
>
> Agree. I was just implying that the cost equivalent of discriminating v6/v4 
> via the IP.ver field may already be paid even today.
>
Yes, a stack must verify that the version number matches the protocol
number. The number of conditionals in the path to check protocol and
version is the same regardless of whether there are two protocol
numbers (for IPv4 and IPv6) and the version number must be verified, or
one number for IP and the version number is used as a discriminator.
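
A small sketch of the two receive paths being compared. The two
EtherType values are the real ones for IPv4 and IPv6; PROTO_IP is a
hypothetical single codepoint and the function names are made up. In both
variants the path inspects the protocol field and the version nibble
before accepting the packet:

```python
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
PROTO_IP = 0x0001  # hypothetical single "IP" codepoint, for illustration


def classify_two_codepoints(proto: int, packet: bytes) -> str:
    """Two protocol numbers; the version must still be verified."""
    version = packet[0] >> 4
    if proto == ETHERTYPE_IPV4:
        return "ipv4" if version == 4 else "drop"
    if proto == ETHERTYPE_IPV6:
        return "ipv6" if version == 6 else "drop"
    return "drop"


def classify_one_codepoint(proto: int, packet: bytes) -> str:
    """One protocol number; the version is the discriminator."""
    if proto != PROTO_IP:
        return "drop"
    version = packet[0] >> 4
    if version == 4:
        return "ipv4"
    if version == 6:
        return "ipv6"
    return "drop"


v4_packet = bytes([0x45])  # version 4, header length 5
v6_packet = bytes([0x60])  # version 6
```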

Tom

> Surendra.
>
> -Original Message-
> From: Joe Touch [mailto:to...@isi.edu]
> Sent: Monday, November 16, 2015 11:54 AM
> To: Surendra Kumar (smkumar) ; Sandeep Kumar (Sandeep) 
> Relan ; nvo3@ietf.org
> Cc: to...@isi.edu; Shahram Davari ; Anoop Ghanwani 
> ; Larry Kreeger (kreeger) 
> Subject: Re: [nvo3] Requesting Next Protocol = 0 for Ethernet [ 
> draft-ietf-nvo3-vxlan-gpe-01.txt ]
>
>
>
> On 11/15/2015 12:47 PM, Surendra Kumar (smkumar) wrote:
>> I suspect some implementations may check the version even today.
>
> It's not about checking the version number. It's that the version number is 
> what should be used to differentiate different versions of IP, not the outer 
> (lower) layer next-protocol field. That field should really only say "IP", 
> i.e., not "IPv4" or "IPv6".
>
> The reason this didn't happen for Ethernet was because of a perception that 
> hardware would need "deep" packet information, which would be otherwise too 
> complex to extract. That situation was true for the first year or so of 
> deployment, but was quickly overtaken by technology capability.
>
> Yet another reason never to make a protocol design decision for a short-term 
> optimization.
>
>> Btw, wouldn't this lead to allocating yet another code-point in IP
>> protocol name space for tunnels while code points already exist for v4
>> and v6 ? Would that be ok ?
>
> No; I was just remarking that we should have one codepoint for IP, not 
> different ones for IPv4, IPv6, and (eventually) whatever IP versions follow.
>
> Whether we have one or more codepoints for IP in ethernet or other lower 
> layers, there are already codepoints for IP "next protocol" being IP.
> Unfortunately, IP makes the same error - it uses "4" for IPv4 and "41"
> for IPv6; it should really have just stuck with one value.
>
> Joe
>
>
>>
>> Surendra.
>>
>>
>> -Original Message-
>> From: Joe Touch [mailto:to...@isi.edu]
>> Sent: Monday, November 09, 2015 9:46 AM
>> To: Surendra Kumar (smkumar) ; Sandeep Kumar
>> (Sandeep) Relan ; nvo3@ietf.org
>> Cc: to...@isi.edu; Shahram Davari ; Anoop
>> Ghanwani ; Larry Kreeger (kreeger)
>> 
>> Subject: Re: [nvo3] Requesting Next Protocol = 0 for Ethernet [
>> draft-ietf-nvo3-vxlan-gpe-01.txt ]
>>
>> That would be a lot better, though it still suffers from the use of 
>> different codepoints for IPv4 and IPv6.
>>
>> The concept of version numbers shouldn't be this difficult.
>>
>> Joe
>>
>> On 11/7/2015 11:49 AM, Surendra Kumar (smkumar) wrote:
>>> On a related note, seems like the VXLAN-GPE next protocol field can
>>> do away with the new registry and re-use the IP protocol numbers:
>>> http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
>>>
>>> The only one not covered in that name space is NSH, which can be signaled 
>>> over UDP - which also limits the overhead.
>>>
>>> Regards,
>>> Surendra.
>>>
>>> -Original Message-
>>> From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Joe Touch
>>> Sent: Thursday, November 05, 2015 3:31 PM
>>> To: Sandeep Kumar (Sandeep) Relan ;
>>> nvo3@ietf.org
>>> Cc: Shahram Davari ; Anoop Ghanwani
>>> ; Larry Kreeger (kreeger) ;
>>> to...@isi.edu
>>> Subject: Re: [nvo3] Requesting Next Protocol = 0 for Ethernet [
>>> draft-ietf-nvo3-vxlan-gpe-01.txt ]
>>>
>>> Question - why are there two next-protocols for IP? That's what the IP 
>>> version number is for.
>>>
>>> (yes, Ethernet messed this up with two different next-proto values,
>>> but it was only because "it's faster in hardware", which ceased to be
>>> true vs looking at the IP version number directly)
>>>
>>> I.e., this mechanism should not require revision to support future versions 
>>> of IP. That's why IP has its own version number field.
>>>
>>> Joe
>>>
>>> On 11/5/2015 2:13 PM, Sandeep Kumar (Sandeep) Relan wrote:
>>>> With Reference to: draft-ietf-nvo3-vxlan-gpe-01.txt
>>>>
>>>> Dear Authors,
>>>>
>>>> I noticed that the request below from Shahram (almost 6 weeks ago) has
>>>> not been evaluated and considered in this draft discussion:
>>>>
>>>> Current draft defines the following Next Protocol values:

Re: [nvo3] draft--pang--nvo3--vxlan--path--detection--01

2015-11-04 Thread Tom Herbert

On Wed, Nov 4, 2015 at 9:31 PM, Haoweiguo  wrote:
> Hi Sam,
>
> The extra bit in VXLAN reserved field has no side effect on regular VXLAN
> forwarding process. The hardware requirements for intermediate nodes are also
> low: the intermediate nodes only need to redirect data packets with the OAM
> flag to the control plane using a regular ACL, and most current commercial
> chipsets can support this behavior.
>

"The extra bit in VXLAN reserved field has no side effect on regular
VXLAN forwarding process."-- I think this is an assumption based on
what implementations are probably doing currently as opposed to what
is specified by the protocol. The VXLAN RFC does not specify a
forwarding process AFAICT. And, as I mentioned yesterday, VXLAN is not
extensible in its current definition.  Unknown reserved bits must be
ignored on reception which means that a receiver that doesn't support
PD bit will assume that the payload is an Ethernet packet and attempt
to deliver it as such.

Given that this is not the first time someone has tried to extend
VXLAN, and given its wide deployment, it might be prudent to try to define a
method to extend the protocol without breaking compatibility. The
pseudo packet (cannot just be called a header) trick will work if the
packet is created such that it is guaranteed to be dropped at a
receiver that doesn't understand the extensions (e.g. put a zero in
the total length field in case of IP). Obviously any method should support
more than just one extension. This implies that fields must be
precisely ordered, so for an implementation to parse a field, it must be
able to calculate the offset of the field. So the order in which fields
are defined in VXLAN becomes important and needs to be spelled out.
The current placement of the PD bit probably would have to change, for
instance. The easiest thing to do is define the flag bits left to
right, where to process the n'th bit an implementation must understand
the first n-1 bits (see GRE or GUE), but bit n+1 and beyond can
still be ignored.
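
A sketch of the left-to-right rule, using a made-up flag layout (the
masks, sizes and field names below are hypothetical, not taken from VXLAN,
GRE or GUE). Optional fields appear in the same order as their flag bits,
so the offset of a field depends only on the flags to its left:

```python
# (flag mask, field size in bytes, name), ordered left to right.
FIELDS = [
    (0x80, 4, "field_a"),
    (0x40, 4, "field_b"),
    (0x20, 2, "field_c"),
]


def parse_optional_fields(flags: int, data: bytes, known: list) -> dict:
    """Parse the optional fields this implementation knows about.

    known must be a prefix of FIELDS: to use the n'th flag, the first
    n-1 flags have to be understood so that the offset can be computed.
    """
    parsed = {}
    offset = 0
    for mask, size, name in known:
        if flags & mask:
            parsed[name] = data[offset:offset + size]
            offset += size
    return parsed


flags = 0x80 | 0x20                 # field_a and field_c present
data = bytes([1, 2, 3, 4, 9, 9])    # field_a (4 bytes), then field_c (2 bytes)

# An older receiver that only understands the first two flags still finds
# field_a correctly and can ignore the later flag.
old_receiver = parse_optional_fields(flags, data, FIELDS[:2])
new_receiver = parse_optional_fields(flags, data, FIELDS)
```

Skipping the unparsed trailing fields to reach the payload would still
need a header length, which this sketch does not model.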

Tom

The pseudo header trick for extensibility might work
> Thanks,
>
> weiguo
>
> 
> From: Deepak Kumar (dekumar) [deku...@cisco.com]
> Sent: Wednesday, November 04, 2015 14:51
> To: Sam Aldrin
> Cc: Shahram Davari; nvo3@ietf.org; Dacheng Zhang
>
> Subject: Re: [nvo3] draft--pang--nvo3--vxlan--path--detection--01
>
> HI Sam,
>
> The VXLAN field that is used is a reserved field, so existing ASIC-based
> hardware won't set it on transmit, but receiving a packet with the reserved
> bit set has no side effect.
> If the hardware is programmable, there is no issue even on transmit.
>
> Can you give me an example of any ASIC implementation which will have a
> problem? We can add text for the user to be careful before turning on the
> solution.
>
> We can even call this extension of vxlan with pd bit.
>
> Thanks,
> Deepak
>
> Sent from my iPhone
>
> On Nov 4, 2015, at 3:09 PM, Sam Aldrin  wrote:
>
> Hi Deepak,
>
> Are you or are you not changing the packet format by introducing the PD
> flag bit in the reserved field, i.e., changing RFC 7348?
> If so, how can you claim to be informational? Is it because RFC is
> informational?
>
> For ex, VXLAN-GPE is in standards track, although it is now in expired
> state.
>
> Irrespective of technical differences, if a specific format is being
> changed, it will impact existing and future deployments as well,
> informational or not.
> Being informational does not avoid that.
>
> -sam
>
> On Nov 3, 2015, at 8:03 PM, Deepak Kumar (dekumar) 
> wrote:
>
> HI Sam,
>
> This is a good discussion. We are bringing this draft as an informational
> draft for a narrow scenario that suits some operators but not others.
>
> The TTL solution is too slow at scale. Instead of arguing, we can give data
> on how much time it takes; for some operators that amount of time is okay,
> but others will want it to complete quickly. As this is an
> informational solution, it is brought to the working group as a
> hardware-driven, controller-controlled scenario, and its language can be
> made "may" and "should" so that all the issues it may cause to software
> VTEPs can be fixed.
>
> Why can't software-based and hardware-based solutions coexist when an
> informational draft won't force everyone to implement it?
>
> Thanks
> Deepak
>
> Sent from my iPhone
>
> On Nov 4, 2015, at 12:41 PM, Sam Aldrin  wrote:
>
> Hi Deepak,
>
> What you are describing is very narrow scenario, which has its own pitfalls.
> Inline for my comments.
>
> On Nov 3, 2015, at 7:10 PM, Deepak Kumar (dekumar) 
> wrote:
>
> Hi Shahram/Sam,
>
> This solution is hardware centric with controller and policy needs to be
> created on each hop.
> This solution is not applicable for all scenarios.
>
> Policy example
> Match peer vtep ip == destination ip of packet destination  port 4789, pd
> bit action punt and drop.
> Match peer vtep ip!=destination ip

Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-11-01 Thread Tom Herbert

On Sun, Nov 1, 2015 at 1:12 PM, Jesse Gross  wrote:
> Sure, probably all of the hardware implementations have some limits on their
> ability to handle the full breadth of Geneve options. Geneve was
> intentionally designed to be very future proof and support limits beyond
> what I would realistically expect people to implement/use in the near
> future. The more interesting question is whether it is possible to support a
> useful set for what we need today. Unfortunately, I can’t really give
> specifics for different implementations (in the cases that I know) but let
> me try to make some generalizations and you can follow up with individual
> vendors if you need more details.
>
It would be great if TLV processing in the HW data path is now a solved
problem and we can soon start productively defining and using TLVs in
various protocols without worry (I would like to at least use IPv6
extension headers for instance). But, AFAIK that is not the current
reality. Maybe some HW designers on this alias can chime in...

Tom


Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-30 Thread Tom Herbert

On Fri, Oct 30, 2015 at 1:38 PM, Pankaj Garg <pank...@microsoft.com> wrote:
> Inline.
>
>> -Original Message-
>> From: Tom Herbert [mailto:t...@herbertland.com]
>> Sent: Friday, October 30, 2015 11:57 AM
>> To: Pankaj Garg <pank...@microsoft.com>
>> Cc: Dino Farinacci <farina...@gmail.com>; Manish Kumar (manishkr)
>> <manis...@cisco.com>; Lucy Yong <lucy.y...@huawei.com>;
>> nvo3@ietf.org
>> Subject: Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using
>> Generic Routing Encapsulation
>>
>> > [PG] I don't think GUE provides the flexibility that is needed for future
>> > encapsulation. Network virtualization is mostly used within datacenters and
>> > in such environments, flexibility is needed to change and innovate rapidly.
>> > We need an encapsulation format that provides such flexibility and does not
>> > tie our hands.
>>
>> Well, we have already defined seven extensions to GUE. AFAICT adding
>> these was quite straightforward; none of them can break forward
>> compatibility or NIC offloads, and all are amenable to efficient header
>> parsing in both software and hardware. GUE also allows for private data in
>> the header section for a site or application to insert data with whatever
>> format is suitable (for instance, if SPUD uses GUE format, CBOR data could
>> go here). But, if you really do see a deficiency in this model that would
>> "tie your hands", please elaborate; we appreciate the feedback!
> [PG] Our network stack consists of multiple layers, where layers can be 
> developed independently (and can even be from separate vendors). Using 
> Geneve/NSH style TLV provides us flexibility of a non-conflicting private 
> data space, where different layers can insert their own data on transmit and 
> process that on reception. How would one achieve this flexibility in GUE?

You can define proprietary options (TLVs or other format) in the
private data space. The interpretation of this space is an agreement
between the sender and the receiver, so there should be no ambiguity or
limits to what can be placed there (subject only to header size
constraints).
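
As an illustration of one such agreement, here is a sketch of a
proprietary TLV layout that independent layers could share inside the
private data space. The format (2-byte owner, 1-byte type, 1-byte length
in bytes, then the value) and all names are hypothetical, and padding to
any required header alignment is left out:

```python
import struct

TLV_HEADER = "!HBB"  # owner id, type, value length in bytes (assumed format)


def encode_tlv(owner: int, tlv_type: int, value: bytes) -> bytes:
    """Encode one private TLV; each layer uses its own owner id."""
    if len(value) > 255:
        raise ValueError("value too long for a 1-byte length")
    return struct.pack(TLV_HEADER, owner, tlv_type, len(value)) + value


def decode_tlvs(private_data: bytes) -> list:
    """Decode all TLVs in the private data, rejecting truncated input."""
    tlvs = []
    offset = 0
    while offset < len(private_data):
        if offset + 4 > len(private_data):
            raise ValueError("truncated TLV header")
        owner, tlv_type, length = struct.unpack_from(TLV_HEADER, private_data, offset)
        offset += 4
        if offset + length > len(private_data):
            raise ValueError("truncated TLV value")
        tlvs.append((owner, tlv_type, private_data[offset:offset + length]))
        offset += length
    return tlvs


# Two layers attach their own data without coordinating type numbers.
private_data = encode_tlv(1, 7, b"abc") + encode_tlv(2, 7, b"")
decoded = decode_tlvs(private_data)
```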

Tom

>>
>> Thanks,
>> Tom


Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-30 Thread Tom Herbert

> [PG] Yes, which is what TLVs in NSH/Geneve do but these are part of the 
> format and not something we have to define on the side. Two independent 
> entities can attach their metadata on the same packet without conflicts etc. 
> Eventually, one can take either of these encap protocols and twist/turn them 
> to achieve what is needed, e.g. in Geneve one can define a standard first TLV 
> that has flag based options etc. as well. In GUE, I can create another IETF 
> draft to define "standard" TLV based usage of "private" data space. That is 
> why, I think many people on the alias feel that many of these encap don't 
> have any substantial value over one another, other than semantic differences. 
> That said, out of the box, IMHO, NSH has the most flexibility i.e. define new 
> MDType, using that MDType for flag based options in fixed header and TLVs for 
> variable length.

What I meant to say is that the format of private data could be typed
in GUE via an option. For instance, we could define a data format for
Geneve and then, presto, GUE can carry Geneve TLVs. While personally I
am very leery about deploying any new use cases of TLVs in my data
center (given the lessons learned with IP options and extension
headers), if allowing them in GUE would achieve a compromise that
pulls us closer to defining _one_ NVo3 protocol, then it seems like a
reasonable direction.

Tom


Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-30 Thread Tom Herbert

> To follow up on Pankaj’s mention of ecosystem support, one comment about the
> viability of TLVs is that whether they are a useful extension mechanism is
> mostly based on the implementer’s perception. If they are seen as an add-on
> that is not really core functionality (as in IPv4 and IPv6), then sure,
> people won’t bother to support them. However, in the case of Geneve, they
> are obviously the major goal of the protocol and they are being implemented,
> in both software and hardware.
>
Jesse,

Your point that TLVs are a major goal of Geneve would be much stronger
if you could reference defined TLVs that are critical to the protocol
function. Maybe I'm missing something, but I can't find any defined
TLVs for Geneve on the web at all.

As for efficient hardware implementation, section 2.2.1 of the Geneve
draft does acknowledge the problem for VLH at least:

"The use of a variable length header and options in a protocol often
raises questions about whether it is truly efficiently implementable
in hardware."

Unfortunately, the rest of the section offers no practical guidance to
HW or SW vendors on how to do efficient implementation.  The closest
thing to a mitigation for the potential performance problem is:

"As new functionality becomes sufficiently well defined to add to
endpoints, supporting options can be designed using ordering
restrictions and other techniques to ease parsing."

This is obviously vague. I suppose the ordering restrictions are the
proposed solution for the combinatorial parsing problem of TLVs. But
without details on how, there are no teeth behind this idea. As for
these "other techniques"...
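
To illustrate the parsing problem, here is a sketch that looks up one
option in a list of Geneve-style options, assuming the option header is a
16-bit option class, an 8-bit type, and a byte whose low 5 bits give the
body length in 4-byte units. The class and type values and the helper
names are made up. With no ordering rule, the number of options walked
before the wanted one is found depends on what the sender put in front of
it:

```python
import struct


def make_option(opt_class: int, opt_type: int, body: bytes) -> bytes:
    """Build one option; the body must be a multiple of 4 bytes."""
    if len(body) % 4 or len(body) > 124:
        raise ValueError("body must be a multiple of 4 bytes, at most 124")
    return struct.pack("!HBB", opt_class, opt_type, len(body) // 4) + body


def find_option(options: bytes, want_class: int, want_type: int):
    """Return (body, options_examined) for the first matching option."""
    offset = 0
    examined = 0
    while offset + 4 <= len(options):
        opt_class, opt_type, len_byte = struct.unpack_from("!HBB", options, offset)
        body_len = (len_byte & 0x1F) * 4
        examined += 1
        if (opt_class, opt_type) == (want_class, want_type):
            return options[offset + 4:offset + 4 + body_len], examined
        offset += 4 + body_len
    return None, examined


opt_a = make_option(0x0101, 1, bytes(4))
opt_b = make_option(0x0102, 2, bytes([1, 2, 3, 4, 0, 0, 0, 0]))

body_late, cost_late = find_option(opt_a + opt_b, 0x0102, 2)
body_early, cost_early = find_option(opt_b + opt_a, 0x0102, 2)
```

An ordering restriction would bound that cost, but only if it is actually
specified.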

> As a result, it’s not an inherent property of TLVs that they are implemented
> or not. In order to make them useful, they need to be used and considered
> important from day 1. Making them available as an additional extensibility
> mechanism pretty much guarantees that they won’t be available in the future,
> as we’ve seen.

Put the VNID into a TLV then you are guaranteed that people will implement them!

Tom


Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-29 Thread Tom Herbert

> A key limitation that prevents software from using extensions is NIC 
> offloads. Both Geneve and VXLAN-GPE+NSH allows extension of these protocols 
> without breaking NIC offloads.

Can you describe why you think this is? Both Geneve and VXLAN-GPE+NSH
are not usable with most implementations of checksum offloads and
probably TSO. As we demonstrated in GUE it is possible to offload
encapsulated checksums by enabling the outer UDP checksum and using
the remote checksum offload option (which we also implemented for
VXLAN).

Tom


Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-29 Thread Tom Herbert

On Thu, Oct 29, 2015 at 1:19 PM, Pankaj Garg <pank...@microsoft.com> wrote:
> Inline.
>
>> -Original Message-
>> From: Tom Herbert [mailto:t...@herbertland.com]
>> Sent: Thursday, October 29, 2015 12:55 PM
>> To: Pankaj Garg <pank...@microsoft.com>
>> Cc: Dino Farinacci <farina...@gmail.com>; Manish Kumar (manishkr)
>> <manis...@cisco.com>; Lucy Yong <lucy.y...@huawei.com>;
>> nvo3@ietf.org
>> Subject: Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using
>> Generic Routing Encapsulation
>>
>> On Thu, Oct 29, 2015 at 12:41 PM, Pankaj Garg <pank...@microsoft.com>
>> wrote:
>> > Inline.
>> >
>> >> -Original Message-
>> >> From: Tom Herbert [mailto:t...@herbertland.com]
>> >> Sent: Thursday, October 29, 2015 12:25 PM
>> >> To: Pankaj Garg <pank...@microsoft.com>
>> >> Cc: Dino Farinacci <farina...@gmail.com>; Manish Kumar (manishkr)
>> >> <manis...@cisco.com>; Lucy Yong <lucy.y...@huawei.com>;
>> nvo3@ietf.org
>> >> Subject: Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using
>> >> Generic Routing Encapsulation
>> >>
>> >> > A key limitation that prevents software from using extensions is
>> >> > NIC offloads. Both Geneve and VXLAN-GPE+NSH allow extension of these
>> >> > protocols without breaking NIC offloads.
>> >>
>> >> Can you describe why you think this is? Both Geneve and VXLAN-GPE+NSH
>> >> are not usable with most implementations of checksum offloads and
>> >> probably TSO. As we demonstrated in GUE it is possible to offload
>> >> encapsulated checksums by enabling the outer UDP checksum and using
>> >> the remote checksum offload option (which we also implemented for
>> >> VXLAN).
>> > [PG] We can define newer TLVs and extend Geneve or NSH without
>> > breaking NIC offloads. The NIC has to support Geneve/NSH offload in the
>> > first place, but after that we can safely extend using TLVs as needed
>> > without dealing with NIC offload changes etc. This includes the range of
>> > offloads from Checksum, LSO, LRO, VMQ, RSS etc.
>>
>> Okay, you are assuming that we would go out and buy NICs that specifically
>> support these protocols in the first place. My belief is that if NIC vendors
>> implement generic offload mechanisms it is a far better route and could
>> support a multitude of encapsulation protocols (see
> [PG] It is up to hardware vendors to decide how to optimally and/or 
> generically do this in hardware, but from software interface perspective, 
> there is non-trivial amount of work that needs to be done for each new 
> offload. It took significant time to get NVGRE and VXLAN offloads working 
> correctly. We don't want to go through that pain, at least not unless we 
> _have_ to and that is the promise of NSH and Geneve. If we can build a 
> generic interface for offloads that NICs implement, that would be great as 
> well, but I am skeptical of the viability of that given the variation in 
> formats of various protocols.

The generic interfaces already exist. Please look at my paper on UDP
encapsulation. We have simple, well-defined interfaces to handle four
of the five basic NIC offloads in a generic way (RX csum, TX csum,
RSS, and LSO). The fifth, LRO, would require NIC support per protocol
but it's pretty straightforward to make a software cognate for that
(but LRO has its own problems anyway...). This applies to VXLAN, GUE,
Geneve, MPLS/UDP, GRE/UDP, IP/UDP and whatever new encapsulations we
would dream up.
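
RSS is perhaps the easiest of these to sketch. With a UDP encapsulation,
the sender can put a hash of the inner flow into the outer UDP source
port, so a NIC that already hashes on the outer addresses and ports
spreads inner flows across queues without knowing the encapsulation. The
hash choice (CRC32) and the function name below are assumptions for
illustration; the port range is the dynamic range that RFC 7348 recommends
for VXLAN:

```python
import zlib

PORT_MIN, PORT_MAX = 49152, 65535  # dynamic/private port range


def outer_source_port(src_ip: str, dst_ip: str, proto: int,
                      src_port: int, dst_port: int) -> int:
    """Map an inner flow to a stable outer UDP source port."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return PORT_MIN + zlib.crc32(key) % (PORT_MAX - PORT_MIN + 1)


# Packets of the same inner flow always get the same outer source port,
# so they stay on one queue and are not reordered.
port_first = outer_source_port("10.0.0.1", "10.0.0.2", 6, 1234, 80)
port_again = outer_source_port("10.0.0.1", "10.0.0.2", 6, 1234, 80)
```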

Tom

>> http://people.netfilter.org/pablo/netdev0.1/papers/UDP-Encapsulation-in-Linux.pdf).
> [PG] I haven't read this fully yet, but will read it sometime.
>>
>> As for "safely extend using TLVs" have you actually verified that works with
>> HW, performance is unaffected, and this does not create new vectors of DOS
>> attacks? (Given the unmitigated disappointment with IP options I'm very
>> skeptical of any deployment of TLVs at L3 or below in the data center.)
> [PG] We have not tested it but maybe some IHVs who are supporting Geneve (or 
> NSH) offload can comment. I have no technical evidence, though, as to why it 
> would _not_ work (sans some implementation constraints on maximum header size 
> etc, which is pretty reasonable).
>>
>> Thanks,
>> Tom
>>
>> >>
>> >> Tom


Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-29 Thread Tom Herbert

On Thu, Oct 29, 2015 at 5:19 PM, Lucy yong <lucy.y...@huawei.com> wrote:
>>
>> "GRE-in-UDP
>> I am not really sure where this fits in for network virtualization. It
>> does not have required ecosystem support to be a viable option and
>> does not solve the need for future encapsulations."
>>
>> Counter argument is that there are many GRE applications today that
>> face load balancing issue. GRE-in-UDP encap addresses this issue.
>> NVGRE disadvantage mentioned below is addressed by GRE-in-UDP. In
>> other words, NVGRE can benefit from the gre-in-udp if it will stay long.
>
> [PG] What is the advantage of moving from NVGRE to GRE-in-UDP, as opposed to 
> moving to VXLAN? VXLAN has much better ecosystem support so one would 
> immediately benefit from that, no? Outside of network virtualization 
> GRE-in-UDP may have use cases and I have never questioned those as I am not 
> familiar with those.
> [Lucy]  VXLAN even did not allow encap of non-ethernet inner packet. -Lucy

And GRE has been around for many years, has seen considerable
deployment and operational experience, and demonstrates a HW feasible
mechanism of extensibility for a low level protocol. In the end, both
NVGRE and VXLAN suffer from the same limitation in lack of
extensibility (VXLAN more than GRE), but NVGRE over UDP seems
reasonable to at least define if there is a deployed base of NVGRE
that could benefit.

Tom

>
>>
>> Lucy
>>
>> -Original Message-
>> From: Pankaj Garg [mailto:pank...@microsoft.com]
>> Sent: Thursday, October 29, 2015 2:10 PM
>> To: Dino Farinacci; Manish Kumar (manishkr)
>> Cc: Tom Herbert; Lucy yong; nvo3@ietf.org
>> Subject: RE: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using
>> Generic Routing Encapsulation
>>
>> NVGRE and VXLAN
>> NVGRE is widely used in datacenter networks and there is wide hardware
>> support available for it (both in terms of NICs and Switches). The key
>> advantage VXLAN has over NVGRE is that it allows larger entropy and
>> doesn't require any change in middle boxes to calculate hash and do
>> ECMP. Most network virtualization solutions are supporting both of
>> these today. Even though in long term, things may settle on a single
>> encap format, that is not practically possible in the short term. In
>> my opinion, both NVGRE and VXLAN are here to stay for quite a while
>> given the investment and usage of these technologies. Could we have
>> settled on just one? Yes, we could have, but I am not sure either of these
>> would be the one (read further).
>>
>> Shortcomings:
>> About 2-3 years back, we identified that both of these encaps
>> have one severe limitation (well, VXLAN had two), i.e., the ability (or
>> lack thereof) to extend them. In addition, VXLAN did not even allow
>> encap of non-Ethernet inner packets. The main problem for network
>> virtualization, however, was extensibility. There are a lot of use
>> cases where such extensibility can add significant advantage to
>> encapsulation. It seems other people were thinking of the same thing
>> at the same time. This resulted in 3 options for a "future" encap:
>> VXLAN-GPE + NSH, Geneve, and GUE.
>>
>> VXLAN-GPE + NSH, Geneve and GUE
>> Out of the three, I personally prefer the first two, partly because I
>> have been deeply involved in the extensibility design of these but
>> more importantly I feel they offer advantage over GUE. VXLAN-GPE + NSH
>> seems to be the most flexible of the present options as it allows
>> extensibility in hardware and software friendly manner. When we think
>> about extensibility, there is always struggle between hardware and
>> software. While software can evolve much faster, hardware cannot and
>> we have to design something that does not restrict software to
>> innovate but eventually let hardware catch up and use those innovations as 
>> well.
>>
>> A key limitation that prevents software from using extensions is NIC
>> offloads. Both Geneve and VXLAN-GPE+NSH allow extension of these
>> protocols without breaking NIC offloads. This is a huge advantage given
>> that in a network virtualization environment, typically most endpoints
>> are software. This allows software vendors to innovate faster without
>> worrying about the whole ecosystem aspects of it. Both Geneve and NSH
>> provide this extensibility using a TLV format, and they both use the
>> exact same TLV format (by choice).
>>
>> However, not all endpoints can be software and eventually for some of
>> those innovations (such as security or OAM etc.) it would be required
>

Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-29 Thread Tom Herbert

>> As for "safely extend using TLVs" have you actually verified that works with
>> HW, performance is unaffected, and this does not create new vectors of DOS
>> attacks? (Given the unmitigated disappointment with IP options I'm very
>> skeptical of any deployment of TLVs at L3 or below in the data center.)
> [PG] We have not tested it but maybe some IHVs who are supporting Geneve (or 
> NSH) offload can comment. I have no technical evidence, though, as to why it 
> would _not_ work (sans some implementation constraints on maximum header size 
> etc, which is pretty reasonable).

The technical evidence can be found in evaluating mechanisms of
similar protocols at the same layer. In particular, Geneve TLVs are like
IP options, and NSH is similar to IP extension headers. Both IP options
and extension headers are currently undeployable in the data center
due to insufficient or incorrect HW support in switches and NICs, and I
haven't (yet) seen a reasonable explanation or implementation showing
that the fate of Geneve TLVs or NSH will be any different.

GUE follows the model of extensibility of GRE. In developing GUE, we
already had a lot of deployment experience with GRE fields (keyid,
csum, seq). The various HW we tested had no issues with them, so we do
believe GUE extensions can be deployed at scale.
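
To make the GRE model concrete: each optional field is gated by a single
flag bit and sits at a fixed position, so a parser needs no loop over
variable-length options. A minimal sketch in Python, assuming the
RFC 2784/2890 GRE layout (C, K and S flag bits; checksum, key and
sequence number in that order). The name parse_gre is illustrative only,
and this is not taken from any GRE or GUE implementation.

```python
import struct

# Flag bits in the first 16-bit word of the GRE header (RFC 2784/2890).
GRE_CSUM = 0x8000  # C bit: checksum + reserved1 present
GRE_KEY = 0x2000   # K bit: key present
GRE_SEQ = 0x1000   # S bit: sequence number present


def parse_gre(buf: bytes) -> dict:
    """Parse a GRE header; optional fields appear in a fixed order."""
    flags_ver, proto = struct.unpack_from("!HH", buf, 0)
    off = 4
    out = {"proto": proto, "csum": None, "key": None, "seq": None}
    if flags_ver & GRE_CSUM:
        # 16-bit checksum followed by 16 reserved bits.
        out["csum"], _reserved1 = struct.unpack_from("!HH", buf, off)
        off += 4
    if flags_ver & GRE_KEY:
        (out["key"],) = struct.unpack_from("!I", buf, off)
        off += 4
    if flags_ver & GRE_SEQ:
        (out["seq"],) = struct.unpack_from("!I", buf, off)
        off += 4
    out["hdr_len"] = off
    return out


# Example: key and sequence number present, no checksum.
hdr = struct.pack("!HHII", GRE_KEY | GRE_SEQ, 0x6558, 0xABCDEF01, 7)
info = parse_gre(hdr)
```

The header length follows directly from the flags word, which is the
property that makes this style of extension straightforward to handle
at a fixed offset.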

Anyway, the model of extensibility is a significant differentiator in
the three NVo3 protocols. I hope there will be some detailed
discussion around this at some point!

Thanks,
Tom


>>
>> Thanks,
>> Tom
>>
>> >>
>> >> Tom

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] Fwd: New Version Notification for draft-matsuhira-me6e-pr-00.txt

2015-10-19 Thread Tom Herbert

On Sun, Oct 18, 2015 at 6:17 PM, Naoki Matsuhira
 wrote:
> This is new proposal for nvo3.
>
This looks like it is potentially a subset of ILA functionality.
Please look at https://www.ietf.org/id/draft-herbert-nvo3-ila-01.txt.
We could encode a VLAN and Ethernet address into an identifier with a
new identifier type.

Thanks,
Tom

> I think nvo3 might find this interesting.
>
> Regards,
>
> Naoki Matsuhira
>
>  Original Message 
> Subject: New Version Notification for draft-matsuhira-me6e-pr-00.txt
> Date: Sat, 17 Oct 2015 22:22:59 -0700
> From: 
> To: Naoki Matsuhira , Naoki Matsuhira
> 
>
>
> A new version of I-D, draft-matsuhira-me6e-pr-00.txt
> has been successfully submitted by Naoki Matsuhira and posted to the
> IETF repository.
>
> Name:   draft-matsuhira-me6e-pr
> Revision:   00
> Title:  Multiple Ethernet - IPv6 address mapping encapsulation - 
> prefix
> resolution
> Document date:  2015-10-18
> Group:  Individual Submission
> Pages:  8
> URL:
> https://www.ietf.org/internet-drafts/draft-matsuhira-me6e-pr-00.txt
> Status: https://datatracker.ietf.org/doc/draft-matsuhira-me6e-pr/
> Htmlized:   https://tools.ietf.org/html/draft-matsuhira-me6e-pr-00
>
>
> Abstract:
>This document specifies Multiple Ethernet - IPv6 address mapping
>encapsulation - Prefix Resolution (ME6E-PR) specification.  ME6E-PR
>makes expansion ethernet network over IPv6 backbone network with
>encapsulation technology.  And also, ME6E-PR can stack multiple
>Ethernet networks.  ME6E-PR work on non own routing domain.
>
>
>
>
>
>
> Please note that it may take a couple of minutes from the time of submission
> until the htmlized version and diff are available at tools.ietf.org.
>
> The IETF Secretariat
>
>
> ___
> nvo3 mailing list
> nvo3@ietf.org
> https://www.ietf.org/mailman/listinfo/nvo3

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

[nvo3] Fwd: New Version Notification for draft-herbert-nvo3-ila-01.txt

2015-10-12 Thread Tom Herbert

Changes from 00
  - Moved motivation section earlier in the draft and expanded it
  - ILA addresses (locators) are only used in destination addresses
  - Described the network virtualization architecture with
reference to NVo3 arch
  - Moved techniques to generate task identifiers to the appendix
  - Update email address

Posted to v6ops also.

Thanks,
Tom

-- Forwarded message --
From:  <internet-dra...@ietf.org>
Date: Mon, Oct 12, 2015 at 10:06 AM
Subject: New Version Notification for draft-herbert-nvo3-ila-01.txt
To: Tom Herbert <t...@herbertland.com>



A new version of I-D, draft-herbert-nvo3-ila-01.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-nvo3-ila
Revision:   01
Title:  Identifier-locator addressing for network virtualization
Document date:  2015-10-09
Group:  Individual Submission
Pages:  30
URL:
https://www.ietf.org/internet-drafts/draft-herbert-nvo3-ila-01.txt
Status: https://datatracker.ietf.org/doc/draft-herbert-nvo3-ila/
Htmlized:   https://tools.ietf.org/html/draft-herbert-nvo3-ila-01
Diff:   https://www.ietf.org/rfcdiff?url2=draft-herbert-nvo3-ila-01

Abstract:
   This specification describes identifier-locator addressing (ILA) in
   IPv6 for network virtualization. Identifier-locator addressing
   differentiates between location and identity of a network node. Part
   of an address expresses the immutable identity of the node, and
   another part indicates the location of the node which can be dynamic.
   Identifier-locator addressing can be used to efficiently implement
   overlay networks for network virtualization




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] RFC 7637 on NVGRE: Network Virtualization Using Generic Routing Encapsulation

2015-10-01 Thread Tom Herbert

Hi Pankaj,

Do you think there is any value, intent, or issue for doing NVGRE/UDP
(via https://tools.ietf.org/html/draft-ietf-tsvwg-gre-in-udp-encap-07)?

Tom

On Thu, Sep 24, 2015 at 1:43 PM, Pankaj Garg  wrote:
> FYI, NVGRE is published as Informational RFC 7637. In your documents that
> reference NVGRE, please use this RFC number.
>
> Thanks
> Pankaj
>
>> To: ietf-announce at ietf.org, rfc-dist at rfc-editor.org
>> Subject: RFC 7637 on NVGRE: Network Virtualization Using Generic Routing
>> Encapsulation
>> From: rfc-editor at rfc-editor.org
>> Date: Wed, 23 Sep 2015 15:16:09 -0700 (PDT)
>> Archived-at: > announce/KPKrdzVjAGl5H931oM-D1ce71zQ>
>> Cc: rfc-editor at rfc-editor.org
>> Delivered-to: ietf-announce at ietfa.amsl.com
>> Reply-to: ietf at ietf.org
>>
>>   A new Request for Comments is now available in online RFC libraries.
>>
>>
>> RFC 7637
>>
>> Title:  NVGRE: Network Virtualization Using Generic
>> Routing Encapsulation
>> Author: P. Garg, Ed.,
>> Y. Wang, Ed.
>> Status: Informational
>> Stream: Independent
>> Date:   September 2015
>> Mailbox:pankajg at microsoft.com,
>> yushwang at microsoft.com
>> Pages:  17
>> Characters: 40042
>> Updates/Obsoletes/SeeAlso:   None
>>
>> I-D Tag:draft-sridharan-virtualization-nvgre-08.txt
>>
>> URL:https://www.rfc-editor.org/info/rfc7637
>>
>> DOI:http://dx.doi.org/10.17487/RFC7637
>>
>> This document describes the usage of the Generic Routing Encapsulation
>> (GRE) header for Network Virtualization (NVGRE) in multi-tenant data
>> centers.  Network Virtualization decouples virtual networks and addresses
>> from physical network infrastructure, providing isolation and concurrency
>> between multiple virtual networks on the same physical network
>> infrastructure.  This document also introduces a Network Virtualization
>> framework to illustrate the use cases, but the focus is on specifying the
>> data-plane aspect of NVGRE.
>>
>>
>> INFORMATIONAL: This memo provides information for the Internet
>> community.
>> It does not specify an Internet standard of any kind. Distribution of this
>> memo is unlimited.
>>
>> This announcement is sent to the IETF-Announce and rfc-dist lists.
>> To subscribe or unsubscribe, see
>>   https://www.ietf.org/mailman/listinfo/ietf-announce
>>   https://mailman.rfc-editor.org/mailman/listinfo/rfc-dist
>>
>> For searching the RFC series, see https://www.rfc-editor.org/search For
>> downloading RFCs, see https://www.rfc-editor.org/rfc.html
>>
>> Requests for special distribution should be addressed to either the author of
>> the RFC in question, or to rfc-editor at rfc-editor.org.  Unless specifically
>> noted otherwise on the RFC itself, all RFCs are for unlimited distribution.
>>
>>
>> The RFC Editor Team
>> Association Management Solutions, LLC
>
> ___
> nvo3 mailing list
> nvo3@ietf.org
> https://www.ietf.org/mailman/listinfo/nvo3

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] destination UDP port : draft-ietf-nvo3-vxlan-gpe-00

2015-09-24 Thread Tom Herbert

> What we may want to say, then, is that if a P bit of 0 is used then none of
> the other flags must be set.  This would prevent someone from generating a
> packet with a P bit of 0 and trying to use new GPE features.
>
> [Lucy] The P bit also serves a versioning purpose.  The rule is that if GPE
> needs to support new features for an Ethernet payload, it has to set the P
> bit and use the protocol field to indicate an Ethernet payload.
>
> The rule is: GPE accepts two modes as valid for an Ethernet payload: the
> special case of P=0, and P=1 with Protocol = Ethernet (0x03). When a P bit
> of 0 is used, none of the other flags may be set.
>
>
>
> This works only under the condition/rule that VXLAN protocol enhancement
> MUST be stopped.
>
Lucy,

I'm not sure why the constraint needs to be so stringent. The P bit
can be used to discriminate between VXLAN and VXLAN-GPE received on
the same port, i.e., if the P bit is set, it is a VXLAN-GPE packet. If it
is not set, it is a VXLAN packet and the rest of the fields are processed
accordingly. New features can be added to either protocol in this way
using the other reserved bits independently. This allows backwards
compatibility to receive VXLAN on a VXLAN-GPE port (doesn't help with
the forward compatibility problem though).
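
A rough sketch of the receive-side demultiplexing described above, in
Python. The bit positions are assumptions taken from the VXLAN-GPE draft
layout (I bit 0x08 and P bit 0x04 in the first byte, Next Protocol in the
fourth byte); the function name classify is illustrative, not from any
implementation.

```python
VXLAN_I_BIT = 0x08  # VNI-valid flag from RFC 7348
GPE_P_BIT = 0x04    # assumed position of the VXLAN-GPE P bit


def classify(hdr: bytes) -> tuple:
    """Return (protocol, next_proto, vni) for an 8-byte VXLAN/VXLAN-GPE header."""
    if len(hdr) < 8:
        raise ValueError("header too short")
    flags = hdr[0]
    vni = int.from_bytes(hdr[4:7], "big")
    if flags & GPE_P_BIT:
        # P set: VXLAN-GPE, the fourth byte carries the Next Protocol value.
        return ("vxlan-gpe", hdr[3], vni)
    # P clear: plain VXLAN, the payload is always an Ethernet frame.
    return ("vxlan", None, vni)


plain = bytes([VXLAN_I_BIT, 0, 0, 0, 0, 0, 0x2A, 0])
gpe = bytes([VXLAN_I_BIT | GPE_P_BIT, 0, 0, 0x03, 0, 0, 0x2A, 0])
```

After the P-bit test, each branch is free to interpret the remaining
reserved bits under its own protocol's rules.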

Tom

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] Reply: Application of a time slot in this ietf meeting//Re: New draft: Path Detection in VXLAN Overlay Network

2015-07-03 Thread Tom Herbert

"The Extendable TLV field contains two TLVs. Both of them are set by
the network devices along the transport path." This might be
problematic. From draft-ietf-tsvwg-port-use-11.txt: "Ultimately, port
numbers indicate services only to the endpoints, and any
intermediate device that assigns meaning to a value can be
incorrect." A host may legitimately send UDP packets to the VXLAN
port number that aren't actually VXLAN, so if intermediate devices are
modifying these packets based just on the destination port, couldn't
this result in data corruption? Magic numbers like those defined in SPUD
or https://tools.ietf.org/html/draft-herbert-udp-magic-numbers-00 could
mitigate such an issue.

Tom





On Fri, Jul 3, 2015 at 10:23 AM, Dapeng Liu maxpass...@gmail.com wrote:


 On Thursday, July 2, 2015, at 12:04 AM, Dacheng Zhang wrote:

 HI, Shahram:

 Thanks for the comments.

 See my reply inline please..

 Cheers

 Dacheng

 From: Shahram Davari dav...@broadcom.com
 Date: Tuesday, June 30, 2015, 2:12 AM
 To: dacheng de dacheng@alibaba-inc.com
 Cc: nvo3@ietf.org nvo3@ietf.org, bens...@queuefull.net
 bens...@queuefull.net, matthew.bo...@alcatel-lucent.com
 matthew.bo...@alcatel-lucent.com, Dapeng Liu maxpass...@gmail.com
 Subject: RE: [nvo3] Application of a time slot in this ietf meeting//Re: New
 draft: Path Detection in VXLAN Overlay Network





 I read your draft and I don't think it can get the information that you
 claim it does.



 For example you have ingress interface TLV to record the ingress port of the
 ingress PE. First of all this interface is already communicated by the
 controller to the Ingress PE.



 Ingress/egress interface refers to the interfaces on the data path. The
 controller will use another interface to communicate with the PE.



 SD What I mean is that the controller is asking the PE to inject this test
 packet to a tunnel via a specific PE interface. So the controller already
 knows which PE interfaces the packet is going to use.


 Dacheng\ That is why, for the ingress PE, only EIID is mandatory.



 Secondly if you want to test the Correct IP forwarding you can just use BFD
 for the outer IP. If you like to use UDP, you could do BFD over UDP over IP.

 BFD works at layer 2. It’s mainly used to test whether the layer 2 data path
 is connected. In contrast, our method is focusing on the tracing function.
 (Our solution can help find out which switch on the path is broken.)



 SD The way you have defined it does not test anything in L2, since your
 packet is not exercising any L2 forwarding. Your packet is L3 forwarded all
 the way.


 Dacheng\  I am trying to understand your point, so please correct me if I am
 wrong. The purpose of this work is to check for errors on paths over an L3
 network. (Please see figure 1.)



 Thirdly, your draft can't get the egress interface information of the egress
 PE, since there is nothing (No MAC or IP address) in your VXLAN payload that
 can be forwarded by the egress PE to the egress interface.

 Two TLV (IIID/EIID) could be used to record the ingress/egress interfaces ID
 of current router the OAM packet is  flowing through. For the ingress PE,
 EIID is mandatory while for the egress PE IIID is mandatory. For the
 intermediate router, both IIID and EIID are mandatory. So, for the egress,
 EIID does not have to be transported to the controller.



 SD I know you have the TLVs. But what I mean is that there is nothing in your
 draft that causes your packets to be L2 forwarded. For example, you are not
 doing any (VXLAN/VNI forwarding) of your packet. As an example, assume your
 packet arrives at the final egress PE.  The egress PE can’t forward this
 packet based on the IP destination address since it is 127/8.


 Dacheng\ Ok, our objective is to check the paths between vteps. So, the
 egress PE does not have to forward this packet to the tenant. Note we
 mentioned that  for the egress PE only IIID is mandatory.



 Fourthly the path trace that you describe does not work. How does an
 intermediate router know it has to send a copy of this packet to CPU? The
 only method is to use TTL expiry, which already exists in IP trace route.



 In our method, the O bit in VxLAN is used to indicate it’s an OAM packet, which
 should be copied to the CPU. One OAM packet is able to trace n devices on the
 path, while the TTL expiry method needs n OAM packets to trace. Of course, if
 you assume that the intermediate devices do not support our solution, our
 mechanism will not work properly.



 SD Are you suggesting that the intermediate routers need to do deep packet
 inspection and, after finding this magical bit, copy the packet to the CPU?
 This is a layer violation and should NOT be done.  You should not make a
 decision at any layer when the layers above it are not terminated.



 Basically you are expecting intermediate routers to look in to the packet,
 skip outer Ethernet, Skip IP, check IP payload is UDP, check UDP-Dest port
 is VXLAN, then check a specific bit in VXLAN header in order to decide to
 copy to CPU or not.  This

Re: [nvo3] New draft: Path Detection in VXLAN Overlay Network

2015-06-29 Thread Tom Herbert

On Mon, Jun 29, 2015 at 6:48 AM, Deepak Kumar (dekumar)
deku...@cisco.com wrote:
 Hi Dapeng Liu,

 I support the idea of hardware and controller based path detection and tracking
 where the whole network is OAM capable and the packet keeps forwarding in
 hardware. I believe you guys presented this solution at the ONS summit also.
 You should explicitly call out in the draft that this solution is not backward
 compatible and that all switches are required to be OAM capable in hardware to
 look at the new bit to punt and copy the packet, even in the underlay.

VXLAN-GPE already defines an OAM bit. If all the hardware needs to be
updated anyway, why not just move to that and avoid having to worry
about the compatibility problem?

Thanks,
Tom

 Thanks,
 Deepak

 From: Dapeng Liu maxpass...@gmail.com
 Date: Saturday, June 20, 2015 at 10:04 AM
 To: nvo3@ietf.org
 Subject: [nvo3] New draft: Path Detection in VXLAN Overlay Network

 Hello all,

 We have submitted a draft for path detection in VXLAN  overlay network.
 http://datatracker.ietf.org/doc/draft-pang-nvo3-vxlan-path-detection/

 The draft proposes a method for path detection in a VXLAN network and it
 defines the path detection packet format by using one reserved bit in the
 VXLAN header.

 Comments and suggestions are welcome.


 --
 Dapeng Liu


 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3


___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] New draft: Path Detection in VXLAN Overlay Network

2015-06-23 Thread Tom Herbert

On Sat, Jun 20, 2015 at 1:04 PM, Dapeng Liu maxpass...@gmail.com wrote:
 Hello all,

 We have submitted a draft for path detection in VXLAN  overlay network.
 http://datatracker.ietf.org/doc/draft-pang-nvo3-vxlan-path-detection/

 The draft proposes a method for path detection in a VXLAN network and it
 defines the path detection packet format by using one reserved bit in the
 VXLAN header.

 Comments and suggestions are welcome.


"PD (1 bit) - Indicates it is a PD packet and needs to be handled as
specified in this document." This does not seem forward compatible,
since per RFC 7348 reserved bits are ignored on receipt. Since the
draft purposely is putting valid headers in the VXLAN payload, I
don't see anything that would prevent misinterpretation if such a
packet is sent to a legacy device. Other attempts to extend VXLAN have
hit this same problem, and it is probably also an issue for NVGRE.
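
The concern can be illustrated with a small Python sketch of a legacy
receiver that follows the RFC 7348 rule of ignoring reserved bits. The
PD bit position (0x01) and the function name are hypothetical, chosen
only for illustration; the draft's actual bit assignment is not
reproduced here.

```python
VXLAN_I_BIT = 0x08  # VNI-valid flag from RFC 7348
PD_BIT = 0x01       # hypothetical position of the proposed PD bit


def legacy_vxlan_receive(pkt: bytes):
    """Model a pre-PD receiver: only the I bit is examined."""
    flags = pkt[0]
    if not flags & VXLAN_I_BIT:
        return None  # no valid VNI, drop
    vni = int.from_bytes(pkt[4:7], "big")
    # All other flag bits are ignored, so whatever follows the header
    # is handed up as if it were an inner Ethernet frame.
    return vni, pkt[8:]


inner = b"\xaa" * 14
normal = bytes([VXLAN_I_BIT, 0, 0, 0, 0, 0, 7, 0]) + inner
with_pd = bytes([VXLAN_I_BIT | PD_BIT, 0, 0, 0, 0, 0, 7, 0]) + inner
```

Both packets take the same path through this receiver, which is why a
path detection packet could be delivered to a tenant as ordinary data.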

Thanks,
Tom




 --
 Dapeng Liu


 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3


___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] New draft: Path Detection in VXLAN Overlay Network

2015-06-23 Thread Tom Herbert

On Tue, Jun 23, 2015 at 9:24 PM, Liyizhou liyiz...@huawei.com wrote:
 Probably it requires at least the edge nodes to be all upgraded to prevent 
 the leaking to the user. Either the deployment should guarantee that or some 
 mechanism like capability exchange would be used.

 IMHO prohibiting any usage of reserved bits is frustrating and goes against
 the original intention of even having reserved bits.

Indeed it is frustrating. This is precisely why in GUE we require a
receiver to drop any packet with unknown flag bits set-- this avoids
the forward compatibility problem from the get-go.
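
A minimal Python sketch of that receive rule. The 16-bit flags width and
the KNOWN_FLAGS value are placeholders for illustration, not the actual
GUE flag assignments.

```python
KNOWN_FLAGS = 0x0003  # hypothetical: the flag bits this receiver implements


def accept(flags: int) -> bool:
    """Accept a packet only if every flag bit that is set is understood."""
    unknown = flags & ~KNOWN_FLAGS & 0xFFFF
    return unknown == 0


results = [accept(0x0000), accept(0x0001), accept(0x0004), accept(0x0005)]
```

A sender that sets a newer flag is then never misinterpreted by an older
receiver; the packet is dropped instead.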

In any case, this problem should at least be mentioned in the draft
and what steps need to be taken to avoid issues in practice.

Tom


 Thanks,
 Yizhou

 -Original Message-
 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Tom Herbert
 Sent: Tuesday, June 23, 2015 10:21 PM
 To: Dapeng Liu
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] New draft: Path Detection in VXLAN Overlay Network

 On Sat, Jun 20, 2015 at 1:04 PM, Dapeng Liu maxpass...@gmail.com wrote:
 Hello all,

 We have submitted a draft for path detection in VXLAN  overlay network.
 http://datatracker.ietf.org/doc/draft-pang-nvo3-vxlan-path-detection/

 The draft proposes a method for path detection in a VXLAN network and it
 defines the path detection packet format by using one reserved bit in
 the VXLAN header.

 Comments and suggestions are welcome.


 "PD (1 bit) - Indicates it is a PD packet and needs to be handled as
 specified in this document." This does not seem forward compatible, since
 per RFC 7348 reserved bits are ignored on receipt. Since the draft
 purposely is putting valid headers in the VXLAN payload, I don't see anything
 that would prevent misinterpretation if such a packet is sent to a legacy
 device. Other attempts to extend VXLAN have hit this same problem, and it is
 probably also an issue for NVGRE.

 Thanks,
 Tom




 --
 Dapeng Liu


 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3


 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] Reply: VxLAN Security Consideration

2015-06-03 Thread Tom Herbert

On Wed, Jun 3, 2015 at 2:20 AM, Liuyuanjiao liuyuanj...@huawei.com wrote:
 Dear Tom:

 The GUE can resolve the VNI to be shown, but GUE means another 
 module, not vxlan module. So the vxlan packet or vxlan payload should be 
 encrypted into the GUE payload.
 I feel this is a little heavy for the device and network. But I am not 
 sure for it.

I don't believe it's possible to add payload encryption to VXLAN
without breaking forward compatibility since the protocol is not
extensible (we already saw this with the attempt to add a next header
protocol number, and this is also the problem with the security
option proposal). So this would mean a new port number, which
essentially implies a new protocol anyway.

Maybe it's possible to do this in VXLAN-GPE with some NSH header?

Tom


 Best Regards
 Liu Yuanjiao

 -Original Message-
 From: Tom Herbert [mailto:t...@herbertland.com]
 Sent: June 3, 2015 10:59
 To: Dacheng Zhang
 Cc: Michael Shieh; David Mozes; Xuxiaohu; nvo3@ietf.org; Liuyuanjiao
 Subject: Re: [nvo3] VxLAN Security Consideration

 On Tue, Jun 2, 2015 at 6:57 PM, Dacheng Zhang dacheng@alibaba-inc.com 
 wrote:
  I think both ipsec and dtls would work.

 The middle network is not controlled by the customer or the service
 provider; it’s provided by a 3rd-party company, so the environment is not
 trusted, and we need to encrypt the VxLAN packets or VxLAN payload for our
 user data.

 Currently there is no such specific method; I think we need to provide one
 way to resolve it.

 A question for Yuanjian, are there any cases in which we need to only
 encrypt the vxlan payloads while transporting the headers in plain
 text? If so, the condition could be a little more complex.

 We have a payload encryption option in GUE using DTLS
 (https://tools.ietf.org/html/draft-hy-gue-4-secure-transport-02). This allows
 the GUE headers to be sent in plaintext for inspection by middleboxes. One
 nice feature is that this is done as an extension to the protocol so that we
 don't need another port like just doing DTLS over UDP payload would require.
 We do need to consider security of the header itself in this case, though.

 Tom


 Cheers

 Dacheng







 Best Regards

  Liu Yuanjiao


 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3



 This message is for the designated and authorized recipient only and
 may contain privileged, proprietary, confidential or otherwise private
 information relating to vArmour Networks, Inc. and is the sole
 property of vArmour Networks, Inc.  Any views or opinions expressed
 are solely those of the author and do not necessarily represent those of 
 vArmour Networks, Inc.
 If you have received this message in error, or if you are not
 authorized to receive it, please notify the sender immediately and
 delete the original message and any attachments from your system
 immediately. If you are not a designated or authorized recipient, any
 other use or retention of this message or its contents is prohibited.
 ___ nvo3 mailing list
 nvo3@ietf.org https://www.ietf.org/mailman/listinfo/nvo3

 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3

 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] VxLAN Security Consideration

2015-06-02 Thread Tom Herbert

On Tue, Jun 2, 2015 at 6:57 PM, Dacheng Zhang
dacheng@alibaba-inc.com wrote:
  I think both ipsec and dtls would work.

 The middle network is not controlled by the customer or the service
 provider; it’s provided by a 3rd-party company, so the environment is not
 trusted, and we need to encrypt the VxLAN packets or VxLAN payload for our
 user data.

 Currently there is no such specific method; I think we need to provide one
 way to resolve it.

 A question for Yuanjian, are there any cases in which we need to only
 encrypt the vxlan payloads while transporting the headers in plain text? If
 so, the condition could be a little more complex.

We have a payload encryption option in GUE using DTLS
(https://tools.ietf.org/html/draft-hy-gue-4-secure-transport-02). This
allows the GUE headers to be sent in plaintext for inspection by
middleboxes. One nice feature is that this is done as an extension to
the protocol so that we don't need another port like just doing DTLS
over UDP payload would require. We do need to consider security
of the header itself in this case, though.

Tom


 Cheers

 Dacheng







 Best Regards

  Liu Yuanjiao


 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3




 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3


___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] [trill] Fwd: Mail regarding draft-ietf-trill-over-ip

2015-05-05 Thread Tom Herbert

On Tue, May 5, 2015 at 10:53 AM, Joe Touch to...@isi.edu wrote:


 On 5/5/2015 9:39 AM, Templin, Fred L wrote:
 Hi Joe,
 ..
 IP in UDP adds only port numbers and an Internet checksum.

 That doesn't address fragmentation; if outer fragmentation is assumed,
 IPv4 needs to be rate-limited to avoid ID collisions and the Internet
 checksum is insufficient to correct those collisions.

 Right - that is why we have GUE. But, when these functions are not
 needed GUE can perform header compression and the result looks
 exactly like IP in UDP.

 That seems impossible.

 The outer IP header indicates UDP as next-protocol, and GUE based on the
 port number.

 You can't then compress the GUE header to nothing. You still need at
 least one bit somewhere to indicate compressed GUE header, and there's
 nothing left.

As I described previously, the first two bits of the GUE header are the
version number. If we reserve version 0x1, then an IPv6 or IPv4 packet can
be directly encapsulated on the same port as GUE (version 0)-- this is
what Fred means by GUE header compression. No new port is needed for
this and it is backwards compatible with GUE.
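
A sketch in Python of how a receiver on the GUE port might tell the two
cases apart, assuming only what is stated above: the top two bits of the
first byte are the version, and IPv4 (0100) and IPv6 (0110) both begin
with binary 01. The function name and the returned labels are
illustrative, not part of any GUE implementation.

```python
def classify_gue_payload(udp_payload: bytes) -> str:
    """Dispatch on the top two bits of the first byte of the UDP payload."""
    if not udp_payload:
        raise ValueError("empty payload")
    first = udp_payload[0]
    version = first >> 6  # two-bit GUE version field
    if version == 0:
        return "gue"  # a regular GUE header follows
    if version == 1:
        # Directly encapsulated IP: the same byte is the start of the
        # inner IP header, so its top nibble is the IP version.
        ip_version = first >> 4
        if ip_version == 4:
            return "ipv4"
        if ip_version == 6:
            return "ipv6"
    return "drop"  # unknown version or unsupported IP version


labels = [classify_gue_payload(bytes([b])) for b in (0x00, 0x45, 0x60, 0x50, 0x80)]
```

No bytes are added in the direct case: the IP version nibble doubles as
the discriminator.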

 And no, I don't think compressed GUE qualifies as an independently
 useful service that warrants a separate UDP port.

 Joe

 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] [trill] Fwd: Mail regarding draft-ietf-trill-over-ip

2015-05-05 Thread Tom Herbert

On Tue, May 5, 2015 at 11:47 AM, Templin, Fred L
fred.l.temp...@boeing.com wrote:
 Hi Joe,

 -Original Message-
 From: Joe Touch [mailto:to...@isi.edu]
 Sent: Tuesday, May 05, 2015 11:26 AM
 To: Templin, Fred L; Xuxiaohu; Donald Eastlake; tr...@ietf.org
 Cc: nvo3@ietf.org; int-a...@ietf.org; s...@ietf.org
 Subject: Re: [trill] Fwd: Mail regarding draft-ietf-trill-over-ip

 On 5/5/2015 11:04 AM, Templin, Fred L wrote:
  Hi Joe,

  -Original Message-
  From: Joe Touch [mailto:to...@isi.edu]
  Sent: Tuesday, May 05, 2015 10:54 AM
  To: Templin, Fred L; Xuxiaohu; Donald Eastlake; tr...@ietf.org
  Cc: nvo3@ietf.org; int-a...@ietf.org; s...@ietf.org
  Subject: Re: [trill] Fwd: Mail regarding draft-ietf-trill-over-ip

  On 5/5/2015 9:39 AM, Templin, Fred L wrote:
  Hi Joe,
  ..
  IP in UDP adds only port numbers and an Internet checksum.

  That doesn't address fragmentation; if outer fragmentation is assumed,
  IPv4 needs to be rate-limited to avoid ID collisions and the Internet
  checksum is insufficient to correct those collisions.

  Right - that is why we have GUE. But, when these functions are not
  needed GUE can perform header compression and the result looks
  exactly like IP in UDP.

  That seems impossible.

  Not impossible - Tom Herbert provided the solution:

  http://www.ietf.org/mail-archive/web/int-area/current/msg04593.html

 That is allocating bits (or bit patterns) from the IP header.

 The solution provided - to check for 0x01 - is incorrect. IP can have
 versions that include 0x10 and 0x11.

 The version field in both IPv4 and IPv6 have that bit set to 1. If GUE
 then deems that bit to indicate direct IP encapsulation, then there
 is no need for a GUE header of length greater than 0.

 You may say that future IP protocol versions might not have that bit
 set in the version field. But, the version bits for IPv4 and IPv6 will
 never change (by definition) and we do not see a new IP protocol
 version replacing IPv4 or IPv6 on the near-term horizon.

 Even if a new IP protocol version emerged with the direct IP
 encapsulation bit set to 0, that version can still be accommodated
 by GUE. It's just that direct encapsulation cannot be used and a
 non-zero-length GUE header is needed.

Or just define a simple version translation as part of encapsulation.
So for IPv8:

0x1000->0x0101 on encapsulation
0x0101->0x1000 on decapsulation
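
A sketch of that translation in Python, reading the values above as
binary nibbles (1000 and 0101). "IPv8" is the hypothetical future version
from the discussion, and the mapping table and function names are
illustrative. The sketch also assumes that the value 0101 is not
otherwise carried on this port, since decapsulation cannot tell a
translated packet from a genuine version 5 one.

```python
# Version nibble rewritten on encapsulation so the top two bits read 01.
ENCAP_MAP = {0b1000: 0b0101}
DECAP_MAP = {v: k for k, v in ENCAP_MAP.items()}


def _rewrite_version(pkt: bytes, table: dict) -> bytes:
    """Replace the top nibble of the first byte according to the table."""
    version = pkt[0] >> 4
    new_version = table.get(version, version)  # unmapped versions pass through
    return bytes([(new_version << 4) | (pkt[0] & 0x0F)]) + pkt[1:]


def encap(pkt: bytes) -> bytes:
    return _rewrite_version(pkt, ENCAP_MAP)


def decap(pkt: bytes) -> bytes:
    return _rewrite_version(pkt, DECAP_MAP)


v8 = b"\x80abc"
on_wire = encap(v8)
restored = decap(on_wire)
```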

 Thanks - Fred
 fred.l.temp...@boeing.com

 The only solution would be to say that if the first three bits were 0,
 then it's not an IP packet - but that would require reassigning 0x
 and 0x0001 for GUE purposes.

 Although that's possible, I don't see why we would allocate IP versions
 to GUE message types.

 Joe


Re: [nvo3] [trill] Fwd: Mail regarding draft-ietf-trill-over-ip

2015-05-05 Thread Tom Herbert

On Tue, May 5, 2015 at 11:26 AM, Joe Touch to...@isi.edu wrote:

 On 5/5/2015 11:04 AM, Templin, Fred L wrote:
 Hi Joe,

 -Original Message-
 From: Joe Touch [mailto:to...@isi.edu]
 Sent: Tuesday, May 05, 2015 10:54 AM
 To: Templin, Fred L; Xuxiaohu; Donald Eastlake; tr...@ietf.org
 Cc: nvo3@ietf.org; int-a...@ietf.org; s...@ietf.org
 Subject: Re: [trill] Fwd: Mail regarding draft-ietf-trill-over-ip

 On 5/5/2015 9:39 AM, Templin, Fred L wrote:
 Hi Joe,
 ..
 IP in UDP adds only port numbers and an Internet checksum.

 That doesn't address fragmentation; if outer fragmentation is assumed,
 IPv4 needs to be rate-limited to avoid ID collisions and the Internet
 checksum is insufficient to correct those collisions.

 Right - that is why we have GUE. But, when these functions are not
 needed GUE can perform header compression and the result looks
 exactly like IP in UDP.

 That seems impossible.

 Not impossible - Tom Herbert provided the solution:

 http://www.ietf.org/mail-archive/web/int-area/current/msg04593.html

 That is allocating bits (or bit patterns) from the IP header.

 The solution provided - to check for 0x01 - is incorrect. IP can have
 versions that include 0x10 and 0x11.

It is correct as we defined it: this is a solution to support direct
encapsulation of only IPv4 and IPv6. This optimizes encapsulation of
those two IP protocols, and does not affect encapsulation of other
protocols (including new versions of IP) using the GUE header and
protocol field. Whether this optimization is worth it is a fair
question, but if this obviates the need for a separate port number in
UDP-IP maybe it would be.
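
The check discussed above can be sketched roughly as follows. This is only an illustration, assuming the test is on the top two bits of the first byte after the UDP header (01 for both IPv4, 0100, and IPv6, 0110); the function name and return strings are hypothetical and not taken from any draft.

```python
def classify_udp_payload(payload: bytes) -> str:
    """Decide how to interpret the bytes that follow the UDP header.

    Assumption: a leading bit pattern of 01 means a directly
    encapsulated IPv4 or IPv6 packet; anything else means a
    regular GUE header follows.
    """
    if not payload:
        raise ValueError("empty payload")
    first = payload[0]
    if first >> 6 == 0b01:
        version = first >> 4          # IP version nibble
        if version == 4:
            return "ipv4"
        if version == 6:
            return "ipv6"
        return "unknown-ip-version"   # versions 5 and 7 also start with 01
    # Other IP versions (for example 8 = 1000) would need a GUE header
    # with a protocol field rather than direct encapsulation.
    return "gue-header"
```

Under this reading, a hypothetical IP version 8 packet is never mistaken for direct encapsulation; it simply has to be carried behind a non-zero-length GUE header.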

 The only solution would be to say that if the first three bits were 0,
 then it's not an IP packet - but that would require reassigning 0x
 and 0x0001 for GUE purposes.

 Although that's possible, I don't see why we would allocate IP versions
 to GUE message types.

We're not proposing that at all.

 Joe


Re: [nvo3] Encapsulation considerations

2015-04-16 Thread Tom Herbert

On Wed, Apr 15, 2015 at 5:42 PM, Erik Nordmark nordm...@acm.org wrote:
 On 4/9/15 10:56 AM, Tom Herbert wrote:

 On Thu, Apr 9, 2015 at 10:46 AM, Erik Nordmark nordm...@acm.org wrote:

 I thought the purpose of RFS was to send the packet (and associated
 interrupt) to the CPU where the application process is running. That implies
 an exact flow lookup. Some hash value, whether computed by the receiving NIC
 or whether in some entropy field in the packet (computed by the sender or
 encapsulator) would not suffice for that purpose.

 RFS is technically a best effort mechanism where exact flow lookup is
 not necessary, and in many cases the device won't even be able to
 determine the actual transport, such as when the encapsulated packet is
 encrypted. What we need for this to work is a very low probability of
 collisions among active traffic; the result of an occasional collision
 should be that a few packets may be routed to the wrong CPU. It still
 works, but is slightly suboptimal for those packets. There have been
 some related issues reported on Linux netdev that 16 bits of
 indirection indexed by hash value is not enough.

 Of course, a hash value (e.g., from the entropy field) is useful for RSS.

 Same value is used for RFS. NICs provide a 32 bit hash over 5-tuple.

 Ah - I must have read the Linux description too quickly.
 A hash works to spread the flows, and if the process is then scheduled on
 the same CPU where the packets arrive, performance improves.

 My point was that if you first want to schedule/bind the process to a CPU and
 later ensure that the packets arrive on said CPU, then you need an exact
 match and not a hash.

If the hash entropy is large enough to make collisions near zero, then
the hash can be used as a unique flow identifier.
With a 32 bit hash value and say 10K active connections, there are
very few collisions. The binding is something like Hash -> Queue -> CPU,
so the host can program which queue receives packets that match the
hash.
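
As a rough check on the "very few collisions" claim, here is a small birthday-style estimate. It assumes the hash is uniformly distributed, which a real NIC hash only approximates, and the function names are illustrative.

```python
import math


def expected_colliding_pairs(flows: int, hash_bits: int) -> float:
    """Expected number of flow pairs sharing a hash value (uniform hash assumed)."""
    return flows * (flows - 1) / 2 / 2 ** hash_bits


def prob_any_collision(flows: int, hash_bits: int) -> float:
    """Poisson approximation to the chance that at least one pair collides."""
    return 1.0 - math.exp(-expected_colliding_pairs(flows, hash_bits))


if __name__ == "__main__":
    for bits in (16, 32):
        pairs = expected_colliding_pairs(10_000, bits)
        print(f"{bits}-bit hash: {pairs:.4f} expected colliding pairs, "
              f"P(any collision) ~ {prob_any_collision(10_000, bits):.4f}")
```

For 10K flows and 32 bits the expected number of colliding pairs is about 0.012, while for 16 bits it is in the hundreds, which is consistent with the netdev reports that 16 bits of indirection is not enough.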

In reality though, exact flow match is a fleeting concept in devices
and the hash is really what we need anyway. We really don't want
devices to need to parse every encapsulation protocol to find inner
transport headers, and besides, when we start encrypting encapsulated
packets they won't be able to get to the exact flow anyway. Also, it's
important that the system can adapt to seeing flows whose outer hashes
change (like when the UDP source port changes). SPUD is also compelling
here as a way to provide a specific, more persistent and explicit flow
identification, which is needed for stateful inspection/firewalls for
encapsulated transports.

Tom

 Thanks,
Erik



 Tom

 Regards,
 Erik


Re: [nvo3] Encapsulation considerations

2015-04-09 Thread Tom Herbert

On Thu, Apr 9, 2015 at 10:46 AM, Erik Nordmark nordm...@acm.org wrote:
 On 4/8/15 8:11 PM, Lizhong Jin wrote:

 [Lizhong] If the NVE and tenant are integrated into one device, then the
 issue could be solved by implementation, because the tenant knows the
 entropy value of the first segment and can use the same value for the
 subsequent segments. So different implementation models could provide
 different entropy values. Or do we need another mechanism to mitigate
 this issue, e.g., fragmenting on the NVE as in
 draft-herbert-gue-fragmentation?

 Lizhong,

 My point was more fundamental. Today on the Internet there are routers which
 do hashing for LAG or ECMP purposes. They have the same issues with
 fragmented packets.


 For section 18.1.1, I suggest RFS should be analyzed. E.g., if the NIC
 wants to do flow steering based on the inner header information, then
 it could use the entropy value instead of the inner header information.

 Did you mean Receive Side Scaling (RSS)? That is mentioned in the
 document, in section 18.1.1. Perhaps the RSS text can be improved.
 Receive Flow Steering is the name of a Linux software technique that
 takes into account the CPU on which the application is running, which
 is different.

 [Lizhong] I am referring to hardware-based flow steering, similar to
 Accelerated RFS (Receive Flow Steering). With NIC I/O virtualization,
 the NIC will directly trigger an interrupt on the CPU where the
 application is running by looking up a flow table. We will use the inner
 header to do the flow table lookup; if the hardware cannot parse the
 inner header, then an entropy value in the encapsulation is required.
 Any design that hides inner header information should consider the
 above implementation.


 I thought the purpose of RFS was to send the packet (and associated
 interrupt) to the CPU where the application process is running. That implies
 an exact flow lookup. Some hash value, whether computed by the receiving NIC
 or whether in some entropy field in the packet (computed by the sender or
 encapsulator) would not suffice for that purpose.

RFS is technically a best effort mechanism where exact flow lookup is
not necessary, and in many cases the device won't even be able to
determine the actual transport, such as when the encapsulated packet is
encrypted. What we need for this to work is a very low probability of
collisions among active traffic; the result of an occasional collision
should be that a few packets may be routed to the wrong CPU. It still
works, but is slightly suboptimal for those packets. There have been
some related issues reported on Linux netdev that 16 bits of
indirection indexed by hash value is not enough.

 Of course, a hash value (e.g., from the entropy field) is useful for RSS.

Same value is used for RFS. NICs provide a 32 bit hash over 5-tuple.

Tom


 Regards,
Erik


Re: [nvo3] Encapsulation considerations

2015-04-08 Thread Tom Herbert

On Tue, Apr 7, 2015 at 12:02 AM, Lizhong Jin lizho@gmail.com wrote:
 Hi Erik,
 Thanks for the draft. I suggest adding one consideration: the generation of
 the entropy value.
 1. When the node receives a UDP/TCP packet from the tenant, the entropy
 value could be a hash of the 5-tuple.
 2. When the node receives an IP packet from the tenant, the entropy value
 could be a hash of the 2-tuple.
 3. When the node receives an Ethernet packet from the tenant, the entropy
 value could be a hash of the MAC address.
 4. When the node receives a fragmented IP packet from the tenant, the first
 fragment is UDP/TCP, and its entropy will come from the 5-tuple. But the
 second and subsequent fragments will generate entropy from the 2-tuple,
 which gives a different entropy value from the first fragment. The issue
 cannot currently be resolved if the NVE is separated from the tenant into
 two physical devices.

This is a problem common to IP fragmentation. If the NVE is doing the
fragmentation, we have the option (like in
draft-herbert-gue-fragmentation-00) of fragmenting the packet
before encapsulating, so that each fragment still uses the same
encapsulation and entropy value.

 For section 18.1.1, I suggest RFS should be analyzed. E.g., if the NIC
 wants to do flow steering based on the inner header information, then it
 could use the entropy value instead of the inner header information.

Yes, that is the intent. There is some risk in that over a single
tunnel we would only have 14 bits of entropy but possibly many
thousands of flows, so that there are collisions. This shouldn't be an
issue as long as the implementation takes collisions into account and
the cost of collisions is low enough. Some alternatives are to use the
IPv6 flow label for an additional 20 bits of entropy, or to parse the
inner packet if we really need to uniquely identify the flow.
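
To make the 14-bit figure concrete, here is a minimal sketch of deriving an entropy value from an inner flow. It assumes the 14 bits correspond to placing the value in the UDP source port within the range 49152-65535 (16384 = 2^14 values). The use of CRC32 as the hash and all names here are illustrative choices, not something specified in the thread.

```python
import zlib

EPHEMERAL_BASE = 49152      # start of the assumed source-port range
ENTROPY_BITS = 14           # 65535 - 49152 + 1 == 2 ** 14


def entropy_source_port(five_tuple: tuple) -> int:
    """Map an inner 5-tuple to an outer UDP source port.

    All packets of one flow get the same port, so they follow the same
    ECMP path; different flows may still collide on a port.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    flow_hash = zlib.crc32(key)                 # stand-in for a real flow hash
    return EPHEMERAL_BASE + (flow_hash & ((1 << ENTROPY_BITS) - 1))


if __name__ == "__main__":
    flow = ("10.0.0.1", "10.0.0.2", 6, 33000, 443)
    print(entropy_source_port(flow))
```

With only 16384 possible values and many thousands of flows, distinct flows will share ports, so a receiver using this value for steering has to tolerate collisions as described above.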

Tom

 Regards
 Lizhong


 -Original Message-
 From: Erik Nordmark [mailto:nordm...@sonic.net]
 Sent: 2015年3月26日 5:01
 To: nvo3@ietf.org
 Subject: [nvo3] Encapsulation considerations


 I presented part of this at the most recent NVO3 interim meeting. The full
 12 areas of considerations were presented at RTGWG earlier this week.
   The draft is
 http://datatracker.ietf.org/doc/draft-rtg-dt-encap/
   and the slides are at
http://www.ietf.org/proceedings/92/slides/slides-92-rtgwg-8.pdf

 There are probably additional things in there to consider for NVO3, and
 advice that can be reused to make it easier to move NVO3 forward.

 Regards,
 Erik






Re: [nvo3] Fwd: New Version Notification for draft-herbert-gue-fragmentation-00.txt

2015-04-03 Thread Tom Herbert

On Wed, Apr 1, 2015 at 12:55 PM, Joe Touch to...@isi.edu wrote:


 On 3/25/2015 9:58 PM, Tom Herbert wrote:
 This draft describes a fragmentation option for GUE. The option is
 intended for use cases where GUE is used over a network where we might
 not be able to control or know what the link MTUs in a tunnel are.
 This also provides an answer to the interesting degenerate case where
 someone configures an MTU of 1280 on the link and there is an attempt
 to encapsulate an IPv6 packet of size 1280. In this case the packet
 size + encapsulation exceeds the link MTU, and we cannot send an ICMP PTB
 since 1280 is the specified minimum MTU for IPv6.

 This describes only the mechanics of fragmentation/reassembly in GUE.
 It does not cover the semantics of use, such as how to determine tunnel
 Path MTU or when to fragment.

 FWIW, there are a few issues that should be addressed. RFC4459 suggests
 a taxonomy, but it's informational (and IMO incorrect) on a few points.

 Below are suggestions.

 Joe

 ---

 Section 1, also Sec 3.1:

 There are two separate trigger points for MTU handling:

 A- the tunnel path MTU
 this is defined by the protocols over
 which the tunnel is defined to operate

 B- the tunnel egress reassembly maximum
 this is specified by the tunnel protocol
 description

 The following cases should be handled as follows:

 1. encaps packet <= A
 send as-is

 2. A < encaps packet <= B
 fragment and reassemble

 3. encaps packet > B
 drop and send PTB


Thanks, Joe. We can probably add some general guidelines like this to
the draft. I think #3 is provided for when the link MTU of tunnel =
B.
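
Joe's three cases can be written as a small decision function. This is a sketch that reads the cases as "size <= A", "A < size <= B" and "size > B"; the comparison operators are an assumption, since they appear to have been lost in the archived text, and the names are hypothetical.

```python
def mtu_action(encap_size: int, tunnel_path_mtu: int, egress_reassembly_max: int) -> str:
    """Pick the handling for an encapsulated packet.

    tunnel_path_mtu       -- "A": path MTU of the tunnel
    egress_reassembly_max -- "B": largest packet the egress can reassemble
    """
    if encap_size <= tunnel_path_mtu:
        return "send as-is"
    if encap_size <= egress_reassembly_max:
        return "fragment and reassemble"
    return "drop and send PTB"


if __name__ == "__main__":
    # A 1280-byte IPv6 packet plus 16 bytes of (illustrative) overhead on a 1280 link
    print(mtu_action(1280 + 16, tunnel_path_mtu=1280, egress_reassembly_max=1500))
```

Under this reading, the degenerate 1280 case from the draft announcement falls into the second branch: the packet cannot be refused with a PTB, so it has to be fragmented and reassembled by the tunnel.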

 If you don't want to ever do frag/reassembly, then make sure A=B.

 Note that A cannot match B whenever a tunnel mechanism is used
 recursively, e.g., GUE over GUE, even indirectly (GUE over IP over GUE).


Each tunneling layer should have its own concept of both link MTU and
path MTU for the tunnel. So the link MTU in an inner tunnel could
define the path MTU in an outer tunnel.

 ---

 The wrap of the IPv4 frag field and the impact of avoiding fragmentation
 is described in RFC 6864.

Will add a reference to that RFC.

Thanks,
Tom


[nvo3] Fwd: New Version Notification for draft-herbert-gue-encap-considerations-01.txt

2015-03-26 Thread Tom Herbert

This draft provides a description of how GUE addresses the
considerations that were enumerated in the Encapsulation Considerations
draft recently produced by the encaps design team. The intent is that
this draft will help in evaluation of the GUE protocol.

Thanks,
Tom



-- Forwarded message --
From:  internet-dra...@ietf.org
Date: Thu, Mar 26, 2015 at 7:57 AM
Subject: New Version Notification for
draft-herbert-gue-encap-considerations-01.txt
To: Tom Herbert t...@herbertland.com, Osama Zia osa...@microsoft.com



A new version of I-D, draft-herbert-gue-encap-considerations-01.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-gue-encap-considerations
Revision:   01
Title:  Encapsulation Considerations for GUE
Document date:  2015-03-26
Group:  Individual Submission
Pages:  13
URL:
http://www.ietf.org/internet-drafts/draft-herbert-gue-encap-considerations-01.txt
Status:
https://datatracker.ietf.org/doc/draft-herbert-gue-encap-considerations/
Htmlized:
http://tools.ietf.org/html/draft-herbert-gue-encap-considerations-01
Diff:
http://www.ietf.org/rfcdiff?url2=draft-herbert-gue-encap-considerations-01

Abstract:
   This document provides a description of how Generic UDP Encapsulation
   addresses the encapsulation considerations that are described in the
   Encapsulation Considerations Internet Draft.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


[nvo3] Fwd: New Version Notification for draft-herbert-gue-fragmentation-00.txt

2015-03-25 Thread Tom Herbert

This draft describes a fragmentation option for GUE. The option is
intended for use cases where GUE is used over a network where we might
not be able to control or know what the link MTUs in a tunnel are.
This also provides an answer to the interesting degenerate case where
someone configures an MTU of 1280 on the link and there is an attempt
to encapsulate an IPv6 packet of size 1280. In this case the packet
size + encapsulation exceeds the link MTU, and we cannot send an ICMP PTB
since 1280 is the specified minimum MTU for IPv6.

This describes only the mechanics of fragmentation/reassembly in GUE.
It does not cover the semantics of use, such as how to determine tunnel
Path MTU or when to fragment.

Thanks,
Tom


-- Forwarded message --
From:  internet-dra...@ietf.org
Date: Wed, Mar 25, 2015 at 9:09 PM
Subject: New Version Notification for draft-herbert-gue-fragmentation-00.txt
To: Tom Herbert t...@herbertland.com, Fred L. Templin fltemp...@acm.org



A new version of I-D, draft-herbert-gue-fragmentation-00.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-gue-fragmentation
Revision:   00
Title:  Fragmentation option for Generic UDP Encapsulation
Document date:  2015-03-25
Group:  Individual Submission
Pages:  12
URL:
http://www.ietf.org/internet-drafts/draft-herbert-gue-fragmentation-00.txt
Status:
https://datatracker.ietf.org/doc/draft-herbert-gue-fragmentation/
Htmlized:   http://tools.ietf.org/html/draft-herbert-gue-fragmentation-00


Abstract:
   This specification describes a fragmentation and reassembly
   capability with an associated header option for Generic UDP
   Encapsulation.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


Re: [nvo3] IPR poll for draft-herbert-gue-03

2015-03-20 Thread Tom Herbert

I am not aware of any relevant IPR for GUE.

Thanks,
Tom


On Fri, Mar 20, 2015 at 1:55 PM, Bocci, Matthew (Matthew)
matthew.bo...@alcatel-lucent.com wrote:
 This mail starts an IPR poll on
 draft-herbert-gue-03.

 Authors, are you aware of any IPR that applies to
 draft-herbert-gue-03?

 If so, has this IPR been disclosed in compliance with IETF IPR rules
 (see RFCs 3979, 4879, 3669 and 5378 for more details)?

 If you are listed as a document author or contributor please respond to
 this email stating of whether or not you are aware of any relevant
 IPR. The response needs to be sent to the NVO3 WG mailing list. The
 document will not advance to the next stage until a response
 has been received from each author and each contributor.

 If you are on the NVO3 WG email list but are not listed as an author or
 contributor, then please explicitly respond only if you are aware of any
 IPR that has not yet been disclosed in conformance with IETF rules.

 We will close this poll on 4th April 2015 or later if we have not heard
 from all authors/contributors by then.

 Matthew  Benson


[nvo3] Suggestion to apply encaps DT work to NVO3

2015-03-14 Thread Tom Herbert

Hello,

I believe (perhaps with some bias!) that the encaps design team did a
fairly thorough job in enumerating the key common issues of
encapsulation. I think this work is useful to apply to selecting a
data plane protocol for NVO3.

I would like to suggest that a way forward with the NVO3 data plane
discussions is that for each of the proposed encapsulation protocols a
statement is produced about how it addresses or considers each of the
common issues. This would be good for discussion and would be helpful
in evaluating the merits of these protocols.

The twelve common issues the design team came up with are:

1. How to provide entropy for ECMP
2. Next header indication
3. Packet size and fragmentation/reassembly
4. OAM - what support needed in an encapsulation format?
5. Security and privacy
6. QoS
7. Congestion Considerations
8. Header and data protection - UDP or header checksums
9. Extensibility - for OAM, security, and/or congestion control
10. Layering of multiple encapsulations
11. Service model
12. Hardware Friendly

More details about potential considerations in these areas are in the
draft https://datatracker.ietf.org/doc/draft-rtg-dt-encap/

Thanks,
Tom


[nvo3] Fwd: New Version Notification for draft-herbert-gue-03.txt

2015-03-07 Thread Tom Herbert

Hello,

We have posted a new version of the GUE draft. The major change in
this version is that we have changed the Private flags to be Extended
flags, and defined how variable private data may be part of the header.

Thanks,
Tom




-- Forwarded message --
From:  internet-dra...@ietf.org
Date: Fri, Mar 6, 2015 at 2:11 PM
Subject: New Version Notification for draft-herbert-gue-03.txt
To: Tom Herbert t...@herbertland.com, Osama Zia osa...@microsoft.com



A new version of I-D, draft-herbert-gue-03.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-gue
Revision:   03
Title:  Generic UDP Encapsulation
Document date:  2015-03-06
Group:  Individual Submission
Pages:  24
URL:http://www.ietf.org/internet-drafts/draft-herbert-gue-03.txt
Status: https://datatracker.ietf.org/doc/draft-herbert-gue/
Htmlized:   http://tools.ietf.org/html/draft-herbert-gue-03
Diff:   http://www.ietf.org/rfcdiff?url2=draft-herbert-gue-03

Abstract:
   This specification describes Generic UDP Encapsulation (GUE), which
   is a scheme for using UDP to encapsulate packets of arbitrary IP
   protocols for transport across layer 3 networks. By encapsulating
   packets in UDP, specialized capabilities in networking hardware for
   efficient handling of UDP packets can be leveraged. GUE specifies
   basic encapsulation methods upon which higher level constructs, such
   as tunnels and overlay networks for network virtualization, can be
   constructed. GUE is extensible by allowing optional data fields as
   part of the encapsulation, and is generic in that it can encapsulate
   packets of various IP protocols.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


Re: [nvo3] [sfc] VxLAN-gpe vs nvo3

2015-03-04 Thread Tom Herbert

https://www.cs.purdue.edu/homes/kompella/publications/infocom13.pdf

On Wed, Mar 4, 2015 at 2:55 PM, Larry Kreeger (kreeger)
kree...@cisco.com wrote:
 Hi Sunil,

 It seems like your question would be more appropriate for the NVO3 mailer
 than the SFC one (so I added NVO3).

 Note that the ideas covered in draft-chen-nvo3-load-banlancing do not seem
 to be part of the NVO3 charter, nor is it part of the NVO3 problem
 statement, nor has it been discussed on the NVO3 mailer to my knowledge.  I
 did take a look at the draft, and I am skeptical of its cost/benefit for
 implementation.

Yes, it seems like a lot of additional protocol machinery. If we
didn't have to worry about OOO packets then simply modulating the UDP
source port would allow packets in the same flow to go over different
ECMP paths. In fact, we are seeing renewed interest in this for
packet spraying in DCs. UDP encap is compelling because of the ability
to source route via the source port.

https://www.cs.purdue.edu/homes/kompella/publications/infocom13.pdf

Tom

  - Larry

 From: Sunil Vallamkonda suni...@f5.com
 Date: Wednesday, March 4, 2015 2:30 PM
 To: s...@ietf.org s...@ietf.org
 Subject: [sfc] VxLAN-gpe vs nvo3

 Per http://tools.ietf.org/html/draft-quinn-sfc-nsh-07

 Section 11.2, VxLAN-gpe can be an encapsulation.

 However,  how does the VxLAN-gpe header co-exist with overlapping VxLAN
 header modification by nvo3:

 https://datatracker.ietf.org/doc/draft-chen-nvo3-load-banlancing/



 Sunil









Re: [nvo3] is it resonsable that draft-dunbar-nvo3-nva-mapping-distribution-01 suggests NVE using bit-map to represent its supported VNs?

2015-01-29 Thread Tom Herbert

On Thu, Jan 29, 2015 at 1:53 PM, Linda Dunbar linda.dun...@huawei.com wrote:
 When a NVE is initialized or re-started, it uses Virtual Network scoped
 instances of the IS-IS to announce all the Virtual Networks in which it is
 participating.



 The  current draft-dunbar-nvo3-nva-mapping-distribution-01 suggests using
 the bit map to represent the supported VNs.

+-+-+-+-+-+-+-+-+
| Type          |  (1 byte)
+-+-+-+-+-+-+-+-+
| Length        |  (1 byte)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| RESV  |  Start VN ID          |  (2 bytes)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VNID bit-map
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Figure 2. Enabled-VN TLV

Hi Linda,

First a couple of meta comments on the draft:

- Sizes of type and lengths in TLVs are inconsistent (some 16 bits,
some 32 bits). It might be just as well to make all of them 16 bits.
- For the above, I think the Start VNID field is 4 bytes not 2 bytes.
- Please avoid implicitly setting constraints on the data plane in the
definition of control plane. For instance, I've already made arguments
that VN ID might be thirty-two bits, and there's little cost to
defining thirty-two bit VN IDs in the control plane.





 For a 24-bit VN ID, there could be 16 million VNs. Even with the “Start VN ID”
 listed, the number of bytes for the bitmap can be very large.



 Therefore, I think it is better to have a flag indicating if the VNs are
 listed individually, Upper/Lower ranges, or bit mapped.

I wouldn't use a flag for that, it's probably cleaner to define
another TLV type that gives a list of VNIDs. Either the list or
bit-map can be used interchangeably.
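
To illustrate why having both encodings is useful, here is a rough size comparison. It is a sketch only: it assumes a bit-map that spans from the lowest to the highest VNID and a list of fixed 4-byte VNIDs, ignores TLV type/length overhead, and uses hypothetical function names.

```python
def bitmap_bytes(vnids: list[int]) -> int:
    """Bytes needed for a bit-map covering min(vnids)..max(vnids)."""
    span = max(vnids) - min(vnids) + 1
    return (span + 7) // 8          # round up to whole bytes


def list_bytes(vnids: list[int], vnid_size: int = 4) -> int:
    """Bytes needed to list every VNID explicitly."""
    return len(vnids) * vnid_size


def cheaper_encoding(vnids: list[int]) -> str:
    """Pick whichever encoding is smaller for this set of VNIDs."""
    return "bitmap" if bitmap_bytes(vnids) <= list_bytes(vnids) else "list"


if __name__ == "__main__":
    sparse = [0, 1023]              # two VNs far apart
    dense = list(range(1024))       # 1024 consecutive VNs
    print(cheaper_encoding(sparse), cheaper_encoding(dense))
```

A sparse set favors the list (8 bytes versus 128 here), while a dense set favors the bit-map (128 bytes versus 4096), so letting the sender choose between two TLV types avoids the very large bit-maps Linda is concerned about.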

Tom



 Any other suggestions?



 Linda











Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

2014-11-24 Thread Tom Herbert

On Fri, Nov 21, 2014 at 12:37 AM, Mach Chen mach.c...@huawei.com wrote:

 Hi Deepak,

  In addition, I see the value of you proposed optional measurement
  field, it could be used to carry some correlation (e.g., block/period
  number) and timestamp information, then combine with the marking bit,
  it can greatly simplify the marking based solution.
 
  +++DK:
  I think adding information regarding measurement field, block, period, etc. 
  is not
  required in data path as more information reduces the mtu, and this can 
  easily be
  added if required by TLV to OAM functionality with new subtype (as this is 
  control
  or configuration functionality).

 I am talking about two things here:

 1) the fixed marking bits, I think they are necessary for passive PM;

 2) the correlation information, timestamps, counters, they could be
 communicated either in-band or out-of-band; each way has its pros and
 cons;

  Also even passive oam loss measurement solution to calculating loss is not
  accurate as packets can arrive late outside the measuring blocks. Even in 
  that

 If it only depends on the marking bit and the measuring period is set to a 
 very small interval, indeed, that will affect the accuracy. But from 
 engineering point view, an operator and an implementation will not set (or 
 support) to a very aggressive period.

I don't think this would be true in our data center. The very reason
we would enable a passive mechanism is for getting accurate
measurements of high granularity. For instance, if we want to use this
as real time feedback for congestion control we need fine grained
information. Also, as part of debugging a customer's problems it may
come down to us being able to identify specific packets that are being
dropped or experiencing unusual latency; I don't see how marking with
a couple of bits is sufficient for that. We already have this deployed
in the pre-NV world (mostly provided by TCP); in an NV world there are
many cases where we won't have visibility into the customer's protocol,
so we'll need to find alternative methods (which likely results in
annotating packets).

 And if the packets can carry some correlation information (e.g., the 
 block/period number), then the accuracy should be no problem.

 In theory, you are right, if the delay of the packets of block exceed a 
 threshold (e.g., a block period), the packets may be mis-counted into another 
 block.

  case to get accurate measurement instead of ipfix method, better to use OAM 
  to
  exchange these marked packet counters on both ends and do loss measurement
  between two consecutive loss measurement replies.

 I am fine with either way for communicating the counters and timestamps.

 
  For loss measurement, why we have to count traffic for marked packets only 
  and
  not maintain counters per flow?

 I'm not sure what's your question here.

 To calculate the packet loss, counters maintenance (no matter at where) is 
 necessary, it depends on the specific implementation.

 Best regards,
 Mach


  -Original Message-
  From: Deepak Kumar (dekumar) [mailto:deku...@cisco.com]
  Sent: Friday, November 21, 2014 2:34 PM
  To: Mach Chen; Tom Herbert
  Cc: nvo3@ietf.org
  Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM
 
  Hi Mach,
 
  Please see inline +++DK:
 
 
  On 11/20/14 5:02 PM, Mach Chen mach.c...@huawei.com wrote:
 
  Hi Tom,
  
  Please see my response inline...
  
   -Original Message-
   From: Tom Herbert [mailto:therb...@google.com]
   Sent: Friday, November 21, 2014 1:28 AM
   To: Mach Chen
   Cc: nvo3@ietf.org
   Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for
   OAM
  
   On Wed, Nov 19, 2014 at 5:54 PM, Mach Chen mach.c...@huawei.com
  wrote:
Hi Tissa,
   
Thanks for your response!
   
Please see my response inline...
   
-Original Message-
From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Tissa
Senevirathne
(tsenevir)
Sent: Wednesday, November 19, 2014 8:45 PM
To: Haoweiguo; Tom Herbert
Cc: Greg Mirsky; Tapraj Singh; Deepak Kumar (dekumar);
nvo3@ietf.org
Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements
for OAM
   
Hi Weiguo, Mach et,al
   
The discussion here is NVO3 data plane requirements for OAM. Like
I have said
   
You are right, this discussion is about NVO3 data plane
requirements
  for OAM,
   but recently the focus is Performance Measurement (PM) requirement to
  NVO3
   that is also one of the OAM functions.
   
earlier,  we do not need to complicate the Data Plane. Can you
explain to me
   
Complicate/simple is not the goal, the goal is to define a
  reasonable solution
   that can satisfy the requirement. That's why I agree with Greg that
  we should  firstly make the agreement on the requirement.
   
   Mach,
  
   The nvo3 OAM requirements draft
   (draft-ashwood-nvo3-oam-requirements-01) seems to already contain a
  fairly

Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

2014-11-18 Thread Tom Herbert

On Tue, Nov 18, 2014 at 3:54 PM, Tissa Senevirathne (tsenevir)
tsene...@cisco.com wrote:
 Greg



 I disagree with you on FM and PM cannot be achieved in ECMP environment.
 Significant amount of work has gone in to this area during TRILL OAM.
 Please check the use of Flow entropy functionality proposed in NVO3 OAM.



 https://tools.ietf.org/html/draft-tissa-nvo3-oam-fm-00

Tissa,

If I am reading this correctly, the OAM message would be composed of
the encapsulation header, followed by 128 bytes that contain a
pseudo header for switching, followed by a self-defining OAM message.
The OAM bit is only used at the receiver to distinguish data messages
from OAM messages for processing. Is this interpretation correct?

Thanks,
Tom





 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Greg Mirsky
 Sent: Tuesday, November 18, 2014 3:03 PM
 To: Tapraj Singh
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM



 Hi Tapraj,

 though I agree with and support the idea of having an OAM flag in the NVO3 header, I
 have to point to:

 absence of WG agreed upon OAM Requirements;
 no gap analysis of tools for NVO3 OAM;
 OAM flag does not help passive performance measurement marking method (two
 bit-long field for marking in fixed position).

 I agree that PW VCCV and GAL/G-ACh can be viewed as MPLS identification of
 an OAM packet (though not necessarily OAM). But IP clearly doesn't have such
 identification for OAM and that is, in part, why the in-band requirement for IP
 OAM, both FM and Active PM, is not attainable (ECMP environment).

 Regards,

 Greg



 On Tue, Nov 18, 2014 at 1:31 PM, Tapraj Singh tsi...@juniper.net wrote:

 Hi All,

  I totally agree with the point made by Deepak and Tissa here.
 Our OAM should follow the data path for services as much as possible and
 all
 other protocol specific information should be in the OAM protocol specific
 TLVs.

 LAYER2 OAM

 In terms of identifying the OAM packet, the first level of identification for L2
 OAM should be the MAC address, and the second level of hierarchy should be the
 EtherType or OUI.
 No other OAM-specific field should be allowed in the packet header.

  Please note that L3 OAM and MPLS also follow the same principle.

 Thanks
 Tapraj


 On 11/17/14 12:39 PM, Deepak Kumar (dekumar) deku...@cisco.com wrote:

I Agree with Tissa below. My Goal also was to point out that instead of
complicating the header, we can do OAM performance within OAM channel
itself and this is extensible and can be done in hardware which is why
mostly things are added in header.

Also, Operators keep asking for new OAM tools (Fault detection,
verification, isolation, Interworking, alarm, putting service in
maintenance and perform test)  and Performance tools, eg: (Delay/Jitter,
Actual Loss Measurement, Synthetic Loss, loopback signaling like TDM,
Generate frames to verify qos etc.) and so OAM Channel solution will be
extensible.

Thanks,
Deepak

On 11/17/14 8:47 AM, Tissa Senevirathne (tsenevir) tsene...@cisco.com
wrote:

I think we are complicating OAM beyond what is needed.

As far as packet encapsulation is concerned, all that is needed is a single
bit. This bit is needed to prevent OAM packets from leaking out of the
domain.

Termination of OAM and processing of it happen based on the addressing in
the packet.

E.g. if Address matches and OAM bit is set then it is an OAM packet
addressed to the local MEP/MP.

Not the other way around. Why? Because we want OAM to follow the data path
as closely as possible.

If we need to have performance and delay measurements, we SHOULD NOT
mutate the packet header.

Instead, OAM-specific extensions should be in the OAM shim.

As an example, you could have a packet fragment (which is sometimes called
flow entropy) and at the end of that you can have all of the stuff you
need in the world of OAM.

Hope this clarifies.

Thanks
Tissa
-Original Message-
From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Tom Herbert
Sent: Monday, November 17, 2014 8:02 AM
To: Marc Binderberger
Cc: Greg Mirsky; Mach Chen; Deepak Kumar (dekumar); nvo3@ietf.org;
Haoweiguo; Larry Kreeger (kreeger); Vero Zheng; Jon Hudson
Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

On Mon, Nov 17, 2014 at 12:01 AM, Marc Binderberger m...@sniff.de
wrote:
 Hello Deepak et al.,

 so this sounds like we need more than just a (2nd) bit for delay
measurement.
 Seems we need an optional header extension or a TLV to carry all the
 information (timestamps, oam Subtype). Sounds definitely more than a
 32/64bit header could carry (*).

 The optional header extension, when done similar to GUE, has a fixed
 position. For the TLV this would be an additional requirement. This
 would allow for hardware-stamping.

The alternative is to do active delay measurement using request/reply.
We should be able to define the requirements so that an OAM message
corresponding to a flow would be routed in exactly the same way as
a data message for the flow

Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

2014-11-17 Thread Tom Herbert

On Mon, Nov 17, 2014 at 12:01 AM, Marc Binderberger m...@sniff.de wrote:
 Hello Deepak et al.,

 so this sounds like we need more than just a (2nd) bit for delay measurement.
 Seems we need an optional header extension or a TLV to carry all the
 information (timestamps, oam Subtype). Sounds definitely more than a 32/64bit
 header could carry (*).

 The optional header extension, when done similar to GUE, has a fixed
 position. For the TLV this would be an additional requirement. This would
 allow for hardware-stamping.

The alternative is to do active delay measurement using request/reply.
We should be able to define the requirements so that an OAM message
corresponding to a flow would be routed in exactly the same way
as a data message for the flow. Larry mentioned that we might even
want to put a fake packet header as the first part of the
encapsulated payload of an OAM message, for instance.
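
To illustrate the idea of an OAM message following the same path as its data flow, here is a minimal sketch. It assumes the encapsulator derives the outer UDP source port from a hash of the inner flow's headers, and that switches pick an ECMP path from that port. The function names, the CRC32 hash, and the port-modulo path selection are all hypothetical stand-ins, not anything taken from a draft or an implementation.

```python
import zlib

EPHEMERAL_BASE = 49152   # start of the dynamic port range
EPHEMERAL_SIZE = 16384   # 49152..65535


def flow_entropy(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash the inner flow's 5-tuple (CRC32 is only a stand-in hash)."""
    key = "|".join(map(str, (src_ip, dst_ip, proto, src_port, dst_port)))
    return zlib.crc32(key.encode("ascii"))


def outer_udp_source_port(inner_tuple):
    """Hypothetical encapsulator rule: derive the outer source port from the flow."""
    return EPHEMERAL_BASE + flow_entropy(*inner_tuple) % EPHEMERAL_SIZE


def ecmp_path(outer_src_port, n_paths):
    """Hypothetical switch rule: pick one of n equal-cost paths from the port."""
    return outer_src_port % n_paths


data_flow = ("10.0.0.1", "10.0.0.2", 6, 33000, 443)
# The OAM message carries a copy of the flow's headers as a "fake" inner header,
# so the encapsulator hashes the same fields as it does for data packets.
oam_fake_header = data_flow

data_port = outer_udp_source_port(data_flow)
oam_port = outer_udp_source_port(oam_fake_header)
same_path = ecmp_path(data_port, 4) == ecmp_path(oam_port, 4)
print(same_path)  # True, because both packets present identical hash inputs
```

The point of the sketch is only that identical hash inputs give identical path choices; a real network would also need every other field that feeds the ECMP hash to match.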

 Now if we introduce such an OAM extension header it could as well carry the
 first bit we discussed for packet loss measurement (?).


 Regards, Marc

 (*: at least all proposals so far have a base header that fits into 32/64
 bit, plus IP and potential UDP)




 On Sun, 16 Nov 2014 16:44:54 +, Deepak Kumar (dekumar) wrote:
 Hi,

 Please see inline +++DK:

 On 11/14/14 11:09 AM, Jon Hudson jon.hud...@gmail.com wrote:


 One comment in line

 On Nov 13, 2014, at 11:47 PM, Vero Zheng vero.zh...@huawei.com wrote:

 Hi Tom,

 Please see in-line.

 BR, Vero

 -Original Message-
 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Tom Herbert
 Sent: Friday, November 14, 2014 4:27 PM
 To: Mach Chen
 Cc: Greg Mirsky; Haoweiguo; Marc Binderberger; Larry Kreeger;
 nvo3@ietf.org
 Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for
 OAM

 On Wed, Nov 12, 2014 at 5:13 PM, Mach Chen mach.c...@huawei.com
 wrote:
 Hi Tom,

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Thursday, November 13, 2014 3:11 AM
 To: Marc Binderberger
 Cc: Mach Chen; Greg Mirsky; Haoweiguo; nvo3@ietf.org; Larry Kreeger
 Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for
 OAM

 On Wed, Nov 12, 2014 at 2:11 AM, Marc Binderberger m...@sniff.de
 wrote:
 Hello Mach,

 so for delay measurement you use the color flag to mark a single
 packet, which helps the receiver to pick the right packet?  And
 repeat this every time period T ?

...0001001001000...
 Is there a draft or description of how this algorithm would
 work? Seems like there would need to be quite a bit of
 synchronization between end points (synchronized clocks,
 provisions to correlate measurements correctly with lost packets,
 replicated packets, etc.). Also, what is envisioned for the range of
 the period?

 Here is a reference
 https://datatracker.ietf.org/doc/draft-chen-ippm-coloring-based-ipfpm-framework/.

 Thanks for the pointer. Regarding the need for synchronized clocks to
 measure delay, I consulted our local NTP expert. The host clock jitter
 we see in our network is currently usually greater than one-way packet
 delay (in some cases much greater), so in his words:
 measuring one-way packet delays using host clocks is a lost cause.
 Please take this as just one data point!

 Jon Thank you. As someone who has managed NTP more times and for more
 years than I care to admit, this is a very good datapoint to consider.
 NTP helps many understand that time is relative.

 +++DK: As per our experience in carrier Ethernet we supported one way
 delay and never found NTP useful even for our lab networks (I am referring
 software based NTP NTPv3).
 As mentioned below IEEE 1588v2 will vary based on equipment and operator
 networks but in our testing we found it very precise if properly deployed.
 IEEE 1588v2 is very precise if phy based timestamping is used. Even
 timestamping at NP level provided great results for one way delay.

 If we want to accurately measure two-way delay we need 4 timestamps in total
 on the receiver of the frame (this is to exclude the processing time taken for
 the reply by software, as hardware can put the timestamp at a lower layer without
 doing the delay and jitter calculation).
 For one-way delay we will require 2 timestamps, so lower-layer hardware can
 timestamp before the packet is punted to software.
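
 As a rough illustration of the timestamp arithmetic, the sketch below uses the common four-timestamp convention (initiator transmit, responder receive, responder transmit, initiator receive). The function names and the example numbers are assumptions made for illustration only.

```python
def two_way_delay(t1, t2, t3, t4):
    """Round trip minus the responder's turnaround time.

    t1: initiator transmit, t2: responder receive,
    t3: responder transmit, t4: initiator receive (all in microseconds).
    t1/t4 come from the initiator's clock and t2/t3 from the responder's,
    so a constant offset between the two clocks cancels out.
    """
    return (t4 - t1) - (t3 - t2)


def one_way_delay(t1, t2):
    """Forward delay; only meaningful if both clocks are synchronized."""
    return t2 - t1


# Example: 200 us each way, 50 us turnaround, responder clock 5000 us ahead.
t1 = 1_000
t2 = t1 + 200 + 5_000
t3 = t2 + 50
t4 = t1 + 200 + 50 + 200
print(two_way_delay(t1, t2, t3, t4))  # 400
print(one_way_delay(t1, t2))          # 5200, i.e. 200 us of delay plus the 5000 us offset
```

 The two-way result is unaffected by the clock offset, while the one-way result absorbs it entirely, which is why one-way measurement depends on clock synchronization.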

 As mentioned below I agree 8 byte IEEE 1588 timestamp is required.

 We should also look for Synthetic OAM applicability for performance ('O'
 bit can be overloaded to do both Fault and performance if OAM is defined
 with different oam Subtype for Delay and Loss frames and it will not be
 too deep hardware inspection) as that give large flexibility
 (synthetic/real loss measurement, Availability/unavailability, on-demand
 and pro-active performance) and can be run on all flows of ECMP.

 Thanks,
 Deepak



 [Vero] Thanks for this. What about the current experience with 1588v2
 then?

 Yes, it does need some synchronization. As for the range, it depends
 on two

[nvo3] Fwd: New Version Notification for draft-herbert-remotecsumoffload-01.txt

2014-11-17 Thread Tom Herbert

This is a new version of remote checksum offload, the primary
difference from 00 is that the offsets are now relative to the end of
the encapsulation header instead of the beginning.

Remote checksum offload is supported in GUE in upstream Linux (will be
in 3.18, http://www.spinics.net/lists/netdev/msg302554.html).

VXLAN experts: I would like to implement this in VXLAN also. I think
the option could be compressed to 8 bits. Would it make sense to use
an available reserved bit (maybe call it for private use) and put the
data in the low-order eight bits of the VNI field?

Thanks,
Tom

-- Forwarded message --
From:  internet-dra...@ietf.org
Date: Thu, Nov 13, 2014 at 12:19 AM
Subject: New Version Notification for draft-herbert-remotecsumoffload-01.txt
To: Tom Herbert therb...@google.com



A new version of I-D, draft-herbert-remotecsumoffload-01.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-remotecsumoffload
Revision:   01
Title:  Remote checksum offload for encapsulation
Document date:  2014-11-12
Group:  Individual Submission
Pages:  11
URL:
http://www.ietf.org/internet-drafts/draft-herbert-remotecsumoffload-01.txt
Status:
https://datatracker.ietf.org/doc/draft-herbert-remotecsumoffload/
Htmlized:   http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01
Diff:
http://www.ietf.org/rfcdiff?url2=draft-herbert-remotecsumoffload-01

Abstract:
   This specification describes remote checksum offload, which is a
   mechanism that provides checksum offload of transport checksums in
   encapsulated packets using rudimentary offload capabilities found in
   most Network Interface Card (NIC) devices. The outer header checksum
   (e.g. that in UDP or GRE) is enabled in packets and, with some
   additional meta information, a receiver is able to deduce the
   checksum to be set in an encapsulated packet. Effectively this
   offloads the computation of the inner checksum. Enabling the outer
   checksum in encapsulation has the additional advantage that it covers
   more of the packet than the inner checksum including the
   encapsulation headers.




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

2014-11-14 Thread Tom Herbert

On Wed, Nov 12, 2014 at 5:13 PM, Mach Chen mach.c...@huawei.com wrote:
 Hi Tom,

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Thursday, November 13, 2014 3:11 AM
 To: Marc Binderberger
 Cc: Mach Chen; Greg Mirsky; Haoweiguo; nvo3@ietf.org; Larry Kreeger
 Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

 On Wed, Nov 12, 2014 at 2:11 AM, Marc Binderberger m...@sniff.de wrote:
  Hello Mach,

  so for delay measurement you use the color flag to mark a single
  packet, which helps the receiver to pick the right packet?  And repeat
  this every time period T ?

  ...0001001001000...

 Is there a draft or description of how this algorithm would work? Seems
 like there would need to be quite a bit of synchronization between end
 points (synchronized clocks, provisions to correlate measurements correctly
 with lost packets, replicated packets, etc.). Also, what is envisioned for
 the range of the period?

 Here is a reference 
 https://datatracker.ietf.org/doc/draft-chen-ippm-coloring-based-ipfpm-framework/.

Thanks for the pointer. Regarding the need for synchronized clocks to
measure delay, I consulted our local NTP expert. The host clock jitter
we see in our network is currently usually greater than
one-way packet delay (in some cases much greater), so in his words:
measuring one-way packet delays using host clocks is a lost cause.
Please take this as just one data point!

 Yes, it does need some synchronization. As for the range, it depends on two 
 factors, one is the implementation limitation, the other the requirement of 
 the operators. In the above reference, the suggested periods are 1s, 10s, 
 1min, 10min and 1h.

I think if we were implementing delay measurement in GUE, I would
advocate adding a 64-bit optional field for timestamps, probably
containing a source timestamp and an echoed timestamp for a flow (usec
resolution and similar in design to the TCP timestamp option). This easily
gives a precise RTT, and if clocks are precisely synchronized then
one-way latency could be calculated also.
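
A minimal sketch of how such a field could work, assuming the 64 bits are split into a 32-bit source timestamp followed by a 32-bit echoed timestamp, both in microseconds. That split, the byte order, and the function names are assumptions for illustration, not a defined GUE option.

```python
import struct
from typing import Tuple

USEC_MOD = 1 << 32  # a 32-bit microsecond counter wraps after about 71.6 minutes


def pack_ts_option(ts_val_usec: int, ts_echo_usec: int) -> bytes:
    """Pack source and echoed timestamps into one 64-bit field (network order)."""
    return struct.pack("!II", ts_val_usec % USEC_MOD, ts_echo_usec % USEC_MOD)


def unpack_ts_option(field: bytes) -> Tuple[int, int]:
    """Return (source timestamp, echoed timestamp) from the 8-byte field."""
    return struct.unpack("!II", field)


def rtt_usec(now_usec: int, ts_echo_usec: int) -> int:
    """RTT = local time now minus our own timestamp as echoed by the peer.

    The modulo keeps the result correct across a counter wrap.
    """
    return (now_usec - ts_echo_usec) % USEC_MOD


# Host A stamped a packet at 1_000_000 us. Host B echoes that value in its
# reply, which A receives at 1_000_350 us on A's own clock.
field = pack_ts_option(ts_val_usec=5_000_000, ts_echo_usec=1_000_000)
_, echoed = unpack_ts_option(field)
print(rtt_usec(1_000_350, echoed))  # 350
```

Because the RTT is computed entirely from the sender's own clock, no synchronization is needed for it. As with the TCP option, any time the peer holds the timestamp before echoing it is included in the result.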

Thanks,
Tom

 Best regards,
 Mach

 Thanks,
 Tom

  One question I still have is: why is the measurement done in the NVE header?
  The outer header is IP/IPv6, so couldn't we use the coloring for the
  IP/IPv6 header, assuming this is defined?

  Thanks & Regards,
  Marc

  On Wed, 12 Nov 2014 09:34:52 +, Mach Chen wrote:
  Hi Tom,

  -Original Message-
  From: Tom Herbert [mailto:therb...@google.com]
  Sent: Wednesday, November 12, 2014 5:06 PM
  To: Mach Chen
  Cc: Greg Mirsky; Haoweiguo; nvo3@ietf.org; Larry Kreeger (kreeger)
  Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for
  OAM

  On Wed, Nov 12, 2014 at 12:55 AM, Mach Chen mach.c...@huawei.com
  wrote:
  Hi Greg and all,

  A single bit is not sufficient if someone wants to perform loss and
  delay measurement simultaneously; then two bits are needed.

  Is that necessary? Can they share the same time quantum (as well as
  other metrics maybe to be added later)? In all the protocols
  mentioned, the reserved bits are a somewhat precious resource.

  Yes, it's necessary if there is ECMP.

  Suppose one bit is used for both loss and delay measurement. For loss
  measurement, the sender periodically sets and clears the marking bit, so a
  flow is divided into consecutive blocks, and the counting and
  calculation are done per block. This is fine for loss measurement.

  For delay measurement, one has to make sure the timestamps (collected
  at sender and receiver) are for the same packet. Presumably, the moment
  the marking bit changes is the right time to take the timestamps.
  But with ECMP, the first packet of a block at the sender may
  well differ from the first packet at the receiver, in which case
  mismatched timestamps are used to calculate the delay.
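
  The mismatch can be shown with a small deterministic example. It assumes four packets of the measured traffic spread over two paths with different delays; the packet names, times, and delays are made up for illustration.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "name mark sent_us path_delay_us")

# The mark flips from 0 to 1 at p2. p2 is hashed to a slow path, p3 to a fast one.
packets = [
    Packet("p0", 0, 0, 100),
    Packet("p1", 0, 10, 100),
    Packet("p2", 1, 20, 300),
    Packet("p3", 1, 30, 100),
]


def arrival(p):
    """Arrival time at the receiver, in microseconds."""
    return p.sent_us + p.path_delay_us


# Sender timestamps the first packet it sends with the new mark.
first_sent = next(p for p in packets if p.mark == 1)
# Receiver timestamps the first packet it sees with the new mark.
first_seen = next(p for p in sorted(packets, key=arrival) if p.mark == 1)

measured = arrival(first_seen) - first_sent.sent_us
actual = first_sent.path_delay_us
print(first_sent.name, first_seen.name, measured, actual)  # p2 p3 110 300
```

  The sender's timestamp belongs to p2 but the receiver's belongs to p3, so the computed delay (110 us) matches neither packet's real delay. A separate bit that marks exactly one packet for delay measurement avoids this ambiguity.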

  Best regards,
  Mach

  Tom

  Best regards,

  Mach

  From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Greg Mirsky
  Sent: Wednesday, November 12, 2014 8:05 AM
  To: Haoweiguo
  Cc: nvo3@ietf.org; Larry Kreeger (kreeger)
  Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements
  for OAM

  Dear All,
  agree with Weiguo, single bit flag in fixed position would be
  sufficient and HW-friendly.

  Regards,

  Greg

  On Tue, Nov 11, 2014 at 3:51 PM, Haoweiguo haowei...@huawei.com
  wrote:

  Hi Larry,

  For marking purposes, I think one bit may be OK; fixed fields in the NVO3
  header are precious. I would like it to be set in a fixed field rather
  than in an option field. Because chipsets normally can't process
  optional fields, it is hard to realize in-band performance
  measurement if an optional field is used for marking.
  For other real-time congestion control functions, maybe more bits
  are needed.

  Thanks

  weiguo

  From: Larry Kreeger (kreeger) [kree...@cisco.com]
  Sent: November 12, 2014

Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

2014-11-12 Thread Tom Herbert

On Wed, Nov 12, 2014 at 12:55 AM, Mach Chen mach.c...@huawei.com wrote:
 Hi Greg and all,



 A single bit is not sufficient if someone wants to perform loss and delay
 measurement simultaneously; then two bits are needed.

Is that necessary? Can they share the same time quantum (as well as
other metrics maybe to be added later)? In all the protocols
mentioned, the reserved bits are a somewhat precious resource.

Tom



 Best regards,

 Mach



 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Greg Mirsky
 Sent: Wednesday, November 12, 2014 8:05 AM
 To: Haoweiguo
 Cc: nvo3@ietf.org; Larry Kreeger (kreeger)
 Subject: Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM



 Dear All,
 agree with Weiguo, single bit flag in fixed position would be sufficient and
 HW-friendly.

 Regards,

 Greg



 On Tue, Nov 11, 2014 at 3:51 PM, Haoweiguo haowei...@huawei.com wrote:

 Hi Larry,

 For marking purposes, I think one bit may be OK; fixed fields in the NVO3 header
 are precious. I would like it to be set in a fixed field rather than in an option
 field. Because chipsets normally can't process optional fields, it is hard to
 realize in-band performance measurement if an optional field is used for marking.
 For other real-time congestion control functions, maybe more bits are needed.

 Thanks

 weiguo

 

 From: Larry Kreeger (kreeger) [kree...@cisco.com]
 Sent: November 12, 2014 4:33
 To: Haoweiguo; Greg Mirsky
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] Comments on NVO3 data plane requirements for OAM



 Hi Weiguo,



 What do you envision this marking looking like?  e.g. is it just a single
 flag bit, or large field with a counter or sequence number, or some kind of
 flow ID?  If not a single flag, how large do you see the field being?



 If it is more than a flag (and I assume it would be), and is not mandatory
 for all implementations, then it seems to fall into the category of optional
 extensions.



 Thanks, Larry



 From: Haoweiguo haowei...@huawei.com
 Date: Tuesday, November 11, 2014 10:18 AM
 To: Greg Mirsky gregimir...@gmail.com
 Cc: nvo3@ietf.org nvo3@ietf.org
 Subject: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM



 Hi Greg,

 I fully agree with you.

 Real-time OAM is a passive performance measurement method. I would like the
 NVO3 data encapsulation to have a field for marking that does not affect the
 forwarding of packets; the marking field is only used for performance
 measurement. An NVO3 packet with this marking flag doesn't need to be sent to
 the control plane, which is different from OAM (ping/trace) packet processing.

 Thanks

 weiguo

 

 From: Greg Mirsky [gregimir...@gmail.com]
 Sent: November 12, 2014 4:07
 To: Haoweiguo
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] Comments on NVO3 data plane requirements for OAM

 Hi Weiguo,

 marking groups of packets that belong to the particular flow to facilitate
 measurement of some performance metric, whether loss or delay/delay
 variation, may be viewed as one of passive performance measurement methods.
 But such marking should not alter, at least not significantly alter,
 treatment of data flow in the network. Because of that, I believe, OAM flag
 should not be used for marking as that will force punting marked packets
 from fast forwarding path to the control plane. But it might be good to have
 a field in NVO3 header that may be used for marking and not affect
 forwarding of packets if altered.

 Regards,

 Greg



 On Tue, Nov 11, 2014 at 12:34 AM, Haoweiguo haowei...@huawei.com wrote:

 Hi All,

 I may not have expressed this clearly in today's NVO3 meeting, so please allow
 me to reiterate the OAM data plane requirements on the mailing list.

 Currently the NVO3 data plane encapsulation only includes one OAM flag, which is
 used for Ping/Trace-like applications. This kind of OAM application is
 initiated by operators for network connectivity verification, normally when
 a network failure occurs. There is another OAM requirement: real-time OAM
 or synthesizing OAM. It can be used for packet loss detection in real time.
 When the ingress NVE receives traffic from a local TS, it collects packet
 statistics and marks (colors) the OAM flag according to local policy when it
 performs NVO3 encapsulation. When the egress NVEs receive the traffic, they remove
 the NVO3 encapsulation and collect packet statistics according to the real-time
 OAM flag marking. By comparing the packet count of the ingress NVE with the sum
 over all egress NVEs, packet loss can be deduced. This method is applicable to
 both unicast and multicast traffic. The local policy on the ingress NVE is
 configured by operators or automatically acquired from centralized
 orchestration.

 Thanks

 weiguo



Re: [nvo3] Comments on NVO3 data plane requirements for OAM

2014-11-11 Thread Tom Herbert

On Tue, Nov 11, 2014 at 12:33 PM, Larry Kreeger (kreeger)
kree...@cisco.com wrote:
 Hi Weiguo,

 What do you envision this marking looking like?  e.g. is it just a single
 flag bit, or large field with a counter or sequence number, or some kind of
 flow ID?  If not a single flag, how large do you see the field being?

 If it is more than a flag (and I assume it would be), and is not mandatory
 for all implementations, then it seems to fall into the category of optional
 extensions.

I assume this is a request for in-band measurement as opposed to some
out of band summary mechanism which seems to be more typical of OAM.
If we are adding loss counters/delay metrics to every data packet,
this is starting to look like the sort of data we need for congestion
control and in fact might be a subset of that.

Tom

 Thanks, Larry

 From: Haoweiguo haowei...@huawei.com
 Date: Tuesday, November 11, 2014 10:18 AM
 To: Greg Mirsky gregimir...@gmail.com
 Cc: nvo3@ietf.org nvo3@ietf.org
 Subject: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

 Hi Greg,

 I fully agree with you.

 Real-time OAM is a passive performance measurement method. I would like the
 NVO3 data encapsulation to have a field for marking that does not affect the
 forwarding of packets; the marking field is only used for performance
 measurement. An NVO3 packet with this marking flag doesn't need to be sent to
 the control plane, which is different from OAM (ping/trace) packet processing.

 Thanks

 weiguo

 
 From: Greg Mirsky [gregimir...@gmail.com]
 Sent: November 12, 2014 4:07
 To: Haoweiguo
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] Comments on NVO3 data plane requirements for OAM

 Hi Weiguo,
 marking groups of packets that belong to the particular flow to facilitate
 measurement of some performance metric, whether loss or delay/delay
 variation, may be viewed as one of passive performance measurement methods.
 But such marking should not alter, at least not significantly alter,
 treatment of data flow in the network. Because of that, I believe, OAM flag
 should not be used for marking as that will force punting marked packets
 from fast forwarding path to the control plane. But it might be good to have
 a field in NVO3 header that may be used for marking and not affect
 forwarding of packets if altered.

 Regards,
 Greg

 On Tue, Nov 11, 2014 at 12:34 AM, Haoweiguo haowei...@huawei.com wrote:

 Hi All,

 I may not have expressed this clearly in today's NVO3 meeting, so please allow
 me to reiterate the OAM data plane requirements on the mailing list.

 Currently the NVO3 data plane encapsulation only includes one OAM flag, which is
 used for Ping/Trace-like applications. This kind of OAM application is
 initiated by operators for network connectivity verification, normally when
 a network failure occurs. There is another OAM requirement: real-time OAM
 or synthesizing OAM. It can be used for packet loss detection in real time.
 When the ingress NVE receives traffic from a local TS, it collects packet
 statistics and marks (colors) the OAM flag according to local policy when it
 performs NVO3 encapsulation. When the egress NVEs receive the traffic, they remove
 the NVO3 encapsulation and collect packet statistics according to the real-time
 OAM flag marking. By comparing the packet count of the ingress NVE with the sum
 over all egress NVEs, packet loss can be deduced. This method is applicable to
 both unicast and multicast traffic. The local policy on the ingress NVE is
 configured by operators or automatically acquired from centralized
 orchestration.

 Thanks

 weiguo



Re: [nvo3] Comments on NVO3 data plane requirements for OAM

2014-11-11 Thread Tom Herbert

On Tue, Nov 11, 2014 at 4:03 PM, Greg Mirsky gregimir...@gmail.com wrote:
 Hi Tom,
 I see very little use for out-of-band performance measurement as its result is
 hardly characteristic of the monitored service. Perhaps we should compare
 out-of-service and in-service measurement. Marking is to facilitate passive
 performance measurement, which is obviously in-service and in-band OAM.

By out of band I mean not piggybacked on a data packet, but still able to
follow the same path (for example, a ping to test the path).

 As an example of passive measurement, it has limitations as well as advantages.
 The marking method does not require tagging data packets with anything but a mark,
 applied in a way that should not alter the network treatment of the packet. All
 timestamps and counters are to be collected at observation points. Marking
 helps to correlate the collected information and perform measurements.

How would you get a timestamp from just a single mark on a packet?

 Regards,
 Greg

 On Tue, Nov 11, 2014 at 1:42 PM, Tom Herbert therb...@google.com wrote:

 On Tue, Nov 11, 2014 at 12:33 PM, Larry Kreeger (kreeger)
 kree...@cisco.com wrote:
  Hi Weiguo,
 
  What do you envision this marking looking like?  e.g. is it just a single
  flag bit, or large field with a counter or sequence number, or some kind of
  flow ID?  If not a single flag, how large do you see the field being?
 
  If it is more than a flag (and I assume it would be), and is not mandatory
  for all implementations, then it seems to fall into the category of optional
  extensions.
 
 I assume this is a request for in-band measurement as opposed to some
 out of band summary mechanism which seems to be more typical of OAM.
 If we are adding loss counters/delay metrics to every data packet,
 this is starting to look like the sort of data we need for congestion
 control and in fact might be a subset of that.

 Tom

  Thanks, Larry
 
  From: Haoweiguo haowei...@huawei.com
  Date: Tuesday, November 11, 2014 10:18 AM
  To: Greg Mirsky gregimir...@gmail.com
  Cc: nvo3@ietf.org nvo3@ietf.org
  Subject: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM
 
  Hi Greg,
 
  I fully agree with you.
 
  Real-time OAM is a passive performance measurement method. I would like the
  NVO3 data encapsulation to have a field for marking that does not affect the
  forwarding of packets; the marking field is only used for performance
  measurement. An NVO3 packet with this marking flag doesn't need to be sent to
  the control plane, which is different from OAM (ping/trace) packet processing.
 
  Thanks
 
  weiguo
 
  
  From: Greg Mirsky [gregimir...@gmail.com]
  Sent: November 12, 2014 4:07
  To: Haoweiguo
  Cc: nvo3@ietf.org
  Subject: Re: [nvo3] Comments on NVO3 data plane requirements for OAM
 
  Hi Weiguo,
  marking groups of packets that belong to the particular flow to facilitate
  measurement of some performance metric, whether loss or delay/delay
  variation, may be viewed as one of passive performance measurement methods.
  But such marking should not alter, at least not significantly alter,
  treatment of data flow in the network. Because of that, I believe, OAM flag
  should not be used for marking as that will force punting marked packets
  from fast forwarding path to the control plane. But it might be good to have
  a field in NVO3 header that may be used for marking and not affect
  forwarding of packets if altered.
 
  Regards,
  Greg
 
  On Tue, Nov 11, 2014 at 12:34 AM, Haoweiguo haowei...@huawei.com
  wrote:
 
  Hi All,
 
  I may not have expressed this clearly in today's NVO3 meeting, so please allow
  me to reiterate the OAM data plane requirements on the mailing list.
 
  Currently the NVO3 data plane encapsulation only includes one OAM flag, which is
  used for Ping/Trace-like applications. This kind of OAM application is
  initiated by operators for network connectivity verification, normally when
  a network failure occurs. There is another OAM requirement: real-time OAM
  or synthesizing OAM. It can be used for packet loss detection in real time.
  When the ingress NVE receives traffic from a local TS, it collects packet
  statistics and marks (colors) the OAM flag according to local policy when it
  performs NVO3 encapsulation. When the egress NVEs receive the traffic, they remove
  the NVO3 encapsulation and collect packet statistics according to the real-time
  OAM flag marking. By comparing the packet count of the ingress NVE with the sum
  over all egress NVEs, packet loss can be deduced. This method is applicable to
  both unicast and multicast traffic. The local policy on the ingress NVE is
  configured by operators or automatically acquired from centralized
  orchestration.
 
  Thanks
 
  weiguo
 
 

Re: [nvo3] 答复: Comments on NVO3 data plane requirements for OAM

2014-11-11 Thread Tom Herbert

On Tue, Nov 11, 2014 at 7:04 PM, Haoweiguo haowei...@huawei.com wrote:
 Hi Tom,
 Pls see inline with [weiguo].
 Thanks
 weiguo

 
 From: Tom Herbert [therb...@google.com]
 Sent: November 12, 2014 8:22
 To: Greg Mirsky
 Cc: Larry Kreeger (kreeger); Haoweiguo; nvo3@ietf.org
 Subject: Re: [nvo3] Comments on NVO3 data plane requirements for OAM

 On Tue, Nov 11, 2014 at 4:03 PM, Greg Mirsky gregimir...@gmail.com wrote:
 Hi Tom,
 I see very little use for out-of-band performance measurement as its result is
 hardly characteristic of the monitored service. Perhaps we should compare
 out-of-service and in-service measurement. Marking is to facilitate passive
 performance measurement, which is obviously in-service and in-band OAM.

 By out of band I mean not piggybacked on a data packet, but still able to
 follow the same path (for example, a ping to test the path).

 [weiguo]: Yes, exactly. The current NVO3 OAM consideration only relates to
 out-of-band OAM, not piggybacked on data packets. So I would like to add an
 additional passive measurement method, i.e., in-band OAM.

  As an example of passive measurement, it has limitations as well as advantages.
 The marking method does not require tagging data packets with anything but a mark,
 applied in a way that should not alter the network treatment of the packet. All
 timestamps and counters are to be collected at observation points. Marking
 helps to correlate the collected information and perform measurements.

 How would you get a timestamp from just a single mark on a packet?

 [weiguo]: For packet loss statistics, no time stamp is needed. A single bit
 in the NVO3 header is enough for packet loss. The flag is used to separate
 packets belonging to different statistics periods. For example, if the statistics
 period is 10 seconds, the ingress NVE sets the flag to 1 for first-period packets,
 to 0 for the second period, to 1 again for the third period,
 and so on, until the statistics collection terminates.
 The ingress NVE and egress NVE need to send their statistics for each period to a
 centralized point; the centralized point compares the packet counts,
 and the difference is the packet loss.
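
 The alternating-flag scheme described above can be sketched roughly as
 follows. This is only an illustration in Python, not part of any NVO3
 encapsulation: the function names are hypothetical, and the sketch assumes
 no reordering across a period boundary and that at least one packet of
 every period reaches the egress NVE (otherwise the periods would misalign).

```python
from itertools import groupby


def count_by_mark(flags):
    """Split a stream of observed flag bits into per-period packet counts.

    A new period starts whenever the flag value flips (hypothetical helper).
    """
    return [len(list(group)) for _, group in groupby(flags)]


def loss_per_period(ingress_flags, egress_flags):
    """What the centralized point would compute: ingress minus egress counts."""
    ingress = count_by_mark(ingress_flags)
    egress = count_by_mark(egress_flags)
    return [sent - received for sent, received in zip(ingress, egress)]


# Three periods with 5, 4 and 3 packets sent; one packet is lost in the
# first period and two in the second.
sent = [1] * 5 + [0] * 4 + [1] * 3
received = [1] * 4 + [0] * 2 + [1] * 3
losses = loss_per_period(sent, received)
```

 A plain running count returned periodically would give the same totals;
 the flag only lets both ends agree on where one period ends and the next
 begins without synchronized clocks.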


Can you just keep a running count of packets received on the tunnel
and return that periodically (I believe this is something like what
circuit breaker does)?

Tom

 Regards,
 Greg

 On Tue, Nov 11, 2014 at 1:42 PM, Tom Herbert therb...@google.com wrote:

 On Tue, Nov 11, 2014 at 12:33 PM, Larry Kreeger (kreeger)
 kree...@cisco.com wrote:
  Hi Weiguo,
 
  What do you envision this marking looking like?  e.g. is it just a
  single
  flag bit, or large field with a counter or sequence number, or some kind
  of
  flow ID?  If not a single flag, how large do you see the field being?
 
  If it is more than a flag (and I assume it would be), and is not
  mandatory
  for all implementations, then it seems to fall into the category of
  optional
  extensions.
 
 I assume this is a request for in-band measurement as opposed to some
 out of band summary mechanism which seems to be more typical of OAM.
 If we are adding loss counters/delay metrics to every data packet,
 this is starting to look like the sort of data we need for congestion
 control and in fact might be a subset of that.

 Tom

  Thanks, Larry
 
  From: Haoweiguo haowei...@huawei.com
  Date: Tuesday, November 11, 2014 10:18 AM
  To: Greg Mirsky gregimir...@gmail.com
  Cc: nvo3@ietf.org nvo3@ietf.org
  Subject: [nvo3] Reply: Comments on NVO3 data plane requirements for OAM
 
  Hi Greg,
 
  I fully agree with you.
 
  Real-time OAM is a passive performance measurement method. I would like
  the NVO3 data encapsulation to have a field for marking that does not affect
  forwarding of packets; the marking field is only used for performance
  measurement. An NVO3 packet with this marking flag doesn't need to be sent
  to the control plane, which is different from OAM (ping/trace) packet
  processing.
 
  Thanks
 
  weiguo
 
  
  From: Greg Mirsky [gregimir...@gmail.com]
  Sent: November 12, 2014 4:07
  To: Haoweiguo
  Cc: nvo3@ietf.org
  Subject: Re: [nvo3] Comments on NVO3 data plane requirements for OAM
 
  Hi Weiguo,
  marking groups of packets that belong to the particular flow to
  facilitate
  measurement of some performance metric, whether loss or delay/delay
  variation, may be viewed as one of passive performance measurement
  methods.
  But such marking should not alter, at least not significantly alter,
  treatment of data flow in the network. Because of that, I believe, OAM
  flag
  should not be used for marking as that will force punting marked packets
  from fast forwarding path to the control plane. But it might be good to
  have
  a field in NVO3 header that may be used for marking and not affect
  forwarding of packets if altered.
 
  Regards,
  Greg
 
  On Tue, Nov 11, 2014 at 12:34 AM, Haoweiguo haowei...@huawei.com
  wrote:
 
  Hi All,
 
  I maybe not clearly said in today’s NVO3 meeting

Re: [nvo3] I-D Action: draft-ietf-nvo3-arch-02.txt

2014-10-30 Thread Tom Herbert

On Thu, Oct 30, 2014 at 2:55 PM, Larry Kreeger (kreeger)
kree...@cisco.com wrote:
 Hi Tom,

 We do have a term in NVO3 - Tenant System.  It is when it comes to
 concrete examples that we fall back on the most prominent example of the
 VM.

I'm referring to wording in section 3.4:

From an NVO3 perspective, it should be assumed that where the
document uses the term VM and hypervisor, the intention is that
the discussion also applies to other systems

  - Larry

 On 10/30/14 2:52 PM, Tom Herbert therb...@google.com wrote:

On Thu, Oct 30, 2014 at 1:50 PM, Behcet Sarikaya sarikaya2...@gmail.com
wrote:
  Hi Tom,

 Did you check Linux Containers discussion in the revision? What do you
think?

Thanks for the clarifying text. I still wish there was a more
networking-specific term than VM for this, much like we talk about hosts
instead of machines, but I suppose the use of VM is pretty burned in at this
point.

Tom

 Regards,

 Behcet


 A New Internet-Draft is available from the on-line Internet-Drafts
directories.
  This draft is a work item of the Network Virtualization Overlays
 Working Group of the IETF.

 Title   : An Architecture for Overlay Networks (NVO3)
 Authors : David Black
   Jon Hudson
   Lawrence Kreeger
   Marc Lasserre
   Thomas Narten
 Filename: draft-ietf-nvo3-arch-02.txt
 Pages   : 31
 Date: 2014-10-27

 Abstract:
This document presents a high-level overview architecture for
building overlay networks in NVO3.  The architecture is given at a
high-level, showing the major components of an overall system.  An
important goal is to divide the space into individual smaller
components that can be implemented independently and with clear
interfaces and interactions with other components.  It should be
possible to build and implement individual components in isolation
and have them work with other components with no changes to other
components.  That way implementers have flexibility in implementing
individual components and can optimize and innovate within their
respective components without requiring changes to other components.


 The IETF datatracker status page for this draft is:
 https://datatracker.ietf.org/doc/draft-ietf-nvo3-arch/

 There's also a htmlized version available at:
 http://tools.ietf.org/html/draft-ietf-nvo3-arch-02

 A diff from the previous version is available at:
 http://www.ietf.org/rfcdiff?url2=draft-ietf-nvo3-arch-02


 Please note that it may take a couple of minutes from the time of
submission
 until the htmlized version and diff are available at tools.ietf.org.

 Internet-Drafts are also available by anonymous FTP at:
 ftp://ftp.ietf.org/internet-drafts/

 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] Concerns about NVO3 dataplane requirements document

2014-10-21 Thread Tom Herbert

Hi Erik,

Thank you for sending this. Some comments in line.

On Tue, Oct 21, 2014 at 8:37 AM, Erik Nordmark nordm...@acm.org wrote:

 I expressed this on the phone at the interim meeting and was asked to post
 with a  bit more detail.

 According to the call the intended purpose of the requirements document is
 to help the WG choose between different proposed dataplane encapsulation
 protocols. However, I get the impression that
 draft-ietf-nvo3-dataplane-requirements was not written with that purpose in
 mind.

 To be clear, the document contains useful background text and various
 discussion on what needs to be done at encapsulation and decapsulation
 points either by an implementation or operationally. However, that does not
 help with the (current) intended purpose.

 The actual requirements on encapsulation protocol are few and weak. Section
 3.3.1 contains a few of them.
 But even the requirement on VNID is weak. The document says that there MUST
 be a VNID, but then goes on to

This field MAY be an
explicit, unique (to the administrative domain) virtual network
identifier (VNID) or MAY express the necessary context information
in other ways (e.g. a locally significant identifier).

 While in theory locally significant identifiers can be made to work, they
 would require an additional control-plane mechanism to handle the (dynamic?)
 mapping between VNID  and the local identifier. Especially with BGP EVPN now
 being handled in the BESS WG, I think we should have a requirement for our
 dataplane that it MUST contain a VNID field.

 That section also discusses QoS/CoS. But its requirement is a MAY for a QoS
 field in the NVO3 overlay header. Either we should require such a field or
 not; otherwise this doesn't help us choose.
 (And my personal take is that we can have solutions which map between TS
 QoS/CoS and underlay CoS without also having some QoS field in the NVO3
 header. But my overall point is that the MAY isn't a helpful criterion to
 choose between protocols.)


 Those are the only requirements on the dataplane I've found in the
 document.

 There might be other required or desired properties that are not in the
 draft. For instance, one can contemplate a requirement that the
 encapsulation MUST/SHOULD facilitate ECMP in unmodified routers in the
 underlay (e.g., using the common technique of UDP encaps with entropy placed
 in the UDP source port field).

 Thus my take is that a document with requirements on the NVO3 dataplane
 encapsulation can be a page or two. Some introduction and background
 followed by a list of required and desired properties of the  encapsulation
 format such as:

Agreed, but I think there are a few more probably.

  - MUST contain a VNID field. This field MUST be large enough to scale to
 100's of thousands of virtual networks

Even in moderate-sized deployments, hierarchical assignment of VNIDs,
sub-VNs, fragmentation in the space, and classed VNIDs might be used;
this probably should be reflected in requirements to allow for that (I
have pointed this out previously).

I believe the bigger item missing around VNID is security. Isolation
between VNs is a critical requirement, so it follows that
the VNID must be adequately secured to protect against spoofing or
corruption.

  - ??? QoS field inside the NVO3 overlay header or not ???

I would add that a congestion control mechanism in the encapsulation
layer might also be considered.

  - MUST/SHOULD facilitate ECMP in unmodified routers in the underlay

Is it reasonable to require UDP based encapsulation to support this,
or at least that any protocol can easily be encapsulated within UDP
(like being done for GRE/UDP)?
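
The "entropy in the UDP source port" technique mentioned above can be
sketched as follows. This is a minimal illustration in Python, not taken
from any of the drafts: the function name is hypothetical, CRC32 merely
stands in for whatever flow hash an implementation would really use, and
the port range is the IANA dynamic/private range.

```python
import zlib

# IANA dynamic/private port range, used here for the outer UDP source port.
EPHEMERAL_MIN = 49152
EPHEMERAL_MAX = 65535


def entropy_source_port(src_ip, dst_ip, proto, src_port, dst_port):
    """Map the inner flow's 5-tuple to an outer UDP source port.

    Packets of one inner flow always get the same outer source port, so an
    unmodified underlay router hashing on the outer 5-tuple keeps the flow
    on one path while spreading different flows across ECMP paths.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    flow_hash = zlib.crc32(key)  # stand-in hash, an assumption of this sketch
    return EPHEMERAL_MIN + flow_hash % (EPHEMERAL_MAX - EPHEMERAL_MIN + 1)


port_a = entropy_source_port("10.1.0.1", "10.2.0.9", 6, 40000, 443)
port_b = entropy_source_port("10.1.0.1", "10.2.0.9", 6, 40000, 443)
```

Nothing here depends on the encapsulation header itself, which is why any
protocol that can be carried over UDP would get this property.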

 (Others participants might have other requirements on the encapsulation
 format. My main message is the focus on the encaps requirements which seems
 to be quite few.)

With respect to encapsulation protocol requirements, I would propose
that extensibility (ability to associate meta data with the
encapsulation) is also a requirement.

 The implementation and operational requirements in
 draft-ietf-nvo3-dataplane-requirements should IMHO belong in a different
 document. Some might already be covered in the architecture document. And
 others might need to be refined as we specify more details for the NVO3
 solution.


One other thing that should be clarified: The protocol of the
encapsulated packet is described in the req. draft as
Tenant frame: Ethernet or IP based upon the VNI type.

This is a requirement for a multi-protocol encapsulation, which is
good, but it seems restrictive in the protocols carried and implies that the
protocol is inferred from the VNID as opposed to requiring/allowing
the encapsulation layer to have a protocol field.

Thanks,
Tom

 Hence if we are going to get WG consensus on a dataplane encapsulation
 requirements document I strongly suggest we start small with the above (3,
 give or take) bullets being the actual content.

Erik

Re: [nvo3] Security issue on L3 address migration ( was RE: L3 Address migration in NVO3 mobility draft (now with the new name : draft-merged-nvo3-ts-address-Migration)

2014-10-15 Thread Tom Herbert

On Wed, Oct 15, 2014 at 2:02 PM, Linda Dunbar linda.dun...@huawei.com wrote:
 Tom,

 I am a bit confused of your comments. See questions inserted below:


 -Original Message-
 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Tom Herbert

 Security should be considered here also. Security is strongest when applied 
 end-to-end. For example, if we are encrypting at the ingress NVE, decryption 
 should happen at the final NVE-- if the NVEs are on host then this implies 
 packets are encrypted during all of transit.
 Also, when encrypting this may be over the full inner packet so that virtual 
 addresses are not visible to intermediate devices. With ESP/GUE the header 
 stack might be IP/UDP/GUE/ESP/IP; vnid is visible, but virtual addresses 
 aren't. Even if we are just using a security cookie to validate a VNID, the 
 outer IP addresses can be considered for anti-spoofing. Triangular routing 
 convolutes both of these cases I believe.

 [Linda] I would think the L3 address migration is orthogonal to ingress NVE 
 encryption and egress NVE decryption. Can you elaborate on how those two issues 
 are related?

I'll use a simple example to demonstrate this...

Suppose we have a customer that has requested encryption for data in
flight. So for communications between their two NVEs we need to have
an established SA between them (one SA should be sufficient between
the NVEs, we probably don't need an SA for each VN or virtual
address).

So if V1 and V2 are communicating endpoints, and E1 and E2 are the
corresponding NVEs, then we would have something like:

V1-E1-E2-V2

where communications between E1 and E2 are encrypted-- V1 to E1 and E2 to
V2 are plaintext (if they are co-resident with the respective
NVEs on a host, then all communications are encrypted).

So now if we have triangular routing, then the path looks like:

V1-E1-X-E2-V2.

So the question is how to encrypt packets with this path. We can't use
an SA between E1 and E2 since E1 doesn't know that its peer is
E2 (if it did then we wouldn't be doing triangular routing). So in
order to encrypt along the whole path, it seems like we need to encrypt using
one SA between E1 and X, and another between X and E2-- but this is
not really end to end and I would assume it's a lot more load on X
than it bargained for.
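
The difference between the two paths can be made concrete with a small
sketch. This is only an illustration in Python of the reasoning above; the
function names are hypothetical and an SA is modeled as nothing more than a
pair of NVE names.

```python
def hop_security_associations(path):
    """One SA per adjacent pair of NVEs on the encrypted part of the path."""
    return list(zip(path, path[1:]))


def plaintext_exposure(path):
    """NVEs other than the two endpoints that must decrypt and re-encrypt."""
    return path[1:-1]


direct = ["E1", "E2"]           # V1-E1-E2-V2
triangular = ["E1", "X", "E2"]  # V1-E1-X-E2-V2

direct_sas = hop_security_associations(direct)
triangular_sas = hop_security_associations(triangular)
```

With the direct path there is a single SA and no intermediate device sees
plaintext; with the triangular path there are two SAs and X handles the
decrypted packet, which is the extra load and exposure described above.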




 In theory, host routing by every NVE (including the DCBR) can achieve
 the optimal path forwarding in very fragmented network. But host
 routing can be challenging in a very large and highly virtualized data
 center, there could be hundreds of thousands of hosts/VMs, sometimes
 in millions, due to business demand and highly advanced server 
 virtualization technologies.

 This is true, but there are some mitigating factors favoring host routes. 1) 
 a particular NVE should only be communicating with a subset of virtual 
 addresses at a given time. 2) Migrations are fairly rare events and latency 
 to adapt is already assumed to be in 10's of msecs.
 An extra round trip or so to relearn a host route after migration is not 
 unreasonable.

 So the host routes could be in a cache containing the working set where there 
 is a mechanism to resolve them.

 [Linda] I'd assume that the NVA can always send an update when a host is moved to a 
 different NVE. So I am not concerned with the time taken to learn the new 
 route. The text is intended to describe other options when the cache of a 
 switch/router in a data center can't hold individual mappings for all VMs in 
 the VNs supported (especially the gateway).




 ECMP can be used by the DCBR or any NVEs that don’t support host
 routing or can’t access NVA to distribute traffic equally to any of
 the NVEs that support the subnet (VN). If an NVE doesn’t have the
 destination of a data packet directly attached, it can query NVA for
 the target NVE to which the destination is attached, and encapsulate
 the packet with the target NVE as outer destination before sending it out.

 Another approach is to designate one or two NVEs as designated
 forwarder for a specific subnet when the subnet is spread across many
 NVEs. For example, if high percentage of TSs of one subnet is attached
 to NVE “X”, the remaining small percentage of the subnet is spread around 
 many NVEs.
 Designating NVE “X” as the designated forwarder for the subnet can
 greatly reduce the “triangular routing” for the traffic destined to
 TSs in this subnet.

 I'm not sure this is practical. While there are reasons to schedule VMs with 
 physical locality in the DC, there are also reasons we wouldn't do this (like 
 we don't want a single device failure to be able to take out all VM's of a 
 customer). Also, I don't think we want to constrain the job scheduler any 
 more than it already is-- it's likely over enough time that entropy will 
 prevail so that VMs for particular subnets are randomly distributed across 
 the DC.

 If we designate NVE X as a forwarder like you suggest, then when it gets a 
 packet for which there is a better route

Re: [nvo3] L3 Address migration in NVO3 mobility draft (now with the new name : draft-merged-nvo3-ts-address-Migration)

2014-10-10 Thread Tom Herbert

Hi Linda,

Thank you for adding this section to the draft. Some comments in line...


 L3 Address Migration

 When the attachment to NVE is L3 based, TS migration can cause one
 subnetwork to be scattered among many NVEs, or fragmented addresses.

 The outbound traffic of fragmented L3 addresses doesn’t have the same issue
 as L2 address migration, but the inbound traffic has the same issues as L2
 address migration (Section 6). In theory, host routing at DCBR can achieve
 the optimal path forwarding in very fragmented network. But host routing can
 be challenging in a very large and highly virtualized data center, there
 could be hundreds of thousands of hosts/VMs, sometimes in millions, due to
 business demand and highly advanced server virtualization technologies.

 Optimal routing of TS's inbound traffic. This means that as a given TS moves
 from one server to another, the (inbound) traffic originated outside of the
 TS's directly attached NVE, and destined to that TS, is routed optimally to
 the NVE to which the server presently hosting that TS is attached, without
 first traversing some other NVEs. This is also known as avoiding triangular
 routing.

Security should be considered here also. Security is strongest when
applied end-to-end. For example, if we are encrypting at the ingress
NVE, decryption should happen at the final NVE-- if the NVEs are on
host then this implies packets are encrypted during all of transit.
Also, when encrypting this may be over the full inner packet so that
virtual addresses are not visible to intermediate devices. With
ESP/GUE the header stack might be IP/UDP/GUE/ESP/IP; vnid is visible,
but virtual addresses aren't. Even if we are just using a security
cookie to validate a VNID, the outer IP addresses can be considered
for anti-spoofing. Triangular routing convolutes both of these cases I
believe.

 In theory, host routing by every NVE (including the DCBR) can achieve the
 optimal path forwarding in very fragmented network. But host routing can be
 challenging in a very large and highly virtualized data center, there could
 be hundreds of thousands of hosts/VMs, sometimes in millions, due to
 business demand and highly advanced server virtualization technologies.

This is true, but there are some mitigating factors favoring host
routes. 1) a particular NVE should only be communicating with a subset
of virtual addresses at a given time. 2) Migrations are fairly rare
events and latency to adapt is already assumed to be in 10's of msecs.
An extra round trip or so to relearn a host route after migration is
not unreasonable.

So the host routes could be in a cache containing the working set
where there is a mechanism to resolve them.
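
A working-set cache of host routes with a resolution fallback could look
roughly like this. It is a minimal sketch in Python under stated
assumptions: the class name, the LRU eviction policy, and the `resolver`
callable (standing in for a query to the NVA or some other mapping service)
are all hypothetical and not defined by any NVO3 document.

```python
from collections import OrderedDict


class HostRouteCache:
    """Bounded cache mapping a virtual address to the NVE that hosts it."""

    def __init__(self, resolver, capacity=1024):
        self._resolver = resolver      # called on a miss; hypothetical hook
        self._capacity = capacity
        self._routes = OrderedDict()   # least recently used entry first

    def lookup(self, virtual_addr):
        if virtual_addr in self._routes:
            self._routes.move_to_end(virtual_addr)  # mark as recently used
            return self._routes[virtual_addr]
        nve = self._resolver(virtual_addr)          # extra round trip on a miss
        self._routes[virtual_addr] = nve
        if len(self._routes) > self._capacity:
            self._routes.popitem(last=False)        # evict least recently used
        return nve

    def invalidate(self, virtual_addr):
        """Drop a stale entry, e.g. after learning the address has migrated."""
        self._routes.pop(virtual_addr, None)
```

Only addresses an NVE is actually talking to occupy the cache, and a
migration costs one invalidation plus one resolution.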

 ECMP can be used by the DCBR or any NVEs that don’t support host routing or
 can’t access NVA to distribute traffic equally to any of the NVEs that
 support the subnet (VN). If an NVE doesn’t have the destination of a data
 packet directly attached, it can query NVA for the target NVE to which the
 destination is attached, and encapsulate the packet with the target NVE as
 outer destination before sending it out.

 Another approach is to designate one or two NVEs as designated forwarder for
 a specific subnet when the subnet is spread across many NVEs. For example,
 if high percentage of TSs of one subnet is attached to NVE “X”, the
 remaining small percentage of the subnet is spread around many NVEs.
 Designating NVE “X” as the designated forwarder for the subnet can greatly
 reduce the “triangular routing” for the traffic destined to TSs in this
 subnet.

I'm not sure this is practical. While there are reasons to schedule
VMs with physical locality in the DC, there are also reasons we
wouldn't do this (like we don't want a single device failure to be
able to take out all VM's of a customer). Also, I don't think we want
to constrain the job scheduler any more than it already is-- it's
likely over enough time that entropy will prevail so that VMs for
particular subnets are randomly distributed across the DC.

If we designate NVE X as a forwarder like you suggest, then when it
gets a packet for which there is a better route it could send the
equivalent of an ICMP redirect back to the originator to eliminate
the triangular routing. Furthermore, if instead of acting as a
forwarder, NVE X is a resolver then a resolution protocol could be
implemented (ARP model) so that packets might never be forwarded using
triangular routing, which could address the end-to-end security issue I
posed above.
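
The redirect behavior suggested above might be sketched like this. It is
an illustration in Python only: the function names, the tuple-based
"messages", and the `attachments` table are hypothetical and do not
correspond to any defined NVO3 or ICMP message format.

```python
def handle_at_forwarder(forwarder, attachments, origin_nve, dst_vaddr):
    """Decide what the designated forwarder does with one packet.

    `attachments` maps a virtual address to the NVE currently hosting it
    and is assumed to be known to the forwarder.
    """
    target = attachments[dst_vaddr]
    if target == forwarder:
        return [("deliver", dst_vaddr)]
    # Forward this packet, and tell the originator about the better route
    # so that later packets skip the forwarder.
    return [
        ("forward", target),
        ("redirect", origin_nve, dst_vaddr, target),
    ]


def apply_redirect(host_routes, redirect):
    """Originator side: install the host route carried in a redirect."""
    _, _, dst_vaddr, target = redirect
    host_routes[dst_vaddr] = target


actions = handle_at_forwarder("X", {"V2": "E2"}, "E1", "V2")
routes_at_e1 = {}
apply_redirect(routes_at_e1, actions[1])
```

In the resolver variant the originator would ask for the mapping before
sending anything, so no packet would take the triangular path at all.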

Thanks,
Tom



 Linda Dunbar



Re: [nvo3] Poll for a better name for draft-merged-nvo3-vm-mobility-scheme-00.txt

2014-10-07 Thread Tom Herbert

On Tue, Oct 7, 2014 at 7:33 PM, Sri Gundavelli (sgundave)
sgund...@cisco.com wrote:

 Hi Thomas,


  No. Let's please not go here.

 No .. no. My comment is not intended to argue against inventing a new
 approach; I'm not in the way. But, if we do not discuss and show a
 reasonable technical argument as why an option needs to be ruled out, I
 suspect we will end with the exact same answer, but with a different
 technology title and functional entities.  I'm more curious to understand
 how the VM mobility properties/requirements are any different from IP
 mobility requirements of a classical mobile node. In today's deployments,
 that classical mobile node is not always a cellular device /laptop, but
 can also be a mobile router, IOT device with no user association.


 IP Mobility protocols are providing layer-3 mobility to a device. There is
 very little relation to the user. If the device moves by itself, or if the
 user moves it physically, the protocol has no clue. If there exists any
 relation to the user, its only about using the access authentication of
 the user and binding that identity to the mobility session. But, even such
 user relation is not present in mobile router/IOT deployments. The core
 protocol only deals with signaling and managing the forwarding state.
 There is no user identity semantic in the data-plane.

 If a mobile node changes its point of attachment in the network, because
 the owner of the device moves it, or when a VM moves across L3 boundaries
 due to the administrator triggering a VM migration may not mean any thing
 to the protocol underneath. If we take CDMA2000, the MIP stack is in the
 chipset and there are many IOT devices with the cellular interface and
 there is still IP Mobility for the device without a user operating it.




  the entire physical device is mobile,

 This is a good point. But, however you look at it, the Virtual Machine
 construct is presenting the view of a separate IP node. It has an
 Operating System, set of applications, a logical interface card, IP
 address configuration, forward stack, and a set of resources. Applications
 are able to bind to an address, TCP/UDP ports and are able to send/receive
 IP traffic. That Virtual Machine entity as a whole is moving across
 networks. Now, why would it matter for layer-3 mobility protocols to be
 aware of this subtle difference on a real mobile device, vs a VM instance
 ? Why is it relevant from the forwarding point of view ?


You've reverted to posing the networking virtualization problem in
terms of virtual machines which leads to a use case specific
solution-- this is exactly the reason I suggested to not use the term.
The general problem is not (virtual) machine migration, it is a
problem of moving networking state between hosts and adapting the
network to be aware of this.






  Finally, in MIP, the mobile device itself *knows* it is mobile and
 actively participates in that mobility.

 There exist two mobility models: client-based and network-based. For the
 latter, there is no such assumption of client awareness. The network is
 responsible for providing the mobility.



 Regards
 Sri




 On 10/7/14 9:54 AM, Thomas Narten nar...@us.ibm.com wrote:

  = Now, if I replace VM with mobile node, that's layer-3
  mobility and we have few solution options there...
 
 No. Let's please not go here. As David pointed out in a different
 thread, NVO3 is about VM *migration*.
 
 Mobile IP is a very different beast built on a number of fundamentally
 different assumptions. E.g., in mobile IP, the entire physical device
 is mobile, not just a VM. Also, the physical device is moving, i.e.,
 because its owner is carrying it around. In the DC, the VM is moving
 because the DC operator wants to move it. Finally, in MIP, the mobile
 device itself *knows* it is mobile and actively participates in that
 mobility. In data centers, the VMs are oblivious to being moved and
 are not themselves actively involved in any of the signaling or steps
 of the move.
 
 Thomas
 
 ___
 nvo3 mailing list
 nvo3@ietf.org
 https://www.ietf.org/mailman/listinfo/nvo3


Re: [nvo3] Poll for a better name for draft-merged-nvo3-vm-mobility-scheme-00.txt

2014-10-06 Thread Tom Herbert

On Mon, Oct 6, 2014 at 8:54 AM, Linda Dunbar linda.dun...@huawei.com
wrote:

 I agree with Tom and David that this draft only describes the solutions to
address the issues associated with hosts' (VMs') addresses floating across
multiple NVEs or PODs, but does not cover the state aspects (on middleware
boxes) of the VM mobility.

 What do people think of NVO3 Address Mobility Scheme?

While it's a better title, I think my comment was meant to be more general.
VMs (and hypervisors) are references to specific mechanisms of server
virtualization, but not the only mechanisms that can be deployed (e.g.
container virtualization is not normally a VM and does not have an explicit
hypervisor). VM is really just one use case of network virtualization and
not in itself a networking term, so I think that mechanisms or protocols
for network virtualization really should be described without reference to
VMs unless there really is something VM specific about that.

Tom

 Does anyone have better suggestions?


 Thanks, Linda

 -Original Message-
 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Black, David
 Sent: Friday, October 03, 2014 8:15 PM
 To: Tom Herbert
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] FW: New Version Notification for
draft-merged-nvo3-vm-mobility-scheme-00.txt

 I would suggest that Address mobility scheme might be a better
 title. VM migration is one instance of when we need address mobility,
 but we'll need this with container migration or when we just want
 to move a virtual address between servers.

 I agree, and I think Larry pointed this out earlier.

 Also, w.r.t. VM or
 container migration, addresses are not the only networking state we
 need to consider; we need to consider how to move connection state (e.g.
 open TCP connections bound to the address being moved)-- this seems to
 be out of scope for this draft.

 OTOH, I would caution about getting too involved in this as both ends of
the spectrum of connection state preservation are reasonable and used in
practice:

 - VM live migration preserves TCP connections and the like.
 - IP address takeover on hardware failure doesn't preserve
 anything whose state was solely on the hardware that's now a
 smoking pile of parts.

 Thanks,
 --David

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Friday, October 03, 2014 9:04 PM
 To: Linda Dunbar
 Cc: nvo3@ietf.org; Larry Kreeger (kreeger); Black, David
 Subject: Re: [nvo3] FW: New Version Notification for
 draft-merged-nvo3-vm-mobility-scheme-00.txt

 On Fri, Oct 3, 2014 at 10:22 AM, Linda Dunbar linda.dun...@huawei.com
wrote:
  As the new NVO3 charter encourages solution proposals, we added more
 comprehensive solutions for the issues described in
 draft-ietf-nvo3-vm-mobility-issues. We now call it
draft-merged-nvo3-vm-mobility-scheme-00
 
 I would suggest that Address mobility scheme might be a better
 title. VM migration is one instance of when we need address mobility,
 but we'll need this with container migration or when we just want
 to move a virtual address between servers. Also, w.r.t. VM or
 container migration, addresses are not the only networking state we
 need to consider; we need to consider how to move connection state (e.g.
 open TCP connections bound to the address being moved)-- this seems to
 be out of scope for this draft.

 Tom

  Comments and suggestions are greatly appreciated.
 
  Linda
 
  -Original Message-
  From: internet-dra...@ietf.org [mailto:internet-dra...@ietf.org]
  Sent: Friday, October 03, 2014 12:18 PM
  To: Rahul Aggarwal; Wim Henderickx; Ravi Shekhar; Luyuan Fang; Linda
  Dunbar;
 Rahul Aggarwal; Luyuan Fang; Wim Henderickx; Ravi Shekhar; Yakov
 Rekhter; Yakov Rekhter; Linda Dunbar; Ali Sajassi; Ali Sajassi
  Subject: New Version Notification for
  draft-merged-nvo3-vm-mobility-scheme-00.txt
 
 
  A new version of I-D, draft-merged-nvo3-vm-mobility-scheme-00.txt
  has been successfully submitted by Linda Dunbar and posted to the
  IETF
 repository.
 
  Name:   draft-merged-nvo3-vm-mobility-scheme
  Revision:   00
  Title:  NVO3 VM Mobility Scheme
  Document date:  2014-10-03
  Group:  Individual Submission
  Pages:  24
  URL:            http://www.ietf.org/internet-drafts/draft-merged-nvo3-vm-mobility-scheme-00.txt
  Status:         https://datatracker.ietf.org/doc/draft-merged-nvo3-vm-mobility-scheme/
  Htmlized:       http://tools.ietf.org/html/draft-merged-nvo3-vm-mobility-scheme-00
 
 
  Abstract:
 This document describes the schemes to overcome the network-related
 issues to achieve seamless Virtual Machine mobility in the data
 center and between data centers.
 
 
 
 
  Please note that it may take a couple of minutes from the time of
  submission
 until the htmlized version and diff are available at tools.ietf.org.
 
  The IETF Secretariat
 
  ___
  nvo3 mailing list
  nvo3@ietf.org
  https://www.ietf.org/mailman

Re: [nvo3] Poll for a better name for draft-merged-nvo3-vm-mobility-scheme-00.txt

2014-10-06 Thread Tom Herbert

On Mon, Oct 6, 2014 at 12:41 PM, Linda Dunbar linda.dun...@huawei.com wrote:
 Tom,

 Are you saying that the term “VM” should only be described in motivation and
 the proposed scheme should work for address mobility enabled by any methods?

Yes, that is my hope, with the constraint that this solution is for
address mobility in the realm of DC network virtualization (i.e. this should
not be reinventing mobile IP, for instance).

Tom



 Linda



 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Tom Herbert
 Sent: Monday, October 06, 2014 2:16 PM

 While it's a better title, I think my comment was meant to be more general.
 VMs (and hypervisors) are references to specific mechanisms of server
 virtualization, but not the only mechanisms that can be deployed (e.g.
 container virtualization is not normally a VM and does not have an explicit
 hypervisor). VM is really just one use case of network virtualization and
 not in itself a networking term, so I think that mechanisms or protocols for
 network virtualization really should be described without reference to VMs
 unless there really is something VM specific about that.









 Tom


 Does anyone have better suggestions?


 Thanks, Linda

 -Original Message-
 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Black, David
 Sent: Friday, October 03, 2014 8:15 PM
 To: Tom Herbert
 Cc: nvo3@ietf.org
 Subject: Re: [nvo3] FW: New Version Notification for
 draft-merged-nvo3-vm-mobility-scheme-00.txt

 I would suggest that Address mobility scheme might be a better
 title. VM migration is one instance of when we need address mobility,
 but we'll need this with container migration or when we just want
 to move a virtual address between servers.

 I agree, and I think Larry pointed this out earlier.

 Also, w.r.t. VM or
 container migration, addresses are not the only networking state we
 need to consider; we also need to consider how to move connection state
 (e.g. open TCP connections bound to the address being moved). This seems
 to be out of scope for this draft.

 OTOH, I would caution about getting too involved in this as both ends of
 the spectrum of connection state preservation are reasonable and used in
 practice:

 - VM live migration preserves TCP connections and the like.
 - IP address takeover on hardware failure doesn't preserve
 anything whose state was solely on the hardware that's now a
 smoking pile of parts.

 Thanks,
 --David

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Friday, October 03, 2014 9:04 PM
 To: Linda Dunbar
 Cc: nvo3@ietf.org; Larry Kreeger (kreeger); Black, David
 Subject: Re: [nvo3] FW: New Version Notification for
 draft-merged-nvo3-vm-mobility-scheme-00.txt

 On Fri, Oct 3, 2014 at 10:22 AM, Linda Dunbar linda.dun...@huawei.com
 wrote:
  As the new NVO3 charter encourages solution proposals, we added more
 comprehensive solutions for the issues described in
 draft-ietf-nvo3-vm-mobility-issues. We now call it
 draft-merged-nvo3-vm-mobility-scheme-00.
 
 I would suggest that Address mobility scheme might be a better
 title. VM migration is one instance of when we need address mobility,
 but we'll also need this with container migration or when we just want
 to move a virtual address between servers. Also, w.r.t. VM or
 container migration, addresses are not the only networking state we
 need to consider; we also need to consider how to move connection state
 (e.g. open TCP connections bound to the address being moved). This seems
 to be out of scope for this draft.

 Tom

  Comments and suggestions are greatly appreciated.
 
  Linda
 
  -Original Message-
  From: internet-dra...@ietf.org [mailto:internet-dra...@ietf.org]
  Sent: Friday, October 03, 2014 12:18 PM
  To: Rahul Aggarwal; Wim Henderickx; Ravi Shekhar; Luyuan Fang; Linda
  Dunbar;
 Rahul Aggarwal; Luyuan Fang; Wim Henderickx; Ravi Shekhar; Yakov
 Rekhter; Yakov Rekhter; Linda Dunbar; Ali Sajassi; Ali Sajassi
  Subject: New Version Notification for
  draft-merged-nvo3-vm-mobility-scheme-00.txt
 
 
  A new version of I-D, draft-merged-nvo3-vm-mobility-scheme-00.txt
  has been successfully submitted by Linda Dunbar and posted to the
  IETF
 repository.
 
  Name:   draft-merged-nvo3-vm-mobility-scheme
  Revision:   00
  Title:  NVO3 VM Mobility Scheme
  Document date:  2014-10-03
  Group:  Individual Submission
  Pages:  24
  URL:            http://www.ietf.org/internet-drafts/draft-merged-nvo3-vm-mobility-scheme-00.txt
  Status:         https://datatracker.ietf.org/doc/draft-merged-nvo3-vm-mobility-scheme/
  Htmlized:       http://tools.ietf.org/html/draft-merged-nvo3-vm-mobility-scheme-00
 
 
  Abstract:
 This document describes the schemes to overcome the network-related
 issues to achieve seamless Virtual Machine mobility in the data
 center and between data centers.
 
 
 
 
  Please note that it may take a couple of minutes from the time of
  submission
 until the htmlized

Re: [nvo3] Enhancing Virtual Network Encapsulation with IPv6

2014-09-08 Thread Tom Herbert

 I'm not sure how practical this is. There's already deployment to use
 the flow label as representative of inner flow (like for ECMP hashing,
 etc.).

 Is this the use of the flow label field in the outer IPv6 packets, when IPv6
 is the encapsulation protocol between NVEs in the underlay network?

 I may be misunderstanding your comment, however just to be clear, the IPv6 
 tenant system to tenant system traffic within a virtual network can use their 
 (inner) IPv6 flow label however they like. My draft is only about using the 
 flow label field in the outer IPv6 header used to encapsulate/tunnel across 
 the underlay network between the NVEs.

Yes, I am referring to the outer flow label. In encapsulation we can
set this to a hash of the inner four-tuple so that the network routes
based on the inner flow. This is similar to how we can set the source
port of the outer UDP header to correspond to the inner flow (see the
various UDP encap proposals).
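
To make the idea concrete, here is a minimal sketch of deriving both an outer
flow label and an outer UDP source port from the inner four-tuple. The function
names, the use of SHA-256 as the hash, and the 49152-65535 port range are
assumptions for illustration only; a real datapath would use a much cheaper
hash.

```python
import hashlib
import ipaddress
import struct


def flow_entropy(src_ip, dst_ip, src_port, dst_port):
    """Hash the inner four-tuple to 32 bits (SHA-256 is only a stand-in)."""
    key = (ipaddress.ip_address(src_ip).packed
           + ipaddress.ip_address(dst_ip).packed
           + struct.pack("!HH", src_port, dst_port))
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


def outer_flow_label(entropy):
    """Keep the low 20 bits, the width of the IPv6 flow label field."""
    return entropy & 0xFFFFF


def outer_udp_source_port(entropy, low=49152, high=65535):
    """Map the same hash into a source port range for UDP encapsulation."""
    return low + entropy % (high - low + 1)


entropy = flow_entropy("10.0.0.1", "10.0.0.2", 40000, 443)
label = outer_flow_label(entropy)
sport = outer_udp_source_port(entropy)
```

Packets of the same inner flow always get the same label and port, so they
stay on one path, while different flows spread across ECMP paths.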


 If we split this field too much, we would start to lose entropy
 for the hash (I would like to think we have at least 14 bits worth of
 entropy here).
 For holding the whole Virtual Network ID in the flow
 label, 20 bits is really limiting for scaling (I still think we need
 32 bits), and partitioning the network into islands with overlapping
 number spaces is an extremely unpleasant thought.


 While I think it would be useful if the flow label was a bit bigger, I 
 thought it would be unlikely that there would be any instances of IPv6 
 underlay networks that would need to support more than in the order of 1 
 million virtual networks.

In large networks it is likely that the virtual network ID space
will have structure, hierarchical assignment, special bits, etc., so
fragmentation is very possible.

 I appreciate who you work for, does that mean Google might have use cases 
 where more than a million virtual networks need to be supported over a single 
 underlay network?

Yes. Even if we didn't today, I wouldn't want to deploy something that
could limit our growth in the future (say, for at least ten years).
We've already learned this lesson with the IPv4 address space!


 Carrying Tenant Packet Address and Other Information in IIDs:

 Encoding virtual network IDs and tenant address into IPv6 addresses is
 potentially very interesting. We have considered identifier/locator
 (ILNP like) addressing with 64 bits for locator, 32 bits for virtual
 network identifier, and 32 bit tenant address. With this we don't need
 any encapsulation header, so the fact that we're doing NV is
 transparent on the wire. This means all the offloads, and networking
 optimizations for IPv6, TCP/IPv6, etc. will work without change.
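
 As an illustration of the layout described above (64-bit locator, 32-bit
 virtual network identifier, 32-bit tenant address), the following sketch packs
 and unpacks such an address. The function names and the sample values are
 hypothetical; only the bit widths come from the text.

```python
import ipaddress


def pack_address(locator, vni, tenant_addr):
    """Pack 64-bit locator | 32-bit VNI | 32-bit tenant IPv4 address."""
    if not (0 <= locator < 2**64 and 0 <= vni < 2**32):
        raise ValueError("locator or VNI out of range")
    tenant = int(ipaddress.IPv4Address(tenant_addr))
    return ipaddress.IPv6Address((locator << 64) | (vni << 32) | tenant)


def unpack_address(addr):
    """Split an IPv6 address back into (locator, VNI, tenant address)."""
    value = int(ipaddress.IPv6Address(addr))
    locator = value >> 64
    vni = (value >> 32) & 0xFFFFFFFF
    tenant = ipaddress.IPv4Address(value & 0xFFFFFFFF)
    return locator, vni, str(tenant)


# Moving the tenant endpoint only changes the upper 64 bits (the locator).
before = pack_address(0x20010DB800000001, 0x1234, "10.1.2.3")
after = pack_address(0x20010DB800000002, 0x1234, "10.1.2.3")
```

 The low 64 bits (VNI plus tenant address) act as the identifier and are
 unchanged by a move, which is what makes mobility a pure mapping update.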

 Agree. I looked at some of the measures in other encapsulations such as VXLAN 
 (e.g., UDP header added to provide load balancing entropy), or NVGRE encoding 
 of flow information in the GRE key sequence field (which would need new 
 hardware/firmware to look at it for load balancing), and realised that IPv6 
 either already provided those mechanisms (e.g., flow label with the right 
 contents/values) or they could be achieved by creatively using the IID field 
 in the outer IPv6 addresses if NVEs were identified using /64s rather than 
 /128s in the IPv6 underlay network.

 I'm not quite sure about not needing any VN specific encapsulation header 
 with what I've suggested. If the outer IPv6 flow label was used to carry the 
 VN Context ID, then there would need to be some other header carrying a 
 checksum to protect it, as the IPv6 header doesn't have a checksum to protect 
 it.

Right, this is why I'd prefer to have the VNI in the address so that
it would be included in the pseudo header checksum for UDP and TCP. The
DC use case where having no encap header could be relevant is when we
are converting internal communications (e.g. non-third-party traffic) to
use VNs. Encapsulating in IPv6 should be no less secure, and using a
native packet format might be critical in maintaining performance
versus non-VN.
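
A small sketch of why this helps: the UDP checksum over IPv6 includes the
source and destination addresses via the pseudo header, so a corrupted VNI
embedded in the address fails verification. The addresses, the position of the
VNI inside them, and the helper names are assumptions made for this example.

```python
import ipaddress
import struct


def ones_complement_sum(data):
    """16-bit ones-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src, dst, length, next_header=17):
    """IPv6 pseudo header: addresses, upper-layer length, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, next_header))


def udp6_checksum(src, dst, segment):
    """Checksum for a UDP segment whose checksum field is zero."""
    data = pseudo_header(src, dst, len(segment)) + segment
    return (~ones_complement_sum(data) & 0xFFFF) or 0xFFFF


def udp6_verify(src, dst, segment):
    """True if the segment's checksum matches these addresses."""
    data = pseudo_header(src, dst, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF


src = "2001:db8:0:1:0:1234:a01:203"      # VNI 0x1234 carried in the address
dst = "2001:db8:0:2:0:1234:a01:204"
bad_dst = "2001:db8:0:2:0:1235:a01:204"  # one VNI bit flipped in transit
segment = struct.pack("!HHHH", 1234, 5678, 12, 0) + b"data"
csum = udp6_checksum(src, dst, segment)
filled = segment[:6] + struct.pack("!H", csum) + segment[8:]
```

With these values `udp6_verify(src, dst, filled)` succeeds, while the same
segment checked against `bad_dst` does not.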

 There would also be a need for a 'tenant packet type' field somewhere in a VN 
 specific encapsulation header if more than just a single tenant packet type 
 is to be supported.

Wouldn't next-header be sufficient?

Tom



 I presented last week at Ausnog 2014 on what I've suggested in the ID, here 
 are the slides if they'd be of interest:


 Network Virtualisation: The Killer App for IPv6?
 http://www.users.on.net/~markachy/nvtkaipv6.pdf


 Best regards,
 Mark.


 Mobility is easy: just change the identifier-to-locator mapping. The downside
 is that we may need to do a lot of NAT, and we still might need an
 additional header to ensure integrity/authenticity of the virtual
 network identifier (it is nice that the addresses, including the
 VNI, are covered by the pseudo header for the normal transport
 checksum).


 Tom


  Thanks very much,
  Mark.




  - Forwarded Message -
  From: internet-dra...@ietf.org
 internet-dra...@ietf.org
  To: Mark Smith

[nvo3] Fwd: New Version Notification for draft-herbert-remotecsumoffload-00.txt

2014-08-26 Thread Tom Herbert

Remote checksum offload is intended to be used to get checksum offload
of encapsulated packets to work with legacy NICs (those that don't
understand encapsulation protocols). I will be posting an
implementation to Linux shortly for GUE and/or Geneve.

Thanks,
Tom

-- Forwarded message --
From:  internet-dra...@ietf.org
Date: Tue, Aug 26, 2014 at 1:39 PM
Subject: New Version Notification for draft-herbert-remotecsumoffload-00.txt
To: Tom Herbert therb...@google.com



A new version of I-D, draft-herbert-remotecsumoffload-00.txt
has been successfully submitted by Tom Herbert and posted to the
IETF repository.

Name:   draft-herbert-remotecsumoffload
Revision:   00
Title:  Remote checksum offload for encapsulation
Document date:  2014-08-27
Group:  Individual Submission
Pages:  9
URL:
http://www.ietf.org/internet-drafts/draft-herbert-remotecsumoffload-00.txt
Status:
https://datatracker.ietf.org/doc/draft-herbert-remotecsumoffload/
Htmlized:   http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00


Abstract:
   This specification describes remote checksum offload for
   encapsulation, which is a mechanism that provides checksum offload of
   encapsulated packets using rudimentary offload capabilities found in
   most Network Interface Card (NIC) devices. The outer header checksum
   (e.g. that in UDP or GRE) is enabled in packets and, with some
   additional meta information, a receiver is able to deduce the
   checksum to be set for an inner encapsulated packet. Effectively this
   offloads the computation of the inner checksum. Enabling the outer
   checksum in encapsulation has the additional advantage that it covers
   more of the packet than the inner checksum including the
   encapsulation headers.
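
The receiver-side step described in the abstract can be sketched roughly as
follows. This is an illustration under stated assumptions, not the draft's
exact encoding: the names `csum_start` and `csum_offset`, the convention that
the offset is measured from `csum_start`, and the 20-byte inner header stub are
all choices made for this example. The sender seeds the inner checksum field
with the pseudo header sum, and the receiver (after validating the outer
checksum) finishes the inner checksum.

```python
import struct


def csum_add(data, initial=0):
    """16-bit ones-complement sum of data, continuing from an initial sum."""
    if len(data) % 2:
        data += b"\x00"
    total = initial + sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def sender_prepare(pseudo, inner_segment, csum_offset):
    """Seed the inner checksum field with the (uninverted) pseudo header sum."""
    seg = bytearray(inner_segment)
    struct.pack_into("!H", seg, csum_offset, csum_add(pseudo))
    return bytes(seg)


def receiver_complete(payload, csum_start, csum_offset):
    """Sum from csum_start to the end and store the complement in the field."""
    pkt = bytearray(payload)
    value = ~csum_add(bytes(pkt[csum_start:])) & 0xFFFF
    struct.pack_into("!H", pkt, csum_start + csum_offset, value)
    return bytes(pkt)


# Inner IPv4/UDP example: 12-byte pseudo header, 13-byte UDP segment.
pseudo = bytes([10, 0, 0, 1, 10, 0, 0, 2, 0, 17]) + struct.pack("!H", 13)
inner_udp = struct.pack("!HHHH", 1111, 2222, 13, 0) + b"hello"
payload = bytes(20) + sender_prepare(pseudo, inner_udp, 6)
done = receiver_complete(payload, csum_start=20, csum_offset=6)
```

The sketch re-sums the inner bytes for clarity. A receiver that already holds
a sum over the whole packet could instead subtract the sum of the bytes before
`csum_start`, which is where the offload saving would come from.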




Please note that it may take a couple of minutes from the time of submission
until the htmlized version and diff are available at tools.ietf.org.

The IETF Secretariat


Re: [nvo3] Taking NVGRE draft to Informational RFC

2014-08-04 Thread Tom Herbert

On Sun, Aug 3, 2014 at 10:16 PM, Pankaj Garg garg.pan...@microsoft.com wrote:
 Hi All,



 As discussed in the NVO3 forum in the past, we intend to take NVGRE to
 informational RFC status. This would ensure that there is a stable way to
 build compliant and interoperable NVGRE implementation. It would also allow
 extensions to NVGRE to be able to refer to a stable document version as
 opposed to rolling revisions, thus removing the risk of introducing
 incompatibilities and interoperability issues later on. We are seeking (and
 would highly appreciate) editorial feedback from NVO3 fellows. Could you
 please review the latest draft of NVGRE (below) and provide us any editorial
 feedback on the latest NVGRE draft by 11th August?

Hi Pankaj,

Per the draft:

o The C (Checksum Present) and S (Sequence Number Present) bits in
   the GRE header MUST be zero.

This explicitly disallows use of the GRE checksum, and hence there is no
end-to-end L3 mechanism defined within the protocol to check against
corruption of the NVGRE header. As I mentioned before on this list, I
am particularly concerned about the vulnerability of the VSID. I
suspect the intention is that NVGRE should only be deployed in
situations where the underlying networks provide error detection
along the whole path (links and in switches). In any case, I would
suggest there should be some discussion of this in the draft.
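
To illustrate the concern, here is a minimal parser sketch for the 8-byte
NVGRE header (GRE with the Key Present bit, protocol type 0x6558, and a key
holding a 24-bit VSID plus an 8-bit FlowID). The function name and sample
values are hypothetical. With C and S forced to zero there is nothing in the
header that lets the receiver notice a flipped VSID bit.

```python
import struct

GRE_C = 0x8000      # Checksum Present
GRE_K = 0x2000      # Key Present
GRE_S = 0x1000      # Sequence Number Present
ETH_P_TEB = 0x6558  # Transparent Ethernet Bridging


def parse_nvgre(header):
    """Return (VSID, FlowID) or raise ValueError for a non-NVGRE header."""
    flags, proto, key = struct.unpack("!HHI", header[:8])
    if flags & (GRE_C | GRE_S):
        raise ValueError("C and S bits MUST be zero in NVGRE")
    if not flags & GRE_K or proto != ETH_P_TEB:
        raise ValueError("not an NVGRE header")
    return key >> 8, key & 0xFF


good = struct.pack("!HHI", GRE_K, ETH_P_TEB, (0x123456 << 8) | 7)

# Flip one bit inside the VSID: the header still parses without complaint.
corrupted = bytearray(good)
corrupted[5] ^= 0x01
corrupted = bytes(corrupted)
```

Here `parse_nvgre(good)` yields VSID 0x123456 and `parse_nvgre(corrupted)`
yields VSID 0x123556, so the packet would be delivered into a different
virtual subnet unless a lower layer caught the error.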

Thanks,
Tom



 Thanks

 Pankaj (on behalf of NVGRE co-authors)







Re: [nvo3] Comments on http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

2014-08-01 Thread Tom Herbert


 If there is an ethertype allocated for ethernet there can be one allocated 
 for NSH. And then if anyone ever wanted to run NSH directly over ethernet, 
 you have your ethertype.

 But I AM NOT RECOMMENDING THIS. I believe NSH should run on top of UDP and 
 get its own port number. NSH sends UDP packets, period.

 But at least the situation of having both VxLAN and LISP can be simplified by
 having a common umbrella and one common discussion.

 Agree.

 Personally I think VxLAN-gpe is how the VxLAN/LISP header could have looked
 like from the start (hindsight is great, I know) and I don't have a technical
 problem with the draft itself. What is missing is enough context to discuss
 it. E.g. I'm still not sure why there is a P flag, if for a hard technical
 reason or for the aesthetics that every field is controlled by a turn-on flag
 ;-)

 There is a P-flag so you have demux ability in the VXLAN/LISP header. So we 
 will demux at the UDP port level, the P-bit level, and the ethertype level. 
 All these will be demux decisions a forwarder has to deal with. This is quite 
 ridiculous.

 So I encourage and kindly ask the authors to provide more of this context in
 the next draft version.

 Regards, Marc

 Dino




 On Thu, 31 Jul 2014 16:16:53 -0700, Dino Farinacci wrote:
 Dino,

 Would you re-phrase your response?  I am having some trouble parsing it, so
 I must be missing something.

 First, I think (when you said ... sent from any pair of ports ...) you meant
 to say ... sent with any pair of ports ...  - but this is a guess.

 Yes with is a better way of stating it.

 As for making OAM messages traverse the exact same path as data, this is
 what OAM is expected to do.  In essence, if data follows a path that
 involves

 Good luck. I do not know how you will be able to control each ECMP path at
 each path across different vendors as well as the same vendor with different
 hashing algorithms.

 One needs to argue if you really need the granularity for the complexity
 that will be needed to get this partially correct.

 a non-zero number of gates, while OAM does not, the successful delivery of
 OAM is only an approximate indication of the data-path integrity.  Any H/W
 that data has to go through, and OAM does not go through, could fail and we
 would see an OAM indication of a valid path through which data either would
 not go, or would be diverted in some unexpected way.

 Well I think LISP RLOC-probing is good enough, but I am biased.  ;-)

 Ordinarily, this should not be a problem for the hardware, as (ordinarily)
 the OAM is indistinguishable from data.  The hardware works no harder to push
 OAM than it would to push an equivalent amount of data.

 If an ITR sends a packet to the ETR's address, the middle boxes do not know
 if it is a control-packet versus a data-packet.

 So, what is the problem again?

 I am trying to avoid problems. Seems like things are being over-engineered.
 Again.

 Dino

 P.S. Sorry I keep being negative. And if one person says shut up, I'll stop
 posting.


 --
 Eric

 -Original Message-
 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Dino Farinacci
 Sent: Wednesday, July 30, 2014 9:13 PM
 To: Larry Kreeger
 Cc: Tom Herbert; David Melman; Marc Binderberger; LISP mailing list list;
 nvo3@ietf.org
 Subject: Re: [nvo3] Comments on
 http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

 I'm assuming that routers and switches will be multipathing based on
 the UDP port numbers, so I would expect different destination UDP
 ports to take different equal cost paths.

 Well if OAM is going to be effective, messages need to be sent from any
 pair of ports that yield 0 through N modulus so multiple paths can be
 determined. So it doesn't matter with the port number values  you use,
 those control packets will be ECMPed as well.
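
 The "0 through N modulus" idea can be sketched with a toy ECMP model. The
 CRC32-based hash below is purely an assumption; real switches use
 vendor-specific hash functions and inputs, which is exactly why covering
 every path is hard in practice.

```python
import struct
import zlib


def ecmp_bucket(sport, dport, n_paths):
    """Toy ECMP decision: hash of the UDP port pair modulo the path count."""
    return zlib.crc32(struct.pack("!HH", sport, dport)) % n_paths


def ports_covering_paths(dport, n_paths, first=49152, last=65535):
    """Pick one source port per ECMP bucket, scanning the given port range."""
    chosen = {}
    for sport in range(first, last + 1):
        chosen.setdefault(ecmp_bucket(sport, dport, n_paths), sport)
        if len(chosen) == n_paths:
            break
    return chosen


# One probe source port per bucket for destination port 4789 and 8 paths.
probe_ports = ports_covering_paths(4789, 8)
```

 Each entry of `probe_ports` maps a bucket number to a source port that lands
 in that bucket under this toy hash.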

 If you are also inferring that you want the OAM packets to go through the
 same data-path of each device on the path, then you will have to put TLVs
 in the data path, which is traditionally not prudent. See my Puneet
 reference from previous email.

 Dino


Re: [nvo3] Comments on http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

2014-07-31 Thread Tom Herbert

On Thu, Jul 31, 2014 at 5:56 AM, Paul Quinn (paulq) pa...@cisco.com wrote:
 removed LISP list


 On Jul 30, 2014, at 8:16 PM, Tom Herbert therb...@google.com wrote:

 On Wed, Jul 30, 2014 at 3:00 PM, Larry Kreeger (kreeger)
 kree...@cisco.com wrote:
 Hi Tom,

 First, the VXLAN-GPE Next Protocol field would indicate the value 0x4 for
 NSH (as specified in draft-quinn-vxlan-gpe-03).  Then, directly following
 the VXLAN-GPE header would be an NSH header.  One would need to define a
 new MD Type (not the SFC value of 0x1, specified in draft-quinn-nsh-03).
 Then you need to make a decision as to whether you want the possibility of
 your authentication value being processed by hardware or only by software.
 If you want hardware support, then I would recommend that it not be
 encoded as a TLV.  If you only care about software support, then you could
 encode the authentication in a TLV using your own organization's TLV
 Class, or perhaps an IETF TLV Class if you are standardizing it.  If you
 expect hardware to parse and validate the VNI authentication, then I would
 encode it somewhere within the 20 bytes following the base NSH header and
 not in a TLV.

 Any optional data I define which proves useful in the datapath I may
 eventually want to implement in HW, and I really wouldn't want to have
 to make such a decision up front-- so I'll assume anything we'd want
 to define would need to go into NSH headers in order to keep HW
 support an option. So then in this model is it correct to say that
 we could arbitrarily extend the protocol by using a chain of NSH
 headers, each of which provides 20 bytes we can use for
 optional data, and still be HW friendly?


 You can’t generalize hardware parsing like that.  The key to ease of parsing
 is fixed sizes and known offsets.  Stacking headers changes that.  That isn’t
 to say you couldn’t do what you suggest, but it would be more complex and
 would limit the value of the simple fixed size header.

I'm not trying to generalize hardware parsing; I was trying to
extrapolate how to use NSH to allow different metadata. Unless I
misunderstand, NSH headers have similar properties to TLVs, with the
exception that they are fixed size. I do agree that fixed sizes and known
offsets are important for easy parsing, which is precisely why I'm leery
about anything that looks like TLVs in such a low level data path.

The major parsing problem with TLVs (and I presume NSH if more than
one header is allowed) is that the number of possible header
layouts grows combinatorially. For example, if there are 5 TLVs this
generates 150 possible combinations. If we use optional flag-fields
like in GUE (based on GRE) there would only be 32 possible
combinations. The latter is much more amenable to HW parsing
techniques such as a TCAM, not to mention that the overhead to describe
each TLV potentially increases TCAM width.

We have already verified that both NICs and switches are capable of
parsing GRE with various combinations of the defined variable headers
(keyid, seqnum, csum). I take this as evidence that this method of
extensibility is HW friendly.
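
A quick way to compare the two growth rates is to count layouts directly. The
counting convention below is an assumption: each TLV appears at most once and
order matters, so the TLV figure it produces need not match the number quoted
above, which may have been counted differently. The flag-field count of 32 for
five options does match.

```python
from math import perm


def ordered_tlv_layouts(n):
    """Layouts when any subset of n distinct TLVs may appear in any order."""
    return sum(perm(n, k) for k in range(n + 1))


def flag_field_layouts(n):
    """Layouts when n optional fields have a fixed order, gated by flag bits."""
    return 2 ** n


tlv_count = ordered_tlv_layouts(5)
flag_count = flag_field_layouts(5)
```

Under this convention five TLVs give 326 layouts against 32 for five flag
fields, and the gap widens quickly as options are added.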


Re: [nvo3] Comments on http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

2014-07-30 Thread Tom Herbert

On Wed, Jul 30, 2014 at 3:00 PM, Larry Kreeger (kreeger)
kree...@cisco.com wrote:
 Hi Tom,

 First, the VXLAN-GPE Next Protocol field would indicate the value 0x4 for
 NSH (as specified in draft-quinn-vxlan-gpe-03).  Then, directly following
 the VXLAN-GPE header would be an NSH header.  One would need to define a
 new MD Type (not the SFC value of 0x1, specified in draft-quinn-nsh-03).
 Then you need to make a decision as to whether you want the possibility of
 your authentication value being processed by hardware or only by software.
 If you want hardware support, then I would recommend that it not be
 encoded as a TLV.  If you only care about software support, then you could
 encode the authentication in a TLV using your own organization's TLV
 Class, or perhaps an IETF TLV Class if you are standardizing it.  If you
 expect hardware to parse and validate the VNI authentication, then I would
 encode it somewhere within the 20 bytes following the base NSH header and
 not in a TLV.

Any optional data I define which proves useful in the datapath I may
eventually want to implement in HW, and I really wouldn't want to have
to make such a decision up front-- so I'll assume anything we'd want
to define would need to go into NSH headers in order to keep HW
support an option. So then in this model is it correct to say that
we could arbitrarily extend the protocol by using a chain of NSH
headers, each of which provides 20 bytes we can use for
optional data, and still be HW friendly?

Thanks,
Tom

  - Larry

 On 7/30/14 2:46 PM, Tom Herbert therb...@google.com wrote:

 I think the intent of NSH is to be generic enough to work at different
 layers.  The recent addition of the Metadata Type field in the NSH header
 allows for it to be used for purposes beyond SFC.  It could theoretically
 be used to essentially extend the header of the layer below it (e.g.
 VXLAN/LISP).  e.g. I think this could be used for Tom to carry his 64 bit
 VNI authentication.

I'd be interested to see exactly what the headers might look like in
that case. I've tried to extrapolate from the SFC drafts how that
might work but really don't see it...

Thanks,
Tom

  - Larry

Just like any other UDP application. If that packet needs to be
encapsulated that is a lower level function. Just like IP packets can go
in an MPLS based LSP.

 picture?  They do mean something about how the packet is handled, don't they?

I won't answer that because those bit introductions into the design are
indeed design bugs IMO.

Dino

Re: [nvo3] Comments on http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

2014-07-28 Thread Tom Herbert

 Allowing the reserved bits in the header to be ignored on receive
 limits the usefulness in that new bits that are defined can only be
 advisory and not fundamentally change interpretation of the packet.

 I agree with your statement in the 2nd half but not your opinion about the
 usefulness. Don't see a problem here, LISP is a good example how ignoring
 unknown flags works well.

How so? Adding a protocol field to LISP-gpe has the same backwards
compatibility problem as VXLAN-gpe, but it is being resolved in a different
way. The addition of the P-bit relies on some (presumably) out of band
mechanism to ensure protocol compatibility. From the LISP-gpe draft:

A LISP-gpe router MUST not encapsulate non-IP packets to a LISP
router.  A method for determining the capabilities of a LISP router
(gpe or legacy) is out of the scope of this draft.

Thanks,
Tom


 Regards, Marc



 On Sun, 27 Jul 2014 17:28:01 -0700, Tom Herbert wrote:
 On Sun, Jul 27, 2014 at 2:21 PM, Dino Farinacci farina...@gmail.com wrote:
 2. The VXLAN-GPE draft should focus only on the VXLAN-GPE header and
 requires the assignment of a new UDP port.   The fact that the VXLAN-GPE
 header closely resembles VXLAN may be convenient for implementers, but
 this protocol by definition is not backward compatible with VXLAN.

 If you do that then it will be harder for VXLAN-GPE systems to
 interoperate with VXLAN systems, because a VXLAN-GPE system will need to
 open and maintain 2 UDP sockets. And an implementation will have to be
 careful to not set the P-bit for the VXLAN socket or clear the P-bit for
 the VXLAN-GPE socket. This is all completely unnecessary and one way or the
 other should be used.

 I am not sure what you are suggesting. AFAICT there is no backwards
 compatible means to add the protocol field to VXLAN, which is the
 motivation for the new UDP port, which in turn implies a new protocol
 (which also implies an opportunity to add a more general set of
 protocol features like version number and options extensions which are
 also not backwards compatible). Maybe it's possible to break
 compatibility within the protocol and assume that out of band
 mechanisms could negotiate use of P-bit to compensate, but I assume
 there's already quite a bit of VXLAN deployment so that seems pretty
 shaky robustness-wise to me.

 It's not just adding the protocol field that would be an issue; even
 adding the OAM bit to VXLAN would be problematic. Per the VXLAN spec,
 unspecified flag bits are ignored on receive, so if the OAM bit were
 subsequently defined in VXLAN it would be ignored by existing
 implementations and these packets would be processed normally. This
 seems to be incompatible with the proposed VXLAN-gpe requirement that
 "When the O bit is set to 1, the packet is an OAM packet and OAM
 processing MUST occur." (btw 'OAM processing' is awfully ambiguous to
 be a MUST here IMO).

 Allowing the reserved bits in the header to be ignored on receive
 limits the usefulness in that new bits that are defined can only be
 advisory and not fundamentally change interpretation of the packet.
 Had the requirement in VXLAN been that packets with unknown bits set
 be dropped, then adding P-bit and O-bit could have been done with
 backwards compatibility. This might be a reasonable requirement to
 consider if new protocol (i.e. new port number) is undertaken.

 Thanks,
 Tom

 And if *it was agreed* on to use different UDP port numbers (like the way
 LISP did it for L2 versus L3 packet encapsulation), we wouldn't need the
 P-bit at all. But there was push back (by somebody) to not allocate
 another port for VXLAN, so the demux was forced to be in the VXLAN header.

 And is also the reason this baggage is being carried over the LISP when it
 really isn't needed.

 3. True, the ‘P’ bit is not needed for backward compatibility, but I’m
 not against it if there is value to make it consistent with the LISP-GPE
 header.

 There is no incremental benefit to use the P-bit for LISP. We had a
 solution but because of the requirement to have no new port for VXLAN,
 LISP is affected.

 Just another example how the working group is putting effort into things
 that creates more work but no benefit. Don't get me wrong, the cisco guys
 did this (the VXLAN and LISP same position for P-bit) for consistency, and
 they should be applauded for that. But if VXLAN could have another port
 number assigned for other protocols, maybe the VXLAN-GPE would look so
 much different.

 Something to think about as the working group now has new productivity
 mentality.

 Dino


Re: [nvo3] Comments on http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

2014-07-27 Thread Tom Herbert

On Sun, Jul 27, 2014 at 2:21 PM, Dino Farinacci farina...@gmail.com wrote:
 2. The VXLAN-GPE draft should focus only on the VXLAN-GPE header and 
 requires the assignment of a new UDP port.   The fact that the VXLAN-GPE 
 header closely resembles VXLAN may be convenient for implementers, but this 
 protocol by definition is not backward compatible with VXLAN.

 If you do that then it will be harder for VXLAN-GPE systems to interoperate
 with VXLAN systems, because a VXLAN-GPE system will need to open and
 maintain 2 UDP sockets. And an implementation will have to be careful to not
 set the P-bit for the VXLAN socket or clear the P-bit for the VXLAN-GPE
 socket. This is all completely unnecessary and one way or the other should
 be used.

I am not sure what you are suggesting. AFAICT there is no backwards
compatible means to add the protocol field to VXLAN, which is the
motivation for the new UDP port, which in turn implies a new protocol
(which also implies an opportunity to add a more general set of
protocol features like version number and options extensions which are
also not backwards compatible). Maybe it's possible to break
compatibility within the protocol and assume that out of band
mechanisms could negotiate use of P-bit to compensate, but I assume
there's already quite a bit of VXLAN deployment so that seems pretty
shaky robustness-wise to me.

It's not just adding the protocol field that would be an issue; even
adding the OAM bit to VXLAN would be problematic. Per the VXLAN spec,
unspecified flag bits are ignored on receive, so if the OAM bit were
subsequently defined in VXLAN it would be ignored by existing
implementations and these packets would be processed normally. This
seems to be incompatible with the proposed VXLAN-gpe requirement that
"When the O bit is set to 1, the packet is an OAM packet and OAM
processing MUST occur." (btw 'OAM processing' is awfully ambiguous to
be a MUST here IMO).

Allowing the reserved bits in the header to be ignored on receive
limits the usefulness in that new bits that are defined can only be
advisory and not fundamentally change interpretation of the packet.
Had the requirement in VXLAN been that packets with unknown bits set
be dropped, then adding P-bit and O-bit could have been done with
backwards compatibility. This might be a reasonable requirement to
consider if a new protocol (i.e. new port number) is undertaken.
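
The difference between the two receive rules can be sketched as follows. The
I flag value 0x08 is the one VXLAN defines; the position chosen for the O bit
and the function names are hypothetical and only serve the illustration.

```python
VXLAN_I = 0x08         # valid-VNI flag defined by VXLAN
HYPOTHETICAL_O = 0x01  # assumed position for a later-defined OAM bit
KNOWN_FLAGS = VXLAN_I  # what a legacy receiver understands


def receive_ignore_unknown(flags):
    """Legacy rule: reserved bits are ignored on receive."""
    return "process" if flags & VXLAN_I else "drop"


def receive_drop_unknown(flags):
    """Alternative rule: any flag bit the receiver does not know means drop."""
    if flags & ~KNOWN_FLAGS & 0xFF:
        return "drop"
    return receive_ignore_unknown(flags)


oam_packet_flags = VXLAN_I | HYPOTHETICAL_O
legacy_result = receive_ignore_unknown(oam_packet_flags)
strict_result = receive_drop_unknown(oam_packet_flags)
```

A legacy receiver following the first rule treats the OAM packet as ordinary
data, while the second rule drops it, which is what lets a new bit safely
change how a packet is interpreted.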

Thanks,
Tom

 And if *it was agreed* on to use different UDP port numbers (like the way 
 LISP did it for L2 versus L3 packet encapsulation), we wouldn't need the 
 P-bit at all. But there was push back (by somebody) to not allocate another 
 port for VXLAN, so the demux was forced to be in the VXLAN header.

 And is also the reason this baggage is being carried over the LISP when it 
 really isn't needed.

 3. True, the ‘P’ bit is not needed for backward compatibility, but I’m not 
 against it if there is value to make it consistent with the LISP-GPE header.

 There is no incremental benefit to use the P-bit for LISP. We had a solution 
 but because of the requirement to have no new port for VXLAN, LISP is 
 affected.

 Just another example how the working group is putting effort into things that 
 creates more work but no benefit. Don't get me wrong, the cisco guys did this 
 (the VXLAN and LISP same position for P-bit) for consistency, and they should 
 be applauded for that. But if VXLAN could have another port number assigned 
 for other protocols, maybe the VXLAN-GPE would look so much different.

 Something to think about as the working group now has new productivity 
 mentality.

 Dino


Re: [nvo3] Comments on http://tools.ietf.org/html/draft-quinn-vxlan-gpe-03

2014-07-15 Thread Tom Herbert

On Tue, Jul 15, 2014 at 4:32 PM, Paul Quinn (paulq) pa...@cisco.com wrote:
 Hi Tom,

 Thanks for the questions and comments!  Please see inline.

 On Jul 14, 2014, at 3:43 PM, Tom Herbert therb...@google.com wrote:

 Hi VXLAN-gpe authors,

 Abstract: technically this is not extending a VXLAN but defines a new
 protocol that looks similar to VXLAN (demonstrated by need for new UDP
 port assignment).

 We are trying to balance re-use of the VXLAN format and the need to support
 existing non-GPE hardware that might already be deployed.  We looked at using
 the same port and at using a new one, and decided, at this point, that a new
 port is easier for migration but, since the packet format is essentially
 VXLAN, to keep the VXLAN name.



 Section 3.1: yet another protocol numbering scheme is defined. Why not
 use the IP protocol numbering space, e.g. 41 for IPv6, 94 for IPv4, 97 for
 Ethernet? NSH would need a protocol number allocation (maybe that is
 the intent so these can be used at L3?).

 We can certainly use the IP protocol numbers for the protocols that have one. 
  However, there are protocols that might be encapsulated that don’t have an 
 IP protocol number (not just NSH) so we still need a registry for those.

NSH looks an awful lot like an IP extension header to me, have you
considered requesting a protocol number for this?

You can always use proto=47 (GRE) as a secondary encapsulation header
to encapsulate the full ETHER_TYPE range. This technique is described
in GUE draft (cost is 4 bytes and a little bit of processing for these
other protocols not represented by an IP protocol number).
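
As an illustration of the 4-byte cost mentioned above: the GRE base header without optional fields (RFC 2784) is 16 bits of flags/version followed by a 16-bit protocol type that carries the EtherType. The sketch below is only a minimal example under that assumption; the helper names gre_push and gre_ethertype are hypothetical and do not come from the GUE draft or any implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 4-byte GRE base header (no checksum, key or sequence number)
 * at buf, carrying ethertype as the protocol type.
 * Returns the number of bytes written. Hypothetical helper. */
static size_t gre_push(uint8_t *buf, uint16_t ethertype)
{
    assert(buf != NULL);
    buf[0] = 0;                            /* C bit and reserved bits clear */
    buf[1] = 0;                            /* reserved bits, version 0 */
    buf[2] = (uint8_t)(ethertype >> 8);    /* protocol type, network order */
    buf[3] = (uint8_t)(ethertype & 0xff);
    return 4;
}

/* Read the EtherType back out of a GRE base header. Hypothetical helper. */
static uint16_t gre_ethertype(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint16_t)((buf[2] << 8) | buf[3]);
}
```

The outer encapsulation would then carry protocol 47, and the receiver would dispatch on the 16-bit value returned by gre_ethertype.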



 Section 3.1: P-bit seems unnecessary to me, is more complex to
 process, and there is no reason to be compatible with VXLAN. Without
 P-bit we can always do simple indirect look-up to get protocol handler
 (e.g. handler = proto_handlers[protocol]), but with the P-bit we need
 to do an additional conditional.

 The P bit ensures that we have forwarding logic consistent with LISP 
 (https://datatracker.ietf.org/doc/draft-lewis-lisp-gpe/).  Also, in the case 
 if/when gpe uses the same port as VXLAN, the p-bit helps with parsing on the 
 receiving end.

I think there's more logic in being consistent with Ethernet, IP, GRE
protocol standards where the next protocol field is always valid as an
invariant.
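
To make the processing difference concrete, here is a rough sketch of the two dispatch styles being compared. The protocol values, the P_BIT mask and all handler names are assumptions made up for the example; they are not taken from the VXLAN-GPE draft.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; not taken from any registry. */
enum { PROTO_IPV4 = 1, PROTO_ETHERNET = 3 };
enum { P_BIT = 0x04 };   /* hypothetical position of the P-bit in the flags */

typedef int (*proto_handler_t)(const uint8_t *pkt, size_t len);

/* Stub handlers; the return value just identifies which one ran. */
static int handle_unknown(const uint8_t *pkt, size_t len)  { (void)pkt; (void)len; return -1; }
static int handle_ethernet(const uint8_t *pkt, size_t len) { (void)pkt; (void)len; return 1; }
static int handle_ipv4(const uint8_t *pkt, size_t len)     { (void)pkt; (void)len; return 2; }

static proto_handler_t proto_handlers[256];

static void init_handlers(void)
{
    for (size_t i = 0; i < 256; i++)
        proto_handlers[i] = handle_unknown;
    proto_handlers[PROTO_IPV4] = handle_ipv4;
    proto_handlers[PROTO_ETHERNET] = handle_ethernet;
}

/* Protocol field always valid: a single indirect lookup. */
static int dispatch_invariant(uint8_t protocol, const uint8_t *pkt, size_t len)
{
    assert(proto_handlers[protocol] != NULL);  /* init_handlers() was called */
    return proto_handlers[protocol](pkt, len);
}

/* Protocol field valid only when the P-bit is set: one extra conditional. */
static int dispatch_with_pbit(uint8_t flags, uint8_t protocol,
                              const uint8_t *pkt, size_t len)
{
    if (!(flags & P_BIT))
        return handle_ethernet(pkt, len);      /* implicit Ethernet payload */
    return dispatch_invariant(protocol, pkt, len);
}
```

The second function is the "additional conditional" from the original comment: the protocol field cannot be trusted until the flag has been checked.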




 Also, since this now has a protocol and version in the header, the
 only thing fundamentally missing is a header length field. Please
 consider adding that. See GUE
 (http://tools.ietf.org/html/draft-herbert-gue-01) or geneve for the
 justification of why this is critical.


 The length of the gpe header is fixed, so adding length wouldn’t buy us much?

Fixed length gpe (or in VXLAN for that matter) implies no protocol
eXtensibility :-).

Thanks,
Tom

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] needed data plane encap requirement in draft-ietf-nvo3-dataplane-requirements

2014-03-21 Thread Tom Herbert

On Thu, Mar 20, 2014 at 7:20 PM, Pankaj Garg garg.pan...@microsoft.com wrote:
 I agree with you in spirit that carrying 1K of metadata to transport 64B
 payload won’t make sense. The question is, how do we know the right size of
 metadata ahead of time such as 16-bytes or 4-bytes or 64-bytes? Hence the
 argument for flexibility in the metadata size and definition. I am sure that
 when metadata proposals are floated around, there would be enough scrutiny
 to make sure that the impact on data plane is minimal. There are use cases
 of metadata that can actually improve data plane performance, instead of
 reducing it, so the proposals to carry metadata should not prohibit such
 flexibility.

It's not just the size of all the metadata that is an issue, it's also
the number of individual items, the overhead for each item, the
processing algorithm, and the necessary validation that must be done--
all these will add up to determine how fast packet processing is.



 From: Tissa Senevirathne (tsenevir) [mailto:tsene...@cisco.com]
 Sent: Friday, March 21, 2014 7:21 AM
 To: Pankaj Garg; Larry Kreeger (kreeger); Black, David; Rajeev Manur
 Cc: draft-ietf-nvo3-dataplane-requireme...@tools.ietf.org; nvo3@ietf.org


 Subject: RE: [nvo3] needed data plane encap requirement in
 draft-ietf-nvo3-dataplane-requirements



 I think success of any protocol or encap depends on the simplicity. Please
 do not take that away. The way this is swinging, I fear that we will
 include 1KB of meta header to transmit a 64-byte packet and in turn increase
 latency of my data plane.



 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Pankaj Garg
 Sent: Thursday, March 20, 2014 6:38 PM
 To: Larry Kreeger (kreeger); Black, David; Rajeev Manur
 Cc: draft-ietf-nvo3-dataplane-requireme...@tools.ietf.org; nvo3@ietf.org
 Subject: Re: [nvo3] needed data plane encap requirement in
 draft-ietf-nvo3-dataplane-requirements



 Inline.



 From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Larry Kreeger
 (kreeger)
 Sent: Friday, March 21, 2014 2:56 AM
 To: Black, David; Rajeev Manur
 Cc: draft-ietf-nvo3-dataplane-requireme...@tools.ietf.org; nvo3@ietf.org
 Subject: Re: [nvo3] needed data plane encap requirement in
 draft-ietf-nvo3-dataplane-requirements



 I agree with David, which is why I said that we should not use SFC as a use
 case to motivate the need for metadata that would essentially add value at
 the NVO3 layer.  If the metadata is being used for NVO3 OAM/performance
 optimizations, then it should directly follow the NVO3 encap, and using a
 protocol type field to do this is a well understood/implemented method.

 [PG] I agree that using header chaining is one way to carry metadata.
 However, across a range of protocols, all possible methods to carry
 meta-data are used, for example, putting the meta-data in fixed headers, to
 carry it via header chaining, to carrying it via TLV options etc. So in some
 sense, all of these methods are well understood and implemented. It is a
 matter of picking the one that fits the bill the best.



  - Larry



 From: Black, David Black david.bl...@emc.com
 Date: Thursday, March 20, 2014 1:59 PM
 To: Rajeev Manur rma...@broadcom.com
 Cc: draft-ietf-nvo3-dataplane-requireme...@tools.ietf.org
 draft-ietf-nvo3-dataplane-requireme...@tools.ietf.org, nvo3@ietf.org
 nvo3@ietf.org
 Subject: Re: [nvo3] needed data plane encap requirement in
 draft-ietf-nvo3-dataplane-requirements



 More to the point is that for VXLAN’s current L2 service, the existing VXLAN
 implicit indication of Ethernet as the next protocol suffices, as the SFC
 header should be wrapped in an Ethernet header inside the VXLAN
 encapsulation so that the encapsulated packet can be forwarded immediately
 upon VXLAN decapsulation.



 Having VXLAN directly indicate SFC as next protocol causes the SFC header to
 be the outer header after VXLAN decapsulation, and the resulting packet
 cannot be forwarded until the NVE removes or moves the SFC header; that
 seems like a rather unfortunate restriction and coupling of functionality
 (NVO3 and SFC) that ought to be separable.



 Thanks,
 --David



 From: Rajeev Manur [mailto:rma...@broadcom.com]
 Sent: Thursday, March 20, 2014 2:16 PM
 To: Pankaj Garg
 Cc: Lucy yong; Jim Guichard (jguichar; Black, David;
 draft-ietf-nvo3-dataplane-requireme...@tools.ietf.org; Larry Kreeger
 (kreeger; nvo3@ietf.org
 Subject: RE: [nvo3] needed data plane encap requirement in
 draft-ietf-nvo3-dataplane-requirements





 NextProto seems to do the job for currently proposed SFC encapsulation. If
 you mean non-SFC metadata then yes, there may be a few use-cases where
 people may want to carry such metadata E2E directly over NVO3. For such
 cases we could make a provision in the NVO3 hdr to carry optional EXT_HDR
 with metadata.

 Alternatively we could explore possibility of making SFC the generic
 mechanism to carry any type of metadata.

 Thanks!
 --Rajeev



 Original Message-

 From: Pankaj

Re: [nvo3] FW: New Version Notification for draft-zhou-li-vxlan-soe-00.txt

2014-03-14 Thread Tom Herbert

On Thu, Mar 13, 2014 at 7:25 PM, Zhou, Han hzh...@ebay.com wrote:
 Hi folks,

 We posted a draft as an extension to VXLAN. Please take a look.

 The motivation came from our experiments on VXLAN optimization. It seems lots
 of discussions ongoing about the necessity of adding metadata to transport 
 headers,
 and it is also controversial whether we should take offloading into 
 consideration in
 the headers. However, our test result shows significant performance gains even
 without any help from hardware offloading. The performance of a single TCP
 session improved from 1.5 Gbits/sec to 3.5~4 Gbits/sec: more than doubled!

Hi Han,

- The mechanisms you're using are local within a host so this should
be accomplished by software API as opposed to changing the on-the-wire
protocol. The API should be generic across other encaps. In the most general
case, we'd want to provide a TSO device interface to the guest and
only do the segmentation at last possible point in the stack (either
GSO from the host's physical driver, or TSO if device has support).
Most of this is supported now in Linux kernel.
- I believe this would conflict with the proposal to add a protocol
field to the VXLAN header. Overloading one field in a fixed header is
not an adequate substitute for a truly extensible header. In the best
case we could only use one or the other functionality in a given
packet. In the worst case, overloading opens the door to backwards
compatibility issues and the potential for misinterpretation of
fields.

Tom

 So this is a practical yet generic proposal, which extends the offloading
 concept from kernel stacks to remote end-points of overlay networks.

 The metadata for offloading is very similar to STT. The differences are:
 1. it doesn’t add a fake TCP header to utilize NIC TSO.
 2. it doesn't include helper fields, just to save the limited VXLAN header
 space for other possible purposes in the future.
 3. VXLAN is widely adopted and this is only a minor, backward-compatible
 extension.

 Based on this, it is highly recommended to add segmentation metadata in VXLAN
 header as proposed in this draft.

 Any comments are appreciated!

 Best regards,
 Han Zhou

 -Original Message-
 From: internet-dra...@ietf.org [mailto:internet-dra...@ietf.org]
 Sent: Thursday, March 13, 2014 10:29 PM
 To: Zhou, Han; Li, Chengyuan; Li, Chengyuan; Zhou, Han
 Subject: New Version Notification for draft-zhou-li-vxlan-soe-00.txt


 A new version of I-D, draft-zhou-li-vxlan-soe-00.txt
 has been successfully submitted by Han Zhou and posted to the
 IETF repository.

 Name:   draft-zhou-li-vxlan-soe
 Revision:   00
 Title:  Segmentation Offloading Extension for VxLAN
 Document date:  2014-03-13
 Group:  Individual Submission
 Pages:  7
 URL:
 http://www.ietf.org/internet-drafts/draft-zhou-li-vxlan-soe-00.txt
 Status: https://datatracker.ietf.org/doc/draft-zhou-li-vxlan-soe/
 Htmlized:   http://tools.ietf.org/html/draft-zhou-li-vxlan-soe-00


 Abstract:
Segmentation offloading is nowadays common in network stack
implementation and well supported by para-virtualized network device
drivers for virtual machine (VM)s. This draft describes an extension
to Virtual eXtensible Local Area Network (VXLAN) so that segmentation
can be decoupled from physical/underlay networks and offloaded
further to the remote end-point thus improving data-plane performance
for VMs running on top of overlay networks.




 Please note that it may take a couple of minutes from the time of submission
 until the htmlized version and diff are available at tools.ietf.org.

 The IETF Secretariat

___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] comment on herbert-gue-01

2014-03-11 Thread Tom Herbert

On Tue, Mar 11, 2014 at 9:49 AM, Lucy yong lucy.y...@huawei.com wrote:
 Hi Tom,

 Please see in-line below.

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Monday, March 10, 2014 7:14 PM
 To: Lucy yong
 Cc: nvo3@ietf.org
 Subject: Re: comment on herbert-gue-01

 Hi Lucy, thanks for the comments!

 On Mon, Mar 10, 2014 at 2:51 PM, Lucy yong lucy.y...@huawei.com wrote:
 Hi Tom,

 I read this draft. It is interesting proposal. It is indeed another
 tunneling encapsulation proposal and aims in applying to NVO as well
 (not limited to).

 Regarding the semantics, it suggests using the flag on the header to
 indicate the option field presence in the header. Three flags are
 specified in current proposal. For such kinds of semantics, IMO, it is
 very important to specify the processing order in the flags that
 derives the option field presence. This is because that these optional
 fields are present independently. If the flag is on, the corresponding
 field present, but the processor does not know where the option fields
 present. If all the option field length is vary, it mandates that the
 processor look at the header flag in the order too.  Therefore,
 specifying flag processing order is required in such semantics.

 The order of fields follows the order of the option flag(s) and the size of a 
 particular optional field is fixed. So for a specific set of flags, the 
 layout and sizes of the fields are fixed. For instance, if the flags are 
 0xa000, the fields are precisely the four byte vni followed by 8 byte 
 security header. There is no other possible interpretation, so in fact I can 
 use such properties to create a fast
 path:
 [Lucy] OK. Then it is necessary to make the field order constraint explicitly 
 clear. BTW using flag bits achieve different sizes of a field has pros and 
 cons.

Indeed. I had considered making all fields one size (say 32 bits),
where each bit indicates one field, and contiguous larger fields
could be created by grouping continuous bits together. This would make
header length and offset calculation very easy. Problems were that
this precludes true flags (no associated field), large fields would
consume a lot of the bits (e.g. a 256 bit field requires 8 flag bits),
and a grouped field could have multiple representations yielding the
same length.
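
For illustration, the uniform-size alternative described above would reduce length and offset calculation to a population count. This is only a sketch of that rejected idea, assuming a 16-bit flags word with bit 0 as the most significant bit and one 32-bit field per set bit; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits in a 16-bit value (portable, no compiler builtins). */
static unsigned popcount16(uint16_t v)
{
    unsigned n = 0;
    while (v) {
        v &= (uint16_t)(v - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Total length in bytes of the optional fields: 4 bytes per set flag. */
static unsigned opt_len_uniform(uint16_t flags)
{
    return 4 * popcount16(flags);
}

/* Byte offset of the field for flag `bit` (0 = most significant bit),
 * measured from the start of the optional fields. */
static unsigned opt_off_uniform(uint16_t flags, unsigned bit)
{
    assert(bit < 16);
    /* keep only the flag bits that precede `bit` */
    uint16_t before = (uint16_t)(flags & ~(0xffffu >> bit));
    return 4 * popcount16(before);
}
```

The drawbacks listed above still apply: a flag with no associated field cannot be expressed, and a 256-bit field would need 8 adjacent flag bits.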

 if (gue->flags == 0xa000) {
    vgue = (struct vgue *) gue;
    sec_key = lookup(vgue->vni, ip addresses, ...);
    if (vgue->sec != sec_key)
       goto drop_packet;

    strip_gue_hdr(pkt);

    proto_ctx = proto_table[gue->protocol];
    (*proto_ctx->process_pkt)(pkt);
 }

 Because TLVs have no requisite ordering, it is much more difficult to code a 
 simple fast path like this (see TCP code).

 For the same above reason, the statement “A middle box may interpret
 some flags and optional fields of the GUE

 header for classification purposes, but is not required to understand
 all flags and fields in GUE packets.” has some problem. How does
 middle box know where the needed fields present on the GUE packets
 without knowing all the flows and option fields format? I do not know
 why you want middle box to perform the treatment on the inner payload 
 without tunnel termination?

 We assume that new flags and bits are always added at the end (right of last 
 bit that was defined). In this way, if a device implements some processing of 
 the n'th bit, it needs to know how to compute its field offset which is a
 function of the first n-1 bits which were previously defined. New bits
 added at n+1 and above can't change the offset of the n'th bit so a device 
 can continue to process it without knowing anything about the new bits. The 
 header length field allows skipping over new unknown bits to find the next 
 header.
 [Lucy] yes, if the field presence order and field length are fixed. The 
 middle box needs to understand GUE format including all optional fields but 
 not the content in the fields. Having the private field at last and requiring 
 a field length indication there becomes unique privilege for this field.

That's why we needed to put private bits at the end.

 It is my impression that IPsec maybe used by a underlay network that
 carries NVO traffic when necessary. Here you proposal that using IPsec
 within the overlay network. Does that mean that the overlay app. does
 not rely on the underlay network to provide security?

 Yes, IPsec may always carry nvo3 traffic, but the problem is how to maintain
 visibility of the encapsulation to intermediate network devices. For
 instance, the UDP portion of a VXLAN packet could be encapsulated in
 transport mode-- this gives headers:
 IP-ESP-UDP-VXLAN-packet. The problem is that the VXLAN header is no longer in
 the outside header (could in fact be encrypted) so if we want to do firewalling
 in the network based on vni we have lost that ability. My solution is to
 encapsulate by IP-UDP-GUE-ESP-packet.
 [Lucy] Are these controversial

Re: [nvo3] question and comment on gross-geneve draft

2014-03-10 Thread Tom Herbert

On Mon, Mar 10, 2014 at 11:48 AM, Lucy yong lucy.y...@huawei.com wrote:
 Hi Authors,



“Transit device.  A forwarding element along the path of the tunnel.

A transit device MAY be capable of understanding the Geneve frame

format but does not originate or terminate Geneve packets.”



 Could you give an example of such transit device? I do not call a firewall
 device as a transit device or forwarding element. If you mean that, please
 use the term of service function and recheck if the definition fit or not.



 NVO technology aims in tunneling packets across a underlay network, and
 tunnel terminates at network virtualization edge (NVEs).



 I think that all the metadata description relates to this transit device and
 is very confused.



 The fields in a tunnel encapsulation header are for tunnel ingress end point
 (EP) to convey some information (state) for tunnel egress EP, so egress EP
 can react on it. To design such header, we should be very clear what kind of
 actions tunnel egress EP can or should act on it. There are three: one is to
 terminate the tunnel and forward the packet based on inner address on the
 packet; the second is to terminate the tunnel and forward it based on other
 information (i.e. not inner address on the packets); third are OAM action.
 Is there other beside these three? We have OAM flag in the geneve header, we
 need another flag to differ between the first action and the second.

Why does the header need to indicate how the packet is to be
forwarded? Since the packet is terminated, how to forward it or
process it is an otherwise local decision. OAM would be better served
to be expressed in an EtherType to eliminate awkward semantics of the
bit, so then all geneve packets are processed first based on
EtherType.




 It is possible that some of other information in the second action may be
 carried by the encapsulated packet, which is what SFC WG is working on and
 names it as SFC header. But the tunnel encapsulation header just needs to
 distinguish the two actions and treats SFC header as a metadata in the
 second action.



 We should separate the states that a tenant system needs to pass to the
 other tenant system from the tunnel encapsulation format because the tunnel
 terminates at an NVE not tenant system.  The critical optional flag in
 geneve header is too general without clear requirements.



 Regards,

 Lucy






___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] question and comment on gross-geneve draft

2014-03-10 Thread Tom Herbert

On Mon, Mar 10, 2014 at 1:29 PM, Lucy yong lucy.y...@huawei.com wrote:
 Please see in-line below w/ [Lucy1]

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Monday, March 10, 2014 3:23 PM
 To: Lucy yong
 Cc: draft-gross-gen...@tools.ietf.org; nvo3@ietf.org
 Subject: Re: [nvo3] question and comment on gross-geneve draft

 On Mon, Mar 10, 2014 at 12:49 PM, Lucy yong lucy.y...@huawei.com wrote:

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: Monday, March 10, 2014 2:29 PM
 To: Lucy yong
 Cc: draft-gross-gen...@tools.ietf.org; nvo3@ietf.org
 Subject: Re: [nvo3] question and comment on gross-geneve draft

 On Mon, Mar 10, 2014 at 11:48 AM, Lucy yong lucy.y...@huawei.com wrote:
 Hi Authors,

“Transit device.  A forwarding element along the path of the tunnel.

A transit device MAY be capable of understanding the Geneve frame

format but does not originate or terminate Geneve packets.”

 Could you give an example of such transit device? I do not call a
 firewall device as a transit device or forwarding element. If you
 mean that, please use the term of service function and recheck if the 
 definition fit or not.

 NVO technology aims in tunneling packets across a underlay network,
 and tunnel terminates at network virtualization edge (NVEs).

 I think that all the metadata description relates to this transit
 device and is very confused.

 The fields in a tunnel encapsulation header are for tunnel ingress
 end point
 (EP) to convey some information (state) for tunnel egress EP, so
 egress EP can react on it. To design such header, we should be very
 clear what kind of actions tunnel egress EP can or should act on it.
 There are three: one is to terminate the tunnel and forward the
 packet based on inner address on the packet; the second is to
 terminate the tunnel and forward it based on other information (i.e. not 
 inner address on the packets); third are OAM action.
 Is there other beside these three? We have OAM flag in the geneve
 header, we need another flag to differ between the first action and the 
 second.

 Why does the header need to indicate how the packet is to be forwarded? 
 Since the packet is terminated, how to forward it or process it is an 
 otherwise local decision. OAM would be better served to be expressed in an 
 EtherType to eliminate awkward semantics of the bit, so then all geneve 
 packets are processed first based on EtherType.
 [Lucy] because it is the general understanding (or assumption) in NVO that 
 tunnel egress terminates the packet and forwards based on inner address on 
 the packet. In case, a service function such as firewall is a tenant system 
 and attaches to a NVE (say egress). If another NVE (say ingress) receives 
 the packets from its attached Tenant system and needs to forward packets to 
 the firewall first, it needs to encapsulate the packet and send that egress 
 NVE. The ingress NVE has the choice to forward the packet to the destination 
 TS directly or send to a service function based on the policy. Upon 
 receiving a packet from tunnel, egress NVE also needs to know if it should 
 perform inner address based forwarding or forward based on the other 
 information.

 I am not in favor of using EtherType to determine what action that egress 
 NVE should do in NVO because EtherType introduces another layer from network 
 perspective. We use protocol type or ethertype to differ client layer from 
 server layer (i.e. network). Here it is about what actions the network 
 overlay layer to take.

 EtherType is very important in the encapsulation. It serves both to identify
 the type of the packet being encapsulated and to indicate the processing entry
 point of the packet (e.g. switch (EtherType) ...). I think these properties
 should be invariant. Adding alternative paths or interpretations of the
 enclosed data muddles its definition and convolutes the fast path. If a
 different packet format is needed where EtherType does not indicate the type
 of the enclosed data, I would suggest type (version) 0x1 of the encap protocol
 be defined with its own format which can be optimized for the alternative
 format (like a version 0x1 in case of geneve).

 [Lucy1] Yes, EtherType is very important. Geneve header has the EtherType 
 field. My point is that we should not use EtherType for tunnel egress to 
 determine which forwarding mechanism to use. I thought that the version usage 
 is for backward compatibility, not for different type of encap. protocol.

Sorry, that's the way I would do it in GUE which has a type-version
field :-) If there's a field called protocol type in an
encapsulation header then that's what it should be. I'd be really
worried if the interpretation of the field depended on flags or other
parts of the packet.

 Lucy

 Tom

 Regards,
 Lucy

 It is possible that some of other information in the second action
 may be carried by the encapsulated packet, which

Re: [nvo3] comment on herbert-gue-01

2014-03-10 Thread Tom Herbert

Hi Lucy, thanks for the comments!

On Mon, Mar 10, 2014 at 2:51 PM, Lucy yong lucy.y...@huawei.com wrote:
 Hi Tom,



 I read this draft. It is interesting proposal. It is indeed another
 tunneling encapsulation proposal and aims in applying to NVO as well (not
 limited to).



 Regarding the semantics, it suggests using the flag on the header to
 indicate the option field presence in the header. Three flags are specified
 in current proposal. For such kinds of semantics, IMO, it is very important
 to specify the processing order in the flags that derives the option field
 presence. This is because that these optional fields are present
 independently. If the flag is on, the corresponding field present, but the
 processor does not know where the option fields present. If all the option
 field length is vary, it mandates that the processor look at the header flag
 in the order too.  Therefore, specifying flag processing order is required
 in such semantics.

The order of fields follows the order of the option flag(s) and the
size of a particular optional field is fixed. So for a specific set of
flags, the layout and sizes of the fields are fixed. For instance, if
the flags are 0xa000, the fields are precisely the four byte vni
followed by 8 byte security header. There is no other possible
interpretation, so in fact I can use such properties to create a fast
path:

if (gue->flags == 0xa000) {
   vgue = (struct vgue *) gue;
   sec_key = lookup(vgue->vni, ip addresses, ...);
   if (vgue->sec != sec_key)
      goto drop_packet;

   strip_gue_hdr(pkt);

   proto_ctx = proto_table[gue->protocol];
   (*proto_ctx->process_pkt)(pkt);
}

Because TLVs have no requisite ordering, it is much more difficult to
code a simple fast path like this (see TCP code).



 For the same above reason, the statement “A middle box may interpret some
 flags and optional fields of the GUE

 header for classification purposes, but is not required to understand all
 flags and fields in GUE packets.” has some problem. How does middle box know
 where the needed fields present on the GUE packets without knowing all the
 flows and option fields format? I do not know why you want middle box to
 perform the treatment on the inner payload without tunnel termination?

We assume that new flags and bits are always added at the end (right
of last bit that was defined). In this way, if a device implements
some processing of the n'th bit, it needs to know how to compute its
field offset which is a function of the first n-1 bits which were
previously defined. New bits added at n+1 and above can't change
the offset of the n'th bit so a device can continue to process it
without knowing anything about the new bits. The header length field
allows skipping over new unknown bits to find the next header.
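
The offset rule above can be sketched roughly as follows. The size table is an assumption for the example: only the 4-byte vni on bit 0 and the 8-byte security field on bit 2 follow the 0xa000 case discussed earlier, and the other entries are invented placeholders.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the field tied to each flag bit (bit 0 = MSB).
 * Bits 1 and 3 are hypothetical; unlisted bits are flags without a field. */
static const uint8_t field_size[16] = {
    4,  /* bit 0: vni, 4 bytes (from the 0xa000 example) */
    4,  /* bit 1: hypothetical 4-byte field */
    8,  /* bit 2: security field, 8 bytes (from the 0xa000 example) */
    4,  /* bit 3: hypothetical field defined later */
};

/* Offset of the field for bit n: depends only on the bits before n. */
static unsigned field_offset(uint16_t flags, unsigned n)
{
    unsigned off = 0;
    assert(n < 16);
    for (unsigned i = 0; i < n; i++)
        if (flags & (0x8000u >> i))
            off += field_size[i];
    return off;
}

/* Total length of all optional fields selected by flags. */
static unsigned options_len(uint16_t flags)
{
    unsigned len = 0;
    for (unsigned i = 0; i < 16; i++)
        if (flags & (0x8000u >> i))
            len += field_size[i];
    return len;
}
```

With flags 0xa000 the security field sits at offset 4, and it stays at offset 4 when bit 3 is also set (0xb000), which is the property that lets a device ignore bits defined after the one it processes.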



 It is my impression that IPsec maybe used by a underlay network that carries
 NVO traffic when necessary. Here you proposal that using IPsec within the
 overlay network. Does that mean that the overlay app. does not rely on the
 underlay network to provide security?

Yes, IPsec may always carry nvo3 traffic, but the problem is how to
maintain visibility of the encapsulation to intermediate network
devices. For instance, the UDP portion of a VXLAN packet could be
encapsulated in transport mode-- this gives headers:
IP-ESP-UDP-VXLAN-packet. The problem is that the VXLAN header is no
longer in the outside header (could in fact be encrypted) so if we
want to do firewalling in the network based on vni we have lost that
ability. My solution is to encapsulate by IP-UDP-GUE-ESP-packet.




   o Type: type of header. The rest of the fields in the header are

 defined based the type.



 Do you mean here? a typo?
Yes, it should read "defined based on the type".




 The proposal of encapsulating a layer 2 protocol in GUE is interesting. It
 means that, for different protocol type value, GUE header may be different.
 Why do you think that is a good design? How can hardware implement in an
 easy way?

The GUE header did not change, in this case a GRE packet is being
encapsulated within GUE. With IP protocol we can directly encapsulate
IPv4, IPv6, GRE, and Ethernet frames using EtherIP (as a small bonus,
EtherIP includes an extra two bytes before encapsulated Ethernet
header which retains 32 bit alignment of following IP protocols--
vxlan and nvgre don't account for that). Using GRE encapsulation is
only necessary for other L3 protocols.
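
The alignment point can be checked with simple arithmetic. The sketch below assumes an untagged 14-byte Ethernet header, the 2-byte EtherIP header of RFC 3378, and an encapsulation header that itself ends on a 32-bit boundary; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ETH_HDR_LEN     = 14,  /* dst MAC + src MAC + EtherType, no VLAN tag */
    ETHERIP_HDR_LEN = 2    /* EtherIP header (RFC 3378) */
};

/* Offset of the inner IP header, given where the encapsulated payload
 * starts and how many shim bytes precede the inner Ethernet header. */
static size_t inner_ip_offset(size_t payload_start, size_t shim_len)
{
    /* assumption: the encapsulation header ends on a 32-bit boundary */
    assert(payload_start % 4 == 0);
    return payload_start + shim_len + ETH_HDR_LEN;
}
```

With the 2-byte EtherIP shim the inner IP header lands on a multiple of 4 (2 + 14 = 16); with no shim, as when an Ethernet frame directly follows an 8-byte header, it ends up 2 bytes off a 32-bit boundary.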

Tom




 Regards,

 Lucy





___
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3

Re: [nvo3] Fwd: New Version Notification for draft-herbert-gue-01.txt

2014-03-06 Thread Tom Herbert

On Thu, Mar 6, 2014 at 7:24 PM, Lizhong Jin lizho@gmail.com wrote:
 Hi Tom, see inline below.

 Regards
 Lizhong

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: 2014年3月7日 0:42
 To: Lizhong Jin
 Cc: nvo3@ietf.org; mls.i...@gmail.com
 Subject: Re: [nvo3] Fwd: New Version Notification for draft-herbert-gue-
 01.txt

 Hi Lizhong, thanks for the comments!

 On Wed, Mar 5, 2014 at 11:53 PM, Lizhong Jin lizho@gmail.com wrote:
  Hi Tom,
  In section 2.3, the 16bit 0s is redundant for a packet. I prefer the
  format like below:

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | 0x0 |   Hlen  |V|SEC|R|R|P|P|E|           Protocol            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  When Ebit = 0, protocol = 8bit IP protocol number.
  When Ebit = 1, protocol = 16bit EtherType.
  Then the ASIC could always use the combined 17bit (1bit Ebit + 16bit
  Protocol) as an index to parse the payload.

 We considered combining Ethertype and IP protocol into one field, but came
 to the conclusion this isn't such a good idea. The points are:

 1) Encapsulation of IPv4, IPv6, and EtherIP covers the vast majority of use
 cases, so encapsulating L3 protocols is the point to optimize
 2) As you can see in your modified header format, EtherType needs an
 additional 8 bits of core header. This only leaves 2 bits for future 
 extensions,
 based on the rate at which fields were added to GRE that does not seem like
 enough.
 [Lizhong] R-bits number is indeed an issue. But follow the definition rule of 
 the security and private options in current draft, it is not extensible 
 enough. It seems every new option will occupy some R-bits. Why not change the 
 options to TLV like Geneve? The current definition of VNI is OK, but the 
 security and private options could be TLV. If we change these options to TLV, 
 then we could have more R-bits in the GUE header.

Like I said in previous comments, using flag-fields versus TLVs is an
explicit tradeoff (probably the most interesting discussion needed
regarding meta data). While the TLVs allow open ended extensibility,
flag-fields are limited but extremely simple, more compact, and
parsing is trivial. Given how low-level the protocol is and its
probable deployment in high PPS environments within the data center, I
still opt for flag-fields. Adding new fields requires a lot of
diligence, and we want to ensure that any new fields allow multiple
uses, so for instance we have a security field instead of a cookie
field. I believe a good indicator of the rate at which options are added
would be GRE: in 20 years, it looks like 2 new options were added
(although I believe that there were a handful of non-standardized ones
also).

 3) Lookup of Ethertype is more complex than IP protocol because of the
 larger space. A 64K lookup table (or in your proposal a 128K lookup
 table) is prohibitive in many environments. For instance, on a 64 bit host 
 this
 becomes a 1/2M (and 1M) of memory. The 256 entry IP protocol table is only
 2K memory. In fact, in Linux the EtherType lookup uses a hash table smaller
 than 64K, while IP protocol is directly indexed in a 256 entry table. Note, 
 this
 will also be issue with NSH.
 [Lizhong] as you said, 64k/128k table will not be implemented, and hash table 
 with supported type number will be used instead. The 17bit I suggested is the 
 hash key, and does not implicitly indicate the table size.

Hash tables require relatively more logic than direct indexing.

proto_ctx = table[proto];

vs.

proto_ctx = hash_table[proto & 0xff...]
Deal with collisions...
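
Spelled out a bit more, as a sketch only: the struct names, the table
size of 16, and the chaining scheme are assumptions for illustration and
not how any particular stack does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-protocol context. */
struct proto_ctx {
    const char *name;
};

/* Direct index: one pointer per 8-bit IP protocol number.
 * 256 pointers is 2K on a 64-bit host. */
static const struct proto_ctx *ip_proto_table[256];

static const struct proto_ctx *lookup_ip_proto(uint8_t proto)
{
    return ip_proto_table[proto];    /* one load, no key comparison */
}

/* Hashed: 16-bit EtherType folded into a small chained table. */
#define ETYPE_HASH_SIZE 16           /* assumed size, power of two */

struct etype_entry {
    uint16_t ethertype;
    const struct proto_ctx *ctx;
    struct etype_entry *next;        /* collision chain */
};

static struct etype_entry *etype_hash[ETYPE_HASH_SIZE];

static void register_ethertype(struct etype_entry *e)
{
    unsigned int slot;

    assert(e != NULL);
    slot = e->ethertype & (ETYPE_HASH_SIZE - 1);
    e->next = etype_hash[slot];
    etype_hash[slot] = e;
}

static const struct proto_ctx *lookup_ethertype(uint16_t ethertype)
{
    const struct etype_entry *e;

    for (e = etype_hash[ethertype & (ETYPE_HASH_SIZE - 1)]; e; e = e->next)
        if (e->ethertype == ethertype)   /* must compare the full key */
            return e->ctx;
    return NULL;
}
```

The hashed path needs a registration step, a chain walk, and a full key
compare on every lookup; the direct path needs none of these.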

 4) I want to encourage that IP protocol = ESP becomes a common case :-)
 5) Allowing encap of GRE header for other L2 protocols doesn't seem
 unreasonable.
 [Lizhong] my concern is the 16 redundant bits. At current header definition 
 stage, it would be better if we could optimize the header. If GUE has been 
 widely deployed, then by using GRE would be a compatible method.

It's still only 4 bytes overhead. A single TLV in geneve has four
bytes of overhead for instance.

 6) If you remove the UDP part of the encapsulation it looks a lot like an IP
 extension header. This is not a coincidence!

  In my mind, one of the potential use cases of the private fields is
 congestion control. Currently, only TCP has a congestion control
 mechanism. Some other non-TCP traffic in the DC also requires flow-based
 congestion control.

 On CC and encap:

 1) It is extremely important to consider! The obvious need is for
 non-conformant CC traffic -- basically anything from an untrusted guest
 2) It's very likely that an untrusted guest and host will both be doing CC,
 so we need CC that doesn't interact poorly.
 3) DCCP is a possible option (IP protocol = DCCP) without needing to add
 new fields

Re: [nvo3] Comments on Draft Geneve

2014-03-03 Thread Tom Herbert

Hi Anton,

What you are describing is header data split, which is where a device
splits the header and data portions of a packet into two buffers so that
data can be page aligned (or at least in a different cache line, as you
pointed out). Several NICs have already implemented this with TCP to
split out TCP data from rest of the packet headers. They have done
this even though there is no restriction that TCP options don't vary
during the lifetime of a connection, in fact it's pretty essential to
the protocol to allow that they do. Also, NICs are stateless for this,
so the split is done for each packet independently by parsing the
packet and computing the offset for the split. A device can perform
header data split in the same manner for encapsulation protocols (I
suspect some might have already done that for GRE).
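
For illustration, a stateless split computation for plain
Ethernet/IPv4/TCP might look roughly like the following. This is only a
sketch under simplifying assumptions (no VLAN tags, no IPv6, fragments
not considered), and the function name is made up; it is not taken from
any NIC or driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the byte offset at which the TCP payload starts, or 0 if the
 * packet is not Ethernet/IPv4/TCP or is too short to parse. Nothing is
 * remembered between packets: the offset is derived from the length
 * fields in this packet's own headers. */
static size_t tcp_split_offset(const uint8_t *pkt, size_t len)
{
    size_t off = 14;                 /* untagged Ethernet header */
    size_t ihl, doff;

    assert(pkt != NULL);
    if (len < off + 20)
        return 0;
    if (pkt[12] != 0x08 || pkt[13] != 0x00)
        return 0;                    /* EtherType is not IPv4 */
    if ((pkt[off] >> 4) != 4)
        return 0;                    /* IP version is not 4 */

    ihl = (size_t)(pkt[off] & 0x0f) * 4;   /* IPv4 header length */
    if (ihl < 20 || pkt[off + 9] != 6)
        return 0;                    /* bad IHL or not TCP */
    off += ihl;
    if (len < off + 20)
        return 0;

    doff = (size_t)(pkt[off + 12] >> 4) * 4;   /* TCP header length */
    if (doff < 20 || len < off + doff)
        return 0;
    return off + doff;               /* TCP options may vary per packet */
}
```

An encapsulation header that carries its own length could be handled by
adding one more step of the same kind.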

Within a data center environment, it is probably true that pretty much
all packets we'd send will have the same format. Homogeneity makes it
much easier to do things like program a TCAM for headers. However,
this is an implementation and deployment choice, not necessarily a
fundamental property of the protocol. In protocol design, robustness
is the stronger guiding principle.

Tom



On Mon, Mar 3, 2014 at 12:05 AM, Anton Ivanov (antivano)
antiv...@cisco.com wrote:
 Hi all,

 I would like to address one more issue which has been omitted so far from
 the background to the discussion.

 If we restrict the use cases to virtualization (which is the remit of NVO3),
 the assumption that variable length options are easy to implement in
 software is valid if and only if they are constant length for the duration
 of a session. Otherwise it is incorrect.

 If you work purely in software with no VMs involved f.e. software switch
 which takes pseudowires from the network and writes to pseudowires with a
 variable length header parsing geneve is trivial - you allocate big enough
 buffers and play with offsets. The code for that has been polished over the
 years, standard kernel buffer handling on all OS-es (or its equivalents for
 switches), nothing new here.

 If you have to pass that data into a VM this changes the picture - you want
 that data to be page aligned so you can page it in without copying it. This
 is trivial if your header is constant for the duration of the session. You
 get the header separately, data separately by knowing the offsets. The APIs
 to do that are there - it does not matter are you doing it in userspace
 (POSIX vector IO and its Microsoft equivalent) or in kernel space scatter
 gather IO. It is easy.

 If your header is variable length during the session and you do not know the
 size for a particular packet you have page-in the whole buffer and supply
 the driver with an offset on where to start. This means that you have to
 zero the bits of the header which would otherwise leak into the VM every
 time and/or do some copying. If you do not zero them, you have a security
 issue of the VM seeing its overlay and/or metadata which may have potential
 security use. The same applies if you can write directly to the VM address
 space instead of paging in buffers via the mmu. Zeroing 256+ bytes on every
 pass tends to add up to quite a few CPU cycles over time.

 So from an implementation perspective as far as variable size headers are
 concerned, there is little difference between software in a virtualized
 environment and hardware. They have very similar restrictions (unless you
 want to sacrifice 40% of your performance to an interim copy). Provided that
 you want performance of course.

 Going back to Geneve - if the header is constant duration within the session
 it is not different from what has been done in l2tp and what is being done
 in sfc. No technical merit to perpetrate it. If the header is variable, then
 we either have a case of:

 1. The draft may need an IPR statement already at this stage. I do not feel
 comfortable discussing a spec that looks like it has been submarined so you
 need a specific piece of IPR to implement it with an acceptable performance.

 2. A spec that is specifically tailored to a single NPU/NIC to ship from a
 single (un)known vendor. This is similarly not something we should be
 discussing (once again - IPR statement there too).

 Brgds,

 A.



 On 02/03/14 23:30, Phil Bedard wrote:

 I've read most of the posts in this thread as an operator who may be looking
 at an overlay solution in the future.

 So the crux of the discussion is whether to extend the functionality of an
 existing protocol or introduce a brand new protocol.

 I would like to see the VNI space extended to 32 bits instead of 24 in
 whatever encapsulation method is being chosen.  24 seems like a holdover
 from the 802.1ah I-SID value and other adapted tunnel protocol limitations
 and I'm not sure it's really necessary anymore.

 I also believe there has to be a protocol identifier in the encapsulation
 header identifying what comes next.  Static provisioning of this kind of

Re: [nvo3] Draft Geneve

2014-02-28 Thread Tom Herbert

On Fri, Feb 28, 2014 at 3:24 PM, Sam Aldrin aldrin.i...@gmail.com wrote:
 Hi all,

 Read the draft but have few questions on the same line others have asked.

 - Is this draft intended for standardizing within NVo3 WG? The status
 indicates it as informational. Also it is good to have it as draft-nvo3,
 if it is meant for NVo3 WG.
 - I fail to find good reasoning, in the current version of the draft, on why
 design of encap transport header should be closely associated with metadata
 OR closely tied together? Could you add more details to clarify?

The draft alludes to the general need for extensibility, but does not
provide any example uses, so maybe I can suggest one. We have a real
use case for an encapsulation protocol with security to allow
validation of the virtual network identifier. In their current form,
vxlan and nvgre have no provisions for authenticating or integrity
checking the vni, and existing mechanisms in the network were not deemed
robust enough to guarantee integrity of the vni and ensure strict
isolation between tenants. UDP checksum is not sufficient for this. We
need a mechanism to at least enforce an unpredictable security
token, or possibly stronger authentication using something like a
message digest. This is intrinsic to the encapsulation; we cannot
deploy network virtualization without this security, hence an
extensible protocol is desirable. Additionally, as the network scales
and new threats emerge, we may need further extensions to adapt.
All of this needs to be efficient and amenable to HW performance
optimizations.
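
As a minimal sketch of the weaker of the two options (an unpredictable
token checked on receive): the 8-byte token size, the table layout, and
the function names below are assumptions for illustration, not something
defined in any of the drafts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOKEN_LEN 8    /* assumed token size in bytes */

/* Hypothetical receiver-side state: the token expected for each vni. */
struct vni_entry {
    uint32_t vni;
    uint8_t token[TOKEN_LEN];
};

/* Compare without an early exit so timing does not reveal how many
 * leading bytes matched. Returns 1 on match, 0 otherwise. */
static int token_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;

    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);
    return diff == 0;
}

/* Accept a packet only if its vni is known and the token carried in the
 * encapsulation header matches the one expected for that vni. */
static int vni_token_valid(const struct vni_entry *tbl, size_t n,
                           uint32_t vni, const uint8_t *token)
{
    assert(token != NULL);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].vni == vni)
            return token_equal(tbl[i].token, token, TOKEN_LEN);
    return 0;          /* unknown vni: drop */
}
```

A message digest over the header would replace the stored token with a
value computed per packet, at a higher per-packet cost.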

 - Has the GAP analysis draft concluded on the assertion you made in your
 draft, for ex:
  Existing tunnel protocols have each attempted to solve different
 aspects of these new requirements, only to be quickly rendered out of date
 by changing control plane implementations and advancements.
 - In the past WG sessions and over the emails, the indication given was that
 various encap technologies already exist. Defining a good control plane
 technology is the need for the WG, no?
 - Finally, would like to see more details on why new encap is needed and
 could solve the problem, while extensions to VXLAN or NVGRE or LISP cannot
 solve.

 thanks
 -sam



 On Fri, Feb 28, 2014 at 11:00 AM, Anton Ivanov (antivano)
 antiv...@cisco.com wrote:

 On 28/02/14 18:28, Pankaj Garg wrote:
  The definition of meta-data for a feature should come from their
  respective WG, e.g. service-chaining blob design should come from SFC. How,
  the meta-data data is carried in an encapsulation comes from the design of
  the encapsulation format. The point of Geneve is that it would allow the
  meta-data to be carried without breaking NIC hardware offloads, whereas
  VXLAN, NVGRE etc. would require NIC hardware changes to carry meta-data.

 I have read that section about 10 times and I fail to see you doing
 anything else beside putting it on 32 bit boundary and mandating it ends
 on a 32 bit boundary.

 If that is the case, that can be done with any other encaps. No need to
 define a new one - just insert metadata in front of payload and pad to
 boundaries if needed.

 If that is not the case, can you please expand on exactly what it makes
 it so special in the draft so that an insertion of a metadata blob at 32
 bit boundary and ending on 32 bit boundary into L2TPv3, GRE, VXLAN, etc
 does not fulfill that. It is not evident from the draft in its current
 form.

 A.

 
  -Original Message-
  From: nvo3 [mailto:nvo3-boun...@ietf.org] On Behalf Of Anton Ivanov
  (antivano)
  Sent: Friday, February 28, 2014 11:17 PM
  To: nvo3@ietf.org
  Subject: Re: [nvo3] Draft Geneve
 
  On 28/02/14 16:57, Pankaj Garg wrote:
  We can discuss merits of Geneve vs VXLAN vs NVGRE vs STT vs L2TPV3 vs
  Name your favorite protocol when it comes to standardization for network
  virtualization.
  I already said what I had to say here. There is no discussion to be had.
 
  My point was that Geneve is designed for network virtualization and not
  only for service chaining. We have many use cases of ability to carry
  meta-data to evolve network virtualization that have nothing to do with
  service chaining and hence the right discussion forum for Geneve is NVO3
  (IMHO).
  You again missed Kens and my point.
 
  We do not dispute the need to carry metadata and this is exactly the
  discussion we should be having.
 
  However the need to carry metadata has to be _DECOUPLED_ from
  encapsulation requirements.
 
  That is trivial.
 
  Use case - I have a Geneve  (+ metadata) on the left, I cross an NVI and
  I exit on the right via let's say VXLAN. If metadata is naturally welded to
  encapsulation as you propose this means losing all of it.
 
  Sorry, this does not fly. You asking the IETF under the guise of
  standard to grant you monopoly on any deployment where you got a leg in.
 
  Errr.. that shall be a no then :)
 
 
  A.
 
 
  -Original Message-
  From: Anton Ivanov (antivano)

Re: [nvo3] Fwd: New Version Notification for draft-herbert-gue-00.txt

2014-02-12 Thread Tom Herbert

Hi Lizhong, thanks for your comments!

On Tue, Feb 11, 2014 at 6:07 PM, Lizhong Jin lizho@gmail.com wrote:
 Hi Tom
 I reviewed your GUE draft, and it is an interesting draft. Several comments:
 1. Section 2.2. 'Protocol' is 8 bits 
 (http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml). 
 Why not use the EtherType which is 16 bits?

- As I mentioned in the draft, using Internet protocol number covers
L4 (as well as many L3 and L2 protocols). This allows encap of ESP,
SCTP, IPIP etc. EtherType doesn't allow L4.
- IP protocol covers most of the common encapsulation cases: IP, IPv6,
Ethernet, and various L4 protocols. It also allows GRE, which can be
used to encapsulate an arbitrary EtherType at the cost of an additional
4 bytes.
- With the above point, the additional 8 bits for EtherType in the
header seems costly for its benefit to me.
- GUE is modeled after IP extension headers which already carry IP
protocol. In fact, if we separate out the non-UDP part then it could
be defined as an extension header. This becomes a generic header to
carry arbitrary flags and options in a packet.
- Processing an encapsulated IP protocol within an IP packet is more
efficient than an Ethertype. Following the header extension model,
this is just a recursive entry into IP processing. Protocol lookup is
simpler, a 256 entry lookup table is feasible in most implementations,
but 64K table for Ethertypes is less pleasant.
- Several IP protocols already have assigned ports for foo/UDP (ESP and
the GRE/UDP proposals, for example). Instead of continuing this trend,
GUE covers all the protocols in one port allocation.

 Or have both to make it general? E.g, have a flag to indicate it is IP 
 protocol number or EtherType.

That could be done, but I think it would be a different type as
opposed to a flag as it would be a different header format.

 2. Section 2.2. 'P' Private flag. Why we need two 'P' flags?

It's mostly arbitrary. Private fields are valuable to allow data
center operators to define their own packet fields and to experiment
with new ones. I could imagine a use case where one private field is
used for the standard fields, and the second is used for
experiments.

 3. Section 3.1. How to process TTL of IP header? E.g, in non-virtualization 
 environment, will you copy the TTL from Encapsulated packet to IP header? 
 Some applications, like trace, may require that.

GUE should not affect the semantics of IP in IP encapsulation for
things like TTL, so whatever is done for IPIP, SIT, IP/GRE, etc.
should continue to work.

 4. Section 3.6. MTU and fragmentation issues. It is suggested to not allow 
 fragmentation at this tunnel level. Same principle is also defined by VxLAN.

Packets should be fragmented before the tunnel if possible.

 5. Section 5. Motivation for GUE. Suggest to also comparing with 
 http://www.ietf.org/id/draft-quinn-vxlan-gpe-00.txt. From my view, the main 
 benefit of GUE is allowing to have private options.

It's more than just private options, it's that GUE is *extensible*.
Protocols like vxlan, gre, and GUE are potentially primary protocols
within the data center; as such, I believe a key requirement is that the
data center operator be able to adapt the protocol to changing needs
and threats. AFAICT neither vxlan nor nvgre allow for new fields, and
even if they did, without an obvious header length field, things like
middlebox deep inspection break when new options appear (this is what
makes it virtually impossible to add a new field to GRE for instance).
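
To show what the header length buys a middlebox, here is a rough sketch.
The layout is hypothetical (4 fixed bytes, with the low 5 bits of byte 0
giving the length of the optional fields in 32-bit words and byte 1
giving the next protocol); it is not the exact GUE wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a device that does not understand the options still learns. */
struct encap_view {
    uint8_t next_proto;        /* IP protocol number of the payload */
    const uint8_t *payload;    /* first byte after the encap header */
    size_t payload_len;
};

/* Skip the fixed header plus all optional fields, known or unknown,
 * using only the length field. Returns 0 on success, -1 if the buffer
 * is too short for the advertised header. */
static int encap_skip_header(const uint8_t *buf, size_t len,
                             struct encap_view *out)
{
    size_t total;

    assert(out != NULL);
    if (len < 4)
        return -1;
    total = 4 + (size_t)(buf[0] & 0x1f) * 4;
    if (len < total)
        return -1;

    out->next_proto = buf[1];
    out->payload = buf + total;
    out->payload_len = len - total;
    return 0;
}
```

Without such a length, the device has to know the size of every field
that might be present before it can find the encapsulated packet.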

Thanks,
Tom

 Regards
 Lizhong

 -Original Message-
 From: Tom Herbert [mailto:therb...@google.com]
 Sent: 2014年2月12日 0:03
 To: nvo3@ietf.org
 Subject: [nvo3] Fwd: New Version Notification for draft-herbert-gue-00.txt

 Hello,

 I didn't originally forward this to the nvo3 list, but it was suggested I
 do. This is a proposal for generic UDP encapsulation (not specifically for
 virtualization, but that could be a use case).

 Major differences between this and vxlan, nvgre are:

 1) Primarily intended to be extensible. This includes reserving a couple of
 flags for private use.
 2) Header length field to allow middle boxes or devices to skip over unknown
 options to find next header.
 3) Encapsulates by IP protocol. In particular, we need a clean way to
 encapsulate ESP or private security protocol within a data center (with or
 without network virtualization).

 Thanks,
 Tom


 -- Forwarded message --
 From:  internet-dra...@ietf.org
 Date: Fri, Dec 20, 2013 at 9:20 AM
 Subject: New Version Notification for draft-herbert-gue-00.txt
 To: Tom Herbert therb...@google.com



 A new version of I-D, draft-herbert-gue-00.txt has been successfully
 submitted by Tom Herbert and posted to the IETF repository.

 Filename:draft-herbert-gue
 Revision:00
 Title:   Generic UDP Encapsulation
 Creation date:   2013-12-20
 Group:   Individual Submission
 Number of pages: 16
 URL: http
