Hi Erik,

+++DK2:

On 6/1/15, 1:40 PM, "Erik Nordmark" <[email protected]> wrote:

>On 5/28/15 12:36 PM, Deepak Kumar (dekumar) wrote:
>> Hi,
>>
>> Inline +++DK:
>>
>> On 5/28/15 11:50 AM, "Darrel Lewis (darlewis)" <[email protected]>
>>wrote:
>>
>>> On May 27, 2015, at 10:35 AM, Erik Nordmark <[email protected]> wrote:
>>>
>>>> On 5/22/15 1:28 PM, Deepak Kumar (dekumar) wrote:
>>>>> Hi Erik,
>>>>>
>>>>> Agreed, we don't need 96 bytes of customer data in all intra-DC
>>>>> scenarios, but it will be required in a few scenarios and needs to
>>>>> be used judiciously by the user.
>>>> Deepak,
>>>>
>>>> As I understand your example below, it is about testing inter-subnet
>>>> traffic, i.e. a case where you have overlay routers which forward
>>>> packets based on the original/inner IP header.
>>>>
>>>> That seems like a case where one could be using existing IP tools
>>>> (ping, traceroute) in the overlay, and perhaps extending the
>>>> information returned in ICMP errors following the approach used for
>>>> MPLS (RFC 4884 and RFC 4950). Perhaps we should consider those RFCs
>>>> more in NVO3 since it means less need for new tools and new training -
>>>> many ping and traceroute implementations already print the extensions.
>>
>> +++DK: Even in the case of intra-subnet traffic we will go up to the
>> spine if the customer builds their data center with multiple pods or
>> domains.
>Deepak,
>
>I'm having a hard time understanding your point and how it relates to
>whether or not it is useful to consider RFC 4884/4950.



+++DK2: I was trying to point out that not only inter-subnet but also
intra-subnet traffic can reach the super spine, even within a single data
center.
I am also okay with looking further into RFC 4884/4950; TRILL OAM likewise
uses a Return Code and Return Sub-code to carry exactly this type of
information.
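
For reference, a minimal sketch (Python, purely illustrative) of how an
RFC 4884 extension structure could carry such a return-code style object.
The extension header and object framing below follow RFC 4884, but the
NVO3 object class number and its VNI/return-code payload are hypothetical,
not anything defined by 4884/4950 or by our draft:

import struct

def icmp_checksum(data):
    # Standard one's-complement checksum over the extension structure.
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_extension(objects):
    # objects: list of (class_num, c_type, payload) tuples, framed per RFC 4884.
    body = b""
    for class_num, c_type, payload in objects:
        length = 4 + len(payload)              # object header + payload
        body += struct.pack("!HBB", length, class_num, c_type) + payload
    header = struct.pack("!HH", 2 << 12, 0)    # version = 2, checksum = 0
    checksum = icmp_checksum(header + body)
    return struct.pack("!HH", 2 << 12, checksum) + body

# Hypothetical NVO3 object: 24-bit VNI plus a return code / sub-code,
# loosely modelled on the TRILL OAM Return Code / Return Sub-code.
vni, rcode, subcode = 5000, 1, 0
payload = struct.pack("!I", vni << 8) + struct.pack("!BB2x", rcode, subcode)
extension = build_extension([(0x42, 1, payload)])   # class 0x42 is made up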


>
>>
>> E.g.: each POD has 100 leaf pairs, and connectivity between PODs, even
>> for bridged traffic, is routed only at the L3 spine.
>>
>> Within a POD the path taken will not go up to S1/S2, but for inter-POD
>> traffic it will reach S1.
>>
>> Also, traceroute or TTL expiry doesn't test the exact forwarding
>> datapath in the spine the way real traffic does, because the ASIC
>> treats outer TTL expiry differently from looking at the inner header,
>> de-encapsulating, re-encapsulating with a new VNI, and forwarding
>> towards the leafs.
>Traceroute as a tool can send packets that mimic the actual payload
>traffic in any way you want (TCP vs. UDP, different source ports,
>different IP addresses; current implementations might not allow full
>control of the source port number). Thus with additional 4884/4950
>information in the ICMP TTL expiry I think it can be quite useful.
>
>Apart from that I don't understand what assumptions you are making about
>the topology and overlay vs. underlay routing to say that TTL expiry
>would result in different paths.


+++DK2:

TTL expiry is needed for switches that are not OAM capable, and also if we
want a CPU-driven way to get the path. But if a new data center is owned by
a single vendor and all of its equipment supports OAM, we need a more
hardware-friendly OAM that matches on the O bits, so the probe follows
exactly the same path as the data with minimal CPU involvement in the path.

The customer wants to track the exact path taken between two VMs by
providing the 5-tuple of the customer traffic, its ingress interface, and
its dot1q tag.
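
As a purely illustrative sketch (Scapy, with made-up VTEP addresses, VNI
and ports, and not the OAM format proposed in the draft), the point is
that the probe's inner headers copy the customer 5-tuple and dot1q tag,
and the outer UDP source port is derived from the inner flow, so the probe
hits the same ECMP/LAG members as the real traffic:

from scapy.all import Ether, Dot1Q, IP, UDP, TCP
from scapy.layers.vxlan import VXLAN

# Inner frame built from the customer 5-tuple and ingress dot1q tag.
inner = (Ether(src="00:00:00:aa:00:01", dst="00:00:00:aa:00:02") /
         Dot1Q(vlan=100) /
         IP(src="10.1.1.10", dst="10.2.2.20") /
         TCP(sport=49152, dport=443))

# Outer header as the ingress VTEP would build it; the UDP source port
# carries the entropy hashed from the inner flow.
entropy = 0xC000 | (hash(("10.1.1.10", "10.2.2.20", 6, 49152, 443)) & 0x3FFF)
probe = (IP(src="192.0.2.1", dst="192.0.2.2", ttl=1) /
         UDP(sport=entropy, dport=4789) /
         VXLAN(vni=5000) /
         inner)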

There are some enhancements required for TTL expiry:

E.g.: A - B - C - D, with ECMP at every hop.
If we get a reply from B but no reply from C, we don't learn the egress
interface on B that the data traffic will take. (This can be fixed by
querying the hardware for the egress interface and carrying it back in a
TLV, or even returning the list of all egress interfaces, as sketched
below.)
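
A hypothetical TLV for this (the type value is made up and nothing here is
assigned or defined in the draft), just to show the kind of information
hop B could return when the next hop is silent:

import struct

EGRESS_IF_LIST = 0x20   # hypothetical TLV type

def pack_egress_if_tlv(if_indexes):
    # Value is the chosen egress ifIndex first, then the other ECMP members.
    value = b"".join(struct.pack("!I", ifidx) for ifidx in if_indexes)
    return struct.pack("!BB", EGRESS_IF_LIST, len(value)) + value

# e.g. B replies with the interface the hash picked plus all ECMP members
reply_tlv = pack_egress_if_tlv([1021, 1022, 1023, 1024])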

E.g.: in the above scenario there may be a super spine, or even a DCI,
that has to look at the original customer data to forward the packet.
If we terminate the OAM at the super spine/DCI, then we are not following
the exact data path but a CPU-driven path. (We need to carry the inner
packet in the OAM channel.)

E.g.: also, when I tried TTL expiry on multiple hardware platforms, I saw
TTL expiry happening in the underlay before the VTEP IP and VNI were
matched, with the complete packet punted to the CPU for further
validation; or sometimes the hardware matched the peer information and
only the inner packet was punted to the CPU.

Thanks,
Deepak


>
>Regards,
>    Erik
>
>>
>> Thanks,
>> Deepak
>>
>>> This approach seems reasonable to me.  It's certainly what we've
>>> attempted to do with the LISP overlay.  I think the problem space is
>>> complex enough without trying to boil the ocean.  The good thing about
>>> this incremental approach is that if operational experience proves a
>>> need, it can be addressed then.
>>>
>>> -=Darrel
>>>
>>>> Do you see a need for the 96 bits of customer data when there are no
>>>> NVO3 overlay routers in the path?
>>>>
>>>> Thanks,
>>>>    Erik
>>>>
>>>>>
>>>>> Network diagram (tried my best; I will do a better diagram in the draft):
>>>>>
>>>>>
>>>>>
>>>>>          /  S1 \       / S2  \           (L3 VTEP)
>>>>>         /       \     /       \      (full mesh in east/west,
>>>>> x1<->s1,s2,
>>>>> x2<->s1,s2, ...)
>>>>>        /         \   /         \
>>>>>       X1         X2       X3       X4 ...
>>>>>       /  \     /  |
>>>>>      /    \   /   |           full mesh in east/west l1<->x1, x2, x3,
>>>>> x4,
>>>>> l2 <-> x1,x2,x3,x4, l3 ..)
>>>>>     /      \ /    |
>>>>>    L1            L2       L3        L4 ...      (L2 VTEP)
>>>>>
>>>>>     |            |
>>>>>    H1            H2 ...
>>>>>
>>>>>
>>>>> Now, H1 is a host, L1 is a switch, X1 is a switch in the underlay,
>>>>> and S1 is a switch with L3 gateway functionality.
>>>>>
>>>>> All switches are fully connected east-west with multi-way ECMP.
>>>>>
>>>>> The customer wants to test an L1 - Lx scenario by providing host
>>>>> information (H1 and Hx).
>>>>> For example, H1 and Hx are in different subnets, or across a mobility
>>>>> domain, or in any scenario where traffic will reach the L3 VTEP.
>>>>> The L3 VTEP will de-encapsulate the outer and VXLAN headers and then
>>>>> re-encapsulate the packet towards Lx.
>>>>> Now, to forward the packet towards Lx, the inner payload should be
>>>>> the same as the customer information so that the right hashing is
>>>>> chosen, just as it is done for a real data packet.
>>>>>
>>>>> Thanks,
>>>>> Deepak
>>>>>
>>>>>
>>>>>       
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 5/22/15 8:31 AM, "Erik Nordmark" <[email protected]> wrote:
>>>>>
>>>>>> Deepak,
>>>>>>
>>>>>> Didn't have time to go into this on the call, but the draft and
>>>>>> your slides refer to "96 bits of customer data".
>>>>>> I understand why that was needed for TRILL - basically the entropy
>>>>>> for ECMP/LAG is calculated at each hop in TRILL, so it needs to look
>>>>>> at the inner Ethernet, IP, and TCP/UDP headers.
>>>>>>
>>>>>> But for NVO3 the thinking is to use a UDP header where the source
>>>>>> UDP port is set (in the ingress NVE) to some hash of those inner
>>>>>> addresses and ports.
>>>>>> If that is the source of entropy, why do we need to also carry 96
>>>>>> bits of the inner packet in the OAM frames?
>>>>>>
>>>>>> Regards,
>>>>>>      Erik
>>>>>>

_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
