Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

2020-04-26 Thread Wen Lin
Hi All,

Thank you for bringing up the discussion on the WG mailing list.

I don’t think RT-4 is a good example to follow. Besides, I am not sure
whether PBB-EVPN or EVPN multihoming was deployed in any real production
network back in 2013, when the NLRI of RT-4 was updated. In any case, the
potential risk to the network is now clearly explained in this email thread.
For RT-8, we need a safe procedure that avoids any potential risk to the
network.

Multihoming PEs may come from the same vendor in some networks, but we are
talking about the standard procedure here, just as the EVPN DF election
procedures among multihoming PEs are standardized.

Today EVPN is widely deployed, not only in DCs but also in service provider
and enterprise networks. We have not only PBB-EVPN but also EVPN/VXLAN,
EVPN/MPLS, EVPNoMPLSoUDP, etc. EVPN is supported by many vendors today.

Thanks,
Wen

From: "Ali Sajassi (sajassi)" 
Date: Sunday, April 26, 2020 at 6:08 PM
To: John Scudder 
Cc: "Mankamana Mishra (mankamis)" , "bess@ietf.org" 
, "draft-ietf-bess-evpn-igmp-mld-pr...@ietf.org" 

Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

Re: [bess] You are an Author, Please respond: Fwd: WG adoption poll for draft-gmsm-bess-evpn-bfd-04

2020-04-26 Thread MALLIK MUDIGONDA (mmudigon)
 Hello Chairs,

I am not aware of any undisclosed IPR.

Thanks
Mallik M J

From: Donald Eastlake 
Sent: Saturday, April 25, 2020 02:55
To: Ali Sajassi (sajassi) ; Vengada Prasad Govindan 
(venggovi) ; MALLIK MUDIGONDA (mmudigon) 

Cc: Greg Mirsky 
Subject: You are an Author, Please respond: Fwd: WG adoption poll for 
draft-gmsm-bess-evpn-bfd-04

Hi Ali, Prasad, Mallik,

You are front page authors of this draft. According to the BESS chairs, it will 
not progress unless you respond at least to the IPR question in the message 
below. Greg Mirsky and I have already responded. Please respond now.

Thanks,
Donald
===
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 2386 Panoramic Circle, Apopka, FL 32703 USA
 d3e...@gmail.com

-- Forwarded message -
From: Bocci, Matthew (Nokia - GB) <matthew.bo...@nokia.com>
Date: Wed, Feb 26, 2020 at 9:42 AM
Subject: WG adoption poll for draft-gmsm-bess-evpn-bfd-04
To: draft-gmsm-bess-evpn-...@ietf.org, bess@ietf.org
Cc: bess-cha...@ietf.org

Hello,

This email begins a two-week WG adoption poll for draft-gmsm-bess-evpn-bfd-04
[1].

Please review the draft and post any comments to the BESS working group list.

We are also polling for knowledge of any undisclosed IPR that applies to this
document, to ensure that IPR has been disclosed in compliance with IETF IPR
rules (see RFCs 3979, 4879, 3669 and 5378 for more details).

If you are listed as an author or a contributor of this document, please 
respond to this email and indicate whether or not you are aware of any relevant 
undisclosed IPR, copying the BESS mailing list. The document won't progress 
without answers from all the authors and contributors.

Currently, there are no IPR disclosures against this document.

If you are not listed as an author or a contributor, then please explicitly 
respond only if you are aware of any IPR that has not yet been disclosed in 
conformance with IETF rules.

This poll for adoption closes on Wednesday 11th March 2020.

Regards,
Matthew and Stephane

[1] https://datatracker.ietf.org/doc/draft-gmsm-bess-evpn-bfd/



___
BESS mailing list
BESS@ietf.org
https://www.ietf.org/mailman/listinfo/bess


Re: [bess] VXLAN BGP EVPN Question

2020-04-26 Thread Gyan Mishra
Thank you all for the responses. Overall, Verizon looks forward to the DCI
overlay draft being published and implemented by vendors.

The major gain of this draft over Multi-Site is that re-origination of the
RD and VNI translation on the gateways prevent automatic inter-pod and
inter-site flooding, which creates the selective advertisement feature. This
is not the same as conversational learning, where only active local flows
within a fabric are flooded between fabrics.

Verizon, and I am sure many other operators, would benefit greatly from this
draft.

I had another question about VXLAN EVPN for NVO3 (RFC 8365).

The EVPN architecture really paved the way for a next-generation L2VPN
architecture. Its primary gain over VPLS, in the core or in an L2 data center
domain, is the separation of the control plane from the data plane. As we
know, BGP EVPN procedures achieve this as follows: Type 1 (Ethernet A-D)
routes discover the ESs of single-active or all-active multihomed hosts on
their ACs, and MAC-IP learning happens in the control plane, separately from
the local-to-remote leaf data-plane forwarding over the NVE overlay tunnel for
locally or globally significant L2 and L3 VNIs.

The other major benefit of the VXLAN EVPN architecture over VPLS or
traditional L2 domains is that broadcast is eliminated and converted into
constrained BUM multicast, supported natively in the underlay with ASM (not
SSM) without MVPN procedures.

With that, unicast is carried in the data plane over the NVE tunnel using the
L3 VNI, and multicast is carried over the same NVE tunnel using the L2 VNI. So
here, without MVPN procedures, only ASM is supported for BUM traffic. Putting
my PIM WG hat on, I believe this is because SSM does not support
network-based source discovery, so the source and destination anycast VTEP
leafs cannot be discovered.

However, if NG MVPN PMSI-UI-MI inclusive or aggregate inclusive trees are used
for scalability, then SSM is supported, as discovery happens via MVPN route
type 7 for SSM and route types 5 and 6 for ASM. I did notice in the draft that
many similar MVPN procedures for P-tunnels are reused from RFCs 6513 and 6514.
I also noticed that IR, being unicast replication rather than multicast, has
more processing load, and that with IR there is a chance of transient packet
duplication, for which there is a special VXLAN-GPE encapsulation. How does
that work, and is it part of the MVPN procedures, or is it available with IR
without MVPN procedures?

Also, with the VXLAN EVPN architecture, as I mentioned, the big gain is the
separation of the control and data planes, with Type 2 MAC mobility routes
learned via BGP in the control plane.

From a practical standpoint, a benefit of VXLAN EVPN is that it is impossible
to have a routing loop: not only would the control plane not build, but the
data-plane NVE overlay tunnel would not come up either. So, from a technical
perspective, a major benefit of VXLAN EVPN is that in a day-N VXLAN deployment
it is impossible to have a routing loop that could cause a meltdown. I guess
someone injecting a host route for a leaf VTEP would be no different from
someone injecting a FEC into the core in the MPLS world, and controls to
prevent that are easy to build.

As for unicast traffic: is a Type 2 MAC mobility routing loop possible?

For unicast flows, my thought is that, since we are using BGP, standard BGP
rules apply: the spine reflects routes between the RR client leafs, and under
standard BGP advertisement rules, an RR client cannot advertise back a MAC that
was reflected to it by another leaf. A loop would have to happen in the
control plane first before it could happen in the data plane.
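To make the control-plane argument above concrete, here is a minimal Python sketch of the standard iBGP rules it relies on: a leaf that is not a route reflector does not pass iBGP-learned routes to other iBGP peers (RFC 4271, Section 9.2), and the spine RR uses ORIGINATOR_ID and CLUSTER_LIST (RFC 4456) to drop reflected copies that come back around. The class and function names, router IDs, cluster ID, and MAC are illustrative assumptions; EVPN MAC mobility sequence numbers and duplicate-MAC detection (RFC 7432) are not modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class MacRoute:
    mac: str
    originator_id: Optional[str] = None  # ORIGINATOR_ID attribute (RFC 4456)
    cluster_list: Tuple[str, ...] = ()   # CLUSTER_LIST attribute (RFC 4456)


def rr_reflect(route: MacRoute, from_router_id: str, cluster_id: str) -> Optional[MacRoute]:
    """Spine RR: reflect a client's route, or drop it if it already crossed this cluster."""
    if cluster_id in route.cluster_list:
        return None  # cluster loop detected
    return replace(
        route,
        originator_id=route.originator_id or from_router_id,  # set only if absent
        cluster_list=(cluster_id,) + route.cluster_list,       # prepend own CLUSTER_ID
    )


def leaf_accepts(route: MacRoute, my_router_id: str) -> bool:
    """Leaf (RR client): ignore a reflected copy of a route it originated."""
    return route.originator_id != my_router_id


def leaf_may_readvertise(learned_from_ibgp_peer: bool) -> bool:
    """Non-RR iBGP speaker: never pass an iBGP-learned route to another iBGP peer."""
    return not learned_from_ibgp_peer


# Hypothetical example: leaf1 (10.0.0.1) advertises a MAC, spine RR (cluster 10.255.0.1) reflects it.
reflected = rr_reflect(MacRoute("00:00:5e:00:53:01"), "10.0.0.1", "10.255.0.1")
print(leaf_accepts(reflected, "10.0.0.2"), leaf_accepts(reflected, "10.0.0.1"))
```

Leaf2 installs the reflected route, leaf1 discards its own copy, and leaf2 never sends the route back into iBGP, so a MAC cannot circulate in the control plane.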

What about BUM traffic: is a routing loop possible?

Section 8.3.1 (split horizon with local bias for VXLAN encapsulation) would be
the means of BUM loop protection. My reading of the text below is that, with
the “local bias” feature, if a pair of leafs shares an anycast VTEP IP, that
shared IP is what each leaf tracks and filters on, so BUM traffic arriving
with the anycast VTEP as its tunnel source is not flooded back out to the
local multihomed hosts. That way, any MAC looped back and sourced from a
remote anycast VTEP is dropped.


   Every NVE tracks the IP address(es) associated with the other NVE(s)
   with which it has shared multihomed ESs.  When the NVE receives a
   multi-destination frame from the overlay network, it examines the
   source IP address in the tunnel header (which corresponds to the
   ingress NVE) and filters out the frame on all local interfaces
   connected to ESs that are shared with the ingress NVE.  With this
   approach, it is required that the ingress NVE perform replication
   locally to all directly attached Ethernet segments (regardless of the
   DF election state) for all flooded traffic ingress from the access
   interfaces (i.e., from the hosts).  This approach is referred to as
   "Local Bias", and has the advantage that only a single IP address
   need be used 
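A minimal Python sketch of the quoted Local Bias rule follows, assuming each local ES records the tunnel IPs of the other NVEs attached to it. The names (`LocalEs`, `flood_from_overlay`, `flood_from_access`) and the documentation-range IPs are hypothetical. It also assumes, beyond the quoted text, that the usual DF filtering still applies to ESs that are not shared with the ingress NVE.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class LocalEs:
    name: str
    sharing_nves: FrozenSet[str]  # tunnel IPs of the other NVEs attached to this ES
    is_df: bool                   # DF election result for this ES on this NVE


def flood_from_overlay(tunnel_src_ip: str, segments: List[LocalEs], single_homed: List[str]) -> List[str]:
    """Egress NVE: where to send a multi-destination frame received from the overlay."""
    out = list(single_homed)
    for es in segments:
        if tunnel_src_ip in es.sharing_nves:
            continue  # Local Bias: the ingress NVE already delivered to this shared ES
        if es.is_df:
            out.append(es.name)  # assumption: normal DF filtering for ESs not shared with ingress
    return out


def flood_from_access(arrival_port: str, local_ports: List[str]) -> List[str]:
    """Ingress NVE: replicate to every other local port, regardless of DF state."""
    return [p for p in local_ports if p != arrival_port]


es1 = LocalEs("es-1", frozenset({"192.0.2.2"}), is_df=True)
es2 = LocalEs("es-2", frozenset({"192.0.2.3"}), is_df=True)
es3 = LocalEs("es-3", frozenset({"192.0.2.3"}), is_df=False)
print(flood_from_overlay("192.0.2.2", [es1, es2, es3], ["host-a"]))
```

Because the filter keys on the outer source IP, es-1 is skipped for frames from 192.0.2.2 even though this NVE is its DF; the same frame from an NVE that shares no ES would be delivered to es-1.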

Re: [bess] VXLAN BGP EVPN Question

2020-04-26 Thread Gyan Mishra
Jorge

In the BGP EVPN NVO RFC 8365, there are built-in controls for MAC flooding
within a pod with all-active multihomed hosts. With any multihoming failure,
mass MAC withdrawal lets all NVEs reconverge to the new next hop when the ES
of the failed gateway is withdrawn. There is also backup-path aliasing for
all-active multihoming, so remote NVEs can load balance, and split-horizon
filtering for BUM traffic prevents frames from looping back to the host's ES
through a different gateway.
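The mass withdrawal and aliasing behavior described above can be sketched as a toy remote-NVE table in Python. This is a simplified model under stated assumptions: RFC 7432 drives aliasing from Ethernet A-D per-EVI routes and mass withdrawal from Ethernet A-D per-ES routes, whereas the hypothetical `RemoteNveView` below collapses both into a single per-ES set of attached NVEs.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class RemoteNveView:
    """Simplified remote-NVE state for all-active multihoming (hypothetical model)."""

    def __init__(self) -> None:
        self.mac_es: Dict[str, str] = {}  # MAC -> ESI it was learned behind
        self.es_attached: DefaultDict[str, Set[str]] = defaultdict(set)  # ESI -> NVEs with an A-D route

    def ad_route(self, esi: str, nve: str) -> None:
        self.es_attached[esi].add(nve)

    def ad_withdraw(self, esi: str, nve: str) -> None:
        # Mass withdrawal: one withdrawal removes `nve` for every MAC behind `esi`.
        self.es_attached[esi].discard(nve)

    def mac_route(self, mac: str, esi: str) -> None:
        self.mac_es[mac] = esi

    def next_hops(self, mac: str) -> List[str]:
        esi = self.mac_es.get(mac)
        if esi is None:
            return []
        # Aliasing: balance over every NVE attached to the ES, not only the MAC's advertiser.
        return sorted(self.es_attached[esi])


view = RemoteNveView()
view.ad_route("ESI-1", "NVE-A")
view.ad_route("ESI-1", "NVE-B")
for mac in ("mac-1", "mac-2", "mac-3"):
    view.mac_route(mac, "ESI-1")
view.ad_withdraw("ESI-1", "NVE-A")  # NVE-A's link to ESI-1 fails
print([view.next_hops(m) for m in ("mac-1", "mac-2", "mac-3")])
```

The point of the design is that the failure costs one route withdrawal rather than one per MAC.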


So with the DCI overlay draft, the same EVPN procedures that help with
convergence and flooding within a pod are now applied to the stitched
inter-pod NVEs via the UMR route for BUM traffic.

So the new UMR route prevents re-flooding when all the routes are known, by
aliasing to the redundant gateway, similar to the backup-path aliasing used
for load balancing within a site.
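Jorge's description of the unknown MAC route (quoted below) can be read as a lookup with a fallback. The sketch assumes a remote NVE holding a MAC table plus the set of gateways currently advertising the UMR with an active A-D per-ES route for the I-ES; the function name, MACs, and gateway names are hypothetical, and this is one reading of the behavior, not the draft's procedure.

```python
from typing import Dict, List, Set, Tuple


def forward_unicast(
    dst_mac: str, mac_table: Dict[str, List[str]], umr_gateways: Set[str]
) -> Tuple[str, List[str]]:
    """Return (action, next_hops) for a unicast frame at a remote NVE."""
    if dst_mac in mac_table:
        return "known", mac_table[dst_mac]
    if umr_gateways:
        # Unknown MAC: send toward the GWs advertising the UMR, aliased over the I-ES.
        return "umr", sorted(umr_gateways)
    return "flood", []  # no UMR available: fall back to BUM flooding


macs = {"00:00:5e:00:53:0a": ["NVE-7"]}
gws = {"GW1", "GW2"}
print(forward_unicast("00:00:5e:00:53:ff", macs, gws))  # ('umr', ['GW1', 'GW2'])
gws.discard("GW1")  # GW1 fails: its A-D per-ES route for the I-ES is withdrawn
print(forward_unicast("00:00:5e:00:53:ff", macs, gws))  # ('umr', ['GW2'])
```

Unknown unicast keeps flowing to the surviving gateway after the failure, so no extra flooding is generated.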

Kind regards

Gyan

On Sun, Apr 26, 2020 at 10:02 AM Rabadan, Jorge (Nokia - US/Mountain View) <
jorge.raba...@nokia.com> wrote:

> Hi Gyan,
>
> Actually we started with the evpn dci draft in 2013 :-)
>
> The way I see the unknown mac route it saves flooding if all the MACs in
> the POD/DC are known beforehand. The unknown unicast traffic can be aliased
> to the GWs. In case of failure in one of the GWs, the AD per-ES route for
> the I-ES will be withdrawn (mass withdraw for all EVIs) and the unknown
> traffic can be sent to the redundant GWs. So this failure won’t generate
> any extra flooding.
>
> Thanks.
>
> Jorge
>
> From: Gyan Mishra
> Date: Saturday, April 25, 2020 at 8:45 AM
> To: "Lukas Krattiger (lkrattig)", "saja...@cisco.com"
> Cc: BESS, Jeff Tantsura,
> "Rabadan, Jorge (Nokia - US/Mountain View)"
> Subject: Re: [bess] VXLAN BGP EVPN Question
>
> + Ali
>
> Lukas
>
> I noticed that Ali was on the Multi-Site draft, which expired in 2017,
> around the same time the DCI overlay draft was submitted. I went through
> the logs but did not go through the mail archives to see what happened to
> the Multi-Site draft. My guess is that these were two competing drafts, and
> Multi-Site was geared solely to EVPN procedures for VXLAN encapsulation and
> thus did not achieve WG adoption, whereas your DCI overlay draft accounts
> for every encapsulation type using EVPN procedures and is a more
> comprehensive approach to DCI, providing an improved solution to Multi-Site
> VXLAN overlay stitching.
>
> I like the idea of re-originating the VNI and RD using local context on the
> gateway as an additional control mechanism, which prevents Type 2 MAC-IP
> routes that should not be flooded between pods from being flooded, without
> needing flood filters. With the Multi-Site feature there is no such
> control, and unfortunately all mobility routes are flooded, active or not.
>
> With this draft, is it possible to add a conversational learning feature
> for active flows only? When the Type 1 BGP A-D route is sent for the
> initial BUM advertisement for ARP or ND, a snooping mechanism similar to
> IGMP snooping could discover the active flow and then create the
> control-plane Type 2 MAC-IP state, followed by flooding in the data-plane
> NVE tunnel overlay. I think this concept could apply leaf to leaf within a
> site fabric, but it would be extremely beneficial inter-pod or inter-site.
>
> This could be a separate feature or an option to selective advertisement.
>
> The selective advertisement works in conjunction with re-origination of
> the RD and locally significant VNIs.
>
> What I envision with a conversational learning (active flow detection)
> feature is that you would use a global VNI, and only the active Type 2
> MAC-IP routes would be propagated inter-pod or inter-site.
>
> This feature would be a tremendous benefit to operators and help with MAC
> scale.
>
> In our Cisco Multi-Site implementations, we do use the recommended
> Multi-Site-specific BUM traffic suppression applied on the BGW, and that
> definitely helps with BUM suppression.
>
> In section 3.5.1 (UMR), the route is like a default MAC route (0/48) with
> the ESI set to the DCI gateway I-ESI for all-active multihoming. So,
> instead of flooding all MACs and having to rely on mass MAC withdrawal
> during a failure, now only the UMR is withdrawn. Is that correct?
>
> That’s a huge savings on resources.
>
> Kind regards
>
> Gyan
>
> On Fri, Apr 24, 2020 at 3:25 PM Lukas Krattiger (lkrattig) <
> lkrat...@cisco.com> wrote:
>
> Thanks, Jorge and Jeff, for guiding us all the way through the features
> and functions we have in DCI-overlay and Multi-Site.
>
> Gyan,
>
> Specific to the VNI distribution, BUM handling and the re-origination in
> Multi-Site.
>
> With re-origination, the RDs are changed on the GW node. With this in
> mind, the VNI could be globally or locally significant. In the case of
> local significance, we can stitch VNIs together (i.e., VNI1 - GW - VNI2 -
> GW - VNI3).
>
> Further, MAC- or IP-VRFs that 

Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

2020-04-26 Thread Ali Sajassi (sajassi)
Hi John,

I think we need good operational procedures, similar to what we did for RT-4,
regardless of which approach we take, because we currently have two
deployments (by two vendors) that use RT-8 with two different lengths. Without
a proper procedure, mixing these boxes can cause issues such as BGP session
resets (which you also pointed out previously). So I believe we need a proper
procedure while we upgrade them to interoperate with each other. For
interoperability, let me categorize the use of two different code points as the
third option. For the sake of completeness, let me repeat all three here:


  1.  Just go with the new format and, for multi-vendor deployments, make sure
the new format is used. Considering the current deployment situation, where
intra-DC and intra-site deployments use a single vendor but different vendors
are used for different sites and DCs, this can be feasible. Maybe that’s why we
haven’t run into interop issues with the current deployment model.
  2.  Accommodate both lengths (i.e., bullet b) above) and turn on
RT-constraint on the PEs that support the old RT-8 format. This way, the RR can
properly reflect both RT-8 formats. PEs supporting the new format can be
inserted into the network without issue, and PEs supporting the old format can
be gradually migrated to the new format.
  3.  Use a new code point for the new format; new PEs need to support both
code points, and the old code point is then deprecated.

If we look at the vendor situation (AFAIK), since the November IETF meeting,
all vendors that have implemented this feature, except one, have upgraded their
implementations to support either both formats or both lengths, because we
thought we had unanimous agreement. That means all vendors except one can do
options 1 and 2. If we now ask everyone to implement option 3, that would
impose an additional burden on the vendors that have already implemented
support for both formats/lengths with the same code point. I agree that if we
weren’t in the current situation, option 3 would have been somewhat cleaner,
but at this point, going with option 3 means asking these vendors to do yet
another implementation.

With regard to my RT-constraint comment, allow me to clarify it as follows:
RT-8 is only intended to be exchanged among multihoming PEs, and 99% of
multihoming scenarios are dual-homing. Furthermore, the dual-homing PEs are
from the same vendor. This means that when this route is advertised by a PE in
an EVPN network with 100 PEs, it uses a route target meant for only one other
PE. So, in a network with PE1 to PE100 where PE1 and PE2 are dual-homed and PE1
advertises this route, only PE2 needs to import it, and all other PEs need to
discard it when they receive it. Now assume a network where PE1 to PE50 run the
old format and PE51 through PE100 run option 2 (i.e., they support either both
formats or both lengths). When PE1 advertises an RT-8 intended for PE2, it will
be received by PE3 to PE100, and they will discard the route. Now we need to
make sure that if PE100 advertises a route for PE99 (its dual-homing
counterpart) in the new format, it doesn’t cause an issue for PE1 to PE50.
These PEs can use RT-constraint to have the RR send them only the routes they
have imports for. So PE1 will not receive the RT-8 route from PE100, and the
route cannot cause it any issue.
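The RT-constraint argument can be checked with a small Python model of a route reflector's outbound filter. It assumes, as Ali describes, that the RR itself can parse both RT-8 encodings; the PE names follow his example, while the route-target strings, the `parsable` sets, and the function names are hypothetical. A PE that does not run RT-Constrain is modeled as receiving every route, since the RR then has no membership information to filter on.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rt8Route:
    origin: str                    # PE that advertised the route
    route_targets: FrozenSet[str]  # RTs attached to the route
    encoding: str                  # "old" or "new" RT-8 format (hypothetical label)


@dataclass(frozen=True)
class PeClient:
    name: str
    rt_membership: Optional[FrozenSet[str]]  # None: PE does not run RT-Constrain
    parsable: FrozenSet[str]                 # encodings it can parse without a reset


def rr_sends(route: Rt8Route, pe: PeClient) -> bool:
    """Simplified RR outbound filter in the spirit of RT-Constrain."""
    if pe.name == route.origin:
        return False
    if pe.rt_membership is None:
        return True  # no membership info: everything is sent
    return bool(route.route_targets & pe.rt_membership)


def causes_reset(route: Rt8Route, pe: PeClient) -> bool:
    """True if the RR would deliver a route this PE cannot parse."""
    return rr_sends(route, pe) and route.encoding not in pe.parsable


OLD_ONLY = frozenset({"old"})
BOTH = frozenset({"old", "new"})
pe1 = PeClient("PE1", frozenset({"rt-pe1-pe2"}), OLD_ONLY)  # old format + RT-Constrain
pe3 = PeClient("PE3", None, OLD_ONLY)                        # old format, RT-Constrain not enabled
pe99 = PeClient("PE99", None, BOTH)                          # option-2 PE

from_pe100 = Rt8Route("PE100", frozenset({"rt-pe99-pe100"}), "new")
from_pe1 = Rt8Route("PE1", frozenset({"rt-pe1-pe2"}), "old")

for pe in (pe1, pe3, pe99):
    print(pe.name, rr_sends(from_pe100, pe), causes_reset(from_pe100, pe))
```

Under these assumptions the loop prints `PE1 False False`, `PE3 True True` and `PE99 True False`: the filter protects PE1, but an old-format PE without RT-Constrain (PE3 here) would still be sent the new-format route, so the safety of option 2 depends on RT-Constrain being enabled on every old-format PE.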

Regards,
Ali




From: John Scudder 
Date: Sunday, April 26, 2020 at 12:33 PM
To: Cisco Employee 
Cc: "Mankamana Mishra (mankamis)" , "bess@ietf.org" 
, "draft-ietf-bess-evpn-igmp-mld-pr...@ietf.org" 

Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)


Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

2020-04-26 Thread John Scudder
Hi Ali,

Your option 1 is substantially what I proposed, the sole difference being that 
I propose following normal IETF procedure and moving to a new code point. 
Without moving to a new code point, the only thing standing in the way of a 
catastrophe is luck and good operational procedures, hardly a robust option. 
With moving to a new code point, there’s literally no way to trigger this 
scenario.

It’s the safer thing to do and the right thing to do. The code’s not hard, I’m 
tempted to call it trivial. We do this kind of thing all the time — one code 
point for prestandard, another for the standardized version. I see no downside, 
all upside.

Regarding RT-constrain, I don’t follow your reasoning for how it guarantees 
safety in a mixed network.

—John

On Apr 26, 2020, at 3:21 PM, Ali Sajassi (sajassi)  wrote:




Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

2020-04-26 Thread Ali Sajassi (sajassi)

John,

Thanks for your insightful input and suggestion. We have had situations
similar to this in the past, and we resolved them by consensus without leaving
a “ticking time bomb” to cause a network meltdown. One such situation was the
need to extend RT-4 to add the originating router’s address, which changed the
length of the RT-4 route. At the time, pre-RFC implementations from several
vendors were already deployed in different networks, and the vendors decided to
go with the new RT-4 format, upgrade to it, and make sure interoperability was
based on the standard RFC and not the pre-standard version. That worked fine:
neither I nor colleagues from other vendors (including yours) are aware of any
issues regarding that update. We have a less severe situation here because of
the following implementation status:


  a)  Some vendors have implemented both formats.
  b)  Some vendors (including my vendor) allow both lengths to avoid malformed
NLRI. Allowing both lengths doesn’t mean supporting both formats, but rather
accepting both lengths, so that a PE that doesn’t need to import the route
doesn’t interpret the old format as malformed.
  c)  Vendors that haven’t implemented it prefer the new format.
  d)  AFAIK, there is only a single vendor that implemented the v4-only format.
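The second item above, accepting both lengths without interpreting the disputed field, can be sketched as a length check. This is a hypothetical Python sketch: it assumes the route starts with RD (8 octets), ESI (10) and Ethernet Tag ID (4), followed by source, group and originator addresses each preceded by a one-octet length in bits; the 4-octet width of the disputed field and the 2-octet trailer are placeholders, not values from the draft.

```python
from typing import Optional

RD_ESI_TAG_LEN = 8 + 10 + 4  # RD, ESI, Ethernet Tag ID
SYNC_FIELD_LEN = 4           # placeholder width of the disputed field
TRAILER_LEN = 2              # placeholder width of the remaining fixed fields


def rt8_length_class(body: bytes) -> Optional[str]:
    """Accept either RT-8 length; report which one matched, or None if neither."""
    i = RD_ESI_TAG_LEN
    try:
        for _ in range(3):  # source, group, originator: length in bits, then address
            i += 1 + body[i] // 8
    except IndexError:
        return None  # truncated before the address fields end
    base = i + TRAILER_LEN
    if len(body) == base:
        return "without-sync"  # new format
    if len(body) == base + SYNC_FIELD_LEN:
        return "with-sync"     # old format
    return None


# (*,G) example: no source, group 239.1.1.1, originator 192.0.2.1
prefix = bytes(22) + bytes([0, 32, 239, 1, 1, 1, 32, 192, 0, 2, 1])
print(rt8_length_class(prefix + bytes(2)), rt8_length_class(prefix + bytes(4 + 2)))
```

A PE that only transits or discards the route needs a non-`None` answer; a PE that actually imports it still has to understand the specific format.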

So, based on the current data, I think we have the following two options,
which IMO are simpler:

  1.  Just go with the new format and, for multi-vendor deployments, make sure
the new format is used. Considering the current deployment situation, where
intra-DC and intra-site deployments use a single vendor but different vendors
are used for different sites and DCs, this can be feasible. Maybe that’s why we
haven’t run into interop issues with the current deployment model.
  2.  Accommodate both lengths (i.e., bullet b) above) and turn on
RT-constraint on the PEs that support the old RT-8 format. This way, the RR can
properly reflect both RT-8 formats. PEs supporting the new format can be
inserted into the network without issue, and PEs supporting the old format can
be gradually migrated to the new format.

I should mention that for the RT-4 change, which all the vendors made a long
time ago, approach (1) was adopted.

Regards,
Ali


From: John Scudder 
Date: Friday, April 24, 2020 at 3:01 PM
To: "Mankamana Mishra (mankamis)" 
Cc: "bess@ietf.org" , 
"draft-ietf-bess-evpn-igmp-mld-pr...@ietf.org" 

Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)
Resent-From: 
Resent-To: Cisco Employee , , 
, , 
Resent-Date: Friday, April 24, 2020 at 3:01 PM

Hi All,

Regarding the proposal to remove the Leave Group Synchronization field from the 
Multicast Leave Synch Route, the current proposal is inadequate. Below I 
discuss why, and provide an alternate suggestion. For those who don’t want to 
read my wall of text, my key motivation is simple:

- The current proposal is a ticking time bomb because it leaves in the field a 
situation where two incompatible implementations can exist undetectably.

And my proposal boils down to two things:

- For the new format NLRI that omits the field, allocate a new code point. 
Deprecate [*] code point 8 going forward.
- Optionally provide a somewhat more sophisticated interworking option for 
backward compatibility.
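Why a new code point avoids the failure mode can be sketched in Python: EVPN NLRI carry their own route type and length, and RFC 7606 (Section 5.4, on typed NLRI) has a speaker discard NLRI of an unrecognized type rather than treat them as malformed. The receiver below is a hypothetical older implementation; `TBD_PLACEHOLDER` is an arbitrary stand-in for the unassigned TBD code point, not a proposed value.

```python
from typing import Iterator, Tuple

KNOWN_ROUTE_TYPES = frozenset(range(1, 9))  # this older receiver knows types 1..8
TBD_PLACEHOLDER = 250                       # arbitrary stand-in for the TBD code point


def walk_evpn_nlri(blob: bytes) -> Iterator[Tuple[int, bytes]]:
    """Split packed EVPN NLRI (type, length, body) and drop unknown route types."""
    i = 0
    while i < len(blob):
        if i + 2 > len(blob):
            raise ValueError("truncated EVPN NLRI header")
        route_type, length = blob[i], blob[i + 1]
        body = blob[i + 2 : i + 2 + length]
        if len(body) != length:
            raise ValueError("EVPN NLRI length overruns the attribute")
        i += 2 + length
        if route_type not in KNOWN_ROUTE_TYPES:
            continue  # unrecognized typed NLRI: discard it, the session stays up
        yield route_type, body


# A new-type NLRI followed by a type-2 NLRI: only the type-2 entry is returned.
print(list(walk_evpn_nlri(bytes([TBD_PLACEHOLDER, 3, 1, 2, 3, 2, 1, 9]))))
```

The new-format route is skipped by its length without its contents ever being interpreted, whereas the same contents under code point 8 would be checked against the old layout.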

Nitty-gritty below including considerations for how to transition from code 
point 8 to the TBD code point.

As far as I can tell, there is consensus that the field is not useful. That’s a 
good start. The customary way of dealing with this would be to mark the field 
“reserved”, but evidently there are multiple divergent implementations in the 
field that use different formats for the Multicast Leave Synch Route, some that 
include the field and some that don’t. (I should disclose here that my 
employer’s implementation is in the “include” camp.)

There is an obvious interoperability problem here: BGP implementations are 
required to sanity-check the NLRI they receive (see RFC 4271 section 6.3, RFC 
4760 section 7, and RFC 7606 section 5.3). This checking is required whether or 
not there’s a route target present to cause the router to consume the NLRI; the 
standards require the NLRI to be checked regardless. The consequence of 
malformed NLRI is a session reset. This turns out to be a difficult problem in 
BGP: even though we’ve worked to reduce the number of error cases that require 
a session reset, malformed NLRI are one of the very bad cases we can’t paper 
over. The IDR WG worked on this very hard during the development of RFC 7606; 
it is a real problem. When an implementation expects one NLRI format and 
receives another, that’s a malformed NLRI, and can be expected to cause a 
session reset. To leave this situation in place would be BGP protocol 
malpractice.

As far as I can tell, this means it is only through dumb luck that we have had 
two different NLRI formats in the wild without a network meltdown. This seems 

Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

2020-04-26 Thread Susan Hares
John - Thank you for reminding folks of RFC 7120. It exists due to lots of past 
pain.

BESS WG,

Please go toward the new code point scenario John Scudder describes and RFC 7120 
recommends.


  If at some point changes that are not backward compatible are
  nonetheless required, a decision needs to be made as to whether
  previously allocated code points must be deprecated (see Section 3.3
  for more information on code point deprecation).  The considerations
  include aspects such as the possibility of existing deployments of
  the older implementations and, hence, the possibility for a collision
  between older and newer implementations in the field.

Let’s not tempt fate for melt-downs.  Now, more than ever.

I look forward to John’s write-up of the alternative. 

Sue Hares
IDR co-chair.

From: BESS [mailto:bess-boun...@ietf.org] On Behalf Of John Scudder
Sent: Friday, April 24, 2020 6:01 PM
To: Mankamana Mishra (mankamis)
Cc: draft-ietf-bess-evpn-igmp-mld-pr...@ietf.org; bess@ietf.org
Subject: Re: [bess] IGMP / MLD Proxy Draft update (NLRI change)

Hi All,

Regarding the proposal to remove the Leave Group Synchronization field from the 
Multicast Leave Synch Route, the current proposal is inadequate. Below I 
discuss why, and provide an alternate suggestion. For those who don’t want to 
read my wall of text, my key motivation is simple:

- The current proposal is a ticking time bomb because it leaves in the field a 
situation where two incompatible implementations can exist undetectably.

And my proposal boils down to two things:

- For the new format NLRI that omits the field, allocate a new code point. 
Deprecate [*] code point 8 going forward.
- Optionally provide a somewhat more sophisticated interworking option for 
backward compatibility.

Nitty-gritty below including considerations for how to transition from code 
point 8 to the TBD code point.

As far as I can tell, there is consensus that the field is not useful. That’s a 
good start. The customary way of dealing with this would be to mark the field 
“reserved”, but evidently there are multiple divergent implementations in the 
field that use different formats for the Multicast Leave Synch Route, some that 
include the field and some that don’t. (I should disclose here that my 
employer’s implementation is in the “include” camp.) 

There is an obvious interoperability problem here: BGP implementations are 
required to sanity-check the NLRI they receive (see RFC 4271 section 6.3, RFC 
4760 section 7, and RFC 7606 section 5.3). This checking is required whether or 
not there’s a route target present to cause the router to consume the NLRI; the 
standards require the NLRI to be checked regardless. The consequence of 
malformed NLRI is a session reset. This turns out to be a difficult problem in 
BGP: even though we’ve worked to reduce the number of error cases that require 
a session reset, malformed NLRI are one of the very bad cases we can’t paper 
over. The IDR WG worked on this very hard during the development of RFC 7606; 
it is a real problem. When an implementation expects one NLRI format and 
receives another, that’s a malformed NLRI, and can be expected to cause a 
session reset. To leave this situation in place would be BGP protocol 
malpractice.
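A minimal Python sketch of why two formats on one code point collide. The Route Type and Length octets and the RD (8), ESI (10) and Ethernet Tag ID (4) sizes follow the general EVPN NLRI encoding of RFC 7432; everything else about route type 8 here (bit-length-prefixed source, group and originator addresses, an assumed 4-octet Leave Group Synchronization field, two trailing octets) is a simplified, hypothetical layout for illustration, not quoted from the draft.

```python
import struct
from enum import Enum
from typing import Optional

# Hypothetical, simplified route-type-8 layout; field sizes are assumptions.
FIXED_PREFIX_LEN = 8 + 10 + 4  # RD + ESI + Ethernet Tag ID, in octets
SYNC_FIELD_LEN = 4             # assumed size of the Leave Group Synchronization field
TRAILER_LEN = 1 + 1            # assumed Maximum Response Time + Flags octets


class Verdict(Enum):
    OK = "ok"
    MALFORMED = "malformed"  # per the discussion, expect a session reset


def expected_body_len(body: bytes, with_sync_field: bool) -> Optional[int]:
    """Walk the three length-prefixed addresses (lengths in bits) and return
    the body length this parser's view of the format implies."""
    pos = FIXED_PREFIX_LEN
    for _ in range(3):  # source, group, originator router address
        if pos >= len(body):
            return None  # ran off the end before finishing the addresses
        pos += 1 + body[pos] // 8
    return pos + (SYNC_FIELD_LEN if with_sync_field else 0) + TRAILER_LEN


def check_type8(nlri: bytes, with_sync_field: bool) -> Verdict:
    """Sanity-check one EVPN NLRI (Route Type, Length, body)."""
    if len(nlri) < 2 or nlri[0] != 8:  # this sketch only handles type 8
        return Verdict.MALFORMED
    length = nlri[1]
    body = nlri[2:2 + length]
    if len(body) != length:
        return Verdict.MALFORMED
    expected = expected_body_len(body, with_sync_field)
    return Verdict.OK if expected == length else Verdict.MALFORMED


def build_type8(with_sync_field: bool) -> bytes:
    """Build a sample (*,G) route in either variant of the hypothetical layout."""
    body = bytes(FIXED_PREFIX_LEN)          # zeroed RD, ESI, Ethernet Tag ID
    body += bytes([0])                      # source length 0 bits: (*,G)
    body += bytes([32, 239, 1, 1, 1])       # group 239.1.1.1
    body += bytes([32, 192, 0, 2, 1])       # originator 192.0.2.1
    if with_sync_field:
        body += struct.pack("!I", 1)        # Leave Group Synchronization #
    body += bytes([10, 0])                  # max response time, flags
    return bytes([8, len(body)]) + body
```

Each receiver accepts its own variant and rejects the other: check_type8(build_type8(False), with_sync_field=True) returns Verdict.MALFORMED, even though nothing is actually wrong with the sender. That is the undetectable incompatibility the message describes.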

As far as I can tell, this means it is only through dumb luck that we have had 
two different NLRI formats in the wild without a network meltdown. This seems 
like a ticking time bomb situation.

The implementations are in the field already; we can’t just stamp our feet and 
say “you should have followed the spec” and make the problem go away. So we 
have to think about how to migrate to one agreed format, whatever it may be. 
(The idea that interoperability concerns can be addressed by simply never 
mixing old and new implementations in the same network can be dismissed out of 
hand. That amounts to “there are no interoperability problems if there’s no 
interoperation”, and are we not a standards organization, and is our goal not 
interoperability?)

Let’s take as a given that the agreed format will end up being the one that 
removes the Leave Group Synchronization field. Since something has to change, 
it may as well be the thing that removes the vestigial field.

The cleanest solution is to keep the format depicted in draft -04 (and its 
predecessors) on code point 8, and to allocate a new code point for the new 
format. The old code point would be deprecated, the new code point would be the 
standardized version. It turns out that moving code points is exactly the 
strategy prescribed (or at least strongly recommended) by RFC 7120 section 3.2:

  If at some point changes that are not backward compatible are
  nonetheless required, a decision needs to be made as to whether
  previously allocated code points must be deprecated (see Section 3.3
  for more information on code point deprecation).  The considerations
  include aspects such as 
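A rough Python sketch of why moving the new format to its own code point avoids the collision. It assumes that a receiver skips EVPN route types it does not recognize by using the Length octet; that is an assumption about receiver behavior, not a quote from any RFC. HYPOTHETICAL_NEW_TYPE = 250 is a placeholder for the "TBD" code point and is not an assigned value.

```python
from typing import Iterator, Set, Tuple

OLD_LEAVE_SYNCH_TYPE = 8
# Placeholder for the "TBD" code point; 250 is NOT an IANA-assigned value.
HYPOTHETICAL_NEW_TYPE = 250


def walk_evpn_nlri(data: bytes, known_types: Set[int]) -> Iterator[Tuple[int, bytes]]:
    """Split packed EVPN NLRI (Route Type, Length, body) and yield only the
    route types this speaker understands. Unknown types are stepped over via
    their Length octet instead of being parsed with the wrong layout."""
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated NLRI header")  # genuinely malformed
        route_type, length = data[pos], data[pos + 1]
        body = data[pos + 2:pos + 2 + length]
        if len(body) != length:
            raise ValueError("truncated NLRI body")  # genuinely malformed
        pos += 2 + length
        if route_type in known_types:
            yield route_type, body


def pack(route_type: int, body: bytes) -> bytes:
    """Encode one NLRI as Route Type, Length, body."""
    return bytes([route_type, len(body)]) + body
```

With update = pack(8, old_body) + pack(250, new_body), an old box that only knows type 8 yields [8] and never misreads the new route, while an upgraded box yields [8, 250] and can parse each with the format bound to its code point. Because each code point carries exactly one format, neither side sees a "malformed" NLRI during migration.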

Re: [bess] VXLAN BGP EVPN Question

2020-04-26 Thread Rabadan, Jorge (Nokia - US/Mountain View)
Hi Gyan,

Actually we started with the EVPN DCI draft in 2013 :-)

The way I see it, the unknown MAC route saves flooding if all the MACs in the 
POD/DC are known beforehand. The unknown unicast traffic can be aliased to the 
GWs. In case of failure in one of the GWs, the AD per-ES route for the I-ES 
will be withdrawn (mass withdraw for all EVIs) and the unknown traffic can be 
sent to the redundant GWs. So this failure won’t generate any extra flooding.
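A toy Python model of the behavior described above, with hypothetical class and method names. It simplifies aliasing to "every gateway advertising the UMR is a candidate next hop" and ignores per-flow hashing; the point is that unknown unicast goes to the gateways rather than being flooded, and a single AD per-ES withdrawal moves that traffic to the surviving gateways.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MacVrf:
    """Toy MAC-VRF: known MACs map to one next hop; unknown MACs fall back to
    the set of DCI gateways that advertised the UMR (aliased via the I-ESI)."""
    macs: Dict[str, str] = field(default_factory=dict)
    umr_gateways: Set[str] = field(default_factory=set)

    def next_hops(self, dst_mac: str) -> List[str]:
        if dst_mac in self.macs:
            return [self.macs[dst_mac]]
        # Unknown unicast: send toward the gateways instead of flooding.
        return sorted(self.umr_gateways)

    def withdraw_ad_per_es(self, gateway: str) -> None:
        """One AD per-ES withdrawal for the I-ES removes the failed gateway
        for all unknown-MAC traffic; no per-MAC withdrawals are needed."""
        self.umr_gateways.discard(gateway)
```

For example, with umr_gateways={"gw1", "gw2"}, an unknown destination resolves to ["gw1", "gw2"]; after withdraw_ad_per_es("gw1") it resolves to ["gw2"] with no extra flooding.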

Thanks.
Jorge

From: Gyan Mishra 
Date: Saturday, April 25, 2020 at 8:45 AM
To: "Lukas Krattiger (lkrattig)" , "saja...@cisco.com" 

Cc: BESS , Jeff Tantsura , "Rabadan, 
Jorge (Nokia - US/Mountain View)" 
Subject: Re: [bess] VXLAN BGP EVPN Question


+ Ali

Lukas

I noticed that Ali was on the multi-site draft, which expired in 2017 
around the same time the DCI overlay draft was submitted. I went through the 
logs but did not go through the mail archives to see what happened to the 
multi-site draft. My guess is these were two competing drafts, and multi-site 
was geared solely to EVPN procedures for VXLAN encapsulation and thus did not 
achieve WG adoption, whereas your DCI overlay draft accounts for every 
encapsulation type using EVPN procedures and is a more comprehensive approach 
to DCI, providing an improved solution to multi-site VXLAN overlay stitching.

I like the idea of re-originating the VNI and RD using local context on the 
gateway as an additional control mechanism, which prevents Type 2 MAC-IP routes 
from being flooded between pods where they should not be, without needing flood 
filters. With the multi-site feature there is no such control, and unfortunately 
all mobility routes are flooded, active or not.

With this draft, is it possible to add a feature for conversational learning of 
only active flows? When the type 1 BGP A-D route is sent for the initial BUM 
advertisement for ARP or ND, there could be a snooping mechanism similar to IGMP 
snooping that discovers the active flow and thus creates the control-plane type 2 
MAC-IP state, followed by flooding in the data-plane NVE tunnel overlay. I 
think this concept could apply intra-site, fabric leaf to leaf, but I think it 
would be extremely beneficial for inter-pod or inter-site.

This could be a separate feature or an option to selective advertisement.

So selective advertisement works in conjunction with re-origination of the RD 
and a locally significant VNI.

What I would envision with the conversational learning active-flow detection 
feature is that you would use a global VNI, and only the active type-2 MAC-IP 
routes would be propagated inter-pod or inter-site.
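Gyan's proposal is not an existing feature, so the following is only one hypothetical way to express it in Python: a gateway-side filter that propagates a type-2 MAC-IP route inter-pod or inter-site only if data-plane snooping has seen activity for that MAC recently. The class name, the observe() hook and the 300-second idle timeout are all assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ActiveFlowFilter:
    """Hypothetical filter: only MACs seen in the data plane within
    idle_timeout seconds have their type-2 routes propagated."""
    idle_timeout: float = 300.0
    last_seen: Dict[str, float] = field(default_factory=dict)

    def observe(self, mac: str, now: Optional[float] = None) -> None:
        """Record activity, e.g. from a snooped ARP/ND exchange."""
        self.last_seen[mac] = time.monotonic() if now is None else now

    def routes_to_propagate(self, type2_macs: List[str],
                            now: Optional[float] = None) -> List[str]:
        """Return the subset of locally known type-2 MACs that are active."""
        now = time.monotonic() if now is None else now
        return [mac for mac in type2_macs
                if mac in self.last_seen
                and now - self.last_seen[mac] <= self.idle_timeout]
```

The explicit now parameter keeps the sketch testable; a MAC never observed, or idle longer than the timeout, simply stays inside its pod, which is the "only active routes cross the site boundary" effect the proposal is after.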

This feature would be a tremendous benefit to operators and help with MAC scale.

In our Cisco Multi-Site feature implementations we do use the recommended 
Multi-Site-specific BUM traffic suppression applied on the BGW, so that 
definitely helps with BUM suppression.

In section 3.5.1, UMR: so the route type is like a default MAC route 0/48 with 
the ESI set to the DCI gateway I-ESI for all-active multihoming, and so instead 
of flooding all MACs and having to rely on mass MAC withdrawals during a 
failure, now only the UMR is withdrawn. Is that correct?

That’s a huge savings on resources.

Kind regards

Gyan

On Fri, Apr 24, 2020 at 3:25 PM Lukas Krattiger (lkrattig) 
<lkrat...@cisco.com> wrote:
Thanks Jorge and Jeff for guiding us all the way through the features and 
functions we have in DCI-overlay and Multi-Site.

Gyan,

Specific to the VNI distribution, BUM handling and the re-origination in 
Multi-Site:
With re-origination, the RDs are changed on the GW node. With this in mind, the 
VNI could be globally or locally significant. In the case of local significance, 
we can stitch VNIs together (i.e., VNI1 - GW - VNI2 - GW - VNI3).
Further, MAC- or IP-VRFs that are not supposed to be extended to remote Sites 
will not advertise any MAC or IP routes beyond the local GW. This way you 
keep the control plane clean and avoid unnecessary creation of flood lists. 
This is what we call selective advertisement, which is different from 
conversational learning. Conversational learning could be a complement to 
selective advertisement. The unknown MAC approach that Jorge mentioned is a 
different approach to similar optimizations.
In addition to ARP suppression, in the specific Cisco implementation of 
Multi-Site, we provide a BUM traffic policer to rate-limit traffic between 
Sites. This policer is located on the GW and acts in the egress direction.
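A toy Python model of the two gateway behaviors described above, with hypothetical names throughout: re-origination replaces the RD with the gateway's own and translates a locally significant VNI (a global VNI maps to itself), and selective advertisement stops routes of MAC-VRFs that are not extended beyond the local GW. It is a sketch of the idea, not of any vendor's implementation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class MacIpRoute:
    rd: str
    vni: int
    mac: str
    mac_vrf: str


@dataclass
class Gateway:
    """Toy Multi-Site/DCI gateway; all names are hypothetical."""
    own_rd: str
    vni_map: Dict[int, int]   # VNI on the receiving side -> VNI on the sending side
    extended_vrfs: Set[str]   # MAC-VRFs allowed beyond this gateway

    def reoriginate(self, route: MacIpRoute) -> Optional[MacIpRoute]:
        # Selective advertisement: routes of non-extended MAC-VRFs stop here.
        if route.mac_vrf not in self.extended_vrfs:
            return None
        # Re-origination: new RD, and VNI translation when the VNI is
        # locally significant (unmapped VNIs are treated as global).
        return replace(route, rd=self.own_rd,
                       vni=self.vni_map.get(route.vni, route.vni))
```

Chaining two gateways, one mapping 1001 to 2001 and the next mapping 2001 to 3001, reproduces the VNI1 - GW - VNI2 - GW - VNI3 stitching: the MAC is unchanged, while the RD and VNI are rewritten at each hop.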

So with the DCI EVPN VNI translation, does that end up having the desired 
effect of control-plane segregation from the data plane, providing a 
reduced-size MAC-VRF that shows only active, interesting type 2 MAC-IP routes 
intra-pod within the DC?

In a certain way, yes

Kind Regards
-Lukas



On Apr 24, 2020, at 7:21 AM, Rabadan, Jorge (Nokia - US/Mountain View) 
<jorge.raba...@nokia.com> wrote:

Hi Gyan,

The dci evpn overlay draft indeed provides that segmentation. EVPN routes are