Robert,

On MPLS selling points:

Generally speaking, any tunneling technique can achieve FIB state compression 
in "transit" (but not all) devices, by virtue of core/edge asymmetry. MPLS is 
just one of these techniques, and it seems attractive due to its simplicity 
(flat label lookup) and protocol-agnostic operation. However, since IPv4/IPv6 
is not going away from DC boxes, the advantage of simplicity is diminished - 
it's not like we would see super-cheap and easy-to-operate MPLS-only DC 
switches on the market. Additionally, I personally don't see FIB state 
explosion as such a daunting problem in properly engineered/summarized DC/DCI 
networks, but that's a separate conversation.

As another plus for MPLS, MPLS OAM techniques seem attractive as they allow 
testing the "circuit" independently of the payload. This is true in the "pure" 
MPLS case, but if the hardware parser looks beyond the MPLS stack (which it 
often does), this "protocol-independent" OAM becomes not-so-independent :). 
Another good reason for MPLS OAM is the ability to lock down specific paths 
for testing, though one may argue that any source-routed technique would do 
that. However, MPLS just seems to be the one with the least overhead and with 
support across multiple DC hardware platforms.

Despite the advantages, there are some actual technical challenges with MPLS 
in the data center. For example, some platforms limit L3 ECMP to the IP2MPLS 
function only (though L2 ECMP is still possible with an MPLS header). A 
similar issue happens when handling anycast destinations, which is critical to 
many load-balancing solutions. These issues could be addressed by performing 
IP lookups and building label stacks in the host, which is ultimately the 
segment/source routing approach to the problem. This also allows the network 
to be purely MPLS.
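The host-side approach above can be sketched roughly as follows. This is a 
hypothetical illustration only - the function and label values are made up, 
and real implementations would resolve labels from a control plane rather 
than a static dictionary:

```python
# Hypothetical sketch of a host-side IP2MPLS function in the
# source-routing model: the host, not the ToR, resolves the
# destination and imposes the full label stack.

def build_label_stack(prefix_labels, dst_ip, path_labels):
    """Return a label stack: path (transit) labels on top,
    the label identifying the destination at the bottom of stack."""
    bottom = prefix_labels[dst_ip]      # label advertised for the destination
    return path_labels + [bottom]       # outermost label listed first

# Illustrative values: two labels steer the packet through chosen
# spines, the last label identifies the egress prefix/ToR.
prefix_labels = {"10.0.1.5": 16005}
stack = build_label_stack(prefix_labels, "10.0.1.5", [16101, 16202])
print(stack)  # [16101, 16202, 16005]
```

With the stack built at the host, transit devices only ever switch on the top 
label, which is what lets the fabric itself stay purely MPLS.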

Pushing the IP2MPLS edge to the host level seems like a great idea until you 
consider the amount of state that one needs to distribute & synchronize among 
potentially millions of devices. For example, if you want to ECMP off of a 
host in a DC network with 128 paths, you need to inform every host of a link 
failure when any of these 128 paths becomes unavailable (assuming that any 
host may talk to any host over any path). Distributed state synchronization is 
not an easy problem, and it is a hell of a pain to troubleshoot, especially 
when many devices are involved.
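The back-of-the-envelope arithmetic behind that concern looks something like 
this (the one-million host count is an assumed figure for illustration; the 
128 paths come from the example above):

```python
# Fan-out of failure notifications when ECMP path selection lives
# in the hosts. Assumed scale: 1,000,000 hosts, 128 ECMP paths.

hosts = 1_000_000
paths = 128

# If any host may select any of the 128 paths to any destination,
# a single link failure must be signalled to every host that could
# have chosen the affected path:
updates_per_failure = hosts
print(updates_per_failure)  # 1000000

# Contrast: with ECMP kept in the network, only the few switches
# adjacent to the failed link need to reconverge locally.
```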

This is why I'm personally in favor of hybrid approaches, where source routing 
(not necessarily via MPLS, though) is added on top of the traditional 
IP/shortest-path model: it allows for extended OAM functionality while 
retaining full compatibility with the traditional approach. This does not 
achieve the goal of ultimate FIB compression, but as I mentioned, I do not see 
this as a serious problem with proper route summarization...

Regards,

Petr

________________________________
From: [email protected] [[email protected]] on behalf of Robert Raszuk 
[[email protected]]
Sent: Wednesday, March 25, 2015 8:34 PM
To: Petr Lapukhov
Cc: Luyuan Fang; [email protected]; [email protected]; Pedro Marques
Subject: Re: https://tools.ietf.org/html/draft-fang-mpls-hsdn-for-hsdc-01

Hello Petr,

Thank you very much for your comment! One question:

> To me, the only good selling point for mpls in DC, in my opinion,
> is having a uniform end to end transport (with corresponding OAM etc).

Let me understand this. What is the definition of "uniform end to end 
transport" ?

If you use IP for transport and, say, use L3VPN Option C in the overlay, or 
for example LISP, what is not uniform in compute-node-to-compute-node 
transport across any DC or any region?

As far as OAM goes, do we have any missing tools with IP operation?

- - -

I am just trying to understand the technical rationale - if any such exists - 
aside from sales, political, religious or fanatic reasons - why anyone would 
propose MPLS transport for a DC underlay.

- - -

Just today I was actually arguing with one networking vendor that MPLS as a 
demux value (say, as in RFC 4364) makes a lot of sense for multi-tenant DCs, 
rather than reinventing the wheel and using a different name for effectively 
the same function (GRE key or NVGRE VSID). But that is in the overlay space. 
Completely different topic.
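The "same function, different name" point can be made concrete by lining up 
the demux field each encapsulation carries. The table and helper below are 
purely illustrative; the field widths are per the respective specs:

```python
# Each encapsulation carries a per-tenant demultiplexing value in a
# differently named header field of a different width.

DEMUX_FIELDS = {
    "MPLS (RFC 4364)": {"field": "VPN label", "bits": 20},
    "GRE (RFC 2890)":  {"field": "Key",       "bits": 32},
    "NVGRE":           {"field": "VSID",      "bits": 24},
}

def demux_value(tenant_id, encap):
    """Map a tenant to its demux value, checking it fits the field."""
    width = DEMUX_FIELDS[encap]["bits"]
    assert tenant_id < (1 << width), "tenant id exceeds field width"
    return tenant_id

print(demux_value(4242, "NVGRE"))  # 4242 -- same value, different header
```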

Best regards,
r.


On Thu, Mar 26, 2015 at 1:56 AM, Petr Lapukhov 
<[email protected]<mailto:[email protected]>> wrote:
AFK, so can't write a well-formed comment :(

but in short, my personal experience was that circuit-like transports play well 
as *augmentation* to shortest-path / ecmp / longest-prefix match techniques, 
not as a complete replacement (after all, ip already works). Mpls circuits are 
alright if you have network asymmetry and need to work around it, but in 
symmetric topologies they seem rather unnecessary, unless you really want to 
have end-to-end uniform data plane, which has both downsides and benefits.

Robert and I had some discussions around pure mpls / seamless mpls DC + DCI 
networks a couple of years ago, but it was hard to find a strong selling point 
for mpls (I was arguing for mpls, btw). In general, MPLS offers uniform, 
protocol agnostic forwarding plane, with simple lookup, but the latter is not 
such a big win with modern (and upcoming) silicon. Next, for entropy reasons 
it is often necessary to resort to leaky abstractions with MPLS (e.g., nibble 
guessing) or to add complications (entropy labels), which make the 
architecture more complex.
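The entropy-label workaround mentioned above can be sketched roughly as 
follows. This is an illustrative toy, not any vendor's implementation; the 
only non-invented detail is that the Entropy Label Indicator is reserved 
label value 7 per RFC 6790:

```python
# Sketch: the ingress node hashes the flow once and encodes the result
# as an "entropy label", so transit nodes can ECMP on the label stack
# alone instead of guessing at the payload (first-nibble heuristics).

import zlib

ELI = 7  # Entropy Label Indicator, reserved label value (RFC 6790)

def add_entropy_label(stack, flow_tuple):
    """Append ELI + entropy label below the transport label."""
    flow_hash = zlib.crc32(repr(flow_tuple).encode()) & 0xFFFFF  # 20-bit label
    return stack + [ELI, flow_hash]

stack = add_entropy_label([16101], ("10.0.0.1", "10.0.1.5", 6, 51000, 80))
print(stack[:2])  # [16101, 7] -- transport label, then the ELI
```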

Additionally, I feel that FIB compaction has more to do with network structure 
and careful control of state propagation than with the underlying forwarding 
mechanism. On this front, something that could be achieved with IP via simple 
summarization requires rather sophisticated LSP hierarchies with MPLS.
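A toy illustration of the summarization point, with made-up addresses (64 ToR 
prefixes in a pod collapsing into one aggregate at the spine):

```python
# IP summarization compresses the spine FIB "for free": one covering
# route replaces all the per-ToR prefixes of a pod.

import ipaddress

tor_prefixes = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(64)]

# The spine needs only a single aggregate for the whole pod.
aggregate = ipaddress.ip_network("10.1.0.0/18")
assert all(p.subnet_of(aggregate) for p in tor_prefixes)
print(len(tor_prefixes), "->", 1)  # 64 FIB entries collapse to 1

# There is no analogous "longest-label match" in MPLS: getting the same
# compression requires a hierarchy of LSPs, one stacked label per level.
```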

To me, the only good selling point for mpls in DC, in my opinion, is having a 
uniform end to end transport (with corresponding OAM etc). It is not very clear 
whether this has more advantages than downsides, and requires a separate 
discussion :)

Petr

Mar 25, 2015, at 5:00 PM, "Robert Raszuk" 
<[email protected]<mailto:[email protected]>> wrote:

Hello Luyuan,

Quote:


"The HSDN forwarding architecture in the underlay network is based on four 
main concepts: 1. Dividing the DC and DCI in a hierarchically-partitioned 
structure; 2. Assigning groups of Underlay Border Nodes in charge of 
forwarding within each partition; 3. Constructing HSDN MPLS label stacks to 
identify the end points according to the HSDN structure; and 4. Forwarding 
using the HSDN MPLS labels."


Can you provide any reasoning for going to such complexity when trying to use 
MPLS as transport within and between DCs, as compared with using IP-based 
transport? Note that the native summarization of IP-based transport provides 
unquestionable FIB compression.


Quote:


"HSDN is designed to allow the physical decoupling of control and forwarding, 
and have the LFIBs configured by a controller according to a full SDN 
approach. The controller-centric approach is described in this document."


+

Quote:


"2) The network nodes MUST support MPLS forwarding."



Please kindly note that, to the best of my knowledge, a number of ODM routers 
used to construct IP Clos fabrics do not really have a control plane which 
supports MPLS transport - neither distributed nor centrally (i.e., via 
controller) managed.


Quote:


"The key observation is that it is impractical, uneconomical, and
ultimately unnecessary to use a fully connected Clos-based topology in a large 
scale DC."


That is an interesting statement. I think, however, that one should 
distinguish interconnected regions with proper Clos fabrics from some sort of 
Clos-fabric-wannabe type of topologies. In any case, it has no bearing on the 
main points of the scalable interconnect discussion.


- - -


While we could go through a number of other comments, let's cut it short.


Your draft states that HSDN works with IPv4 transport in the below statement:


Quote:


"Although HSDN can be used with any forwarding technology, including IPv4 and 
IPv6,"


1. Can you summarise what problems you see with an IPv4/IPv6-based underlay in 
DCs that drove you to base this document on MPLS?


(Note that tenant mobility is an overlay task and has nothing to do with the underlay.)


2. Can you describe how you are going to distribute the MPLS stack to be used 
for forwarding in the underlay to servers?


3. How are you going to provide efficient ECMP intra-DC? I see no trace of 
entropy labels in your document.


4. For TE, is there anything missing in the below document?

https://tools.ietf.org/html/draft-lapukhov-bgp-sdn-00


Many thx,

r.


_______________________________________________
nvo3 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nvo3
