Erik,

I have added your email address to the RTGWG list, so you can now post there.

Cheers,
Jeff
 

On 4/12/17, 01:50, "Isis-wg on behalf of Erik Auerswald" 
<[email protected] on behalf of [email protected]> wrote:

    Hi all,
    
    I have read draft-white-openfabric-02 and would like to comment
    on a few points. I'll start at the top of the draft and continue
    through the text.
    
    Please keep my e-mail address in replies, because I am not subscribed
    to the isis-wg and rtgwg mailing lists.
    
    1.
       The abstract states "[...]topology information is extracted
    through broad based connections." I do not understand that sentence.
    
    2.
       Section 1.1., Goals, mentions large scale data centers. Would
    it be appropriate to reference RFC 7938, Use of BGP for Routing
    in Large-Scale Data Centers, here? Said RFC proposes a Clos topology
    for the network, which seems to be similar to the spine and leaf
    topology of openfabric.
    
    3.
       In section 1.3., Simplification, I noticed a spelling mistake:
    mutliaccess (should be multiaccess).
    
    4.
       In section 1.5., Sample Network, a spine and leaf network is
    shown in figure 1. The topology shown in that figure is different
    from the 5-stage Clos topology shown in RFC 7938, figure 3. The
    5-stage Clos topology from RFC 7938 represents the network topology
    used by Facebook for the Altoona data center, as publicized in
    
https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/.
    
    Another generalization of the 3-stage Clos network to more than
    3 stages, called a Beneš network, is described on Wikipedia:
    
https://en.wikipedia.org/wiki/Clos_network#Clos_networks_with_more_than_three_stages
    
    Both of these 5-stage networks differ from figure 1 of the
    openfabric draft insofar as each T2 switch is connected to a
    proper subset of T1 switches (openfabric designation) in both the
    RFC 7938 "Clos" topology and the Beneš network. This is crucial
    for increasing the number of input and output ports without
    using bigger switches.
    
    Since this is important for later comments, I have adapted figure 3
    from RFC 7938 into the following drawing:
    
    
            +----+                  +----+
            |L1.1|                  |L1.2|             (T0)
            +----+                  +----+
             |   \________________  /   |
             |    ________________\/    |
             |   /                 \    |
            +----+                  +----+
            |F1.1|                  |F1.2|             (T1)
            +----+                  +----+
            /    \                  /    \
           /      \                /      \
       +----+    +----+        +----+    +----+
       |S1.1|    |S1.2|        |S2.1|    |S2.2|        (T2)
       +----+    +----+        +----+    +----+
           \      /                \      /
            \    /                  \    /
            +----+                  +----+
            |F2.1|                  |F2.2|             (T1)
            +----+                  +----+
             |   \________________  /   |
             |    ________________\/    |
             |   /                 \    |
            +----+                  +----+
            |L2.1|                  |L2.2|             (T0)
            +----+                  +----+
    
         Legend:
           Lx.y: Leaf switches (a.k.a. Top of Rack (ToR) switches)
           Fx.y: Fabric switches
           Sx.y: Spine switches
    
         Inter-switch connections:
           Lx.y is connected to Fx.*
           Fx.y is connected to Lx.* and Sy.*
           Sx.y is connected to F*.x 
    
       Figure 2: 5-Stage Clos Topology (adapted from [RFC7938], Figure 3)
    
    I have used the name "Fabric switch" similar to Facebook's use
    of that name in the above referenced blog post, just to have
    distinct names and single letter abbreviations for each tier.
    
    A reference to RFC 7938, section 3.2, Clos Network Topology, would
    fit into this section.
    
    5.
       It might be appropriate to mention the use of timeouts and
    exponential back-off for initial adjacency formation in section 2.
    Something like sequentially trying all discovered neighbors and
    using exponentially increasing random timeouts for subsequent
    rounds until the first adjacency is formed. A "Happy Eyeballs"
    (RFC 6555) like approach of trying to form two adjacencies with
    a slight delay in-between might be nice as well.
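
The retry scheme suggested in comment 5 could look roughly like the
following Python sketch. It is only an illustration under stated
assumptions, not text from the draft: the names backoff_windows and
form_first_adjacency, the attempt callback, and the base/cap/rounds
parameters are all hypothetical, and the "Happy Eyeballs" style
parallel attempt is left out.

```python
import random
import time


def backoff_windows(base, cap, rounds):
    """Upper bound of the random wait after each round; doubles until capped."""
    return [min(cap, base * 2 ** r) for r in range(rounds)]


def form_first_adjacency(neighbors, attempt, base=1.0, cap=60.0,
                         rounds=6, rng=None, sleep=time.sleep):
    """Try neighbors in order, round after round, until one attempt succeeds.

    `attempt(neighbor)` is a hypothetical callback returning True on success.
    Returns the neighbor with which the first adjacency formed, or None.
    """
    rng = rng or random.Random()
    for window in backoff_windows(base, cap, rounds):
        for neighbor in neighbors:
            if attempt(neighbor):
                return neighbor
        # No adjacency this round: wait a random time within the current window.
        sleep(rng.uniform(0.0, window))
    return None
```

Drawing the wait uniformly from a doubling window keeps switches that
booted at the same time from retrying in lockstep.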
    
    6.
       Section 3., Determining Location on the Fabric, relies on the
    special topology from figure 1 of the openfabric draft. In both
    Beneš networks and the topology shown in figure 2 (of this mail),
    FD == TD and TD == 4 holds for non-T0 switches. One example is
    S1.1 from figure 2. It can be easily seen from that figure that
    for all switches in that topology FD == TD == 4. Thus the algorithms
    from sections 3.1., Determining T0, and 3.2., Determining T1 and
    above, do not work for general fabric topologies.
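
The distance claim in comment 6 can be checked mechanically. The Python
sketch below builds the topology of figure 2 (of this mail) from its
"Inter-switch connections" rules and runs a breadth-first search from
every switch. It assumes that FD can be read as the hop count from a
switch to the switch farthest away from it; the draft's own definitions
remain authoritative. The function names are made up for this example.

```python
from collections import deque


def build_figure2():
    """Adjacency sets for figure 2: Fx.y connects to Lx.* and Sy.*."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for x in (1, 2):
        for y in (1, 2):
            fabric = f"F{x}.{y}"
            for k in (1, 2):
                link(fabric, f"L{x}.{k}")
                link(fabric, f"S{y}.{k}")
    return adj


def farthest_distance(adj, start):
    """Hop count from `start` to the switch farthest away (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in adj[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return max(dist.values())


if __name__ == "__main__":
    topology = build_figure2()
    for switch in sorted(topology):
        print(switch, farthest_distance(topology, switch))
```

Under that reading, every one of the twelve switches reports the same
value, 4, so the value alone cannot tell the tiers apart.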
    
    7.
       The algorithm described in section 4, Flooding Optimization, does
    not work for the 5-stage "Clos" topology (see figure 2). An example
    of this is a change that pertains only to switches S1.1 and F1.1 in
    figure 2 (e.g. a link between these two switches fails). Because
    the T0 switches Lx.y receive the LSPs as DNR, the LSPs do not reach
    switches Fx.2 and S2.y during flooding. The failure recovery
    mechanism of section 4.1., Flooding Failures, is needed to propagate
    the LSPs by design, but this is clearly thought of as a backup
    mechanism that is not needed for normal operation.
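
The example in comment 7 can be reproduced with a deliberately
simplified flooding model in Python. This is not the algorithm from
section 4 of the draft. It only assumes that T0 switches never reflood
what they receive (everything arrives there as DNR) and that all other
switches reflood on all of their links. The function names are
hypothetical.

```python
from collections import deque


def build_figure2():
    """Adjacency sets for figure 2: Fx.y connects to Lx.* and Sy.*."""
    adj = {}
    for x in (1, 2):
        for y in (1, 2):
            fabric = f"F{x}.{y}"
            for k in (1, 2):
                for peer in (f"L{x}.{k}", f"S{y}.{k}"):
                    adj.setdefault(fabric, set()).add(peer)
                    adj.setdefault(peer, set()).add(fabric)
    return adj


def flood(adj, origins, no_reflood):
    """Switches reached when members of `no_reflood` never pass LSPs on."""
    reached = set(origins)
    queue = deque(origins)
    while queue:
        node = queue.popleft()
        if node in no_reflood and node not in origins:
            continue  # received as DNR: install, but do not reflood
        for peer in adj[node]:
            if peer not in reached:
                reached.add(peer)
                queue.append(peer)
    return reached


if __name__ == "__main__":
    topology = build_figure2()
    # Model the failure of the link between S1.1 and F1.1.
    topology["S1.1"].discard("F1.1")
    topology["F1.1"].discard("S1.1")
    leaves = {s for s in topology if s.startswith("L")}
    reached = flood(topology, ["S1.1", "F1.1"], leaves)
    print(sorted(set(topology) - reached))
```

In this simplified model the new LSPs from S1.1 and F1.1 never reach
F1.2, F2.2, S2.1 and S2.2, which matches the switches named above.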
    
    8.
       Section 5.1., Transit Link Reachability, would benefit from
    a reference to RFC 5837, Extending ICMP for Interface and Next-Hop
    Identification.
    
    9.
       Section 6., Openfabric and Route Aggregation, should disallow
    route summarization. Otherwise the failure of a single link will
    result in traffic black-holing without intra-tier links. See e.g.
    RFC 7938, sections 8.2. and 8.2.1. But intra-tier links are
    disallowed in section 1.5, Sample Network.
    
    Since the reason for disallowing intra-tier links, topology auto-
    detection, is not yet solved (see comment 6. above), you might
    allow the combination of intra-tier links and route summarization.
    I would prefer disallowing both for openfabric, because the added
    complexity of route summarization and its effects on resiliency
    in the case of failures seem a bad trade-off for the reduced
    routing table size.
    
    Thanks for reading this far. :-)
    
    Best regards,
    Erik
    -- 
    Dipl.-Inform. Erik Auerswald         http://www.fg-networking.de/
    [email protected] T:+49-631-4149988-0 M:+49-176-64228513
    
    Gesellschaft für Fundamental Generic Networking mbH
    Geschäftsführung: Volker Bauer, Jörg Mayer
    Gerichtsstand: Amtsgericht Kaiserslautern - HRB: 3630
    
    _______________________________________________
    Isis-wg mailing list
    [email protected]
    https://www.ietf.org/mailman/listinfo/isis-wg
    


_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
