Re: [trill] Tsvart early review of draft-ietf-trill-over-ip-10

Donald Eastlake Tue, 30 Jan 2018 21:26:42 -0800

Hi Magnus,

It has been a while, but the just-posted version -12 is intended to
resolve your comments except those related to middleboxes. (The TRILL
WG has decided middleboxes will be out of scope for this draft.)


Thanks,
Donald
===============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 [email protected]


On Thu, Jun 29, 2017 at 10:17 PM, Donald Eastlake <[email protected]> wrote:
> Hi Magnus,
>
> On Tue, Jun 27, 2017 at 1:13 PM, Magnus Westerlund
> <[email protected]> wrote:
>>
>> Hi Donald,
>>
>> After having read your response, I think there is an important question
>> about the applicability of this document that affects several of the
>> issues below and what solution you need. That is the question of what
>> type of paths one expects TRILL over IP to work over. If the target is
>> the general Internet, including paths through middleboxes such as NATs
>> and firewalls (not intending to block), then there is a lot more work to
>> ensure this. If you, for example, change the applicability to require
>> any on-path middleboxes to fulfill certain requirements, things can be
>> more easily addressed.
>
>
> The use cases in the document support communication over the general
> Internet in the branch office case, but that does not necessarily imply
> NATs/firewalls.
>
>> On 2017-06-26 at 02:07, Donald Eastlake wrote:
>>
>> Hi Magnus,
>>
>> Thanks for the extensive review. See my responses below.
>>
>> On Thu, Jun 15, 2017 at 1:32 PM, Magnus Westerlund
>> <[email protected]> wrote:
>>
>>
>> Diffserv usage
>> --------------
>>
>> Section 4.3:
>>
>>    TRILL over IP implementations MUST support setting the DSCP value in
>>    the outer IP Header of TRILL packets they send by mapping the TRILL
>>    priority and DEI to the DSCP. They MAY support, for a TRILL Data
>>    packet where the native frame payload is an IP packet, mapping the
>>    DSCP in this inner IP packet to the outer IP Header with the default
>>    for that mapping being to copy the DSCP without change.
>>
>> I think it is fine to require that implementations are capable of
>> setting DSCP values on the outer IP header. However, I fail to see any
>> discussion of the potential issues with actually setting the DSCP
>> values. It is one thing to do this in an IP backbone use case, where one
>> can know and control the PHB that the DSCP values map to. But otherwise,
>> over the general Internet, the behavior is not that predictable. One can
>> easily be subject to policers or remapping. Also, as the actual DSCP
>> code point usage is domain specific, this is difficult. Priority
>> reversal is likely the least of the problems that this can run into
>> over the general Internet.
>>
>> It sounds like appropriate discussion and warnings about these issues
>> would resolve the above comment.
>>
>> I would note that the choice of encapsulation here does become
>> important, as in your and Joe Touch's observation that, for TCP, you can
>> only have a single DSCP marking per TCP connection. For other
>> encapsulations, see the discussion of this issue in Section 5.1 of
>> https://datatracker.ietf.org/doc/rfc7657/.
>
>
> Well, if a TRILL over IP implementation using TCP transport wants to have
> more than one priority category for traffic where there might be one or more
> intervening IP routers, which would be the normal case, it would just need a
> TCP connection per priority category. Mapping the 8 priority levels into a
> smaller number of categories is a routine thing to do. Note that the base
> TRILL protocol specification (RFC 6325) says:
>
>    RBridges are not required to implement any
>    particular number of distinct priority levels but may treat one or
>    more adjacent priority levels in the same fashion.
>
>>
>> David Black also raised an important question: whether one should treat
>> this as a tunnel with a single predictable behavior or let the inner
>> network's markings show through. Establishing a tunnel with a single PHB
>> has less risk of running into issues than multiple different markings.
>
>
> It is an implementation choice whether to have a single PHB or eight, one
> per priority level, or something in between.
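The "something in between" option can be illustrated with a minimal sketch that groups the eight TRILL priorities into fewer handling categories and keys one TCP connection (and so one DSCP) per category. The category count, the helper names and the (peer, category) connection key are hypothetical, not taken from the draft; only the lowest-to-highest order 1, 0, 2, ..., 7 comes from its Section 4.3.

```python
from typing import Dict, Tuple

# Default TRILL priority order, lowest to highest (priority 1 sorts below 0).
PRIORITY_ORDER = [1, 0, 2, 3, 4, 5, 6, 7]


def build_category_map(num_categories: int) -> Dict[int, int]:
    """Map each TRILL priority to a handling category (0 = lowest).

    Adjacent priorities share a category, which RFC 6325 permits.
    """
    if not 1 <= num_categories <= len(PRIORITY_ORDER):
        raise ValueError("num_categories must be between 1 and 8")
    return {
        prio: rank * num_categories // len(PRIORITY_ORDER)
        for rank, prio in enumerate(PRIORITY_ORDER)
    }


def connection_key(peer: str, priority: int, cmap: Dict[int, int]) -> Tuple[str, int]:
    """Hypothetical key selecting the TCP connection for a packet.

    All packets in one category share one connection and hence one DSCP,
    so a single TCP stream never carries differently marked packets.
    """
    return (peer, cmap[priority])


cmap = build_category_map(3)
print(cmap)  # → {1: 0, 0: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2}
```

With three categories, priorities 1, 0 and 2 share the lowest connection, so at most three TCP connections per neighbor would be needed.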
>>
>> Section 4.3:
>>
>>    The default TRILL priority and DEI to DSCP mapping, which may be
>>    configured per TRILL over IP port, is as follows. Note that the DEI
>>    value does not affect the default mapping and, to provide a
>>    potentially lower priority service than the default priority 0,
>>    priority 1 is considered lower priority than 0. So the priority
>>    sequence from lower to higher priority is 1, 0, 2, 3, 4, 5, 6, 7.
>>
>>       TRILL Priority  DEI  DSCP Field (Binary/decimal)
>>       --------------  ---  -----------------------------
>>                   0   0/1  001000 / 8
>>                   1   0/1  000000 / 0
>>                   2   0/1  010000 / 16
>>                   3   0/1  011000 / 24
>>                   4   0/1  100000 / 32
>>                   5   0/1  101000 / 40
>>                   6   0/1  110000 / 48
>>                   7   0/1  111000 / 56
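As an illustrative sketch only, the default table can be expressed as a lookup with optional per-port overrides. Priorities 2 to 7 map to the class selectors (priority × 8), while 0 and 1 are swapped so that priority 1 sorts lowest. The function names and the override mechanism are assumptions; `IP_TOS` is only available on platforms that expose it.

```python
import socket
from typing import Dict, Optional

# Default TRILL priority -> outer DSCP mapping from the table above.
# DEI is accepted but ignored, as in the default mapping.
DEFAULT_DSCP: Dict[int, int] = {0: 8, 1: 0, 2: 16, 3: 24, 4: 32, 5: 40, 6: 48, 7: 56}


def outer_dscp(priority: int, dei: int,
               port_overrides: Optional[Dict[int, int]] = None) -> int:
    """Return the DSCP for the outer IP header of a TRILL over IP packet."""
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority must be 0..7 and DEI must be 0 or 1")
    table = dict(DEFAULT_DSCP)
    if port_overrides:
        table.update(port_overrides)  # hypothetical per-port configuration
    return table[priority]


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS / IPv6 Traffic Class."""
    return (dscp << 2) | ecn


def mark_ipv4_socket(sock: socket.socket, dscp: int) -> None:
    """Set the DSCP on an IPv4 socket (IP_TOS support is platform dependent)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))
```

A port that wants to avoid marking priority 0 as CS1 could, hypothetically, pass `{0: 0}` as its override.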
>>
>> This appears to be a problematic mapping, at least for priorities 0
>> and 1. As priority 0 appears to be intended to be higher than priority
>> 1, it is interesting that it is mapped to CS1, which, to quote
>> https://datatracker.ietf.org/doc/rfc7657/:
>>
>> CS1 ('001000') was subsequently designated as the recommended
>>       codepoint for the Lower Effort (LE) PHB [RFC3662].
>>
>> So, in a network using the default mapping, what is proposed can result
>> in priority 0 being treated as lower priority than priority 1. In
>> addition, in some networks CS1 may be subject to remapping that results
>> in an unexpected PHB for it.
>>
>> The intent in the draft is to reflect the default relative priority of
>> the different priority code points in IEEE Std 802.1Q, where priority 1
>> is lower than priority 0. At a quick look, it appears to me that RFC
>> 2474 requires that binary 001000 be handled as being of a priority not
>> lower than the priority with which 000000 is handled. Yet RFC 3662,
>> which you point to, seems to suggest using 001000 as a lower priority
>> code point than 000000. Given that RFC 3662 not only does not update
>> RFC 2474 but is only Informational, while RFC 2474 is Standards Track, I
>> would say that RFC 2474 dominates and that this draft makes the best
>> assumptions it can about default behavior...
>>
>>
>> David Black provided a good answer on this.
>
>
> I'll reply to him.
>>
>> MTU and Fragmentation
>> ---------------------
>>
>> I think there are two main issues here. The first one is discovery of
>> the actual IP path MTU between the ports. That will be needed to prevent
>> a lot of traffic going into MTU black holes, especially as TRILL
>> requires 1470-byte support, which is likely above what a lot of paths
>> support.
>>
>> Seems like it would depend on the environments where TRILL was used.
>> For example, I do not think 1470 would be a problem in most Data
>> Center or Internet Exchange Point uses. Data Centers sometimes support
>> 9K jumbo frames and the like.
>>
>> In fact, it is probably bad to focus too much on 1470 -- that is a
>> required minimum to be sure that reasonable size link state PDUs can
>> be successfully flooded through the TRILL campus so that routing will
>> work. However, it would commonly be the case that, for the TRILL
>> campus to be useful in a particular case, links need to be able to
>> carry the expected size TRILL Data packets. For example, if there were
>> two parts of a TRILL campus connected by one or a few TRILL over IP
>> links and the end stations in each part were assuming they could use
>> 1500 byte Ethernet packets, then the TRILL over IP links would need to
>> support an MTU based on 1500 + TRILL Header + IP and TRILL over IP
>> encapsulation. And more if security was being used or there were any
>> other reasons for additional headers/encapsulation...
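The "1500 + TRILL Header + IP and TRILL over IP encapsulation" budget can be worked through in a short sketch. It assumes the native encapsulation is IP | UDP | TRILL Header | inner frame, a 6-byte TRILL Header with no options, no IP options and no security; these are assumptions for illustration, not values quoted from the draft.

```python
# Assumed header sizes in bytes: no IP options, no TRILL Header options,
# no security, and the native encapsulation taken to be IP | UDP | TRILL.
INNER_MACS = 6 + 6          # Inner.MacDA + Inner.MacSA
VLAN_TAG = 4                # Inner.VLAN tag
FGL_TAGS = 8                # a Fine Grained Label uses two 4-byte tags (RFC 7172)
ETHERTYPE = 2
TRILL_HEADER = 6
UDP_HEADER = 8
IPV4_HEADER = 20
IPV6_HEADER = 40


def required_ip_mtu(native_payload: int = 1500, ip_header: int = IPV4_HEADER,
                    fgl: bool = False, extra: int = 0) -> int:
    """IP MTU needed to carry a native frame payload of the given size.

    `extra` covers any further encapsulation, such as security headers.
    """
    label = FGL_TAGS if fgl else VLAN_TAG
    inner = INNER_MACS + label + ETHERTYPE + native_payload
    return inner + TRILL_HEADER + UDP_HEADER + ip_header + extra


print(required_ip_mtu())                        # → 1552
print(required_ip_mtu(ip_header=IPV6_HEADER))   # → 1572
print(required_ip_mtu(fgl=True))                # → 1556
```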
>>
>>
>> Yes, and over the general Internet you should be happy if you get 1500
>> bytes of IP MTU; it may easily be lower with a couple of additional
>> tunnel headers. Thus, what you describe as the goal is not feasible
>> without a solution that supports fragmentation and reassembly, enabling
>> one TRILL packet to be sent in multiple IP packets. Reassembly requires
>> buffering and is not something easily performed on a router fast path.
>> And attempting to use IP fragmentation is likely doomed if you have any
>> type of NAT or firewall in the way.
>>
>> This points to a dedicated solution or using a transport protocol that
>> supports carrying arbitrary data sizes, like TCP or SCTP. And you need to
>> use the byte-stream API of TCP to achieve this.
>
>
> OK.
>>
>> Section 8.4:
>>
>>    Path MTU discovery [RFC4821] should be useful
>>    in determining the IP MTU between a pair of RBridge ports with IP
>>    connectivity.
>>
>> The issue with RFC 4821 is that it places requirements on the
>> packetization layer. TRILL appears to have several components that are
>> useful. However, a specification of the procedure will be required for
>> this to result in a useful tool.
>>
>> See below.
>>
>> Section 8.4:
>>
>>    TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in
>>    [RFC7177], can be used to obtain added assurance of the MTU of a
>>    link.
>>
>> Yes, that can confirm working MTUs that are 1470 or above, but it
>> appears to be prevented from working below 1470?
>>
>> While there is a minimum size for TRILL IS-IS MTU PDUs, determined by
>> header size, it is well below 1470, probably (depending on whether
>> security is in use, etc.) below 150 bytes.
>>
>>
>> Okay, if you say so; it was not obvious from the spec that it was
>> allowed to probe for paths with MTUs smaller than 1470.
>>
>> Thus, it appears that there is a lack of mechanism here to actually get
>> a valid and functional MTU from TRILL in the cases where the path MTU is
>> below 1470. If I am wrong, good, but I think this is an important piece
>> for how to handle the next main issue.
>>
>> How about referencing Section 3 of
>> https://tools.ietf.org/html/draft-ietf-trill-mtu-negotiation-05
>> which is currently in IETF Last Call? (The wording of that section is
>> probably going to be improved based on an OPS review by Brian
>> Carpenter.)
>>
>> I looked at this, and it appears to have the same issue, that it can't
>> probe for MTU values below 1470.
>
>
> I think the thing was that, before TRILL over IP, it would not have been
> useful to determine an MTU below 1470. But there is no particular problem
> in constructing smaller MTU-probe PDUs, and an RBridge receiving such a
> PDU is generally required to respond with an equal-length MTU-ack.
>>
>>    2) RB1 tries to send an MTU-probe padded to the size 1470.
>>
>>       a) If RB1 fails to receive an MTU-ack from RB2 after k tries, RB1
>>          sets the "failed minimum MTU test" flag for RB2 in RB1's Hello
>>          and stop.
>>
>>
>> But the algorithm clearly performs a binary search for the MTU. If one
>> looks at RFC 4821, one will notice that there are some additional
>> considerations there on how to make the probing better and more robust.
>> But clearly TRILL has other criteria for what is a success.
>> Verification that Sz works appears sufficient, and there is no need to
>> probe further upwards.
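A sketch of such a search, extended so it can also settle on a value below 1470, which Donald notes is possible with smaller MTU-probe PDUs. `probe(size)` is a hypothetical callback that sends an MTU-probe padded to `size` and reports whether a matching MTU-ack arrived; the floor, the retry count k and the 8-byte stopping granularity are illustrative assumptions. RFC 4821 discusses more robust search strategies.

```python
from typing import Callable, Optional

Probe = Callable[[int], bool]


def probe_with_retries(probe: Probe, size: int, k: int) -> bool:
    """Treat `size` as failed only after k unanswered MTU-probes."""
    return any(probe(size) for _ in range(k))


def find_link_mtu(probe: Probe, target: int = 1470, floor: int = 128,
                  k: int = 3, granularity: int = 8) -> Optional[int]:
    """Return a working MTU, or None if even `floor` fails.

    If `target` (Sz) works, stop there: there is no need to probe upwards.
    Otherwise binary-search between a known-good and a known-bad size.
    """
    if probe_with_retries(probe, target, k):
        return target
    if not probe_with_retries(probe, floor, k):
        return None
    good, bad = floor, target          # invariant: good works, bad fails
    while bad - good > granularity:
        mid = (good + bad) // 2
        if probe_with_retries(probe, mid, k):
            good = mid
        else:
            bad = mid
    return good
```

For a simulated path that only passes probes of 1400 bytes or less, `find_link_mtu(lambda size: size <= 1400)` returns a working size within 8 bytes below 1400.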
>>
>> UDP encapsulation and IP fragments
>> ----------------------------------
>>
>> I see it as a big issue that UDP encapsulation is the native one, as it
>> relies on IP fragmentation despite the need for reliable fragmentation.
>> With the requirement to support a 1470-byte MTU at the TRILL level, some
>> packets will be fragmented in many environments. That will lead to a lot
>> of losses and, as discussed below, a very big problem with middleboxes.
>> The main problem here is that if one tries to rely on IP fragments, one
>> will have issues with packets ending up in black holes, and different
>> problems depending on IPv4 or IPv6. IPv6 is likely the lesser problem,
>> assuming that one has working PMTUD.
>>
>> There are several ways out of this:
>>
>> 1. Detect issues and use TCP encapsulation with a correctly set MSS to
>>    avoid IP fragments.
>> 2. Determine the MTU and implement a fragmentation mechanism on top of
>>    UDP.
>>
>> So, I don't see that much problem with UDP being the general default
>> consistent with the TRILL philosophy of defaulting to need zero or
>> minimal configuration. The default should be to use multicast Hellos
>> for discovery of neighbors which sure points at UDP to me. Having to
>> traverse a NAT should be a rare case. Since, in the NAT case, you have
>> to configure things related to the static binding and the IP
>> address(es) of peer(s) anyway you can also configure to use a
>> different encapsulation than UDP, such as TCP, at the same time. I
>> don't see it as much of a problem if, by default, TRILL won't operate
>> through a NAT. If you are using UDP and it fragments and fragments are
>> dropped at a NAT, probably you can't exchange Hellos so you will not
>> form an adjacency and anything on the other side of the NAT will not
>> be visible.
>>
>>
>> Yes, but this is the issue of applicability and documenting that
>> applicability. I don't know what goals and requirements exist for
>> TRILL. If the WG is fine with some restrictions, then document them and
>> focus on solving the issues that must be solved.
>>
>> You can clearly choose to require TCP for cases where the IP MTU is
>> insufficient for carrying Sz-sized TRILL packets between the RBridges
>> using UDP.
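A minimal sketch of that rule, assuming the native encapsulation adds only an IP header and an 8-byte UDP header to the TRILL packet, and ignoring TCP options when computing the MSS. The function names are hypothetical. Under these assumptions, IPv6 over a 1500-byte path would already need TCP.

```python
TRILL_SZ = 1470   # minimum TRILL MTU (Sz) that must be supported
UDP_HEADER = 8
TCP_HEADER = 20   # without TCP options


def udp_fits(path_mtu: int, ip_header: int = 20, trill_sz: int = TRILL_SZ) -> bool:
    """True if an Sz-sized TRILL packet fits in one unfragmented IP/UDP packet."""
    return trill_sz + UDP_HEADER + ip_header <= path_mtu


def choose_encapsulation(path_mtu: int, ip_header: int = 20) -> str:
    """Default to UDP; fall back to TCP when UDP would need IP fragmentation."""
    return "udp" if udp_fits(path_mtu, ip_header) else "tcp"


def tcp_mss(path_mtu: int, ip_header: int = 20) -> int:
    """MSS that keeps each TCP segment within the path MTU (no options)."""
    return path_mtu - ip_header - TCP_HEADER


print(choose_encapsulation(1500))                 # → udp  (1498 <= 1500)
print(choose_encapsulation(1500, ip_header=40))   # → tcp  (1518 > 1500)
print(tcp_mss(1500))                              # → 1460
```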
>
>
> OK.
>>
>> Zero Checksum:
>> --------------
>>
>> Section 5.4:
>>
>> UDP Checksum - as specified in [RFC0768]
>>
>> Considering the desire for fast-path encapsulation, I am surprised not
>> to see any mention of the use of a zero checksum here. Raising the zero
>> checksum here, with a forward reference, would be good, I think.
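For reference, a sketch of the per-packet work a zero checksum would skip: the normal UDP checksum over IPv4 (one's complement sum over the pseudo-header and segment). The helper names are hypothetical. Sending zero instead avoids this computation, which is why meeting the RFC 6936 requirements matters.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum (RFC 1071), padding odd lengths with zero."""
    if len(data) % 2:
        data = data + b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def ipv4_pseudo_header(src: bytes, dst: bytes, udp_length: int) -> bytes:
    """Source, destination, zero byte, protocol 17 (UDP), UDP length."""
    return src + dst + struct.pack("!BBH", 0, 17, udp_length)


def udp_checksum_ipv4(src: bytes, dst: bytes, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is currently zero.

    A computed value of 0 is sent as 0xFFFF, because a transmitted 0 means
    "no checksum" for UDP over IPv4 (RFC 768).
    """
    pseudo = ipv4_pseudo_header(src, dst, len(segment))
    csum = ~ones_complement_sum16(pseudo + segment) & 0xFFFF
    return csum or 0xFFFF
```

A receiver can verify a checksummed segment by checking that the sum over the pseudo-header and segment, checksum included, comes out as 0xFFFF.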
>>
>> And then Section 8.5:
>>
>>    The requirements for the usage of the zero UDP Checksum in a UDP
>>    tunnel protocol are detailed in [RFC6936]. These requirements apply
>>    to the UDP based TRILL over IP encapsulations specified herein
>>    (native and VXLAN), which are applications of UDP tunnel.
>>
>> If you actually intended to allow a zero checksum, then you should
>> document that TRILL fulfills the requirements that the applicability
>> statement raises. I have not analyzed how well it meets these
>> requirements.
>>
>> Please review Section 6.2 of RFC 8086 for an example of how that can be
>> done.
>>
>> OK. We'll look into it.
>>
>> TCP Encapsulation issue
>> -----------------------
>>
>> Section 5.6:
>>
>> The TCP encapsulation appears to be missing a delimiter format allowing
>> each individual TRILL packet/payload to be read out of TCP's byte
>> stream. In other words, a normal implementation has no way of ensuring
>> that the TCP payload starts with the start of a new TRILL payload.
>> Multiple small TRILL payloads may be included in the same TCP payload,
>> and a TCP payload may also contain only part of a TRILL payload, since
>> TCP is one way of dealing with TRILL packets that are larger than the
>> IP+encapsulation MTU that will actually work.
>>
>> This comment is based on the observation that there appear to be no
>> length fields included in the TRILL header. The most straightforward
>> delimiter is a 2-byte length field for the TRILL payload to be
>> encapsulated.
>>
>> Right. It might also be useful to include some sort of check field, as
>> is done in BGP, to detect if you are out of sync in parsing the TCP
>> stream.
>>
>> As you need to actually perform re-assembly, the solution is to use the
>> byte stream semantics the TCP API provides and have a framing for each
>> packet.
>
>
> Of course.
>
> My point was that the framing might usefully have some sort of flag field,
> like the BGP framing has, so that there was a good chance of detecting if
> the parsing of the byte stream into frames has gotten out of synch.
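A minimal sketch of such framing, assuming a hypothetical 4-byte frame header: a 2-byte sync marker (in the spirit of the BGP marker) followed by a 2-byte length of the TRILL packet. The marker value, header layout and class names are illustrative, not the draft's format. A marker only detects loss of sync with high probability; it cannot prove the stream is in sync.

```python
import struct
from typing import List

# Hypothetical frame header: 2-byte sync marker, then 2-byte TRILL packet length.
MARKER = b"TR"
HEADER = struct.Struct("!2sH")


class OutOfSync(Exception):
    """Raised when the byte stream no longer starts with a frame header."""


def frame(trill_packet: bytes) -> bytes:
    if len(trill_packet) > 0xFFFF:
        raise ValueError("TRILL packet too long for a 2-byte length field")
    return HEADER.pack(MARKER, len(trill_packet)) + trill_packet


class Deframer:
    """Reassembles TRILL packets from arbitrary chunks of a TCP byte stream."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        self._buf += chunk
        packets = []
        while len(self._buf) >= HEADER.size:
            marker, length = HEADER.unpack_from(self._buf)
            if marker != MARKER:
                raise OutOfSync("bad marker; the connection should be reset")
            end = HEADER.size + length
            if len(self._buf) < end:
                break                      # wait for the rest of this packet
            packets.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return packets
```

On the receive side, each `sock.recv()` result would be passed to `feed()`, which returns zero or more complete TRILL packets regardless of how TCP segmented the stream.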
>>
>> Another point is that, while with UDP it seems fine to send packets
>> with assorted QoS, you don't want to encourage re-ordering of TCP
>> packets in a stream. So if TCP encapsulation is being used, you want
>> to use the same DSCP value for the packets in a particular TCP stream.
>> So, generally, you need to have a TCP connection per priority handling
>> category. Mapping the 8 priority levels into a smaller number of
>> handling categories is a normal thing to do so you certainly don't
>> necessarily need 8 TCP connections. Adding material on this should not
>> be too hard.
>>
>>
>> Yes, agreed, it is a possibility, and it points to the possible
>> considerations that David raised.
>>
>> Section 5.6:
>>
>> TCP endpoint requirements: I do wonder whether an application like
>> TRILL actually needs to discuss performance-impacting implementation
>> choices or limitations, for example the use of Nagle's algorithm, or the
>> requirements on buffer sizes in relation to bandwidth-delay products, as
>> buffer memory in an RBridge will impact performance.
>>
>> Well, I'm not sure how deeply this document should get into such
>> performance issues. What about just saying something about
>> consideration being given to tuning TCP for performance and pointing
>> to one or a few other RFCs that talk about this?
>>
>>
>> As Joe said, these are important considerations. If your intention is
>> to enable this to run at substantial fractions of the interfaces' line
>> rates, then this does require consideration.
>
>
> I see.
>>
>> Congestion Control
>> ------------------
>> First, thanks for the effort here.
>>
>> You're welcome.
>>
>> 8.1.2 In Other Environments
>>
>>    Where UDP based encapsulation headers are used in TRILL over IP in
>>    environments other than those discussed in Section 8.1.1, specific
>>    congestion control mechanisms are commonly needed.  However, if the
>>    traffic being carried by the TRILL over IP link is already congestion
>>    controlled and the size and volatility of the TRILL IS-IS link state
>>    database is limited, then specific congestion control may not be
>>    needed. See [RFC8085] Section 3.1.11 for further guidance.
>>
>> This is correct; however, my question is whether the RBridges have any
>> way of knowing which traffic is actually congestion controlled,
>> considering that TRILL provides a layer 2 abstraction. I wonder if there
>> should be some type of whitelist of the layer 2 payload types that can
>> be assumed to be congestion controlled, and thus okay to forward over IP
>> paths. I am worried that the lack of any recommendation to prevent
>> traffic that is not congestion controlled from being forwarded can lead
>> to congestion issues.
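To make the whitelist idea concrete, here is a sketch that classifies an inner native frame by Ethertype and IP protocol number. The whitelist contents, the constants and the function name are assumptions, not a recommendation from either party; it deliberately errs on the side of treating traffic as not congestion controlled.

```python
from typing import FrozenSet

ETH_IPV4 = 0x0800
ETH_IPV6 = 0x86DD

# Hypothetical whitelist of IP protocols assumed to be congestion controlled:
# TCP (6), DCCP (33), SCTP (132). UDP (17) is excluded even though some UDP
# applications (e.g. QUIC) are congestion controlled, since that cannot be
# seen from the protocol number alone.
CC_PROTOCOLS: FrozenSet[int] = frozenset({6, 33, 132})


def is_congestion_controlled(ethertype: int, l3: bytes) -> bool:
    """Classify an inner native frame by Ethertype and IP protocol."""
    if ethertype == ETH_IPV4 and len(l3) >= 20:
        return l3[9] in CC_PROTOCOLS          # IPv4 Protocol field
    if ethertype == ETH_IPV6 and len(l3) >= 40:
        # Only the first Next Header is checked; extension headers are not
        # walked in this sketch.
        return l3[6] in CC_PROTOCOLS
    return False                              # ARP, non-IP, truncated, ...
```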
>>
>> The other issue I think may exist is the one that serial unicast
>> emulation of broadcast/multicast creates. As this multiplies the
>> outgoing packet rate by the number of addresses configured for serial
>> unicast, it can be a significant traffic expansion. Thus, I think
>> additional considerations are needed here, and maybe rate limiting of
>> the amount of traffic to be multicast.
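One way to picture such rate limiting is a token bucket that is charged for every unicast copy, so the limit applies to the amplified rate rather than to the original multi-destination packet. The rate, burst size, all-or-nothing policy and names below are assumptions for illustration.

```python
import time
from typing import Callable, Iterable


class TokenBucket:
    """Byte-based token bucket; `clock` is injectable for testing."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)
        self.clock = clock
        self.last = clock()

    def allow(self, nbytes: int) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


def serial_unicast(bucket: TokenBucket, packet_len: int, peers: Iterable[str],
                   send: Callable[[str], None]) -> bool:
    """Send one copy per peer, charging the bucket for all copies at once.

    All-or-nothing is a design choice here: rather than reaching only some
    neighbors, the whole set of copies is dropped when over the limit.
    """
    targets = list(peers)
    if not bucket.allow(packet_len * len(targets)):
        return False
    for peer in targets:
        send(peer)
    return True
```

Note that with this policy a packet whose total copy size exceeds the burst size can never be sent, so the burst would need to be sized for the configured number of serial unicast addresses.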
>>
>> OK. We can think about those issues.
>>
>> Flow and ECMP
>> -------------
>>
>> Section 8.3:
>>
>>    For example, for TRILL Data, this entropy field could be based on
>>    some hash of the Inner.MacDA, Inner.MacSA, and Inner.VLAN or
>>    Inner.FGL.
>>
>> I would appreciate clearer references to what these fields are.
>>
>> In a TRILL Data packet, the payload after the TRILL Header looks like
>> an Ethernet frame except that there is always either a VLAN tag or,
>> alternatively, where the VLAN tag would be, a Fine Grained Label
>> [RFC7172]. (The preceding is the view in the TRILL RFCs, but there is
>> an equivalent and equally valid view in which all the fields through
>> and including the VLAN or FGL tag are part of the TRILL Header.) The
>> TRILL base protocol specification focuses on Ethernet as a link
>> technology between TRILL switches, in which case there will be a link
>> header including an Outer.MacDA and Outer.MacSA fields and possibly an
>> Outer.VLAN, all before the TRILL Header. See Figure 1 and Figure 2 in
>> RFC 7172.
>>
>> Some of the above could be added to the draft for clarity.
>>
>> If I understand this correctly, the idea here is to look into the inner
>> layer 2 frames, use the flow equivalents that exist at that level, and
>> hash them into a value that maps the flows onto the source port range.
>>
>> Yes.
>>
>> I think this text should include a summary of the principle and note
>> the important requirement that packets of what is considered a flow in
>> the inner frames must not be striped over multiple source ports, as this
>> may lead to reordering issues due to packets taking different paths.
>>
>> Well, we can add some text. But when would the relative ordering
>> matter for two TRILL Data packets where the two inner native payloads
>> have different values for any one or more of these three fields
>> (Inner.MacDA, Inner.MacSA, and inner VLAN/FGL tag) ? If any of those
>> fields are different, you are talking about different streams.
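A sketch of the entropy mapping under discussion: hash Inner.MacDA, Inner.MacSA and the VLAN or FGL value into a UDP source port. The use of the dynamic port range and of CRC-32 as the hash are assumptions. A stable hash is used because Python's built-in `hash()` is randomized per process for bytes, and every packet of a stream must get the same port.

```python
import zlib

# Assumed range for the entropy-carrying UDP source port (the dynamic range).
PORT_MIN, PORT_MAX = 49152, 65535


def entropy_source_port(inner_mac_da: bytes, inner_mac_sa: bytes,
                        label: int, is_fgl: bool = False) -> int:
    """Map an inner stream (Inner.MacDA, Inner.MacSA, VLAN or FGL) to a port.

    The same three fields always give the same port, so a stream stays on
    one ECMP path and is not reordered.
    """
    if len(inner_mac_da) != 6 or len(inner_mac_sa) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    # The leading byte keeps VLAN n and FGL n from hashing identically.
    key = bytes([is_fgl]) + inner_mac_da + inner_mac_sa + label.to_bytes(3, "big")
    return PORT_MIN + zlib.crc32(key) % (PORT_MAX - PORT_MIN + 1)
```

Since packets of different streams differ in at least one of these fields, spreading them over different ports does not reorder any single stream.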
>>
>>
>> Okay, then this is very straightforward.
>>
>> NAT and TRILL over IP:
>> Section 8.5:
>>
>> If one would like to use TRILL over IP through a NAT, then there are
>> some very important considerations that are missing. First, there is the
>> need for static binding configurations, or the need to determine one's
>> external address(es) and communicate them to the peer RBridges; in
>> addition, one must ensure that there are keep-alives so that the NAT
>> binding never times out.
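A minimal sketch of keep-alive timing for that binding, assuming the NAT's binding timeout is known, for example by configuration. Any packet sent on the binding refreshes it, so `last_sent_s` is assumed to be updated on every transmission (Hellos and data included), not only on keep-alives. The names, the 0.5 margin and the 15-second minimum interval are hypothetical defaults.

```python
def keepalive_interval(binding_timeout_s: float, margin: float = 0.5,
                       min_interval_s: float = 15.0) -> float:
    """Interval between keep-alives that refresh a NAT binding.

    Sends at a fraction (`margin`) of the binding timeout, but not more often
    than `min_interval_s`. Both defaults are illustrative assumptions.
    """
    interval = max(min_interval_s, binding_timeout_s * margin)
    if interval >= binding_timeout_s:
        raise ValueError("binding timeout too short for this minimum interval")
    return interval


def keepalive_due(now_s: float, last_sent_s: float, interval_s: float) -> bool:
    """True if nothing (Hello, data or keep-alive) was sent for `interval_s`."""
    return now_s - last_sent_s >= interval_s
```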
>>
>> I think those are good points. There is an additional problem that
>> TRILL Hellos detect neighbors with which they have 2-way connectivity
>> by indicating, inside the Hellos that are sent, from what neighbors
>> Hellos have been received on that port. If a NAT is involved, these
>> neighbor addresses inside Hellos need to be mapped.
>>
>> Yes, and the question is how that can be handled: by the receiver of
>> the packet, or by the sender determining what address it uses and
>> providing that in the Hellos. If the former is possible, that can
>> simplify a lot.
>
>
> I'm not sure. This would require a little detailed design work.
>>
>> Next is the issue that there is almost zero chance of getting an
>> IP/UDP-encapsulated TRILL payload through the NAT if it results in IP
>> fragmentation, as NATs don't defragment and refragment on the internal
>> side, and a non-initial IP fragment lacks the UDP ports and thus can't be
>> matched to a binding.
>>
>> So perhaps the recommendation should be to configure the port to use
>> TCP if there will be fragmentation.
>>
>> Yes, I think that is likely the simplest solution for you.
>
>
> OK
>
> Thanks,
> Donald
>
>> Also, if you would like to run IP/ESP through a NAT, then you most
>> likely need the IP/UDP/ESP encapsulation
>> (https://tools.ietf.org/html/rfc3948). Note that this will restrict the
>> MTU even further and thus ensure that the 1470 requirement cannot be
>> fulfilled over a 1500-byte MTU Ethernet infrastructure, even without
>> additional tunnels.
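A rough size budget for this point, under stated assumptions: an IPv4 outer header, UDP encapsulation of ESP as in RFC 3948, ESP in transport mode with AES-GCM (8-byte IV, 16-byte ICV), padding to a 4-byte boundary, and TRILL carried over UDP inside ESP. Other ciphers, such as AES-CBC with a 16-byte IV and block-size padding, leave even less room. The function names are hypothetical.

```python
def esp_in_udp_size(trill_len: int, inner_transport: int = 8, outer_ip: int = 20,
                    nat_t_udp: int = 8, esp_header: int = 8, iv: int = 8,
                    icv: int = 16, align: int = 4) -> int:
    """IPv4 packet size for IP/UDP/ESP carrying <inner transport>/TRILL.

    Defaults assume AES-GCM (8-byte IV, 16-byte ICV), padding to a 4-byte
    boundary, and an inner UDP header (transport mode).
    """
    # Encrypted part: inner transport header + TRILL + Pad Length + Next Header
    plaintext = inner_transport + trill_len + 2
    padding = (-plaintext) % align
    return outer_ip + nat_t_udp + esp_header + iv + plaintext + padding + icv


def max_trill_len(mtu: int = 1500, **kwargs: int) -> int:
    """Largest TRILL packet whose encapsulation still fits within `mtu`."""
    n = 0
    while esp_in_udp_size(n + 1, **kwargs) <= mtu:
        n += 1
    return n


print(max_trill_len())                     # → 1430, short of 1470
print(max_trill_len(inner_transport=20))   # → 1418 bytes per TCP segment
```

With an inner TCP header instead, as in the IP/UDP/ESP/TCP/TRILL configuration suggested below, TCP's byte stream means a 1470-byte TRILL packet no longer has to fit in a single IP packet; the 1418 figure is then roughly the payload per segment before TCP options.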
>>
>> I would note that firewalls also likely have issues with IP fragments
>> for the same reason: they require a significant amount of state to
>> verify whether fragments should be let through.
>>
>> In general, I think you should create a configuration that has a chance
>> to work through most middleboxes, but I think you should require static
>> bindings. I think that configuration is, and don't laugh now,
>> IP/UDP/ESP/TCP/TRILL; otherwise you will not be able to have both
>> security and reliable fragmentation of TRILL packets.
>>
>> OK. Thanks again for this review. It has pointed out a number of
>> problems and in thinking about those, I believe a couple of further
>> problems have come to mind that I mentioned above. We'll work on a
>> revised draft.
>>
>>
>> Cheers
>>
>> Magnus Westerlund
>>
>> ----------------------------------------------------------------------
>> Media Technologies, Ericsson Research
>> ----------------------------------------------------------------------
>> Ericsson AB                 | Phone  +46 10 7148287
>> Torshamnsgatan 23           | Mobile +46 73 0949079
>> SE-164 80 Stockholm, Sweden | mailto: [email protected]
>> ----------------------------------------------------------------------
>
>

_______________________________________________
trill mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trill
