Hi Magnus,

It has been a while, but the just-posted version -12 is intended to resolve your comments except those related to middleboxes. (The TRILL WG has decided that middleboxes will be out of scope for this draft.)

Thanks,
Donald
===============================
Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
155 Beaver Street, Milford, MA 01757 USA
d3e...@gmail.com

On Thu, Jun 29, 2017 at 10:17 PM, Donald Eastlake <d3e...@gmail.com> wrote:
> Hi Magnus,
>
> On Tue, Jun 27, 2017 at 1:13 PM, Magnus Westerlund <magnus.westerl...@ericsson.com> wrote:
>>
>> Hi Donald,
>>
>> After having read your response, I think there is an important question about the applicability of this document that affects several of the issues below and what solution you need. That is the question of what type of paths one expects to get TRILL over IP working over. If the target is the general Internet, including paths through middleboxes such as NATs and firewalls (not intending to block), then there is a lot more work to ensure this. If you, for example, change the applicability to require any on-path middleboxes to fulfill certain requirements, things can be more easily addressed.
>
> The use cases in the document support communication over the general Internet in the branch office case, but that does not necessarily imply NATs/firewalls.
>
>> On 2017-06-26 at 02:07, Donald Eastlake wrote:
>>
>> Hi Magnus,
>>
>> Thanks for the extensive review. See my responses below.
>>
>> On Thu, Jun 15, 2017 at 1:32 PM, Magnus Westerlund <magnus.westerl...@ericsson.com> wrote:
>>
>> Diffserv usage
>> --------------
>>
>> Section 4.3:
>>
>>    TRILL over IP implementations MUST support setting the DSCP value in the outer IP Header of TRILL packets they send by mapping the TRILL priority and DEI to the DSCP. They MAY support, for a TRILL Data packet where the native frame payload is an IP packet, mapping the DSCP in this inner IP packet to the outer IP Header with the default for that mapping being to copy the DSCP without change.
>>
>> I think it is fine to require that implementations are capable of setting DSCP values on the outer IP header. However, I fail to see any discussion of the potential issues with actually setting the DSCP values. It is one thing to do this in an IP backbone use case, where one can know and control the PHB that the DSCP values map to. Over the general Internet, however, the behavior is not that predictable. One can easily be subject to policers or remapping. Also, as DSCP code point usage is domain specific, this is difficult. Priority reversal is likely the least of the problems that this can run into over the general Internet.
>>
>> It sounds like appropriate discussion and warnings about these issues would resolve the above comment.
>>
>> I would note that the choice of encapsulation does become important here; consider, for example, your and Joe Touch's observation that, for TCP, you can only have a single DSCP marking per TCP connection. For other encapsulations, see the discussion in Section 5.1 of https://datatracker.ietf.org/doc/rfc7657/ on this issue.
>
> Well, if a TRILL over IP implementation using TCP transport wants to have more than one priority category for traffic where there might be one or more intervening IP routers, which would be the normal case, it would just need a TCP connection per priority category. Mapping the 8 priority levels into a smaller number of categories is a routine thing to do. Note that the base TRILL protocol specification (RFC 6325) says:
>
>    RBridges are not required to implement any particular number of distinct priority levels but may treat one or more adjacent priority levels in the same fashion.
>
>> David Black also raised an important question about whether one should treat this as a tunnel with a single predictable behavior or let the inner network's markings show through. Establishing a tunnel with a single PHB has less risk of running into issues than multiple different markings.
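
To make the "TCP connection per priority category" idea concrete, here is a minimal sketch, not taken from the draft, of grouping the eight TRILL priorities into a smaller number of handling categories. It assumes the lowest-to-highest order given in the draft's Section 4.3 (1, 0, 2, 3, 4, 5, 6, 7) and groups adjacent levels, as RFC 6325 permits; the helper name and the even split are hypothetical choices.

```python
# Lowest-to-highest priority order from the draft's Section 4.3: priority 1 is
# treated as lower than the default priority 0, as in IEEE Std 802.1Q.
PRIORITY_ORDER = [1, 0, 2, 3, 4, 5, 6, 7]


def priority_to_category(priority: int, num_categories: int) -> int:
    """Map a TRILL priority (0-7) to a handling category in [0, num_categories).

    Adjacent priority levels, in PRIORITY_ORDER, share a category; category 0
    is the lowest. Each category could then be carried on its own TCP
    connection marked with a single DSCP. (Hypothetical helper.)
    """
    if not 0 <= priority <= 7:
        raise ValueError("TRILL priority must be in 0..7")
    if not 1 <= num_categories <= 8:
        raise ValueError("num_categories must be in 1..8")
    rank = PRIORITY_ORDER.index(priority)  # 0 = lowest, 7 = highest
    return rank * num_categories // 8


# With three categories: priorities 1, 0, 2 -> 0; 3, 4, 5 -> 1; 6, 7 -> 2.
print([priority_to_category(p, 3) for p in range(8)])  # [0, 0, 0, 1, 1, 1, 2, 2]
```

With three categories, an implementation might open three TCP connections, each using one DSCP, so no single connection mixes markings; how many categories to use remains an implementation choice.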
>
> It is an implementation choice whether to have a single PHB or eight, one per priority level, or something in between.
>>
>> Section 4.3:
>>
>>    The default TRILL priority and DEI to DSCP mapping, which may be configured per TRILL over IP port, is as follows. Note that the DEI value does not affect the default mapping and, to provide a potentially lower priority service than the default priority 0, priority 1 is considered lower priority than 0. So the priority sequence from lower to higher priority is 1, 0, 2, 3, 4, 5, 6, 7.
>>
>>    TRILL Priority   DEI   DSCP Field (Binary/Decimal)
>>    --------------   ---   ---------------------------
>>          0          0/1        001000 / 8
>>          1          0/1        000000 / 0
>>          2          0/1        010000 / 16
>>          3          0/1        011000 / 24
>>          4          0/1        100000 / 32
>>          5          0/1        101000 / 40
>>          6          0/1        110000 / 48
>>          7          0/1        111000 / 56
>>
>> This appears to be a problematic mapping, at least for priorities 0 and 1. As priority 0 appears to be intended to be higher than priority 1, it is interesting that it is mapped to CS1, about which https://datatracker.ietf.org/doc/rfc7657/ says:
>>
>>    CS1 ('001000') was subsequently designated as the recommended codepoint for the Lower Effort (LE) PHB [RFC3662].
>>
>> So, in a network using the default mapping, what is proposed can result in priority 0 being treated as lower priority than priority 1. In addition, in some networks this can result in remapping that gives CS1 a different PHB than intended.
>>
>> The intent in the draft is to reflect the default relative priority of the different priority code points in IEEE Std 802.1Q, where priority 1 is lower than priority 0. At a quick look, it appears to me that RFC 2474 requires that '001000' be handled as being of a priority not lower than the priority with which '000000' is handled. Yet RFC 3662, which you point to, seems to suggest using '001000' as a lower priority code point than '000000'. Given that RFC 3662 not only does not update RFC 2474 but is only Informational, while RFC 2474 is Standards Track, I would say that RFC 2474 dominates and that this draft makes the best assumptions it can about default behavior...
>>
>> David Black provided a good answer on this.
>
> I'll reply to him.
>>
>> MTU and Fragmentation
>> ---------------------
>>
>> I think there are two main issues here. The first one is discovery of the actual IP path MTU between the ports. That will be needed to prevent a lot of traffic going into MTU black holes, especially as TRILL requires support for 1470 bytes, which is likely above what a lot of paths support.
>>
>> Seems like it would depend on the environments where TRILL was used. For example, I do not think 1470 would be a problem in most Data Center or Internet Exchange Point uses. Data Centers sometimes support 9K jumbo frames and the like.
>>
>> In fact, it is probably bad to focus too much on 1470; that is a required minimum to be sure that reasonable size link state PDUs can be successfully flooded through the TRILL campus so that routing will work. However, it would commonly be the case that, for the TRILL campus to be useful in a particular case, links need to be able to carry the expected size TRILL Data packets. For example, if there were two parts of a TRILL campus connected by one or a few TRILL over IP links and the end stations in each part were assuming they could use 1500 byte Ethernet packets, then the TRILL over IP links would need to support an MTU based on 1500 + TRILL Header + IP and TRILL over IP encapsulation. And more if security was being used or there were any other reasons for additional headers/encapsulation...
>>
>> Yes, and over the general Internet you should be happy if you get 1500 bytes of IP MTU; it may easily be lower with a couple of additional tunnel headers. Thus, what you say is the goal is not feasible without a solution that supports fragmentation and reassembly, enabling one TRILL packet to be sent in multiple IP packets. The reassembly does require buffering and is not something easily performed on a router fast path. And attempting to use IP fragmentation is likely doomed if you have any type of NAT or firewall in the way.
>>
>> This points to a dedicated solution, or to using a transport protocol that supports carrying arbitrary data sizes, like TCP or SCTP. And you need to use the byte-stream API of TCP to achieve this.
>
> OK.
>>
>> Section 8.4:
>>
>>    Path MTU discovery [RFC4821] should be useful in determining the IP MTU between a pair of RBridge ports with IP connectivity.
>>
>> The issue with RFC 4821 is that it has requirements on the packetization layer. TRILL appears to have several components that are useful. However, it will require a specification of the procedure to result in a useful tool.
>>
>> See below.
>>
>> Section 8.4:
>>
>>    TRILL IS-IS MTU PDUs, as specified in Section 5 of [RFC6325] and in [RFC7177], can be used to obtain added assurance of the MTU of a link.
>>
>> Yes, that can confirm working MTUs that are at 1470 or above, but it appears to be prevented from working below 1470?
>>
>> While there is a minimum size for TRILL IS-IS MTU PDUs, determined by header size, it is well below 1470, probably (depending on whether security is in use, etc.) below 150 bytes.
>>
>> Okay, if you say so; it was not obvious from the spec that it was allowed to probe for paths with MTUs smaller than 1470.
>>
>> Thus, it appears that there is a lack of mechanism here to actually get a valid and functional MTU from TRILL in the cases where the path MTU is below 1470. If I am wrong, good, but I think this is an important piece for how to handle the next main issue.
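
As an illustration of a probing procedure that is not limited to 1470 bytes, here is a minimal sketch of a binary search over MTU-probe sizes. The `probe` callback is hypothetical: it stands in for sending an MTU-probe padded to a given size and waiting, with retries, for an equal-length MTU-ack. The 150-byte lower bound echoes the rough minimum MTU PDU size mentioned above and is an assumption, as is monotone behavior (if a size gets through, every smaller size does too); RFC 4821's additional robustness considerations are not modeled.

```python
from typing import Callable, Optional


def probe_path_mtu(probe: Callable[[int], bool], low: int, high: int) -> Optional[int]:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    probe(size) is a hypothetical callback: "send an MTU-probe padded to
    `size`, retry up to k times, and report whether an equal-length MTU-ack
    came back". Success is assumed to be monotone in size.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    best = low
    lo, hi = low + 1, high
    while lo <= hi:
        mid = (lo + hi) // 2
        if probe(mid):
            best = mid    # mid works; look for something larger
            lo = mid + 1
        else:
            hi = mid - 1  # mid is too big; look lower
    return best


# Example: a path whose usable size is 1400 bytes, searched from 150 up to 9000.
print(probe_path_mtu(lambda size: size <= 1400, 150, 9000))  # 1400
```

Starting the search well below 1470 is what lets a sub-1470 path report a usable size instead of simply failing the minimum-MTU test.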
>>
>> How about referencing Section 3 of https://tools.ietf.org/html/draft-ietf-trill-mtu-negotiation-05, which is currently in IETF Last Call? (The wording of that section is probably going to be improved based on an OPS review by Brian Carpenter.)
>>
>> I looked at this, and it appears to have the same issue: it can't probe for MTU values below 1470.
>
> I think the thing was that, before TRILL over IP, it would not have been useful to determine an MTU below 1470. But there is no particular problem in constructing a smaller MTU-probe PDU, and an RBridge receiving such a PDU is generally required to respond with an equal-length MTU-ack.
>>
>>    2) RB1 tries to send an MTU-probe padded to the size 1470.
>>
>>       a) If RB1 fails to receive an MTU-ack from RB2 after k tries, RB1 sets the "failed minimum MTU test" flag for RB2 in RB1's Hello and stops.
>>
>> But the algorithm clearly performs a binary search for the MTU. If one looks at RFC 4821, one will notice that there are some additional considerations there on how to make the probing better and more robust. But clearly TRILL has some other criteria for what is a success. Verification that Sz works appears sufficient, and there is no need to probe further upwards.
>>
>> UDP encapsulation and IP fragments
>> ----------------------------------
>>
>> I see it as a big issue that UDP encapsulation is the native one, and that it relies on IP fragmentation despite the need for reliable fragmentation. With the requirement to support a 1470-byte MTU at the TRILL level, some packets will be fragmented in many environments. That will lead to a lot of losses and, as discussed below, a very big problem with middleboxes. The main problem here is that if one tries to rely on IP fragments, one will have issues with packets ending up in black holes, and different problems depending on IPv4 or IPv6. IPv6 is likely the lesser problem, assuming that one has working PMTUD.
>>
>> There are several ways out of this:
>>
>> 1. Detect issues and use TCP encapsulation with a correctly set MSS so as not to get IP fragments.
>> 2. Determine the MTU and implement a fragmentation mechanism on top of UDP.
>>
>> So, I don't see that much problem with UDP being the general default, consistent with the TRILL philosophy of defaulting to needing zero or minimal configuration. The default should be to use multicast Hellos for discovery of neighbors, which sure points at UDP to me. Having to traverse a NAT should be a rare case. Since, in the NAT case, you have to configure things related to the static binding and the IP address(es) of the peer(s) anyway, you can also configure use of a different encapsulation than UDP, such as TCP, at the same time. I don't see it as much of a problem if, by default, TRILL won't operate through a NAT. If you are using UDP and it fragments and the fragments are dropped at a NAT, you probably can't exchange Hellos, so you will not form an adjacency and anything on the other side of the NAT will not be visible.
>>
>> Yes, but this is the issue of applicability and of documenting that applicability. I don't know what goals and requirements exist for TRILL. If the WG is fine with some restrictions, then document them and focus on solving the issues that must be solved.
>>
>> You can clearly choose to require TCP for cases where the IP MTU is insufficient for carrying Sz-sized TRILL packets between the RBridges using UDP.
>
> OK.
>>
>> Zero Checksum:
>> --------------
>>
>> Section 5.4:
>>
>>    UDP Checksum - as specified in [RFC0768]
>>
>> Considering the desire for fast-path encapsulation, I am surprised not to see any mention of the use of a zero checksum here. Raising the zero checksum here, with a forward reference, would be good, I think.
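
For reference, here is a minimal sketch of the RFC 768 UDP checksum over IPv4, which is the per-packet work a zero-checksum fast path would skip. It uses the 16-bit one's complement sum described in RFC 1071 and applies the RFC 768 rule that a computed checksum of zero is transmitted as all ones, because an all-zero field means "no checksum". The function names are hypothetical, and the IPv6 pseudo-header is not covered.

```python
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum (RFC 1071); odd-length data is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def udp_checksum_ipv4(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """RFC 768 UDP checksum over the IPv4 pseudo-header plus UDP header and data.

    udp_segment must contain the UDP header with its checksum field set to zero.
    """
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    checksum = ~ones_complement_sum16(pseudo_header + udp_segment) & 0xFFFF
    # A computed zero is sent as 0xFFFF, because a zero field means "no checksum".
    return checksum if checksum != 0 else 0xFFFF
```

A receiver verifies by summing the pseudo-header and the received segment, checksum included, and checking for 0xFFFF.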
>>
>> And then Section 8.5:
>>
>>    The requirements for the usage of the zero UDP Checksum in a UDP tunnel protocol are detailed in [RFC6936]. These requirements apply to the UDP based TRILL over IP encapsulations specified herein (native and VXLAN), which are applications of UDP tunnels.
>>
>> If you intend to allow a zero checksum, then you should document that TRILL fulfills the requirements that the applicability statement raises. I have not analyzed how well it meets these requirements.
>>
>> Please review Section 6.2 of RFC 8086 for an example of how that can be done.
>>
>> OK. We'll look into it.
>>
>> TCP Encapsulation issue
>> -----------------------
>>
>> Section 5.6:
>>
>> The TCP encapsulation appears to be missing a delimiter format that allows each individual TRILL packet/payload to be read out of TCP's byte stream. In other words, a normal implementation has no way of ensuring that the TCP payload starts with the start of a new TRILL payload. Multiple small TRILL payloads may be included in the same TCP payload, and a TCP payload may also carry only part of a TRILL payload, since TCP is one way of dealing with TRILL packets that are larger than the IP+encapsulation MTU that will actually work.
>>
>> This comment is based on the observation that there appear to be no length fields included in the TRILL header. The most straightforward delimiter is a 2-byte length field for the TRILL payload to be encapsulated.
>>
>> Right. It might also be useful to include some sort of check field, as is done in BGP, to detect if you are out of sync in parsing the TCP stream.
>>
>> As you need to actually perform reassembly, the solution is to use the byte-stream semantics the TCP API provides and have framing for each packet.
>
> Of course.
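
The framing discussed above might look something like the following minimal sketch: each TRILL packet is prefixed with a 2-byte marker (loosely in the spirit of BGP's marker field, as a cheap out-of-sync check) and a 2-byte length, and the receiver reassembles packets from whatever chunks TCP delivers. The marker value, header layout, and names are hypothetical and do not come from the draft.

```python
import struct
from typing import List

# Hypothetical framing, not from the draft: 2-byte marker, 2-byte length,
# then the TRILL packet, all in network byte order.
MARKER = 0xA5A5
HEADER = struct.Struct("!HH")


def frame(trill_packet: bytes) -> bytes:
    """Prefix one TRILL packet with the marker and its length for TCP transmission."""
    if len(trill_packet) > 0xFFFF:
        raise ValueError("TRILL packet too long for a 2-byte length field")
    return HEADER.pack(MARKER, len(trill_packet)) + trill_packet


class Deframer:
    """Reassembles TRILL packets from arbitrary chunks of a TCP byte stream."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Append newly received bytes and return every complete TRILL packet."""
        self.buffer += chunk
        packets = []
        while len(self.buffer) >= HEADER.size:
            marker, length = HEADER.unpack_from(self.buffer)
            if marker != MARKER:
                # Parsing has lost sync with the stream; a real implementation
                # would probably tear down and re-establish the connection.
                raise ValueError("framing marker mismatch: stream out of sync")
            end = HEADER.size + length
            if len(self.buffer) < end:
                break  # the rest of this packet has not arrived yet
            packets.append(bytes(self.buffer[HEADER.size:end]))
            del self.buffer[:end]
        return packets
```

A receiver would call `feed()` with each chunk read from the TCP socket; since TCP may split or coalesce writes arbitrarily, a packet is only returned once all of its bytes have arrived.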
>
> My point was that the framing might usefully have some sort of flag field, like the BGP framing has, so that there was a good chance of detecting if the parsing of the byte stream into frames has gotten out of sync.
>>
>> Another point is that, while with UDP it seems fine to send packets with assorted QoS, you don't want to encourage reordering of TCP packets in a stream. So if TCP encapsulation is being used, you want to use the same DSCP value for the packets in a particular TCP stream. So, generally, you need to have a TCP connection per priority handling category. Mapping the 8 priority levels into a smaller number of handling categories is a normal thing to do, so you don't necessarily need 8 TCP connections. Adding material on this should not be too hard.
>>
>> Yes, agreed, it is a possibility, and it points to the possible considerations that David raised.
>>
>> Section 5.6:
>>
>> TCP endpoint requirements. I do wonder if an application like TRILL would actually need to discuss performance-impacting implementation choices or limitations: for example, use of Nagle's algorithm, or the requirements on buffer sizes in relation to bandwidth-delay products, as buffer memory in an RBridge will impact performance.
>>
>> Well, I'm not sure how deeply this document should get into such performance issues. What about just saying something about consideration being given to tuning TCP for performance and pointing to one or a few other RFCs that talk about this?
>>
>> As Joe said, these are important considerations. If your intention is to enable this to run at substantial fractions of the line rates of the interfaces, then this does require consideration.
>
> I see.
>>
>> Congestion Control
>> ------------------
>> First, thanks for the effort here.
>>
>> You're welcome.
>>
>>    8.1.2 In Other Environments
>>
>>    Where UDP based encapsulation headers are used in TRILL over IP in environments other than those discussed in Section 8.1.1, specific congestion control mechanisms are commonly needed. However, if the traffic being carried by the TRILL over IP link is already congestion controlled and the size and volatility of the TRILL IS-IS link state database is limited, then specific congestion control may not be needed. See [RFC8085] Section 3.1.11 for further guidance.
>>
>> This is correct; however, my question is whether the RBridges have any way of knowing which traffic is actually congestion controlled, considering that TRILL provides a layer 2 abstraction. I wonder if there should be some type of whitelist of the types of layer 2 payloads that can be assumed to be congestion controlled, and thus okay to forward over IP paths. I am worried that, without any recommendation to prevent traffic that is not congestion controlled from being forwarded, this can lead to congestion issues.
>>
>> The other issue I think may exist is the one that serial unicast emulation of broadcast/multicast creates. As this amplifies the outgoing packet rate by a factor equal to the number of addresses configured for serial unicast, this can be a significant traffic expansion. Thus, I think additional considerations are needed here, and maybe rate limiting of the amount of traffic to be multicast.
>>
>> OK. We can think about those issues.
>>
>> Flow and ECMP
>> -------------
>>
>> Section 8.3:
>>
>>    For example, for TRILL Data, this entropy field could be based on some hash of the Inner.MacDA, Inner.MacSA, and Inner.VLAN or Inner.FGL.
>>
>> I would appreciate clearer references to what these fields are.
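
The entropy-field idea in Section 8.3 can be sketched as follows: hash Inner.MacDA, Inner.MacSA, and the inner VLAN or FGL value into a UDP source port, so that all packets of one inner flow share a port (and thus an ECMP path) while different flows tend to spread across ports. This is a minimal sketch under assumptions: CRC-32 as the hash, the 49152-65535 range for the port, and the helper name are illustrative choices, not from the draft.

```python
import zlib

# Assumed source-port range for the entropy value (the dynamic/private range).
PORT_BASE = 49152
PORT_COUNT = 65536 - PORT_BASE  # 16384 ports


def entropy_source_port(inner_mac_da: bytes, inner_mac_sa: bytes, label: int) -> int:
    """Hash Inner.MacDA, Inner.MacSA and the inner VLAN ID or FGL into a UDP source port.

    Packets with the same three values always get the same port, so one inner
    flow is never striped across ECMP paths. (Hypothetical helper; CRC-32 is an
    arbitrary choice of hash.)
    """
    if len(inner_mac_da) != 6 or len(inner_mac_sa) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    # Three bytes hold either a 12-bit VLAN ID or a 24-bit Fine Grained Label.
    key = inner_mac_da + inner_mac_sa + label.to_bytes(3, "big")
    return PORT_BASE + zlib.crc32(key) % PORT_COUNT
```

Because the port depends only on those three fields, reordering can occur only between packets that differ in at least one of them, which, as noted below, belong to different streams anyway.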
>>
>> In a TRILL Data packet, the payload after the TRILL Header looks like an Ethernet frame, except that there is always either a VLAN tag or, alternatively, where the VLAN tag would be, a Fine Grained Label [RFC7172]. (The preceding is the view in the TRILL RFCs, but there is an equivalent and equally valid view in which all the fields through and including the VLAN or FGL tag are part of the TRILL Header.) The TRILL base protocol specification focuses on Ethernet as a link technology between TRILL switches, in which case there will be a link header including Outer.MacDA and Outer.MacSA fields and possibly an Outer.VLAN, all before the TRILL Header. See Figure 1 and Figure 2 in RFC 7172.
>>
>> Some of the above could be added to the draft for clarity.
>>
>> If I understand this correctly, the idea here is to look into the inner layer 2 frames, use the flow equivalents that exist at that level, and hash them into a value that maps the flows onto the source port range.
>>
>> Yes.
>>
>> I think this text should include a summary of the principle and make sure to note the important requirement that what is considered a flow in the inner frames must not be striped over multiple source ports, as this may lead to reordering issues due to packets taking different paths.
>>
>> Well, we can add some text. But when would the relative ordering matter for two TRILL Data packets where the two inner native payloads have different values for any one or more of these three fields (Inner.MacDA, Inner.MacSA, and inner VLAN/FGL tag)? If any of those fields are different, you are talking about different streams.
>>
>> Okay, then this is very straightforward.
>>
>> NAT and TRILL over IP:
>>
>> Section 8.5:
>>
>> If one would like to use TRILL over IP through a NAT, then there are some very important considerations that are missing. First, there is the need for static binding configurations, or the need to determine one's external address(es) and be able to communicate them to the peer RBridges; in addition, one must ensure that there are keep-alives so that the NAT binding never times out.
>>
>> I think those are good points. There is an additional problem: TRILL Hellos detect neighbors with which they have 2-way connectivity by indicating, inside the Hellos that are sent, from what neighbors Hellos have been received on that port. If a NAT is involved, these neighbor addresses inside Hellos need to be mapped.
>>
>> Yes, and the question is how that can be handled: by the receiver of the packet, or whether the sender needs to determine what address it uses and provide that in the Hellos. If the first is possible, that can simplify things a lot.
>
> I'm not sure. This would require a little detailed design work.
>>
>> Next is the issue that there is almost zero chance of getting an IP/UDP-encapsulated TRILL payload through the NAT if it results in IP fragmentation, as NATs don't defragment and refragment on the internal side, and non-initial IP fragments lack the UDP header and thus can't be matched to a binding.
>>
>> So perhaps the recommendation should be to configure the port to use TCP if there will be fragmentation.
>>
>> Yes, I think that is likely the simplest solution for you.
>
> OK
>
> Thanks,
> Donald
>
>> Also, if you would like to run IP/ESP through a NAT, then you most likely need the IP/UDP/ESP encapsulation (https://tools.ietf.org/html/rfc3948). Note that this will restrict the MTU even further and thus ensure that the 1470 requirement cannot be fulfilled over a 1500-byte MTU Ethernet infrastructure, even without additional tunnels.
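
A rough back-of-the-envelope check of the claim above, as a minimal sketch. The overhead values are assumptions chosen to be optimistic for ESP (IPv4 outer header, transport mode, 8-byte ESP header, no IV, no padding, 2 trailer bytes, 12-byte ICV), and the sketch assumes the native encapsulation places the TRILL packet directly after a UDP header; real cipher suites usually leave even less room.

```python
# Illustrative header overheads in bytes. The ESP values are optimistic
# assumptions (no IV, no padding, 12-byte ICV); real ciphers usually add more.
OVERHEAD = {
    "outer_ipv4": 20,
    "udp_for_esp_in_udp": 8,  # RFC 3948 UDP encapsulation of ESP
    "esp_header": 8,          # SPI + sequence number
    "esp_trailer": 2,         # pad length + next header, assuming zero padding
    "esp_icv": 12,            # e.g. a 96-bit truncated HMAC
    "udp_for_trill": 8,       # assumed native TRILL over IP UDP header
}


def room_for_trill(link_mtu: int, headers: list) -> int:
    """Bytes left for the TRILL packet after subtracting the listed headers."""
    return link_mtu - sum(OVERHEAD[name] for name in headers)


plain = room_for_trill(1500, ["outer_ipv4", "udp_for_trill"])
with_esp = room_for_trill(1500, list(OVERHEAD))
print(plain, plain >= 1470)        # 1472 True
print(with_esp, with_esp >= 1470)  # 1442 False
```

Even with these generous assumptions, UDP-encapsulated ESP leaves 1442 bytes on a 1500-byte link, short of 1470, while plain IPv4/UDP just fits.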
>>
>> I would note that firewalls also likely have issues with IP fragments, for the same reason: they require a significant amount of state to verify whether fragments should be let through.
>>
>> In general, I think you should create a configuration that has a chance of working through most middleboxes, but I think you should require static bindings. I think that configuration is, and don't laugh now, IP/UDP/ESP/TCP/TRILL; otherwise you will not be able to have both security and reliable fragmentation of TRILL packets.
>>
>> OK. Thanks again for this review. It has pointed out a number of problems and, in thinking about those, I believe a couple of further problems have come to mind that I mentioned above. We'll work on a revised draft.
>>
>> Cheers
>>
>> Magnus Westerlund
>>
>> ----------------------------------------------------------------------
>> Media Technologies, Ericsson Research
>> ----------------------------------------------------------------------
>> Ericsson AB                 | Phone  +46 10 7148287
>> Torshamnsgatan 23           | Mobile +46 73 0949079
>> SE-164 80 Stockholm, Sweden | mailto: magnus.westerl...@ericsson.com
>> ----------------------------------------------------------------------
>

_______________________________________________
trill mailing list
trill@ietf.org
https://www.ietf.org/mailman/listinfo/trill