Bob --
Thanks for taking the time to explain the history and rationale of Table
3 to me. I'm going to clear my DISCUSS based on this explanation, since
it seems that the technical solution in the document is correct, if a
bit non-obvious. Some additional comments inline.
[For those of you who did not see Bob's response: it appears that my
original DISCUSS email did not reach him, so he replied to a reduced
audience. I have restored the traditional IESG review distribution to
this email. For this reason, I am eliding less of Bob's response than I
usually would]
On 2/20/18 12:51 PM, Bob Briscoe wrote:
On Thu, Feb 8, 2018 at 2:15 PM, Adam Roach
<[email protected] <mailto:[email protected]>> wrote:
On 2/8/18 11:53 AM, Donald Eastlake wrote:
Hi Adam,
On Wed, Feb 7, 2018 at 8:12 PM, Adam Roach
<[email protected] <mailto:[email protected]>> wrote:
> Adam Roach has entered the following ballot position for
> draft-ietf-trill-ecn-support-05: Discuss
>
> ...
>
>
----------------------------------------------------------------------
> DISCUSS:
>
----------------------------------------------------------------------
>
> Thanks to the authors, chairs, shepherd, and working
group for the effort that
> has been put into this document.
>
> I have concerns about some ambiguity and/or
self-contradiction in this
> specification, but I suspect these should be easy to
fix. In particular, the
> behavior defined in Table 3 doesn't seem to be
consistent with the behavior
> described in the prose.
>
> For easy reference, I've copied Table 3 here:
>
>>
+---------+----------------------------------------------+
>> | Inner | Arriving TRILL 3-bit ECN
Codepoint Name |
>> | Native
+---------+------------+------------+----------+
>> | Header | Not-ECT | ECT(0) | ECT(1)
| CE |
>>
+---------+---------+------------+------------+----------+
>> | Not-ECT | Not-ECT | Not-ECT(*) | Not-ECT(*)
| <drop> |
>> | ECT(0) | ECT(0) | ECT(0) | ECT(1)
| CE |
>> | ECT(1) | ECT(1) | ECT(1)(*) | ECT(1)
| CE |
>> | CE | CE | CE | CE(*) |
CE |
>>
+---------+---------+------------+------------+----------+
>>
>> Table 3. Egress ECN Behavior
>>
>> An asterisk in the above table indicates a
currently unused
>> combination that SHOULD be logged. In contrast to
[RFC6040], in TRILL
>> the drop condition is the result of a valid
combination of events and
>> need not be logged.
>
> The prose in this document indicates:
>
> 1. Ingress gateway either copies the native header
value to the TRILL ECN
> codepoint (resulting in any of the four values
above) or doesn't insert
> any ECN information in the TRILL header.
>
> 2. Intermediate gateways can set the CCE flag,
resulting in "CE" in the
> table above.
>
> Based on the above, a packet arriving at an egress
gateway can only be in one of
> the following states:
>
> A. TRILL header is Not-ECT because no TRILL node
inserted ECN information.
>
> B. TRILL header value == Native header value
because the ingress gateway
> copied it from native to TRILL.
>
> C. TRILL header is "CE" because an intermediate
node indicated congestion.
Sort of... But note that the TRILL header ECN bit s
can indicate non-ECT while the CCE bit is set making
the overall TRILL Header "CE".
Right. That's part of case C. My states above assume
application of Table 2 has already taken place.
> If that's correct, I would think that any state
other than those three needs
> to be marked with an (*). In particular, these two
states fall into that
> classification, and seem to require an asterisk:
[BB] These states are not tagged (*) cos they can arise with variants
of ECN marking behaviour referred to in Section 4 (explained beneath
each bullet below). By design (explained further below), a consistent
encap and decap behaviour is intended to support all variants of ECN,
because subnet ingress and egress nodes will very probably be deployed
independently of each other and of intermediate nodes and/or L4 endpoints.
Nonetheless, you're right that this is completely non-obvious to a
reader coming to this new. So let's change the text:
CURRENT:
An asterisk in the above table indicates a currently unused
combination that SHOULD be logged. In contrast to [RFC6040
<https://tools.ietf.org/html/rfc6040>], in TRILL
the drop condition is the result of a valid combination of events and
need not be logged.
SUGGESTED
An asterisk in the above table indicates a combination that is currently
unused in all variants of ECN marking (see Section 4) and therefore
SHOULD be logged. In contrast to [RFC6040
<https://tools.ietf.org/html/rfc6040>], in TRILL
the drop condition is the result of a valid combination of events and
need not be logged.
Thanks. That definitely helps; but see my comments below.
>
> - Native==CE && TRILL==ECT(0)
[BB] You have indeed highlighted an awkward wrinkle here - we could
have tagged this combination with '(*)', cos it is strictly currently
unused by trill. However, I'm going to argue against a '(*)'...
My explanation starts with some history: Back in 2001 when ECN in IP
was first defined [RFC3168], an IP-in-IP tunnel ingress set the outer
to ECT(0) if ECN in the native header was anything but Not-ECT (0b00).
This was intended to close off a perceived covert channel in IPsec.
However, since then the security area agreed that this added
over-paranoid complexity. So, since [RFC6040] an IP-in-IP tunnel
ingress now just copies the native ECN to the outer, and TRILL ECN
follows this new simpler model.
Altho this spec does not allow a TRILL ingress RBrIdge to exhibit the
old IP-in-IP legacy behaviour, I think it best not to tag this
combination as 'currently unused'. Because, wherever possible, we want
trill's ECN encap and decap to mimic IP-in-IP (a principle explained
further down this email).
A more specific reason: there is already a congestion monitoring
technique being proposed that fakes the legacy IP-in-IP behaviour to
measure congestion across a tunnel
(draft-ietf-tsvwg-tunnel-congestion-feedback). Before it becomes an
RFC, I wouldn't be surprised if that draft gets extended from IP-in-IP
to IP-in-TRILL and other L2 protocols as well. If that likely event
happened, we would have to update this trill-ecn spec to remove the
'(*)' again. In the mean time, there's little harm in not logging a
combination that is not harmful, even tho it is stricty 'currently
unused'.
(sry about the triple negative!)
This is a great explanation. I understand the risks of putting
forward-looking speculation in RFCs, but could I convince you to at
least explain in the document that the reason for this state not being
considered out of the ordinary is to mimic legacy IP-in-IP ECN behavior?
Any acknowledgement of why this case is being treated the way it is
would be sufficient to prevent implementors from deciding that the table
is wrong.
>
> - Native==ECT(0) && TRILL==ECT(1)
[BB] The standards track PCN variant of ECN [RFC6660] uses ECT(1) as a
lower severity of congestion notification than CE. Hence the asymmetry
where a native ECT(0) header can have an ECT(1) outer, but not vice versa.
As with the case above, it would be good (inasmuch as it would serve to
un-confuse implementors) to include an explanation that this case makes
sense due to RFC 6660's handling of ECT(0) versus ECT(1). Such a comment
would also address my concern quoted below.
I would defer to my co-author Bob Briscoe but it looks
to me like you have a good point.
Thanks; I'll wait to hear from Bob.
> In addition, the behavior this table defines for Native==ECT(0) &&
TRILL==ECT(1)
> is somewhat perplexing: for this case, the value in
the TRILL header takes
> precedence; however, when Native==ECT(1) &&
TRILL==ECT(0) the Native header
> takes precedence. Or, put another way, this table
defines ECT(1) to always
> override ECT(0). I don't find any prose in here to
indicate why this needs to be
> treated differentially, so I'm left to conclude that
this is a typographical
> error. If that's not the case, please add motivating
text to Table 3 explaining
> why ECT(1) is treated differently than ECT(0) for
baseline ECN behavior.
As noted in Section 4.1, there is an ECN variation
where ECT(0) just indicates ECT while ECT(1) indicates
congestion of a lesser severity level has been
encountered than that indicated by CE. I believe the
dominance of ECT(1) over ECT(0) is designed to not
interfere with this variation.
Yes, and section 4 opens with the explanation that
"Section 3 specifies interworking between TRILL and the
original standardized form of ECN in IP [RFC3168]."
Beyond that, I wouldn't expect any of the text in
non-normative section 4 or its subsections to have
bearing on the normative table in Section 3.
[BB]
* PCN is standards track, hence PCN-specific combinations are not
tagged 'currently unused' in (normative) Section 3.
* L4S is experimental track, but it doesn't use any different
combinations of ECN than standard ECN, so it doesn't alter the CU
asterisks in section 3.
Section 4 is informative, only because it doesn't define the protocol
spec; it merely explains how the normative spec in Section 3 allows
for ECN variants (whether stds track or experimental).
Thanks. The answer to this that I find compelling is your explanation
below about RFC 6040's relative ordering of ECN markings, which I think
should be at least mentioned in passing here.
My reading of RFC 8311 is that it contemplates a series
of experiments beyond those currently under
development. It may well be that the current
experiments consider ECT(1) to have a higher severity
than ECT(0), but that this may not make sense for
future experiments. Even if it does, I don't see
guidance in RFC 8311 (or any other update to RFC 3168)
that suggests such a relationship between ECT(0) and
ECT(1) exists.
As above, the possibility that ECT(1) is a higher severity than ECT(0)
is already on the standards track (PCN [RFC6660]).
As far as possible, we want trill subnet endpoints to follow the same
ECN behaviour as IP-in-IP tunnel endpoints [RFC6040]. The explicitly
stated philosophy of RFC6040 and [ECNencapGuide] (which is referred to
from the Intro to this trill spec), is for respectively all tunnel
endpoints and subnet endpoints to have the same mechanical, dumb ECN
behaviour, whatever the variant of ECN in use. Because in the
incrementally deployed public Internet, one cannot know what tunnels
and subnets might be encountered across different paths.
On top of this (as implied by the existence of section
4), the TRILL handling for ECN will need to vary from
experiment to experiment. It seems reasonable that part
of this variability would include different mapping of
ECN bits by the egress gateway.
Nope. By design, all variants of ECN use the same encap and decap
behaviours, otherwise they would be undeployable to all intents and
purposes. Perhaps we should make that clearer in this sentence
introducing Section 4:
CURRENT:
The ECN wire protocol for TRILL (Section 2
<https://tools.ietf.org/html/draft-ietf-trill-ecn-support-05#section-2>) has been designed to
support the other known variants of ECN,
SUGGESTED:
The ECN wire protocol for TRILL (Section 2
<https://tools.ietf.org/html/draft-ietf-trill-ecn-support-05#section-2>) and the ingress (Section 3.1)
and egress (Section 3.3) ECN behaviours have been designed to
support the other known variants of ECN,
This sounds good. It might be worth mentioning that this behavior is
intended to also be forward-compatible with future ECN variants, as long
as they conform to RFC 6040's notion of relative codepoint priorities.
Basically, I see two ways this can be resolved,
although I'm happy to hear alternatives so long as they
end up with the ECN and TRILL documents being in a
consistent and future-proof state:
Approach 1: Revise Table 3 so that ECT(0) and ECT(1)
are treated the same (i.e., pick one of "TRILL header
wins" or "Native header wins" -- I would suggest
"Native header wins," for maximal compatibility with
older ECN clients but I'm not dogmatic on this point),
and then include a normative statement that allows RFC
8311 experiments to override this mapping as makes
sense for their design.
Approach 2: Leave the table as is, but add an
explanation of why ECT(1) is given preferential
treatment over ECT(0).
[BB] I hope you're happy with the additional text I've suggested for
why Table 3 was like it was. In summary, consistency with IP-in-IP is
paramount wherever possible.
That's exactly what the additional text should say. :) I don't see that
in your current proposal.
Add a normative dependency from this document to a new
document that updates RFC 8311 to add a requirement
that any experiments that treat ECT(0) differently than
ECT(1) MUST be designed such ECT(0) always indicates a
lower severity of congestion than ECT(1). I presume
that this new document would need to be done in
coordination with TSVWG.
[BB] That's already catered for in [RFC6040] (stds track) for
IP-in-IP, quoting:
o In all other cases where the inner supports ECN, the decapsulator
MUST set the outgoing ECN field to the more severe marking of the
outer and inner ECN fields, where the ranking of severity from
highest to lowest is CE, ECT(1), ECT(0), Not-ECT. This in no way
precludes cases where ECT(1) and ECT(0) have the same severity;
and in [ECNencapGuide] (intended BCP) for IP-in-any-L2, quoting:
4. Congestion indications may be encoded by a severity level. For
instance increasing levels of congestion might be encoded by
numerically increasing indications, e.g. pre-congestion
notification (PCN) can be encoded in each PDU at three severity
levels in IP or MPLS [RFC6660 <https://tools.ietf.org/html/rfc6660>].
If the arriving inner header is an ECN-PDU, where the inner and
outer headers carry indications of congestion of different
severity, the more severe indication SHOULD be forwarded in
preference to the less severe.
I think Approach 1 is more straightforward, but if
there's a feeling in the working group that we need
default egress behavior that is forwards-compatible
with yet-undesigned experiments, then I think Approach
2 is what you need.
/a
The way ECN encap & decap has been designed already follows your
'approach 2'; where the same consistent behaviour at all encaps and
decaps can be exploited in both backward and forward compatible ways.
Given your explanation above, I agree that such is the case. I think
including enough of that explanation that implementors don't find
themselves double-guessing the specification in Table 3 would ultimately
aid in interoperability.
Indeed, here's proof that this forward compatibility is not just
theoretical...
L4S came along after all this, and required drop probability to be the
square of the marking probability, but only for one of the ECN
codepoints. We managed to contrive the marking behaviour of an
intermediate trill RBridge that supports L4S so that any trill egress
will achieve this precise squaring, even an egress without any ECN or
L4S logic, and no squaring logic either (see Appendix A).
Cheers, and thx again for noticing all this.
Bob
_______________________________________________
trill mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trill