I've been giving some thought to how Geneve could be used in a way that
is compatible with being decoded by TCAM on a packet processing "fast
path". Mostly, I've been doing this as an exercise to get myself
thinking about the issues involved in NVO, since I'm new to the subject.
Most of this analysis is straightforward, but several considerations
regarding the design of options arise, and people might overlook them
unless they deliberately set out to make option definitions
TCAM-compatible.
The only unexpected problem arises if we want to use TCAM processing
on a fixed initial set of options but allow packets to carry varying
sets of options after the fixed ones. That requires defining "last
option" bits in both the initial Geneve header words and the option
start word.
* TCAM compatibility
As I understand it, "TCAM compatibility" for Geneve (or any similar
NVO header) requires that:
(1) A packet can be tested for conformance with an expected format of
options by testing the subset of bits at specific offsets for equality
with specific values.
(2) If the packet passes the test, the values of specific options can
always be found at specific offsets within the header.
(3) In any single operational environment, Geneve can be used in a way
that the vast majority of packets have Geneve headers with a single
expected format.
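To make requirement (1) concrete, here is a minimal Python sketch of what a single TCAM entry tests. It assumes the usual value/mask model, in which a 0 mask bit means "don't care" and the first matching entry wins; the names `ternary_match` and `tcam_lookup` are hypothetical illustrations, not any particular switch's API.

```python
from typing import List, Optional, Tuple

# A TCAM entry modeled as (value, mask, result); a hypothetical representation.
Entry = Tuple[bytes, bytes, str]


def ternary_match(data: bytes, value: bytes, mask: bytes) -> bool:
    """True if every bit selected by `mask` in `data` equals the same bit
    in `value`. Mask bits of 0 are "don't care"."""
    if len(data) < len(mask):
        return False  # the entry examines bytes the packet does not have
    return all((d & m) == (v & m) for d, v, m in zip(data, value, mask))


def tcam_lookup(data: bytes, entries: List[Entry]) -> Optional[str]:
    """Return the result of the first (highest-priority) matching entry."""
    for value, mask, result in entries:
        if ternary_match(data, value, mask):
            return result
    return None


# Example: the first Geneve byte must be exactly Ver = 0, Opt Len = 2;
# anything else falls through to a catch-all entry.
ENTRIES: List[Entry] = [
    (bytes([0x02]), bytes([0xFF]), "expected format"),
    (bytes([0x00]), bytes([0x00]), "slow path"),
]
```

With these entries, a header starting with byte 0x02 is classified as "expected format", and one starting with 0x03 falls through to "slow path".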
An "expected format" consists of a specific sequence of option types,
with each option having a specific length. These options are the
initial or only options in the Geneve header. Optionally, an expected
format can require that there be no further options beyond the
required options.
The crux seems to be to test that (1) the Option Class and Type fields
of each option have the expected values, (2) each option's Length
field has the expected value, and (3) the overall Option Length field
has the expected value. This ensures that the expected options are
where they are expected to be, and that what appear to be options
actually are within the Geneve header. (This assumes that the
expected format does not allow additional trailing options, so that
the Option Length must have a specific value.)
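These three tests can be sketched in Python as the construction of one value/mask pair. The sketch assumes the Geneve wire layout: an 8-byte fixed header whose first byte holds Ver (2 bits) and Opt Len (6 bits), followed by option start words of Class(16) | Type(8) | R(3) | Length(5), with lengths counted in 4-byte words. `expected_format_entry` is a hypothetical helper; it also returns the fixed byte offset of each option's value, which is requirement (2). The packet bytes in the example (Protocol Type 0x6558, VNI 42, option contents) are just sample data.

```python
from typing import Dict, List, Tuple

GENEVE_BASE_LEN = 8  # fixed Geneve header, in bytes


def ternary_match(data: bytes, value: bytes, mask: bytes) -> bool:
    """Bits where mask is 1 must equal value; bits where mask is 0 are ignored."""
    if len(data) < len(mask):
        return False
    return all((d & m) == (v & m) for d, v, m in zip(data, value, mask))


def expected_format_entry(
    options: List[Tuple[int, int, int]],
) -> Tuple[bytes, bytes, Dict[Tuple[int, int], int]]:
    """Build one TCAM value/mask for an expected format with no trailing
    options. `options` lists (option_class, option_type, length_in_words)
    in order. Also returns the byte offset of each option's value."""
    opt_len = sum(1 + length for _, _, length in options)
    if opt_len > 0x3F:
        raise ValueError("options do not fit in the 6-bit Opt Len field")
    total = GENEVE_BASE_LEN + 4 * opt_len
    value, mask = bytearray(total), bytearray(total)
    value[0], mask[0] = opt_len, 0xFF  # Ver = 0 and Opt Len exact
    offsets: Dict[Tuple[int, int], int] = {}
    off = GENEVE_BASE_LEN
    for cls, typ, length in options:
        value[off:off + 2], mask[off:off + 2] = cls.to_bytes(2, "big"), b"\xff\xff"
        value[off + 2], mask[off + 2] = typ, 0xFF            # Type, incl. C bit
        value[off + 3], mask[off + 3] = length & 0x1F, 0x1F  # Length; R bits ignored
        offsets[(cls, typ)] = off + 4                        # value starts here
        off += 4 * (1 + length)
    return bytes(value), bytes(mask), offsets


# Example: a zero-length mode option followed by a one-word option.
FORMAT = [(0xFFF0, 0xC0, 0), (0x0102, 0x01, 1)]
VALUE, MASK, OFFSETS = expected_format_entry(FORMAT)
PACKET = bytes([0x03, 0x00, 0x65, 0x58, 0x00, 0x00, 0x2A, 0x00]  # Opt Len 3, VNI 42
               + [0xFF, 0xF0, 0xC0, 0x00]                         # FFF0-C0, Length 0
               + [0x01, 0x02, 0x01, 0x01, 0xDE, 0xAD, 0xBE, 0xEF])
```

Because the Opt Len byte is matched exactly, a packet carrying one more option than the expected format fails the entry even though its first options are identical.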
When defining options, one should define them so they can be used in a
TCAM-compatible way. A couple of considerations are:
* Options should have variants that specify the "default" effect
If an option expresses something by its presence, there must be a way
to express the default meaning (the semantics of the option being
absent) with the same option. In the simplest form, if there is a
"special processing mode" option, there should be a bit within the
option that can be changed to invoke the "normal processing mode".
Otherwise, if an operational environment requires packets to be marked
for both modes of processing, the only way to express "normal
processing mode" is by omitting the option, violating the expected
format.
As an example, it seems that usage of 802.1Q (the Ethernet virtual
network identifier header) often violates this principle: most frames
carry an 802.1Q header (are within a VPN), but some frames used for
"control" purposes omit the 802.1Q header (are not in a VPN).
* For options that have no value (Length = 0)
If an option has no value data, it would be annoying to design an
entire value word just to carry one bit that specifies whether the
option has an effect or not. An alternative is to assign two Option
Type values that differ in one bit position, with one type having the
"default" meaning, i.e., the same meaning as when both options are absent. This
allows the sender to use one option or the other in a particular
position to specify how the packet is to be handled.
If the non-default option has the C bit set, a convenient choice for the
default option is to use the same lower seven bits of the type, but
with C = 0. For example, the "special processing mode" option would
be:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Option Class = FFF0      |1|  Type = 40  |R|R|R| Length=0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
and the "normal processing mode" option would be:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Option Class = FFF0      |0|  Type = 40  |R|R|R| Length=0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
special processing mode = FFF0-C0
normal processing mode = FFF0-40
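As a check on these two encodings, here is a small Python sketch that packs the option start word (Class(16) | Type(8) | R(3) | Length(5)) and confirms that FFF0-C0 and FFF0-40 differ only in the C bit. A single ternary entry that wildcards that one bit therefore accepts either option in the same position, and the processing mode is read from that bit. The helper names are hypothetical.

```python
C_BIT = 0x80  # high bit of the 8-bit Type field ("critical")


def option_header(option_class: int, option_type: int, length: int) -> int:
    """Pack a 32-bit option start word with the R bits set to zero."""
    return (option_class << 16) | (option_type << 8) | (length & 0x1F)


SPECIAL = option_header(0xFFF0, C_BIT | 0x40, 0)  # "special processing mode"
NORMAL = option_header(0xFFF0, 0x40, 0)           # "normal processing mode"

# One entry: exact Class, low seven Type bits and Length; C and R bits wildcarded.
ENTRY_MASK = 0xFFFF7F1F
ENTRY_VALUE = NORMAL & ENTRY_MASK


def matches_expected_slot(word: int) -> bool:
    """True if the word is either variant of this option."""
    return (word & ENTRY_MASK) == ENTRY_VALUE


def wants_special_processing(word: int) -> bool:
    """The mode is carried entirely by the C bit."""
    return bool(word & (C_BIT << 8))
```

SPECIAL packs to 0xFFF0C000 and NORMAL to 0xFFF04000; XORing them leaves only 0x8000, the C bit.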
* Be able to fix the length of an option
To be TCAM-compatible, an expected format must fix the lengths of all
of the options it specifies. This is described in section 4.2.1, but
I think a minor edit would help:
4.2.1. Constraints on Options
While Geneve options are more flexible, a control plane may restrict
the number of option TLVs as well as the order and size of the TLVs,
between tunnel endpoints, [...]
A control plane may negotiate a subset of option TLVs and certain TLV
ordering, as well may limit the total number of option TLVs present
in the packet, for example, to accommodate hardware capable of
processing fewer options [I-D.dt-nvo3-encap]. Hence, a control plane
needs to have the ability to describe the supported TLVs subset and
their order to the tunnel end points. In the absence of a control
plane, alternative configuration mechanisms may be used for this
purpose. The exact mechanism is not defined in this document.
The first paragraph mentions control of the size of options, but the
second paragraph doesn't. I suggest changing "a subset of option TLVs
and certain TLV ordering" to "a subset of option TLVs, their lengths,
and ordering", and changing "the supported TLVs subset and their
order" to "the supported TLVs subset and their order and lengths".
More significantly, an environment can only set the length of an
option if the option is designed properly. Of course, if the option
is defined with a fixed length, this is easy. But if an option can
have a variable length, care is needed to allow it to be used in this
way, because the amount of data to be carried may be smaller than the
length of the option.
As an example, consider an option that carries an MPLS label stack.
This seems straightforward:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    3    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 1                | Exp |0|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 2                | Exp |0|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 3                | Exp |1|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
But if a transit device wants to pop a label off the stack, it should
not have to shorten the option, since that would violate the expected
format:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    2    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 2                | Exp |0|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 3                | Exp |1|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
One might define the option to respect the MPLS S bit, so that words
after a label with S=1 are padding:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    3    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 2                | Exp |0|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 3                | Exp |1|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            ignored                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(The labels are shifted earlier so that the label to be used for
current routing decisions is always the first one.)
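A receiver-side sketch of this S-bit rule in Python, assuming each value word is a standard 32-bit MPLS label stack entry (Label 20 bits, Exp 3 bits, S 1 bit, TTL 8 bits). `parse_label_option` and `encode_entry` are hypothetical helpers: the parser reads entries up to the one with S = 1 and ignores the remaining words. It has to reject a value with no S = 1 entry, because this encoding has no way to say "no labels".

```python
from typing import List, NamedTuple


class LabelEntry(NamedTuple):
    label: int
    exp: int
    ttl: int


def encode_entry(label: int, exp: int, bottom: bool, ttl: int) -> int:
    """Pack one label stack entry: Label(20) | Exp(3) | S(1) | TTL(8)."""
    return (label << 12) | (exp << 9) | (int(bottom) << 8) | ttl


def parse_label_option(words: List[int]) -> List[LabelEntry]:
    """Read entries up to and including the one with S = 1; later words are padding."""
    stack: List[LabelEntry] = []
    for word in words:
        stack.append(LabelEntry(word >> 12, (word >> 9) & 0x7, word & 0xFF))
        if word & 0x100:  # S bit: bottom of stack reached
            return stack
    raise ValueError("no entry with S = 1: the option carries no valid stack")
```

For example, the words for Label 2, Label 3 (S = 1) and an arbitrary third word parse to just the two labels; the third word is never examined.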
But that doesn't allow a zero-length list of labels to be specified.
One could augment this by filling the ignored words with Label value 0
(which is the reserved "IPv4 Explicit NULL Label"). Then an option
with Length = 3 containing two labels would be formatted:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    3    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 2                | Exp |0|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label 3                | Exp |1|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   0                   |  0  |0|       0       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
and an option containing zero labels would be formatted:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    3    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   0                   |  0  |0|       0       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   0                   |  0  |0|       0       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   0                   |  0  |0|       0       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
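The transit-side operation for this zero-filled variant can be sketched in Python as follows. It assumes the option value is handled as a list of 32-bit words and that an all-zero word (Label 0 with Exp, S and TTL also 0, as drawn above) marks padding; `pop_label` and `labels_in_option` are hypothetical names. Popping never changes the number of words, so the option's Length field and the offsets of any later options stay fixed.

```python
from typing import List

S_BIT = 0x100
PADDING = 0  # assumed padding convention: Label 0, Exp 0, S 0, TTL 0


def encode_entry(label: int, exp: int, bottom: bool, ttl: int) -> int:
    """Pack one label stack entry: Label(20) | Exp(3) | S(1) | TTL(8)."""
    return (label << 12) | (exp << 9) | (S_BIT if bottom else 0) | ttl


def labels_in_option(words: List[int]) -> List[int]:
    """Label values carried, stopping at padding or after the S = 1 entry."""
    labels: List[int] = []
    for word in words:
        if word == PADDING:
            break
        labels.append(word >> 12)
        if word & S_BIT:
            break
    return labels


def pop_label(words: List[int]) -> List[int]:
    """Drop the top entry, shift the rest up, and zero-fill the last word."""
    if not words or words[0] == PADDING:
        raise ValueError("label stack is already empty")
    return words[1:] + [PADDING]
```

Popping three times from a three-label option yields three all-zero words, which is exactly the zero-label encoding drawn above.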
As another example, consider an option containing a list of IP
addresses to be used as a route:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    3    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP address 1                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP address 2                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP address 3                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
If an intermediate system wants to delete the first address because it
has already been used, there should be a defined way to shift the
later addresses up and fill the final word with a known-invalid value
such as 0.0.0.0:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R|    3    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP address 2                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         IP address 3                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            0.0.0.0                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
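The same shift-and-fill rule for an address list can be sketched in Python with the standard `ipaddress` module. The sketch assumes IPv4 addresses (one 4-byte word each, as in the diagram) and 0.0.0.0 as the agreed "unused" marker; `advance_route`, `remaining_hops` and `option_value` are hypothetical names. The serialized value has the same length no matter how many hops have been consumed.

```python
import ipaddress
from typing import List

Address = ipaddress.IPv4Address
UNUSED = Address("0.0.0.0")  # assumed known-invalid filler value


def remaining_hops(words: List[Address]) -> List[Address]:
    """Addresses still to be visited, stopping at the first filler word."""
    hops: List[Address] = []
    for addr in words:
        if addr == UNUSED:
            break
        hops.append(addr)
    return hops


def advance_route(words: List[Address]) -> List[Address]:
    """Delete the address just used, shift the rest up, and fill with 0.0.0.0."""
    if not words or words[0] == UNUSED:
        raise ValueError("route is already exhausted")
    return words[1:] + [UNUSED]


def option_value(words: List[Address]) -> bytes:
    """Serialize the value words in network byte order."""
    return b"".join(addr.packed for addr in words)
```

For a route of 192.0.2.1, 192.0.2.2, 192.0.2.3, one call to `advance_route` gives 192.0.2.2, 192.0.2.3, 0.0.0.0, still 12 bytes of option value.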
* Allowing additional options: the Last-Option bit
A problem arises if we want to support expected formats that allow
further options beyond the specified set. The problem is that there's
no way to use TCAM to test whether the Option Length field is greater
than or equal to a specified constant, and we need to do that test to
verify that the purported initial, required options are actually
within the Geneve header.
Formally, the problem is that TCAM can't compare a number with a
constant if the number is expressed in binary. To do a comparison,
the number must be expressed in *unary* -- in our case, each option
header word must contain a "last option" bit, and the sequence of last
option bits in the option header words expresses the number of options
in unary. In addition, the initial words must also contain a bit
telling whether there are any options at all, since TCAM can't test
the Option Length field for being greater than zero.
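The binary-versus-unary point can be checked by brute force for a field as small as the 6-bit Opt Len. The Python sketch below (helper names are hypothetical) searches every value/mask pair for a single ternary entry that matches exactly the values satisfying a predicate. None exists for "Opt Len >= 3" or "Opt Len > 0", while with a thermometer (unary) code, in which n is written as n one-bits, "n >= k" reduces to testing a single bit.

```python
from typing import Callable, Optional, Tuple

OPT_LEN_BITS = 6  # width of the Geneve Opt Len field


def single_entry_for(pred: Callable[[int], bool], width: int) -> Optional[Tuple[int, int]]:
    """Return a (value, mask) ternary entry matching exactly the inputs for
    which pred is true, or None if no single entry can do it."""
    universe = range(1 << width)
    wanted = {x for x in universe if pred(x)}
    for mask in universe:
        for value in universe:
            if value & ~mask:
                continue  # value bits outside the mask are irrelevant; skip duplicates
            if {x for x in universe if (x & mask) == value} == wanted:
                return value, mask
    return None


def thermometer(n: int) -> int:
    """Unary code for n: the low n bits set (3 -> 0b111)."""
    return (1 << n) - 1


def unary_at_least(k: int) -> Tuple[int, int]:
    """Single entry testing n >= k (k >= 1) on a thermometer code: bit k-1 set."""
    bit = 1 << (k - 1)
    return bit, bit
```

The search is cheap here (only 729 distinct value/mask pairs over 64 inputs), which is what makes an exhaustive check practical.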
Thus, the Geneve header would assign a bit saying whether there are
any options:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|L|  Rsvd.  |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Virtual Network Identifier (VNI)       |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              ...                              |
And the option start word would assign a bit as the last-option bit:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |L|R|R| Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
L (1 bit): Last option. If set, no options follow in the Geneve
header.
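Putting the pieces together, here is a Python sketch of a single TCAM entry for a fixed prefix of options that still admits trailing options. It assumes the bit positions drawn above (the header bit at bit 10 of the first word, the option L bit as the first of the three bits after Type). Since the polarity of the header bit isn't fixed in the text, it also assumes that bit is 0 when options are present, giving it the same "nothing follows" sense as the option L bit. `prefix_entry` is a hypothetical helper. Opt Len is never tested: the chain of L = 0 bits, together with the exact Length of each fixed option, is what shows that the fixed options lie inside the Geneve header.

```python
from typing import List, Tuple

GENEVE_BASE_LEN = 8
HDR_L = 0x20  # byte 1, bit 10 of the first word (assumed: 1 = no options)
OPT_L = 0x80  # byte 3 of an option start word: the proposed L bit


def ternary_match(data: bytes, value: bytes, mask: bytes) -> bool:
    """Bits where mask is 1 must equal value; bits where mask is 0 are ignored."""
    if len(data) < len(mask):
        return False
    return all((d & m) == (v & m) for d, v, m in zip(data, value, mask))


def prefix_entry(options: List[Tuple[int, int, int]],
                 allow_trailing: bool = True) -> Tuple[bytes, bytes]:
    """One value/mask pair requiring `options` (class, type, length_in_words)
    as the initial options; Opt Len itself is left as "don't care"."""
    if not options:
        raise ValueError("need at least one fixed option")
    total = GENEVE_BASE_LEN + 4 * sum(1 + n for _, _, n in options)
    value, mask = bytearray(total), bytearray(total)
    mask[0] = 0xC0   # Ver must be 0; Opt Len bits wildcarded
    mask[1] = HDR_L  # header bit must be 0: at least one option present
    off = GENEVE_BASE_LEN
    for i, (cls, typ, n) in enumerate(options):
        is_last_fixed = i == len(options) - 1
        value[off:off + 2], mask[off:off + 2] = cls.to_bytes(2, "big"), b"\xff\xff"
        value[off + 2], mask[off + 2] = typ, 0xFF
        value[off + 3], mask[off + 3] = n & 0x1F, 0x1F
        if not is_last_fixed:
            mask[off + 3] |= OPT_L  # L = 0: another option follows
        elif not allow_trailing:
            mask[off + 3] |= OPT_L  # L = 1: nothing may follow
            value[off + 3] |= OPT_L
        off += 4 * (1 + n)
    return bytes(value), bytes(mask)
```

With `allow_trailing=True`, the last fixed option's L bit is wildcarded, so one entry matches the prefix whether or not further options follow; a packet whose first option has L = 1 is rejected even if the bytes after it happen to look like the second expected option.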
Dale
_______________________________________________
nvo3 mailing list
nvo3@ietf.org
https://www.ietf.org/mailman/listinfo/nvo3