You had several issues recorded in the 'issues' file for this case,
but you weren't present for the inception review to discuss them. I
read through the issues and responded as best I could to each, and we
discussed them at some length during the meeting.
I'd like to close this loop to make sure you've had a chance to read
the answers. Please look over the issues (with written responses)
below, and follow up on any that may not be completely answered. (You
might also want to visit the minutes and recording of the inception
review itself; some of the verbal answers went beyond the written
ones. For instance, I made a point of saying that I'm intentionally
not changing anything in the filtering code and that section 6.8 of
the spec is just advice to other project teams. Since you've read it,
my job is done here. ;-})
djr-01 From bridging-security.txt (1)(a), it would seem that there is
potential for an attacker to supply packets to the network that
would result in excessive CPU use - is that accurate?
How would an administrator detect this style of problem (melting
of the network) using OpenSolaris? (Are the observability tools
provided sufficient?)
Reply: There are two separate CPU-use-related threats here. One is
that an attacker could just flood the network with lots of STP
traffic for us to handle. You'd be able to detect that using
'prstat' and other tools. The effort required to mount the
attack would be significant, and the impact should be minor.
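(To make that concrete, and purely as an illustration: a
sorted-by-CPU process view plus per-CPU statistics is usually
enough to spot that kind of flood, e.g.

    # prstat -s cpu -n 10 5
    # mpstat 5

A sustained jump in system and interrupt time on the CPUs
servicing that NIC would be the tell-tale sign.)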
The other (which I think you're actually referring to) is the
network-melting effect of an L2 forwarding loop, which can
happen if someone can _prevent_ STP packets from getting
through.
That's an existing hazard that we all live with in standard
bridges today: bridged networks sometimes go down because of
persistent loops, which is one of the reasons why RBridges
will be better (though that's the subject of a future
project).
The change for this project is that instead of being just a
victim (when other bridges fail), we could be one of the
active participants in the loop. If that were the case, you'd
see the "dladm show-bridge -sl" counters increasing rapidly.
An inherent problem with this case is that there's no really
effective way of detecting or countering this failure mode in
any automatic fashion. It's not so different from a "really
busy day." (Obviously, if detected by a human as an abnormal
case, shutting down the affected links or bridges will solve
the problem, and that's usually how it's handled in real
networks today.)
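(The manual remedy is as blunt as it sounds. With made-up names,
it's on the order of

    # dladm remove-bridge -l e1000g1 mybridge

i.e. pull the offending link out of the bridge, or delete the
bridge entirely with 'dladm delete-bridge'.)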
The same problem can be caused by the "fastroute $IF", "to $IF",
and "dup-to $IF" options in ipf.conf when used on the input
side of a link. The packet is forwarded to another interface
without a TTL decrement, and if there's an L2 forwarding path
between those two interfaces, *exactly* the same failure mode
occurs. The difference is that bridging includes Spanning
Tree, which is designed to detect and disable such loops, and
IP filter has no such protection.
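(For reference, and writing the ipf rule grammar from memory
rather than from a tested configuration, a hazardous fragment
would look roughly like this, with made-up interface names:

    # /etc/ipf/ipf.conf
    pass in quick on bge0 to bge1 from any to any

If there's an L2 path from bge1 back to bge0, that rule builds
exactly the loop described above, with nothing to break it.)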
It might be possible to advance the state of the art here by
detecting the combination of high packet rate and identical
packets (having FCS delivery from the hardware would likely be
pretty important), but we're not proposing that with this
project.
djr-02 Given djr-01 and the integration of crossbow to provide MAC layer
classification and resource controls, is it possible to leverage
crossbow to protect the system from abuse referred to in (1)(a)?
If not immediately, is there scope for this as a future project?
Reply: Crossbow currently identifies flows in MAC clients, such as
VNICs. It doesn't work down at the IEEE 802.1 level where
bridging takes place.
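(Concretely, today's Crossbow controls hang off a MAC client and
a flow definition, something like this with made-up names:

    # flowadm add-flow -l vnic1 -a transport=tcp tcpflow
    # flowadm set-flowprop -p maxbw=100M tcpflow

There's no equivalent hook at the point where the bridge picks a
frame up and retransmits it.)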
In principle, it might be possible to create non-flow-oriented
resource controls down at lower levels, but I don't believe
that would be a viable fix for the general problem in (1)(a).
That problem isn't a matter of "abuse" or any bad action on
the part of other network nodes or users. It's a matter of a
single packet being transmitted on a network, picked up by a
bridge, and then retransmitted on the same network. Over and
over again.
It's not an abuse of the network by someone sending too much
traffic, so there's no place where we can apply a throttle. It's
a failure of the network control protocols that ends up making
the network fundamentally unusable.
A resource control here would (in theory) limit the rate at
which we make this forwarding mistake in the case where
there's a persistent loop, but it wouldn't alleviate the
problem because all traffic in the same resource class would
be affected just as though the whole network were swamped. We
would still use all of our resource allotment resending the
same packet over and over.
For the same reason that you wouldn't ordinarily use a resource
control scheme to protect users against erroneous IP routes,
resource controls don't seem like the right tool here.
(If the resource controls here included WFQ among per-MAC-
destination queues, then there might be a good argument for
using that solution. I think having WFQ with very large
numbers of output queues would be a good addition to the
system, regardless of whether bridging is used or not. It
doesn't fix the original problem, but it does potentially
reduce the impact of many classes of DoS problems -- and many
others as well, such as basic fairness issues in cascaded IP
forwarding elements.)
djr-03 From bridge-spec.txt, (2.1), the requirement to use individual
network links to observe packets being sent does not fit with
what I would expect as a user. Needing to sniff the individual
network connections seems somewhat onerous (a snoop per link
in the bridge is required) and presupposes that the "user" knows
which interface they need to look on for the packet(s) they're
trying to observe.
Reply: You can snoop either on individual links (if you want to see
what's going on with a particular link) or on the special
bridge observability node described in the section you
reference.
The latter provides a copy of *all* traffic transiting the
bridge and doesn't require you to snoop individual links. You
see everything.
On Solaris today, you already *do* have to pick a link on
which you want to snoop, so there's no change in that respect.
We're adding observability, not taking any away.
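(Concretely, with hypothetical names, the two forms are

    # snoop -d e1000g0
    # snoop -d mybridge0

where, if I'm recalling the naming in the spec correctly, the
second is the observability node created for a bridge named
"mybridge" and shows all traffic transiting that bridge.)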
djr-05 From bridge-spec.txt, (2.2)(b), how does this impact IP if IP
interfaces are plumbed on top of a NIC that is also configured
to be part of a bridge?
Reply: IP traffic potentially runs more slowly because it's unable to
take advantage of hardware features.
That's a natural consequence of bridging, because the IP stack
doesn't actually know which underlying link will be used to
transmit a given packet: that knowledge depends on the bridge
forwarding table.
In principle, a future project could handle per-MAC-
destination hardware options from within IP, but that's not
this project. (And it probably should be dependent on Erik's
refactoring project.)
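(To make the configuration concrete, with made-up names: the IP
interface stays plumbed on the member NIC as usual, roughly

    # dladm create-bridge -l e1000g0 -l e1000g1 mybridge
    # ifconfig e1000g0 plumb 192.168.1.10 netmask 255.255.255.0 up

and the difference the IP stack sees is only that it can no
longer assume a given packet leaves on e1000g0's own hardware,
so the per-NIC offload features can't be counted on.)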
djr-06 From bridge-spec.txt, (2.2)(b), do the notes here apply equally
to receiving and sending or only to one of the two?
Reply: They apply to all traffic on the link, both sent and received.
djr-07 From bridge-spec.txt, (6.8), surely you mean that the system
"should not do this" only when it is being used operationally
as a bridge.
Reply: Yes ... though given the possibility of third-party bridging
code, it seems unwise to allow the case to occur in general
without at least significant warnings. Filtering away
bridging PDUs (STP) looks like a very bad idea to me, and I
find it hard to see a case where it'd be justified.
--
James Carlson, Solaris Networking <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677