On Dec 17, 2008, at 2:00 PM, James Carlson wrote:

> Nicolas Droux writes:
>> Here are my issues for Solaris Bridging (PSARC 2008/055).
>> Unfortunately I have a conflict and won't be able to attend the
>> inception review, but I'll be happy to follow-up by email.
>
> I'll integrate these (and my replies) into the existing issues file.

Thanks for your replies. Some follow-up below...

>
>
>> ngd-01 bridging-spec.txt states "The links assigned to a bridge must
>> not themselves be VLANs, VNICs, or tunnels. Only links that would be
>> acceptable as part of an aggregation or links that are aggregations
>> themselves may be assigned to a bridge." It should also be possible to
>> bridge etherstubs (introduced by Crossbow [PSARC 2006/357]), since
>> they can be used to create virtual switches.
>
> We had an extended talk about this one.  The spec intentionally
> doesn't mention etherstubs (except in passing) because they're not
> prohibited.
>
> You *should* in principle be able to create a bridge between two
> etherstub instances.  I've attempted to do this, and I've found that
> there appear to be numerous bugs related to etherstubs in ON today --
> for instance, dladm_linkid2legacyname() thinks they're invalid and
> dlpi_bind() won't allow me to bind to SAP zero so that I can send and
> receive STP into the bit-bucket.

I don't think the dladm_linkid2legacyname() behavior you are seeing is a
bug. As its name implies, that function deals with legacy data-link names,
and etherstubs don't fall into that category.

We also have limitations built in to prevent an etherstub from being
plumbed, which may be causing the other issue you are hitting.

So things seem to be currently working as expected. If there are new  
requirements for etherstubs in order to make them work with bridging,  
we'll be happy to work with you on that.

> I'm sure I can fix and/or work around those bugs, and thus make it
> possible to bridge these objects.  I'll include doing that as part of
> the project.  From my prototype:
>
> # dladm show-bridge -l bar
> LINK         STATE        UPTIME   DESROOT
> stub1        forwarding   22       32768/0:0:0:0:0:0
> stub2        forwarding   22       32768/0:0:0:0:0:0
>
> I'm not sure, though, that it's an interesting case.  You'll get
> better performance if you just put all of the VNICs that must talk
> with each other together on a single etherstub if you're planning to
> bridge etherstubs together.  If you're planning to bridge an etherstub
> with a regular NIC, then just move the VNICs over to the regular NIC.

An important benefit is the flexibility to build virtual networks in a box
that map directly to physical topologies.


>> ngd-02 in bridging-spec.txt, 2.2 a), the proposed link/up behavior in
>> the presence of bridges needs to be refined. With Crossbow VNICs, the
>> link status advertised to MAC clients also depends on the presence of
>> other MAC clients on top of the underlying data-link, in order to
>> maintain connectivity between these MAC clients when the physical link
>> of the underlying data-link goes down. This needs to be factored into
>> the logic used to reflect the link status when bridging is configured
>> on the underlying data-link.
>
> This appears to be a misunderstanding.  I'm not modifying the existing
> link up/down handling that Crossbow VNICs have in any way.

I think it's the following sentence in your document that is confusing to
me: "This means that when all external links are showing link-down status,
the upper-level clients using the MAC layers will see link-down events as
well."

VNICs are MAC clients, and their link status may not reflect the link- 
down events of the external links.

> The existing behavior is that the VNIC stays up if there are other
> VNICs configured on the same NIC.  The same is true when bridging is
> present in the picture: if all of the physical NICs go down, then
> VNICs will still do the same thing they did before, and will still
> advertise "up" status to clients when there are multiple VNICs present
> on the same NIC.
>
> There's no issue here.

I'd suggest clarifying the interactions in the spec.

>> ngd-05 bridging-design.pdf The design document is still referring to
>> old Crossbow architectural details which do not match what was
>> integrated in Nevada. For instance, some of the arguments used against
>> using the Crossbow classifier don't hold true with the latest Crossbow
>> implementation, and the discussion at pages 16-17 should be updated,
>> and the design possibly revisited based on the latest Crossbow flow
>> table architecture.
>
> The design document, as I've tried to make clear, is a very early
> draft, has not been updated, and is informative for the architectural
> review, not normative.  It's explicitly not under review here.
>
> However, the classifier issues still remain, and we discussed those at
> length.  The analogy from before still stands: for the same basic
> reasons that the Fireengine conn_t classifier can't really be used
> effectively as a substitute for the Patricia-tree based IP forwarding
> look-up (and vice-versa), the local delivery related classification in
> Crossbow doesn't appear suitable for the bridge forwarding case.
>
> With Crossbow, the classification is tied to the administrative bits,
> which rely on explicit configuration of the VNICs and flows involved
> using a user-space component.  With bridging, forwarding entries are
> created and updated on the fly based on source MAC addresses seen in
> the data path, and then aged away over time; there's no administrative
> involvement normally expected for these entries.

This doesn't have to be the case. The Crossbow flow implementation
provides a kernel API which allows flows to be created and added to flow
tables. That API is used today for VNICs, but also by MAC client creation
in general (e.g. through LDOMs), for user-specified flows, and for
multicast addresses. It could be used by the bridge code as well.
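
To make this concrete, here is a rough sketch of what it could look like
from the bridge side. The names below (bridge_learn_src(),
mac_bridge_flow_add(), bridge_fwd_insert(), the bridge_t structures) are
hypothetical placeholders, not the actual Crossbow entry points; the only
point is that a learned source MAC address could become a flow-table entry
through a kernel call rather than a separate bridge-private table:

/*
 * Hypothetical sketch only -- these identifiers do not name the real
 * Crossbow kernel flow API; they stand in for whatever mac/mac_flow
 * entry points create flows and add them to a flow table.
 */
typedef struct bridge_fwd_entry {
        uint8_t         bfe_mac[ETHERADDRL];    /* learned source MAC address */
        void            *bfe_flow;              /* opaque flow handle from the MAC layer */
        clock_t         bfe_lastheard;          /* timestamp used for aging */
} bridge_fwd_entry_t;

/*
 * Called from the bridge receive path when a frame arrives on link
 * 'blp' with a source address that has not been seen before.
 */
static int
bridge_learn_src(bridge_t *bp, bridge_link_t *blp, const uint8_t *src)
{
        bridge_fwd_entry_t *bfe = kmem_zalloc(sizeof (*bfe), KM_NOSLEEP);

        if (bfe == NULL)
                return (ENOMEM);
        bcopy(src, bfe->bfe_mac, ETHERADDRL);
        bfe->bfe_lastheard = ddi_get_lbolt();

        /*
         * Hypothetical call: ask the MAC layer to add a layer-2 flow
         * keyed on this address, directing matching packets to the
         * bridge link the address was learned on.
         */
        if (mac_bridge_flow_add(blp->bl_mh, src, blp, &bfe->bfe_flow) != 0) {
                kmem_free(bfe, sizeof (*bfe));
                return (EIO);
        }
        bridge_fwd_insert(bp, bfe);     /* bridge-side bookkeeping only */
        return (0);
}

Removing such an entry when it ages out would be the reverse call; the
bridge would keep its own bookkeeping only for aging and STP state.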

> The two are different in many respects.  In theory, though, it might
> be possible to modify Crossbow so that it can create and destroy
> classification entries on the fly (this does not look trivial in the
> least; the locking scheme makes this an unobvious approach), and it
> may be possible to make use of some aspects of flow administration
> when tied to more easily identifiable objects, such as VLANs, though
> it's unclear how this should work with the existing Crossbow resource
> management structure.

Crossbow can already create and destroy flow entries on the fly. The  
locking requirements are also very straightforward.

> I regard all of that as a research project.  It may well be an
> interesting one, but it's not this project by any stretch.  I have no
> plans or engineering resources available to redesign the internals of
> Crossbow to handle things it wasn't originally designed to do, and I
> think that insisting on such an extension of the project I've proposed
> is not reasonable.  I will not be doing that.

I don't think you need to "redesign the internals of Crossbow".

We have kernel APIs which I believe can provide most of what you need
here. There might be some small gaps, but I don't see why you would need
to introduce a new classification table at layer 2 when the MAC layer
already gives you most of that functionality at the same layer.

> For what it's worth, it may also be possible to modify Crossbow so
> that it eliminates the Fireengine classifier entirely.  After all, the
> two are much more aligned than are Crossbow and bridging: both involve
> identifying specific receiving client(s) on input and handling output
> from multiple clients, and both involve classification structures that
> are created strictly on the action of user space components.  It seems
> like a performance loss to have Crossbow inspect and classify the
> packet once -- potentially looking high up the stack for flow
> information -- only to have Fireengine do the same thing again.

Of course it might be possible to reuse the same flows at other layers of
the stack, but that case is not as clear-cut as bridging. See below...

> I can see that this path wasn't taken, so I can't help but wonder how
> reuse of Crossbow's classifier could be considered a requirement for
> bridging.

It is very relevant to bridging since bridge forwarding happens at the
same place on the data path as the classification that Crossbow introduced
in the MAC layer. For example, on transmit, classification on the
destination MAC address results in sending the packet to another MAC
client (e.g. a VNIC), sending copies of the packet to the members of a
multicast group, or sending the packet on the wire. A new outcome would be
to pass the packet to a bridge.

With Crossbow the old mac txinfo implementation is completely gone, and
all packets sent from a client go through mac_tx(). Since mac_tx() is
where classification takes place, and where you would need to do your own
checks, it seems natural to combine them into a single classification
operation.
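
As a rough illustration of that combined pass (the mac_tx_outcome_t type
and the helper functions below are invented for this sketch; only mac_tx()
is the real entry point referred to above):

/*
 * Hypothetical sketch of a single transmit-side classification pass.
 * "Pass the packet to a bridge" is simply one more outcome of the same
 * destination-MAC lookup that already exists for local delivery,
 * multicast, and the wire.
 */
typedef enum {
        MAC_TX_TO_CLIENT,       /* destination is another MAC client (e.g. a VNIC) */
        MAC_TX_TO_MCAST,        /* copy to the members of a multicast group */
        MAC_TX_TO_WIRE,         /* send on the physical link */
        MAC_TX_TO_BRIDGE        /* new outcome: hand to the bridge for forwarding */
} mac_tx_outcome_t;

static void
mac_tx_classify_sketch(mac_impl_sketch_t *mip, mblk_t *mp)
{
        switch (classify_dest_mac(mip, mp)) {   /* one flow-table lookup */
        case MAC_TX_TO_CLIENT:
                deliver_to_client(mip, mp);
                break;
        case MAC_TX_TO_MCAST:
                deliver_to_mcast_members(mip, mp);
                break;
        case MAC_TX_TO_BRIDGE:
                bridge_forward(mip, mp);        /* learned-address forwarding */
                break;
        case MAC_TX_TO_WIRE:
        default:
                send_on_wire(mip, mp);
                break;
        }
}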

Similarly, on receive, the old mac rx_add() entry points are gone, and
demultiplexing to the interested parties is now done by the MAC layer
through the same classification table. So having an entry for an address
the bridge is interested in would allow the classification to be leveraged
on the receive side as well.

So reusing the classifier for components at the same layer of the stack
seems like the natural thing to do. Once you use flows, you can also take
advantage of hardware classification on the receive side.
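
A matching receive-side sketch, again with hypothetical names (the real
demultiplexing is done inside the MAC layer's existing classification
code):

/*
 * Hypothetical receive-side counterpart: inbound packets are already
 * demultiplexed through the classification table, so a bridge entry
 * added there (e.g. by the hypothetical mac_bridge_flow_add() above)
 * is found by the very same lookup -- and, with flows, that lookup
 * can be backed by hardware classification.
 */
static void
mac_rx_demux_sketch(mac_impl_sketch_t *mip, mblk_t *mp)
{
        flow_match_sketch_t fm;

        if (classify_rx_packet(mip, mp, &fm)) {
                if (fm.fm_is_bridge)
                        bridge_input(mip, mp);                  /* forwarded by the bridge */
                else
                        deliver_to_rx_client(fm.fm_client, mp); /* e.g. a VNIC */
        } else {
                freemsg(mp);    /* nobody is interested in this address */
        }
}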

The Crossbow team will be happy to answer any questions you may have
about the new data path, and to discuss specific requirements of yours
that are not addressed by the current implementation.

> One important issue did come up here: we need to define the relative
> ordering between L2 filtering and bridging, and I believe it makes
> sense to put L2 filtering closer to the physical I/O.  In other words,
> L2 filter should do its work underneath the bridge.

There is filtering which needs to occur between multiple MAC clients
(VNICs are MAC clients) defined on top of the same data-link. For example,
to be consistent with the way things work in the physical world, one might
want to prevent a VM from sending specific packets on the wire, which in
this case would include a bridge. On the transmit side these checks would
have to be done before the packet is potentially sent through a bridge,
i.e. the L2 filtering would have to be done "on top" of the bridge.

Nicolas.

>
>
> -- 
> James Carlson, Solaris Networking              <james.d.carlson at sun.com 
> >
> Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442  
> 2084
> MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442  
> 1677

-- 
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
nicolas.droux at sun.com - http://blogs.sun.com/droux

