I've been thinking about a problem that keeps recurring, and I'd like to
present the following proposal to PSARC.  However, before I do, I'd like to
hear from other folks who might have thoughts on the matter.

Stream and Character Dual Personality Device Support
----------------------------------------------------

Background:
-----------

There are two different frameworks for device drivers in Solaris -- one that
implies a device is a typical character/block device (and hence supports the
typical read(2), write(2), etc. entry points via cb_ops), and another that
assumes that devices are STREAMs devices and export their entry points via
a streamtab.

Most device drivers fall firmly into one style (character/block) or
the other (STREAMs.)  Usually which kind of device the driver acts as is
determined by what type of hardware it is.  For example, NIC drivers are
required to be STREAMs devices in Solaris.  Audio devices historically were
STREAMs based, although since Boomer (PSARC 2008/318) they are now character
devices.


Problem:
--------

Some devices, for one reason or another, cannot properly be described as
just character/block or just STREAMs.  This is a problem that has recurred many
times in the author's experience.  Some specific examples:

* Venus (Sun Crypto Accelerator 4000) -- this device is a NIC, so it
 must be a STREAMs device.  However, for the Crypto framework used at
 the time (kcl, in Solaris 8) the crypto functionality was expressed
 via character interfaces.  (STREAMs was deemed far too cumbersome for
 this kind of functionality since a lot of complex structures had to
 be copied back and forth.)

* Audio - to be compatible with the legacy Audio API, audio devices
 need to be STREAMs devices.  But the new style OSS API is character
 based, and trying to impose STREAMs created a problem where STREAMs
 queueing semantics resulted in latency that the OSS API could not express
 to applications (and hence applications lost the ability to perform
 accurate positioning within the stream.)

* Converged IB/networking devices.  Devices from some vendors can support
 either Infiniband or 10GbE functionality, based upon firmware configuration
 or external influences.  The Infiniband functionality and framework was
 designed to be character/block driven (in particular, it wants to perform
 efficient mmap(2) operations and to avoid STREAMs latency) while the
 10 gigabit ethernet functionality needs to be based upon STREAMs as part
 of the GLDv3.

The problem here is that Solaris' DDI requires all minor nodes of a single
device (dev_info_t) to be either STREAMs or character/block devices.  There
is no way to mix and match.


Historical Workarounds:
-----------------------

The historical workarounds have varied here.  There are two approaches
that the author has seen so far:

1) Nexus driver approach.  In this approach, the device is a nexus,
  and separate device drivers for each type of personality are developed.
  While this works, the problems with it stem from the fact that the nexus
  framework is very awkward, undocumented, and not available to 3rd parties.
  It also creates an artificial branch point in the tree, which might not
  be the right way to handle the device.  This approach can also get in the
  way of code sharing.  For legacy products, like the converged networking
  device, making use of this may require substantial rearchitecture of
  already complex devices and subsystems.

2) Pseudo-driver.  A slightly different approach: by using LDI (or legacy
  equivalents) and a pseudo device, it's possible to dynamically create
  minor nodes of a different device type.  The austr device in the audio
  subsystem was created for this purpose.  While it works, and doesn't
  violate any public DDI, it's incredibly awkward, and requires devices to
  do extra magic outside of the driver to make sure that minor nodes are
  properly reflected.  (For example, the audio framework performs a
  devfsadm -i austr during early boot to make sure that the instance for
  this pseudo driver is attached so that it can create and remove minor
  nodes on demand from the master device.)

  A similar approach was used in the Solaris 8 software for Venus -- the
  crypto minor nodes were owned by the crypto framework, rather than the
  physical device instance.  As a result of this, the framework needed
  to ensure that its minor node was always ready, and the drivers exported
  the ddi-no-autodetach and ddi-forceattach properties to make sure that
  hardware associated with the crypto was always ready to go.


Proposed Solution:
------------------

We'd like to propose that it should be possible for a device driver
to export minor nodes of both types (STREAMs and character/block devices.)

The main challenge here is deciding which entry points to use (the
cb_ops or the streamtab ones).

We propose the creation of a new cb_flag, D_DUAL_PERSONALITY, which when
present indicates that a device supports both STREAMs and regular minor
nodes.

At open(9e) time, the STREAMs entry point will then be allowed to return a
special errno (already defined), ENOSTR, to indicate that the minor node
supplied is not associated with STREAMs, but rather that the specfs framework
should retry the open using the character based open(9e).

This extra retry will not be performed if the D_DUAL_PERSONALITY flag
is not present.  (Optional: we could eliminate the flag, since most
STREAMs device drivers simply provide nodev() for their
character/block style open(9e).)
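
The retry described above can be sketched in user space as follows.  This is
only a toy model under stated assumptions, not specfs code: struct toy_driver,
toy_spec_open(), the example open routines, and the numeric value given to
D_DUAL_PERSONALITY are all hypothetical names and values invented for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOSTR
#define ENOSTR 60               /* fallback for platforms lacking ENOSTR */
#endif

/* Hypothetical value; the proposal names the flag but assigns no value. */
#define D_DUAL_PERSONALITY 0x1

typedef int (*open_fn)(int minor);

/* Toy stand-in for a driver's entry points (not the real cb_ops layout). */
struct toy_driver {
    int     flags;              /* stands in for cb_flag */
    open_fn str_open;           /* stands in for the streamtab open, or NULL */
    open_fn chr_open;           /* stands in for the character open */
};

enum personality { P_NONE, P_STREAM, P_CHAR };

/*
 * Try the STREAMs open first.  If it returns ENOSTR and the driver has
 * set D_DUAL_PERSONALITY, retry with the character open.  Any other
 * error, or ENOSTR without the flag, is returned unchanged.
 */
int
toy_spec_open(const struct toy_driver *drv, int minor, enum personality *out)
{
    int err;

    assert(drv != NULL && drv->chr_open != NULL && out != NULL);
    *out = P_NONE;

    if (drv->str_open != NULL) {
        err = drv->str_open(minor);
        if (err == 0) {
            *out = P_STREAM;
            return (0);
        }
        if (err != ENOSTR || !(drv->flags & D_DUAL_PERSONALITY))
            return (err);
    }

    /* Plain character driver, or a permitted retry after ENOSTR. */
    err = drv->chr_open(minor);
    if (err == 0)
        *out = P_CHAR;
    return (err);
}

/* Example driver: even minors are STREAMs nodes, odd minors are not. */
int
ex_str_open(int minor)
{
    return ((minor % 2 == 0) ? 0 : ENOSTR);
}

int
ex_chr_open(int minor)
{
    (void) minor;
    return (0);
}
```

With a driver that sets the flag, opening minor 2 yields a stream and opening
minor 3 falls through to the character open.  Without the flag, opening
minor 3 simply fails with ENOSTR, which matches today's behavior.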

All other points in specfs can simply change their checks for STREAMSTAB(major) to a
check for an already open stream.

For example, in spec_write():

   if (STREAMSTAB(getmajor(dev))) {
       ASSERT(vp->v_type == VCHR);
       smark(sp, SUPD);
       return (strwrite(vp, uiop, cr));
   }

Would be rewritten as:

   if (vp->v_stream != NULL) {
       ASSERT(vp->v_type == VCHR);
       smark(sp, SUPD);
       return (strwrite(vp, uiop, cr));
   }

(Note that strwrite already has an ASSERT(vp->v_stream) as its first line
of executable code other than variable assignment).


GLDv3 Ramifications:
--------------------

Note that while the above proposal solves the situation generally, it
doesn't help with frameworks where the framework manages the minor
number space.  The GLDv3 is one such framework.

The GLDv3 assumes that physical minor numbers less than 1001 are associated
with clone opens, and that PPAs will be numbered 1-1000.  (Number 0 is used
for DLPI style 2 attachments.)

We'd like to propose that the GLDv3 change to start numbering minor numbers
at 10001, to leave more room for other kinds of minor numbers.

Device drivers interacting with GLDv3 will probably need to "override" the
old getinfo(9e) implementation, because they will need to look for GLDv3
minor numbers *and* for other (e.g. IB) minor numbers.  Likewise, they might
need to override the streamtab open to return ENOSTR if the minor number is
not a GLDv3 network device.
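
As a rough illustration of such an override, the sketch below classifies a
minor number and returns ENOSTR for anything that is not a GLDv3 minor.  It
assumes that every minor at or above the proposed 10001 belongs to GLDv3 and
that everything below is the driver's own; the proposal does not spell out
where the style 2 node would live, so that case is not modeled.  The names
GLD_MINOR_BASE, classify_minor(), drv_str_open() and fake_gld_open() are
hypothetical and are not part of GLDv3.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOSTR
#define ENOSTR 60               /* fallback for platforms lacking ENOSTR */
#endif

/* First GLDv3 minor number under the proposal (hypothetical macro name). */
#define GLD_MINOR_BASE 10001u

enum minor_kind { MINOR_OTHER, MINOR_GLDV3 };

/* Assumption: minors >= GLD_MINOR_BASE are GLDv3, the rest are not. */
enum minor_kind
classify_minor(unsigned int minor)
{
    return ((minor >= GLD_MINOR_BASE) ? MINOR_GLDV3 : MINOR_OTHER);
}

/*
 * Driver-side STREAMs open override.  GLDv3 minors are handed to the
 * framework's open routine (passed in here as gld_open); all other
 * minors return ENOSTR so the caller can retry the character open.
 */
int
drv_str_open(unsigned int minor, int (*gld_open)(unsigned int))
{
    assert(gld_open != NULL);

    if (classify_minor(minor) != MINOR_GLDV3)
        return (ENOSTR);
    return (gld_open(minor));
}

/* Stand-in for the framework's open routine; always succeeds. */
int
fake_gld_open(unsigned int minor)
{
    (void) minor;
    return (0);
}
```

A getinfo(9e) override would need the same kind of classification before
deciding whether to answer the request itself or pass it to the framework.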

Other frameworks might have similar ramifications.

(For audio devices there would be no such ramification, since the
audio framework would be managing the minor number spaces for both
kinds of devices.)

_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
