(This is deep SCSA implementation stuff... feel free to delete/ignore it
if you aren't concerned about the SCSI device driver framework DDI or
implementation.)
I've been playing around with some new drivers, and I've run into a
significant flaw, that I think is severe enough to merit possibly
correcting before the next "official" release, in SCSA.
The problem is two-fold -- first the interface is designed around some
assumptions that don't hold true all the time, and second the
implementation also makes some evil assumptions about internals of the
DMA framework that can be violated.
The problem comes up when on SPARC the DMA framework decides it needs to
use partial DMA. This can happen with the maximum permitted transfer is
smaller than the fixed limit implied by maxphys, even on SPARC systems
with IOMMUs. So the assumption that partial DMA doesn't happen on
SPARC (or rather when _DMA_USES_PHYSADDR is undefined) is wrong.
Right now, a driver that does that will trigger an assertion fault in
scsi_resource.c.
ASSERT(pktw->pcw_total_xfer <= bp->b_bcount);
in_pktp->pkt_resid = bp->b_bcount -
One approach I tried to fix this was to basically #define
_DMA_USES_PHYSADDR (well actually I changed the define name and made it
true), but this falls over elsewhere ... specifically the
implementation of scsi_dmaget_attr is completely hosed -- it access the
ddi_dma_handle_t implementation details, and it assumes incorrectly that
the dmai_cookie member is a valid array of DMA cookies. However, I
found that this is not necessarily the case on SPARC. Apparently this
is a significant difference between SPARC and x86. (Of course accessing
the internal DMA handle details is totally DDI uncompliant... and even
if you're willing to overlook that as this is code that resides inside
ON, it totally violates the software engineering principle of using and
respecting reasonable interface boundaries.
When I hit this problem, I gave up on making the fixes to the SCSA
implementation for my driver to work properly with tran_setup_pkt. It
seems clear to me that the assumption that one can access the set of DMA
cookies as a fixed array of objects is fundamentally flawed ... while it
works on x86, SPARC seems to violate this principle. (And, I wonder if
making this assumption could be problematic for future extensions to the
IOMMU DMA implementation for x86.)
Now, for my case, I "ported" the implementation of tran_setup_pkt (plus
a number of cleanups and fixes, and I also made it use
scsi_hba_pkt_alloc() as part of the kmem_cache constructor, so that I
wouldn't need to rely on undocumented scsi_pkt_size()) to blk2scsa.
I'll be posting a webrev of those changes soon -- the implementation is
wholly DDI compliant, I believe, and does not leave too many cycles on
the floor (although the kmem_cache constructor is a bit more expensive
than it would otherwise be -- but this should not be too frequently
executed.)
If someone wants to see these changes separately, please let me know.
I'm left with the following conclusions:
* scsi_tran_setup_pkt() is flawed. In particular, the use of
pkt_cookies as a simple point to an array of cookies is an unworkable
interface given the lack of consistency for support of such in the
underlying DMA implementations. Instead, we need a programmatic
interface that iterates over the DMA cookies, using ddi_dma_nextcookie()
and the pkt->pkt_dma_handle.
* the checks for _DMA_USES_PHYSADDR in scsi_resource.c are simply
misguided... SPARC cannot assume that partial DMA doesn't happen,
because it can, and there is nothing in the DDI that prevents drivers
that cause this from getting written.
* accessing internal portions of the DMA handle or other bits of the
DDI is a really bad design choice, and should be avoided unless there is
some really compelling need for it. (There
wasn't here.)
* in the meantime, device driver authors can workaround this by
(re-)implementing all the tran_setup_pkt logic themselves in their own
drivers. However, this is really unfortunate since it largely defeats
the purpose of having a simpler DDI for SCSI drivers in the first place.
Going forward, I can see several options:
1) document the restrictions for device drivers, and make no code
changes. I consider this the least acceptable choice, although possibly
the most expedient.
2) "fix" the API (and underlying implementation) in SCSA, and fix
all the consumers. This can be done now, because we have not delivered
an official release yet that includes tran_setup_pkt(). In other words,
we still have time to correct this before it is finalized. (Although
I'm not sure if any unbundled consumers of this API exist yet.)
3) leave the existing DDI, but mark it Obsolete (file CRs for
consumer drivers to fix it), and supply a new one that doesn't have the
limitations. This increases the code footprint and maybe also the
maintenance costs, but it probably reduces the testing impact on all the
new drivers. (The new behavior could be triggered with a new flag to
scsi_hba_attach_setup().)
Obviously, whatever we do (except for option #1), it will have impact on
numerous consuming drivers, possible performance implications (at the
very least calling ddi_dma_nextcookie is going to be more expensive than
just incrementing a pointer), and a significant test burden on any
changed drivers. There's ARC impact as well, but I'm happy to help out
with that, as I am (the last time I checked) a PSARC member in good
standing.
One other note: if we're going to make this significant change, I'd also
like to make one other change, which is to eliminate the assumption that
the SCSI hba needs to access the packet for DMA. The changes needed are
simple (I made them in my blk2scsa implementation), and can save some
overhead. While you might wonder why anyone would have a PIO only
device (I use a bcopy into prealloc'd buffers for SDcard, so it behaves
like PIO, to work around SDcard host hardware bugs), it has applications
for things like iSCSI, which never directly needs to perform DMA
operations. (So this would potentially accelerate iSCSI, and maybe even
things like scsa2usb.)
Thanks for any thoughts on any of this.
- Garrett
_______________________________________________
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss