(This is deep SCSA implementation stuff... feel free to delete/ignore it if you aren't concerned about the SCSI device driver framework DDI or implementation.)

I've been playing around with some new drivers, and I've run into a significant flaw, that I think is severe enough to merit possibly correcting before the next "official" release, in SCSA.

The problem is two-fold -- first the interface is designed around some assumptions that don't hold true all the time, and second the implementation also makes some evil assumptions about internals of the DMA framework that can be violated.

The problem comes up when on SPARC the DMA framework decides it needs to use partial DMA. This can happen with the maximum permitted transfer is smaller than the fixed limit implied by maxphys, even on SPARC systems with IOMMUs. So the assumption that partial DMA doesn't happen on SPARC (or rather when _DMA_USES_PHYSADDR is undefined) is wrong.

Right now, a driver that does that will trigger an assertion fault in scsi_resource.c.

       ASSERT(pktw->pcw_total_xfer <= bp->b_bcount);
       in_pktp->pkt_resid = bp->b_bcount -

One approach I tried to fix this was to basically #define _DMA_USES_PHYSADDR (well actually I changed the define name and made it true), but this falls over elsewhere ... specifically the implementation of scsi_dmaget_attr is completely hosed -- it access the ddi_dma_handle_t implementation details, and it assumes incorrectly that the dmai_cookie member is a valid array of DMA cookies. However, I found that this is not necessarily the case on SPARC. Apparently this is a significant difference between SPARC and x86. (Of course accessing the internal DMA handle details is totally DDI uncompliant... and even if you're willing to overlook that as this is code that resides inside ON, it totally violates the software engineering principle of using and respecting reasonable interface boundaries.

When I hit this problem, I gave up on making the fixes to the SCSA implementation for my driver to work properly with tran_setup_pkt. It seems clear to me that the assumption that one can access the set of DMA cookies as a fixed array of objects is fundamentally flawed ... while it works on x86, SPARC seems to violate this principle. (And, I wonder if making this assumption could be problematic for future extensions to the IOMMU DMA implementation for x86.)

Now, for my case, I "ported" the implementation of tran_setup_pkt (plus a number of cleanups and fixes, and I also made it use scsi_hba_pkt_alloc() as part of the kmem_cache constructor, so that I wouldn't need to rely on undocumented scsi_pkt_size()) to blk2scsa. I'll be posting a webrev of those changes soon -- the implementation is wholly DDI compliant, I believe, and does not leave too many cycles on the floor (although the kmem_cache constructor is a bit more expensive than it would otherwise be -- but this should not be too frequently executed.)

If someone wants to see these changes separately, please let me know.

I'm left with the following conclusions:

* scsi_tran_setup_pkt() is flawed. In particular, the use of pkt_cookies as a simple point to an array of cookies is an unworkable interface given the lack of consistency for support of such in the underlying DMA implementations. Instead, we need a programmatic interface that iterates over the DMA cookies, using ddi_dma_nextcookie() and the pkt->pkt_dma_handle.

* the checks for _DMA_USES_PHYSADDR in scsi_resource.c are simply misguided... SPARC cannot assume that partial DMA doesn't happen, because it can, and there is nothing in the DDI that prevents drivers that cause this from getting written.

* accessing internal portions of the DMA handle or other bits of the DDI is a really bad design choice, and should be avoided unless there is some really compelling need for it. (There
wasn't here.)

* in the meantime, device driver authors can workaround this by (re-)implementing all the tran_setup_pkt logic themselves in their own drivers. However, this is really unfortunate since it largely defeats the purpose of having a simpler DDI for SCSI drivers in the first place.

Going forward, I can see several options:

1) document the restrictions for device drivers, and make no code changes. I consider this the least acceptable choice, although possibly the most expedient.

2) "fix" the API (and underlying implementation) in SCSA, and fix all the consumers. This can be done now, because we have not delivered an official release yet that includes tran_setup_pkt(). In other words, we still have time to correct this before it is finalized. (Although I'm not sure if any unbundled consumers of this API exist yet.)

3) leave the existing DDI, but mark it Obsolete (file CRs for consumer drivers to fix it), and supply a new one that doesn't have the limitations. This increases the code footprint and maybe also the maintenance costs, but it probably reduces the testing impact on all the new drivers. (The new behavior could be triggered with a new flag to scsi_hba_attach_setup().)

Obviously, whatever we do (except for option #1), it will have impact on numerous consuming drivers, possible performance implications (at the very least calling ddi_dma_nextcookie is going to be more expensive than just incrementing a pointer), and a significant test burden on any changed drivers. There's ARC impact as well, but I'm happy to help out with that, as I am (the last time I checked) a PSARC member in good standing.

One other note: if we're going to make this significant change, I'd also like to make one other change, which is to eliminate the assumption that the SCSI hba needs to access the packet for DMA. The changes needed are simple (I made them in my blk2scsa implementation), and can save some overhead. While you might wonder why anyone would have a PIO only device (I use a bcopy into prealloc'd buffers for SDcard, so it behaves like PIO, to work around SDcard host hardware bugs), it has applications for things like iSCSI, which never directly needs to perform DMA operations. (So this would potentially accelerate iSCSI, and maybe even things like scsa2usb.)

Thanks for any thoughts on any of this.

   - Garrett

_______________________________________________
storage-discuss mailing list
storage-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/storage-discuss

Reply via email to