I am sponsoring the following fasttrack for Mark Johnson, requesting patch
binding and a timeout of 10/16/2007.
-Chris
Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
1. Introduction
1.1 Project/Component Working Name:
dma-max-arch scsi capability
1.2 Name of Document Author/Supplier:
Author: Mark Johnson
1.3 Date of This Document:
Tue Oct 9 10:36:08 MDT 2007
4. Technical Description
4.1 Introduction
This case introduces the 'dma-max-arch' scsi capability, and is the
first of two fasttracks to address st tape driver performance on
x86.
    - dma-max-arch scsi capability
    - bp_copyin()/bp_copyout()
There is a recently escalated bug outstanding for this issue.
6567168 s10 x86 st tape driver performance issue
http://monaco.sfbay/detail.jsf?cr=6567168
4.2 Background
Most tape drives cannot handle partial DMAs. An entire tape block
must be transferred in a single DMA.
For our SPARC based systems, this is relatively simple. Since
these systems have an IOMMU, the only real consideration is the
underlying HBA's maximum DMA size, which is returned via the
'dma-max' scsi capability.
For current x86 based systems, this becomes more complicated. Not
only can the maximum DMA be limited by the underlying HBA's maximum
DMA size, but it can also be limited by the DMA engine's
scatter/gather list constraints (if the memory is completely
fragmented).
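As a rough illustration of the scatter/gather limit, the sketch below
computes the largest transfer that is guaranteed to fit in an S/G list
when every page of the buffer is physically discontiguous. The function
name, the page size used in the examples, and the assumption that each
S/G entry maps at most one page are illustrative only. They are not
taken from the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: conservative, page-granular bound on the DMA
 * size that always fits in an S/G list of 'sgllen' entries when no two
 * pages of the buffer are physically adjacent.
 *
 * Assumption: each S/G entry maps at most one page. Because the buffer
 * may start mid-page, a transfer of n full pages can touch n + 1 pages,
 * so one entry is held back for the unaligned case.
 */
static size_t
worst_case_sgl_bytes(size_t sgllen, size_t page_size)
{
	assert(page_size > 0);

	if (sgllen == 0)
		return (0);
	return ((sgllen - 1) * page_size);
}
```

With 4K pages, a 257-entry list guarantees 1M regardless of
fragmentation under these assumptions, while a 17-entry list guarantees
only 64K.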
Today on x86, the st tape driver will allocate physically
contiguous memory and then bp_mapin/bcopy/bp_mapout all transfers
where the tape blocksize is greater than 64KBytes.
There are two parts to the solution. The first is to provide a way
for the st driver to query what the DMA constraints of the HBA are,
taking the sgllen into account. The second is to provide a 64-bit
optimized bp copy for block sizes which are too large to fit within
the HBA's sgllen constraints, but are within the HBA's maximum DMA
size.
For example, the st driver may find out that the maximum DMA
supported by the HBA ('dma-max') is 4M and the maximum DMA supported
by the HBA/system is 1M ('dma-max-arch'). The st driver can then
allow any blocksize <= 1M to go directly to the HBA and then use
the optimized copy for block sizes greater than 1M and less than or
equal to 4M. Today it does an un-optimized copy for block sizes
greater than 64K.
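The decision described in this example could be sketched as follows.
This is not the st driver's code. The enum, the function name, and the
handling of block sizes above 'dma-max' (rejected here) are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer paths; names are illustrative, not from st. */
typedef enum {
	XFER_DIRECT,	/* hand the buffer straight to the HBA */
	XFER_COPY,	/* go through the optimized bp copy */
	XFER_REJECT	/* larger than the HBA's maximum DMA (assumed) */
} xfer_path_t;

/*
 * Pick a path for one tape block.
 *   dma_max      - value of the 'dma-max' capability, in bytes
 *   dma_max_arch - value of the 'dma-max-arch' capability, in bytes
 */
static xfer_path_t
choose_xfer_path(size_t blksize, size_t dma_max, size_t dma_max_arch)
{
	/* 'dma-max-arch' already accounts for the HBA limit. */
	assert(dma_max_arch <= dma_max);

	if (blksize <= dma_max_arch)
		return (XFER_DIRECT);
	if (blksize <= dma_max)
		return (XFER_COPY);
	return (XFER_REJECT);
}
```

With dma_max = 4M and dma_max_arch = 1M, a 1M block goes directly to
the HBA, and a block of 1M + 1 bytes up to 4M takes the copy path.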
This case addresses the first part of the problem. It provides a
new 'dma-max-arch' SCSI capability which will return additional DMA
arch constraints. If there are no additional constraints (e.g. on
SPARC), scsi_ifgetcap() will return undefined (-1) for 'dma-max-arch'.
An implementation note of interest: the 'dma-max-arch' capability is
implemented by the scsi_ifgetcap(9F) implementation itself. It does
not rely on a tran_getcap/tran_setcap(9E) response from the HBA.
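A target driver would need to handle the undefined case. The sketch
below shows one possible way to do so, falling back to 'dma-max' when
'dma-max-arch' is undefined. The helper and the CAP_UNDEFINED macro are
hypothetical. In a real driver the two inputs would come from calls
such as scsi_ifgetcap(ap, "dma-max", 1) and
scsi_ifgetcap(ap, "dma-max-arch", 1).

```c
#include <assert.h>

/* scsi_ifgetcap(9F) returns -1 when a capability is undefined. */
#define	CAP_UNDEFINED	(-1)

/*
 * Hypothetical helper: resolve the largest block size that may be sent
 * directly to the HBA from the raw capability values.
 *
 * If 'dma-max-arch' is undefined there are no additional arch
 * constraints (e.g. on SPARC), so 'dma-max' applies on its own. The
 * result is CAP_UNDEFINED if both capabilities are undefined.
 */
static int
effective_direct_limit(int dma_max, int dma_max_arch)
{
	/* Values below -1 are not expected from scsi_ifgetcap(9F). */
	assert(dma_max >= CAP_UNDEFINED);
	assert(dma_max_arch >= CAP_UNDEFINED);

	if (dma_max_arch == CAP_UNDEFINED)
		return (dma_max);
	return (dma_max_arch);
}
```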
4.3 Interface Table
    INTERFACE          COMMITMENT LEVEL    COMMENT
    'dma-max-arch'     Committed           new scsi_ifgetcap(9F)
                                           capability
4.4 Man page changes
See below.
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
A.1 man page changes for scsi_ifgetcap(9F)
...
  dma-max         Maximum dma transfer size that is
                  supported by the host adapter.
+ dma-max-arch    Maximum dma transfer size that is
+                 supported by the system. Takes the host
+                 adapter and system architecture into
+                 account. This is useful for target
+                 drivers which don't support partial DMAs
+                 on systems which don't have an IOMMU. In
+                 this case, the DMA can also be limited by
+                 the host adapter's scatter/gather list
+                 constraints.
+
+                 The 'dma-max-arch' capability is not
+                 settable. It is implemented within
+                 scsi_ifgetcap(9F) and does not rely on a
+                 tran_getcap(9E) response from the HBA.
+
+
  msg-out         Message out capability that is supported
                  by the host adapter: 0 disables, 1
                  enables.
...