I am sponsoring the following fasttrack for myself, requesting patch
binding and a timeout of 06/18/2007.
This proposal is being driven by FMA's need to know the capacity of a
failed disk FRU. If capacity information is not available, identifying
a "similar capacity" replacement requires detailed knowledge of how the
failed disk vendor and product information maps to capacity.
-Chris
Template Version: @(#)sac_nextcase 1.61 05/24/07 SMI
1. Introduction
1.1. Project/Component Working Name:
Device Size Properties
1.2. Name of Document Author/Supplier:
Author: Chris Horne
1.3 Date of This Document:
11 June, 2007
4. Technical Description
4.1 Problem
The Solaris WDD (Writing Device Drivers) defines the 64-bit
"Nblocks" and "Size" properties, and deprecates the 32-bit
"nblocks" and "size", but there is no Size(9P) or Nblocks(9P)
man page (CR 1098989).
The "Nblocks" and "Size" properties describe the size of a
ddi_create_minor_node(9F) minor node (partition). For devices
that have multiple partitions, there is no simple, generic way
to obtain the size of the device.
4.2 Proposal
This proposal provides man pages for the existing "Nblocks" and
"Size" partition size properties and defines three new
properties, "blksize", "device-nblocks" and "device-blksize",
to represent block device size. All these properties will be
described in the new Size(9P) man page.
The "blksize" property represents the data bearing block
size. The term 'data bearing' refers to how much user data is
stored in a block. If we support sector checksums in the
future, a device with 520 byte sectors still has a data bearing
block size of 512. If "blksize" is not defined at the
partition level, then "device-blksize" applies to all
partitions. If neither is defined, then the DEV_BSIZE (512)
default is implied.
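The fallback order described above can be sketched as follows. This is a
minimal user-level illustration, not the specfs/ldi implementation: the
size_props structure and effective_blksize() are hypothetical names, and
a value of 0 is assumed to mean "property not defined".

```c
#include <assert.h>
#include <stddef.h>

#define DEV_BSIZE 512   /* default block size named in the text */

/* Hypothetical lookup result; 0 means the property is not defined. */
struct size_props {
    int blksize;        /* partition-level "blksize" */
    int device_blksize; /* device-level "device-blksize" */
};

/* Apply the order: "blksize", then "device-blksize", then DEV_BSIZE. */
int
effective_blksize(const struct size_props *p)
{
    assert(p != NULL);
    if (p->blksize > 0)
        return (p->blksize);
    if (p->device_blksize > 0)
        return (p->device_blksize);
    return (DEV_BSIZE);
}
```

With only "device-blksize" defined as 2048, this sketch returns 2048;
with neither property defined it returns DEV_BSIZE.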
The "device-nblocks" property represents the number of
'blksize' blocks of data on a block device.
The "device-nblocks" property is an 'int64' property, and
"device-blksize" and "blksize" are 'int' properties.
For FMA, fmd(1M) modules subscribe to DR sysevents to
synchronize with DINFOCACHE snapshot changes - DR invalidates
the DINFOCACHE snapshot. The "Dynamic Lun Expansion"
PSARC/2006/373 case also uses sysevents to learn about capacity
changes.
4.2.1 Compatibility
The proposal changes the units of the existing "Nblocks"
property from constant DEV_BSIZE byte blocks to variable
"blksize" or "device-blksize" byte blocks.
An old driver will not support the "blksize" or
"device-blksize" properties. New consumer code (specfs/ldi) is
responsible for falling back to DEV_BSIZE when "blksize" and
"device-blksize" don't exist. This allows an old driver that
only supports "Nblocks" to work with both new and old consumer
code - making this a compatible change for old drivers.
With the change in "Nblocks" units, old consumer code that
operates against a device with a block size other than
DEV_BSIZE, and asks for "Nblocks", assuming DEV_BSIZE blocks,
will now think the device is smaller. In general, consumer
code needs to check the existence of "blksize" and
"device-blksize" before defaulting to DEV_BSIZE, or use
interface ldi_get_size(9F) (or bdev_Size()) which will now
perform these checks.
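The size discrepancy seen by old consumer code can be illustrated with
a short arithmetic sketch. The function names are hypothetical, and the
blksize argument is assumed to be the already-resolved "blksize" or
"device-blksize" value, with 0 meaning that neither property exists.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE 512

/* Old consumer: assumes "Nblocks" always counts DEV_BSIZE byte blocks. */
uint64_t
old_consumer_bytes(uint64_t nblocks)
{
    assert(nblocks <= UINT64_MAX / DEV_BSIZE);
    return (nblocks * DEV_BSIZE);
}

/* New consumer: uses the resolved block size, defaulting to DEV_BSIZE. */
uint64_t
new_consumer_bytes(uint64_t nblocks, uint32_t blksize)
{
    if (blksize == 0)
        blksize = DEV_BSIZE;
    assert(nblocks <= UINT64_MAX / blksize);
    return (nblocks * blksize);
}
```

For a partition reporting "Nblocks" of 1000 with a 4096 byte block size,
the old calculation yields 512000 bytes while the new one yields 4096000
bytes. Against an old driver that defines neither block size property,
both calculations agree.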
In considering this old consumer incompatibility, the fact that
"Nblocks" is implemented as a dynamic property, and dynamic
properties don't show up in devinfo snapshots, means that we
only need to consider in-kernel consumers. Most in-kernel
consumers that depend on "Nblocks", like Veritas VxFS/VxVM, use
the private legacy undocumented bdev_Size()/bdev_size()
interfaces instead of asking for the "Nblocks"/"nblocks"
property value directly. The bdev_Size()/bdev_size()
interfaces will continue to return blocks in DEV_BSIZE units.
This limits exposure to in-kernel consumers that already
support large sector devices and which directly request the
"Nblocks" property value. Discounting this theoretical, very
unlikely corner case, changing "Nblocks" units is a compatible
change.
For in-kernel consumers adding new support for large sector
size devices, "Nblocks" understanding is just a minor
consideration. When in-kernel consumers undertake large sector
device support, they are encouraged to switch to LDI. Today,
we already have LDI consumers that support large sector disks:
ZFS uses ldi_get_size(9F) and DKIOCGETMEDIAINFO (CR 6407365).
The ZFS implementation has no direct dependency on "Nblocks".
In the future, adding an ldi_get_blksize() interface would be a
better solution than using DKIOCGETMEDIAINFO.
To summarize, in the interest of preserving the 'number of
blocks' semantic of "Nblocks", it is best to incur an
incompatibility now - while the impact is limited to a
theoretical, and very unlikely, corner case. If we are
unwilling to accept this theoretical corner case
incompatibility now, "Nblocks" will over time lose its 'block'
semantic.
4.2.2 Stability
The "Size" and "Nblocks" properties are already public - they
are defined and discussed in the WDD. Their original 32-bit
forms, "size" and "nblocks", were introduced in 1990 - prior to
ARC existence. We are adding man(9P) documentation to these
already-public 'committed' properties. Since "Nblocks" is
defined in units of "blksize", and "blksize" has a fallback
of "device-blksize", both "blksize" and "device-blksize" need
the same 'committed' stability level as "Nblocks".
Putting the stability level issue aside, the goal is a man page
that describes meaningful things: device capacity is
meaningful.
4.2.3 Delivery
Since S10, all target drivers in ON call the private
ddi_prop_op_nblocks() interface from their prop_op(9E)
implementation to share the same dynamic partition property
implementation (see 4814888 - this also supports deprecated
32-bit "nblocks" and "size" properties). All 'real device'
target drivers that currently call ddi_prop_op_nblocks() should
also support the new "device-nblocks"/"device-blksize"
properties. The list of 'real device' target drivers is:
common/io/scsi/targets/sd.c
sun/io/dada/targets/dad.c
intel/io/dktp/disk/cmdk.c
common/io/pcmcia/pcdisk.c
common/io/fd.c
sun/io/fd.c
As these drivers are updated to support large sectors, they
should switch to the new private ddi_prop_op_nblocks_blksize()
interface.
To meet schedule requirements, the initial putback may only add
"device-nblocks"/"device-blksize" support to sd. A subsequent
putback will bring the other drivers listed above in sync. For
the sd driver, the new "device-" property values track
un_blockcount, un_sys_blocksize, and un_tgt_blocksize and will
be implemented using ddi_prop_update_(9F).
4.2.4 Alternatives for device capacity
This section describes alternative ways of obtaining device
capacity. For some of the alternatives, having to open the
device represents a common problem because:
o You can only determine device size if you have the
permissions needed to open the device.
o You will not be able to determine device size if another
application has an active exclusive open.
o Opening each device does not scale well. With devinfo
properties, if there are devices that need to be
attached, a multi-threaded attach occurs.
The alternatives considered were:
o SCSA Alternative:
The "SCSA SCSI-3 enhancements" PSARC/1996/113 case defined
SCSA 'capabilities' for 'sector-size' and 'total-sectors'.
This approach was discounted because SCSA capabilities are
not available in devinfo snapshots. Also, not all target
drivers that support size use a SCSA transport (cmdk, dad),
so exposing SCSA capabilities would not provide a generic
solution.
o VTOC Alternative:
The Intel vtoc structure has a v_sectorsz field
<http://tinyurl.com/27exn4>.
This approach was discounted because there is no v_sectorsz
field in the sparc vtoc structure. Also, you need to open the
device to issue the ioctl to obtain the vtoc.
o DKIOCGETMEDIAINFO Alternative:
The dk_minfo structure returns dki_lbsize and dki_capacity.
This approach was discounted because you need to open the
device to issue the ioctl to obtain dk_minfo.
o Whole-disk Alternative:
Use existing "Nblocks" and "Size" properties of the 'whole
disk' partition.
This approach was discounted because the application needs to
understand which partition represents the whole disk. This
is a problem because the "whole-disk" minor node:
o depends on platform:
sparc: s2->':b'
x86: p0->':q'
o depends on labeling: efi: -> ':wd'
o partition properties on unformatted media don't work.
This approach was also discounted because the "Nblocks" and
"Size" partition properties are typically dynamic, with
values provided on-demand by the driver's prop_op(9E)
implementation. Currently, dynamic properties are not
available in the devinfo snapshot.
Providing new "device-*" properties is seen as the cleanest,
most efficient way of providing device capacity information.
Disk drive vendors have been encouraging Sun to support larger
sector sizes (1K, 2K, 4K). At some point such support will be
mandatory for maximum performance and capacity. For block
devices, a two property representation (device-nblocks X
device-blksize) was chosen with this in mind.
4.2.5 Future
The following guidance is provided relative to future
direction:
o A future case should provide an ldi_get_blksize() interface.
o If a future project delivers T10 data reliability support
<http://www.t10.org/ftp/t10/document.03/03-291r0.pdf> with
520 byte sectors, a "device-pblksize" should be considered.
o A future case should consider making 'blksize' information
available via the va_blksize and st_blksize stat(2) fields, in much the same
way that partition size is currently available via the
st_size stat(2) field.
o A future case may need to provide a "device-size" property to
represent the size of a non-block device.
o Most drivers implement partition properties as dynamic
properties via private common ddi_prop_op_nblocks() code.
Currently, dynamic properties are not represented in
di_init(3DEVINFO) snapshots. A future case should consider
defining a new "ddi-dynamic-properties" property that will
allow a driver to name its dynamic properties. The devinfo
driver will use the "ddi-dynamic-properties" value to provide
a snapshot representation of dynamic properties.
4.3 Example
prtconf(1M) output of the new properties:
name='device-nblocks' type=int64 items=1 dev=none
value=000000003a386030
name='device-blksize' type=int items=1 dev=none
value=00000200
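To show how the two values above combine into a capacity, here is a
small sketch of the multiplication. The function name is illustrative
only, and both values are treated as unsigned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Capacity in bytes from "device-nblocks" (int64) and "device-blksize"
 * (int), both interpreted as unsigned quantities.
 */
uint64_t
device_capacity_bytes(int64_t device_nblocks, int device_blksize)
{
    uint64_t nblocks = (uint64_t)device_nblocks;
    uint64_t blksize = (uint64_t)(uint32_t)device_blksize;

    /* Guard against overflow of the 64-bit byte count. */
    assert(blksize == 0 || nblocks <= UINT64_MAX / blksize);
    return (nblocks * blksize);
}
```

For the output shown, 0x3a386030 (976773168) blocks of 0x200 (512) bytes
gives 500107862016 bytes.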
4.4 Interface Table
------------------------------------------------------------------------
Interface               Level           Comments
------------------------------------------------------------------------
New:
  device-blksize(9P)    Committed       block device size
  device-nblocks(9P)    "               properties
  blksize(9P)           Committed       partition block size
                                        property
Existing:
  Nblocks(9P)           Committed       WDD defined partition
  Size(9P)              "               size properties.
4.5 References
o WDD.
The "Nblocks" and "Size" properties are discussed in
numerous places. There is a short "Device Sizes" section.
http://docs.sun.com/app/docs/doc/816-4854/6mb1o3aja?q=816-4854#hic
o CR 1098989 nblocks and size need to be documented
<http://monaco.sfbay.sun.com/detail.jsf?cr=1098989>
<http://bugs.opensolaris.org/view_bug.do?bug_id=1098989>.
o SCSA SCSI-3 enhancements
http://sac.sfbay/PSARC/1996/113/
http://www.opensolaris.org/os/community/arc/caselog/1996/113
o ldi_get_size(9F) Interface
http://sac.sfbay/PSARC/2004/171
http://www.opensolaris.org/os/community/arc/caselog/2004/171
o Add DKIOCGETMEDIAINFO ioctl
http://sac.sfbay/PSARC/1999/208
http://www.opensolaris.org/os/community/arc/caselog/1999/208
o Dynamic Lun Expansion
http://sac.sfbay/PSARC/2006/373
http://www.opensolaris.org/os/community/arc/caselog/2006/373
o Libdevinfo snapshot cache
http://sac.sfbay/PSARC/2004/169
http://www.opensolaris.org/os/community/arc/caselog/2004/169
o multi-terabyte disk support
http://sac.sfbay/PSARC/2001/570
http://www.opensolaris.org/os/community/arc/caselog/2001/570
o CR 6407365 large-sector disk support in ZFS
<http://monaco.sfbay.sun.com/detail.jsf?cr=6407365>
<http://bugs.opensolaris.org/view_bug.do?bug_id=6407365>.
o Veritas indirect dependency on "Nblocks" via bdev_size()
http://sac.sfbay/PSARC/1998/442 SEVM: Sun StorEdge Volume Manager
http://sac.sfbay/PSARC/2000/282 Veritas Volume Manager 3.1
http://sac.sfbay/PSARC/2001/232 Veritas Volume Manager 3.2
Sector Checksumming
o T10 Architecture for End to End Data Protection
http://www.t10.org/ftp/t10/document.03/03-291r0.pdf
o SCSI transport checksumming
http://sac.sfbay/PSARC/1997/188
http://www.opensolaris.org/os/community/arc/caselog/1997/188
o Disk Driver IOCTLs for Checksumming
http://sac.sfbay/PSARC/2001/240/
http://www.opensolaris.org/os/community/arc/caselog/2001/240
4.6 New Size(9P) man page:
See below.
6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type:
FastTrack
A.1 Size(9P) man page changes
Kernel Properties for Drivers Size(9P)
Kernel Properties for Drivers Nblocks(9P)
Kernel Properties for Drivers blksize(9P)
Kernel Properties for Drivers device-nblocks(9P)
Kernel Properties for Drivers device-blksize(9P)
NAME
device size properties
DESCRIPTION
A driver can communicate size information to the system by the
values associated with the following properties. Size information
falls into two categories: device size associated with a
dev_info_t node, and minor node size associated with a
ddi_create_minor_node(9F) dev_t (partition).
device size property names:
"device-nblocks" An int64_t property representing device
size in device-blksize blocks.
"device-blksize" An integer property representing the
size in bytes of a block. If defined,
the value must be a power of two. If
not defined, DEV_BSIZE is implied.
minor size property names:
"Size" An int64_t property representing the
size in bytes of a character minor
device (S_IFCHR spec_type in
ddi_create_minor_node).
"Nblocks" An int64_t property representing the
number of blocks, in blksize units,
of a block minor device (S_IFBLK
spec_type in ddi_create_minor_node).
"blksize" An integer property representing the
size in bytes of a block. If defined,
the value must be a power of two. If
not defined, "device-blksize" value is
implied.
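The power-of-two requirement on "device-blksize" and "blksize" can be
checked with the usual bit test. This is a sketch only: the function
name blksize_is_valid() is hypothetical and is not part of any DDI
interface.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE 512

/* The implied default must itself satisfy the power-of-two rule. */
static_assert((DEV_BSIZE & (DEV_BSIZE - 1)) == 0, "DEV_BSIZE check");

/* Return nonzero if the value is a nonzero power of two. */
int
blksize_is_valid(int blksize)
{
    uint32_t v = (uint32_t)blksize; /* unsigned interpretation */

    return (v != 0 && (v & (v - 1)) == 0);
}
```

Under this check 512 and 4096 are accepted, while a 520 byte physical
sector size is rejected.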
A driver that implements both block and character minor device
nodes should support both "Size" and "Nblocks". Typically,
the following is true: Size = Nblocks * blksize.
A driver where all ddi_create_minor_node(9F) calls for a given
instance are associated with the same physical block device
should implement "device-nblocks". If the device has a fixed
block size with a value other than DEV_BSIZE then
"device-blksize" should be implemented.
The driver is responsible for ensuring that property values
are updated when device, media, or partition sizes change.
For each represented item, if its size is known to be zero, the
property value should be zero; if its size is unknown, the
property should not be defined.
A driver may choose to implement size properties within its
prop_op(9E) implementation. This reduces system memory since
no space is used to store the properties.
The DDI property interfaces deal in signed numbers. All
Size(9P) values should be considered unsigned. It is the
responsibility of the code dealing with the property value to
ensure that an unsigned interpretation occurs.
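A consumer might apply the unsigned interpretation as sketched below.
The helper names are hypothetical; the only assumption is that the
int64 and int property values arrive as int64_t and int respectively.

```c
#include <assert.h>
#include <stdint.h>

/* An 'int' property is assumed to be 32 bits wide. */
static_assert(sizeof (int) == sizeof (uint32_t), "int width check");

/* Reinterpret an int64 property value ("Size", "Nblocks") as unsigned. */
uint64_t
size_prop_u64(int64_t value)
{
    return ((uint64_t)value);
}

/* Reinterpret an int property value ("blksize") as unsigned. */
uint32_t
size_prop_u32(int value)
{
    return ((uint32_t)value);
}
```

A value whose sign bit is set would otherwise read as negative; after
the conversion it is a large positive quantity, so comparisons and
multiplications behave as expected.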
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
____________________________________________________________
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
|_____________________________|_____________________________|
| Interface stability | Evolving |
|_____________________________|_____________________________|
SEE ALSO
attach(9E), detach(9E), prop_op(9E), ddi_create_minor_node(9F),
inquiry-vendor-id(9P)
Writing Device Drivers