I am sponsoring the following fasttrack for myself, requesting patch
binding and a timeout of 06/18/2007.

This proposal is being driven by FMA's need to know the capacity of a
failed disk FRU. If capacity information is not available, identifying
a "similar capacity" replacement requires detailed knowledge of how the
failed disk vendor and product information maps to capacity.

-Chris


Template Version: @(#)sac_nextcase 1.61 05/24/07 SMI

1. Introduction
    1.1. Project/Component Working Name:
         Device Size Properties

    1.2. Name of Document Author/Supplier:
         Author: Chris Horne

    1.3. Date of This Document:
         11 June, 2007

4. Technical Description

   4.1  Problem

        The Solaris WDD (Writing Device Drivers) defines the 64-bit
        "Nblocks" and "Size" properties, and deprecates the 32-bit
        "nblocks" and "size", but there is no Size(9P) or Nblocks(9P)
        man page (CR 1098989).

        The "Nblocks" and "Size" properties describe the size of a
        ddi_create_minor_node(9F) minor node (partition). For devices
        that have multiple partitions, there is no simple, generic way
        to obtain the size of the device.


    4.2 Proposal

        This proposal provides man pages for the existing "Nblocks" and
        "Size" partition size properties and defines three new
        properties, "blksize", "device-nblocks" and "device-blksize",
        to represent block device size. All these properties will be
        described in the new Size(9P) man page.

        The "blksize" property represents the data bearing block
        size.  The term 'data bearing' refers to how much user data is
        stored in a block.  If we support sector checksums in the
        future, a device with 520 byte sectors still has a data bearing
        block size of 512.  If "blksize" is not defined at the
        partition level, then "device-blksize" applies to all
        partitions. If neither is defined then the DEV_BSIZE (512)
        default is implied.
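
        The fallback order described above can be sketched in user-space
        C as follows.  This is only an illustrative model, not driver or
        specfs code: the opt_int_prop structure and effective_blksize()
        are hypothetical names introduced here, and a real consumer
        would obtain the values through the DDI property interfaces.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512   /* default block size named in the proposal */
#endif

/* Hypothetical model of an optional integer property. */
struct opt_int_prop {
    int defined;    /* nonzero if the property exists */
    int value;      /* property value, meaningful only when defined */
};

/*
 * Resolve the data bearing block size for a partition:
 * "blksize" first, then "device-blksize", then DEV_BSIZE.
 */
static uint32_t
effective_blksize(struct opt_int_prop blksize,
    struct opt_int_prop device_blksize)
{
    if (blksize.defined) {
        assert(blksize.value > 0);
        return ((uint32_t)blksize.value);
    }
    if (device_blksize.defined) {
        assert(device_blksize.value > 0);
        return ((uint32_t)device_blksize.value);
    }
    return (DEV_BSIZE);
}
```

        With neither property defined the function returns 512, which
        is the behavior an old driver that only supports "Nblocks"
        would see.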

        The "device-nblocks" property represents the number of
        'blksize' blocks of data on a block device.

        The "device-nblocks" property is an 'int64' property, and
        "device-blksize" and "blksize" are 'int' properties.

        For FMA, fmd(1M) modules subscribe to DR sysevents to
        synchronize with DINFOCACHE snapshot changes - DR invalidates
        the DINFOCACHE snapshot.  The "Dynamic Lun Expansion"
        PSARC/2006/373 case also uses sysevents to learn about capacity
        changes.


    4.2.1 Compatibility

        The proposal changes the units of the existing "Nblocks"
        property from constant DEV_BSIZE byte blocks to variable
        "blksize" or "device-blksize" byte blocks.

        An old driver will not support the "blksize" or
        "device-blksize" properties. New consumer code (specfs/ldi) is
        responsible for falling back to DEV_BSIZE when "blksize" and
        "device-blksize" don't exist. This allows an old driver that
        only supports "Nblocks" to work with both new and old consumer
        code - making this a compatible change for old drivers.

        With the change in "Nblocks" units, old consumer code that
        operates against a device with a block size other than
        DEV_BSIZE, and asks for "Nblocks", assuming DEV_BSIZE blocks,
        will now think the device is smaller.  In general, consumer
        code needs to check the existence of "blksize" and
        "device-blksize" before defaulting to DEV_BSIZE, or use
        interface ldi_get_size(9F) (or bdev_Size()) which will now
        perform these checks.

        In considering this old consumer incompatibility, the fact that
        "Nblocks" is implemented as a dynamic property, and dynamic
        properties don't show up in devinfo snapshots, means that we
        only need to consider in-kernel consumers.  Most in-kernel
        consumers that depend on "Nblocks", like Veritas VxFS/VxVM, use
        the private legacy undocumented bdev_Size()/bdev_size()
        interfaces instead of asking for the "Nblocks"/"nblocks"
        property value directly.  The bdev_Size()/bdev_size()
        interfaces will continue to return blocks in DEV_BSIZE units.
        This limits exposure to in-kernel consumers that already
        support large sector devices and which directly request the
        "Nblocks" property value.  Discounting this theoretical, very
        unlikely corner case, changing "Nblocks" units is a compatible
        change.
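
        The size of the old consumer exposure can be illustrated with
        a small user-space sketch.  The function names below are
        hypothetical and exist only for this illustration; the last
        function models the DEV_BSIZE-unit result that the
        bdev_Size()/bdev_size() interfaces are described as returning,
        assuming the block size is a multiple of DEV_BSIZE.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEV_BSIZE
#define DEV_BSIZE 512
#endif

/* Old consumer: assumes "Nblocks" counts DEV_BSIZE byte blocks. */
static uint64_t
old_consumer_bytes(uint64_t nblocks)
{
    return (nblocks * DEV_BSIZE);
}

/* New consumer: scales "Nblocks" by the effective block size. */
static uint64_t
new_consumer_bytes(uint64_t nblocks, uint32_t blksize)
{
    return (nblocks * (uint64_t)blksize);
}

/* Convert a count in blksize units into DEV_BSIZE units. */
static uint64_t
nblocks_in_dev_bsize_units(uint64_t nblocks, uint32_t blksize)
{
    /* Assumption: blksize is a multiple of DEV_BSIZE. */
    assert(blksize >= DEV_BSIZE && blksize % DEV_BSIZE == 0);
    return (nblocks * (blksize / DEV_BSIZE));
}
```

        For a partition of 1000 blocks with a 4096 byte block size,
        the old interpretation yields 512000 bytes while the actual
        size is 4096000 bytes, which is the "device looks smaller"
        effect described above.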

        For in-kernel consumers adding new support for large sector
        size devices, "Nblocks" understanding is just a minor
        consideration.  When in-kernel consumers undertake large sector
        device support, they are encouraged to switch to LDI.  Today,
        we already have LDI consumers that support large sector disks:
        ZFS uses ldi_get_size(9F) and DKIOCGETMEDIAINFO (CR 6407365).
        The ZFS implementation has no direct dependency on "Nblocks".
        In the future, adding an ldi_get_blksize() interface would be a
        better solution than using DKIOCGETMEDIAINFO.

        To summarize, in the interest of preserving the 'number of
        blocks' semantic of "Nblocks", it is best to incur an
        incompatibility now - while the impact is limited to a
        theoretical, and very unlikely, corner case.  If we are
        unwilling to accept this theoretical corner case
        incompatibility now, "Nblocks" will over time lose its 'block'
        semantic.


    4.2.2 Stability

        The "Size" and "Nblocks" properties are already public - they
        are defined and discussed in the WDD.  Their original 32-bit
        forms, "size" and "nblocks", were introduced in 1990 - prior to
        ARC existence.  We are adding man(9P) documentation to these
        already-public 'committed' properties.  Since "Nblocks" is
        defined in units of "blksize", and "blksize" has a fallback
        of "device-blksize", both "blksize" and "device-blksize" need
        the same 'committed' stability level as "Nblocks".

        Putting the stability level issue aside, the goal is a man page
        that describes meaningful things: device capacity is
        meaningful.


    4.2.3 Delivery

        Since S10, all target drivers in ON use the private
        ddi_prop_op_nblocks() interface off their ddi_prop_op(9E)
        implementation to share the same dynamic partition property
        implementation (see 4814888 - this also supports deprecated
        32-bit "nblocks" and "size" properties). All 'real device'
        target drivers that currently call ddi_prop_op_nblocks() should
        also support the new "device-nblocks"/"device-blksize"
        properties. The list of 'real device' target drivers is:

            common/io/scsi/targets/sd.c
            sun/io/dada/targets/dad.c
            intel/io/dktp/disk/cmdk.c
            common/io/pcmcia/pcdisk.c
            common/io/fd.c
            sun/io/fd.c

        As these drivers are updated to support large sectors, they
        should switch to the new private ddi_prop_op_nblocks_blksize()
        interface.

        To meet schedule requirements, the initial putback may only add
        "device-nblocks"/"device-blksize" support to sd. A subsequent
        putback will bring the other drivers listed above in sync. For
        the sd driver, the new "device-" property values track
        un_blockcount, un_sys_blocksize, and un_tgt_blocksize and will
        be implemented using ddi_prop_update_(9F).


    4.2.4 Alternatives for device capacity

        This section describes alternative ways of obtaining device
        capacity.  For some of the alternatives, having to open the
        device represents a common problem because:

            o You can only determine device size if you have the
              permissions needed to open the device.

            o You will not be able to determine device size if another
              application has an active exclusive open.

            o Opening each device does not scale well. With devinfo
              properties, if there are devices that need to be
              attached, a multi-threaded attach occurs.

        The alternatives considered were:

        o SCSA Alternative:

          The "SCSA SCSI-3 enhancements" PSARC/1996/113 case defined
          SCSA 'capabilities' for 'sector-size' and 'total-sectors'.

          This approach was discounted because SCSA capabilities are
          not available in devinfo snapshots. Also, not all target
          drivers that support size use a SCSA transport (cmdk, dad),
          so exposing SCSA capabilities would not provide a generic
          solution.

        o VTOC Alternative:

          The Intel vtoc structure has a v_sectorsz field
          <http://tinyurl.com/27exn4>.

          This approach was discounted because there is no v_sectorsz
          field in the sparc vtoc structure. Also, you need to open the
          device to issue the ioctl to obtain the vtoc.

        o DKIOCGETMEDIAINFO Alternative:

          The dk_minfo structure returns dki_lbsize and dki_capacity.

          This approach was discounted because you need to open the
          device to issue the ioctl to obtain dk_minfo.

        o Whole-disk Alternative:

          Use existing "Nblocks" and "Size" properties of the 'whole
          disk' partition.

          This approach was discounted because the application needs to
          understand which partition represents the whole disk. This
          is a problem because the "whole-disk" minor node:

            o depends on platform:
              sparc: s2->':b'
              x86:   p0->':q'

            o depends on labeling:  efi: -> ':wd'

            o partition properties on unformatted media don't work.

          This approach was also discounted because the "Nblocks" and
          "Size" partition properties are typically dynamic, with
          values provided on-demand by the driver's prop_op(9E)
          implementation. Currently, dynamic properties are not
          available in the devinfo snapshot.

        Providing new "device-*" properties is seen as the cleanest,
        most efficient way of providing device capacity information.

        Disk drive vendors have been encouraging Sun to support larger
        sector sizes (1K, 2K, 4K). At some point such support will be
        mandatory for maximum performance and capacity. For block
        devices, a two property representation (device-nblocks X
        device-blksize) was chosen with this in mind.


    4.2.5 Future

        The following guidance is provided relative to future
        direction:

        o A future case should provide an ldi_get_blksize() interface.

        o If a future project delivers T10 data reliability support
          <http://www.t10.org/ftp/t10/document.03/03-291r0.pdf> with
          520 byte sectors, a "device-pblksize" should be considered.

        o A future case should consider making 'blksize' information
          available via va_blksize and the st_blksize stat(2) field, in
          much the same way that partition size is currently available
          via the st_size stat(2) field.

        o A future case may need to provide a "device-size" property to
          represent the size of a non-block device.

        o Most drivers implement partition properties as dynamic
          properties via private common ddi_prop_op_nblocks() code.
          Currently, dynamic properties are not represented in
          di_init(3DEVINFO) snapshots. A future case should consider
          defining a new "ddi-dynamic-properties" property that will
          allow a driver to name its dynamic properties.  The devinfo
          driver will use the "ddi-dynamic-properties" value to provide
          a snapshot representation of dynamic properties.


    4.3 Example

        prtconf(1M) output of the new properties:

        name='device-nblocks' type=int64 items=1 dev=none
            value=000000003a386030
        name='device-blksize' type=int items=1 dev=none
            value=00000200
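
        As a check on the values above, the following user-space
        sketch multiplies the two properties to obtain capacity in
        bytes.  The device_capacity_bytes() name is hypothetical and
        is used here only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Capacity in bytes from "device-nblocks" (int64) and
 * "device-blksize" (int).  Both values are treated as unsigned.
 */
static uint64_t
device_capacity_bytes(int64_t device_nblocks, int device_blksize)
{
    assert(device_blksize > 0);
    return ((uint64_t)device_nblocks * (uint64_t)(uint32_t)device_blksize);
}
```

        For the example output, 0x3a386030 (976773168) blocks of 0x200
        (512) bytes gives 500107862016 bytes, roughly a 500 GB disk.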


    4.4 Interface Table

        ------------------------------------------------------------------------
        Interface                 Level                 Comments
        ------------------------------------------------------------------------

        New:
          device-blksize(9P)    Committed               block device size
          device-nblocks(9P)    "                       properties

          blksize(9P)           Committed               partition block size
                                                        property

        Existing:
          Nblocks(9P)           Committed               WDD defined partition
          Size(9P)              "                       size properties.


    4.5 References

        o WDD.
          The "Nblocks" and "Size" properties are discussed in
          numerous places. There is a short "Device Sizes" section.
          http://docs.sun.com/app/docs/doc/816-4854/6mb1o3aja?q=816-4854#hic

        o CR 1098989 nblocks and size need to be documented
          <http://monaco.sfbay.sun.com/detail.jsf?cr=1098989>
          <http://bugs.opensolaris.org/view_bug.do?bug_id=1098989>.

        o SCSA SCSI-3 enhancements
          http://sac.sfbay/PSARC/1996/113/
          http://www.opensolaris.org/os/community/arc/caselog/1996/113

        o ldi_get_size(9F) Interface
          http://sac.sfbay/PSARC/2004/171
          http://www.opensolaris.org/os/community/arc/caselog/2004/171

        o Add DKIOCGETMEDIAINFO ioctl
          http://sac.sfbay/PSARC/1999/208
          http://www.opensolaris.org/os/community/arc/caselog/1999/208

        o Dynamic Lun Expansion
          http://sac.sfbay/PSARC/2006/373
          http://www.opensolaris.org/os/community/arc/caselog/2006/373

        o Libdevinfo snapshot cache
          http://sac.sfbay/PSARC/2004/169
          http://www.opensolaris.org/os/community/arc/caselog/2004/169

        o multi-terabyte disk support
          http://sac.sfbay/PSARC/2001/570
          http://www.opensolaris.org/os/community/arc/caselog/2001/570

        o CR 6407365 large-sector disk support in ZFS
          <http://monaco.sfbay.sun.com/detail.jsf?cr=6407365>
          <http://bugs.opensolaris.org/view_bug.do?bug_id=6407365>.

        o Veritas indirect dependency on "Nblocks" via bdev_size()
          http://sac.sfbay/PSARC/1998/442  SEVM: Sun StorEdge Volume Manager
          http://sac.sfbay/PSARC/2000/282  Veritas Volume Manager 3.1
          http://sac.sfbay/PSARC/2001/232  Veritas Volume Manager 3.2

        Sector Checksumming

        o T10 Architecture for End to End Data Protection
          http://www.t10.org/ftp/t10/document.03/03-291r0.pdf

        o SCSI transport checksumming
          http://sac.sfbay/PSARC/1997/188
          http://www.opensolaris.org/os/community/arc/caselog/1997/188

        o Disk Driver IOCTLs for Checksumming
          http://sac.sfbay/PSARC/2001/240/
          http://www.opensolaris.org/os/community/arc/caselog/2001/240

    4.6 New Size(9P) man page:

        See below.

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
                ON

    6.5. ARC review type:
        FastTrack


A.1 Size(9P) man page changes

    Kernel Properties for Drivers                               Size(9P)
    Kernel Properties for Drivers                            Nblocks(9P)
    Kernel Properties for Drivers                            blksize(9P)
    Kernel Properties for Drivers                     device-nblocks(9P)
    Kernel Properties for Drivers                     device-blksize(9P)

    NAME
         device size properties

    DESCRIPTION

         A driver can communicate size information to the system by the
         values associated with the following properties. Size
         information falls into two categories: device size associated
         with a dev_info_t node, and minor node size associated with a
         ddi_create_minor_node(9F) dev_t (partition).

         device size property names:

            "device-nblocks"    An int64_t property representing device
                                size in device-blksize blocks.

            "device-blksize"    An integer property representing the
                                size in bytes of a block.  If defined,
                                the value must be a power of two.  If
                                not defined, DEV_BSIZE is implied.

         minor size property names:

            "Size"              An int64_t property representing the
                                size in bytes of a character minor
                                device (S_IFCHR spec_type in
                                ddi_create_minor_node).

            "Nblocks"           An int64_t property representing the
                                number of blocks, in device-blksize units,
                                of a block minor device (S_IFBLK
                                spec_type in ddi_create_minor_node).

            "blksize"           An integer property representing the
                                size in bytes of a block.  If defined,
                                the value must be a power of two.  If
                                not defined, "device-blksize" value is
                                implied.
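
         The power of two requirement on "device-blksize" and "blksize"
         can be checked with a single bit test.  The sketch below is a
         user-space illustration only; blksize_is_valid() is a
         hypothetical helper, not an existing DDI interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A nonzero value is a power of two exactly when clearing its
 * lowest set bit leaves zero.
 */
static bool
blksize_is_valid(uint32_t blksize)
{
    return (blksize != 0 && (blksize & (blksize - 1)) == 0);
}
```

         Under this check 512 and 4096 are accepted, while a 520 byte
         physical sector size is rejected, consistent with block size
         properties describing only the data bearing portion of a
         block.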

         A driver that implements both block and character minor device
         nodes should support both "Size" and "Nblocks".  Typically,
         the following is true: Size = Nblocks * blksize.

         A driver where all ddi_create_minor_node(9F) calls for a given
         instance are associated with the same physical block device
         should implement "device-nblocks".  If the device has a fixed
         block size with a value other than DEV_BSIZE then
         "device-blksize" should be implemented.

         The driver is responsible for ensuring that property values
         are updated when device, media, or partition sizes change.
         For each represented item, if its size is known to be zero, the
         property value should be zero; if its size is unknown, the
         property should not be defined.

         A driver may choose to implement size properties within its
         prop_op(9E) implementation. This reduces system memory since
         no space is used to store the properties.

         The DDI property interfaces deal in signed numbers.  All
         Size(9P) values should be considered unsigned. It is the
         responsibility of the code dealing with the property value to
         ensure that an unsigned interpretation occurs.
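
         A minimal sketch of the unsigned interpretation, in user-space
         C, is shown below.  The helper names are hypothetical; the
         point is only that the signed value handed back by a property
         lookup is converted before any comparison or arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a signed 64-bit property value as unsigned. */
static uint64_t
size_prop_as_unsigned(int64_t raw)
{
    return ((uint64_t)raw);
}

/*
 * Compare a size property against a minimum.  Converting first
 * avoids treating very large sizes as negative numbers.
 */
static bool
size_prop_at_least(int64_t raw, uint64_t minimum)
{
    return (size_prop_as_unsigned(raw) >= minimum);
}
```

         A raw value with the top bit set compares as a very large
         size rather than as a negative number.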


    ATTRIBUTES
         See attributes(5) for descriptions of the following attri-
         butes:
         ____________________________________________________________
        |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
        |_____________________________|_____________________________|
        | Interface stability         | Evolving                    |
        |_____________________________|_____________________________|

    SEE ALSO
         attach(9E), detach(9E), prop_op(9E), ddi_create_minor_node(9F),
         inquiry-vendor-id(9P)
         Writing Device Drivers
