forcedirectio mount flag in nfsstat [PSARC/2009/587 Self Review]
I'm sponsoring this automatic-approved self review case for Marcel Telka (RPE). This case corrects an obvious oversight (missing display of the 'forcedirectio' mount flag) in the output of nfsstat(1M) and updates the corresponding man page. I believe this case qualifies for self-review, but if anyone disagrees, let me know and I'll promote it to a fast track.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: forcedirectio mount flag in nfsstat
1.2. Name of Document Author/Supplier: Author: Marcel Telka
1.3 Date of This Document: 27 October, 2009

4. Technical Description

The nfsstat(1M) command has an option (-m) which displays statistics for each NFS mounted file system. This option also displays the mount flags for each NFS mounted file system; however, one existing mount flag is missing from the display: forcedirectio.

Support for the forcedirectio mount option was added to the Solaris NFS client together with the directio(3C) capability as part of the fix for CR 4190364 (Solaris 8). The 'forcedirectio' mount option is currently documented in the mount_nfs(1M) man page; it was integrated into the man page in Solaris 9 by CR 4521941.

During the development mentioned above, the nfsstat utility was not updated to display the forcedirectio mount flag when run with the -m option. The absence of the forcedirectio flag from the 'nfsstat -m' output can confuse users. This case corrects that oversight by adding support for the forcedirectio mount flag to the 'nfsstat -m' output. This change will be documented in the nfsstat(1M) man page (see below).
For example, given the following NFS file system:

    # mount |grep test
    /tmp/test on snvx.czech:/builds remote/read/write/setuid/devices/forcedirectio/xattr/dev=615 on Mon Oct 5 08:48:09 2009

The current 'nfsstat' displays the following (note the lack of the 'forcedirectio' flag):

    # nfsstat -m /tmp/test
    /tmp/test from snvx.czech:/builds
     Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
     Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

With the proposed change, the 'nfsstat' command would now include the 'forcedirectio' mount flag:

    # nfsstat -m /tmp/test
    /tmp/test from snvx.czech:/builds
     Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,forcedirectio,rsize=1048576,wsize=1048576,retrans=5,timeo=600
     Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

This case also includes an update to the nfsstat(1M) man page, which has been pre-approved by Terry Gibson. Terry is the RE for the associated man page CR (6888023: nfsstat: forcedirectio mount flag needs to be documented).

8<--
     The -m option includes information about mount flags set by
     mount options, mount flags internal to the system, and other
     mount information. See mount_nfs(1M).

     The following mount flags are set by mount options:

     forcedirectio                                             |
                                                               |
         Data transferred directly between client and server,  |
         with no buffering on client.                          |
                                                               |
     grpid
         System V group id inheritance.
8<--

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open
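A table-driven formatter makes the nature of this oversight easy to see: any flag that lacks a table entry is silently dropped from the "Flags:" line. The following is a minimal sketch, not the nfsstat source; the flag bit values and the names HYP_MNT_*, flag_names and format_flags() are hypothetical.

```c
#include <string.h>

/*
 * Hypothetical flag bits: the real NFS client and nfsstat(1M) use
 * their own constants, which are not shown in this case.
 */
#define	HYP_MNT_HARD		0x01
#define	HYP_MNT_INTR		0x02
#define	HYP_MNT_FORCEDIRECTIO	0x04

struct flag_name {
	unsigned int	bit;
	const char	*name;
};

/*
 * Every flag that can be set on a mount needs an entry here; a flag
 * without one is silently omitted from the output.
 */
static const struct flag_name flag_names[] = {
	{ HYP_MNT_HARD,			"hard" },
	{ HYP_MNT_INTR,			"intr" },
	{ HYP_MNT_FORCEDIRECTIO,	"forcedirectio" },
};

/* Write the names of all flags set in 'flags' to buf, comma-separated. */
static void
format_flags(unsigned int flags, char *buf, size_t len)
{
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof (flag_names) / sizeof (flag_names[0]); i++) {
		if ((flags & flag_names[i].bit) == 0)
			continue;
		if (buf[0] != '\0')
			(void) strncat(buf, ",", len - strlen(buf) - 1);
		(void) strncat(buf, flag_names[i].name,
		    len - strlen(buf) - 1);
	}
}
```

Removing the HYP_MNT_FORCEDIRECTIO row reproduces the reported behaviour: the option stays in effect on the mount, but the flag no longer appears in the formatted list.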
NFS Referrals [PSARC/2009/502 FastTrack timeout 09/25/2009]
I'm sponsoring this case for Rob Thurlow. This case proposes to add NFS Referral support to the Solaris client and server. Minor binding is requested. This times out on Friday, 25 September, 2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: NFS Referrals
1.2. Name of Document Author/Supplier: Author: Rob Thurlow
1.3 Date of This Document: 18 September, 2009

4. Technical Description

NFS Referrals

A. Introduction

This project intends to introduce basic NFS referral support to the Solaris client and server. A Solaris client will follow a referral to a new location and transparently mount it, and a Solaris server will have basic support for creating and managing referrals. Referral support will permit construction of a server-based unified namespace in which NFS clients will be able to participate. NFSv4 referrals are supported by modern HP-UX, AIX, and Linux releases.

This project makes use of Reparse Points, PSARC 2009/387, and is a follow-on to the umbrella case PSARC 2009/399. The project also uses and extends interfaces from NFSv4 Mirror Mounts, PSARC 2007/416; most of the automatic mounting code and all of the automatic unmounting code are untouched. This work is based on replication/migration primitives defined in NFSv4 (RFC 3530)[1], with behaviour defined by later work [2].

B. Scope

The NFSv4 working group in the IETF is working on Federated FS, which is a unified, centrally managed back-end to support a set of servers presenting a uniform namespace. This project is not delivering support for Federated FS since that specification is still a work-in-progress. However, our work will permit drop-in support of FedFS in the future.

This project is not delivering support for replication and migration as described by the NFSv4 specification.
Solaris currently has limited support for replication and client-side failover via mount_nfs(1M) and the automounter (PSARC 1995/143). This work is expected to be a basis for future migration support.

C. Behaviour

C.1 Client Behaviour

The client discovers referral objects as it examines the NFSv4 server's filesystems, and the client associates a distinct set of vnode ops with these objects. Most client filesystem operations will trigger referral mounts, but VOP_LOOKUP() and most VOP_GETATTR() calls will not, to match existing automount and mirror mount behaviour. Prior to a mount, the NFSv4 client will display referral objects as directories with artificial metadata, like autofs trigger nodes before mounting. As with mirror mounts, find(1)'s use of _AT_TRIGGER (PSARC 2007/563) will force a mount prior to collecting attributes.

Mounting will be done without checking the privilege of the calling process. Automatic unmounting will be done in the same way as for mirror mounts. Mounts will time out like mounts done by the automounter, and a manual unmount of the enclosing filesystem will unmount referral mounts and mirror mounts as well, unless they are being kept busy by processes' open files.

The NFSv4 client will advertise support for referrals via the protocol's SETCLIENTID operation; see section C.2 for more.

The client will mount the first reachable location in the fs_locations data. The multi-valued fs_locations attribute describes the locations (server:/path combinations) for the same data. fs_locations can use hostnames, dotted-quad numeric IPv4 address strings, or IPv6 address strings; we will convert these according to the client's transport configuration by doing a new door upcall to the existing nfsmapid binary.

A new string, referral, will be visible in nfsstat -m output for an automatic mount done this way.
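An fs_locations server string can take three forms, and only a hostname needs name resolution. The following is a user-level sketch of that classification step only, assuming inet_pton(3) is used to recognise address literals; the enum and classify_server() are hypothetical, and this is not the nfsmapid upcall code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* How the server part of an fs_locations entry must be handled. */
enum loc_addr_kind {
	LOC_IPV4,	/* dotted-quad literal, usable as-is */
	LOC_IPV6,	/* IPv6 literal, usable as-is */
	LOC_HOSTNAME	/* anything else: needs name resolution */
};

static enum loc_addr_kind
classify_server(const char *server)
{
	struct in_addr a4;
	struct in6_addr a6;

	/* inet_pton() returns 1 only for a valid literal of that family. */
	if (inet_pton(AF_INET, server, &a4) == 1)
		return (LOC_IPV4);
	if (inet_pton(AF_INET6, server, &a6) == 1)
		return (LOC_IPV6);
	return (LOC_HOSTNAME);
}
```

A client would then try each location in order and mount the first one that turns out to be reachable; the reachability test itself is outside this sketch.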
Other observability will include a kstat counter of referral mounts done and a number of DTrace probes to report the results of name resolution and mounts. A kernel tunable will permit disabling referral mounts if that is deemed necessary, but it will not be documented.

C.2 Server Behaviour

When the NFSv3 or NFSv4 server processes operations on behalf of a client, it may encounter reparse points (as described in PSARC 2009/387). When that occurs, the server will examine the reparse point data and look for a service type starting with 'nfs', indicating NFS service data. If found, the NFSv4 server will normally return NFS4ERR_MOVED. The NFSv4 client will then normally request the fs_locations attribute. In response to this, the NFSv4 server will upcall with the NFS service data to the reparse daemon.

In general, with reparse points, the service data can be a key to find the desired location information, perhaps by consulting a distributed or networked database. For this first release, the service data will simply be the location information in the form of a host:/path [host:/path ...] string. This has the property that the referral will work when the
Copy Reduction Interfaces [PSARC/2009/478 FastTrack timeout 09/16/2009]
I'm sponsoring this case on behalf of Mahesh Siddheshwar and Chunli Zhang. This case proposes new interfaces to support copy reduction in the I/O path, especially for file sharing services. Minor binding is requested. This times out on Wednesday, 16 September, 2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: Copy Reduction Interfaces
1.2. Name of Document Author/Supplier: Author: Mahesh Siddheshwar, Chunli Zhang
1.3 Date of This Document: 09 September, 2009

4. Technical Description

== Introduction/Background ==

Zero-copy (copy avoidance) is essentially buffer sharing among multiple modules that pass data between them. This proposal avoids the data copy in the READ/WRITE path of filesystems by providing a mechanism to share data buffers between the modules. It is intended to be used by network file sharing services like NFS, CIFS, and others.

Although buffer sharing can be achieved through a few different solutions, any such solution must work with File Event Monitors (FEM monitors)[1] installed on the files. The solution must also allow the underlying filesystem to maintain any existing file range locking.

The proposed solution provides extensions to the existing VOP interface to request and return buffers from a filesystem. The buffers are then used with the existing VOP_READ/VOP_WRITE calls with minimal changes.

== Proposed Changes ==

VOP Extensions for Zero-Copy Support

a. Extended struct uio, xuio_t

The following proposes an extensible uio structure that can be extended for multiple purposes. For example, an immediate extension, xu_zc, is to be used by the proposed VOP_REQZCBUF/VOP_RETZCBUF interfaces to pass loaned zero-copy buffers, as well as to be passed to the existing VOP_READ/VOP_WRITE calls for normal read/write operations. Another extension, xu_aio, is intended to replace uioa_t for async I/O.
This new structure, xuio_t, contains the following:

 - the existing uio structure (embedded) as the first member
 - additional fields to support extensibility
 - a union of all the defined extensions

The following uio_extflag is added to indicate that a uio structure is indeed an xuio_t:

    #define UIO_XUIO     0x004   /* Structure is xuio_t */

The following uio_extflag will be removed after uioa_t has been converted to xuio_t:

    #define UIO_ASYNC    0x002   /* Structure is uioa_t */

The project team has a commitment from the networking team to remove the current use of uioa_t and use the proposed extensions (CR 6880095).

The definition of xuio_t is:

    typedef struct xuio {
        uio_t xu_uio;                   /* Embedded UIO structure */

        /* Extended uio fields */
        enum xuio_type xu_type;         /* What kind of uio structure? */

        union {
            /* Async I/O Support */
            struct {
                uint32_t    xu_a_state;     /* state of async i/o */
                ssize_t     xu_a_mbytes;    /* bytes that have been uioamove()ed */
                uioa_page_t *xu_a_lcur;     /* pointer into uioa_locked[] */
                void        **xu_a_lppp;    /* pointer into lcur->uioa_ppp[] */
                void        *xu_a_hwst[4];  /* opaque hardware state */
                uioa_page_t xu_a_locked[UIOA_IOV_MAX]; /* Per iov locked pages */
            } xu_aio;

            /* Zero Copy Support */
            struct {
                enum uio_rw xu_zc_rw;       /* the use of the buffer */
                void        *xu_zc_priv;    /* fs specific */
            } xu_zc;
        } xu_ext;
    } xuio_t;

where xu_type is currently defined as:

    typedef enum xuio_type {
        UIOTYPE_ASYNCIO,
        UIOTYPE_ZEROCOPY
    } xuio_type_t;

New uio extensions can be added by defining a new xuio_type_t and adding a new member to the xu_ext union.

b. Requesting zero-copy buffers

    #define VOP_REQZCBUF(vp, rwflag, uiozcp, cr, ct) \
        fop_reqzcbuf(vp, rwflag, uiozcp, cr, ct)

    int fop_reqzcbuf(vnode_t *, enum uio_rw, xuio_t *, cred_t *,
        caller_context_t *);

This function requests buffers associated with file vp in preparation for a subsequent zero-copy read or write. The extended uio_t, xuio_t, is used to pass the parameters and results.
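Placing the uio structure as the first member is what lets an xuio_t travel through existing uio_t * parameters such as those of VOP_READ/VOP_WRITE. The following is a user-level sketch of that technique under stated assumptions: uio_t and xuio_t are cut-down stand-ins (only the zero-copy arm is kept, enum uio_rw becomes an int), the field name uio_extflg is assumed, and uio_to_xuio() is a hypothetical helper, not a proposed interface.

```c
#include <stddef.h>
#include <sys/types.h>

/* Flag value taken from this case. */
#define	UIO_XUIO	0x004	/* Structure is xuio_t */

/* Simplified stand-ins; the kernel's uio_t has more fields. */
typedef struct hyp_uio {
	long long	uio_loffset;	/* file offset */
	ssize_t		uio_resid;	/* residual count */
	unsigned short	uio_extflg;	/* extended flags (assumed name) */
} uio_t;

typedef enum xuio_type {
	UIOTYPE_ASYNCIO,
	UIOTYPE_ZEROCOPY
} xuio_type_t;

typedef struct hyp_xuio {
	uio_t		xu_uio;		/* must stay the first member */
	xuio_type_t	xu_type;
	union {
		struct {
			int	xu_zc_rw;	/* stand-in for enum uio_rw */
			void	*xu_zc_priv;	/* fs specific */
		} xu_zc;
	} xu_ext;
} xuio_t;

/*
 * Code handed only a uio_t * can recover the enclosing xuio_t, because
 * xu_uio is at offset 0. Without UIO_XUIO the caller passed a plain
 * uio_t, and there is no extension to look at.
 */
static xuio_t *
uio_to_xuio(uio_t *uiop)
{
	if ((uiop->uio_extflg & UIO_XUIO) == 0)
		return (NULL);
	return ((xuio_t *)uiop);
}
```

Checking the flag before the cast matters: casting a plain uio_t to xuio_t and reading xu_type would read past the end of the caller's object.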
Only the following fields of xuio_t are relevant to this call.

uiozcp->xu_uio.uio_resid: used by the caller to specify the total length of the buffer.

uiozcp->xu_uio.uio_loffset: used by the caller to indicate the file offset it would like the buffers to be associated with. A value of -1 indicates that the provider returns buffers that are not associated with a particular offset. These are defined to be anonymous buffers. Anonymous buffers may be used
nfs4_fid() [PSARC/2009/468 FastTrack timeout 09/04/2009]
I'm sponsoring this case for Bill Baker of the NFS team. This case proposes enabling technology for a forthcoming closed case. The timer expires on Friday, September 4, 2009. This case requests minor binding.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: nfs4_fid()
1.2. Name of Document Author/Supplier: Author: Bill Baker
1.3 Date of This Document: 28 August, 2009

4. Technical Description

Historically, the NFSv4 client has not implemented the VOP_FID() function due to fundamental limitations of the protocol. The client cannot construct a file identifier that can be used to reactivate a vnode (via VFS_VGET()) in all cases. In particular, volatile file handle recovery and the handling of VOP_OPEN() are impossible without the additional information which is created and maintained by the NFSv4 client during VOP_LOOKUP(). Vnodes activated via VFS_VGET() may not have this information, nor can it be constructed. Given these limitations, the current nfs4_fid() simply returns EREMOTE.

However, having nfs4_fid() return a file ID can be useful in a very narrow, controlled context. This file ID can be used as an opaque cookie which can be compared to file IDs from other vnodes from the same vfs. A file tree walking program could use it to determine whether a newly looked-up file had been discovered previously, since, by definition, a file handle uniquely identifies a single file in the protocol. This ID is persistent (assuming the server is using persistent file handles) and is therefore durable and reliable across reboots of both the client and server. The consumer of this ID can write it to stable storage and safely recover its state even after a client reboot.

Therefore, nfs4_fid() is reimplemented to return the client file handle from the rnode as a file ID, solely for the purpose of doing this equivalency comparison.
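The only supported use of the new file ID is an equality test between IDs from the same vfs. The following sketch shows how a tree walker might do that, assuming a simplified length-plus-bytes ID; hyp_fid_t, HYP_MAXFIDSZ, fid_equal() and fid_seen() are hypothetical stand-ins, not the kernel's fid_t or any proposed interface.

```c
#include <stdbool.h>
#include <string.h>

#define	HYP_MAXFIDSZ	64	/* hypothetical size limit */

/* Simplified stand-in for a file ID: opaque length plus bytes. */
typedef struct hyp_fid {
	unsigned short	fid_len;
	unsigned char	fid_data[HYP_MAXFIDSZ];
} hyp_fid_t;

/*
 * Two IDs from the same vfs name the same file iff their lengths and
 * bytes match. The bytes are opaque: compare them, never decode them.
 */
static bool
fid_equal(const hyp_fid_t *a, const hyp_fid_t *b)
{
	return (a->fid_len == b->fid_len &&
	    memcmp(a->fid_data, b->fid_data, a->fid_len) == 0);
}

/* Tree-walker helper: was this file already visited? (linear scan) */
static bool
fid_seen(const hyp_fid_t *seen, size_t nseen, const hyp_fid_t *fid)
{
	size_t i;

	for (i = 0; i < nseen; i++) {
		if (fid_equal(&seen[i], fid))
			return (true);
	}
	return (false);
}
```

Because the ID is derived from a persistent file handle, the seen[] array could be written to stable storage and reloaded after a client reboot; the ID still cannot be handed back to VFS_VGET().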
nfs4_vget() will not honor this file ID, meaning that it can ONLY be used in this manner; it cannot be used to activate a vnode.

Since this new behavior may expose existing subsystems, like cachefs, to new failure modes, nfs4_fid() will only return a file ID when the client file system is mounted with -o fid. By default, the option is off and the NFSv4 client will retain its current behavior. Ideally, this mount option is only used by system components which wish to use the file ID for identification as described above. Other uses are not supported. The option will NOT be documented, due to its limited utility.

                 | Proposed       | Specified     |
                 | Stability      | in what       |
 Interface Name  | Classification | Document?     | Comments
 ================+================+===============+================
 -o fid          | Consolidation  | This Document | new option to
                 | Private        |               | mount_nfs

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
VSD locking update [PSARC/2009/343 FastTrack timeout 06/12/2009]
I'm sponsoring the following fast-track for Bob Mastors. This case is an update to PSARC 2007/456 Vnode Specific Data and adds a new field to the vnode structure (in sys/vnode.h): v_vsd_lock. The case seeks Minor binding, which matches the binding of the original case. The timer expires on June 12, 2009.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: VSD locking update
1.2. Name of Document Author/Supplier: Author: Bob Mastors
1.3 Date of This Document: 05 June, 2009

4. Technical Description

PSARC 2007/456 Vnode Specific Data was created to easily associate project-specific data with a vnode. The current implementation requires consumers of vsd_get() and vsd_set() to hold the vnode v_lock mutex across the calls. However, vsd_set() may go to sleep in kmem_alloc(). This could cause the system to deadlock if the vnode had dirty pages that needed to be written. This problem is described by:

    6839233 VSD usage of v_lock could cause deadlock

The solution adds a new mutex to vnode_t:

    kmutex_t v_vsd_lock;    /* protects v_vsd field */

Consumers of vsd_get() and vsd_set() would hold v_vsd_lock across the calls instead of v_lock. There are no lock ordering issues, since the locks are independent and there is no reason to hold both simultaneously.

Currently, the only ON consumer of vsd_get() and vsd_set() is NFS.

Exported Interfaces

 Interface Name | Classification | Comments
 ===============+================+==================================
 v_vsd_lock     | Consolidation  | New mutex in the vnode structure
                | Private        | to protect v_vsd field

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
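The point of the change is that the lock held across vsd_set() is no longer v_lock, so an allocation that sleeps cannot stall paths that need v_lock. The following is a user-level sketch of that consumer pattern, assuming pthread mutexes as stand-ins for kmutex_t and realloc() for kmem_alloc(); hyp_vnode_t, hyp_vsd_get(), hyp_vsd_set() and consumer_attach() are hypothetical and much simpler than the real VSD code.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for vnode_t: v_lock and v_vsd_lock are independent. */
typedef struct hyp_vnode {
	pthread_mutex_t	v_lock;		/* other vnode state (unused here) */
	pthread_mutex_t	v_vsd_lock;	/* protects v_vsd and v_nkeys only */
	void		**v_vsd;	/* per-key specific data slots */
	size_t		v_nkeys;
} hyp_vnode_t;

/* May allocate (in the kernel: may sleep). Caller holds v_vsd_lock. */
static int
hyp_vsd_set(hyp_vnode_t *vp, size_t key, void *value)
{
	if (key >= vp->v_nkeys) {
		void **nv = realloc(vp->v_vsd, (key + 1) * sizeof (void *));

		if (nv == NULL)
			return (-1);
		for (size_t i = vp->v_nkeys; i <= key; i++)
			nv[i] = NULL;
		vp->v_vsd = nv;
		vp->v_nkeys = key + 1;
	}
	vp->v_vsd[key] = value;
	return (0);
}

/* Caller holds v_vsd_lock. */
static void *
hyp_vsd_get(hyp_vnode_t *vp, size_t key)
{
	return (key < vp->v_nkeys ? vp->v_vsd[key] : NULL);
}

/* Consumer pattern from this case: take v_vsd_lock, never v_lock. */
static int
consumer_attach(hyp_vnode_t *vp, size_t key, void *data)
{
	int rv = 0;

	(void) pthread_mutex_lock(&vp->v_vsd_lock);
	if (hyp_vsd_get(vp, key) == NULL)
		rv = hyp_vsd_set(vp, key, data);
	(void) pthread_mutex_unlock(&vp->v_vsd_lock);
	return (rv);
}
```

Because consumer_attach() never touches v_lock, there is no ordering between the two locks to get wrong, which matches the case's statement that there are no lock ordering issues.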
CIFS Client Commands Update [PSARC/2009/226 Self Review]
I'm sponsoring the following for Gordon Ross. This is an update to an approved case. I believe this change qualifies for self-review and I've marked it Approved Automatic. If anyone disagrees, please let me know and I'll promote this to a fast-track. The requested binding is PATCH, which matches the binding approved for the original case.

Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: CIFS Client Commands Update
1.2. Name of Document Author/Supplier: Author: Gordon Ross
1.3 Date of This Document: 07 April, 2009

4. Technical Description

PSARC 2005/695 CIFS Client on Solaris (approved on 13 Sept 2007) introduced a new filesystem type: smbfs. Since then, the project team realized that several smbfs-specific commands were not included in the original case: dfshares_smbfs, share_smbfs, and unshare_smbfs.

The lack of an smbfs-specific dfshares program causes the dfshares(1M) command to exit with a failure code, which causes regression tests to fail. (See CR 6670499 for details.) The smbfs-specific share and unshare programs are being included for completeness and conformity with autofs, cachefs, etc.

This case corrects that oversight and adds the following commands:

    /usr/lib/fs/smbfs/dfshares
        dfshares_smbfs simply returns an exit code of 0

    /usr/lib/fs/smbfs/share
        share_smbfs prints "smbfs share is not supported"
        and returns an exit code of 1

    /usr/lib/fs/smbfs/unshare
        unshare_smbfs prints "smbfs unshare is not supported"
        and returns an exit code of 1

Just as there are no share_{fstype}.1m, unshare_{fstype}.1m, and dfshares_{fstype}.1m manual pages for autofs and cachefs, no corresponding manual pages are needed for smbfs.

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open
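The three commands are trivial stubs whose only contract is their message and exit status. The following sketch expresses each as a function returning the status a main() would pass to exit(); the message strings come from this case, but the function names and the choice of writing to a caller-supplied stream (stderr in practice) are assumptions.

```c
#include <stdio.h>

/* dfshares: nothing is shared, which is not an error. */
static int
dfshares_smbfs(void)
{
	return (0);
}

/* share: not supported for smbfs. */
static int
share_smbfs(FILE *err)
{
	(void) fprintf(err, "smbfs share is not supported\n");
	return (1);
}

/* unshare: not supported for smbfs. */
static int
unshare_smbfs(FILE *err)
{
	(void) fprintf(err, "smbfs unshare is not supported\n");
	return (1);
}
```

For example, the share program's main() would reduce to `return (share_smbfs(stderr));`. The exit status of 0 from dfshares is what lets the generic dfshares(1M) command succeed once an smbfs mount is present.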
umountall -Z [PSARC/2008/765 FastTrack timeout 12/18/2008]
I'm sponsoring this case for Pavel Filipensky to add the '-Z' option to umountall(1M). This case times out on 12/18/2008. Micro/patch binding is requested for this case.

Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: umountall -Z
1.2. Name of Document Author/Supplier: Author: Pavel Filipensky
1.3 Date of This Document: 11 December, 2008

4. Technical Description

This case proposes two changes:

1) It introduces a new command line option, -Z, to umountall(1M). When the umountall(1M) command is run in the global zone, this option applies the unmounting action(s) only to the file systems mounted in non-global zones. Using the -Z option in a non-global zone has no effect.

2) The default behavior of umountall(1M) is changed to limit the unmounting action(s) to the current zone.

Rationale for limiting the default scope to the current zone:

Currently, running umountall(1M) in the global zone unmounts file systems from the global zone and from non-global zones as well. This causes the following bugs:

    6502014 NFS mounts in non-global zones are unmounted if NFS is restarted in the global zone
    6512906 Autofs mounts in non-global zones are unmounted when autofs is restarted in the global zone
    6777323 smb mounts in non-global zones are unmounted when smb/client is restarted in the global zone

Limiting the default scope of umountall(1M) to the current zone will fix the bugs above.

Rationale for adding the new -Z option:

The -Z option will be used in the stop method of svc:/system/zones:default. This handles the case where we try to stop zones and some of them fail to shut down; it is better to try to unmount the filesystems mounted in them to free resources on the servers.

Using the -Z option has no side effects on the other umountall(1M) options: -Z never changes the behaviour of the other options; it only changes their scope.
The webrev for these changes is available here:

    http://cr.opensolaris.org/~pavelf/6779275

Related CR:

    6779275 umountall(1M) -Z ... limit unmounting action(s) to the non-global zones

EXPORTED INTERFACES

    umountall(1M) option    Stability Level
    -Z                      Committed

DOCUMENTATION IMPACT (See 6780521)

manpage umountall(1M) changes:
  1. a new -Z option
  2. change in the default behavior

Changes are as follows:

SYNOPSIS
     mountall [-F FSType] [-l | -r] [file_system_table]

     umountall [-k] [-s] [-F FSType] [-l | -r] [-n] [-Z]           +

     umountall [-k] [-s] [-h host] [-n] [-Z]                       +

[...]

     umountall causes all mounted file systems in the current      +
     zone except root, /usr, /var, /var/adm, /var/run, /proc, and  +
     /dev/fd to be unmounted. If the FSType is specified, mountall
     and umountall limit their actions to the FSType speci-
[...]

     -s    Do not perform the umount operation in parallel.

     -Z    Apply the action(s) only to the file systems            +
           mounted in non-global zones. By default,                +
           umountall unmounts only file systems mounted            +
           in the current zone. Has no effect if used in           +
           a non-global zone.                                      +

FILES
[...]

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open
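The new scoping rules reduce to one decision per mounted file system. The following sketch encodes that decision in C purely for illustration; umountall itself is not implemented this way in the case, and in_umountall_scope() is hypothetical. It assumes zone IDs are plain integers with the global zone as 0 (as GLOBAL_ZONEID is on Solaris).

```c
#include <stdbool.h>

#define	HYP_GLOBAL_ZONEID	0	/* the global zone's ID */

/*
 * Should umountall, running in zone cur_zone, act on a file system
 * mounted in zone mnt_zone? zflag is true when -Z was given.
 */
static bool
in_umountall_scope(int cur_zone, int mnt_zone, bool zflag)
{
	/* -Z in the global zone: only file systems of non-global zones. */
	if (zflag && cur_zone == HYP_GLOBAL_ZONEID)
		return (mnt_zone != HYP_GLOBAL_ZONEID);

	/* New default, and -Z in a non-global zone: current zone only. */
	return (mnt_zone == cur_zone);
}
```

The first branch is what the svc:/system/zones:default stop method relies on; the second is the change that fixes 6502014, 6512906 and 6777323, because a restart in the global zone no longer reaches non-global zone mounts.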
VFSFT_SYSATTR_VIEWS [PSARC/2008/588 Self Review]
I'm sponsoring this for Janice Chang. This case adds a new VFS Feature, VFSFT_SYSATTR_VIEWS, which is registered when a file system supports the extended attribute files that describe extensible system attributes (a.k.a. views). This case is an extension of the Extensible Attribute Interface case (PSARC 2007/315), which was approved with minor binding. Since this case simply allows the (VFS) registration of an existing interface, I'm filing this as Closed Approved Automatic. If anyone disagrees, let me know and I'll promote it to a fast track.

Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: VFSFT_SYSATTR_VIEWS
1.2. Name of Document Author/Supplier: Author: Janice Chang
1.3 Date of This Document: 17 September, 2008

4. Technical Description

PSARC 2007/315 (Extensible Attribute Interfaces) introduced a set of interfaces to retrieve and manipulate extensible system attributes on file objects. Extensible system attributes (also known as system attributes) were introduced specifically to support the CIFS service, which requires support for DOS attributes (PSARC 2006/715).

One of the interfaces described by that case is an extensible vattr_t structure called xvattr_t. It is used with VOP_SETATTR()/VOP_GETATTR() to set/retrieve the new system attributes. File systems that support this interface communicate this to consumers by using the VFS Feature Registration facility (PSARC 2007/227) to register VFSFT_XVATTR.

Another interface described by the Extensible Attribute Interfaces case is called a view. Each view exposes a group of system attributes as an extended attribute file whose name begins with SUNWattr_ (e.g., SUNWattr_rw for modifiable attributes and SUNWattr_ro for attributes that cannot be modified). These views are used to accommodate existing extended-attribute-aware utilities.
File systems that support modifiable system attributes use both the xvattr_t and view interfaces. However, some file systems (tmpfs, ufs) support only non-modifiable system attributes (e.g., FSID), which are exposed only through a read-only view (SUNWattr_ro). Unfortunately, the VFSFT_XVATTR feature was set on these file systems to indicate support for system attributes. Overloading VFSFT_XVATTR meant that consumers would attempt to retrieve system attributes by using xvattr_t with VOP_GETATTR(), which would result in an error.

To remedy this problem, a new VFS Feature, VFSFT_SYSATTR_VIEWS, is introduced; it denotes support specifically for the view interface for extensible system attributes. All ON file systems that support views will be modified to register the VFSFT_SYSATTR_VIEWS feature, and only those ON file systems that support the xvattr_t interface will register VFSFT_XVATTR. Consumers (in particular, the CIFS service) will then be able to reliably determine which interface is available to manipulate the system attributes of a file.

This change is Consolidation Private and will be communicated to the unbundled file system teams (internal and external).

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open
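With the two features separated, a consumer can choose an access method from the registered features instead of trying xvattr_t and failing. The following is a minimal sketch, assuming the features are represented as bits in a plain mask; the HYP_VFSFT_* values, the enum and choose_sysattr_iface() are hypothetical, and preferring xvattr_t when both features are present is this sketch's own choice, not something the case specifies.

```c
/*
 * Hypothetical bits standing in for the VFSFT_XVATTR and
 * VFSFT_SYSATTR_VIEWS features; a kernel consumer would query the
 * VFS Feature Registration facility rather than a bitmask.
 */
#define	HYP_VFSFT_XVATTR	0x1
#define	HYP_VFSFT_SYSATTR_VIEWS	0x2

enum sysattr_iface {
	SYSATTR_NONE,	/* no system attribute support */
	SYSATTR_XVATTR,	/* xvattr_t with VOP_GETATTR()/VOP_SETATTR() */
	SYSATTR_VIEWS	/* SUNWattr_* extended attribute files only */
};

static enum sysattr_iface
choose_sysattr_iface(unsigned int features)
{
	if (features & HYP_VFSFT_XVATTR)
		return (SYSATTR_XVATTR);
	if (features & HYP_VFSFT_SYSATTR_VIEWS)
		return (SYSATTR_VIEWS);
	return (SYSATTR_NONE);
}
```

Under the new registration rules, tmpfs and ufs would report only the views feature, so this logic sends their consumers to SUNWattr_ro instead of a VOP_GETATTR() call that would fail.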
64 bit offsets for VOP_DUMP [PSARC/2008/053 FastTrack timeout 02/01/2008]
I'm submitting this fast-track for Bob Mastors. Requested binding is MINOR. Time-out is 1 Feb, 2008.

Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2008 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: 64 bit offsets for VOP_DUMP
1.2. Name of Document Author/Supplier: Author: Bob Mastors
1.3 Date of This Document: 25 January, 2008

4. Technical Description

INTRODUCTION

The vnode operation VOP_DUMP uses 32-bit values for block addressing. This prevents system crash dumps and cpr statefiles from being saved on very large devices via VOP_DUMP.

PROPOSED CHANGES

Change block offsets and lengths to type offset_t for VOP_DUMP and VOP_DUMPCTL. Change the corresponding functions in FEM (file event monitoring). Change file system implementations of VOP_DUMP and VOP_DUMPCTL to work with the new types. Change callers of the above VOP/FEM functions as needed.

The VFSDEF_VERSION number in sys/vfs.h will be bumped from 4 to 5 in order to prevent unbundled file system kernel modules with the old signatures from loading. Once the unbundled file system modules are updated with the new signatures and recompiled, they will also pick up the new VFSDEF_VERSION number and be allowed to load.

All of the ON file systems will be updated with these changes. Unbundled file system developers (internal and external) will be given a heads-up about these changes. Rich Brown is coordinating a TOI on all file system changes made in Solaris Nevada.

Note that Solaris Nevada now performs strong type-checking on vnode/FEM operations. This means that the compilers will inform unbundled file system developers of the signature discrepancy in their code.

DEVICE DRIVER INTERFACE

No changes are proposed to the device driver interface.
File system specific dump functions typically call the DDI routine bdev_dump, which has the following signature:

    int bdev_dump(dev_t dev, caddr_t addr, daddr_t blkno, int blkcnt)

bdev_dump calls the underlying device dump(9E) function, which has the same signature. On the 64-bit Solaris kernel, daddr_t is 64 bits. However, on the 32-bit Solaris kernel, daddr_t is a 32-bit value.

bdev_dump and device driver dump(9E) functions have the following characteristics:

    64-bit device drivers have 64-bit block addressing and 32-bit block counts.
    32-bit device drivers have 32-bit block addressing and 32-bit block counts.

Current consumers of VOP_DUMP limit the transfer size to a few megabytes at a time, so they do not overflow the 32-bit block count.

32-bit block addressing results in the following limitation. A file should not be used as the dump device or cpr statefile when all of the following conditions occur:

    a) the Solaris kernel is compiled 32-bit
    b) the file system is UFS
    c) the file system is larger than 1 TB

This limitation may also apply to unbundled file systems. ZFS does not have this limitation because it does not support files as dump devices or cpr statefiles.

The fop_dump() function will be changed to perform safety checks to ensure that the offset and length passed to VOP_DUMP can be passed on to bdev_dump safely. fop_dump will return EIO if the values cannot be passed safely. These safety checks may be removed in the future if the DDI dump(9E) signature is modified to support 64-bit addressing and 64-bit block counts on all architectures.

ALTERNATIVES TO OFFSET_T

The selection of offset_t for the type of the block address and block count seemed consistent with usage in other VOP functions and struct uio. Also, offset_t is a signed value, as are the types it is replacing.
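The 1 TB boundary follows from the types: assuming 512-byte blocks, a signed 32-bit daddr_t can address at most 2^31 blocks, and 2^31 x 512 bytes = 2^40 bytes = 1 TiB. The following sketches the kind of safety check described for fop_dump(), assuming a 32-bit kernel's daddr_t and an int block count; dump_args_fit32() and hyp_offset_t are hypothetical, and this is not the actual fop_dump() code.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

typedef int64_t hyp_offset_t;	/* stand-in for offset_t (signed 64-bit) */

/*
 * Return 0 if (off, len) can be handed to a dump routine that takes a
 * 32-bit daddr_t block number and an int block count, else EIO.
 */
static int
dump_args_fit32(hyp_offset_t off, hyp_offset_t len)
{
	if (off < 0 || len < 0)
		return (EIO);
	if (off > INT32_MAX || len > INT_MAX)
		return (EIO);
	/* The last block touched must also be addressable. */
	if (len > 0 && off + (len - 1) > INT32_MAX)
		return (EIO);
	return (0);
}
```

Checking the last block as well as the starting block catches a transfer that starts below the limit but runs past it. Because both values are bounded before the addition, the sum cannot overflow 64 bits.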
Alternatives considered include the following:

    typedef u_longlong_t  diskaddr_t;
    typedef u_longlong_t  len_t;
    typedef u_longlong_t  u_offset_t;
    typedef uint64_t      paddr_t;

RELATED CASES

    PSARC/2001/679 Vnode Interfaces
    PSARC/2007/124 Strong Type-Checking for VFS Operations

RELATED CONTRACTS

    PSARC 2001/599 (FS related interfaces for SAM-QFS)
    PSARC 2004/177 (FS related interfaces for Sun Cluster)

RELATED CR

    6214480 System crash dump fails when dump device is 1 TB

DELIVERY

These modifications are intended to be part of Solaris Nevada.

MODIFIED INTERFACES

    +------------+----------------+----------------------------+
    | Interface  | Classification | Comments                   |
    +------------+----------------+----------------------------+
    |            | Contracted     |                            |
    | VOP_DUMP,  | Consolidation  | changed block addr and len |
    | fop_dump   | Private        | to offset_t                |
    |            |
Caller context flags [PSARC/2007/632 FastTrack timeout 11/09/2007]
I'm sponsoring this fast-track for Jim Wahlig. This case seeks Minor binding.

Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2007 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name: Caller context flags
1.2. Name of Document Author/Supplier: Author: Jim Wahlig
1.3 Date of This Document: 02 November, 2007

4. Technical Description

One of the uses of the caller_context structure is to pass information between the caller of a vnode operation (VOP) and a File Event Monitor (FEM). It is used by both the NFS and CIFS servers.

Monitors often need to perform operations that would block the caller. For example, an NFSv4 delegation monitor may need to perform an over-the-wire operation to recall a delegation. The problem is that the caller may not be in a position to block and has no way to communicate that state to the monitor.

Proposed Solution

This case proposes to add a flags field (cc_flags) to the caller_context structure, as well as values to communicate the behavior needed by the caller.

The new caller context structure looks like this:

    typedef struct caller_context {
        pid_t        cc_pid;        /* Process ID of the caller */
        int          cc_sysid;      /* System ID, used for remote calls */
        u_longlong_t cc_caller_id;  /* Identifier for (set of) caller(s) */
        uint64_t     cc_flags;      /* <-- NEW FLAG FIELD */
    } caller_context_t;

The first two new flags to be defined:

    #define CC_WOULDBLOCK   0x1     /* set upon return by monitor */
    #define CC_DONTBLOCK    0x2     /* set by caller */

The caller sets CC_DONTBLOCK in cc_flags to direct the monitor not to perform an operation that might block. When a monitor would perform a blocking operation and CC_DONTBLOCK is set, the monitor instead sets CC_WOULDBLOCK in cc_flags and returns EAGAIN to the caller.

The first consumer of this new field will be the NFS server. The flags passed will inform the monitors on delegated files whether to wait for the delegation to be returned or to just kick off the recall and return an error.
The NFS server will set CC_DONTBLOCK to inform the delegation monitors not to wait for a delegation to be returned when there is a conflict. Instead, the monitors will return EAGAIN and set the CC_WOULDBLOCK flag after issuing the delegation recall.

Exported Interfaces

 Interface Name | Classification | Comments
 ===============+================+=====================================
 CC_WOULDBLOCK  | Consolidation  | set when returning EAGAIN to an op
                | Private        | that would have been blocked.
 CC_DONTBLOCK   |                | set by caller to indicate that ops
                |                | should not block.

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name: ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open