forcedirectio mount flag in nfsstat [PSARC/2009/587 Self Review]

2009-10-27 Thread rich.br...@sun.com

I'm sponsoring this automatic-approved self review case for Marcel
Telka (RPE).

This case corrects an obvious oversight (missing display of
'forcedirectio' mount flag) in the output of nfsstat(1m) and updates
the corresponding man page.

I believe this case qualifies for self-review, but if anyone disagrees,
let me know and I'll promote it to a fast track.


Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 forcedirectio mount flag in nfsstat
1.2. Name of Document Author/Supplier:
 Author:  Marcel Telka
1.3  Date of This Document:
27 October, 2009
4. Technical Description

 The nfsstat(1m) command has an option (-m) which displays statistics
 for each NFS mounted file system.  This option also displays the mount
 flags for each NFS mounted file system; however, there is one existing
 mount flag that is missing from the display:  forcedirectio

 Support for the forcedirectio mount option was added to the Solaris
 NFS client together with directio(3C) capability as part of the fix
 for CR 4190364 (Solaris 8).  The 'forcedirectio' mount option is
 currently documented in the mount_nfs(1M) man page.  It was integrated
 into the man page in Solaris 9 by CR 4521941.

 During the development mentioned above, the nfsstat utility was not
 updated to display the forcedirectio mount flag when run with the -m
 option.  The lack of the forcedirectio flag in the 'nfsstat -m' output
 can cause confusion for users.

 This case corrects that oversight by adding support for the
 forcedirectio mount flag to the 'nfsstat -m' output.  This change will
 be documented in the nfsstat(1M) man page (see below).

 For example, given the following NFS file system:

   # mount |grep test
   /tmp/test on snvx.czech:/builds remote/read/write/setuid/devices/forcedirectio/xattr/dev=615 on Mon Oct  5 08:48:09 2009

 The current 'nfsstat' displays the following (note the lack of the
 'forcedirectio' flag):

   # nfsstat -m /tmp/test
   /tmp/test from snvx.czech:/builds
    Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,rsize=1048576,wsize=1048576,retrans=5,timeo=600
    Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60

 With the proposed change, the 'nfsstat' command would now include the
 'forcedirectio' mount flag:

   # nfsstat -m /tmp/test  
   /tmp/test from snvx.czech:/builds
    Flags: vers=4,proto=tcp,sec=sys,hard,intr,link,symlink,acl,forcedirectio,rsize=1048576,wsize=1048576,retrans=5,timeo=600
    Attr cache: acregmin=3,acregmax=60,acdirmin=30,acdirmax=60
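
 The change amounts to one more test when the Flags: line is built.  A
 minimal sketch of that idea follows.  The flag bits (EX_MI_ACL,
 EX_MI_DIRECTIO) and the helper names are hypothetical and are not
 taken from the nfsstat source; only the printed strings come from the
 output shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits; the real mount-info flag values may differ. */
#define EX_MI_ACL       0x1u
#define EX_MI_DIRECTIO  0x2u

/*
 * Append "name" to buf, preceded by a comma if buf is not empty.
 * Returns 0 on success, -1 if the result would not fit in buflen bytes.
 */
static int
append_flag(char *buf, size_t buflen, const char *name)
{
        size_t used = strlen(buf);
        size_t need = strlen(name) + (used > 0 ? 1 : 0);

        if (used + need + 1 > buflen)
                return (-1);
        if (used > 0)
                buf[used++] = ',';
        memcpy(buf + used, name, strlen(name) + 1);
        return (0);
}

/* Build the option-derived flag list from a flag word. */
static int
format_mount_flags(unsigned int flags, char *buf, size_t buflen)
{
        assert(buf != NULL && buflen > 0);
        buf[0] = '\0';
        if ((flags & EX_MI_ACL) && append_flag(buf, buflen, "acl") != 0)
                return (-1);
        /* The previously missing case: report forcedirectio when set. */
        if ((flags & EX_MI_DIRECTIO) &&
            append_flag(buf, buflen, "forcedirectio") != 0)
                return (-1);
        return (0);
}
```

 With both bits set the sketch produces "acl,forcedirectio", matching
 the order of those two flags in the proposed output.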


 This case also includes an update to the nfsstat(1m) man page which
 has been pre-approved by Terry Gibson.  Terry is the RE for the
 associated man page CR (6888023: nfsstat: forcedirectio mount flag
 needs to be documented).

8<--
 The -m option includes information about mount flags set  by
 mount options, mount flags internal to the system, and other
 mount information. See mount_nfs(1M).

 The following mount flags are set by mount options:

 forcedirectio                                                 |
                                                               |
     Data transferred directly between client and server,     |
     with no buffering on client.                              |

 grpid

 System V group id inheritance.
8<--

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open



NFS Referrals [PSARC/2009/502 FastTrack timeout 09/25/2009]

2009-09-18 Thread rich.br...@sun.com
I'm sponsoring this case for Rob Thurlow.  This case proposes
to add NFS Referral support to the Solaris client and server.

Minor binding is requested.

This times out on Friday, 25 September, 2009.


Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 NFS Referrals
1.2. Name of Document Author/Supplier:
 Author:  Rob Thurlow
1.3  Date of This Document:
18 September, 2009
4. Technical Description
NFS Referrals

A. Introduction

This project intends to introduce basic NFS referral support to
the Solaris client and server.  A Solaris client will follow a
referral to a new location and transparently mount it, and a
Solaris server will have basic support for creating and managing
referrals.  Referral support will permit construction of a
server-based unified namespace that NFS clients will be able
to participate in.  NFSv4 referrals are supported by modern
HP-UX, AIX, and Linux releases.

This project makes use of Reparse Points, PSARC 2009/387, and is
a follow-on to the umbrella case PSARC 2009/399.  The project
also uses and extends interfaces from NFSv4 Mirror Mounts,
PSARC 2007/416; most of the automatic mounting code and all
of the automatic unmounting code is untouched.

This work is based on replication/migration primitives defined
in NFSv4 (RFC 3530)[1] with behaviour defined by later work [2].

B. Scope

The NFSv4 working group in the IETF is working on Federated FS,
which is a unified, centrally managed back-end to support a set
of servers presenting a uniform namespace.  This project is not
delivering support for Federated FS since that specification is
still a work-in-progress.  However, our work will permit drop-in
support of FedFS in the future.

This project is not delivering support for replication and
migration as described by the NFSv4 specification.  Solaris
currently has limited support for replication and client-side
failover via mount_nfs(1M) and the automounter via PSARC 1995/143.
This work is expected to be a basis for future migration support.

C. Behaviour

C.1 Client Behaviour

The client discovers referral objects as it examines the
NFSv4 server's filesystems, and the client associates a
distinct set of vnode ops with these objects.  Most client
filesystem operations will trigger referral mounts, but
VOP_LOOKUP() and most VOP_GETATTR() calls will not, to
match existing automount and mirror mount behaviour.  Prior
to a mount, the NFSv4 client will display referral objects
as directories with artificial metadata, like autofs trigger
nodes before mounting.  As with mirror mounts, find(1)'s use
of _AT_TRIGGER (PSARC 2007/563) will force a mount prior to
collecting attributes.  Mounting will be done without checking
the privilege of the calling process.

Automatic unmounting will be done in the same way as for mirror
mounts.  Mounts will time out like mounts done by the automounter,
and a manual unmount of the enclosing filesystem will unmount
referral mounts and mirror mounts as well unless they are being
kept busy by processes' open files.

The NFSv4 client will advertise support for referrals via the
protocol's SETCLIENTID operation; see section C.2 for more.

The client will perform a mount of the first reachable location
in the fs_locations data.  The multi-valued fs_locations
attribute describes the locations (server:/path
combinations) for the same data.  fs_locations can use
hostnames, dotted quad numeric IPv4 address strings, or
IPv6 address strings; we will convert these according to
the client's transport configuration by doing a new door
upcall to the existing nfsmapid binary.  A new string,
referral, will be visible in nfsstat -m output for
an automatic mount done this way.  Other observability
will include a kstat counter of referral mounts done and
a number of dtrace probes to report the results of name
resolution and mounts.
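
The "first reachable location" rule can be sketched as a simple scan
of the fs_locations entries.  The structure below is a hypothetical,
simplified stand-in: it assumes name resolution and a reachability
probe have already run and left their result in a field.  None of
these names come from the NFSv4 client source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one fs_locations entry. */
typedef struct ex_fs_location {
        const char *server;     /* hostname, IPv4 or IPv6 address string */
        const char *path;       /* path on that server */
        int reachable;          /* nonzero if resolution and probe succeeded */
} ex_fs_location_t;

/*
 * Return the index of the first reachable location, or -1 if none of
 * the n entries is reachable.  Entries are tried in the order given.
 */
static int
ex_first_reachable(const ex_fs_location_t *locs, size_t n)
{
        size_t i;

        assert(locs != NULL || n == 0);
        for (i = 0; i < n; i++) {
                if (locs[i].reachable)
                        return ((int)i);
        }
        return (-1);
}
```

A caller would mount locs[i].server:locs[i].path for the returned
index and treat -1 as a failed referral mount.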

A kernel tunable will permit disabling referral mounts
if it should be deemed necessary, but it will not be
documented.

C.2 Server Behaviour

When the NFSv3 or NFSv4 server processes operations on behalf of
the client, it may encounter reparse points (as described in
PSARC 2009/387).  When that occurs, the server will examine the
reparse point data and look for a service type starting with
'nfs', indicating NFS service data.  If found, the NFSv4
server will normally return NFS4ERR_MOVED.  The NFSv4 client
will then normally request the fs_locations attribute.  In
response to this, the NFSv4 server will upcall with the NFS
service data to the reparse daemon.

In general, with reparse points, the service data can be
a key to find the desired location information, perhaps by
consulting a distributed or networked database.  For this
first release, the service data will simply be the location
information in the form of a host:/path [host:/path ...]
string.  This has the property that the referral will work
when the 

Copy Reduction Interfaces [PSARC/2009/478 FastTrack timeout 09/16/2009]

2009-09-09 Thread rich.br...@sun.com
I'm sponsoring this case on behalf of Mahesh Siddheshwar and Chunli Zhang.
This case proposes new interfaces to support copy reduction in the I/O path
especially for file sharing services.

Minor binding is requested.

This times out on Wednesday, 16 September, 2009.


Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 Copy Reduction Interfaces
1.2. Name of Document Author/Supplier:
 Author:  Mahesh Siddheshwar, Chunli Zhang
1.3  Date of This Document:
09 September, 2009
4. Technical Description

 == Introduction/Background ==

 Zero-copy (copy avoidance) is essentially buffer sharing
 among multiple modules that pass data between the modules. 
 This proposal avoids the data copy in the READ/WRITE path 
 of filesystems, by providing a mechanism to share data buffers
 between the modules. It is intended to be used by network file
 sharing services like NFS, CIFS or others.

 Although the buffer sharing can be achieved through a few different
 solutions, any such solution must work with File Event Monitors
 (FEM monitors)[1] installed on the files. The solution must
 allow the underlying filesystem to maintain any existing file 
 range locking in the filesystem.
 
 The proposed solution provides extensions to the existing VOP
 interface to request and return buffers from a filesystem. The 
 buffers are then used with existing VOP_READ/VOP_WRITE calls with
 minimal changes.


 == Proposed Changes ==

 VOP Extensions for Zero-Copy Support
 

 a. Extended struct uio, xuio_t

  The following proposes an extensible uio structure that can be extended for
  multiple purposes.  For example, an immediate extension, xu_zc, is to be 
  used by the proposed VOP_REQZCBUF/VOP_RETZCBUF interfaces to pass loaned
  zero-copy buffers, as well as to be passed to the existing VOP_READ/VOP_WRITE
  calls for normal read/write operations.  Another example of extension,
  xu_aio, is intended to replace uioa_t for async I/O.

  This new structure, xuio_t, contains the following:

  - the existing uio structure (embedded) as the first member
  - additional fields to support extensibility
  - a union of all the defined extensions

  The following uio_extflag is added to indicate that a uio structure is
  indeed an xuio_t:

  #define   UIO_XUIO    0x004   /* Structure is xuio_t */

  The following uio_extflag will be removed after uioa_t has been converted 
  to xuio_t:

  #define   UIO_ASYNC   0x002   /* Structure is uioa_t */

  The project team has commitment from the networking team to remove
  the current use of uioa_t and use the proposed extensions (CR 6880095).

  The definition of xuio_t is:

  typedef struct xuio {
        uio_t xu_uio;   /* Embedded UIO structure */

        /* Extended uio fields */
        enum xuio_type xu_type; /* What kind of uio structure? */

        union {

                /* Async I/O Support */
                struct {
                        uint32_t xu_a_state;    /* state of async i/o */
                        ssize_t xu_a_mbytes;    /* bytes that have been uioamove()ed */
                        uioa_page_t *xu_a_lcur; /* pointer into uioa_locked[] */
                        void **xu_a_lppp;       /* pointer into lcur->uioa_ppp[] */
                        void *xu_a_hwst[4];     /* opaque hardware state */
                        uioa_page_t xu_a_locked[UIOA_IOV_MAX];  /* Per iov locked pages */
                } xu_aio;

                /* Zero Copy Support */
                struct {
                        enum uio_rw xu_zc_rw;   /* the use of the buffer */
                        void *xu_zc_priv;       /* fs specific */
                } xu_zc;

        } xu_ext;
  } xuio_t;

  where xu_type is currently defined as:

  typedef enum xuio_type {
        UIOTYPE_ASYNCIO,
        UIOTYPE_ZEROCOPY
  } xuio_type_t;

  New uio extensions can be added by defining a new xuio_type_t, and adding a
  new member to the xu_ext union.
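
  Because the uio structure is embedded as the first member, code that
  receives a plain uio pointer can test the flag and then cast.  The
  sketch below illustrates that pattern with reduced stand-in types
  (ex_uio_t, ex_xuio_t and the EX_ names are hypothetical, not the
  kernel definitions); it is not the project's code.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel types; not the real definitions. */
#define EX_UIO_XUIO     0x004   /* structure is really an ex_xuio_t */

typedef enum ex_xuio_type {
        EX_UIOTYPE_ASYNCIO,
        EX_UIOTYPE_ZEROCOPY
} ex_xuio_type_t;

typedef struct ex_uio {
        int     uio_extflag;    /* extended flags, including EX_UIO_XUIO */
        long    uio_resid;      /* residual byte count */
} ex_uio_t;

typedef struct ex_xuio {
        ex_uio_t        xu_uio;         /* embedded uio, must be first */
        ex_xuio_type_t  xu_type;        /* which extension is in use */
} ex_xuio_t;

/*
 * Return the enclosing ex_xuio_t if the flag says there is one,
 * otherwise NULL.  The cast is valid because xu_uio is the first member.
 */
static ex_xuio_t *
ex_uio_to_xuio(ex_uio_t *uio)
{
        assert(uio != NULL);
        if ((uio->uio_extflag & EX_UIO_XUIO) == 0)
                return (NULL);
        return ((ex_xuio_t *)uio);
}

/* Nonzero only for an extended uio carrying the zero-copy extension. */
static int
ex_uio_is_zerocopy(ex_uio_t *uio)
{
        ex_xuio_t *xuio = ex_uio_to_xuio(uio);

        return (xuio != NULL && xuio->xu_type == EX_UIOTYPE_ZEROCOPY);
}
```

  Checking the flag before the cast is what lets existing callers keep
  passing an ordinary uio_t without any change.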

 b. Requesting zero-copy buffers

        #define VOP_REQZCBUF(vp, rwflag, uiozcp, cr, ct) \
                fop_reqzcbuf(vp, rwflag, uiozcp, cr, ct)

        int fop_reqzcbuf(vnode_t *, enum uio_rw, xuio_t *, cred_t *,
                caller_context_t *);
 
This function requests buffers associated with file vp in preparation for a
subsequent zero copy read or write. The extended uio_t -- xuio_t is used
to pass the parameters and results. Only the following fields of xuio_t are
relevant to this call.
 
uiozcp->xu_uio.uio_resid: Used by the caller to specify the total length
 of the buffer.

uiozcp->xu_uio.uio_loffset: Used by the caller to indicate the file offset
 it would like the buffers to be associated with. A value of -1 
 indicates that the provider returns buffers that are not associated
 with a particular offset.  These are defined to be anonymous buffers.
 Anonymous buffers may be used 

nfs4_fid() [PSARC/2009/468 FastTrack timeout 09/04/2009]

2009-08-28 Thread rich.br...@sun.com

I'm sponsoring this case for Bill Baker of the NFS team.  This
case proposes enabling technology for a forthcoming closed case.

The timer expires on Friday, September 4, 2009.

This case requests minor binding.


Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 nfs4_fid()
1.2. Name of Document Author/Supplier:
 Author:  Bill Baker
1.3  Date of This Document:
28 August, 2009
4. Technical Description

Historically, the NFSv4 client has not implemented the VOP_FID()
function due to fundamental limitations of the protocol.  The client
cannot construct a file identifier which can be used to reactivate a
vnode (via VFS_VGET()) which is usable in all cases.  In particular,
volatile file handle recovery as well as handling VOP_OPEN() is
impossible without the additional information which is created and
maintained by the NFSv4 client during VOP_LOOKUP().  Vnodes activated
via VFS_VGET() may not have this information, nor can it be
constructed.  Given these limitations, the current nfs4_fid() simply
returns EREMOTE.

However, having nfs4_fid() return a file id can be useful in a very
narrow, controlled context.  This file ID can be used as an opaque
cookie which can be compared to file IDs from other vnodes from the
same vfs.  This could be used by a file tree walking program to
determine if a newly looked up file had been discovered previously
since, by definition, the file handle uniquely identifies a different
file in the protocol.  This ID is persistent (assuming the server is
using persistent file handles) and is therefore durable and reliable
across reboots of both the client and server.  The consumer of this ID
can write it to stable storage and safely recover its state even after
a client reboot.

Therefore, nfs4_fid() is reimplemented to return the client file handle
from the rnode as a file ID, solely for the purpose of doing this
equivalency comparison.  nfs4_vget() will not honor this file ID,
meaning that it can ONLY be used in this manner; it cannot be used to
activate a vnode.
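
The equivalency comparison described above treats the file ID as
opaque bytes.  A minimal sketch, assuming a hypothetical ex_fid_t
layout (a length plus data) rather than the kernel's own fid
structure, and a tree walker that keeps a simple array of IDs it has
already seen:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EX_FID_MAXLEN   128     /* hypothetical upper bound on ID length */

/* Hypothetical opaque file ID: a length plus that many bytes. */
typedef struct ex_fid {
        unsigned short  len;
        unsigned char   data[EX_FID_MAXLEN];
} ex_fid_t;

/* Two IDs are equal when the lengths match and the bytes are identical. */
static int
ex_fid_equal(const ex_fid_t *a, const ex_fid_t *b)
{
        assert(a != NULL && b != NULL);
        if (a->len != b->len || a->len > EX_FID_MAXLEN)
                return (0);
        return (memcmp(a->data, b->data, a->len) == 0);
}

/* Linear search of previously recorded IDs; nonzero if id was seen. */
static int
ex_fid_seen(const ex_fid_t *seen, size_t nseen, const ex_fid_t *id)
{
        size_t i;

        for (i = 0; i < nseen; i++) {
                if (ex_fid_equal(&seen[i], id))
                        return (1);
        }
        return (0);
}
```

The comparison is only meaningful between IDs from the same vfs, as
stated above; the sketch does not attempt to enforce that.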

Since this new behavior may expose existing subsystems, like cachefs,
to new failure modes, nfs4_fid() will only return a file ID when the
client file system is mounted with -o fid.  By default, the option is
off and the NFSv4 client will retain its current behavior.

Ideally, this mount option is only used by system components which wish
to use the file ID for identification as described above.  Other uses
are not supported.  The option will NOT be documented, due to its
limited utility.

                |Proposed       |Specified  |
                |Stability      |in what    |
Interface Name  |Classification |Document?  | Comments
==============================================================
                |Consolidation  |This       |
  -o fid        |Private        |Document   | new option to
                |               |           | mount_nfs
                |               |           |


6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open




VSD locking update [PSARC/2009/343 FastTrack timeout 06/12/2009]

2009-06-05 Thread rich.br...@sun.com

I'm sponsoring the following fast-track for Bob Mastors.

This case is an update to PSARC 2007/456 Vnode Specific Data and adds
a new field to the vnode structure (in sys/vnode.h): v_vsd_lock.

The case seeks Minor binding which matches the binding of the original
case.  The timer expires on June 12, 2009.


Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 VSD locking update
1.2. Name of Document Author/Supplier:
 Author:  Bob Mastors
1.3  Date of This Document:
05 June, 2009
4. Technical Description

PSARC 2007/456 Vnode Specific Data was created to easily associate
project specific data with a vnode.  The current implementation
requires consumers of vsd_get() and vsd_set() to hold the vnode v_lock
mutex across the calls.  However, vsd_set() may go to sleep on
kmem_alloc().  This could cause the system to deadlock if the vnode had
dirty pages that needed to be written.  This problem is described by:

  6839233 VSD usage of v_lock could cause deadlock

The solution adds a new mutex to vnode_t:

        kmutex_t        v_vsd_lock;     /* protects v_vsd field */

Consumers of vsd_get() and vsd_set() would hold v_vsd_lock across the
calls instead of v_lock.  There are no lock ordering issues since the
locks are independent and there is no reason to hold both simultaneously.
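
The locking pattern can be illustrated in userland.  The sketch below
is an analogy only: it uses POSIX mutexes and a stand-in structure
(ex_vnode_t and the ex_ functions are hypothetical), not the kernel's
kmutex_t, vnode_t, vsd_get() or vsd_set().  The point it shows is that
access to v_vsd takes only v_vsd_lock and never v_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userland stand-in for a vnode; field names mirror the case text. */
typedef struct ex_vnode {
        pthread_mutex_t v_lock;         /* general lock, not taken below */
        pthread_mutex_t v_vsd_lock;     /* protects v_vsd only */
        void            *v_vsd;         /* per-vnode private data */
} ex_vnode_t;

static void
ex_vnode_init(ex_vnode_t *vp)
{
        assert(vp != NULL);
        (void) pthread_mutex_init(&vp->v_lock, NULL);
        (void) pthread_mutex_init(&vp->v_vsd_lock, NULL);
        vp->v_vsd = NULL;
}

/* Store private data while holding only v_vsd_lock. */
static void
ex_vsd_store(ex_vnode_t *vp, void *data)
{
        (void) pthread_mutex_lock(&vp->v_vsd_lock);
        vp->v_vsd = data;
        (void) pthread_mutex_unlock(&vp->v_vsd_lock);
}

/* Fetch private data while holding only v_vsd_lock. */
static void *
ex_vsd_fetch(ex_vnode_t *vp)
{
        void *data;

        (void) pthread_mutex_lock(&vp->v_vsd_lock);
        data = vp->v_vsd;
        (void) pthread_mutex_unlock(&vp->v_vsd_lock);
        return (data);
}
```

Since v_lock is never held here, a store that has to wait for memory
cannot hold up code that needs v_lock, which is the deadlock the
separate mutex is meant to avoid.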

Currently, the only ON consumer of vsd_get() and vsd_set() is NFS.

Exported Interfaces

Interface Name | Classification | Comments
==============================================================
               |                |
v_vsd_lock     | Consolidation  | New mutex in the vnode structure
               | Private        | to protect v_vsd field

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open




CIFS Client Commands Update [PSARC/2009/226 Self Review]

2009-04-07 Thread rich.br...@sun.com

I'm sponsoring the following for Gordon Ross.  This is an update to an
approved case.

I believe this change qualifies for self-review and I've marked it
Approved Automatic.  If anyone disagrees, please let me know and I'll
promote this to a fast-track.

The requested binding is PATCH which matches the binding approved for
the original case.


Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
This information is Copyright 2009 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 CIFS Client Commands Update
1.2. Name of Document Author/Supplier:
 Author:  Gordon Ross
1.3  Date of This Document:
07 April, 2009
4. Technical Description

 PSARC 2005/695 CIFS Client on Solaris (approved on 13 Sept 2007)
 introduced a new filesystem type: smbfs.  Since then the project team
 realized that several smbfs-specific commands were not included in the
 original case: dfshares_smbfs, share_smbfs, and unshare_smbfs.

 The lack of an smbfs-specific dfshares program causes the dfshares(1m)
 command to exit with a failure code which causes regression tests to
 fail. (See CR 6670499 for details.)  The smbfs-specific programs share
 and unshare are being included for completeness and conformity with
 autofs, cachefs, etc.

 This case corrects that oversight and adds the following commands:

        /usr/lib/fs/smbfs/dfshares

                dfshares_smbfs simply returns an exit code of 0

        /usr/lib/fs/smbfs/share

                share_smbfs prints "smbfs share is not supported"
                and returns an exit code of 1

        /usr/lib/fs/smbfs/unshare

                unshare_smbfs prints "smbfs unshare is not supported"
                and returns an exit code of 1
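
 The behavior of the three commands is small enough to sketch.  The
 function names below are hypothetical, and the output stream is left
 as a parameter because this case does not say whether the message
 goes to stdout or stderr.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical body for the dfshares helper: nothing to list, succeed. */
static int
ex_dfshares_smbfs(void)
{
        return (0);
}

/*
 * Hypothetical shared body for the share and unshare helpers: print
 * the "not supported" message for op and return the exit code 1.
 */
static int
ex_smbfs_not_supported(FILE *out, const char *op)
{
        assert(out != NULL && op != NULL);
        (void) fprintf(out, "smbfs %s is not supported\n", op);
        return (1);
}
```

 Each real command would simply return the corresponding value from
 main().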

 Just as there are no share_{fstype}.1m, unshare_{fstype}.1m, and
 dfshares_{fstype}.1m manual pages for autofs and cachefs, there are
 also no corresponding manual pages needed for smbfs.

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open




umountall -Z [PSARC/2008/765 FastTrack timeout 12/18/2008]

2008-12-11 Thread rich.br...@sun.com

I'm sponsoring this case for Pavel Filipensky to add the '-Z' option to
umountall(1M).  This case times out on 12/18/2008.

Micro/patch binding is requested for this case.


Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 umountall -Z
1.2. Name of Document Author/Supplier:
 Author:  Pavel Filipensky
1.3  Date of This Document:
11 December, 2008
4. Technical Description

This case proposes two changes:

1) It introduces a new command line option, -Z, to umountall(1M).  When
   the umountall(1M) command is run in the global zone, this option
   applies the unmounting action(s) only to the file systems mounted in
   non-global zones.  The use of the -Z option in non-global zones will
   have no effect.

2) The default behavior of umountall(1M) is changed to limit the
   unmounting action(s) to the current zone. 
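
The two changes combine into a single scoping rule.  A minimal sketch
of that rule, assuming zones are identified by small integers with 0
for the global zone (the function and macro names are hypothetical
and this is not the umountall implementation):

```c
#include <assert.h>

#define EX_GLOBAL_ZONEID        0       /* assumed ID of the global zone */

/*
 * Decide whether a file system mounted in mount_zone is in scope when
 * umountall runs in current_zone.  zflag is nonzero when -Z was given.
 */
static int
ex_in_umountall_scope(int current_zone, int mount_zone, int zflag)
{
        assert(current_zone >= 0 && mount_zone >= 0);
        /* -Z in the global zone: only non-global zone mounts. */
        if (zflag && current_zone == EX_GLOBAL_ZONEID)
                return (mount_zone != EX_GLOBAL_ZONEID);
        /* Default, and -Z in a non-global zone: current zone only. */
        return (mount_zone == current_zone);
}
```

Any other options (-F, -l, -r and so on) would then filter the file
systems that pass this test.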


Rationale for limiting the default scope to the current zone:

Currently, running umountall(1M) in the global zone unmounts file
systems from the global zone and from non-global zones as well.  This
causes the following bugs:

  6502014 NFS mounts in non-global zones are unmounted if NFS is restarted in the global zone
  6512906 Autofs mounts in non-global zones are unmounted when autofs is restarted in the global zone
  6777323 smb mounts in non-global zones are unmounted when smb/client is restarted in the global zone

Limiting the default scope of umountall(1M) to the current zone will
fix the bugs above.

Rationale for adding the new -Z option:

The -Z option will be used in the stop method of
svc:/system/zones:default.  This will take care of the case when we try
to stop zones and some of them fail to shut down.  It is better to try
to unmount the filesystems mounted in them to free resources on the
servers.

There are no side effects of using the -Z option on other suboptions to
umountall(1M).  Using -Z never changes the behaviour of other
suboptions; -Z only changes their scope.


The webrev for these changes is available here:
http://cr.opensolaris.org/~pavelf/6779275


Related CR:

  6779275 umountall(1M) -Z  ... limit unmounting action(s) to the non-global zones

EXPORTED INTERFACES

umountall(1M) option    Stability Level

-Z                      Committed

DOCUMENTATION IMPACT (See 6780521)

  manpage umountall(1M) changes:
1. a new -Z option
2. change in the default behavior


  Changes are as follows:

  SYNOPSIS
   mountall [-F FSType] [-l | -r] [file_system_table]

   umountall [-k] [-s] [-F FSType] [-l | -r] [-n]  [-Z]   +

   umountall [-k] [-s] [-h host] [-n] [-Z]+
  [...]
   umountall causes all mounted file  systems  in  the  current  +
   zone except root, /usr, /var, /var/adm, /var/run, /proc, and  +
   /dev/fd to be unmounted. If the FSType is  specified,  moun-
   tall  and umountall limit their actions to the FSType speci-
  [...]
   -s Do not perform the umount operation in parallel.

   -Z Apply the action(s)  only  to  the  file  systems  +
  mounted  in  non-global zones. By default, umoun-  +
  tall unmounts only file systems  mounted  in  the  +
  current  zone.  Has  no  effect if used in a non-  +
  global zone.   +

  FILES
  [...]   


6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open




VFSFT_SYSATTR_VIEWS [PSARC/2008/588 Self Review]

2008-09-17 Thread rich.br...@sun.com
I'm sponsoring this for Janice Chang.  This case adds a new VFS
Feature, VFSFT_SYSATTR_VIEWS, which is registered when a file system
supports the extended attribute files that describe extensible system
attributes (a.k.a. views).  This case is an extension of the
Extensible Attribute Interface (PSARC 2007/315) which was approved with
minor binding.

Since this case simply allows the (VFS) registration of an existing
interface, I'm filing this as Closed Approved Automatic.  If anyone
disagrees, let me know and I'll promote it to a fast track.


Template Version: @(#)sac_nextcase %I% %G% SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 VFSFT_SYSATTR_VIEWS
1.2. Name of Document Author/Supplier:
 Author:  Janice Chang
1.3  Date of This Document:
17 September, 2008
4. Technical Description

  PSARC 2007/315 (Extensible Attribute Interfaces) introduced a set of
  interfaces to retrieve and manipulate extensible system attributes on
  file objects. Extensible system attributes (also known as system
  attributes) were introduced specifically to support the CIFS service,
  which requires support for DOS attributes (PSARC 2006/715).

  One of the interfaces described by this case is an extensible vattr_t
  structure called xvattr_t. It is used with VOP_SETATTR()/VOP_GETATTR()
  to set/retrieve the new system attributes. File systems that support
  this interface communicate this to consumers by using the VFS Feature
  Registration facility (PSARC 2007/227) to register VFSFT_XVATTR.

  Another interface described by the Extensible Attribute Interfaces
  case is called a view. Each view exposes a group of system attributes
  as an extended attribute file whose name begins with SUNWattr_ (e.g.,
  SUNWattr_rw for modifiable attributes and SUNWattr_ro for
  attributes that cannot be modified). These views are used to accommodate
  existing extended attribute aware utilities.

  File systems that support modifiable system attributes use both the
  xvattr_t and views interface. However, some file systems (tmpfs, ufs)
  support only non-modifiable system attributes (e.g., FSID) which are
  exposed only through a read-only view (SUNWattr_ro).  Unfortunately,
  the VFSFT_XVATTR feature was set on these file systems to indicate
  support for system attributes. Overloading VFSFT_XVATTR meant that
  consumers would attempt to retrieve system attributes by using xvattr_t
  with VOP_GETATTR(), which would result in an error.

  In order to remedy this problem, a new VFS Feature is introduced--
  VFSFT_SYSATTR_VIEWS--which denotes support specifically for the view
  interface for extensible system attributes. All ON file systems that
  support views will be modified to register the VFSFT_SYSATTR_VIEWS
  feature and only those ON file systems that support the xvattr_t
  interface will register VFSFT_XVATTR.

  Consumers (in particular, the CIFS service) will be able to reliably
  determine which interface is available to manipulate the system
  attributes of a file.
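
  A consumer's decision can be sketched as a check of the two
  features in order.  The bit values and names below are hypothetical
  stand-ins for the registered VFS features and do not use the real
  feature registration interface.

```c
#include <assert.h>

/* Hypothetical feature bits standing in for the registered VFS features. */
#define EX_VFSFT_XVATTR         0x1u
#define EX_VFSFT_SYSATTR_VIEWS  0x2u

typedef enum ex_sysattr_access {
        EX_SYSATTR_NONE,        /* no system attribute support */
        EX_SYSATTR_VIEWS_ONLY,  /* only the SUNWattr_* view files */
        EX_SYSATTR_XVATTR       /* xvattr_t with VOP_GETATTR()/VOP_SETATTR() */
} ex_sysattr_access_t;

/* Pick the interface a consumer should use for a file system. */
static ex_sysattr_access_t
ex_pick_sysattr_access(unsigned int features)
{
        if (features & EX_VFSFT_XVATTR)
                return (EX_SYSATTR_XVATTR);
        if (features & EX_VFSFT_SYSATTR_VIEWS)
                return (EX_SYSATTR_VIEWS_ONLY);
        return (EX_SYSATTR_NONE);
}
```

  A file system such as tmpfs or ufs, which would register only the
  views feature, maps to EX_SYSATTR_VIEWS_ONLY, so the consumer never
  attempts the xvattr_t path that previously failed.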

  This change is Consolidation Private and will be communicated to the
  unbundled file system teams (internal and external).

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: Automatic
6.6. ARC Exposure: open




64 bit offsets for VOP_DUMP [PSARC/2008/053 FastTrack timeout 02/01/2008]

2008-01-25 Thread rich.br...@sun.com
I'm submitting this fast-track for Bob Mastors.
Requested binding is MINOR. Time-out is 1 Feb, 2008.


Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2008 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 64 bit offsets for VOP_DUMP
1.2. Name of Document Author/Supplier:
 Author:  Bob Mastors
1.3  Date of This Document:
25 January, 2008
4. Technical Description

INTRODUCTION

The vnode operation VOP_DUMP uses 32-bit values for block
addressing. This prevents system crash dumps and cpr statefiles
from being saved on very large devices via VOP_DUMP.

PROPOSED CHANGES
Change block offset and lengths to type offset_t for VOP_DUMP
and VOP_DUMPCTL.

Change corresponding functions in FEM, file event monitoring.

Change file system implementations of VOP_DUMP and VOP_DUMPCTL
to work with the new types.

Change callers of the above VOP/FEM functions as needed.

The VFSDEF_VERSION number in sys/vfs.h will be bumped from 4 to 5
in order to prevent unbundled file system kernel modules with the
old signatures from loading.  Once the unbundled file system
modules are updated with the new signatures and recompiled, they
will also pick up the new VFSDEF_VERSION number and be allowed to
load.

All of the ON file systems will be updated with these changes.

Unbundled file system developers (internal and external) will be
given a heads up about these changes. Rich Brown is coordinating a
TOI on all file system changes made in Solaris Nevada.

Note that Solaris Nevada now performs strong type-checking on
vnode/FEM operations.  This means that the compilers will inform
unbundled file system developers of the signature discrepancy
in their code.

DEVICE DRIVER INTERFACE
No changes are proposed to the device driver interface.

File system specific dump functions typically call
the DDI routine bdev_dump which has the following signature:
  int bdev_dump(dev_t dev, caddr_t addr, daddr_t blkno, int blkcnt)
bdev_dump calls the underlying device dump(9e) function
which has the same signature.

On the 64-bit Solaris kernel, daddr_t is 64 bits.
However, on the 32-bit Solaris kernel, daddr_t is a
32-bit value.

bdev_dump and device driver dump(9e) functions
have the following characteristics:

64-bit device drivers have 64-bit block addressing
and 32-bit block counts.

32-bit device drivers have 32-bit block addressing
and 32-bit block counts.

Current consumers of VOP_DUMP limit the transfer size
to a few megabytes at a time. They do not overflow
the 32-bit block count.

32-bit block addressing results in the following limitation.  A
file should not be used as the dump device or cpr statefile when
all of the following conditions occur:
a) the Solaris kernel is compiled 32-bit
b) the file system is UFS
c) the file system is larger than 1 TB
This limitation may also apply to unbundled file systems.
ZFS does not have this limitation because it does not support
files as dump devices or cpr statefiles.

The fop_dump() function will be changed to perform safety checks
to ensure the offset and length passed to VOP_DUMP can be passed
onto bdev_dump safely. fop_dump will return EIO if the values
cannot be passed safely.  These safety checks may be removed in
the future if the DDI dump(9e) signature is modified to support
64-bit addressing and 64-bit block counts on all architectures.
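
The kind of safety check described above can be sketched as a range
test on the two 64-bit values.  This is an illustration only: the
function name and the daddr_max parameter are hypothetical, and the
actual checks in fop_dump() may differ.  As a rough guide, and
assuming 512-byte blocks (the block size is not stated in this case),
a signed 32-bit block address reaches just under 2^31 blocks, which
is where the 1 TB limit comes from.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Check that a 64-bit block address and block count can be handed to
 * a dump routine whose block address is at most daddr_max and whose
 * block count is an int.  Returns 0 if safe, EIO otherwise.
 */
static int
ex_dump_args_ok(int64_t lbdn, int64_t dblks, int64_t daddr_max)
{
        assert(daddr_max > 0);
        if (lbdn < 0 || lbdn > daddr_max)
                return (EIO);
        if (dblks < 0 || dblks > INT_MAX)
                return (EIO);
        return (0);
}
```

Passing INT32_MAX as daddr_max models a 32-bit kernel and INT64_MAX
models a 64-bit kernel; the block count limit is the same in both.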

ALTERNATIVES TO OFFSET_T
The selection of offset_t for the type of the block address and
block count seemed consistent with usage in other VOP functions
and struct uio.
Also offset_t is a signed value, as are the types it is replacing.

Alternatives considered include the following:
typedef u_longlong_t    diskaddr_t;
typedef u_longlong_t    len_t;
typedef u_longlong_t    u_offset_t;
typedef uint64_t        paddr_t;

RELATED CASES
PSARC/2001/679 Vnode Interfaces
PSARC/2007/124 Strong Type-Checking for VFS Operations

RELATED CONTRACTS
 PSARC 2001/599 (FS related interfaces for SAM-QFS)
 PSARC 2004/177 (FS related interfaces for Sun Cluster)

RELATED CR
6214480 System crash dump fails when dump device is > 1 TB

DELIVERY
 These modifications are intended to be part of Solaris Nevada.

MODIFIED INTERFACES
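 +-------------+------------------+-----------------------------+
 |  Interface  |  Classification  |  Comments                   |
 +-------------+------------------+-----------------------------+
 |             | Contracted       |                             |
 | VOP_DUMP,   | Consolidation    | changed block addr and len  |
 | fop_dump    | Private          | to offset_t                 |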
 |   |  

Caller context flags [PSARC/2007/632 FastTrack timeout 11/09/2007]

2007-11-02 Thread rich.br...@sun.com
I'm sponsoring this fast-track for Jim Wahlig.  This case seeks Minor binding.


Template Version: @(#)sac_nextcase 1.64 07/13/07 SMI
This information is Copyright 2007 Sun Microsystems
1. Introduction
1.1. Project/Component Working Name:
 Caller context flags
1.2. Name of Document Author/Supplier:
 Author:  Jim Wahlig
1.3  Date of This Document:
02 November, 2007
4. Technical Description

 One of the uses of the caller_context structure is to pass information
 between the caller of a vnode operation (VOP) and a File Event Monitor
 (FEM).  It is used by both NFS and CIFS servers.

 Monitors often need to perform operations that would block the
 caller.  For example, an NFSv4 delegation monitor may need to
 perform an over-the-wire operation to recall a delegation.  The
 problem is that the caller may not be in a position to block and
 has no way to communicate that state to the monitor.

   Proposed Solution

 This case proposes to add a flags field (cc_flags) to the caller_context
 structure as well as values to communicate the behavior needed by the
 caller.

 The new caller context structure looks like this:
 typedef struct caller_context {
        pid_t           cc_pid;         /* Process ID of the caller */
        int             cc_sysid;       /* System ID, used for remote calls */
        u_longlong_t    cc_caller_id;   /* Identifier for (set of) caller(s) */
        uint64_t        cc_flags;       /* <-- NEW FLAG FIELD */
 } caller_context_t;

 The first two new flags to be defined:
 #define CC_WOULDBLOCK   0x1 /* set upon return by monitor */
 #define CC_DONTBLOCK    0x2 /* set by caller */

 The caller sets CC_DONTBLOCK in cc_flags to direct the monitor not to
 perform an operation that might block.  In the case where a monitor would
 perform a blocking operation and CC_DONTBLOCK is set, the monitor
 sets CC_WOULDBLOCK in the cc_flags and returns EAGAIN to the caller.

 The first consumer of this new field would be the NFS server.  The flags
 passed would inform the monitors on delegated files whether to wait for
 the delegation to be returned or just kick off the recall and return
 an error.  The NFS server will set CC_DONTBLOCK to inform the 
 delegation monitors not to wait for a delegation to be returned when
 there is a conflict.  Instead, the monitors will return EAGAIN and set
 the CC_WOULDBLOCK flag after issuing the delegation recall.


   Exported Interfaces

                  |                |
   Interface Name | Classification | Comments
   ================================================================
                  |                |
   CC_WOULDBLOCK  | consolidation  | set when returning EAGAIN to an op
                  | private        | that would have been blocked.
                  |                |
   CC_DONTBLOCK   |                | set by caller to indicate that ops
                  |                | should not block.

6. Resources and Schedule
6.4. Steering Committee requested information
6.4.1. Consolidation C-team Name:
ON
6.5. ARC review type: FastTrack
6.6. ARC Exposure: open