I'm sponsoring the following fast track for Afshin Salek and the CIFS
i-team.  It times out on Friday, July 17th.

A copy of the specification below appears in the case directory under
the name "specification".

I've pre-reviewed it and will give it a +1 up front.

                -- Glenn

----------------

Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
Copyright 2007 Sun Microsystems

1. Introduction
   1.1. Project/Component Working Name:
        Support for Reparse Points

   1.2. Name of Document Author/Supplier:
        Author: Afshin Salek

   1.3. Date of This Document:
        07/08/09
        
   1.4. Name of Major Document Customer(s)/Consumer(s):
        PSARC
        CIFS team

   1.5. Email Aliases:
        1.5.1. Responsible Manager: Barry.Greenberg at Sun.COM
        1.5.2. Responsible Engineer: Afshin.Ardakani at Sun.COM
        1.5.3. Marketing Manager:
        1.5.4. Interest List: cifs-team at sun.com

   A patch binding is requested for this change.        

4. Technical Description:
    4.1. Details:

       INTRODUCTION
          
         There are situations where a mechanism is needed to reflect
         the concept that data is not present at a particular path, but
         can be found in some alternate location(s).  Examples include
         "referrals" used to build unified name spaces in NFSv4.x and
         SMB, and data relocation in HSM systems.  A "reparse point" is
         defined as the marker for a namespace redirection and a
         container for the metadata to specify where the target of this
         redirection is.
          
         Reparse points are intended to be a general mechanism for
         location redirection and as such the file system that contains
         them is not cognizant of the reparse point format or content.
         Services that use reparse points know how to interpret and use
         the stored data.
          
       REPARSE POINT OBJECT
          
         After much discussion, the consensus is that the best way to
         represent reparse points in the file system, in order to
         minimize the effect on existing applications and utilities, is
         to use symbolic links.  One of the main goals in this context has
         been the ability to use existing utilities for backup/restore
         and also ZFS send/receive without having to modify them to
         know how to deal with reparse points.

         Some of what is envisioned here could be done with extensions
         to the Solaris automounter capability.  Part of the
         motivation, though, is to create centrally administered
         namespaces served by a group of fileservers to near-zero-admin
         clients.  It is expected to be easier to keep the namespaces
         uniform if only a small number of servers need to participate.
         HSM solutions would also normally be tied closely to a storage
         server by this mechanism.  Also, for both NFS and SMB
         referrals, it is the client that chooses the target and not
         the server.  The server only provides the targets' information
         and it is up to the client to pick the desirable target to
         access the data.

         To distinguish a regular symlink from a reparse point, an
         extensible system attribute will be set on the symlink.  This
         system attribute is only one bit which indicates whether or
         not a symlink contains reparse data.
          
         The reparse data will be stored as the link target.  The
         reparse data is not in file system path format, which is the
         typical format of a link target.  In order to avoid coming up
         with a totally new format for reparse data as the link target
         we decided to adopt the format used by magic links in BSD:
         (http://www.daemon-systems.org/man/symlink.7.html)
          
         @{REPARSE@{service-type1:data} [...@{service-type2:data}]...}
          
         Where some examples of service-type are:
       
         #define REPARSE_SVC_SMB        "SMB"
         #define REPARSE_SVC_NFS        "NFS"
         #define REPARSE_SVC_HSM        "HSM"
          
         The data for each service will be in string format, which is
         expected to be typically a UUID string.

         The pattern above starts with "REPARSE" to distinguish it from
         other magic links, such as those supported by BSD.  Note that
         this case is not a proposal to support BSD magic links; the
         intent is to avoid precluding the future addition of full
         BSD magic link support.
          
         Multiple service entries can co-exist within the symlink
         data.  It is expected that normally, all entries would resolve
         to the same logical location, e.g.  NFS and CIFS clients would
         find the same files.
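
         As an illustration only, the following user-space sketch pulls
         the data for one service type out of a reparse string.  It
         assumes entries take the form @{REPARSE@{svc:data}@{svc:data}}
         with no nesting and no '}' inside the data.  The helper name
         reparse_find_svc is hypothetical and is not an interface
         proposed by this case.

```c
#include <stdio.h>
#include <string.h>

#define REPARSE_TAG      "@{REPARSE"   /* leading tag described in the text */
#define REPARSE_SVC_SMB  "SMB"
#define REPARSE_SVC_NFS  "NFS"

/*
 * Hypothetical helper: copy the data for service type `svc` out of the
 * reparse string `s` into `buf`.  Returns 0 on success, or -1 if `s` is
 * not a reparse string, the service is absent, an entry is unterminated,
 * or `buf` is too small.  Assumes each entry looks like @{svc:data} and
 * that data never contains '}'.
 */
int
reparse_find_svc(const char *s, const char *svc, char *buf, size_t buflen)
{
    size_t svclen = strlen(svc);
    const char *p;

    if (strncmp(s, REPARSE_TAG, strlen(REPARSE_TAG)) != 0)
        return (-1);

    p = s + strlen(REPARSE_TAG);
    while ((p = strstr(p, "@{")) != NULL) {
        const char *name = p + 2;
        const char *end = strchr(name, '}');

        if (end == NULL)
            return (-1);        /* unterminated entry */

        if (strncmp(name, svc, svclen) == 0 && name[svclen] == ':') {
            const char *data = name + svclen + 1;
            size_t len = (size_t)(end - data);

            if (len >= buflen)
                return (-1);    /* caller's buffer is too small */
            memcpy(buf, data, len);
            buf[len] = '\0';
            return (0);
        }
        p = end + 1;            /* skip to the next entry */
    }
    return (-1);                /* no entry for this service */
}
```

         A caller would pass the string obtained from the symlink along
         with REPARSE_SVC_NFS or REPARSE_SVC_SMB and receive the opaque
         data string (typically a UUID) for that service.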
          
       BASIC INTERFACES
          
         There is a need for both userspace and kernel APIs to work
         with reparse points.
          
       Userspace API
          
         In userspace the symlink(2) system call will be used to set a
         reparse point.  The readlink(2) system call will be used in
         turn to read the reparse data.
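
         A minimal sketch of the userspace side, using only symlink(2)
         and readlink(2).  The helper names reparse_create and
         reparse_read, the single-entry layout @{REPARSE@{svc:data}},
         and the 1024-byte buffer are assumptions made for this
         example.  On a system without the kernel change described
         below, this simply creates an ordinary (dangling) symlink.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a symlink at `path` whose target is reparse data holding one
 * service entry.  Returns 0 on success, -1 on failure.  The buffer size
 * is arbitrary for this sketch.
 */
int
reparse_create(const char *path, const char *svc, const char *data)
{
    char target[1024];
    int n;

    n = snprintf(target, sizeof (target), "@{REPARSE@{%s:%s}}", svc, data);
    if (n < 0 || (size_t)n >= sizeof (target))
        return (-1);            /* formatting error or truncation */
    return (symlink(target, path));
}

/*
 * Read the reparse data back.  readlink(2) does not NUL-terminate the
 * result, so terminate it here.  Returns the number of bytes read, or
 * -1 on error.
 */
ssize_t
reparse_read(const char *path, char *buf, size_t buflen)
{
    ssize_t n;

    if (buflen == 0)
        return (-1);
    n = readlink(path, buf, buflen - 1);
    if (n >= 0)
        buf[n] = '\0';
    return (n);
}
```

         Because both calls are the standard symlink interfaces,
         existing backup and copy tools that preserve symlinks would
         carry the reparse data along unchanged.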
          
       Kernel API
          
         In the kernel, VOP_SYMLINK and VOP_READLINK will be used to
         set/get reparse data.
          
         These interfaces will support all replication, archive and
         copy operations to preserve reparse points without further
         changes.
          
         fop_symlink() needs to be modified to recognize the reparse
         @{REPARSE} tag and pass the appropriate attribute (i.e.
         reparse system attribute) to VOP_SYMLINK to be set on the
         symlink.
       
       IMPLEMENTATION OBSERVATIONS
          
         VFS feature registration can be used to determine whether or
         not a file system supports reparse points.
          
         Two things are needed to obtain the reparse point data in the
         kernel.  First, the consumer needs to know that a reparse
         point has been encountered and, second, it needs the vnode
         pointer to the symlink.  The proposal is to enhance VOP_LOOKUP
         to return the attributes of the looked up vnode.  This way
         when the vnode is available the caller can check the
         attributes to determine if the returned vnode is a reparse
         point or a regular symlink.  Here are the old and revised
         signatures of VOP_LOOKUP:

         int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
              pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
              caller_context_t *ct, int *deflags, pathname_t *ppnp)

         int VOP_LOOKUP(vnode_t *dvp, char *nm, vnode_t **vpp,
              pathname_t *pnp, int flags, vnode_t *rdir, cred_t *cr,
              caller_context_t *ct, int *deflags, pathname_t *ppnp,
              vattr_t *vap)
          
         A vattr_t pointer argument is added at the end to return the
         attributes if it is non-NULL.  This is an optimization so that
         consumers don't have to invoke an extra VOP_GETATTR after
         lookup for obtaining the attributes.
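
         The optional out-parameter pattern can be modelled in user
         space as follows.  This is not kernel code: model_node_t,
         model_attr_t and model_lookup are hypothetical stand-ins for
         vnode_t, vattr_t and VOP_LOOKUP, and the flat table stands in
         for a directory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the kernel vattr_t/vnode_t. */
typedef struct model_attr {
    int ma_is_symlink;
    int ma_is_reparse;      /* models the one-bit reparse attribute */
} model_attr_t;

typedef struct model_node {
    const char   *mn_name;
    model_attr_t  mn_attr;
} model_node_t;

/*
 * Look up `nm` in a flat table of `n` entries.  On success the node is
 * returned through `npp`.  If `attrp` is non-NULL the attributes are
 * copied out in the same call, so the caller needs no separate
 * "getattr" step.  Returns 0 on success, -1 if the name is not found.
 */
int
model_lookup(model_node_t *dir, size_t n, const char *nm,
    model_node_t **npp, model_attr_t *attrp)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(dir[i].mn_name, nm) == 0) {
            *npp = &dir[i];
            if (attrp != NULL)
                *attrp = dir[i].mn_attr;
            return (0);
        }
    }
    return (-1);
}
```

         A consumer that cares about reparse points passes a non-NULL
         attribute pointer and tests the reparse flag on return;
         callers that do not care pass NULL and pay nothing extra.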

         The symlink target size should be increased to 16K to
         accommodate the maximum size supported for MS-DFS referrals by
         Windows.  Applications are expected to query the PATH_MAX and
         SYMLINK_MAX values on the local system using
         pathconf(2)/fpathconf(2).  The value of SYMLINK_MAX would be
         changed to 16K on ZFS.  The value of PATH_MAX will not be
         affected.
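
         A sketch of how an application might check the limit at run
         time rather than hard-coding it.  The helper name reparse_fits
         is hypothetical.  Note that pathconf(2) returns -1 both on
         error and when the limit is indeterminate; the two cases are
         told apart by errno.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether a symlink target of `len` bytes
 * fits within SYMLINK_MAX for the file system containing `dir`.
 * Returns 1 if it fits or the limit is indeterminate, 0 if it is too
 * long, and -1 if pathconf(2) failed.
 */
int
reparse_fits(const char *dir, size_t len)
{
    long max;

    errno = 0;
    max = pathconf(dir, _PC_SYMLINK_MAX);
    if (max == -1)
        return (errno == 0 ? 1 : -1);   /* no fixed limit vs. real error */
    return (len <= (size_t)max ? 1 : 0);
}
```

         When the limit is indeterminate the sketch lets symlink(2)
         itself be the final arbiter, which would fail with
         ENAMETOOLONG if the target is too long.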
            
         To provide compatibility with other UNIXes (see section 6
         below), sharemgr(1M) would be enhanced to support a "refer"
         option for NFS exports.  This option would only result in
         creation of a reparse point at the specified path and does not
         actually share the path over NFS.
            
         This case is only about the underlying infrastructure.  A
         future case will be presented to deal with the details and
         specifics of handling referrals for the NFSv4 server.

       SECURITY CONSIDERATIONS
            
         Referrals are similar to regular symbolic links in that they
         are only pointers to data that could be discovered in some
         other way.  The presence of such a pointer does not compromise
         the security of the target object or data; the target service
         or file system must still enforce security.
            
       OPERATION FLOW
            
         Once a kernel service encounters a reparse point, it reads the
         data using VOP_READLINK and passes the data up to a user space
         daemon (e.g.  reparsed) along with its desired record type.
         Depending on the requested record type the daemon could simply
         extract the information from the passed data and return it to
         kernel or do any other processing necessary to obtain the
         actual referral information, e.g.  in the case of FedFS,
         contacting NSDB.  Going through a common user space daemon to
         get the referral data makes this process generic and easily
         expandable for possible future use cases.
            
         Referral extraction and creation by a userspace daemon can be
         handled via a library plugin architecture for different
         service types.
            
       Operation Flow Example
            
         Here is a simplified example of operation for a CIFS client
         that tries to access a file where the path contains a DFS
         link:
            
         a) Client tries to access \\srv\root\...\link\...\file.txt
            where:
               'root' is a share (namespace root)
               'link' is a reparse point seen as a folder by client
          
         b) CIFS server does a VOP_LOOKUP for 'link' and recognizes it
            as a reparse point by examining the attributes returned by
            VOP_LOOKUP.  At this point a STATUS_PATH_NOT_COVERED is
            returned to the client.
          
         c) Client sends a "link referral" request to the server.  CIFS
            server uses VOP_READLINK to get the 'link' data and sends
            the data to 'reparsed' daemon via a door call and gets back
            the DFS link targets in a format understandable by the CIFS
            client.  The targets are sent back to the client in
            response to its "link referral" request.
          
         d) Client picks one of the targets and contacts the target
            server to access 'file.txt'
          
       NFS REFERRAL IN OTHER UNIXES
            
         NFS referrals have been implemented in other major UNIX
         distributions such as Linux, AIX and HP-UX but there is no
         unified approach or implementation.

         Linux, AIX and HP-UX specify referrals as an NFS export
         option.  The option format is basically the same in all three
         operating systems (refer=path@host) but the presentation is
         somewhat different in each case:

         - In Linux a referral is presented as a mount point.
         - In HP-UX a referral is a file system partition or logical volume.
         - In AIX a special object is used to represent a referral.

         These are all mechanisms to trigger a change in namespace
         while resolving a path.
      
         This proposal is somewhat aligned with the AIX approach but
         does not require a new object type to be defined, which has
         the advantage of not impacting existing applications.  As
         mentioned previously, an NFS "refer" option will be supported
         to provide option format compatibility.
      
         Additionally, the Solaris requirements include support for
         both NFS and SMB referrals whereas these other operating
         systems only support NFS referrals, and they do not provide
         native SMB support.  For the Solaris operating system, this
         proposal provides a generic solution to support multiple,
         disparate referral mechanisms without placing restrictions on
         the format required by each mechanism.
    
         The following links provide more detail about each OS
         discussed above:
            
         http://www.citi.umich.edu/projects/nfsv4/linux/using-referrals.html
         http://nfsv4.bullopensource.org/doc/migration-and-replication-0.2.pdf
         http://docs.hp.com/en/5900-0306/ch01s11.html?jumpid=reg_R1002_USEN
         http://docs.hp.com/en/13578/nfsv4_whitepaper.pdf
         http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.commadmn/doc/commadmndita/nfs_referrals.htm
 

 INTERFACE TABLE

                          |Proposed       |Specified   |
                          |Stability      |in what     |
  Interface Name          |Classification |Document?   | Comments
  ===========================================================================
   XAT_REPARSE            |Consolidation  |This        |Reparse extensible
                          |Private        |Document    |attribute
                          |               |            |
   VOP_LOOKUP, fop_lookup |Contracted     |This        |Added new argument:
                          |Consolidation  |Document    |vattr_t *vap 
                          |Private*       |            |
                          |               |            |
   Reparse token syntax   |Committed      |This        |
                          |Private        |Document    |
                          |               |            |
   SYMLINK_MAX            |Committed      |This        |Increased to 16K
                          |               |Document    |

 * The project's deliverables will all go into the OS/NET
   Consolidation, so no contracts are required.

6. Resources and Schedule:

   6.4. Product Approval Committee requested information:
        6.4.1. Consolidation or Component Name:
               ON

   6.5. ARC review type:
        FastTrack

