> This bug, 6793488, has been created recently for a
> case I have open with Sun Support. This is a very
> severe case to me and my customer. I would like to
> get it fixed as quickly as possible. I was told the
> bug was dictating this case, which in turn, affects
> the priority of my case being resolved. If this
> affects others please add comments to this post. If
> an engineer can also help it would be greatly
> appreciated. Another bug similar to this one is
> 6571565. Here are the links:
> 
> http://bugs.opensolaris.org/view_bug.do?bug_id=6793488

Hey Josh, by accident I ran across your posting. Since I'm the engineer who
closed this bug as not a bug, I'd like to share the reasoning I put into the
bug's evaluation, which is not visible to you. I hope this helps you to better
understand the issue and the overall picture.

This is not a bug (tm), it is not a bug in lofs(7FS), not in ufs(7FS) 
and not in mv(1) either.

Rather, this is a set of expectations about how these 3 things work
together that does not reflect reality and does not match the actual
defined system behavior.

The key to all those misunderstandings is the use of mv(1) here.

Let's start with some trivia first; from this, it'll become obvious what happens.

So what is mv(1) supposed to do ? 

http://www.opengroup.org/onlinepubs/009695399/utilities/mv.html

<snip>
The mv utility shall perform actions equivalent to the rename() 
function defined in the System Interfaces volume of IEEE 
Std 1003.1-2001, called with the following arguments:

The source_file operand is used as the old argument.

The destination path is used as the new argument.

If the destination path exists, mv shall attempt to remove it. 
<snip end>

I.e. mv(1) is essentially a rename(2): a possibly existing target file is lost,
and the source file is renamed to the target name.
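
This can be checked from the shell without any loopback mount involved. The following is only a minimal sketch: the file names are made up for illustration, and it assumes mktemp(1) is available and that both files live on the same filesystem, so that mv(1) ends up in rename(2) rather than in a copy.

```shell
#!/bin/sh
# Sketch only: names are illustrative. Assumes both files are on the same
# filesystem, so mv(1) is carried out as a rename(2) and not as a copy.

inode_of() {
    # print the inode number of the file named by $1
    ls -i "$1" | awk '{ print $1 }'
}

dir=$(mktemp -d) || exit 1
echo user:pass1 > "$dir/cur"
echo user:pass2 > "$dir/new"

cur_before=$(inode_of "$dir/cur")
new_before=$(inode_of "$dir/new")

mv "$dir/new" "$dir/cur"

cur_after=$(inode_of "$dir/cur")
cur_content=$(cat "$dir/cur")

echo "cur before mv: inode $cur_before"
echo "new before mv: inode $new_before"
echo "cur after mv:  inode $cur_after, content $cur_content"

rm -rf "$dir"
```

After the mv, the name 'cur' carries the inode number that 'new' had before. The data was not copied into the old file; the name was re-pointed.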

So what is rename(2) supposed to do ?

http://www.opengroup.org/onlinepubs/009695399/functions/rename.html

<snip>
The rename() function shall change the name of a file. The old argument 
points to the pathname of the file to be renamed. 
The new argument points to the new pathname of the file.

If the link named by the new argument exists, it shall be
removed and old renamed to new. 
 
If the link named by the new argument exists and the file's link count
becomes 0 when it is removed and no process has the file open, the space
occupied by the file shall be freed and the file shall no longer
be accessible. If one or more processes have the file open when
the last link is removed, the link shall be removed before rename()
returns, but the  removal of the file contents shall be postponed 
until all references to the file are closed.
<snip end>
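
The link-count part of that text can be made visible with a second hard link. This is a sketch under assumptions: the name 'keep' is hypothetical, and the link count is taken from the second column of ls -ld. As long as another link (or an open reference) exists, replacing the name does not destroy the old file.

```shell
#!/bin/sh
# Sketch only: 'keep' is a hypothetical second name for the original file.

nlink_of() {
    # print the link count of the file named by $1 (2nd column of ls -l)
    ls -ld "$1" | awk '{ print $2 }'
}

dir=$(mktemp -d) || exit 1
echo user:pass1 > "$dir/cur"
echo user:pass2 > "$dir/new"
ln "$dir/cur" "$dir/keep"            # second link to the original file

links_before=$(nlink_of "$dir/keep")
mv "$dir/new" "$dir/cur"             # removes the link named 'cur'
links_after=$(nlink_of "$dir/keep")
keep_content=$(cat "$dir/keep")

echo "links to original before mv: $links_before"
echo "links to original after mv:  $links_after"
echo "content via 'keep':          $keep_content"

rm -rf "$dir"
```

The rename only removes one link to the original file. The file and its content survive under the remaining name.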

To picture this, here's the code flow from mv(1) down into ufs(7FS), leaving
lofs(7FS) aside for the moment as it is not relevant to the basic idea.

usr/src/cmd/mv/mv.c:cpymve()

    599         if (mve) {
    600                 if (rename(source, target) >= 0)
    601                         return (0);
    
usr/src/uts/common/syscall/rename.c:rename()

     57         if (error = vn_rename(from, to, UIO_USERSPACE))
     
usr/src/uts/common/fs/vnode.c:vn_rename()->vn_renameat()

  1678  error = VOP_RENAME(fromvp, fpn.pn_path, tovp, tpn.pn_path, CRED(),
  
usr/src/uts/common/fs/ufs/ufs_vnops.c:ufs_rename()

   3666          * Link source to the target.  If a target exists, return its
   3667          * vnode pointer in tvp.  We'll release it after sending the
   3668          * vnevent.
   
### in here, we rename the entry in the directory tdp so that it points to
### the source inode # instead of the target inode #,
### i.e. the target name in the namespace now points to the source inode #
### if it existed previously; the source name disappears from the namespace.
### This happens in ufs_dirrename().

   3670         if (error = ufs_direnter_lr(tdp, tnm, DE_RENAME, sdp, sip, cr, &tvp)) {
   3671                 /*
   3672                  * ESAME isn't really an error; it indicates that the
   3673                  * operation should not be done because the source and target
   3674                  * are the same file, but that no error should be reported.
   3675                  */
   3676                 if (error == ESAME)
   3677                         error = 0;
   3678                 goto errout;
   3679         }
[...]
   3682          * Unlink the source.
   3683          * Remove the source entry.  ufs_dirremove() checks that the entry
   3684          * still reflects sip, and returns an error if it doesn't.
   3685          * If the entry has changed just forget about it.  Release
   3686          * the source inode.
   3687          */
   3688         if ((error = ufs_dirremove(sdp, snm, sip, (struct vnode *)0,
   3689             DR_RENAME, cr, NULL)) == ENOENT)
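
The two steps in ufs_rename() ("Link source to the target", then "Unlink the source") can be imitated from user space with ln(1) and rm(1). This is only an analogy, not what the kernel does: ln -f removes the old target before creating the new link, so unlike rename(2) the sequence is not atomic. The file names are made up.

```shell
#!/bin/sh
# Sketch only: a user-space analogy of "link source to target, then unlink
# source". Not atomic, unlike rename(2). All names are illustrative.

inode_of() { ls -i "$1" | awk '{ print $1 }'; }
nlink_of() { ls -ld "$1" | awk '{ print $2 }'; }

dir=$(mktemp -d) || exit 1
echo user:pass1 > "$dir/cur"
echo user:pass2 > "$dir/new"
src_inode=$(inode_of "$dir/new")

ln -f "$dir/new" "$dir/cur"      # step 1: target name now refers to source inode
links_mid=$(nlink_of "$dir/cur") # both names point at the same inode here
rm "$dir/new"                    # step 2: remove the source name

dst_inode=$(inode_of "$dir/cur")
if [ -e "$dir/new" ]; then src_left=yes; else src_left=no; fi

echo "source inode: $src_inode, target inode afterwards: $dst_inode"
echo "links between the two steps: $links_mid, source name left: $src_left"

rm -rf "$dir"
```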

So what are the implications of this wrt. the behavior
complained about in this bug, i.e. the example from the description:

<snip>
Maybe the following example could make things a little clearer:

# echo user:pass1 > /etc/curpassword
# touch /etc/mirrorpasswd
# mount -F lofs -o ro /etc/curpassword /etc/mirrorpasswd
# cat /etc/mirrorpasswd
user:pass1
# echo user:pass2 > /etc/newpassword
# mv /etc/newpassword /etc/curpassword
# cat /etc/mirrorpasswd
user:pass1

Here /etc/curpassword is the password or shadow table provided by the
global zone, and /etc/mirrorpasswd would be the Read-Only view given
in a local zone.
Changing the content of /etc/curpassword with a rename
(here mv, but vipw does just the same) is not reflected in the 
/etc/mirrorpasswd view.
<snip end>

Let's redo this in a bit more detail:

1) create the 1st file:

opteron.root./export/home/batschul/test.=> echo user:pass1 > curpassword
opteron.root./export/home/batschul/test.=> ls -lisa
     29575    2 drwxr-xr-x   2 batschul other        512 Jan 23 13:03 .
         4   10 drwxr-xr-x  86 batschul other       5120 Jan 23 13:02 ..
     29577    2 -rw-r--r--   1 root     root          11 Jan 23 13:03 curpassword

   we've got inode number #29577 for the UFS file 'curpassword'
   
2) create the new file that becomes the lofs mount point 'mirrorpasswd':

opteron.root./export/home/batschul/test.=> touch mirrorpasswd
opteron.root./export/home/batschul/test.=> ls -lisa
     29575    2 drwxr-xr-x   2 batschul other        512 Jan 23 13:03 .
         4   10 drwxr-xr-x  86 batschul other       5120 Jan 23 13:02 ..
     29577    2 -rw-r--r--   1 root     root          11 Jan 23 13:03 curpassword
     29578    0 -rw-r--r--   1 root     root           0 Jan 23 13:03 mirrorpasswd
     
  for this, we've got inode number #29578 for the UFS file 'mirrorpasswd'
  
3) now perform the loopback mount of file 1) onto file 2)

opteron.root./export/home/batschul/test.=> mount -F lofs -o ro curpassword `pwd`/mirrorpasswd

opteron.root./export/home/batschul/test.=> mount -v|grep lofs
curpassword on /export/home/batschul/test/mirrorpasswd type lofs read-only/setuid/devices/dev=1980004 on Fri Jan 23 13:04:24 2009

opteron.root./export/home/batschul/test.=> ls -lisa
     29575    2 drwxr-xr-x   2 batschul other        512 Jan 23 13:03 .
         4   10 drwxr-xr-x  86 batschul other       5120 Jan 23 13:02 ..
     29577    2 -rw-r--r--   1 root     root          11 Jan 23 13:03 curpassword
     29578    2 -rw-r--r--   1 root     root          11 Jan 23 13:03 mirrorpasswd
     
4) verify that our loopback mount works:

opteron.root./export/home/batschul/test.=> cat mirrorpasswd
user:pass1

5) create the 2nd new file 'newpassword'

opteron.root./export/home/batschul/test.=> echo user:pass2 > newpassword
opteron.root./export/home/batschul/test.=> ls -lisa
     29575    2 drwxr-xr-x   2 batschul other        512 Jan 23 13:06 .
         4   10 drwxr-xr-x  86 batschul other       5120 Jan 23 13:02 ..
     29577    2 -rw-r--r--   1 root     root          11 Jan 23 13:03 curpassword
     29578    2 -rw-r--r--   1 root     root          11 Jan 23 13:03 mirrorpasswd
     29579    2 -rw-r--r--   1 root     root          11 Jan 23 13:06 newpassword
     
   for this we've got inode number #29579 for UFS file 'newpassword'
   
6) now move the 2nd file 'newpassword' over the original 1st file 'curpassword'

opteron.root./export/home/batschul/test.=> mv newpassword curpassword
opteron.root./export/home/batschul/test.=> ls -lisa
     29575    2 drwxr-xr-x   2 batschul other        512 Jan 23 13:06 .
         4   10 drwxr-xr-x  86 batschul other       5120 Jan 23 13:02 ..
     29579    2 -rw-r--r--   1 root     root          11 Jan 23 13:06 curpassword
     29578    2 -rw-r--r--   0 root     root          11 Jan 23 13:03 mirrorpasswd
     
     see what happened ? the original 1st 'curpassword' file 
     with inode #29577 is gone, the 2nd file 'newpassword'
     with inode #29579 has been renamed to 'curpassword'
     
7) now check the content of our file, both directly and via the loopback mount:

opteron.root./export/home/batschul/test.=> cat mirrorpasswd
user:pass1

opteron.root./export/home/batschul/test.=> cat curpassword
user:pass2

Bummer, they are different! Now what happened?
Quite easy; remember the preliminaries about mv(1) and rename(2).

Access via the loopback mount 'mirrorpasswd' still shows us the old
content from before the mv(1). That file has been deleted from the
namespace as part of the mv(1), but there's still a VN_HOLD on its
vnode, i.e. file 1) is still open, namely via the loopback mount.
Hence its content is still accessible via the loopback mount:
while file 1) has been removed from the namespace,
its content is not yet deleted.

The renamed file 2), however, now exists in the namespace under
its new name 'curpassword', which was the former name
of file 1); yet it is a different file with different
content and is not associated with a loopback mount at all.

To picture this from the kernel point of view _after_
the mv(1) has happened in 6)

8) check the dnlc for the 'curpassword' entry:

> ::dnlc!grep curpassword
VP               DVP              NAME
ffffff01b1b67e80 ffffff01b5b9aa00 curpassword

9) the corresponding UFS vnode

> ffffff01b1b67e80::print vnode_t
{
    v_lock = {
        _opaque = [ 0 ]
    }
    v_flag = 0x10000
    v_count = 0x1
    v_data = 0xffffff01b1b68de8 ### UFS inode #29579 from 5)
    v_vfsp = 0xffffff01ac655aa0 ### UFS /export/home
    v_stream = 0
    v_type = 1 (VREG)
    v_rdev = 0xffffffffffffffff
    v_vfsmountedhere = 0
    v_op = 0xffffff01a8ec4780
    v_pages = 0xffffff0004f0e2e0
    v_filocks = 0
    v_shrlocks = 0
    v_nbllock = {
        _opaque = [ 0 ]
    }
    v_cv = {
        _opaque = 0
    }
    v_locality = 0
    v_femhead = 0
    v_path = 0xffffff01bb97f1e8 "/export/home/batschul/test/newpassword"
    (that's because rename on UFS does not update v_path)
    v_rdcnt = 0
    v_wrcnt = 0
    v_mmap_read = 0
    v_mmap_write = 0
    v_mpssdata = 0
    v_fopdata = 0
    v_vsd = 0
    v_xattrdir = 0
    v_count_dnlc = 0x1
}

10) verify the inode number and link count:

> ffffff01b1b68de8::inode -v
ADDR                INUMBER T  MODE     SIZE      DEVICE FLAG
ffffff01b1b68de8      29579 -  0644        b  6600000004 <REF>
  2009 Jan 23 13:06:03
  /export/home/batschul/test/newpassword
  
> ffffff01b1b68de8::print inode_t i_ic
{
    i_ic.ic_smode = 0x81a4
    i_ic.ic_nlink = 0x1
    
11) check the dnlc for the 'mirrorpasswd' lofs mount point:

> ::dnlc!grep mirrorpasswd
ffffff01b5ba6140 ffffff01b5b9aa00 mirrorpasswd

12) the corresponding UFS vnode acting as the lofs mount
    point 'mirrorpasswd'
   
> ffffff01b5ba6140::print vnode_t
{
    v_lock = {
        _opaque = [ 0 ]
    }
    v_flag = 0x10100
    v_count = 0x2
    v_data = 0xffffff01b5ba59c8 ### UFS inode #29578 from 2)
    v_vfsp = 0xffffff01ac655aa0 ### UFS /export/home
    v_stream = 0
    v_type = 1 (VREG)
    v_rdev = 0xffffffffffffffff
    v_vfsmountedhere = 0xffffff01ae0b2908  ### lofs mount
    v_op = 0xffffff01a8ec4780
    v_pages = 0
    v_filocks = 0
    v_shrlocks = 0
    v_nbllock = {
        _opaque = [ 0 ]
    }
    v_cv = {
        _opaque = 0
    }
    v_locality = 0
    v_femhead = 0
    v_path = 0xffffff01bc8661e0 "/export/home/batschul/test/mirrorpasswd"
    v_rdcnt = 0
    v_wrcnt = 0
    v_mmap_read = 0
    v_mmap_write = 0
    v_mpssdata = 0
    v_fopdata = 0
    v_vsd = 0
    v_xattrdir = 0
    v_count_dnlc = 0x1
}

13) verify the inode number and link count:

> 0xffffff01b5ba59c8::inode -v
ADDR                INUMBER T  MODE     SIZE      DEVICE FLAG
ffffff01b5ba59c8      29578 -  0644        0  6600000004 <MODTIME,REF>
  2009 Jan 23 13:03:33
  /export/home/batschul/test/mirrorpasswd

the VFS mounted here is the lofs loopback mount
/export/home/batschul/test/mirrorpasswd

14) the lofs mount's VFS:

> 0xffffff01ae0b2908::print vfs_t
{
    vfs_next = root
    vfs_prev = 0xffffff01ae0b29d8
    vfs_op = vfssw+0x538
    vfs_vnodecovered = 0xffffff01b5ba6140 
    ### covered vnode is UFS inode #29578 from 2), our "mountpoint"
    vfs_flag = 0x2401
    vfs_bsize = 0x2000
    vfs_fstype = 0xa
    vfs_fsid = {
        val = [ 0x1980004, 0x2 ]
    }
    vfs_data = 0xffffff01b7659e30  ### loinfo struct
    vfs_dev = 0x6600000004
    vfs_bcount = 0
    vfs_list = 0
    vfs_hash = 0
    vfs_reflock = {
        _opaque = [ 0, 0 ]
    }
    vfs_count = 0x1
    vfs_mntopts = {
        mo_count = 0x11
        mo_list = 0xffffff01adeba000
    }
    vfs_resource = 0xffffff01ad1ecb08
    vfs_mntpt = 0xffffff01aadd3d80
    vfs_mtime = 2009 Jan 23 13:04:24
    vfs_implp = 0xffffff01bfc95500
    vfs_zone = zone0
    vfs_zone_next = root
    vfs_zone_prev = 0xffffff01ae0b29d8
    vfs_femhead = 0
    vfs_lofi_minor = 0
}

15) grab the corresponding lofs loinfo struct:

> 0xffffff01b7659e30::print 'struct loinfo'
{
    li_realvfs = 0xffffff01ac655aa0 ### UFS /export/home
    li_mountvfs = 0xffffff01ae0b2908 ### LOFS /export/home/batschul/test/mirrorpasswd
    li_rootvp = 0xffffff01b6ebe880
    li_mflag = 0x1
    li_dflag = 0
    li_refct = 0x1
    li_htsize = 0x1
    li_hashtable = 0xffffff01b155d8c0
    li_lfs = 0
    li_lfslock = {
        _opaque = [ 0 ]
    }
    li_htlock = {
        _opaque = [ 0 ]
    }
    li_retired = 0
    li_flag = 0
}

16) look at the root lofs vnode shadowing our UFS mountpoint 'mirrorpasswd':

> 0xffffff01b6ebe880::print vnode_t
{
    v_lock = {
        _opaque = [ 0 ]
    }
    v_flag = 0x1
    v_count = 0x1
    v_data = 0xffffff01ab298f68 ### lofs lnode
    v_vfsp = 0xffffff01ae0b2908 ### LOFS mount /export/home/batschul/test/mirrorpasswd
    v_stream = 0
    v_type = 1 (VREG)
    v_rdev = 0xffffffffffffffff
    v_vfsmountedhere = 0
    v_op = 0xffffff01aa208340
    v_pages = 0
    v_filocks = 0
    v_shrlocks = 0
    v_nbllock = {
        _opaque = [ 0 ]
    }
    v_cv = {
        _opaque = 0
    }
    v_locality = 0
    v_femhead = 0
    v_path = 0xffffff01bc8d1810 "/export/home/batschul/test/mirrorpasswd"
    v_rdcnt = 0
    v_wrcnt = 0
    v_mmap_read = 0
    v_mmap_write = 0
    v_mpssdata = 0
    v_fopdata = 0
    v_vsd = 0
    v_xattrdir = 0
    v_count_dnlc = 0
}

17) the corresponding lofs lnode still shadows our original
    UFS inode/vnode from step 1) !!!

> 0xffffff01ab298f68::print lnode_t
{
    lo_next = 0
    lo_vp = 0xffffff01b395ee00  ### original, UFS inode #29577 from 1) !!!
    lo_looping = 0
    lo_vnode = 0xffffff01b6ebe880  ### our loinfo->li_rootvp
}

/*
 * The lnode is the "inode" for loop-back files.  It contains
 * all the information necessary to handle loop-back file on the
 * client side.
 */
typedef struct lnode {
        struct lnode    *lo_next;       /* link for hash chain */
        struct vnode    *lo_vp;         /* pointer to real vnode */
        uint_t          lo_looping;     /* looping flags (see below) */
        struct vnode    *lo_vnode;      /* place holder vnode for file */
} lnode_t;

lo_vp, the real vnode, points to the deleted initial, original
UFS inode #29577 from 1)

18) verify that the original UFS inode/vnode from step 1) is still
    alive:

> 0xffffff01b395ee00::print vnode_t
{
    v_lock = {
        _opaque = [ 0 ]
    }
    v_flag = 0x10000
    v_count = 0x1  ### only 1 VN_HOLD left, from LOFS
    v_data = 0xffffff01b3960678
    v_vfsp = 0xffffff01ac655aa0  ### UFS /export/home
    v_stream = 0
    v_type = 1 (VREG)
    v_rdev = 0xffffffffffffffff
    v_vfsmountedhere = 0
    v_op = 0xffffff01a8ec4780
    v_pages = 0xffffff0005649898
    v_filocks = 0
    v_shrlocks = 0
    v_nbllock = {
        _opaque = [ 0 ]
    }
    v_cv = {
        _opaque = 0
    }
    v_locality = 0
    v_femhead = 0
    v_path = 0xffffff01bb5c55b8 "/export/home/batschul/test/curpassword"
    v_rdcnt = 0
    v_wrcnt = 0
    v_mmap_read = 0
    v_mmap_write = 0
    v_mpssdata = 0
    v_fopdata = 0
    v_vsd = 0
    v_xattrdir = 0
    v_count_dnlc = 0
}

> 0xffffff01b3960678::inode -v
ADDR                INUMBER T  MODE     SIZE      DEVICE FLAG
ffffff01b3960678      29577 -  0644        b  6600000004 <REF>
  2009 Jan 23 13:03:22
  /export/home/batschul/test/curpassword

> ffffff01b3960678::print inode_t i_ic
{
    i_ic.ic_smode = 0x81a4
    i_ic.ic_nlink = 0  ### deleted inode !
    
19) So what happens after we unmount the loopback mount ?

opteron.root./export/home/batschul/test.=> umount /export/home/batschul/test/mirrorpasswd

opteron.root./export/home/batschul/test.=> ls -lisa
     29575    2 drwxr-xr-x   2 batschul other        512 Jan 23 13:06 .
         4   10 drwxr-xr-x  87 batschul other       5120 Jan 23 17:35 ..
     29579    2 -rw-r--r--   1 root     root          11 Jan 23 13:06 curpassword
     29578    0 -rw-r--r--   1 root     root           0 Jan 23 13:03 mirrorpasswd

Correct, the mount point is again an empty file.

So every layer did what it was asked to do, including lofs,
which still shadows the original file created in 1), which is all it
was asked to do.

So the business summary of all this is: replacing the original
target of a loopback mount with a new file via mv(1) or rename(2),
and assuming that the loopback mount somehow magically shadows the new
replacement, is a wrong assumption.
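
For contrast, here is a sketch of what keeps the inode: writing the new content into the existing file instead of renaming a new file over it. This was not tested against lofs. The open file descriptor below merely stands in for a holder of the original vnode, and the file names are made up. Note also that an in-place overwrite is not atomic, so a reader can catch the file half written; that is a trade-off, not a recommendation.

```shell
#!/bin/sh
# Sketch only: fd 7 stands in for something that holds the original file.
# Overwriting in place keeps the inode; it is NOT atomic like rename(2).

inode_of() { ls -i "$1" | awk '{ print $1 }'; }

dir=$(mktemp -d) || exit 1
echo user:pass1 > "$dir/cur"
echo user:pass2 > "$dir/new"

exec 7< "$dir/cur"               # hold the original file open
cur_before=$(inode_of "$dir/cur")

cat "$dir/new" > "$dir/cur"      # truncate and rewrite the same inode
rm "$dir/new"

cur_after=$(inode_of "$dir/cur")
seen_via_fd=$(cat <&7)           # read through the descriptor opened earlier
exec 7<&-

echo "inode before: $cur_before, inode after: $cur_after"
echo "holder of the original file sees: $seen_via_fd"

rm -rf "$dir"
```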

This is, btw, no different from doing the same replacement operation
on a file that some process still has open: that process will always see
the original file until it closes it. lofs does not change anything in
the picture here. There's no magic refresh operation inside lofs.
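
The open-file case is easy to try in a shell. A minimal sketch, with made-up file names and an arbitrarily chosen descriptor number:

```shell
#!/bin/sh
# Sketch only: a descriptor opened before the mv(1) keeps referring to the
# original file, just like any process that still has the file open.

dir=$(mktemp -d) || exit 1
echo user:pass1 > "$dir/cur"
echo user:pass2 > "$dir/new"

exec 7< "$dir/cur"               # open the original file on fd 7
mv "$dir/new" "$dir/cur"         # replace the name via rename(2)

via_fd=$(cat <&7)                # content of the original, now nameless file
via_name=$(cat "$dir/cur")       # content of the file the name refers to now
exec 7<&-                        # last reference gone, original is freed

echo "via the open descriptor: $via_fd"
echo "via the name:            $via_name"

rm -rf "$dir"
```

The descriptor plays the role that the lofs lnode's lo_vp plays in the walkthrough above: it pins the original file until it is released.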

cheers
frankB
-- 
This message posted from opensolaris.org
