Re: Testing if two files are on the same file system

2017-10-26 Thread Joerg Schilling
Stephane Chazelas  wrote:

> On FreeBSD, gnustat -fc%i returns 0 for everything including UFS
> file systems. On Solaris, one seems to get non zero values even
> for /proc

If this number is used to generate an NFS file handle for NFS export, then the 
NFS server would be unusable.

Jörg

-- 
 EMail:jo...@schily.net(home) Jörg Schilling D-13353 Berlin
joerg.schill...@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/'



Re: Testing if two files are on the same file system

2017-10-26 Thread Joerg Schilling
Garrett Wollman  wrote:

> Actually, it is quite well known (just apparently not by the author of
> that document).  The fsid is used to generate NFS file handles.  It's
> supposed to be random and unique per file server.  (I think also it
> was supposed to be hard to guess as well, but the original NFS protocol
> didn't provide the sort of security guarantees that would make that
> property useful.)

The filesystem ID is not supposed to be random but just unique for each 
filesystem on a machine.

It cannot be random as it needs to survive a reboot.

An NFS filehandle as a whole is indeed de-facto random. It is the combination of 
the filesystem ID, the inode number and the file generation number, which is 
incremented each time the file association is changed, e.g. by an unlink()
operation.

BTW: to make the NFS file handle "random", the initial file generation number 
is randomized, to prevent people from being able to guess an NFS file handle.

Jörg




Re: Testing if two files are on the same file system

2017-10-25 Thread Stephane Chazelas
2017-10-25 12:01:04 +0200, Martijn Dekker:
[...]
> > I'm not sure what the output of "LC_ALL=C df -P" should be for a
> > mount point like a UTF-8 /home/stéphane
> 
> That's a good point. I had actually added LC_ALL=C to my 'df'
> invocation, more out of habit than anything. I should remove it.

I'd say in practice, LC_ALL=C is more likely to help than be a
hindrance: in the C locale, I'd expect you'd get a straight
byte dump of the value of the mount source/target.

> 
> > symlinks and device files
> > #########################
> > 
> > Depending on the stat implementation, for symlinks, you'll get
> > the st_dev of the symlink or the file it points to, while
> > AFAICT, df gives you the information for the target of the
> > symlink
> 
> In modernish I deal with that:
> 
> - for 'is onsamefs', by stripping the last element from each given path
> if it's not a directory;

Note that there's nothing stopping a non-directory file from
being a mount-point. It's not unheard of to bind-mount FIFOs
into areas intended to be used as chroots for instance.

Or bind-mounting a file to make it readonly and not
renamable/removable on fs that don't support immutable
attributes.

> - for 'is -L onsamefs' (-L = follow symlinks), by stripping the last
> element from each given path if it's not a directory or a non-broken
> symlink.
> 
> An extra complication for 'is -L onsamefs' is that 'df -P' on Mac and
> BSD does not like device files, e.g. 'df -P /dev/null' gives an error:
> "Raw devices not supported". Is that situation POSIX compliant? I
> suspect not, but in any case I have to deal with it.

Yes, POSIX leaves it unspecified for a device file that doesn't
contain a mounted file system.

> It's fine for 'is onsamefs /dev/null /foo/bar' as the final /null is
> stripped anyway, but for 'is -L on samefs symlink_to_dev_null /foo/bar',
> a valid symlink to a device still gives the error.

The behaviour of df on symlinks is unspecified AFAICT

-- 
Stephane



Re: Testing if two files are on the same file system

2017-10-25 Thread Stephane Chazelas
2017-10-25 10:11:35 +0200, Vincent Lefevre:
> On 2017-10-24 21:11:45 +0200, Joerg Schilling wrote:
> > Nick Stoughton  wrote:
> > > > If you are correct, this is a Linux kernel bug.
> > >
> > > Why? The stat command is not standardized. The --file-system argument
> > > changes the output of %i to something other than the inode ... no bug.
> > 
> > Whether it is standardized or not, it prints st_dev and if this is 0,
> > the underlying OS is not OK.
> 
> No, st_dev is correct (not 0). So, that would be a GNU stat bug.
[...]

As discussed elsewhere in this thread,

gnustat -fc %i

reports the statfs().f_fsid, not the stat().st_dev.

You need gnustat -c %D for stat().st_dev

Both Linux (for returning a 0 f_fsid for a fs that won't survive
a reboot, and anyway the f_fsid semantic is not specified and
(probably) not used on Linux) and GNU stat (for correctly
reporting the f_fsid returned by statfs()) are OK.
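
To see the difference between the two values side by side, something
like the following sketch could be used. It assumes GNU stat is
installed under the name "stat" (on other systems it may be "gstat" or
"gnustat"); show_ids is a made-up helper name, not part of any tool.

```shell
# Assumption: "stat" is GNU stat. show_ids is a hypothetical helper.
show_ids() {
    # statfs().f_fsid in hex; may legitimately be 0 depending on the OS,
    # the file system type and the caller's privileges.
    stat -f -c 'f_fsid=%i' -- "$1" || return
    # stat().st_dev in hex; this is the value to compare between files.
    stat -c 'st_dev=%D' -- "$1"
}
```

Calling "show_ids /tmp" prints one f_fsid line and one st_dev line;
only the second is meaningful for comparing file systems.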

-- 
Stephane



Re: Testing if two files are on the same file system

2017-10-25 Thread Robert Elz
Date:        Wed, 25 Oct 2017 11:17:56 +0200
From:        Martijn Dekker
Message-ID:  

  | if is onsamefs "$file1" "$file2"; then
  |     ln "$file2" "$file1"    # hard links are possible
  | else
  |     ln -s "$file2" "$file1"
  | fi

That's just creating a race condition.   Much better to just do (attempt)
the ln and see what happens.
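
A minimal sketch of that "just attempt it" approach, assuming only POSIX
ln. The function name link_to is hypothetical, and discarding stderr
hides why the hard link failed, so treat this as an illustration only.

```shell
# Hypothetical helper: link_to TARGET LINKNAME
# Try a hard link first; if that fails for any reason (different file
# system, permissions, ...), fall back to a symbolic link.
# Note: a relative TARGET in a symlink is resolved relative to the
# directory of LINKNAME, so absolute paths are the safer choice here.
link_to() {
    ln -- "$1" "$2" 2>/dev/null || ln -s -- "$1" "$2"
}
```

Because no test is made beforehand, there is no window between the test
and the ln in which the situation can change.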

kre



Re: Testing if two files are on the same file system

2017-10-25 Thread Martijn Dekker
Op 23-10-17 om 17:36 schreef Vincent Lefevre:
> The initial question was actually not clear. First, you should define
> what a file system is. If this is not what is identified by st_dev[*],
> what is it?

That's a good point. What I mean is an instance of a file system on a
particular device or partition, mounted on a particular mount point.

A concrete working definition is: if I can create a hard link from a
file on one directory into another directory, those directories are on
the same file system. I'd like to be able to do something like:

if is onsamefs "$file1" "$file2"; then
    ln "$file2" "$file1"    # hard links are possible
else
    ln -s "$file2" "$file1"
fi

I found that, on Linux, when a file system is mounted twice on different
mount points using mount --bind, 'ln' will refuse to create a hard link
between the different mount points, even though they are on the same
device (with the same device ID). So including the mount point in the
definition turns out to be essential. It looks like neither GNU 'stat'
nor BSD 'stat' will allow this.

(I know in the concrete example above I could just do
ln "$file2" "$file1" 2>/dev/null || ln -s "$file2" "$file1"
but there are other use cases. Besides, doing a proper test first is
actually better, as you can distinguish between failure modes.)

Thanks,

- M.



Re: Testing if two files are on the same file system

2017-10-25 Thread Robert Elz
Date:        Wed, 18 Oct 2017 15:01:23 +0200
From:        Martijn Dekker
Message-ID:  <4d781f3b-085c-d2ca-1912-b08410266...@inlv.org>

  | Is there a way, using POSIX shell and utilities, to reliably test if two
  | files are on the same file system?

I have been ignoring this question, as it didn't seem important enough
to bother with, but it has generated a lot of traffic, so ...

  | It seems like a basic feature that the shell should have access to,

Why?

What purpose can possibly be served by that information, beyond
random curiosity?

I can see a use for knowing if two (different) pathnames represent the
same file, but who cares which filesystem the file is housed on?
That is, that anyone would want to use in a script?

Seems like it is all a waste of effort to me.

kre



Re: Testing if two files are on the same file system

2017-10-25 Thread Vincent Lefevre
On 2017-10-24 21:11:45 +0200, Joerg Schilling wrote:
> Nick Stoughton  wrote:
> > > If you are correct, this is a Linux kernel bug.
> >
> > Why? The stat command is not standardized. The --file-system argument
> > changes the output of %i to something other than the inode ... no bug.
> 
> Whether it is standardized or not, it prints st_dev and if this is 0,
> the underlying OS is not OK.

No, st_dev is correct (not 0). So, that would be a GNU stat bug.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Re: Testing if two files are on the same file system

2017-10-25 Thread Garrett Wollman
On October 25, 2017 1:39:01 AM EDT, Stephane Chazelas 
 wrote:

>On FreeBSD, gnustat -fc%i returns 0 for everything including UFS
>file systems. On Solaris, one seems to get non zero values even
>for /proc

To find the actual fsid as used in file handles requires appropriate privilege. 
(This is because being able to generate valid file handles as an unprivileged 
user is an access control bypass.) Filesystems that don't implement 
VFSOP_FHTOVP() don't need to implement fsids, either.

>In any case, it doesn't look like it's used for encoding
>filehandles in nfs on Linux (at least not the in-kernel server

Pretty sure it's an API artifact of the original SunOS implementation of NFS, 
faithfully preserved in Rick Macklem's 4.4BSD implementation because "that's 
how it was documented to work".

So, relevant to the original question: don't bother trying to use fsids for 
this purpose. Just use st_dev/st_ino. (Filesystems exist that break the latter, 
but they are typically nonconforming in other ways too.)

-GAWollman




Re: Testing if two files are on the same file system

2017-10-24 Thread Stephane Chazelas
2017-10-25 06:39:01 +0100, Stephane Chazelas:
[...]
> On FreeBSD, gnustat -fc%i returns 0 for everything including UFS
> file systems.
[...]

Sorry, only if you're not root as hinted in the Linux statfs man
page. You do get a non-zero value as root including for devfs
and tmpfs on FreeBSD.

But the end result is still that Martijn cannot use gnustat
-fc%i there either. Again, best to use st_dev (stat -f%d, zstat
+dev or gnustat -c%D, -L to get the target of symlinks
(reversed for zstat))
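
One way to paper over the GNU/BSD difference, as a sketch only: try the
GNU syntax first and fall back to the BSD one. It assumes the system's
"stat" is one of those two implementations; st_dev_of is a hypothetical
name, and zsh's zstat is not covered.

```shell
# Hypothetical helper: print st_dev (decimal) of the file a path
# resolves to, following symlinks (-L means "follow" in both GNU and
# BSD stat).
st_dev_of() {
    # GNU syntax first; BSD stat rejects -c, so we fall through to -f.
    # On GNU systems the fallback only runs when the file cannot be
    # stat'ed, in which case it fails as well.
    stat -L -c %d -- "$1" 2>/dev/null || stat -L -f %d -- "$1"
}
```

Two files can then be compared with
[ "$(st_dev_of "$f1")" = "$(st_dev_of "$f2")" ], at the cost of one
fork per file.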

-- 
Stephane



Re: Testing if two files are on the same file system

2017-10-24 Thread Stephane Chazelas
2017-10-24 16:01:22 -0400, Garrett Wollman:
[...]
> >> Nobody knows what f_fsid is supposed to contain (but see below).
> 
> Actually, it is quite well known (just apparently not by the author of
> that document).  The fsid is used to generate NFS file handles.  It's
> supposed to be random and unique per file server.  (I think also it
> was supposed to be hard to guess as well, but the original NFS protocol
> didn't provide the sort of security guarantees that would make that
> property useful.)
[...]

Thanks.

So I suppose the difference with st_dev is that f_fsid is
supposed to be consistent across reboots, which probably
explains why it's 0 for FS like tmpfs, proc, sys, fuse... as
they don't survive reboots or at least Linux can't guarantee
you'll get the same inode for a given file on the next reboot
anyway.

On FreeBSD, gnustat -fc%i returns 0 for everything including UFS
file systems. On Solaris, one seems to get non zero values even
for /proc

In any case, it doesn't look like it's used for encoding
filehandles in nfs on Linux (at least not the in-kernel server
implementation, maybe that's used by userspace nfs server
implementations or other things that need unique IDs for file
systems).

-- 
Stephane



Re: Testing if two files are on the same file system

2017-10-24 Thread Garrett Wollman
Stephane Chazelas said:

> 2017-10-24 09:06:19 -0700, Nick Stoughton:
>> > If you are correct, this is a Linux kernel bug.
>> 
>> Why? The stat command is not standardized. The --file-system argument
>> changes the output of %i to something other than the inode ... no bug.
> [...]

> Actually, the Linux man page for statfs(2)
> (http://man7.org/linux/man-pages/man2/statfs.2.html) says:

>> Nobody knows what f_fsid is supposed to contain (but see below).

Actually, it is quite well known (just apparently not by the author of
that document).  The fsid is used to generate NFS file handles.  It's
supposed to be random and unique per file server.  (I think also it
was supposed to be hard to guess as well, but the original NFS protocol
didn't provide the sort of security guarantees that would make that
property useful.)

> And it's true POSIX doesn't seem to say what that is for. In
> any case, that would seem to confirm that that can't be used
> reliably to identify mounted file systems uniquely. The
> stat().st_dev is probably better (gstat -c %D, zstat +dev...)

Also BSD stat -f %d.

-GAWollman



Re: Testing if two files are on the same file system

2017-10-24 Thread Joerg Schilling
Nick Stoughton  wrote:

> > If you are correct, this is a Linux kernel bug.
>
> Why? The stat command is not standardized. The --file-system argument
> changes the output of %i to something other than the inode ... no bug.

Whether it is standardized or not, it prints st_dev and if this is 0,
the underlying OS is not OK.

Jörg




Re: Testing if two files are on the same file system

2017-10-24 Thread Stephane Chazelas
2017-10-24 10:22:42 +0100, Stephane Chazelas:
[...]
> stat implementations
> 
[...]
> Only the zsh stat builtin (can be loaded as
> "zstat" to avoid overriding the system's stat for those that now
> have added (an incompatible) one since) can avoid the fork (if
> performance is an issue).
[...]

Another one that doesn't involve a fork is ast-open's ls, which
can be made a ksh93 builtin.

Unfortunately, for device files, it gives the dev maj/min of the
device file itself instead of its st_dev:

~$ ksh -c 'command /opt/ast/bin/ls -dZ "%(dev)s" /dev/.'
000,006
~$ ksh -c 'command /opt/ast/bin/ls -dZ "%(dev)s" /dev/sda'
008,000

It gives confusing results if you use %(dev)u or %..16(dev)u

If POSIX were to specify a proper interface to lstat()/stat(),
that ls -Z could be a good candidate (maybe with that %(dev)s
not always giving st_dev issue fixed and a way to output
NUL-delimited or shell-quoted records (-K)).

-- 
Stephane




Re: Testing if two files are on the same file system

2017-10-24 Thread Stephane Chazelas
2017-10-24 09:06:19 -0700, Nick Stoughton:
> > If you are correct, this is a Linux kernel bug.
> 
> Why? The stat command is not standardized. The --file-system argument
> changes the output of %i to something other than the inode ... no bug.
[...]

Actually, the Linux man page for statfs(2)
(http://man7.org/linux/man-pages/man2/statfs.2.html) says:

> Nobody knows what f_fsid is supposed to contain (but see below).
[...]
> The f_fsid field
> Solaris, Irix and POSIX have a system call statvfs(2) that returns a
> struct statvfs (defined in <sys/statvfs.h>) containing an unsigned
> long f_fsid.  Linux, SunOS, HP-UX, 4.4BSD have a system call statfs()
> that returns a struct statfs (defined in <sys/vfs.h>) containing a
> fsid_t f_fsid, where fsid_t is defined as struct { int val[2]; }.
> The same holds for FreeBSD, except that it uses the include file
> <sys/mount.h>.
>
> The general idea is that f_fsid contains some random stuff such that
> the pair (f_fsid,ino) uniquely determines a file.  Some operating
> systems use (a variation on) the device number, or the device number
> combined with the filesystem type.  Several operating systems
> restrict giving out the f_fsid field to the superuser only (and zero
> it for unprivileged users), because this field is used in the
> filehandle of the filesystem when NFS-exported, and giving it out is
> a security concern.
>
> Under some operating systems, the fsid can be used as the second
> argument to the sysfs(2) system call.

And it's true POSIX doesn't seem to say what that is for. In
any case, that would seem to confirm that that can't be used
reliably to identify mounted file systems uniquely. The
stat().st_dev is probably better (gstat -c %D, zstat +dev...)

POSIX does tell us that:

> The st_ino and st_dev fields taken together uniquely identify
> the file within the system

(on Linux that's true, though note that a file could be found
under different mount points with different mount options, for
instance as /a/file and /b/file where /b is a read-only
bind-mount of /a).
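
A sketch of using that (st_dev, st_ino) pair to test whether two
pathnames refer to the same file, again assuming GNU stat installed as
"stat"; same_file is a hypothetical name.

```shell
# Assumption: "stat" is GNU stat. same_file is a hypothetical helper.
nl='
'
# same_file PATH1 PATH2
# Returns 0 if both paths resolve to the same st_dev:st_ino pair,
# 1 if not, 2 if stat failed. -L follows symlinks.
same_file() {
    ids=$(stat -L -c %d:%i -- "$1" "$2") || return 2
    [ "${ids%%"$nl"*}" = "${ids#*"$nl"}" ]
}
```

With the bind-mount example above, /a/file and /b/file would be expected
to compare equal even though they are reached through different mounts.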

-- 
Stephane



Re: Testing if two files are on the same file system

2017-10-24 Thread Nick Stoughton
> If you are correct, this is a Linux kernel bug.

Why? The stat command is not standardized. The --file-system argument
changes the output of %i to something other than the inode ... no bug.
-- 
Nick

On Tue, Oct 24, 2017 at 2:26 AM, Joerg Schilling <
joerg.schill...@fokus.fraunhofer.de> wrote:

> Vincent Lefevre  wrote:
>
> > as with "stat --file-system --format=%i", the output is 0 for any file
> > on a tmpfs.
>
> If you are correct, this is a Linux kernel bug.
>
> The original tmpfs implementation on Solaris does not have this problem.
>
>
> Jörg


Re: Testing if two files are on the same file system

2017-10-24 Thread Andries E. Brouwer
On Tue, Oct 24, 2017 at 10:22:42AM +0100, Stephane Chazelas wrote:
> 2017-10-21 12:39:25 +0200, Andries E. Brouwer:
> > On Sat, Oct 21, 2017 at 09:26:04AM +0100, Geoff Clare wrote:
> > 
> > > Linux allows two mounts to the same mount point.  (I tested it with
> > > "mount --bind ..."; not sure if it would work with a "real" mount.)
> > > However, the files in the first mount are inaccessible until the
> > > second is unmounted, so passing that mount point (or anything below
> > > it) to "df -P" only shows the second mount.
> > 
> > It is just the semantics of mount and path resolution:
> > after mount(fs,dir) the name dir refers to the root of fs.
> > (But the inode of dir is unchanged.)
> 
> The inode of "dir" does change, it becomes the inode of "/" on fs.

You misunderstand what I wrote.

The system call mount() is a namespace operation.
It changes the interpretation of certain path names.

After the mount, the full pathname for dir will refer to the root
of the new fs. But dir itself is still there, and references to it
other than by pronouncing the pathname, for example via an open file,
or via a relative pathname in the overmounted fs, will get you the old inode.

Andries



Re: Testing if two files are on the same file system

2017-10-24 Thread Joerg Schilling
Stephane Chazelas  wrote:

> However if "dir" is (was) ".", the inode of "." will stay
> unchanged (the cwd case you mentioned).
>
> /tmp/1$ sudo mount -t tmpfs x $PWD
> /tmp/1$ ls -id . $PWD
> 311402 .   2 /tmp/1
> $ df -P  . $PWD
> Filesystem        512-blocks     Used Available Capacity Mounted on
> /dev/mapper/VG-LV   15350768 12893888   1654064      89% /
> x                    6146700        0   6146700       0% /tmp/1
>
> And that's where you could get the same mountpoint for two different
> filesystems.

See example below

> GNU df on Linux at least is also easily fooled. If you do:
>
> mount -t tmpfs x 1/2
> mount -t tmpfs y 1
> mkdir 1/2
> df -P 1/2
>
> You see "x" instead of "y".

This is caused by a recently mentioned bug in the Linux kernel.

See what happens with the tmpfs original:

mkdir /tmp/tmp
mount -F tmpfs swap /tmp/tmp

stat /tmp /tmp/tmp/
  File: `/tmp'
  Size: 6327            Blocks: 16         IO Block: 4096   directory
Device: 6080002h/101187586d Inode: 961340467   Links: 21
Access: (1777/drwxrwxrwt)  Uid: (0/root)   Gid: (3/ sys)
Access: 2017-10-24 11:41:32.694238858 +0200
Modify: 2017-10-24 11:41:03.363144908 +0200
Change: 2017-10-24 11:41:03.363144908 +0200
  File: `/tmp/tmp/'
  Size: 117             Blocks: 8          IO Block: 4096   directory
Device: 6080004h/101187588d Inode: 1010466645  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
Access: 2017-10-24 11:41:19.805544706 +0200
Modify: 2017-10-24 11:41:19.805548235 +0200
Change: 2017-10-24 11:41:19.805548235 +0200

You see, st_dev differs for both mounts.

umount /tmp/tmp 
mount -F tmpfs bla /tmp/tmp 
df /tmp/tmp
Dateisystem      kByte  belegt verfügbar Kapazität Eingehängt auf
swap           1114088       0   1114088        0% /tmp/tmp

The background storage argument is ignored and the df output correctly names 
the actual background storage.

Jörg




Re: Testing if two files are on the same file system

2017-10-24 Thread Joerg Schilling
Vincent Lefevre  wrote:

> as with "stat --file-system --format=%i", the output is 0 for any file
> on a tmpfs.

If you are correct, this is a Linux kernel bug.

The original tmpfs implementation on Solaris does not have this problem.


Jörg




Re: Testing if two files are on the same file system

2017-10-24 Thread Stephane Chazelas
2017-10-21 12:39:25 +0200, Andries E. Brouwer:
> On Sat, Oct 21, 2017 at 09:26:04AM +0100, Geoff Clare wrote:
> 
> > Linux allows two mounts to the same mount point.  (I tested it with
> > "mount --bind ..."; not sure if it would work with a "real" mount.)
> > However, the files in the first mount are inaccessible until the
> > second is unmounted, so passing that mount point (or anything below
> > it) to "df -P" only shows the second mount.
> 
> It is just the semantics of mount and path resolution:
> after mount(fs,dir) the name dir refers to the root of fs.
> (But the inode of dir is unchanged.)

The inode of "dir" does change, it becomes the inode of "/" on
fs. (note however that as discussed recently, while stat("dir")
will return the new inode, a readdir() on the parent of "dir"
may return the old inode for "dir" in the entry.d_ino).

However if "dir" is (was) ".", the inode of "." will stay
unchanged (the cwd case you mentioned).

/tmp/1$ sudo mount -t tmpfs x $PWD
/tmp/1$ ls -id . $PWD
311402 .   2 /tmp/1
$ df -P  . $PWD
Filesystem        512-blocks     Used Available Capacity Mounted on
/dev/mapper/VG-LV   15350768 12893888   1654064      89% /
x                    6146700        0   6146700       0% /tmp/1

And that's where you could get the same mountpoint for two different
filesystems.

I suppose chroots and mount namespaces add more complications.

GNU df on Linux at least is also easily fooled. If you do:

mount -t tmpfs x 1/2
mount -t tmpfs y 1
mkdir 1/2
df -P 1/2

You see "x" instead of "y".


More comments on the discussion so far:

tmpfs on Linux
##############

On Linux, for tmpfs, the fs name is neither "tmpfs" nor
"swap", it's the (arbitrary and ignored) source name you gave to
mount(). That's the case for any filesystem like proc, sysfs,
devpts where the source name is not relevant.

It can even be the empty string (which some implementations of
df like busybox or ast-open choke on btw).

using sed on the output of df
#############################

The output of "df -P" is only specified in the POSIX locale.

I'm not sure what the output of "LC_ALL=C df -P" should be for a
mount point like a UTF-8 /home/stéphane

Even if we assume it doesn't transform that stéphane to st?phane
or st??phane, that is still not post processable by sed POSIXly
as the output is not necessarily text. Since it outputs 2 paths
plus some extra text, it can be up to 2 * PATH_MAX
+ a bit and even more, so possibly greater than LINE_MAX
(PATH_MAX can be and generally is greater than LINE_MAX) making
it non-text.
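
The two limits can be checked on a given system with getconf. The
sketch below only reports them and flags the case described above;
report_limits is a hypothetical name, and the "2 * PATH_MAX" figure is
the rough bound from the paragraph, not an exact line length.

```shell
# Hypothetical helper: report_limits [DIR]
# Prints LINE_MAX and the PATH_MAX that applies to DIR (default /),
# and warns if a df -P line holding two maximal paths could exceed
# LINE_MAX.
report_limits() {
    line_max=$(getconf LINE_MAX) || return
    path_max=$(getconf PATH_MAX "${1:-/}") || return
    echo "LINE_MAX=$line_max PATH_MAX=$path_max"
    case $path_max in
    (''|*[!0-9]*) return 0 ;;   # e.g. "undefined": nothing to compare
    esac
    if [ "$((2 * path_max))" -gt "$line_max" ]; then
        echo "a df -P line with two maximal paths could exceed LINE_MAX"
    fi
}
```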

On Linux at least, you can mount file systems at arbitrary
depths (even greater than PATH_MAX if you use relative paths),
though all df implementations I tried choke if they can't stat()
the absolute path of the mount point.

stat implementations
####################

GNU stat is one of many stat implementations. Many systems have
their own to palliate the lack of a proper command line
interface to the lstat() and stat() system calls. The only
advantage of stat() over "find -printf" that predates it by
decades is actually its interface to statfs(). (the format of
gfind -printf is incompatible and better for formatting dates for
instance and generally more intuitive).

In chronological order, there's IRIX stat (though I don't think
IRIX is maintained anymore), the zsh stat builtin, GNU stat and
FreeBSD stat at least.

AIX has a "istat".

All those (and gfind -printf) should be able to tell you the
st_dev of the file. Only the zsh stat builtin (can be loaded as
"zstat" to avoid overriding the system's stat for those that now
have added (an incompatible) one since) can avoid the fork (if
performance is an issue).
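
For the find variant, a sketch assuming GNU find installed as "find"
(it may be "gfind" elsewhere); find_dev is a hypothetical name. GNU
find documents %D as the st_dev of the file in decimal.

```shell
# Assumption: "find" is GNU find. find_dev is a hypothetical helper.
# Prints st_dev (decimal) of the given path without descending into it.
# By default find does not follow a symlink given as operand; add -H or
# -L to get the st_dev of the target instead.
# A path starting with "-" would be taken as an expression, so prefix
# such paths with "./".
find_dev() {
    find "$1" -prune -printf '%D\n'
}
```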

symlinks and device files
#########################

Depending on the stat implementation, for symlinks, you'll get
the st_dev of the symlink or the file it points to, while
AFAICT, df gives you the information for the target of the
symlink (POSIX leaves it unspecified and unspecified for
anything but regular/directory, and for devices gives you the
info for the file system on the device if any and if mounted
instead of the file system that device file is on)
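
With GNU stat, for instance, the choice is explicit: without -L the
symlink itself is examined, with -L its target. A small sketch, assuming
GNU stat installed as "stat" (the two function names are hypothetical):

```shell
# Assumption: "stat" is GNU stat. Both helpers print "st_dev:st_ino".
# ids_nofollow: information about the symlink itself (lstat()).
ids_nofollow() {
    stat -c %d:%i -- "$1"
}
# ids_follow: information about the file the symlink points to (stat()).
ids_follow() {
    stat -L -c %d:%i -- "$1"
}
```

For a symlink pointing onto another file system, the st_dev part of the
two results would be expected to differ, which is exactly where a
st_dev-based test and df can disagree.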

FS vs mount point
#################

st_dev identifies a file system. On Linux at least, a file
system can be mounted in several spaces possibly with different
options like readonly, nodev... (using bind mounts).

df will try (and sometimes fail as seen above) to identify the
mount point of the FS the file is on. That's another reason why
using st_dev and df are not equivalent methods

malicious users forging arbitrary mount devices or points
#########################################################

Beside FUSE already mentioned, users may be able to forge
arbitrary mount points in the output of df with automounts or
ZFS mount delegations at least.

It's not safe to assume that mount points and mount sources are
a privileged namespace.

But, I wouldn't do directory traversal on files (let alone file
systems!) under the 

Re: Testing if two files are on the same file system

2017-10-23 Thread Vincent Lefevre
On 2017-10-18 17:27:04 +0200, Joerg Wunsch wrote:
> As Wheeler, David A wrote:
> 
> > It's not just spaces.  Filesystem names may contain newlines and
> > other control characters, too, so "df -P" is fundamentally unsafe.
> 
> Well, it's a question of whether your goal is to be always on the
> safe side, or just pragmatically to cope with a number of really
> existing operating systems.
> 
> Unlike a filename, a filesystem name is nothing that could be invented
> by Mr. Malicious User, so if the only point is to handle OSX as well
> as (say) Linux, BSD, Solaris etc. the pragmatic way that has been
> posted might suffice.

Are you sure about that? Even with FUSE on Linux?




Re: Testing if two files are on the same file system

2017-10-21 Thread Andries E. Brouwer
On Sat, Oct 21, 2017 at 09:26:04AM +0100, Geoff Clare wrote:

> Linux allows two mounts to the same mount point.  (I tested it with
> "mount --bind ..."; not sure if it would work with a "real" mount.)
> However, the files in the first mount are inaccessible until the
> second is unmounted, so passing that mount point (or anything below
> it) to "df -P" only shows the second mount.

It is just the semantics of mount and path resolution:
after mount(fs,dir) the name dir refers to the root of fs.
(But the inode of dir is unchanged.)

It is not completely true that the overmounted filesystem
is inaccessible: if some process has its cwd in the overmounted tree
then it can still walk around and refer to files using relative pathnames.

This is not only Linux.

Andries



Re: Testing if two files are on the same file system

2017-10-21 Thread Joerg Schilling
> Hmm... on Linux, there can be several tmpfs mounts and they all have the
> same file system name in the first column. Example:
>
> tmpfs 33039212   0 33039212   0% /dev/shm
> tmpfs 33039212 3277180 29762032  10% /run
> tmpfs 5120   0 5120   0% /run/lock
> tmpfs 33039212   0 33039212   0% /sys/fs/cgroup

The string "tmpfs" is a result of a problem in the tmpfs clone used on Linux.

The original code from SunOS, introduced around 1988, lists the expected string: 
the "background storage" that I mentioned before.

swap 1429648  108088 1321560 8% /tmp


tmpfs is based on anonymous pages from the swap space that in general is a 
virtual "device" built on several swap devices.

Since several tmpfs mounts all share the same background storage, SunOS lists 
the same name with several tmpfs mounts as well.

Jörg




Re: Testing if two files are on the same file system

2017-10-21 Thread Geoff Clare
Martijn Dekker  wrote, on 21 Oct 2017:
>
> Op 19-10-17 om 15:06 schreef Martijn Dekker:
> > Op 18-10-17 om 16:11 schreef Geoff Clare:
> >> After the filesystem name there are four numeric fields with a trailing '%'
> >> on the fourth.  Treating this as the terminator for the filesystem name
> >> ought to be good enough in practice.  It would only not work in the
> >> extremely unlikely event that a filesystem name has within it four numeric
> >> fields with a trailing '%' on the fourth, or if two different filesystem
> >> names are the same when trailing blanks are removed.
> >>
> >> I tried it with the following command and it seems to work:
> >>
> >> df -P file1 file2 |
> >> sed '1d;s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]].*//'
> > 
> > That's very clever, and does seem to be fairly bullet-proof. Thanks!
> 
> Hmm... on Linux, there can be several tmpfs mounts and they all have the
> same file system name in the first column. Example:
> 
> tmpfs 33039212   0 33039212   0% /dev/shm
> tmpfs 33039212 3277180 29762032  10% /run
> tmpfs 5120   0 5120   0% /run/lock
> tmpfs 33039212   0 33039212   0% /sys/fs/cgroup
> 
> Looks like we also need the mount point to uniquely identify a file
> system. A little tweak to the sed incantation accomplishes that:
> 
> df -P /dev/shm /run |
> sed '1d;
>   s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]]\{1,\}/,/'
> 
> gives
> 
> tmpfs,/dev/shm
> tmpfs,/run
> 
> But then, wouldn't it be sufficient simply to use the mount point only?
> Can two file systems ever be on the same mount point?

Linux allows two mounts to the same mount point.  (I tested it with
"mount --bind ..."; not sure if it would work with a "real" mount.)
However, the files in the first mount are inaccessible until the
second is unmounted, so passing that mount point (or anything below
it) to "df -P" only shows the second mount.

So I think just comparing the mount points is indeed sufficient.
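
A sketch of a filter that keeps only the mount point from "df -P"
output, built on the same sed pattern as above. The function name
df_mount_points is hypothetical, and the approach still breaks if a
mount point contains a newline.

```shell
# Hypothetical filter: reads "df -P" output on stdin and prints only
# the "Mounted on" field of each data line.
# It drops the header line, then removes everything up to and including
# the four numeric fields, the '%' and the blanks that follow.
df_mount_points() {
    sed '1d
s/^.*\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]]\{1,\}//'
}
```

Intended usage would be along the lines of
LC_ALL=C df -P -- "$file1" "$file2" | df_mount_points
followed by a comparison of the two resulting lines.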

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Testing if two files are on the same file system

2017-10-19 Thread Martijn Dekker
Op 19-10-17 om 15:14 schreef Joerg Wunsch:
> As Martijn Dekker wrote:
> 
>> There is another, less unlikely problem, though: a file system could be
>> mounted on a directory with a name containing a newline, which would
>> break line-based parsing.
> 
> Pipe the output of df -P through "tail -n +2 | head -1". :-)
> 
> That way, you don't have to care whether the mountpoint adds rubbish
> after a newline.  As I understand it, you are going to apply df -P to
> a single file at a time only, so it's sure its filesystem name will
> always be at the beginning of line 2.

Well no, the idea was to apply 'df -P' to two files at once. Once you've
got two lines in a variable, it's fast and easy to use POSIX parameter
substitutions to split and compare them.

You're right, I could avoid the whole problem by using two separate 'df
-P' invocations and parsing them separately. But that would double the
overhead. This feature is especially useful in directory traversal
loops, so performance is important.

An actual case of a file system being mounted on a directory with a name
containing a newline is a sure sign of something dodgy going on, so I
don't really mind killing a modernish program outright if that condition
is detected.

To give people a more concrete idea of what I'm talking about, below is
my function that now performs this feature on systems without GNU
'stat'. It's an internal function, meant to be invoked indirectly via
'is onsamefs file1 file2' (don't follow symlinks) or 'is -L onsamefs
file1 file2' (follow symlinks).

The bulk of the overhead (i.e.: the subshell forks) is in the command
substitution at the beginning. This function performs the feature using
just two or three forks, depending on the shell.

Notes:
- $CCn is a newline;
- DEFPATH=$(getconf PATH);
- die() performs a reliable emergency program halt on scripts (the '||
return' is only ever reached on an interactive shell that traps and
ignores SIGINT).

_Msh_testOnSameFs() {
  _Msh_is=$(export LC_ALL=C POSIXLY_CORRECT=Y "PATH=$DEFPATH"
    unset -f df sed  # QRK_EXECFNBI compat
    exec df -P -- "$1" "$2" | exec sed "1d;
      s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]].*//")
  # Sanity check: verify that exactly 2 lines were produced.
  case ${_Msh_is} in
  (*"$CCn"*"$CCn"*) die "is onsamefs: internal error 1" || return ;;
  (*"$CCn"*)  ;;
  (*) die "is onsamefs: internal error 2" || return ;;
  esac
  # If the two lines are identical, the files are on the same filesystem.
  case ${_Msh_is#*"$CCn"} in
  ("${_Msh_is%%"$CCn"*}") unset -v _Msh_is ;;
  (*) ! unset -v _Msh_is ;;
  esac
}

The more robust and faster version using GNU 'stat' (just one fork) is
identical, except for the command substitution which looks like:

  _Msh_is=$(export LC_ALL=C POSIXLY_CORRECT=Y
    exec /opt/local/bin/gstat --file-system --format=%i,%t -- "$1" "$2")

(The presence of GNU 'stat' and its specific path are of course detected
at init time.)

I'll push the new feature to github tonight after some more testing, so
people can try to break it. :)

- Martijn



Re: Testing if two files are on the same file system

2017-10-19 Thread Joerg Wunsch
As Martijn Dekker wrote:

> There is another, less unlikely problem, though: a file system could be
> mounted on a directory with a name containing a newline, which would
> break line-based parsing.

Pipe the output of df -P through "tail -n +2 | head -1". :-)

That way, you don't have to care whether the mountpoint adds rubbish
after a newline.  As I understand it, you are going to apply df -P to
a single file at a time only, so it's sure its filesystem name will
always be at the beginning of line 2.
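
A small sketch of that pipeline, run on canned input rather than a live df call. The helper name second_line and the sample lines are invented for illustration; 'head -n 1' is used as the current POSIX spelling of 'head -1'.

```shell
# Sketch only: second_line is a hypothetical helper.
# Keeps only line 2 of its input, so anything a mount point adds after an
# embedded newline is discarded.
second_line() {
  tail -n +2 | head -n 1
}

# Made-up sample: a mount point name containing a newline spills onto line 3.
printf '%s\n' \
  'Filesystem 512-blocks Used Available Capacity Mounted on' \
  '/dev/sda1 100 50 50 50% /mnt/evil' \
  'rubbish' |
second_line
```

For a real file the input would come from df -P -- "$file". Line 2 still carries the numeric fields, so it would need further trimming before two such lines could be compared.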

-- 
cheers, Joerg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/
Never trust an operating system you don't have sources for. ;-)



Re: Testing if two files are on the same file system

2017-10-19 Thread Martijn Dekker
On 19-10-17 at 13:12, Joerg Schilling wrote:
> No, the first field is usually the backing storage; it may be the
> filesystem name in some cases.

Whatever it technically may be, the POSIX spec calls it a file system
name, so I'm using that term to be consistent with it.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/df.html#tag_20_33_10

> You need to look at the 2nd line and you may use the '%' sign to synchronize
> fields and you may even do this:
> 
> /usr/xpg4/bin/df -P /etc/passwd /etc/hosts
> Dateisystem        512-Blöcke   Belegt Verfügbar Kapazität Eingehängt auf
> /dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /
> /dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /
> 
> and just compare the last two lines.

What if this is done while something else is writing to a file? As far
as I can tell, there's nothing to say the number of free blocks can't
change between printing the first and second lines.

As for the '%' sign method, see Geoff's message and my reply to it.

- M.



Re: Testing if two files are on the same file system

2017-10-19 Thread Martijn Dekker
On 18-10-17 at 16:11, Geoff Clare wrote:
> After the filesystem name there are four numeric fields with a trailing '%'
> on the fourth.  Treating this as the terminator for the filesystem name
> ought to be good enough in practice.  It would only not work in the
> extremely unlikely event that a filesystem name has within it four numeric
> fields with a trailing '%' on the fourth, or if two different filesystem
> names are the same when trailing blanks are removed.
> 
> I tried it with the following command and it seems to work:
> 
> df -P file1 file2 |
> sed '1d;s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]].*//'

That's very clever, and does seem to be fairly bullet-proof. Thanks!

There is another, less unlikely problem, though: a file system could be
mounted on a directory with a name containing a newline, which would
break line-based parsing. These days, with things like FUSE, you don't
even need to be root to perform a mount. So a cleverly crafted mount
could conceivably be abused to manipulate the behaviour of a script
using this technique.

But modernish can at least detect this condition and reduce it to a
simple denial of service: if the parsed 'df -P' output doesn't contain
exactly two lines, things have gone pear-shaped and the program should
halt immediately[*] to avoid potential disaster.

I'll also have to make it detect GNU 'stat' and use 'gstat --file-system
--format=%i' instead where possible, because that's the only properly
robust way I know of and it's fairly widespread.
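
One possible way to do that detection is sketched below. The function name find_gnu_stat and the list of candidate command names are assumptions, not what modernish actually does; the check relies on GNU stat mentioning "GNU" in its --version output.

```shell
# Sketch only: find_gnu_stat is a hypothetical helper, and the candidate
# names (gstat, gnustat, stat) are assumptions about common install names.
# Prints the path of the first candidate that looks like GNU stat.
find_gnu_stat() {
  for cmd in gstat gnustat stat; do
    path=$(command -v "$cmd" 2>/dev/null) || continue
    # GNU stat identifies itself in its --version output.
    case $("$path" --version 2>/dev/null) in
    (*GNU*) printf '%s\n' "$path"; return 0 ;;
    esac
  done
  return 1
}
```

If a path is found, "$path" --file-system --format=%i -- "$file1" "$file2" prints one file system ID per line. As noted elsewhere in this thread, the ID has been reported as 0 for everything on FreeBSD, so the result may need a sanity check there.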

- Martijn

[*]
https://github.com/modernish/modernish#user-content-reliable-emergency-halt



Re: Testing if two files are on the same file system

2017-10-19 Thread Geoff Clare
Joerg Schilling  wrote, on 19 Oct 2017:
>
> You need to look at the 2nd line and you may use the '%' sign to synchronize
> fields and you may even do this:
> 
> /usr/xpg4/bin/df -P /etc/passwd /etc/hosts
> Dateisystem        512-Blöcke   Belegt Verfügbar Kapazität Eingehängt auf
> /dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /
> /dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /
> 
> and just compare the last two lines.

Comparing the whole of the last two lines is not reliable, as some
of the numbers could change between when df processes the first
pathname and when it processes the second.

See my previous email in this thread for how to use the four numeric
fields (and the '%' on the fourth) as a "terminator" for the filesystem
name.
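
To illustrate the difference, here is a sketch on made-up sample output in which the used and available counts change between the two lines. The helper name strip_fields is invented; it wraps the sed command from the earlier email.

```shell
# Sketch only: the two data lines are made-up sample output in which the
# "Used" and "Available" columns changed between the first and second line.
sample='Filesystem 512-blocks Used Available Capacity Mounted on
/dev/dsk/c1t0d0s3 24203960 23115442 846480 97% /
/dev/dsk/c1t0d0s3 24203960 23115450 846472 97% /'

# strip_fields is a hypothetical wrapper: drop the header, then cut each
# line at the four numeric fields so only the filesystem name is left.
strip_fields() {
  sed '1d;s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]].*//'
}

# Count distinct lines before and after stripping.
whole=$(printf '%s\n' "$sample" | sed 1d | sort -u | wc -l)
names=$(printf '%s\n' "$sample" | strip_fields | sort -u | wc -l)
echo "distinct whole lines: $((whole)), distinct names: $((names))"
```

The whole lines differ although both files are on the same file system, while the stripped names match.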

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Testing if two files are on the same file system

2017-10-19 Thread Joerg Schilling
Martijn Dekker  wrote:

> Is there a way, using POSIX shell and utilities, to reliably test if two
> files are on the same file system? It seems like a basic feature that
> the shell should have access to, so I'd like to add it to the modernish
> shell library.
>
> The only POSIX command I've found so far that identifies the file system
> that a file is on, is 'df -P', so I've experimented with parsing its
> output. Given two file arguments, it normally outputs their file system
> names in the first field of the second and third lines. The fields are
> separated by one or more spaces.

No, the first field is usually the backing storage; it may be the filesystem
name in some cases.

> Unfortunately, on the Mac, some of the file system names contain spaces
> (on my system: "map -hosts" and "map auto_home"). POSIX doesn't seem to
> prohibit this: file system names are considered implementation-defined[*].
>
> Given that there is a fixed number of fields, it would be possible to
> cope with this if no other fields can contain spaces, but the last field
> is a directory name that may also contain spaces -- or even newlines.
>
> So the output of 'df -P' is unparseable.

Did you really check a POSIX compliant "df" output?

Here is an example:

/usr/xpg4/bin/df -P /etc/passwd
Dateisystem        512-Blöcke   Belegt Verfügbar Kapazität Eingehängt auf
/dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /

You need to look at the 2nd line and you may use the '%' sign to synchronize
fields and you may even do this:

/usr/xpg4/bin/df -P /etc/passwd /etc/hosts
Dateisystem        512-Blöcke   Belegt Verfügbar Kapazität Eingehängt auf
/dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /
/dev/dsk/c1t0d0s3    24203960 23115442    846480       97% /

and just compare the last two lines.

Jörg




RE: Testing if two files are on the same file system

2017-10-18 Thread Wheeler, David A
Joerg Wunsch [mailto:aus...@uriah.heep.sax.de]:
> Unlike a filename, a filesystem name is nothing that could be invented by
> Mr. Malicious User, so if the only point is to handle OSX as well as (say)
> Linux, BSD, Solaris etc. the pragmatic way that has been posted might
> suffice.

My mistake.  I read "filesystem name" as "filename", and the two are not the
same thing.

> But in general, I agree with your argument.

Thanks.

That earlier exchange reminded me of my continuing concern that there should
be *easy* standard ways to handle filenames safely.  The current mechanisms
don't do the job adequately.  I'd like to see things improved.

--- David A. Wheeler




Re: Testing if two files are on the same file system

2017-10-18 Thread Joerg Wunsch
As Wheeler, David A wrote:

> It's not just spaces.  Filesystem names may contain newlines and
> other control characters, too, so "df -P" is fundamentally unsafe.

Well, it's a question of whether your goal is to be always on the
safe side, or just to cope pragmatically with a number of operating
systems that actually exist.

Unlike a filename, a filesystem name is nothing that could be invented
by Mr. Malicious User, so if the only point is to handle OSX as well
as (say) Linux, BSD, Solaris etc. the pragmatic way that has been
posted might suffice.

But in general, I agree with your argument.
-- 
cheers, Joerg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/
Never trust an operating system you don't have sources for. ;-)



Re: Testing if two files are on the same file system

2017-10-18 Thread Geoff Clare
Joerg Wunsch  wrote, on 18 Oct 2017:
>
> As Martijn Dekker wrote:
> 
> > Is there a way, using POSIX shell and utilities, to reliably test if two
> > files are on the same file system?
> 
> df -P appears to be required to have the filesystem name as the first
> column.  Filesystem names with a space however might be a problem, at
> least if they contain a number after the space since that cannot be
> distinguished from the number of blocks.

After the filesystem name there are four numeric fields with a trailing '%'
on the fourth.  Treating this as the terminator for the filesystem name
ought to be good enough in practice.  It would only not work in the
extremely unlikely event that a filesystem name has within it four numeric
fields with a trailing '%' on the fourth, or if two different filesystem
names are the same when trailing blanks are removed.

I tried it with the following command and it seems to work:

df -P file1 file2 |
sed '1d;s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]].*//'
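
For illustration, the same command applied to canned input with filesystem names that contain a space. The wrapper name fs_names and the numeric values in the sample are made up; the two names are the macOS examples mentioned in this thread.

```shell
# Sketch only: fs_names is a hypothetical wrapper around the sed command above.
fs_names() {
  sed '1d;s/\([[:blank:]]\{1,\}[[:digit:]]\{1,\}\)\{4\}%[[:blank:]].*//'
}

# Made-up sample in the 'df -P' layout, using names that contain a space.
printf '%s\n' \
  'Filesystem 512-blocks Used Available Capacity Mounted on' \
  'map -hosts 0 0 0 100% /net' \
  'map auto_home 0 0 0 100% /home' |
fs_names
```

Each output line is the complete filesystem name, embedded space included.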

-- 
Geoff Clare 
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England



Re: Testing if two files are on the same file system

2017-10-18 Thread Joerg Wunsch
As Martijn Dekker wrote:

> Is there a way, using POSIX shell and utilities, to reliably test if two
> files are on the same file system?

df -P appears to be required to have the filesystem name as the first
column.  Filesystem names with a space however might be a problem, at
least if they contain a number after the space since that cannot be
distinguished from the number of blocks.
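
A quick sketch of how a naive first-column split goes wrong with such names. The helper name first_field and the sample values are made up; the two filesystem names are the macOS examples from the original question.

```shell
# Sketch only: first_field is a hypothetical helper that takes the first
# whitespace-separated field of every line after the header.
first_field() {
  awk 'NR > 1 { print $1 }'
}

# Made-up sample in the 'df -P' layout with two different filesystem names.
printf '%s\n' \
  'Filesystem 512-blocks Used Available Capacity Mounted on' \
  'map -hosts 0 0 0 100% /net' \
  'map auto_home 0 0 0 100% /home' |
first_field
```

Both lines come out as just "map", so two different file systems would wrongly be treated as the same one.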
-- 
cheers, Joerg   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/
Never trust an operating system you don't have sources for. ;-)