Re: Bug#369822: ls -i stats unnecessarily

2008-07-07 Thread Jim Meyering
Thanks to the prod from Wayne Pollock,
I've just revived this thread:

  http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/14020


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-05 Thread Paul Eggert
Ian Jackson [EMAIL PROTECTED] writes:

 the inum on its own isn't useful if you don't know either (a) there
 are no mountpoints here or (b) exactly where the mountpoints are.

No, it can be useful.  For example, suppose you have the inum Fi of
some other file F, and want to know whether this directory entry could
be that of F.  If readdir returns d_ino == Ri, and if Ri != Fi, then
you know that it can't possibly be a match.  This sort of optimization
can be quite useful to avoid a stat, but it doesn't work if d_ino
might be junk from an underlying file system.

On operating systems where d_ino might be junk at mount points, we
have to suppress optimizations like this.  On operating systems where
d_ino is reliable at mount points, our code can run faster.

 If you know it's a mounted filesystem, you can ask
   ls -id /mount/point/.

Sure, but I shouldn't have to ask pretty please, can I get the
correct inode number?  readdir should give me the correct inode
number in the first place.

 Eg,
   inum=`ls -id /unexpected/mount/point/.`
   find / -xdev -inum $inum
 isn't going to work properly !

Sure it works properly.  Any file in the subsidiary file system will
have an inode number that is allocated independently of inode numbers
of files in the root file system.  If I changed the example to this:

   inum=`ls -id /unexpected/mount/point/somefile`
   find / -xdev -inum $inum

then that works properly, in that it attempts to find a file in the
root file system whose inode number is the same as the inode number of
somefile.  Your example should work the same way, with . rather
than somefile.


Anyway, I just now checked Posix on this and found that it allows
implementations of readdir to return a random value for d_ino!  Wow.
I filed an Aardvark about it here, which you are welcome to follow up:

http://www.opengroup.org/sophocles/show_mail.tpl?source=Llistname=austin-review-lid=2070


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-05 Thread Ian Jackson
Paul Eggert writes (Re: Bug#369822: ls -i stats unnecessarily):
 Ian Jackson [EMAIL PROTECTED] writes:
  This behaviour is expected: if you readdir the directory containing
  a mountpoint, you get the inode number of the directory in the
  underlying filesystem;
 
 That's not the behavior that I expected.  Also, it's not useful
 behavior--at least, it's not useful for the vast majority of
 real-world applications.  In contrast, it is useful for 'ls -i' to
 print the inode number of the root of the mounted file system, for
 'find -inum' to use that inode number, and so forth.

If you know it's a mounted filesystem, you can ask
  ls -id /mount/point/.
instead of
  ls -id /mount/point

If you don't know it's a mounted filesystem then the inode number of
the mounted filesystem is useless to you.  Eg,
  inum=`ls -id /unexpected/mount/point/.`
  find / -xdev -inum $inum
isn't going to work properly !

 I can understand why readdir might have the behavior that you
 describe: it might be more efficient internally.  But that doesn't
 make it correct, or even expected.  It's a bug in readdir.

You might say that it's a deficiency in the readdir interface, as well
as in ls -i, etc. etc., that it doesn't provide the dev as well as the
inum.  But however you look at it, the inum on its own isn't useful if
you don't know either (a) there are no mountpoints here or (b) exactly
where the mountpoints are.  In case (a) you don't care about the
distinction; in case (b) you can compensate with stat.

Ian.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-03 Thread James Youngman

On 6/2/06, Ian Jackson [EMAIL PROTECTED] wrote:


There are I think two approaches to this problem:
 * find a list of mountpoints in some system-specific way
   for each one stat mountpoint/..


I would strongly advise against this option.   Briefly, findutils did
this for other reasons (as part of its symlink race condition paranoia
checking).  It makes the program hang is the system is a client of a
dead NFS server, even if the user is not using ls to work with
filesystems on that server.

There were many bug reports.  I ended up finding an alternative way to
solve the problem.

If I was trying to diagnose a problem on a client of a dead NFS
server, I would expect ls to _help_ in the diagnosis, not be
affected by the problem too.  For a fundamental tool like ls, it is
reasonable to favour robusness over performance.

I suppose one could have a fast /bin/ls and a robust /sbin/ls, but
that would I think only lead to people including the wrong binary in
rescue disks (having said this though of course that special niche is
often filled by busybox anyway).

James.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-03 Thread Ian Jackson
Jim Meyering writes (Re: Bug#369822: ls -i stats unnecessarily):
 So at least Solaris 8 and some glibc are affected.

Err, what's glibc got to do with it ?  This behaviour is expected: if
you readdir the directory containing a mountpoint, you get the inode
number of the directory in the underlying filesystem; if you then stat
the mountpoint, you get the inode number of the root of the
filesystem mounted on top.

There are I think two approaches to this problem:
 * find a list of mountpoints in some system-specific way
   for each one stat mountpoint/..
   compare device and inode with those of the directory we're readdir'ing
 * provide an option to allow the user to specify that they don't
   mind the inode numbers of mountpoints being wrong

The 2nd is easier and certainly less fragile, and since this no-stat
optimisation is only necessary in some specialised applications (of
which I happen to have an application where it's absolutely essential
because statting each file takes far far too long), it's not that
unreasonable to demand a special option.

 unless I find a better approach, I'll turn off this optimization
 by default, and add an option to turn it back on.

Right.

Ian.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-03 Thread Ian Jackson
Ian Jackson writes (Re: Bug#369822: ls -i stats unnecessarily):
 There are I think two approaches to this problem:
  * find a list of mountpoints in some system-specific way
for each one stat mountpoint/..
compare device and inode with those of the directory we're readdir'ing
  * provide an option to allow the user to specify that they don't
mind the inode numbers of mountpoints being wrong

Someone has just pointed out to me that no matter what you do, you
don't get the dev for the covering filesystem.  So returning the inum
of the root of the covering fs is definitely wrong and should never be
done.

Think about it: if you ls -i anywhere near a mount point you're
_inevitably_ going to get useless data because the output doesn't
contain devs.  So anyone who does ls -i usefully must know that there
are no mountpoints and this whole issue can be ignored.

Ian.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-03 Thread Eric Blake
 
 I can understand why readdir might have the behavior that you
 describe: it might be more efficient internally.  But that doesn't
 make it correct, or even expected.  It's a bug in readdir.

I agree - readdir should ALWAYS match stat, even in the face of
mount points, if it is going to provide d_ino.  Even cygwin developers
took great care in their upcoming 1.5.20 release to ensure that
mount points are handled correctly, such that readdir reports
the same value that stat will provide.

Is there an easy configure or runtime test we can do to detect
buggy readdir implementations, and always do stat on those
platforms?

-- 
Eric Blake


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-02 Thread Jim Meyering
I wrote:
 2006-02-25  Eric Blake  [EMAIL PROTECTED]

   In ls, avoid calling stat for --inode (-i), when possible.
   * src/pwd.c (NOT_AN_INODE_NUMBER, D_INO): Move to ...
   * src/system.h: ... here, for use in ...
   * src/ls.c (main): ... here.  Prefer dirent.d_ino to stat when
   possible.
   (gobble_file): Add inode argument.
   (print_dir): Pass inode if available.
   (usage): Remove inaccuracy.

 The problem is rare enough that I won't be losing any
 sleep over it.  But it would be good to fix it, or at least
 add a test case comparing st_ino vs. d_ino for every readable
 directory from . up to /.  Then (from test failure reports)
 we can hope to get an idea of how often the problem arises.

Well, chance would have it that just minutes ago I saw this
new test fail on a solaris 8 system:
[ This test is on the trunk, only, as is the no-stat-for-inode
  ls optimization. ]

  ./stat-vs-dirent: test failed: /export/home: d_ino(16768) != st_ino(2)
  ./stat-vs-dirent: This may indicate a flaw in your kernel or file system 
implementation.

I see that I wrote that test less than a month ago.  Humph.
And /export/home is indeed a mount point.

  $ df|grep 'home$'
  /dev/dsk/c0t0d0s71935191  949204  92793251%/export/home

James Youngman [EMAIL PROTECTED] wrote:
 You could stat / at startup, and if its inode number is 2 (hint that
 we're probably not chrooted), trust d_ino, and don't trust it if d_ino
 is not 2 (we're probably chrooted).   The check is only probabilistic,
 but it might help.  It shouldn't be fooled by fsirand.

Unfortunately, that heuristic wouldn't work in this case:

  $ ./stat --format=%i /
  2

So at least Solaris 8 and some glibc are affected.

Unless I find a better approach, I'll turn off this optimization
by default, and add an option to turn it back on.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-02 Thread Jim Meyering
Ian Jackson [EMAIL PROTECTED] wrote:

 Ian Jackson writes (Re: Bug#369822: ls -i stats unnecessarily):
 There are I think two approaches to this problem:
  * find a list of mountpoints in some system-specific way
for each one stat mountpoint/..
compare device and inode with those of the directory we're readdir'ing
  * provide an option to allow the user to specify that they don't
mind the inode numbers of mountpoints being wrong

 Someone has just pointed out to me that no matter what you do, you
 don't get the dev for the covering filesystem.  So returning the inum
 of the root of the covering fs is definitely wrong and should never be
 done.

 Think about it: if you ls -i anywhere near a mount point you're
 _inevitably_ going to get useless data because the output doesn't
 contain devs.  So anyone who does ls -i usefully must know that there
 are no mountpoints and this whole issue can be ignored.

I still think we'll need an option.

Otherwise (with the current/trunk optimization), for each of these special
directories, ls -i reports different inode numbers depending on whether
it appears as a command line argument (in which case, ls must lstat
(or stat) it) or it is encountered as an entry in some other directory.

Besides, from a portability standpoint, GNU ls -i should continue to
work the same way it always has, wrt mount points: print what is usually
a `2' for each of them.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-02 Thread Paul Eggert
Jim Meyering [EMAIL PROTECTED] writes:

 So at least Solaris 8 and some glibc are affected.

I confirmed it on Solaris 10 too.

Amusingly enough, Solaris /bin/sh and /usr/xpg4/bin/sh behave like new
coreutils, not like old coreutils.  That is, ls -i dir just uses the
readdir results; it doesn't stat.  For example:

$ /bin/ls -i / | grep tmp
  1570 tmp
$ /bin/ls -id /tmp
   5153472 /tmp
$ uname -a
SunOS moa.cs.ucla.edu 5.10 Generic_118833-03 sun4u sparc 
SUNW,Sun-Fire-280R Solaris
$ mount -p | grep /tmp
swap - /tmp tmpfs - no xattr,size=1024m

 Unless I find a better approach, I'll turn off this optimization
 by default, and add an option to turn it back on.

Another possibility would be to disable the optimization.
Is it all that important that ls -i dir be fast?


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-01 Thread Jim Meyering
Ian Jackson [EMAIL PROTECTED] wrote:
 Package: coreutils
 Version: 5.2.1-2

 -davenant:~ strace -e trace=lstat64 /bin/ls --sort=none -i 
 /export/mirror/Repository/data-md5 21 | head
 lstat64(/export/mirror/Repository/data-md5/063096bcf34e489e5a6c3a7a20214368,
  {st_mode=S_IFREG|0664, st_size=834, ...}) = 0
 lstat64(/export/mirror/Repository/data-md5/b07ac8b4b6e9918f3e29e7c267f843d3,
  {st_mode=S_IFREG|0664, st_size=531, ...}) = 0

Thanks for the report.

You're right that in some cases ls could be optimized to avoid the
lstat calls.  However deciding when to do it is not easy.
It is possible
  - when dirent.d_ino is available (this is easy), and
  - when dirent.d_ino is guaranteed to be valid (this is tricky)

The latter is harder because for some files (mount points in a chroot
with a buggy glibc) d_ino is nonzero and wrong.  In those cases, you have
to use lstat to get the true value.  The invalid d_ino problem came up
recently with the report of pwd failing on systems with a losing (and
slightly old) glibc.

If someone else does all the work to make ensure the optimization
is safe, I'd accept a patch.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-01 Thread Jim Meyering
I wrote:
 Thanks for the report.

 You're right that in some cases ls could be optimized to avoid the
 lstat calls.  However deciding when to do it is not easy.
 It is possible
   - when dirent.d_ino is available (this is easy), and
   - when dirent.d_ino is guaranteed to be valid (this is tricky)

 The latter is harder because for some files (mount points in a chroot
 with a buggy glibc) d_ino is nonzero and wrong.  In those cases, you have
 to use lstat to get the true value.  The invalid d_ino problem came up
 recently with the report of pwd failing on systems with a losing (and
 slightly old) glibc.

 If someone else does all the work to make ensure the optimization
 is safe, I'd accept a patch.

After writing that, I remembered that this optimization
has already been done on the trunk.  However, there's no check
for the glibc problem:

2006-02-25  Eric Blake  [EMAIL PROTECTED]

In ls, avoid calling stat for --inode (-i), when possible.
* src/pwd.c (NOT_AN_INODE_NUMBER, D_INO): Move to ...
* src/system.h: ... here, for use in ...
* src/ls.c (main): ... here.  Prefer dirent.d_ino to stat when
possible.
(gobble_file): Add inode argument.
(print_dir): Pass inode if available.
(usage): Remove inaccuracy.

The problem is rare enough that I won't be losing any
sleep over it.  But it would be good to fix it, or at least
add a test case comparing st_ino vs. d_ino for every readable
directory from . up to /.  Then (from test failure reports)
we can hope to get an idea of how often the problem arises.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: Bug#369822: ls -i stats unnecessarily

2006-06-01 Thread James Youngman

On 6/1/06, Jim Meyering [EMAIL PROTECTED] wrote:


 The latter is harder because for some files (mount points in a chroot
 with a buggy glibc) d_ino is nonzero and wrong.  In those cases, you have
 to use lstat to get the true value.  The invalid d_ino problem came up
 recently with the report of pwd failing on systems with a losing (and
 slightly old) glibc.

 If someone else does all the work to make ensure the optimization
 is safe, I'd accept a patch.

After writing that, I remembered that this optimization
has already been done on the trunk.  However, there's no check
for the glibc problem:

[...]

The problem is rare enough that I won't be losing any
sleep over it.  But it would be good to fix it, or at least
add a test case comparing st_ino vs. d_ino for every readable
directory from . up to /.  Then (from test failure reports)
we can hope to get an idea of how often the problem arises.


You could stat / at startup, and if its inode number is 2 (hint that
we're probably not chrooted), trust d_ino, and don't trust it if d_ino
is not 2 (we're probably chrooted).   The check is only probabilistic,
but it might help.  It shouldn't be fooled by fsirand.

James.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils