Re: Quicker alternative to find /?

2004-08-16 Thread Felix E. Klee
On Sun, 15 Aug 2004 19:00:41 -0500 didn't post to the list wrote:
 In that case, you may want to try a recursive ls.  But for the same
 format as find, try using locate with updatedb on a cron job.

Thanks for the hints. I did some benchmarks:

find / > /tmp/find.dirtree:
3:44 (1st run)
3:47 (repeated run)
updatedb -u --output=/tmp/updatedb.dirtree:
8:51
8:47
ls -R / > /tmp/ls.dirtree:
6:16
6:55

At under 4 minutes, find is by far the fastest of these
methods.
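
Timings like these could be reproduced with a small harness. The following is only a sketch in Python, not what was used for the numbers above: the helper name time_command is made up for illustration, and stderr is discarded on the assumption that permission errors are not of interest.

```python
import subprocess
import time


def time_command(cmd, out_path):
    """Run cmd with stdout redirected to out_path; return wall-clock seconds."""
    start = time.monotonic()
    with open(out_path, "wb") as out:
        # stderr is discarded so "permission denied" noise does not reach the terminal.
        subprocess.run(cmd, stdout=out, stderr=subprocess.DEVNULL, check=False)
    return time.monotonic() - start


def as_min_sec(seconds):
    """Format a duration in seconds as M:SS, the format used in the list above."""
    return f"{int(seconds // 60)}:{int(seconds % 60):02d}"
```

For example, as_min_sec(time_command(["find", "/"], "/tmp/find.dirtree")) would correspond to the first measurement.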

Why am I not using updatedb in a cron job? Because
1. the data is needed for a backup and thus needs to be current.
2. updatedb is annoying when it runs in the background and I have some
   actual work to do (the system is not running at night, so a nightly 
   cron job is not an option).
3. I usually don't need locate. If I need to quickly find something
   that's not in my home directory, then often I can find it by grepping
   the package database (Slackware).

So, I guess I'll stick with find. I was hoping for a tool, though, that
is optimized for ReiserFS.

Felix

PS: To contact me off list don't reply but send mail to felix.klee at
the domain inka.de. Otherwise your email to me might get automatically
deleted!


Re: Quicker alternative to find /?

2004-08-16 Thread Christian Mayrhuber
On Monday 16 August 2004 14:04, Felix E. Klee wrote:
 On Sun, 15 Aug 2004 19:00:41 -0500 didn't post to the list wrote:
  In that case, you may want to try a recursive ls.  But for the same
  format as find, try using locate with updatedb on a cron job.

 Thanks for the hints. I did some benchmarks:

 find / > /tmp/find.dirtree:
 3:44 (1st run)
 3:47 (repeated run)
 updatedb -u --output=/tmp/updatedb.dirtree:
 8:51
 8:47
 ls -R / > /tmp/ls.dirtree:
 6:16
 6:55

 At under 4 minutes, find is by far the fastest of these
 methods.

 Why am I not using updatedb in a cron job? Because
 1. the data is needed for a backup and thus needs to be current.
 2. updatedb is annoying when it runs in the background and I have some
actual work to do (the system is not running at night, so a nightly
cron job is not an option).
 3. I usually don't need locate. If I need to quickly find something
that's not in my home directory, then often I can find it by grepping
the package database (Slackware).

 So, I guess I'll stick with find. I was hoping for a tool, though, that
 is optimized for ReiserFS.

 Felix

 PS: To contact me off list don't reply but send mail to felix.klee at
 the domain inka.de. Otherwise your email to me might get automatically
 deleted!

For a backup?
If you use tar, you can list its contents after the backup.
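
As a rough sketch of that idea (assuming the backup is an ordinary tar archive; the function name list_archive is hypothetical), the member list can be read back with Python's tarfile module, much as tar -tf would print it:

```python
import tarfile


def list_archive(archive_path):
    """Return the member paths stored in a tar archive, in archive order."""
    # "r:*" lets tarfile detect gzip/bzip2/xz compression on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()
```

Writing the returned names to a file, one per line, gives a listing that reflects exactly what went into the backup.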

I use star to back up file system metadata (rights, ACLs, attributes),
because our stupid commercial backup software loses the directory rights
(that's what you have backup software for).
http://freshmeat.net/projects/star/

for backup:
/usr/bin/star -c -M -z -meta artype=exustar f=meta-backup.star.gz /

for restore:
/usr/bin/star -x -z -U -meta f=meta-backup.star.gz

You may want to try it, I doubt that it's faster than find, though.

-- 
lg, Chris



Re: Quicker alternative to find /?

2004-08-16 Thread Christophe Saout
On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:

 I'd like to store the directory structure of a partition formatted as
 ReiserFS into a file. Currently, I use
 
 find / > file
 
 This process takes approximately 5 minutes (the result is 26MB of
 data). Are there any alternative *quicker* ways to do this?

The main problem is that find only uses one thread. This thread only
reads one directory at once and as a result of that you'll get a lot of
seeks.

This can usually be improved *a lot* by doing a massively multi-threaded
search with a lot of threads trying to read a lot of directories at
once. The disk io scheduler will then linearize all the outstanding read
requests.

I've done something similar to speed up a diff -r using a shell script
(not for find but for reading the file content that should be compared).
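
A minimal sketch of the multi-threaded idea in Python, under stated assumptions: the names list_dir and parallel_walk and the default of 32 workers are made up for illustration, symlinks are not followed, and unreadable directories are silently skipped. Whether this actually beats a single-threaded find depends on the filesystem and the I/O scheduler, so it needs to be measured.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def list_dir(path):
    """Return (paths, subdirs) for one directory; unreadable ones yield nothing."""
    paths, subdirs = [], []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                paths.append(entry.path)
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
    except OSError:
        pass
    return paths, subdirs


def parallel_walk(root, workers=32):
    """Collect every path under root, reading up to `workers` directories at once."""
    found = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(list_dir, root)}
        while pending:
            # Wake up as soon as any directory has been read, then queue its children.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                paths, subdirs = future.result()
                found.extend(paths)
                pending.update(pool.submit(list_dir, d) for d in subdirs)
    return found
```

The result comes back in completion order rather than find's order, so it would need sorting before being compared with find output.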





Re: Quicker alternative to find /?

2004-08-16 Thread Chris Mason
On Mon, 2004-08-16 at 08:52, Christophe Saout wrote:
 On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:
 
  I'd like to store the directory structure of a partition formatted as
  ReiserFS into a file. Currently, I use
  
  find / > file
  
  This process takes approximately 5 minutes (the result is 26MB of
  data). Are there any alternative *quicker* ways to do this?
 
 The main problem is that find only uses one thread. This thread only
 reads one directory at once and as a result of that you'll get a lot of
 seeks.
 
 This can usually be improved *a lot* by doing a massively multi-threaded
 search with a lot of threads trying to read a lot of directories at
 once. The disk io scheduler will then linearize all the outstanding read
 requests.
 
 I've done something similar to speed up a diff -r using a shell script
 (not for find but for reading the file content that should be compared).
 

This gets tricky quickly.  Many threads reading many different
directories at once will introduce a lot of seeks, since the directories
are likely to be far apart on disk.  It is far better for the filesystem
to realize a sequential scan of the directory is in progress and do
smarter readahead based on that.  

The latest patches in 2.6.8 for reiser v3 do some of this, triggering
metadata readahead on readdirs.  You can also make things faster by
mounting with -o noatime,nodiratime.

This is one workload where v4 should do better, since the inode data is
close to the directory entry.

-chris




Re: Quicker alternative to find /?

2004-08-16 Thread Spam


 On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:

 I'd like to store the directory structure of a partition formatted as
 ReiserFS into a file. Currently, I use
 
 find / > file
 
 This process takes approximately 5 minutes (the result is 26MB of
 data). Are there any alternative *quicker* ways to do this?

 The main problem is that find only uses one thread. This thread only
 reads one directory at once and as a result of that you'll get a lot of
 seeks.

  I am confused about this in general with most filesystems. I thought
  that all filenames/folder names etc. were stored in one place and not
  spread out over the entire filesystem.

  It seems very strange to me that things like find/ls -R etc. take so
  long just to read/list files like this on any modern filesystem.

 This can usually be improved *a lot* by doing a massively multi-threaded
 search with a lot of threads trying to read a lot of directories at
 once. The disk io scheduler will then linearize all the outstanding read
 requests.

 I've done something similar to speed up a diff -r using a shell script
 (not for find but for reading the file content that should be compared).

Re: Quicker alternative to find /?

2004-08-16 Thread Chris Mason
On Mon, 2004-08-16 at 09:19, Spam wrote:
  On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:
 
  I'd like to store the directory structure of a partition formatted as
  ReiserFS into a file. Currently, I use
  
  find / > file
  
  This process takes approximately 5 minutes (the result is 26MB of
  data). Are there any alternative *quicker* ways to do this?
 
  The main problem is that find only uses one thread. This thread only
  reads one directory at once and as a result of that you'll get a lot of
  seeks.
 
   I am confused about this in general with most filesystems. I thought
   that all filenames/folder names etc. were stored in one place and not
   spread out over the entire filesystem.
 
   It seems very strange to me that things like find/ls -R etc. take so
   long just to read/list files like this on any modern filesystem.

It varies by filesystem.  The filenames and folder names are stored in
one place on almost all filesystems.  But the actual inode information,
which tells you what kind of file it is and how to read it, is
sometimes stored in a different place (v3, ext[23]).

find has to read both sets of information because a recursive find has
to descend into all subdirectories, and the only way it can know if
something is a subdirectory is by reading the inode.  There is an
optimization that ext[23] use to store the mode information (identifying
things as a file or dir) in the directory listing.  reiserfs v3 doesn't
have this in the disk format, not sure if v4 does or not.
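
To illustrate the difference, here is a hedged sketch in Python (the function name walk_names is made up). On Linux, os.scandir takes the entry type from the d_type field that readdir returns, so is_dir() needs no extra system call when the filesystem fills that field in. When the filesystem reports the type as unknown, is_dir() falls back to a stat-style call, which is the inode read described above.

```python
import os


def walk_names(root):
    """Yield every path under root, depth-first, without following symlinks."""
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Uses d_type from the directory entry when the filesystem
                    # provides it; otherwise this triggers a stat on the inode.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        yield entry.path
        except OSError:
            # Unreadable directory: keep its name, skip its contents.
            continue
```

Running this under strace on different filesystems should show whether the extra stat calls occur, though that is something to check rather than assume.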

Different directories are likely to be stored in different areas of the
disk.  So, a multithreaded find that tries to read multiple dirs at once
is likely to introduce more seeks.

-chris