Re: Quicker alternative to find /?
On Sun, 15 Aug 2004 19:00:41 -0500 didn't post to the list wrote:

> In that case, you may want to try a recursive ls. But for the same format as find, try using locate with updatedb on a cron job.

Thanks for the hints. I did some benchmarks:

  find / > /tmp/find.dirtree:                  3:44 (1st run)   3:47 (repeated run)
  updatedb -u --output=/tmp/updatedb.dirtree:  8:51             8:47
  ls -R / > /tmp/ls.dirtree:                   6:16             6:55

At less than 4 minutes, find is by far the fastest of these methods.

Why am I not using updatedb in a cron job? Because

1. the data is needed for a backup and thus needs to be current.
2. updatedb is annoying when it runs in the background and I have some actual work to do (the system is not running at night, so a nightly cron job is not an option).
3. I usually don't need locate. If I need to quickly find something that's not in my home directory, then often I can find it by grepping the package database (Slackware).

So, I guess I'll stick with find. I was hoping for a tool, though, that is optimized for ReiserFS.

Felix

PS: To contact me off list don't reply but send mail to felix.klee at the domain inka.de. Otherwise your email to me might get automatically deleted!
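For anyone wanting to repeat this kind of comparison, the following is a minimal sketch of a timing harness, not the procedure Felix actually used. The helper names time_command and benchmark are made up for illustration, and the sketch assumes that find and ls are on the PATH. It covers only find and ls -R, since updatedb options differ between implementations.

```python
import os
import subprocess
import time


def time_command(argv, output_path):
    """Run argv with stdout redirected to output_path; return elapsed seconds."""
    start = time.perf_counter()
    with open(output_path, "wb") as out:
        # stderr is discarded so "permission denied" noise does not reach the terminal
        subprocess.run(argv, stdout=out, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def benchmark(root, out_dir, repeats=2):
    """Time each listing command `repeats` times (hypothetical helper)."""
    commands = {"find": ["find", root], "ls": ["ls", "-R", root]}
    results = {}
    for name, argv in commands.items():
        out = os.path.join(out_dir, name + ".dirtree")
        results[name] = [time_command(argv, out) for _ in range(repeats)]
    return results
```

Calling benchmark("/", "/tmp") would time two runs of each command, which also shows whether a repeated run benefits from cached metadata.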
Re: Quicker alternative to find /?
On Monday 16 August 2004 14:04, Felix E. Klee wrote:

> On Sun, 15 Aug 2004 19:00:41 -0500 didn't post to the list wrote:
>
>> In that case, you may want to try a recursive ls. But for the same format as find, try using locate with updatedb on a cron job.
>
> Thanks for the hints. I did some benchmarks:
>
>   find / > /tmp/find.dirtree:                  3:44 (1st run)   3:47 (repeated run)
>   updatedb -u --output=/tmp/updatedb.dirtree:  8:51             8:47
>   ls -R / > /tmp/ls.dirtree:                   6:16             6:55
>
> At less than 4 minutes, find is by far the fastest of these methods.
>
> Why am I not using updatedb in a cron job? Because
>
> 1. the data is needed for a backup and thus needs to be current.
> 2. updatedb is annoying when it runs in the background and I have some actual work to do (the system is not running at night, so a nightly cron job is not an option).
> 3. I usually don't need locate. If I need to quickly find something that's not in my home directory, then often I can find it by grepping the package database (Slackware).
>
> So, I guess I'll stick with find. I was hoping for a tool, though, that is optimized for ReiserFS.
>
> Felix
>
> PS: To contact me off list don't reply but send mail to felix.klee at the domain inka.de. Otherwise your email to me might get automatically deleted!

For a backup? If you use tar, you can list its contents after the backup.

I use star to back up file system metadata (rights, ACLs, attributes), because our stupid commercial backup software loses the directory rights (that's what you have backup software for).

http://freshmeat.net/projects/star/

For backup:

  /usr/bin/star -c -M -z -meta artype=exustar f=meta-backup.star.gz /

For restore:

  /usr/bin/star -x -z -U -meta f=meta-backup.star.gz

You may want to try it; I doubt that it's faster than find, though.

-- 
Regards, Chris
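The suggestion to list the archive after the backup, instead of walking the filesystem a second time, can be sketched as follows. This is only an illustration using Python's standard tarfile module. The function name list_archive is made up, and the sketch assumes a plain (optionally compressed) tar archive; whether a star archive written with -meta artype=exustar can be read this way has not been checked.

```python
import tarfile


def list_archive(archive_path):
    """Return the member paths stored in a tar archive, in archive order."""
    # mode "r:*" lets tarfile detect gzip/bzip2/xz compression by itself
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()
```

Writing the returned names to a file, one per line, gives roughly the same information as the find output, and it is guaranteed to match what the backup actually contains.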
Re: Quicker alternative to find /?
On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:

> I'd like to store the directory structure of a partition formatted as ReiserFS into a file. Currently, I use
>
>   find / > file
>
> This process takes approximately 5 minutes (the result is 26MB of data). Are there any alternative *quicker* ways to do this?

The main problem is that find only uses one thread. This thread only reads one directory at a time, and as a result you'll get a lot of seeks.

This can usually be improved *a lot* by doing a massively multi-threaded search, with a lot of threads trying to read a lot of directories at once. The disk I/O scheduler will then linearize all the outstanding read requests.

I've done something similar to speed up a diff -r using a shell script (not for find, but for reading the file content that should be compared).
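A minimal sketch of the multi-threaded idea, assuming Python's standard thread pool, is shown below. It is not the shell script mentioned above, and the names list_directory and parallel_walk as well as the default of 32 workers are made up for illustration. All directories of one tree level are read concurrently, so several read requests are outstanding at once. Whether this is actually faster depends on the filesystem and on the disk layout.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_directory(path):
    """Return (entries, subdirs) for one directory; errors give a partial result."""
    entries, subdirs = [], []
    try:
        with os.scandir(path) as it:
            for entry in it:
                entries.append(entry.path)
                # symlinks to directories are listed but not followed
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
    except OSError:
        pass  # e.g. permission denied, or the directory vanished meanwhile
    return entries, subdirs


def parallel_walk(root, workers=32):
    """List everything below root, reading each level's directories concurrently."""
    found = [root]
    level = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while level:
            next_level = []
            # one task per directory of the current level
            for entries, subdirs in pool.map(list_directory, level):
                found.extend(entries)
                next_level.extend(subdirs)
            level = next_level
    return found
```

The level-by-level design avoids tasks waiting on their own subtasks inside the pool, at the cost of a synchronization point after every level.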
Re: Quicker alternative to find /?
On Mon, 2004-08-16 at 08:52, Christophe Saout wrote:

> On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:
>
>> I'd like to store the directory structure of a partition formatted as ReiserFS into a file. Currently, I use
>>
>>   find / > file
>>
>> This process takes approximately 5 minutes (the result is 26MB of data). Are there any alternative *quicker* ways to do this?
>
> The main problem is that find only uses one thread. This thread only reads one directory at a time, and as a result you'll get a lot of seeks.
>
> This can usually be improved *a lot* by doing a massively multi-threaded search, with a lot of threads trying to read a lot of directories at once. The disk I/O scheduler will then linearize all the outstanding read requests.
>
> I've done something similar to speed up a diff -r using a shell script (not for find, but for reading the file content that should be compared).

This gets tricky quickly. Many threads reading many different directories at once will introduce a lot of seeks, since the directories are likely to be far apart on disk. It is far better for the filesystem to realize that a sequential scan of the directory is in progress and do smarter readahead based on that.

The latest patches in 2.6.8 for reiser v3 do some of this, triggering metadata readahead on readdirs. You can also make things faster by mounting with -o noatime,nodiratime.

This is one workload where v4 should do better, since the inode data is close to the directory entry.

-chris
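To check whether a filesystem is already mounted with these options, one could parse the mount table. The sketch below assumes text in the format of /proc/mounts on Linux (device, mount point, type, comma-separated options, two numeric fields); the function name atime_flags is made up for illustration.

```python
def atime_flags(mountpoint, mounts_text):
    """Report noatime/nodiratime for mountpoint, or None if it is not listed."""
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            options = set(fields[3].split(","))  # a later matching line overrides
    if options is None:
        return None
    return {
        "noatime": "noatime" in options,
        "nodiratime": "nodiratime" in options,
    }
```

On a Linux system, passing the contents of /proc/mounts as mounts_text and "/" as mountpoint would report the current state for the root filesystem.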
Re: Quicker alternative to find /?
>> On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:
>>
>> I'd like to store the directory structure of a partition formatted as ReiserFS into a file. Currently, I use
>>
>>   find / > file
>>
>> This process takes approximately 5 minutes (the result is 26MB of data). Are there any alternative *quicker* ways to do this?
>
> The main problem is that find only uses one thread. This thread only reads one directory at a time, and as a result you'll get a lot of seeks.

I am confused about this in general with most filesystems. I thought that all file names, folder names, etc. were stored in one place and not spread out over the entire filesystem. It seems very strange to me that things like find or ls -R take so long just to read and list files like this on any modern filesystem.

> This can usually be improved *a lot* by doing a massively multi-threaded search, with a lot of threads trying to read a lot of directories at once. The disk I/O scheduler will then linearize all the outstanding read requests.
>
> I've done something similar to speed up a diff -r using a shell script (not for find, but for reading the file content that should be compared).
Re: Quicker alternative to find /?
On Mon, 2004-08-16 at 09:19, Spam wrote:

>>> On Sunday, 15.08.2004, at 23:16 +0200, Felix E. Klee wrote:
>>>
>>> I'd like to store the directory structure of a partition formatted as ReiserFS into a file. Currently, I use
>>>
>>>   find / > file
>>>
>>> This process takes approximately 5 minutes (the result is 26MB of data). Are there any alternative *quicker* ways to do this?
>>
>> The main problem is that find only uses one thread. This thread only reads one directory at a time, and as a result you'll get a lot of seeks.
>
> I am confused about this in general with most filesystems. I thought that all file names, folder names, etc. were stored in one place and not spread out over the entire filesystem. It seems very strange to me that things like find or ls -R take so long just to read and list files like this on any modern filesystem.

It varies by filesystem. The file names and folder names are stored in one place on almost all filesystems. But the actual inode information, which tells you what kind of file it is and how to read the file, is sometimes stored in a different place (reiserfs v3, ext[23]).

find has to read both sets of information, because a recursive find has to descend into all subdirectories, and the only way it can know whether something is a subdirectory is by reading the inode.

There is an optimization that ext[23] use: storing the mode information (identifying things as a file or a directory) in the directory listing. reiserfs v3 doesn't have this in the disk format; I'm not sure whether v4 does or not.

Different directories are likely to be stored in different areas of the disk. So, a multithreaded find that tries to read multiple directories at once is likely to introduce more seeks.

-chris
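The distinction between the type stored in the directory entry and the type stored in the inode is visible from user space. As a minimal sketch, Python's os.scandir answers is_dir() from the type recorded in the directory entry when the filesystem supplies one, and otherwise falls back to a stat call, which is the inode read described above. The function name classify_entries is made up for illustration.

```python
import os


def classify_entries(path):
    """Split the entries of one directory into subdirectories and everything else."""
    subdirs, others = [], []
    with os.scandir(path) as it:
        for entry in it:
            # Answered from the directory entry's type field where available;
            # otherwise this costs an extra stat call per entry.
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.name)
            else:
                others.append(entry.name)
    return sorted(subdirs), sorted(others)
```

A recursive listing built on this only needs the extra inode read on filesystems that do not record the type in the directory entry.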