File system for large directories?

2007-04-21 Thread Amos Shapira

Hi,

Our servers have to deal with huge amounts of small files (tens, sometimes
hundreds of thousands of files IN ONE DIRECTORY).

Currently they use ext3, but I wonder whether this is the preferred FS.

I used to be fond of ReiserFS v3 until I got bitten by it not recovering
from a partition resizing exercise.

Trying to find the answer on the net I found:

http://librenix.com/?inode=3296 (Circa 2003, recommends ReiserFS v4, which
isn't in the mainstream kernel yet).

and

http://www.debian-administration.org/articles/388 (Circa 2006, recommends
XFS).

The latter compared handling of large trees (i.e. not necessarily a single
directory with lots of files in it).

Does anyone have good and up-to-date recommendations for such a situation?

The files are e-mail messages which are written, transferred, then deleted.

I'm also thinking about better ways to handle the files (e.g. putting every
few thousand of them in a .zip file for transfer, spreading them across a
two-level directory tree, etc.), but I'd rather keep the changes to the
existing software and scripts to the minimum required to speed things up.
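For the two-level tree idea, this is roughly what I have in mind (a Python
sketch only; the paths, the md5 hashing and the 256-way fan-out are
illustrative assumptions, nothing in our current scripts works this way):

    import hashlib
    import os
    import shutil

    def hashed_path(base, filename):
        # Two hex characters per level gives 256 x 256 buckets, so even a few
        # million files leave only a handful per leaf directory.
        digest = hashlib.md5(filename.encode()).hexdigest()
        target_dir = os.path.join(base, digest[:2], digest[2:4])
        os.makedirs(target_dir, exist_ok=True)
        return os.path.join(target_dir, filename)

    def spread_flat_directory(flat_dir, base):
        # Move every regular file from the flat directory into the hashed layout.
        for name in os.listdir(flat_dir):
            src = os.path.join(flat_dir, name)
            if os.path.isfile(src):
                shutil.move(src, hashed_path(base, name))

Whether this helps obviously depends on how much the single huge directory,
rather than ext3 itself, is the bottleneck.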

Thanks,

--Amos


Re: File system for large directories?

2007-04-21 Thread Marc A. Volovic
Quoth Amos Shapira:

 Hi,
 
 Our servers have to deal with huge amounts of small files (tens,
 sometimes hundreds of thousands of files IN ONE DIRECTORY).
 
 Currently they use ext3, but I wonder whether this is the preferred FS.

Ext3 is - last I checked (about two years ago) - possibly the worst
filesystem for dealing with LOTS of files in a single directory. Reiser 3
was very good (did not try reiser 4).

However, I am very wary of reiser now - what with poor (or, maybe, not so
poor) Hans being in jail, reiserfs may be going the way of the dodo.

I'd run bonnie (just the creation/deletion tests) for JFS, XFS and Ext4
(which is starting to make an appearance here and there). IIRC - XFS is
ALSO not very good with lots of small files.
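
For the small-file tests specifically, something like this should isolate
them (flags quoted from memory, so check the man page before trusting them):

    bonnie++ -d /mnt/test -s 0 -n 128:8192:1024:1 -u nobody

-s 0 skips the large-file throughput tests, -n 128:8192:1024:1 asks for
128*1024 files of 1-8 KB in a single directory for the create/stat/delete
phases, and -u is only needed when running as root.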

 I'm also thinking about better ways to handle the files (e.g. putting every
 few thousand of them in a .zip file for transfer, spreading them across a
 two-level directory tree, etc.), but I'd rather keep the changes to the
 existing software and scripts to the minimum required to speed things up.

B-sort em? Switch the back-end to database (assuming the blobs are small)?

-- 
---MAV
Marc A. Volovic [EMAIL PROTECTED]
Swiftouch, LTD +972-544-676764



Re: File system for large directories?

2007-04-21 Thread Gil Freund

On 4/21/07, Amos Shapira [EMAIL PROTECTED] wrote:

Hi,

Our servers have to deal with huge amounts of small files (tens, sometimes
hundreds of thousands of files IN ONE DIRECTORY).


Do you access them locally or remotely? If remotely, how?



Currently they use ext3, but I wonder whether this is the preferred FS.


I have found ReiserFS outperformed ext3 on a similar site; this, however,
was made irrelevant, as access was done mainly via NFS.



I used to be fond of ReiserFS v3 until I got bitten by it not recovering
from a partition resizing exercise.


Resizing a partition is not a good indicator. There are too many other
factors involved.



Trying to find the answer on the net I found:

http://librenix.com/?inode=3296 (Circa 2003, recommends ReiserFS v4, which
isn't in the mainstream kernel yet).

and

http://www.debian-administration.org/articles/388 (Circa
2006, recommends XFS).

The latter compared handling of large trees (i.e. not necessarily a single
directory with lots of files in it).

Does anyone have good and up-to-date recommendations for such a situation?

The files are e-mail messages which are written, transferred, then deleted.

I'm also thinking about better ways to handle the files (e.g. putting every
few thousand of them in a .zip file for transfer, spreading them across a
two-level directory tree, etc.), but I'd rather keep the changes to the
existing software and scripts to the minimum required to speed things up.


Benchmark your own environment. Hardware specs (RAID, RAM, CPU, etc.)
can tilt the results. Repeat the benchmarks about 3-5 times. I like
Bonnie++. Look for the results that best match your environment (Read,
Write, Create, etc.).
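
If Bonnie++ turns out not to match your workload closely enough, a quick and
dirty timer for the write-then-delete cycle is easy to knock up too (a Python
sketch; the count, size and path are placeholders, put in your real numbers):

    import os
    import time

    def time_small_files(directory, count=100000, size=4096):
        # Create `count` files of `size` bytes in one directory, then delete
        # them, timing each phase separately.
        payload = b"x" * size
        os.makedirs(directory, exist_ok=True)

        start = time.time()
        for i in range(count):
            with open(os.path.join(directory, "msg%06d" % i), "wb") as f:
                f.write(payload)
        create_secs = time.time() - start

        start = time.time()
        for i in range(count):
            os.unlink(os.path.join(directory, "msg%06d" % i))
        delete_secs = time.time() - start

        print("create: %.1fs  delete: %.1fs" % (create_secs, delete_secs))

    time_small_files("/mnt/test/manyfiles")  # path is just an example

Run it several times per candidate filesystem on the same hardware, like the
Bonnie++ runs.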



Thanks,

--Amos




--
Gil Freund, Systems Analyst
---
Sysnet consulting
[EMAIL PROTECTED],  http://www.sysnet.co.il
voice: +972-54-2035888, Fax: +972-8-9356026



Re: File system for large directories?

2007-04-21 Thread Amos Shapira

On 22/04/07, Marc A. Volovic [EMAIL PROTECTED] wrote:


Quoth Amos Shapira:

 Hi,

 Our servers have to deal with huge amounts of small files (tens,
 sometimes hundreds of thousands of files IN ONE DIRECTORY).

 Currently they use ext3, but I wonder whether this is the preferred FS.

Ext3 is - last I checked (about two years ago) - possibly the worst
filesystem for dealing with LOTS of files in a single directory. Reiser 3
was very good (did not try reiser 4).

However, I am very wary of reiser now - what with poor (or, maybe, not so
poor) Hans being in jail, reiserfs may be going the way of the dodo.



If Reiser3 is already in mainline and stable - wouldn't it be supported even
if Hans/Namesys vanishes?
Reiser4 is not relevant because I want to stick to mainline kernels,
preferably Debian-supplied kernels.

I'd run bonnie (just the creation/deletion tests) for JFS, XFS and Ext4

(which is starting to make an appearance here and there). IIRC - XFS is
ALSO not very good with lots of small files.



Will try to do that, though again - if ext4 isn't in the mainline yet then
it's not relevant for me.


 I'm also thinking about better ways to handle the files (e.g. putting every
 few thousand of them in a .zip file for transfer, spreading them across a
 two-level directory tree, etc.), but I'd rather keep the changes to the
 existing software and scripts to the minimum required to speed things up.

B-sort em? Switch the back-end to database (assuming the blobs are small)?



I'm thinking of databases sometimes (the files are around 4k on average) but
it feels like Hans Reiser was sort of right about that - a filesystem can be
used as a database for this sort of data.
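
If I ever do go the database route, the idea would be something like this (a
hypothetical SQLite sketch; the table and column names are invented, nothing
like this exists in our code):

    import sqlite3

    def store_messages(db_path, messages):
        # messages: iterable of (name, data) pairs, each body a few KB.
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS msgs (name TEXT PRIMARY KEY, body BLOB)")
        with conn:
            conn.executemany(
                "INSERT OR REPLACE INTO msgs (name, body) VALUES (?, ?)",
                messages,
            )
        conn.close()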


Cheers,

--Amos


Re: File system for large directories?

2007-04-21 Thread Amos Shapira

On 22/04/07, Gil Freund [EMAIL PROTECTED] wrote:


On 4/21/07, Amos Shapira [EMAIL PROTECTED] wrote:
 Hi,

 Our servers have to deal with huge amounts of small files (tens, sometimes
 hundreds of thousands of files IN ONE DIRECTORY).

Do you access them locally or remotely? If remotely, how?



They are written locally, then transferred over FTP to Windows machines.
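
That FTP step is where the .zip batching idea I mentioned would come in -
roughly like this (a Python sketch; the batch size and paths are placeholders):

    import os
    import zipfile

    def batch_into_zips(src_dir, out_dir, batch_size=5000):
        # Pack the flat directory into numbered .zip archives of batch_size
        # files each, so the FTP step moves a few large files instead of
        # thousands of small ones.
        names = sorted(n for n in os.listdir(src_dir)
                       if os.path.isfile(os.path.join(src_dir, n)))
        os.makedirs(out_dir, exist_ok=True)
        for batch_no, start in enumerate(range(0, len(names), batch_size)):
            zip_path = os.path.join(out_dir, "batch%04d.zip" % batch_no)
            with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
                for name in names[start:start + batch_size]:
                    zf.write(os.path.join(src_dir, name), arcname=name)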


I used to be fond of ReiserFS v3 until I got bitten by it not recovering
from a partition resizing exercise.

Resizing a partition is not a good indicator. There are too many other
factors involved.



I just became wary of the admin tools available for ReiserFS. It's
wonderful when everything is dandy (and it survived many power failures at my
previous home), but when I needed to do something else, which I hear is
trivial with ext3 for instance, it failed miserably.

Benchmark your own environment. Hardware specs (RAID, RAM, CPU, etc.)
can tilt the results. Repeat the benchmarks about 3-5 times. I like
Bonnie++. Look for the results that best match your environment (Read,
Write, Create, etc.).



It looks like Bonnie++ is what everyone and his dog is using. I'll try to
see how I can do that (there's hardly any headroom in terms of spare hardware
to shift things around).

Cheers,

--Amos