On Fri, 13 Aug 1999, john smith wrote:

> Hello. I am creating a qmail-based webmail system (please contact me if
> anyone else is working on this too!) and have some questions about
> Maildirs. 

It's been done already.  www.qmail.org has some pointers to a couple of
webmail CGIs that you can use.

> I am concerned with running up against file/inode limits with maildirs.
> However, I have no real knowledge of the underlying file system to base
> this fear on. Can anyone recommend somewhere where I can learn more
> about this? Some of my questions: Does the size of the drive affect the
> maximum number of files? What are the average practical limits?

It has been my experience that the size of an average E-mail message is
somewhere between 4K and 7K.

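The number of inodes is fixed when the filesystem is created, and it
scales with the size of the filesystem.  As long as you format your
filesystem with 4096 bytes per inode, you get about one inode per
average-sized message, so you should run out of disk space before you
run out of inodes.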

This also happens to be how Linux formats ext2fs by default, so apparently
the average size of a file on most UNIXes is also 4K to 7K.  Storing
E-mail messages one per file therefore does not really skew your
filesystem stats.
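
A rough way to check this arithmetic for a given partition is sketched
below.  It is only an estimate under stated assumptions: the function
names are made up for illustration, the 4096-byte block size is an
assumption, and filesystem metadata overhead is ignored.

```python
def inode_budget(fs_bytes, bytes_per_inode=4096):
    """Approximate inode count for a filesystem of fs_bytes, given the
    bytes-per-inode ratio chosen at format time (overhead ignored)."""
    return fs_bytes // bytes_per_inode


def limiting_resource(fs_bytes, bytes_per_inode, avg_msg_bytes, block_size=4096):
    """Return (resource, max_messages): whichever of inodes or disk space
    runs out first when every message is stored as one file.

    block_size is an assumed value; check what your filesystem uses.
    """
    inodes = inode_budget(fs_bytes, bytes_per_inode)
    # Each message occupies a whole number of blocks (ceiling division).
    blocks_per_msg = -(-avg_msg_bytes // block_size)
    by_space = fs_bytes // (blocks_per_msg * block_size)
    if inodes < by_space:
        return "inodes", inodes
    return "space", by_space
```

For example, limiting_resource(4 * 1024**3, 4096, 5000) reports that a
4 GB partition holding 5000-byte messages fills its disk space first,
while a much sparser ratio such as 16384 bytes per inode would make
inodes the limit.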

> However, I still sometimes question my choice. Using Maildirs, I simply
> need to scan a directory for files and then put that list up for the
> webmail user to choose from. However, it seems I would have to open
> each file in the dir to get the header info out. I've considered
> updating a central index file as mail arrives but I don't know how well
> that would work.
> 
> I've also considered just inserting all the mails into a mysql database
> when they arrive. Thoughts?

My webmail CGI creates a cache file that stores the headers of all the
messages in the Maildir.  The cache file gets automatically rebuilt when
new messages arrive.  I compare the timestamps to figure out when I need
to rebuild the cache file.  This works very well: to format the folder
contents, I only need to open the cache file and read it.
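
A minimal sketch of that kind of header cache, in Python, might look
like the following.  This is not the code from my CGI: the cache file
name, the JSON layout, and the function names are all assumptions made
for illustration.  The timestamp is taken before the scan, so a message
that arrives while the scan is running leaves the directory newer than
the cached stamp and forces a rebuild on the next request.

```python
import json
import os
from email.parser import BytesParser

CACHE_NAME = "header-cache.json"  # hypothetical file name, kept in the Maildir root


def maildir_stamp(maildir):
    """Newest modification time (in ns) of the new/ and cur/ directories."""
    return max(os.stat(os.path.join(maildir, sub)).st_mtime_ns
               for sub in ("new", "cur"))


def scan_headers(maildir):
    """Open every message and pull out a few headers (the slow path)."""
    entries = []
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        for name in sorted(os.listdir(folder)):
            with open(os.path.join(folder, name), "rb") as fh:
                msg = BytesParser().parse(fh, headersonly=True)
            entries.append({
                "file": sub + "/" + name,
                "from": str(msg.get("From", "")),
                "subject": str(msg.get("Subject", "")),
            })
    return entries


def load_index(maildir):
    """Return the header list, rebuilding the cache only if the stamp changed."""
    cache_path = os.path.join(maildir, CACHE_NAME)
    stamp = maildir_stamp(maildir)  # taken BEFORE scanning, see note above
    try:
        with open(cache_path) as fh:
            cached = json.load(fh)
        if cached["stamp"] == stamp:
            return cached["entries"]
    except (OSError, ValueError, KeyError):
        pass  # missing or unreadable cache: fall through and rebuild
    entries = scan_headers(maildir)
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump({"stamp": stamp, "entries": entries}, fh)
    os.replace(tmp_path, cache_path)  # atomic rename, readers never see half a file
    return entries
```

The stamp is compared for equality rather than "newer than", so the
check does not depend on the clock of the machine reading the cache.
It can still miss a delivery that lands within the same timestamp tick
as the previous scan on filesystems with coarse timestamps.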

It's been my experience that this approach scales to about a thousand
messages per Maildir.  If you're dealing with more mail than that,
you're better off with a commercial, database-driven solution.  You also
have to be aware, if you choose to write something like this yourself,
that there are some pretty tricky race conditions that can bite you if
you compare timestamps in such a fashion.  Furthermore, if you're using
NFS, you have to keep the clocks on your server and client synched up.
Otherwise the timestamp comparison gives the wrong answer and you end up
with a stale cache.
