Re: [vchkpw] Hashed domain directories - options

2012-08-09 Thread John Simpson
On 2012-08-08, at 2132, Eric Shubert wrote:
 
> #define MAX_USERS_PER_LEVEL 100
> ...
>
> In an ext3 environment, it could be set (by the admin) to 3 (ext3
> supports 32000 subdirectories), and with ext4 it could be set to 6 (ext4
> supports 64000). These settings would for the most part disable hashed
> directories, while still allowing hashes should the filesystem limits be
> approached. Of course, a default value in dir_control could still be 100,
> which would maintain former behavior. If this were done, the
> --disable-users-big-dir option should probably be changed to
> --allow-single-digit-users as well. ;)
>
> Please let me know what the prospects of such changes are. If it doesn't look
> like anything that might ever happen in this area, I just may patch the
> vauth.h file to be 3 and call it done.

The filesystem's limit on how many entries can exist in a directory is not the 
only issue... the other issue is performance.

On filesystems that store directories as linear lists (ext2, and ext3/4 when
the dir_index feature is turned off), the kernel has to do a linear search on
the contents in order to find a particular file within a directory. It can
take longer to do a linear search across 30K items than it does to search
through 100 entries, open a new directory, and do a second search through 100
entries. This isn't an issue for filesystems which index their directories
with trees (ext3/4 with dir_index, for example) instead of linear lists.
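
To put rough numbers on that comparison, here is a minimal Python sketch that
only counts how many directory entries a linear scan examines. It assumes the
worst case (the target is the last entry in each list) and ignores caching,
disk I/O, and the cost of opening the second directory. The entry names and
the linear_scan_cost helper are made up for illustration.

```python
def linear_scan_cost(names, target):
    """Return how many entries a linear scan examines before finding target.

    If target is absent, every entry was examined, so return len(names).
    """
    for examined, name in enumerate(names, start=1):
        if name == target:
            return examined
    return len(names)


# Flat layout: 30,000 mailboxes in one directory, target is the last entry.
flat = [f"user{i}" for i in range(30000)]
flat_cost = linear_scan_cost(flat, "user29999")

# Two-level layout: scan 100 entries to find the hash directory, then scan
# 100 entries inside it to find the mailbox (again, worst case for both).
top = [f"dir{i}" for i in range(100)]
sub = [f"user{i}" for i in range(100)]
hashed_cost = linear_scan_cost(top, "dir99") + linear_scan_cost(sub, "user99")

print(flat_cost, hashed_cost)
```

Under these assumptions the flat layout examines 30000 entries and the
two-level layout examines 200.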

Personally, I don't build servers without both hashing options enabled. The 
hashing doesn't affect small machines (or small domains) because it doesn't 
kick in until a certain number of domains or mailboxes exist. And if the server 
becomes busy after the fact, the hashing code kicks in when needed and keeps 
mailbox access from being slow.
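
As a rough illustration of "doesn't kick in until a certain number exist",
here is a simplified Python model. It is not vpopmail's actual code: the
hash_subdir function, the ordering of HASH_NAMES, and the restriction to a
single level of hash directories are all assumptions made for the sketch.
Only the value 100 comes from the #define quoted earlier in the thread.

```python
import string

# Value taken from the "#define MAX_USERS_PER_LEVEL 100" quoted in the thread.
MAX_USERS_PER_LEVEL = 100

# Assumed ordering of single-character hash directory names (hypothetical).
HASH_NAMES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def hash_subdir(mailbox_index, per_level=MAX_USERS_PER_LEVEL):
    """Return the hash subdirectory for the Nth mailbox created (0-based).

    An empty string means the mailbox lives directly in the domain
    directory, i.e. hashing has not kicked in yet.
    """
    if mailbox_index < per_level:
        return ""
    bucket = mailbox_index // per_level - 1
    if bucket >= len(HASH_NAMES):
        raise ValueError("this sketch models only one level of hash directories")
    return HASH_NAMES[bucket]


print(repr(hash_subdir(42)))   # small domain: no hash directory
print(repr(hash_subdir(150)))  # past the threshold: first hash directory
```

In this model a domain with fewer than 100 mailboxes never sees a hash
directory at all, which is the point being made above.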

The scripts that I write which access the mailboxes all use vdominfo or 
vuserinfo (or the qmail virtualdomains and users/assign files, and the 
domain's vpasswd.cdb file) to locate the directories, rather than making 
assumptions about where a particular domain or mailbox might be on the disk. 
This way I'm using the same exact method that qmail uses to deliver mail, so I 
know I'm ending up in the right place.
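
For anyone who wants to do the same from a script, here is a minimal Python
sketch of the users/assign lookup. It assumes the domain has a wildcard
assignment of the form "+domain-:user:uid:gid:homedir:dash:pre:" and that no
field contains a colon. The find_domain_dir function and the sample paths are
hypothetical; running vdominfo or vuserinfo is the more direct route.

```python
def find_domain_dir(assign_lines, domain):
    """Return the home directory field for a domain's wildcard assignment.

    assign_lines is any iterable of lines in qmail users/assign format.
    Returns None if the domain has no "+domain-:" entry.
    """
    prefix = "+" + domain + "-:"
    for raw in assign_lines:
        line = raw.rstrip("\n")
        if line == ".":
            # A line holding a single dot ends the assign file.
            break
        if line.startswith(prefix):
            # Fields after the leading "+": loc, user, uid, gid, homedir, ...
            fields = line[1:].split(":")
            return fields[4]
    return None


# Sample data for illustration only; the paths are made up.
sample = [
    "+example.com-:example.com:89:89:/home/vpopmail/domains/example.com:-::\n",
    "+example.org-:example.org:89:89:/home/vpopmail/domains/0/example.org:-::\n",
    ".\n",
]
print(find_domain_dir(sample, "example.org"))
```

The second sample line shows why guessing the path is risky: with domain
hashing enabled, the directory may sit under a hash directory.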

If I'm not mistaken, the limitation on single-character mailbox names has
something to do with how the hashing is implemented. The hash directories all
have single-digit or single-letter names, and if a mailbox exists with the same
name, it causes problems (or at least confusion). Personally, I always thought
they should have given the hash directories names which aren't used in SMTP
addresses, like ,0 or ,a, but that's not how it was originally written.
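
The name clash can be sketched in a few lines of Python. This is only an
illustration of the idea, not vpopmail's logic: the set of hash directory
names (one digit or one ASCII letter) is taken from the description above,
and both helper functions are hypothetical.

```python
import string

# Assumed set of hash directory names: a single digit or single ASCII letter.
HASH_DIR_NAMES = set(string.digits + string.ascii_letters)


def collides_with_hash_dir(mailbox):
    """True if a mailbox directory name could be mistaken for a hash directory."""
    return mailbox in HASH_DIR_NAMES


def comma_prefixed(name):
    """The alternative naming floated above: prefix hash directories with a comma."""
    return "," + name


for candidate in ["a", "7", "ab", "postmaster"]:
    print(candidate, collides_with_hash_dir(candidate))

# A comma-prefixed hash directory name never equals a one-character mailbox.
print(collides_with_hash_dir(comma_prefixed("a")))
```

With comma-prefixed names, a mailbox called "a" and a hash directory called
",a" could sit side by side without ambiguity.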


| John M. Simpson  --  KG4ZOW  --  Programmer At Large |
| http://www.jms1.net/ j...@jms1.net |





Re: [vchkpw] Hashed domain directories - options

2012-08-09 Thread Matt Brookings

John has basically said everything I was going to :)  The only thing I would
mention is that 5.4.32 and 5.4.33 both include changes that re-populate old
hash directories that have been thinned out by user deletions.  It's the
backfill feature.
-- 
/*
Matt Brookings m...@inter7.com   GnuPG Key 5F3258AD
Software developer Systems technician
Inter7 Internet Technologies, Inc. (815)776-9465
*/