On 08/09/2012 10:01 AM, Matt Brookings wrote:

On 8/9/2012 9:50 AM, John Simpson wrote:
On 2012-08-08, at 2132, Eric Shubert wrote:


In an ext3 environment, it could be set (by the admin) to 30000 (ext3 supports 
32000 subdirectories), and with ext4 it could be set to 60000 (ext4 supports 
64000). These settings would for the most part disable hashed directories, 
while still allowing hashes should the filesystem limits be approached. Of 
course, a default value in dir_control could still be 100, which would maintain 
former behavior. If this were done, the --disable-users-big-dir option should 
probably be changed to --allow-single-digit-users as well. ;)

Please let me know what the prospects of such changes are. If it doesn't look 
like anything that might ever happen in this area, I just may patch the vauth.h 
file to be 30000 and call it done.

The filesystem's limit on how many entries can exist in a directory is not the 
only issue... the other issue is performance.

On most filesystems (including ext2/3/4), in order to find a particular file 
within a directory, the kernel has to do a linear search on the contents. It 
can take longer to do a linear search across 30K items than it does to search 
through 100 entries, open a new directory, and do a second search through 100 
entries. This isn't an issue for filesystems which implement directories as 
binary trees instead of linear lists.

Recent versions of ext2/3/4 have an option to create a hashed index for directories (tune2fs -O dir_index), as I imagine you are aware. I sincerely doubt that having vpopmail hashing provides any significant benefit beyond that.

From an architectural point of view, I also expect you'd agree that directory hashing belongs better at the filesystem level than in the application code.

Perhaps it's desirable to provide hashing for filesystems (none that I'm aware of) that do not provide directory hashing on their own. That's fine. I'm not suggesting that the capability be removed, only that it be able to be managed more effectively.

There is presently an option to turn this feature off at the user level, which is great (imo). I suppose that the likelihood of having hashed users is greater than that of having hashed domains in most situations, but if the option is appropriate for users, why would it not also be appropriate for domains? I initially wondered if the --disable-users-big-dir option turned off hashing at both levels, which seemed reasonable to me. I verified by examining the code though that this is not the case. So the --disable-users-big-dir option seems reasonable at least.

The scripts that I write which access the mailboxes all use "vdominfo" or 
"vuserinfo" (or the qmail virtualdomains and users/assign files, and the domain's 
vpasswd.cdb file) to locate the directories, rather than making assumptions about where a 
particular domain or mailbox might be on the disk. This way I'm using the same exact method that 
qmail uses to deliver mail, so I know I'm ending up in the right place.

Thanks for this tip. This is the proper way to access this data. I'll look for any qmailtoaster-plus scripts that should be changed and fix them.
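
A minimal sketch of the lookup approach John describes, for scripts that need a mailbox or domain directory. It assumes the vpopmail tools are installed under /home/vpopmail/bin and that the -d flag of vuserinfo and vdominfo prints only the directory; both are assumptions to check against your own install. The function names and the injectable run parameter are hypothetical conveniences, not part of vpopmail.

```python
import subprocess

# Assumption: vpopmail's tools live here; adjust for your installation.
VPOPMAIL_BIN = "/home/vpopmail/bin"


def _run(argv):
    """Run a command and return its standard output as text."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def user_maildir(address, run=_run):
    """Ask vuserinfo where a mailbox lives instead of guessing the hashed path."""
    return run([f"{VPOPMAIL_BIN}/vuserinfo", "-d", address]).strip()


def domain_dir(domain, run=_run):
    """Ask vdominfo for a domain's directory rather than assuming its location."""
    return run([f"{VPOPMAIL_BIN}/vdominfo", "-d", domain]).strip()
```

Because the path comes from vpopmail itself, a script written this way keeps working whether or not hashing has moved the mailbox into a subdirectory.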

If I'm not mistaken, the limitation on single-character mailbox names has something to do with how 
the hashing is implemented. The hash directories all have single-digit or single-letter names, and 
if a mailbox exists with the same name, it causes problems (or at least confusion.) Personally, I 
always thought they should have given the hash directories names which aren't used in SMTP address, 
like ",0" or ",a", but that's not how it was originally written.

I agree.
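
To illustrate the naming clash John describes, here is a small sketch. It assumes the hash directory names are the 62 ASCII digits and letters, which the thread implies but does not spell out; the collides helper is hypothetical.

```python
import string

# Assumption: the 62 hash directory names are the ASCII digits and letters.
HASH_NAMES = set(string.digits + string.ascii_letters)


def collides(mailbox, prefix=""):
    """True if a mailbox name is identical to some hash directory name."""
    return any(mailbox == prefix + name for name in HASH_NAMES)


print(collides("a"))              # single-letter mailbox vs. hash dir "a"
print(collides("a", prefix=","))  # same mailbox vs. hash dir ",a"
print(collides("alice"))          # ordinary multi-character mailbox
```

With bare names, a mailbox called "a" is indistinguishable from the hash directory "a"; with a "," prefix, no ordinary unquoted mailbox name can match.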

John has basically said everything I was going to :)  The only thing I would
mention is that the 5.4.32 and 5.4.33 both include changes that re-populate old
hash directories that have been made lighter by user deletion.  It's the
"backfill" feature.

I think you both missed a significant part of my post. Let me make my question as direct as I can. Why in the world is max_users_per_level
hard coded in the source (something nearly always best avoided), while its brethren data:
level_cur 0
level_max 3
level_start0 0
level_start1 0
level_start2 0
level_end0 61
level_end1 61
level_end2 61
level_mod0 0
level_mod1 2
level_mod2 4
level_index0 0
level_index1 0
level_index2 0
all live in the dom_89 record of the dir_control table?

I think it's great, however dangerous, that the directory hashing has parameters which are so easily changed. I'm simply wondering, what would be the problem (if any) with making max_users_per_level a field in the same record, instead of it being hard coded in a header file?

The benefit of doing so is quite significant (more so than application hashing on top of a filesystem hashing certainly). Most significantly, it allows the administrator to tune the point at which application hashing kicks in. Being able to tune things in this way is precisely what I'm looking to be able to do. To be honest, I'm aiming at turning vpopmail's directory hashing off, but I think that less drastic measures would have value as well (IOW, letting it kick in at a higher level).
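
A rough sketch of what the proposed lookup might look like, under stated assumptions: the record is represented as the plain "name value" listing shown above (the real dir_control storage may differ), and max_users_per_level is the proposed new field, which does not exist in the record today. The fallback of 100 is the default mentioned earlier in this thread.

```python
# The default that, per the thread, would maintain former behavior.
DEFAULT_MAX_USERS_PER_LEVEL = 100


def parse_dir_control(text):
    """Parse 'name value' lines, as listed in the post, into a dict of ints."""
    record = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            record[parts[0]] = int(parts[1])
    return record


def max_users_per_level(record):
    """Hypothetical lookup: per-record value if present, else the old default."""
    return record.get("max_users_per_level", DEFAULT_MAX_USERS_PER_LEVEL)
```

Existing records without the new field would behave exactly as before, while an administrator could raise the value (say, to 30000 on ext3) to postpone application-level hashing.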

Another benefit is that, in theory, the limit of the number of domains and users could be increased astronomically, perhaps to the actual limits of any filesystem. In other words, instead of 100 + (62 * 100) + (62 * 62 * 100) + (62 * 62 * 62 * 100) = over 24 million domains (or users per domain), there could be over 30000 + (62 * 30000) + (62 * 62 * 30000) + (62 * 62 * 62 * 30000) of either/both on ext3. Ext4 does over 60000 where I've used 30000 for ext3. I'll let someone else do the math (scientific notation would be handy in this case).
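
For anyone who wants the math done, the sums above can be evaluated with a short sketch. The capacity function and its parameter names are illustrative only; the fan-out of 62 and the three hash levels come from the figures in this post.

```python
def capacity(per_level, fanout=62, max_levels=3):
    """Total entries when each directory at levels 0..max_levels holds per_level entries."""
    return sum(per_level * fanout**depth for depth in range(max_levels + 1))


for limit in (100, 30000, 60000):
    total = capacity(limit)
    print(f"{limit:>6} per level -> {total:,} ({total:.3e})")
```

This works out to 24,223,500 at 100 per level, 7,267,050,000 (about 7.3e9) at 30000, and 14,534,100,000 (about 1.5e10) at 60000.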

Just one more question. If there's no problem or objection with doing this, which version of vpopmail would you suggest I use to write a patch to accomplish this? 5.4.33 (I would prefer) or a 5.5 version?

Thanks for your understanding and attention.

-Eric 'shubes'

