Re: [vchkpw] vpopmail development
On Fri, 2009-01-09 at 08:57 -0600, Matt Brookings wrote:
> This would not work because users can be deleted out of the hash tree
> anywhere. It appears your patch assumes a FILO ordering of user
> additions and deletions.

I have not been able to explain it properly. It would be FIFO.

> If the hashes 'a' through 'd' existed, and the 'b' hash directory
> cleared out, your method would fail to backfill correctly.

Let's take an example. Suppose there are:

100 users (with 100 directories) in /var/vpopmail/domains
100 users (with 100 directories) in /var/vpopmail/domains/0
100 users (with 100 directories) in /var/vpopmail/domains/1
100 users (with 100 directories) in /var/vpopmail/domains/2
 50 users (with  50 directories) in /var/vpopmail/domains/3

Now let's say I delete a user who has a directory in
/var/vpopmail/domains/1. The backfill code will put the entry '1' on the
first line of the file dir_control_free. Let's also say that we delete
two users in /var/vpopmail/domains/2. The backfill code in vdeluser will
put the entry '2' twice in the file dir_control_free.

So after deleting 3 users, the file dir_control_free will have 3 lines:

1
2
2

We now have 99 users in /var/vpopmail/domains/1 and 98 users in
/var/vpopmail/domains/2.

The modified vadduser will call a function named backfill() which will
open this file, lock it, pick up the first line, delete that line, and
return its value as user_hash:

#ifdef USERS_BIG_DIR
    /* go into a user hash dir if required */
    if (!(user_hash = backfill(domain))) {
        open_big_dir(domain, uid, gid);
        user_hash = next_big_dir(uid, gid);
        close_big_dir(domain, uid, gid);
    }
    chdir(user_hash);
#endif

Each time backfill() is called it depletes the file dir_control_free by
one line, always returning the first line as the user_hash.
When all lines are depleted, backfill() will return NULL, in which case
the regular dir_control will come into effect again and start from where
it left off earlier.

The advantage of this method is that you can use the find command to
generate the missing directories in dir_control_free to catch up with
the actual dir_control.

Another way to explain this is that while backfill is in operation,
dir_control stops working, and when backfill() gets depleted and stops
working, dir_control starts working again!
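The backfill() routine described above could be sketched roughly as
follows. This is only an illustration of the proposed behavior, not the
actual patch: the function name and the idea come from the post, but the
file-handling details (flock() for locking, a fixed-size buffer for the
remaining lines, a file path passed in directly rather than built from
the domain) are assumptions for the sake of a self-contained example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_HASH_LEN 64
#define MAX_FILE_LEN 4096   /* sketch only: a real implementation would
                               handle arbitrarily large files */

/* Pop the first line from dir_control_free and return it as the hash
 * directory to reuse, or NULL when the file is empty or missing (in
 * which case the normal dir_control logic takes over). */
static char *backfill(const char *free_file)
{
    static char hash[MAX_HASH_LEN];
    char rest[MAX_FILE_LEN];
    size_t n;
    FILE *fp = fopen(free_file, "r+");

    if (fp == NULL)
        return NULL;
    flock(fileno(fp), LOCK_EX);      /* serialize concurrent vadduser runs */

    if (fgets(hash, sizeof(hash), fp) == NULL) {
        flock(fileno(fp), LOCK_UN);
        fclose(fp);
        return NULL;                 /* depleted: dir_control resumes */
    }
    hash[strcspn(hash, "\n")] = '\0';

    /* shift the remaining lines to the front and truncate the file */
    n = fread(rest, 1, sizeof(rest), fp);
    rewind(fp);
    if (n > 0)
        fwrite(rest, 1, n, fp);
    fflush(fp);
    ftruncate(fileno(fp), (off_t)n);

    flock(fileno(fp), LOCK_UN);
    fclose(fp);
    return hash;
}
```

With a dir_control_free containing the three lines from the example
(1, 2, 2), three successive calls return "1", "2", "2", and a fourth
call returns NULL, handing control back to dir_control.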
Re: [vchkpw] vpopmail development
On Monday 12 January 2009 07:48:17 am ISP Lists wrote:
> Can someone please provide a brief discussion as to when a vpopmail
> hashed folder tree becomes big enough to warrant backfilling? Or, is
> big just one concern amongst others such as: rate of deletes and adds,
> filesystem choice... I'm not quite picking up why the backfill is
> important.

Well, I don't know what other people consider too big, but I actually
wrote a backfill patch when I was working at a medium-sized college. We
kept all 62 top-level hash directories on separate partitions, but never
wanted to go to second-level hashes - and with ~1200 adds (incoming
freshmen) and deletes (outgoing seniors) every year, this became an
issue pretty quickly.

The other issue with backfill is that the current implementation makes
it easy to exceed the 100-users-per-hash-dir limit by deleting users
from prior hash dirs and then adding new ones, since the only check for
a new hash dir is total users / 100.

My patch, and the reasons for it, can be found at
http://sourceforge.net/tracker/index.php?func=detail&aid=1619600&group_id=85937&atid=577800.
The main reason it's not currently slated for inclusion is that it's for
the MySQL backend only, and whatever process is used to provide backfill
must be available for all backends.

One last note - the idea of maintaining a list of backfill slots in a
text file is a pretty good one, but it still doesn't address the issue
of not properly calculating the number of users in a directory...

Josh
--
Joshua Megerman
SJGames MIB #5273 - OGRE AI Testing Division
You can't win; You can't break even; You can't even quit the game.
- Layman's translation of the Laws of Thermodynamics
j...@honorablemenschen.com
Re: [vchkpw] vpopmail development
Manvendra Bhangui wrote:
> Now let's say I delete a user who has a directory in
> /var/vpopmail/domains/1. The backfill code will put the entry '1' on
> the first line of the file dir_control_free. So after deleting 3 users,
> the file dir_control_free will have 3 lines:
>
> 1
> 2
> 2
>
> Each time the function backfill() is called it will deplete the file
> dir_control_free by one line and will always return the first line as
> the user_hash. When all lines are depleted, backfill() will return
> NULL, in which case the regular dir_control will come into effect again
> and start from where it left off earlier.

Okay. I can definitely see how this would work. It is a reasonable
solution, and I'd be very interested to see a completed patch against
the CVS head.

The one comment I would make is that it's okay for user deletion to be
an expensive call, since it won't be done nearly as often as queries for
user information. However, depending on the number of users a system has
(for instance, where the hash levels have tripled up), the
dir_control_free file could become very large, and your solution
requires an occasional rewrite of the file. It would be interesting to
see a more efficient method where duplicates (as in your example, the
hash directory 2) could be listed a single time.

Remember that this feature does not yet exist, and that there are
probably many systems with backfilling needs that go back years.
Potentially this patch could hit a system with four levels of hashing
simply because there have been a lot of additions and deletions. If the
backfill patch doesn't take this into consideration, we may need to
consider writing some sort of utility to analyze and clean a system that
is overhashed.

> The advantage of this method is that you can use the find command to
> generate the missing directories in dir_control_free to catch up with
> the actual dir_control.
> Another way to explain this is that while backfill is in operation,
> dir_control stops working, and when backfill() gets depleted and stops
> working, dir_control starts working.

Agreed.

--
/* Matt Brookings m...@inter7.com          GnuPG Key D9414F70
   Software developer                      Systems technician
   Inter7 Internet Technologies, Inc.      (815)776-9465 */
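The duplicate-collapsing idea Matt raises could store one "hash count"
pair per entry instead of repeating a hash name once per freed slot.
The sketch below is purely hypothetical (no such code exists in any
posted patch); the struct name, the fixed-size table, and the helper
record_free_slot() are all invented for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64

/* One dir_control_free entry in the collapsed format. */
struct free_slot {
    char hash[8];   /* hash directory name, e.g. "2" */
    int  count;     /* number of free slots in that directory */
};

/* Record one freed slot.  If the hash is already listed, bump its
 * count instead of appending a duplicate line.  Returns the new
 * number of entries in the table. */
static int record_free_slot(struct free_slot *tab, int n, const char *hash)
{
    int i;
    for (i = 0; i < n; i++) {
        if (strcmp(tab[i].hash, hash) == 0) {
            tab[i].count++;          /* duplicate: collapse into count */
            return n;
        }
    }
    snprintf(tab[n].hash, sizeof(tab[n].hash), "%s", hash);
    tab[n].count = 1;                /* first deletion in this hash dir */
    return n + 1;
}
```

With the three deletions from the earlier example (one in hash 1, two
in hash 2), the table ends up with just two entries: "1 1" and "2 2",
so the file no longer grows linearly with the number of deletions.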
Re: [vchkpw] vpopmail development
ISP Lists wrote:
> Can someone please provide a brief discussion as to when a vpopmail
> hashed folder tree becomes big enough to warrant backfilling? Or, is
> big just one concern amongst others such as: rate of deletes and adds,
> filesystem choice... I'm not quite picking up why the backfill is
> important.

You've got it backwards. Backfilling becomes important when adding
users. vpopmail hashes directories at around 100 users per directory. It
also does this with domain directories. The problem is that the hashing
does not take user removal into account. If you add 1000 users and then
delete the first 500, the hashing leaves empty hash directories behind
and continues to add new ones, rather than re-using previously created
hash directories that are no longer full.

--
/* Matt Brookings m...@inter7.com          GnuPG Key D9414F70
   Software developer                      Systems technician
   Inter7 Internet Technologies, Inc.      (815)776-9465 */
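The add-only behavior Matt describes can be reduced to a toy model.
This is an assumption for illustration, not vpopmail's actual
dir_control code: a single counter of total additions picks the hash
directory, and nothing ever decrements it.

```c
#define USERS_PER_DIR 100

/* dir_control-style counter: tracks additions only, never deletions */
static int total_added = 0;

/* Choose the hash directory for the next new user.  Because deletions
 * never decrement total_added, emptied directories are never reused. */
static int next_hash_dir(void)
{
    return total_added++ / USERS_PER_DIR;
}
```

In this model, 1000 adds fill directories 0 through 9. Deleting the
first 500 users empties directories 0 through 4, yet the very next add
still lands in directory 10, exactly the waste that backfilling is
meant to reclaim.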
Re: [vchkpw] vpopmail development
Joshua Megerman wrote:
> One last note - the idea of maintaining a list of backfill slots in a
> text file is a pretty good one, but it still doesn't address the issue
> of not properly calculating the number of users in a directory...

What are you referring to when you say it doesn't properly calculate the
number of users? The current hashing structure keeps track of additions
only. Once users are removed, it is no longer up to date with correct
user counts. That's what we're addressing with this proposed patch.

--
/* Matt Brookings m...@inter7.com          GnuPG Key D9414F70
   Software developer                      Systems technician
   Inter7 Internet Technologies, Inc.      (815)776-9465 */
Re: [vchkpw] vpopmail development
Matt Brookings wrote:
> Remember that this feature does not yet exist, and that there are
> probably many systems with backfilling needs that go back years.
> Potentially this patch could hit a system with four levels of hashing
> simply because there have been a lot of additions and deletions. If
> the backfill patch doesn't take this into consideration, we may need
> to consider writing some sort of utility to analyze and clean a system
> that is overhashed.

My system would be one of those; here are the stats from just one domain
after 4 years of use. I have been putting off hacking together a Perl
script to move everything around and update the MySQL tables. Honestly,
I cannot say there is any performance hit even with the dirs this messed
up.

[r...@newnfs:/usr/local/scripts/old-scripts]# ./dircheck.sh tls.net
dir 0 - 35    dir 1 - 38    dir 2 - 36    dir 3 - 30    dir 4 - 32
dir 5 - 38    dir 6 - 32    dir 7 - 33    dir 8 - 38    dir 9 - 33
dir A - 31    dir B - 26    dir C - 45    dir D - 32    dir E - 32
dir F - 19    dir G - 36    dir H - 42    dir I - 39    dir J - 30
dir K - 34    dir L - 30    dir M - 33    dir N - 31    dir O - 33
dir P - 26    dir Q - 27    dir R - 24    dir S - 29    dir T - 31
dir U - 32    dir V - 25    dir W - 38    dir X - 45    dir Y - 30
dir Z - 30    dir a - 31    dir b - 11    dir c - 31    dir d - 36
dir e - 3     dir f - 2     dir g - 5     dir h - 64    dir i - 14
dir j - 13    dir k - 13    dir l - 13    dir m - 26    dir n - 17
dir o - 36    dir p - 16    dir q - 17    dir r - 33    dir s - 23
dir t - 30    dir u - 620   dir v - 759

DAve

--
The whole internet thing is sucking the life out of me,
there ain't no pony in there.
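DAve's dircheck.sh itself isn't posted, but the per-hash-dir counts it
prints can be gathered with a small directory walk. The helper below is
a hypothetical sketch of that counting step (the function name and the
idea of counting all non-dot entries in one hash directory are
assumptions, not DAve's script):

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the entries (user directories) inside one hash directory,
 * skipping "." and "..".  Returns -1 if the directory can't be read. */
static int count_entries(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}
```

Calling this once per hash directory (0-9, A-Z, a-z) under a domain's
directory would reproduce the kind of report shown above, making lopsided
trees like the 620/759 entries in dirs u and v easy to spot.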