Since this problem is somewhat of a priority to us, I have started to implement the delete-at-once functionality. It works, but I still need to implement read/write locks, because during execution of the doall() method we can't afford an insert.
Do you guys have a good testing method? I'm looking forward to your opinions on this solution. If I'm not responding to email the next few days, that's because I'm busy cooking and celebrating Christmas :-)

Steven

-----Original Message-----
From: Steven van der Vegt [mailto:[email protected]]
Sent: Thursday 23 December 2010 12:15
To: [email protected]
Subject: [Pound Mailing List] RE: Website stalls every 60 seconds

Hi Joe,

Thanks for your quick and careful response! A few things that came to mind:

> I guess what you're really looking for is a read/write locking system.

That's something I had in mind too. This would be a speedup in general, since the read threads would no longer be waiting on each other.

> Maybe instead of calling delete session it could create a temporary array or
> linked list to save the nodes that need to be deleted and then delete them in
> one shot...

Great suggestion. Something like a static array of n elements: when the array is full, lock -> delete -> unlock. If any unchecked elements are left, continue checking and deleting.

> What concerns me more efficiency-wise is the method of deletion... Since the
> hashtable is a black box, we can't remove the in-place node... the tree
> traversal has already located the node we want to remove, and once we do our
> comparison, we have to do lh_delete which will go through the table and
> locate it all over again. Depending on how good the hash is and how many
> collisions you have that could be significant.

That is indeed a waste of resources, but given the black-box principle I don't see a way around it, short of taking over some of lhash's responsibilities, which is somewhat "not done". Then again, the lhash code hasn't changed in 12 years or so...

> It could mean that in your case t_hash isn't returning significantly
> different hash values.

I will dig into that. For now the standard lhash lh_strhash function is used.
If the numbers suggest a lot of hash collisions take place, would MD5 be a good alternative?

> What type of data is in the cookie? Is it something like jsessionid? How
> long is your TTL?

The cookie values are pretty basic; they only contain hash keys. The TTL is set to 20 minutes.

> We use pound on high volume sites and haven't had any issues... Then again,
> my weakest box (on my test sites) is a dual processor 2.8GHZ w/ 4gb ram.
> Production has more horsepower.

For now we have moved Pound onto a ~3GHz box. That solves the problem, but in my opinion it's a waste of iron. Would you all be interested if I looked into the delete-all-at-once approach and the readers-writer lock?

Kind regards,
Steven

-----Original Message-----
From: Joe Gooch [mailto:[email protected]]
Sent: Wednesday 22 December 2010 15:50
To: [email protected]
Subject: [Pound Mailing List] RE: Website stalls every 60 seconds

I believe the hashtable implementation is from the OpenSSL library, so the implementation isn't part of Pound. From my quick look, it looks like the lhash nodes use a linked list to hold the hashtable collisions, which would need a lock on insert and delete because the structure changes.

I guess what you're really looking for is a read/write locking system (i.e. t_expire does a read lock, new sessions try a write lock and fail until the read lock is done, but multiple read locks can be in place at a time), similar to Java's ReadWriteLock. (It looks like pthreads has rwlocks in addition to mutexes.)

Maybe instead of calling delete session it could create a temporary array or linked list to save the nodes that need to be deleted and then delete them in one shot... That would require a temporary malloc or heap space. But even if it was a static array, t_expire is only going to run in a single thread, one at a time, and if it didn't delete every expired session every time, that wouldn't be so bad; it's just going to run again anyway.
What concerns me more efficiency-wise is the method of deletion... Since the hashtable is a black box, we can't remove the in-place node... the tree traversal has already located the node we want to remove, and once we do our comparison, we have to do lh_delete, which will go through the table and locate it all over again. Depending on how good the hash is and how many collisions you have, that could be significant. It could mean that in your case t_hash isn't returning significantly different hash values. It'd probably require digging into the lhash stats numbers (i.e. from OpenSSL):

"If you are interested in performance the field to watch is num_comp_calls. The hash library keeps track of the 'hash' value for each item so when a lookup is done, the 'hashes' are compared, if there is a match, then a full compare is done, and hash->num_comp_calls is incremented. If num_comp_calls is not equal to num_delete plus num_retrieve it means that your hash function is generating hashes that are the same for different values. It is probably worth changing your hash function if this is the case because even if your hash table has 10 items in a 'bucket', it can be searched with 10 unsigned long compares and 10 linked list traverses. This will be much less expensive than 10 calls to your compare function."

Also interesting:

"When doing this, be careful if you delete entries from the hash table in your callbacks: the table may decrease in size, moving the item that you are currently on down lower in the hash table - this could cause some entries to be skipped during the iteration. The second best solution to this problem is to set hash->down_load before you start (which will stop the hash table ever decreasing in size). The best solution is probably to avoid deleting items from the hash table inside a ``doall'' callback!"

Pound does set down_load to 0, so it does do that... Saving a list of nodes in a temporary array and then deleting them in one shot would fix this.
What type of data is in the cookie? Is it something like jsessionid? How long is your TTL? You could set your EXPIRE_TO higher... though that seems to me like only a stopgap.

We use pound on high volume sites and haven't had any issues... Then again, my weakest box (on my test sites) is a dual processor 2.8GHz w/ 4GB RAM. Production has more horsepower.

Joe

> -----Original Message-----
> From: Steven van der Vegt [mailto:[email protected]]
> Sent: Wednesday, December 22, 2010 8:50 AM
> To: [email protected]
> Subject: [Pound Mailing List] Website stalls every 60 seconds
>
> Hello list,
>
> We are an ISP and happy Pound users! But on one system we are experiencing
> some problems. On this system Pound is configured to use cookies for
> session tracking. It is installed on a pfSense firewall (FreeBSD),
> which isn't a very powerful device (Celeron M 1GHz, 512MB). We noticed
> that the website seems to stall for 10 seconds every minute. After
> looking through the Pound code I noticed the do_expire function, which
> is called every 60 seconds. It runs over all sessions in the hash and
> checks whether they are expired; if so, it deletes them. This takes
> almost 10 seconds, and since the hashtable is locked, no lookups or
> inserts are permitted. There are ~3000 sessions in the table and Pound
> is handling ~100 hits/s.
> Obviously this is a locking problem. The first thing that comes to mind
> is: drop the non-threadsafe hashtable. But since this structure is a
> big part of the code, that is not a fast-to-implement solution. Maybe we
> can sharpen up the locking policy: instead of locking the complete table
> during t_expire, only lock it when actually deleting a node
> (in t_old). I studied the code, but I'm not sure whether that is a
> thread-safe solution. What do you think?
> I wonder, are we the only ones experiencing these troubles? Is this a
> known problem?
>
> Kind regards,
>
> Steven van der Vegt
> Echelon B.V.
> The Netherlands
>
> --
> To unsubscribe send an email with subject unsubscribe to
> [email protected].
> Please contact [email protected] for questions.
