I believe the hashtable implementation comes from the OpenSSL library, so the 
implementation isn't part of pound.  From my quick look, the lhash nodes use a 
linked list to hold the hashtable collisions, which would need a lock on insert 
and delete because the structure changes.  I think what you're really looking 
for is a read/write locking scheme (i.e. t_expire takes a read lock, new 
sessions try for a write lock and wait until the read locks are released, but 
multiple read locks can be held at a time), similar to Java's ReadWriteLock. 
It looks like pthreads has rwlocks in addition to mutexes.

Maybe instead of calling delete per session, t_expire could collect the nodes 
that need to be deleted into a temporary array or linked list and then delete 
them in one shot... That would require a temporary malloc or some heap space, 
but it could even be a static array: t_expire only runs in a single thread, one 
at a time.  And if it didn't delete every expired session on a given pass, that 
wouldn't be so bad; it's going to run again anyway.

What concerns me more, efficiency-wise, is the method of deletion... Since the 
hashtable is a black box, we can't remove the node in place.  The table 
traversal has already located the node we want to remove, but once we do our 
comparison we have to call lh_delete, which walks the table and locates it all 
over again.  Depending on how good the hash is and how many collisions you 
have, that could be significant.

It could mean that in your case t_hash isn't returning significantly different 
hash values.  It'd probably require digging into the lhash stats numbers (this 
is from the OpenSSL docs):

" If you are interested in performance the field to watch is num_comp_calls. 
The hash library keeps track of the 'hash' value for each item so when a lookup 
is done, the 'hashes' are compared, if there is a match, then a full compare is 
done, and hash->num_comp_calls is incremented. If num_comp_calls is not equal 
to num_delete plus num_retrieve it means that your hash function is generating 
hashes that are the same for different values. It is probably worth changing 
your hash function if this is the case because even if your hash table has 10 
items in a 'bucket', it can be searched with 10 unsigned long compares and 10 
linked list traverses. This will be much less expensive than 10 calls to your 
compare function."

Also interesting:
" When doing this, be careful if you delete entries from the hash table in your 
callbacks: the table may decrease in size, moving the item that you are 
currently on down lower in the hash table - this could cause some entries to be 
skipped during the iteration. The second best solution to this problem is to 
set hash->down_load=0 before you start (which will stop the hash table ever 
decreasing in size). The best solution is probably to avoid deleting items from 
the hash table inside a ``doall'' callback!"

Pound does set down_load to 0, so it already has the second-best solution... 
Saving a list of nodes in a temporary array and then deleting them in one shot 
after the doall callback would cover the best one.

What type of data is in the cookie?  Is it something like jsessionid?  How long 
is your TTL?

You could set your EXPIRE_TO higher... seems to me that's only a stopgap though.

We use pound on high volume sites and haven't had any issues... Then again, my 
weakest box (on my test sites) is a dual processor 2.8 GHz w/ 4 GB RAM.  
Production has more horsepower.

Joe

> -----Original Message-----
> From: Steven van der Vegt [mailto:[email protected]]
> Sent: Wednesday, December 22, 2010 8:50 AM
> To: [email protected]
> Subject: [Pound Mailing List] Website stalls every 60 seconds
>
> Hello list,
>
> We are a ISP and happy Poundusers! But on one system we experience some
> problems. For this system Pound is configured to use cookies for
> session tracking. It is installed on a PF-Sense firewall (FreeBSD)
> which isn't a very powerful device (Celeron M. 1Ghz 512Mb). We notice
> that the website seems to stall every minute for 10 seconds. After
> looking through the Pound code I notice the do_expire function, which
> is called every 60 seconds. It runs over all sessions in the hash and
> checks if they are expired. If so, deletes them. This takes almost 10
> seconds and since the hashtable is locked no lookups and inserts are
> permitted. There are ~3000 sessions in the table and Pound is handling
> ~100 hits/s.
> Obviously this is a locking problem. The first thing that comes to mind
> is: drop the non-threadsafe hashtable. But since this structure is a
> big part of the code this is not a fast-to-implement solution. Maybe we
> can sharpen up the lock-policy. Instead of locking the complete table
> during the t_expire, maybe only lock it when actually deleting a node
> (in t_old). I studied the code but I'm not sure if it is a (thread)safe
> solution. What do you think?
> I wonder, are we the only one experiencing these troubles? Is this a
> known problem?
>
> Kind regards,
>
> Steven van der Vegt
> Echelon B.V. The Netherlands
>
> --
> To unsubscribe send an email with subject unsubscribe to
> [email protected].
> Please contact [email protected] for questions.
