Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Saturday 27 October 2007 05:22, Gustaf Neumann wrote:
> > I'll commit my code over the weekend replacing the current code. My fix
> > will be a bug fix. The bugs are these:
> > 1. unable to respond to load changes (up and down)
> > 2. unable to respond to configuration changes (up and down)
> > 3. poor, incorrect or missing synchronization code.
> > 4. queued doesn't reflect current queued requests.
> > 5. threads exit unnecessarily on timeout, then new threads to replace
> > 6. current patches cause signal 11 under easily repeated conditions.
> >
> > If anyone wants to stick to the current code, let me know, and I'll just
> > keep my changes here.
>
> well, this list of issues is not complete (e.g. queuing in the driver) and
> rather vague (1-3). For (4) you seem to propose a change in semantics for
> the reported value, for (5) your proposal seems to be "ignore maxconns",
> (6) is not easy to repeat, i was not able to do so.
>
> I have no objections to replacing the current code in CVS, as long as
> - it fixes the issues we have discussed,
> - it does not introduce new issues,
> - it respects configuration values,
> - it is performance-wise no drawback.

You're right, never mind. I can't guarantee any of these conditions. We don't
need two people overwriting each other's commits.

tom jackson

--
AOLserver - http://www.aolserver.com/

To Remove yourself from this list, simply send an email to
<[EMAIL PROTECTED]> with the body of "SIGNOFF AOLSERVER" in the email
message. You can leave the Subject: field of your email blank.
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb:
> I'm not sure if you are disputing the bug or not, but I'm somewhat tired
> of this. I've been testing various patches for the last few weeks in
> addition to my own code. It seems that everyone knows I've been working on
> this for a few weeks or more, and every time I identify a problem, a new
> patch with a new problem is placed into CVS.

Well, i think i have identified and pinpointed a couple of bugs, and you did
as well. You are probably also aware that i am working on the code, so i
committed over time a couple of improvements based on different load
scenarios etc. Yes, i tried rather to fix the code than to come up with a
different approach (and was as well surprised that the original code has so
many problems).

> Each time I try to point out that this isn't a simple issue, but patches
> show up with a one dimensional solution or no testing to back them up.
>
> I'll commit my code over the weekend replacing the current code. My fix
> will be a bug fix. The bugs are these:
> 1. unable to respond to load changes (up and down)
> 2. unable to respond to configuration changes (up and down)
> 3. poor, incorrect or missing synchronization code.
> 4. queued doesn't reflect current queued requests.
> 5. threads exit unnecessarily on timeout, then new threads to replace
> 6. current patches cause signal 11 under easily repeated conditions.
>
> If anyone wants to stick to the current code, let me know, and I'll just
> keep my changes here.

well, this list of issues is not complete (e.g. queuing in the driver) and
rather vague (1-3). For (4) you seem to propose a change in semantics for the
reported value, for (5) your proposal seems to be "ignore maxconns", (6) is
not easy to repeat, i was not able to do so.

What makes you believe that (4) is a bug? The total number of processed
queries is certainly interesting for e.g. implementing a performance monitor.
If you silently change the semantics, you will break such applications.
Therefore, it is better to add another variable (e.g.
"currently_queued") than altering something you find not useful. Btw, with the current code, i get on my home pc (intel imac, mac os x) up to 5000 to 6000 requests per second on small files (maxhreads 5) in cases without driver stalls. With maxthreads 10 i get about twice the value (e.g.g 10169.49 [#/sec]). Certainly, these values are hard to compare since they depend on OS and hardware. Server Software:AOLserver/4.5.0 Server Hostname:localhost Server Port:8001 Document Path: /hires_image-learn.gif Document Length:6339 bytes Concurrency Level: 10 Time taken for tests: 2.111 seconds Complete requests: 12000 Failed requests:0 Broken pipe errors: 0 Total transferred: 78638204 bytes HTML transferred: 76093356 bytes Requests per second:5684.51 [#/sec] (mean) Time per request: 1.76 [ms] (mean) Time per request: 0.18 [ms] (mean, across all concurrent requests) Transfer rate: 37251.64 [Kbytes/sec] received Connnection Times (ms) min mean[+/-sd] median max Connect:0 00.0 0 2 Processing: 0 10.3 1 6 Waiting:0 10.3 1 6 Total: 0 10.4 1 6 My code is well tested with small, large and very large files, low, and high maxconns, low-high threads, and also using [ns_conn channel] to send the sock to a worker thread. I run the queue under continuous loads and vary the threads/maxconns to see the response. note that different load it is not primarily an issue of small and large files but more an issue of fast and slow requests (fast requests are typically small files, but slow requests might return as well small responses, but require a many CPU cycles) and as well an issue of slow and fast thread start up time (aolserver vs. openacs). I have observed quite different issues in such situations. I have no objections in replacing the current code in CVS, as long - it fixes the issues we have discussed, - it does not introduce new issues, - it respects configuration values, - it is performance-wise no drawback. 
best regards
-gustaf neumann
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Friday 26 October 2007 17:27, Gustaf Neumann wrote:
> i was not able to recreate the crash that you have reported in the other
> mail. However, with ab with small files and pure aolserver (without
> openacs), i could verify that some of the counters were wrong. The
> decision to create a thread or not in NsQueueConn() was based on a counter
> that was wrong, since it did not take into account that some more threads
> are already starting in the background. To address these issues, there are
> now two additional variables, namely starting (# of currently starting
> threads) and waiting (# of threads in a cond wait).
> It would be great if you could test this code again in your environment.

I'm not sure if you are disputing the bug or not, but I'm somewhat tired of
this. I've been testing various patches for the last few weeks in addition to
my own code. It seems that everyone knows I've been working on this for a few
weeks or more, and every time I identify a problem, a new patch with a new
problem is placed into CVS. Each time I try to point out that this isn't a
simple issue, but patches show up with a one dimensional solution or no
testing to back them up.

I'll commit my code over the weekend replacing the current code. My fix will
be a bug fix. The bugs are these:

1. unable to respond to load changes (up and down)
2. unable to respond to configuration changes (up and down)
3. poor, incorrect or missing synchronization code.
4. queued doesn't reflect current queued requests.
5. threads exit unnecessarily on timeout, then new threads to replace
6. current patches cause signal 11 under easily repeated conditions.

If anyone wants to stick to the current code, let me know, and I'll just keep
my changes here.

My code is well tested with small, large and very large files, low and high
maxconns, low-high threads, and also using [ns_conn channel] to send the sock
to a worker thread.
I run the queue under continuous loads and vary the threads/maxconns to see
the response.

tom jackson
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb:
> I've pointed out several times that I'm looking at the operation of the
> queue from the point of view of performance (both economy and
> responsiveness), not the specific numbers (min, max, timeout, maxconns)
> being maintained. ... So for instance, if Andrew thinks it is a bug for
> minthreads to be violated, it is best to not use a thread timeout. That
> fixes the issue instantly.

to obey minthreads helps to improve performance, since when all connection
threads are gone, there will be a delay for thread creation upon a new
request. I agree with Andrew that it is a bug if - as it was before -
minthreads are only available at server start, and disappear as soon as the
threads time out.

> If minthreads is taken as a sacred number, then your solution doesn't cut
> it. The thread exits and then recreates another one. This is merely
> cosmetic, and it only appears correct if you don't notice that the number
> actually went below minthreads. If there were a correct solution, it would
> prevent the violation of this lower limit, not fix it up after it happens.

i do not agree here. maxconns should be obeyed as well. If a thread has
served the specified number of items, it should exit. If the number of
threads is reduced below minthreads, a fresh thread should be created.
Keeping the thread alive instead is not correct. If one does not like the
behavior, one could specify maxconns=99.

> As an example, just look at the original code. How did threads escape exit
> at startup if there is a timeout?
>
> 388    if (poolPtr->threads.current <= poolPtr->threads.min) {
> 389        timePtr = NULL;
> 390    }
> ...
>
> So there was already some thought at preventing thread exit based upon a
> timeout, if minthreads would be violated. So the question is: why isn't
> this code working?

simply because it does not consider that maxconn cycles may lead to thread
exits as well.

> There are a number of bugs in this code, and fixing it up at the end,
> after a thread has exited, doesn't remove the actual bugs.
> An individual thread sits in a while loop waiting for a queued request;
> this loop and the following check are currently:
>
>     status = NS_OK;
>     while (!poolPtr->shutdown
>            && status == NS_OK
>            && poolPtr->queue.wait.firstPtr == NULL) {
>         /* nothing is queued, we wait for a queue entry */
>         status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
>     }
>     if (poolPtr->queue.wait.firstPtr == NULL) {
>         msg = "timeout waiting for connection";
>         break;
>     }
>
> Status starts out as NS_OK; the entire while loop is skipped if there is a
> waiting request, poolPtr->queue.wait.firstPtr != NULL.

this is fine, since the waiting request can be immediately processed without
the condwait.

> Once the wait is done, the conditions are again checked; the loop is
> skipped if status becomes NS_TIMEOUT, or if there is a waiting request (or
> shutdown). The problem is that we made the decision to exit on timeout
> before we knew if exiting would violate the minthreads condition. So we
> could do a wait, then again check if we will violate this condition, and
> if so, update status to NS_OK, and avoid an unnecessary exit. If we move
> the timeout code inside the loop, we also avoid repeating/executing this
> anywhere else.

this won't get rid of the exits due to maxconns. The code after your snippet
assumes that it will be reached only when there is a request ready for
processing.

> This is what was happening under Apache Bench. Send 1000 requests; as long
> as concurrency is a little bigger than current threads, things get stuck.
> It doesn't necessarily appear to be a thread exit issue. Other problems
> also show up. The driver thread also does sock cleanup, which gets stuck.
> These stuck socks are dropped connections, they just appear to be in a
> queue. In fact, they are not waiting to be serviced, the sock is just
> waiting for cleanup.

I have noticed this as well, but have not tried to fix this. Sometimes the
driver stops queueing requests, although some connection threads are idling
around.
Maybe this is the problem that jeff rogers had already fixed for 4.0?

> Another issue was the fact that the driver thread was not doing a
> condBroadcast on thread create, and if thread create is skipped, it just
> does a condSignal. Apparently it is not a good idea to just do condSignal.

there was another problem in the book-keeping of idle etc. which i have fixed
today (committed to cvs). The problem was that under heavy concurrency,
NsQueueConn() did not know if there are connection threads in a cond wait or
not. A condSignal is the right thing. It does not make sense to broadcast and
wake up multiple waiting threads for a single request.

i was not able to recreate the crash that you have reported in the other
mail. However, with ab with small files and pure aolserver (without openacs),
i could verify that some of the counters were wrong. The decision to create a
thread or not in NsQueueConn() was based on a counter that was wrong, since
it did not take into account that some more threads are already starting in
the background.
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Okay, there is a slight bug here in the new cvs code. What is happening is
that sometimes, the driver thread and a connection thread are creating
threads at the same time. I've added a few log statements, which makes the
bug show up less often, but I can easily reproduce it in a few minutes at
most (7-100k requests). Here is the instrumented proc, minus the code
comments:

void
NsCreateConnThread(Pool *poolPtr, int joinThreads)
{
    ConnData *dataPtr;

    Ns_Log(Notice, "Creating Threads, joinThreads = %i", joinThreads);
    if (joinThreads) {
        NsJoinConnThreads();
    }
    dataPtr = ns_malloc(sizeof(ConnData));
    dataPtr->poolPtr = poolPtr;
    dataPtr->connPtr = NULL;
    Ns_ThreadCreate(NsConnThread, dataPtr, 0, &dataPtr->thread);
    Ns_Log(Notice, "Done Creating Threads");
}

Also I've added a log just before a thread is going to create a new thread to
replace itself (from NsConnThread):

    poolPtr->threads.current++;
    poolPtr->threads.idle++;
    Ns_Log(Notice, "About to exit and create new conn thread");
    Ns_MutexUnlock(&poolPtr->lock);
    NsCreateConnThread(poolPtr, 0);

The logs which lead up to the fatal signal 11:

[24/Oct/2007:14:20:20.924295][26946.1126394176][-default:36551-] Notice: Added thread to join list for thread exit
[24/Oct/2007:14:20:20.924335][26946.1126394176][-default:36551-] Notice: exiting: exceeded max connections per thread
[24/Oct/2007:14:20:20.925328][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.925450][26946.1121110336][-nssock3:driver-] Notice: Creating Threads, joinThreads = 1
[24/Oct/2007:14:20:20.925508][26946.1121110336][-nssock3:driver-] Notice: Done Creating Threads
[24/Oct/2007:14:20:20.926428][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.927418][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.928339][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.929239][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.930144][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.930271][26946.1181346112][-default:36554-] Notice: Added thread to join list for thread exit
[24/Oct/2007:14:20:20.930293][26946.1181346112][-default:36554-] Notice: About to exit and create new conn thread
[24/Oct/2007:14:20:20.930301][26946.1181346112][-default:36554-] Notice: Creating Threads, joinThreads = 0
[24/Oct/2007:14:20:20.931191][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.932100][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.932986][26946.1121110336][-nssock3:driver-] Notice: Closing Sock in SockClose
[24/Oct/2007:14:20:20.933628][26946.1130621248][-default:36553-] Notice: Added thread to join list for thread exit
[24/Oct/2007:14:20:20.933650][26946.1130621248][-default:36553-] Notice: exiting: exceeded max connections per thread
[24/Oct/2007:14:20:20.936221][26946.1126394176][-default:36555-] Notice: Added thread to join list for thread exit
[24/Oct/2007:14:20:20.936242][26946.1126394176][-default:36555-] Notice: exiting: exceeded max connections per thread
[24/Oct/2007:14:20:20.937692][26946.1121110336][-nssock3:driver-] Notice: Creating Threads, joinThreads = 1
[24/Oct/2007:14:20:20.937722][26946.1121110336][-nssock3:driver-] Fatal: received fatal signal 11

Summary:
thread 36554: adds itself to join list
thread 36554: about to exit mutex and run NsCreateConnThread
thread 36554: at top of NsCreateConnThread
thread 36554: sleeping
driver thread: closing a sock, three times
thread 36553: adds itself to join list, then exits (doesn't create thread)
thread 36555: adds itself to join list, then exits (doesn't create thread)
driver thread: at top of NsCreateConnThread
driver thread: receives signal 11.
Regardless of the actual cause of this bug, this is very illustrative of how
quickly action switches from one thread to another. I'm guessing that since
the original code only had the driver creating threads, there might be
something unprotected, or maybe something is deleted, then assumed to exist.

I kind of stumbled upon this when trying to test a pool setup like this:

ns_pools set default -minthreads 1 -maxthreads 5 -maxconns 5 -timeout 10

Then I ran ab on a 28k static file (very fast to serve):

ab -n 12 -c 120 http://192.168.1.102:8000/bgwrite/sns-thumb.jpg

With higher -maxconns I doubt this bug would show up in real life.

tom jackson

On Wednesday 24 October 2007 10:10, Tom Jackson wrote:
> I've fixed the last bug in my code, but I haven't yet focused on the
> timeout issue as I'm benchmarking the current cvs code.
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Wednesday 24 October 2007 01:07, Gustaf Neumann wrote:
> concerning the code in cvs head: the server creates now worker threads
> when a
> - thread exits and more requests are queued, and no other thread is idle,
>   or when a
> - thread exits and the number of connection threads falls below
>   minthreads,
> both cases unless shutdown. The handled thread exits are typically due to
> exceeding maxconns or the idle timeout.
>
> This is what my tests show. What are the problems? Or are you talking
> about the version in cvs before these changes? Of course, before these
> changes, minthreads was respected only at startup, queued requests were
> not processed.

I've pointed out several times that I'm looking at the operation of the queue
from the point of view of performance (both economy and responsiveness), not
the specific numbers (min, max, timeout, maxconns) being maintained. Also,
I'm looking at the impact of choosing different combinations of numbers.

So for instance, if Andrew thinks it is a bug for minthreads to be violated,
it is best to not use a thread timeout. That fixes the issue instantly. I
don't think the operation of the queue has to hold your hand to keep you from
making bad decisions. But let's address this a little more carefully before
rushing in to fix something which may logically appear to be a bug, but
probably isn't one. And also we need to look at the solution. I'll start with
yours.

If minthreads is taken as a sacred number, then your solution doesn't cut it.
The thread exits and then recreates another one. This is merely cosmetic, and
it only appears correct if you don't notice that the number actually went
below minthreads. If there were a correct solution, it would prevent the
violation of this lower limit, not fix it up after it happens.

As an example, just look at the original code. How did threads escape exit at
startup if there is a timeout?

388    if (poolPtr->threads.current <= poolPtr->threads.min) {
389        timePtr = NULL;
390    }
...
So there was already some thought at preventing thread exit based upon a
timeout, if minthreads would be violated. So the question is: why isn't this
code working? There are a number of bugs in this code, and fixing it up at
the end, after a thread has exited, doesn't remove the actual bugs.

An individual thread sits in a while loop waiting for a queued request; this
loop and the following check are currently:

status = NS_OK;
while (!poolPtr->shutdown
       && status == NS_OK
       && poolPtr->queue.wait.firstPtr == NULL) {
    status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
}
if (poolPtr->queue.wait.firstPtr == NULL) {
    msg = "timeout waiting for connection";
    break;
}

Status starts out as NS_OK; the entire while loop is skipped if there is a
waiting request, poolPtr->queue.wait.firstPtr != NULL. Once the wait is done,
the conditions are again checked; the loop is skipped if status becomes
NS_TIMEOUT, or if there is a waiting request (or shutdown).

The problem is that we made the decision to exit on timeout before we knew if
exiting would violate the minthreads condition. So we could do a wait, then
again check if we will violate this condition, and if so, update status to
NS_OK, and avoid an unnecessary exit. If we move the timeout code inside the
loop, we also avoid repeating/executing this anywhere else.

But there is another bug/error in the code. You can set timeout to <= 0. In
this case, the code happily sets the timeout (as a real future time), but
will immediately exit in wait.

I've fixed the last bug in my code, but I haven't yet focused on the timeout
issue as I'm benchmarking the current cvs code.

Current CVS: Running the latest code, I discovered the same issue that my
code had, but has been eliminated. If maxthreads is reduced [ns_pools set
mypool -maxthreads x] the number of threads doesn't decrease. It also doesn't
go down based upon load.
Instead it goes immediately to original maxthreads and stays there. I'll need
to be more exact in my descriptions later, but basically, the queue operation
goes to max, which is better than it was before, when it was hard to get even
up to min. However, as an overall performance goal, threadpools should
moderate their behavior, as activity may move from one threadpool to another.
If everything stays at max, this isn't achieved.

The original problem is easy to describe now that I have done a bunch of
testing: build up a large enough queue so that all remaining threads will not
be able to service them, then stop sending requests. This is what was
happening under Apache Bench. Send 1000 requests; as long as concurrency is a
little bigger than current threads, things get stuck. It doesn't necessarily
appear to be a thread exit issue. Other problems also show up. The driver
thread also does sock cleanup, which gets stuck.
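The change proposed in this thread — recomputing the exit decision inside the
wait loop — might look roughly like the following fragment. This is
illustrative only, written against the snippet quoted above rather than the
committed code; it also treats a configured timeout <= 0 as an infinite wait,
which addresses the zero-timeout bug mentioned here:

```
status = NS_OK;
while (!poolPtr->shutdown
       && status == NS_OK
       && poolPtr->queue.wait.firstPtr == NULL) {
    /*
     * Re-evaluate on every cycle: wait without a deadline if the
     * configured timeout is disabled (<= 0) or if this thread's exit
     * would push the pool below minthreads.
     */
    if (poolPtr->threads.timeout <= 0
        || poolPtr->threads.current - 1 <= poolPtr->threads.min) {
        timePtr = NULL;
    } else {
        timePtr = &timeout;
    }
    status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
}
```

As noted in the replies, this only removes timeout-driven exits; threads that
exit because they reached maxconns still need the recreate-on-exit path.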
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb:
> I've been working on a slightly different angle here, trying to avoid
> thread exit, and to allow threads to range usefully between min and max.
> The current code in CVS has a number of problems which lead to the server
> not creating more threads when there are queued requests and the number of
> threads is below max (sometimes below min!).

concerning the code in cvs head: the server creates now worker threads when a
- thread exits and more requests are queued, and no other thread is idle, or
  when a
- thread exits and the number of connection threads falls below minthreads,
both cases unless shutdown. The handled thread exits are typically due to
exceeding maxconns or the idle timeout.

This is what my tests show. What are the problems? Or are you talking about
the version in cvs before these changes? Of course, before these changes,
minthreads was respected only at startup, queued requests were not processed.

My first solution was based on a strategy where a thread did not exit when
there was more to do (more queued requests), although it should exit due to
maxconns. However, a user specifying maxconns might have good reasons for
doing so, therefore the value must be respected. Avoiding exit in this
situation is not a viable option IMHO.

It can be discussed whether the idle timeout should be obeyed e.g. in
situations where the number of threads is below minthreads. In the code in
cvs the thread exits and is recreated as in other cases, when it falls below
minthreads. It would be more efficient to avoid the timeout in such
situations, but one might end up with quite old worker threads, which are
more sensitive to growth.

-gustaf neumann
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
I've been working on a slightly different angle here, trying to avoid thread
exit, and to allow threads to range usefully between min and max. The current
code in CVS has a number of problems which lead to the server not creating
more threads when there are queued requests and the number of threads is
below max (sometimes below min!).

Now my test code has solved most of these issues:

1. threads go up very quickly with load increase.
2. threads go down when load goes down (if queue is kept empty) to slightly
   below concurrency.
3. if maxthreads is lowered, threads slowly go down to meet the new value.

There are still a few things I'd like to look at, so I'll hold off on any
commit for now.

tom jackson

On Monday 22 October 2007 16:04, Gustaf Neumann wrote:
> Andrew Piskorski schrieb:
> >> My only point here was that I'm going to stop looking at the timeout
> >> parameter, and timed out threads as an issue. If threads timeout, the
> >> number of threads in a threadpool will drop below minthreads, usually
> >> to zero.
> >
> > But isn't that a bug? If not, just what is "minthreads" supposed to
> > really mean?
>
> you are right, if someone specified minthreads, they should be maintained.
> i have added an additional condition to the thread recreation to maintain
> the specified minimal number of threads. maybe Tom has something more
> sophisticated in mind. this is again a minimally invasive fix, similar to
> the fix for assuring the processing of the queued requests, when all
> connection threads exit.
>
> for more details, see:
> http://aolserver.cvs.sourceforge.net/aolserver/aolserver/nsd/queue.c?view=log
>
> best regards
> -gustaf neumann
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Andrew Piskorski schrieb:
> > My only point here was that I'm going to stop looking at the timeout
> > parameter, and timed out threads as an issue. If threads timeout, the
> > number of threads in a threadpool will drop below minthreads, usually to
> > zero.
>
> But isn't that a bug? If not, just what is "minthreads" supposed to really
> mean?

you are right, if someone specified minthreads, they should be maintained. i
have added an additional condition to the thread recreation to maintain the
specified minimal number of threads. maybe Tom has something more
sophisticated in mind. this is again a minimally invasive fix, similar to the
fix for assuring the processing of the queued requests, when all connection
threads exit.

for more details, see:
http://aolserver.cvs.sourceforge.net/aolserver/aolserver/nsd/queue.c?view=log

best regards
-gustaf neumann
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Andrew,

Maybe I should have said that I'll look at the issue, but not as a
performance issue. The bug aspect may be simply allowing a timeout. Otherwise
finding a solution would probably require a thread to say that it is going to
sleep until a connection arrives. Then other threads can figure out if they
can exit on timeout without pushing current below min.

On Monday 22 October 2007 07:48, Andrew Piskorski wrote:
> On Mon, Oct 22, 2007 at 06:30:40AM -0700, Tom Jackson wrote:
> > My only point here was that I'm going to stop looking at the timeout
> > parameter, and timed out threads as an issue. If threads timeout, the
> > number of threads in a threadpool will drop below minthreads, usually to
> > zero.
>
> But isn't that a bug? If not, just what is "minthreads" supposed to
> really mean?
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Andrew,

It might be a bug, and I just discovered another one related to this. If
-timeout is set to zero or less, the value is still used, but it should
probably be set to null, meaning infinite timeout:

403    if (poolPtr->threads.current <= poolPtr->threads.min) {
404        timePtr = NULL;
405    } else {

For one, I've moved this check inside the following while loop so it gets a
chance to change on each cycle, but I also made these modifications:

    if (poolPtr->threads.timeout <= 0
        || poolPtr->threads.current - 1 <= poolPtr->threads.min) {
        timePtr = NULL;
    } else {

The behavior I see with apache bench is that after a bunch of requests are
finished, most of the threads exit, if there is a timeout. However, this may
be a result of the fact that apache bench sends extra requests and then
disconnects them. The problem is that each thread calculates for itself what
to do, and there is no way for them to know what other threads are going to
do within the next cycle.

I've started looking at JMeter, because apache bench doesn't really do what
you ask it to do, but maybe that is good. You discover things you weren't
testing for. For one, even with concurrency set at some number, fast requests
will lead apache bench to send another request before cleanup is complete on
AOLserver. AOLserver thinks the concurrency is higher than ab, up to double
the concurrency. I know this because I'm testing a boost factor for
minthreads. On large requests, threads.current never reaches the concurrency
level of ab, but on small requests, it can exceed it, up to maxthreads-2.

With my latest changes to my setup, the threadpool moderates to something
else: with high concurrency, the idle threads value tends to zero. If
concurrency is less than maxthreads, the queued requests value tends to zero.
Eventually, if concurrency is less than maxthreads, both of these values tend
to zero; they bounce around a low number and zero, so all threads are busy
and the queue is empty.
Otherwise the trend appears to be current+queued=concurrency and idle=0.

tom jackson

On Monday 22 October 2007 07:48, Andrew Piskorski wrote:
> On Mon, Oct 22, 2007 at 06:30:40AM -0700, Tom Jackson wrote:
> > My only point here was that I'm going to stop looking at the timeout
> > parameter, and timed out threads as an issue. If threads timeout, the
> > number of threads in a threadpool will drop below minthreads, usually to
> > zero.
>
> But isn't that a bug? If not, just what is "minthreads" supposed to
> really mean?
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Mon, Oct 22, 2007 at 06:30:40AM -0700, Tom Jackson wrote:
> My only point here was that I'm going to stop looking at the timeout
> parameter, and timed out threads as an issue. If threads timeout, the
> number of threads in a threadpool will drop below minthreads, usually to
> zero.

But isn't that a bug? If not, just what is "minthreads" supposed to really
mean?

--
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Sunday 21 October 2007 23:47, Gustaf Neumann wrote: > i don't understand. what timeouts exactly are you talking about? In the > code are several timeouts for > various purposes... The timeout associated with pool threads. If a thread times out waiting for a request, it exits. But in AOLserver you don't really reclaim any memory, so unless there is some other reason to exit, it doesn't seem like a very useful parameter when performance is a consideration. My only point here was that I'm going to stop looking at the timeout parameter, and timed-out threads, as an issue. If threads time out, the number of threads in a threadpool will drop below minthreads, usually to zero. tom jackson
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb: But there was something a little more interesting to track down: what happens when a thread exits? ... Apparently, all the interps associated with the thread exit. certainly, every worker thread has its own tcl interpreter associated. When the thread exits, the tcl interpreter with all its state exits as well. This impacts the current discussion because I'm having trouble finding an easy way to avoid thread exit because of a timeout. But this information seems to simplify the problem. Why exit a thread just because of a timeout? i don't understand. what timeouts exactly are you talking about? In the code are several timeouts for various purposes... -gustaf neumann
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
I've been working on this more with some effort to get the number of connection threads to actually stay somewhere close to the range of minthreads to maxthreads. But there was something a little more interesting to track down: what happens when a thread exits? I think there is some idea that the reason for exiting a thread is to remove memory leaks which might occur. But the question is, if threads are shared among servers, and threads exit on a number of connections, what else happens when the thread exits? Apparently, all the interps associated with the thread exit. I haven't seen any evidence against this conclusion. What this means is that thread exit is expensive. It could take down multiple interps (one for each server sharing the thread). This impacts the current discussion because I'm having trouble finding an easy way to avoid thread exit because of a timeout. But this information seems to simplify the problem. Why exit a thread just because of a timeout? I can't see any possible advantage. Instead, every effort should be directed at keeping threads alive until maxconns is reached. Also, shared threadpools should be limited to the default and error pools, otherwise a simple server could quickly expire interps of a more expensive server. tom jackson On Sunday 21 October 2007 14:47, Gustaf Neumann wrote: > Tom Jackson schrieb: > > The if statement on line 507 checks the Pool structure outside of a mutex > > lock. > > Good observation. > The code was not correct, although the potential damage was small. > Fixed by now.
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Friday 19 October 2007 09:12, Gustaf Neumann wrote: > Actually, if minthreads is set to a value > 0 (it was not set), then idle > threads should care about queued requests. Maybe the > easier fix is to set minthreads per default to 1. I will > try this, when i am back home (in a couple of hours). AOLserver does not respect minthreads except at startup. This is part of the same issue: nothing except a request can create a thread. It seems like on thread exit, a little accounting process could go on to bring threads.current up to threads.min; this might require more than one thread creation, maybe not. In your patch, you change the while loop test:

    + while (poolPtr->threads.maxconns <= 0
    +        || ncons-- > 0
    +        || poolPtr->queue.wait.num > 1) {

Shouldn't the loop continue with poolPtr->queue.wait.num > 0? Your patch looks like a great fix...I just still don't understand why the server would completely lock up. As long as you have requests coming in, it seems like a new thread would get created. I wonder if what is happening is that Apache Bench simply stops sending new requests because none of the requests are finishing. If it can block, I wonder if simply visiting with a browser would kick things off again? tom jackson
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb: The if statement on line 507 checks the Pool structure outside of a mutex lock. Good observation. The code was not correct, although the potential damage was small. Fixed by now. -gustaf
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Hey, (Note: none of my changes have been committed; they are not complete or integrated with Gustaf's code yet.) I also looked more at this; the idea of triggering the accounting process at the end of a thread's life seems very good. One real problem with this setup is that there is no independent thread which can be triggered to do a pool maintenance cycle. Until now this has been characterized by several bad behaviors which lead to poor performance and too much thread exiting on timeouts. If a timeout exists, any lull in activity will expire all threads. I haven't tried Gustaf's patch yet because I have been working on the accounting portion of the problem. Here are some of the things I have tried, and the apparent results: 1. Added a new counter to the Pool structure to signal the number of threads to create (poolPtr->threads.create). 2. A connection thread increments this under certain conditions: a.

    if (ncons == 1 && poolPtr->threads.maxconns > 0) {
        ++poolPtr->threads.create;
    }

However, this sometimes leads to slightly more threads than maxthreads. Usually, the number of threads exactly equals minthreads due to other changes; maxthreads is exceeded if you use a low number of maxconns. b. Thread timeout is moved inside the wait loop, and recalculated on each cycle:

    while (!poolPtr->shutdown
           && status == NS_OK
           && poolPtr->queue.wait.firstPtr == NULL) {
        /*
         * Wait for a connection to arrive, exiting if one doesn't
         * arrive in the configured timeout period.
         */
        if (poolPtr->threads.current - 1 <= poolPtr->threads.min) {
            timePtr = NULL;
        } else {
            Ns_GetTime(&wait);
            Ns_IncrTime(&wait, poolPtr->threads.timeout, 0);
            timePtr = &wait;
        }
        status = Ns_CondTimedWait(&poolPtr->cond, &poolPtr->lock, timePtr);
    }

Also, the conditions for infinite wait are slightly changed. c. If a thread times out, a check is made whether this would push the thread count below minthreads, and if so an additional create is added.
It seems possible that a single thread could signal the creation of two threads during its lifetime, but this may just be making up for threads which exit without signalling a new thread. This is probably a mistake. What I have observed is that this doesn't prevent threads from exiting below minthreads if there is a lull in activity, but create still has the accounting of these, and if a single request comes in, they are all created. Then they may time out again. Probably timed-out threads should not be replaced, but a way to keep timed-out threads from exiting when this would push below minthreads doesn't exist yet. So this will be removed, but for now it allows a little probe into the thread state and driver thread state.

    if (poolPtr->queue.wait.firstPtr == NULL) {
        msg = "timeout waiting for connection";
        if (poolPtr->threads.current - 1 <= poolPtr->threads.min) {
            ++poolPtr->threads.create;
        }
        break;
    }

3. The number of queued requests was unknown before: poolPtr->threads.queued only increased. I added a line to reduce this number when requests are dequeued. This made it very easy to discover how things are working between the driver thread and the connection threads. For instance, if a heavy load of requests suddenly ends, there appear to be leftover queued requests. Using netstat, you can see these in CLOSE_WAIT. These appear to be requests which were abandoned (by apache bench), but the driver thread or connection threads are stuck, and they remain in this state until another (even single) request comes in. So another difference with this code appears to be that the queued but abandoned requests are not being serviced anymore. It may be that the driver thread gets triggered one more time and cleans up the mess. I still have to verify this, but it would be good if true. 4. Gustaf identified another problem with the code: the thread creation proc includes a thread join. This makes it impossible to use inside a connection thread.
There is actually no reason to do the thread join in the thread create proc. Only the driver thread can do this, or the pool exit code. So it should be removed from where it is and placed in the NsQueueConn proc, just before thread creation is done. My code doesn't yet reflect this idea, but I have added another proc to create multiple threads. It does the accounting and then loops while poolPtr->threads.create-- > 0:

    void
    NsCreateConnThreads(Pool *poolPtr)
    {
        /*
         * Reap any dead threads. NOTE: Move this to NsQueueConn
         */
        NsJoinConnThreads();

        /*
         * Create new connection threads.
         */
        Ns_MutexLock(&poolPtr->lock);
        if
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb: On Friday 19 October 2007 09:12, Gustaf Neumann wrote: Actually, if minthreads is set to a value > 0 (it was not set), then idle threads should care about queued requests. Maybe the easier fix is to set minthreads per default to 1. I will try this, when i am back home (in a couple of hours). AOLserver does not respect minthreads except at startup. This is part of the same issue: Nothing except a request can create a thread. It seems like on thread exit, a little accounting process could go on to bring threads.current up to threads.min, this might require more than one thread creation, maybe not. fully agreed. i continued a little on the issue and committed a patch to CVS, which respects maxconns always. Instead of performing in boundary situations in the worst case more than the configured maxconns requests, the code now creates a new connection thread automatically after the exit of a thread coming to the end of its work cycle, when jobs are pending and no other thread is able to process them. i made as well a small variant of this (not in CVS), which defines a new command NsCheckPools() in pools.c which iterates over the pools and starts a thread in the same situation as above. NsCheckPools could be extended to check for the existence of minthreads etc. However, for this patch, i wanted to be as little invasive as necessary, and added the thread-revival code to the place where a thread exits. For the new thread generation, i had to parameterize NsCreateConnThread() to avoid a resource deadlock in Ns_ThreadJoin(). In your patch, you change the while loop test:

    + while (poolPtr->threads.maxconns <= 0
    +        || ncons-- > 0
    +        || poolPtr->queue.wait.num > 1) {

Shouldn't the loop continue with poolPtr->queue.wait.num > 0 ? 0 or 1, both are fine here; the condition triggers for the new cases only when > 1 is used. Your patch looks like a great fix...I just still don't understand why the server would completely lock up.
As long as you have requests coming in, it seems like a new thread would get created. I wonder if what is happening is that Apache Bench simply stops sending new requests because none of the requests are finishing. If it can block, I wonder if simply visiting with a browser would kick things off again? one can use the browser to kick things off again, but if there are already say 50 requests in the queue, the browser hangs in the first request. The bad thing is that in the meantime, some other bulky requests are queued as well. So the queue might actually be permanently growing. for my testing, i use: ab -c 20 -n 35 http://localhost:8003/file-storage/view/dotlrn-2.3.1.tgz with maxthreads 5 and maxconns 3. I get reliable hangs, where the last thread exits with about 5 to 15 queued requests. Without the recent patches, the queue is processed until it is empty. btw, the reason why the patch helps is no miracle; it is completely clear why the server hangs. This bug is not a classical deadlock (it is possible that the queued requests are processed), but shares some properties of a livelock (some requests prevent the processing of other requests, at least for a while). It is not a race condition either. I don't believe that TriggerDriver() is the right place to handle the problem, since the driver is happily accepting requests in the bug situation. It can certainly be that the fixed bug is different from the bug that Jeff Rogers fixed originally. -gustaf neumann PS: i am pretty sure that this is the same bug as on openacs.org. Around the time of the bug, somebody in Spain was apparently benchmarking his internet connection, downloading dotlrn*.tar* in multiple parallel sessions. Once i saw that the user was trying to download 1200 copies of dotlrn.tar in about 10 minutes.
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Saturday 20 October 2007 00:14, Jeff Rogers wrote: > Do you still get the lockup in your scenario with keepalives off? If > not, it might be a workaround until it gets fixed. If memory serves, it > would not be the first time that keepalives have been broken. What I suspect is that there isn't a race condition, just a test condition which causes the system to appear stuck. I doubt I'll ever see this in production. But a low number of threads may lead to this type of thing where you have a few clients requesting a bunch of files using keepalive. When the thread exits, it could leave unprocessed requests in the queue. Or maybe a few clients could essentially hog all the threads within the keep-alive loop. I'm not 100% sure this is a short circuit of the regular queue, but it could be. If these greedy clients hog all the connection threads, and work fast enough, the connection threads will never exit (due to the new patch). This might actually work against sharing threads, but I'll have to test if exiting a thread and creating another one gives queued requests a chance to execute. tom jackson
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson wrote: Now how to get ab stuck and give up? I use a regular browser (mozilla), and I try to load the same image, I hit reload like 5-7 times or so. The number of reloads is larger than the number of maxconns per thread. There is only one thread working. What may be happening is that the keepalive system is expiring the thread all by itself (one client with keepalive). Even though there is a queue, maybe the sock and therefore the conn are being reused, giving no other requests a chance. The last log entry indicates that the thread has processed too many conns and is exiting. Then ab freezes. However, if I reload the page one more time in the browser, ab starts up again. I still think all the evidence points to a logical error of deleting a thread and then not maintaining at least one thread (and prefer minthreads) if there is something in the queue. This isn't a race condition, I don't think, because it is easy to repeat this: it happens exactly the same way every time. So it looks like a keep-wait socket is immediately put to the head of the line looking for more requests from the same client. If you have a web page with 50+ images, you can easily blow out any reasonable maxconns setting, or if someone reloads a less heavy page. Do you still get the lockup in your scenario with keepalives off? If not, it might be a workaround until it gets fixed. If memory serves, it would not be the first time that keepalives have been broken. -J
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
I just started another thread about using apache bench. I'm not sure if it is working as advertised, and I wonder if it might be misleading us with the testing of this potential race condition fix. Using the most recent version of queue.c, I have been able to create a timeout on the apache bench test. That is, the client gets stuck and eventually times out and quits. The ab command was: $ ab -t 500 -n 50 -c 1 http://192.168.1.102:8000/bgwrite/sns-thumb.jpg There is a rotation of 0-5 sec delay built in to this, so even with a 28k file, it takes a while to chew through 50 requests (about 101 sec). The -t 500 appears to be a useless option; ab always times out much quicker than this value. Now how to get ab stuck and give up? I use a regular browser (mozilla), and I try to load the same image; I hit reload like 5-7 times or so. The number of reloads is larger than the number of maxconns per thread. There is only one thread working. What may be happening is that the keepalive system is expiring the thread all by itself (one client with keepalive). Even though there is a queue, maybe the sock and therefore the conn are being reused, giving no other requests a chance. The last log entry indicates that the thread has processed too many conns and is exiting. Then ab freezes. However, if I reload the page one more time in the browser, ab starts up again. I still think all the evidence points to a logical error of deleting a thread and then not maintaining at least one thread (and preferably minthreads) if there is something in the queue. This isn't a race condition, I don't think, because it is easy to repeat: it happens exactly the same way every time. This might not be easy to do correctly because killing threads is part of shutting down, so a thread might want to die because of a shutdown, not because maxconns was reached, or a timeout. In driver.c here is the keep-alive section:

    /*
     * Process Sock's returned for keep-alive or close.
     */
    while ((sockPtr = closePtr) != NULL) {
        closePtr = sockPtr->nextPtr;
        if (!stop && sockPtr->state == SOCK_READWAIT) {
            sockPtr->connPtr = AllocConn(drvPtr, &now, sockPtr);
            SockWait(sockPtr, &now, drvPtr->keepwait, &waitPtr); /* <-- keep */
        } else if (!drvPtr->closewait || shutdown(sockPtr->sock, 1) != 0) {
            /* Graceful close disabled or shutdown() failed. */
            SockClose(sockPtr);
        } else {
            SockWait(sockPtr, &now, drvPtr->closewait, &waitPtr);
        }
    }

So it looks like a keep-wait socket is immediately put to the head of the line looking for more requests from the same client. If you have a web page with 50+ images, you can easily blow out any reasonable maxconns setting, or if someone reloads a less heavy page. tom jackson
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Friday 19 October 2007 12:46, Jeff Rogers wrote: > Looking at it slightly more deeply, the AtReady callback shouldn't be > needed, because TriggerDriver is called from NsFreeConn, which is called > every time through the ConnThread loop after the connections have been > put back on the free list. This is one thing that I noticed: TriggerDriver is called in more places, but the point seems to be to update any statistics, and also to return conns. And of course, the driver doesn't need triggering when a lot of requests are coming in, only when the server starts to reap stale or used-up threads. One thing which seems different is that the driver doesn't hold on to requests; they are queued, so there is no reason for threads to do a TriggerDriver in order to get a new conn (assuming one is in the queue). In other words, the TriggerDriver is used to return stuff to the driver thread, not to signal ready. At least one thing is still missing here: if there is a -timeout on the thread wait, all threads will end up exiting, going past minthreads. If -timeout is set to 0, threads will exit and are not replaced when this ends up leaving less than minthreads available. If you get surges of requests, a timeout of zero would help a little to keep from continuously creating new threads. tom jackson
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson wrote: Jeff, I don't have a problem with it being in, but it actually needs to do something we can measure. There are significant overall differences between 4.0.10 and 4.5, although if you look at small parts it seems the same. Just as an example, the patch in driver.c was hundreds of lines off from 4.0.10 to 4.5. I was suggesting adding in extra stuff so that the patch to register the AtReady proc would actually run; you're right in that it appears that without such the patch looks like it does nothing. Looking at it slightly more deeply, the AtReady callback shouldn't be needed, because TriggerDriver is called from NsFreeConn, which is called every time through the ConnThread loop after the connections have been put back on the free list. Somehow Gustaf has been able to reproduce the fix with just the driver.c change, and the new fix he is trying out also works. I'm not sure we understand why either of them works quite yet, or if they will continue to work...and that is the real problem. Magical fixes are nice, but in the long run we might need to know more about it. Magical fixes are fine so long as the magic keeps working. But something like this, where there appears to be no effect, may mean that it is super timing sensitive, maybe pushing an allocated structure into another block which causes a cache miss at just the right time to allow a task switch... ugh, my brain hurts just thinking about it. 5 guys and 5 chopsticks are so much simpler :) I have run ab without either fix and never had a problem. Actually the call was in driver.c, but the function body was a noop; it never even registered the TriggerDriver, and the AtReadyProcs function was completely removed, not even defined. To me this only means that I don't know how to reproduce the bad behavior, but at least it didn't cause any difference for me.
Gustaf's description of his reproduction scenario sounds much different from mine from last year, which suggests that the underlying cause is different, and this trigger/atready is a red herring. -J tom jackson On Friday 19 October 2007 11:11, Jeff Rogers wrote: I'm thinking the call to NsRunAtReadyProcs should be re-added in. It was apparently removed a long time ago, just after 4.0.10 was released: http://aolserver.cvs.sourceforge.net/aolserver/aolserver/nsd/queue.c?r1=1.24&r2=1.25 As the api was not being used at all it was fine to remove it, until it became useful for this race condition :) I haven't successfully reproduced the problem in 4.5 yet (or read the code deeply), but the main connection service loop - queue.c:NsConnThread - is not significantly different now, so it may be a different underlying problem than the one I described, but it should be an easy test to see if adding the AtReady callbacks back into the end of the loop helps things.
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Jeff, I don't have a problem with it being in, but it actually needs to do something we can measure. There are significant overall differences between 4.0.10 and 4.5, although if you look at small parts it seems the same. Just as an example, the patch in driver.c was hundreds of lines off from 4.0.10 to 4.5. Somehow Gustaf has been able to reproduce the fix with just the driver.c change, and the new fix he is trying out also works. I'm not sure we understand why either of them works quite yet, or if they will continue to work...and that is the real problem. Magical fixes are nice, but in the long run we might need to know more about it. I have run ab without either fix and never had a problem. Actually the call was in driver.c, but the function body was a noop; it never even registered the TriggerDriver, and the AtReadyProcs function was completely removed, not even defined. To me this only means that I don't know how to reproduce the bad behavior, but at least it didn't cause any difference for me. tom jackson On Friday 19 October 2007 11:11, Jeff Rogers wrote: > I'm thinking the call to NsRunAtReadyProcs should be re-added in. It > was apparently removed a long time ago, just after 4.0.10 was released: > > http://aolserver.cvs.sourceforge.net/aolserver/aolserver/nsd/queue.c?r1=1.24&r2=1.25 > > As the api was not being used at all it was fine to remove it, until it > became useful for this race condition :) I haven't successfully > reproduced the problem in 4.5 yet (or read the code deeply), but the main > connection service loop - queue.c:NsConnThread - is not significantly > different now, so it may be a different underlying problem than the one > I described, but it should be an easy test to see if adding the AtReady > callbacks back into the end of the loop helps things. > > -J > > > But, you're absolutely right: we need to figure this out, one way or
> > another. Thanks for finding this. > > -- Dossy
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Dossy Shiobara wrote: On 2007.10.13, Tom Jackson <[EMAIL PROTECTED]> wrote: This all looks great, except for the fact that the NsRunAtReadyProcs proc is never called. In 4.5, that appears to be the case. In 4.0, it's called from nsd/queue.c. I'd need to look more closely at the code to see if we should remove those functions entirely in 4.5 or if there's actually a missing call to NsRunAtReadyProcs() somewhere. I'm thinking the call to NsRunAtReadyProcs should be re-added in. It was apparently removed a long time ago, just after 4.0.10 was released: http://aolserver.cvs.sourceforge.net/aolserver/aolserver/nsd/queue.c?r1=1.24&r2=1.25 As the api was not being used at all it was fine to remove it, until it became useful for this race condition :) I haven't successfully reproduced the problem in 4.5 yet (or read the code deeply), but the main connection service loop - queue.c:NsConnThread - is not significantly different now, so it may be a different underlying problem than the one I described, but it should be an easy test to see if adding the AtReady callbacks back into the end of the loop helps things. -J But, you're absolutely right: we need to figure this out, one way or another. Thanks for finding this. -- Dossy
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Tom Jackson schrieb: I can try running my tests again with 3-5 threads, what other parameters were there? Does this get stuck running OpenACS on any machine, or just on openacs.org? As written before, i set maxthreads to 5 and maxconnections to 3 to trigger the problem. I did not mess with openacs.org, but reproduced the problem under Mac OS X. Also, Gustaf, the maxconnections should be higher...with maxconnections=3 you are exiting a connection thread every three requests, is this how you want it to work? yes, i wanted to trigger the bug. When the problem occured, openacs.org had different settings (maxthreads 5 maxconnections 10). Very interesting, I'll look into this more today. One thing I didn't try was a configuration where maxconnections was lower than the number of simultaneous requests from Apache Bench. That is not the point. Run ab with more parallel requests than maxthreads, such that requests queue up. As long as maxconnections is high enough, the connection threads will process the queued requests. If the number of queued requests is higher than the number of maxconns, matters degrade. Having one or two queued requests on a thread exit (due to maxconn exceeded) is no real problem, since on the next request, a new thread is started, which will care about the queued request as well. That means that under "good conditions", it is possible that there are e.g. 10 requests queued, but a slow and steady heartbeat of requests will cause the queue to empty itself, such that aolserver recovers. The worst case scenario is actually maxconnections 1 (stop every connection thread after every request). In this situation (without my patch), if a queue piles up, it never shrinks. Actually, if minthreads is set to a value > 0 (it was not set), then idle threads should care about queued requests. Maybe the easier fix is to set minthreads per default to 1. I will try this, when i am back home (in a couple of hours). How exactly does the patch work?
It appears to simply extend the tour-of-duty for a thread, yes but does the exiting thread ever get to trigger the queue to create new threads? no, i have not modified anything in this regard. When a thread processes all queued requests before exiting, there is no urgent need to create a new connection thread. The next incoming request will create a new thread on demand. Once threads die, it is nearly impossible to get the number of threads back up, i have not observed this in my test; the problem appears when reaching maxthreads. This does not exclude that there might be another problem. -gustaf neumann
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Friday 19 October 2007 03:17, Gustaf Neumann wrote:
> I have committed a patch to cvs head, that processes these pending
> requests, even when maxconns is exceeded. Maybe it is better to
> restart exiting connection threads automatically or to limit the
> number of queued requests at first hand; or maybe there is a callback
> solution. However, I have removed the old - useless - patch
> from the head version in CVS.

More on the patch and the problem. Your code comment says:

+request is queued without resources to process these.
+One has to take care about that one restarts the queue, when
+resources become available again.

The only way the driver thread could guess whether a queue has resources is to look at the number of maxthreads, making sure it is at least 1. But it would only be a guess; it could instantly change to zero if someone decided to zero out the threadpool. How exactly does the patch work? It appears to simply extend the tour-of-duty for a thread, but does the exiting thread ever get to trigger the queue to create new threads? This seemed to be the issue several users noticed: exiting threads never cause the queue to reassess the situation. At the very least, it seems that when a thread exits, the queue should check whether the number of threads is less than minthreads and, if so, create a new thread, but it should also create a new thread if minthreads = 0 and there are queued requests. Then the queue should loop over queued requests, creating threads for them until it reaches maxthreads. Okay, maybe restate this: the queue should loop over requests up to the point of reaching maxthreads. If there is a spike in activity, it is best to immediately respond with more threads up to the limit. I think the opposite is what is happening. Once threads die, it is nearly impossible to get the number of threads back up, and threads are what handle requests; otherwise the wait time is additive.
tom jackson
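[Editorial note: the exit-time bookkeeping Tom proposes above can be sketched as follows. This is an illustrative Python model; the class and method names (`Pool`, `on_thread_exit`, `spawn`) are hypothetical and do not correspond to the actual nsd/queue.c interface.]

```python
class Pool:
    """Minimal model of a connection-thread pool's counters."""

    def __init__(self, minthreads, maxthreads):
        self.minthreads = minthreads
        self.maxthreads = maxthreads
        self.nthreads = 0    # currently running connection threads
        self.queued = 0      # requests waiting in the queue

    def spawn(self):
        # stand-in for creating a real connection thread
        self.nthreads += 1

    def on_thread_exit(self):
        """Called as a connection thread ends its tour of duty."""
        self.nthreads -= 1
        # keep at least minthreads alive
        if self.nthreads < self.minthreads:
            self.spawn()
        # even with minthreads = 0, queued work justifies a thread
        elif self.queued > 0 and self.nthreads == 0:
            self.spawn()
        # spike response: keep creating threads for queued requests,
        # capped at maxthreads
        while self.queued > self.nthreads < self.maxthreads:
            self.spawn()

# minthreads = 0, four requests queued, last thread exiting: the pool
# spins up threads for the backlog instead of going idle.
p = Pool(minthreads=0, maxthreads=10)
p.nthreads, p.queued = 1, 4
p.on_thread_exit()
print(p.nthreads)   # -> 4
```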
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On Friday 19 October 2007 03:17, Gustaf Neumann wrote:
> The problem
> actually happened when the number of available connection threads
> is configured very low, as well as the number of requests
> per connection thread (e.g. 5 and 3). When the server accepts a
> high number of requests, these are queued even without
> guaranteeing resources to process these.

I know this is true: several months ago I configured a threadpool with zero threads after startup. The driver thread will still send requests to the threadpool queue, where they will sit forever. Also, if you then configure the threadpool with more threads after going to zero, they are never created. I also noticed that you should be able to trigger the driver by calling the Tcl command [ns_driver query ...], so maybe an experiment with a scheduled proc would be interesting: put the server into the unresponsive mode and see if it gets unstuck. The log should show the scheduled thread running, then hopefully the server unsticking. Otherwise it isn't necessarily the driver thread which is stuck, and triggering it would be pointless. I can try running my tests again with 3-5 threads; what other parameters were there? Does this get stuck running OpenACS on any machine, or just on openacs.org? Okay, just read Gustaf's comments on his patch; there might be a problem with the configuration on openacs.org, see below...

> The situation
> is especially bad with bursty requests. When a connection thread
> exits e.g. after 3 requests, and there are still many requests queued,
> there is no mechanism to automatically trigger new
> connection threads in this situation. When a client connects
> in this situation to the server, some old (maybe obsolete) requests
> are taken from the queue, but after this, the server might sit idle
> even with a high number of requests queued.

As above, I wonder if the driver thread and the exiting connection thread should signal the queue thread? There seems to be something incomplete in the queue loop. Also, Gustaf, the maxconnections should be higher... with maxconnections=3 you are exiting a connection thread every three requests; is this how you want it to work? If resources are limited, this isn't going to help.

> I have committed a patch to cvs head, that processes these pending
> requests, even when maxconns is exceeded. Maybe it is better to
> restart exiting connection threads automatically or to limit the
> number of queued requests at first hand; or maybe there is a callback
> solution. However, I have removed the old - useless - patch
> from the head version in CVS.

Very interesting, I'll look into this more today. One thing I didn't try was a configuration where maxconnections was lower than the number of simultaneous requests from Apache Bench.

tom jackson
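[Editorial note: the zero-thread pool Tom describes, where the driver queues requests that no thread will ever serve, suggests a driver-side guard. The sketch below is purely illustrative Python; `ThreadPool` and `offer` are hypothetical names, not the actual driver/queue interface, and rejecting the request is only one of several possible remedies.]

```python
from collections import deque

class ThreadPool:
    """Toy model of a threadpool the driver thread hands requests to."""

    def __init__(self, maxthreads):
        self.maxthreads = maxthreads
        self.queue = deque()

    def offer(self, request):
        """Driver hands a request to the pool; False means rejected."""
        if self.maxthreads == 0:
            # no thread will ever drain this queue, so don't let the
            # request sit forever
            return False
        self.queue.append(request)
        return True

pool = ThreadPool(maxthreads=0)
print(pool.offer("GET /"))   # -> False, rather than queueing forever
```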
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
Dossy Shiobara schrieb:
> On 2007.10.13, Tom Jackson <[EMAIL PROTECTED]> wrote:
>> This all looks great, except for the fact that the NsRunAtReadyProcs
>> proc is never called.
> But, you're absolutely right: we need to figure this out, one way or
> another. Thanks for finding this.

Why the patch helped is indeed a mystery. Maybe it changed the timing or caused a different serialization due to a mutex. However, it did NOT solve the underlying problem. Prompted by Tom's question, I spent some more time on the problem and managed to reproduce it reliably. The problem actually happens when the number of available connection threads is configured very low, as well as the number of requests per connection thread (e.g. 5 and 3). When the server accepts a high number of requests, these are queued even without guaranteeing resources to process them. The situation is especially bad with bursty requests. When a connection thread exits e.g. after 3 requests, and there are still many requests queued, there is no mechanism to automatically trigger new connection threads in this situation. When a client connects to the server in this situation, some old (maybe obsolete) requests are taken from the queue, but after this the server might sit idle even with a high number of requests queued. I have committed a patch to cvs head that processes these pending requests, even when maxconns is exceeded. Maybe it is better to restart exiting connection threads automatically, or to limit the number of queued requests at first hand; or maybe there is a callback solution. However, I have removed the old - useless - patch from the head version in CVS.

best regards
-gustaf neumann
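[Editorial note: the stall Gustaf describes above is easy to reproduce in a deterministic model. The Python sketch below is illustrative only (the function `simulate` and its parameters are invented for this note): each connection thread exits after `maxconnections` requests, and nothing spawns replacements while requests are still queued, so a burst leaves requests stranded.]

```python
from collections import deque

def simulate(burst, maxthreads, maxconnections):
    """Return how many requests are left queued after a burst, when
    exiting threads are never replaced while the queue is non-empty."""
    queue = deque(range(burst))   # requests queued by the driver thread
    threads = maxthreads          # threads alive when the burst arrives
    while threads > 0:
        # one thread serves up to maxconnections requests, then exits
        for _ in range(maxconnections):
            if not queue:
                break
            queue.popleft()
        threads -= 1              # thread exits; nobody spawns a replacement
    return len(queue)

# With Gustaf's trigger settings (maxthreads 5, maxconnections 3), a
# burst of 20 requests leaves 5 requests stranded in the queue.
print(simulate(20, 5, 3))   # -> 5
```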
Re: [AOLSERVER] Ns_RegisterAtReady, NsRunAtReadyProcs
On 2007.10.13, Tom Jackson <[EMAIL PROTECTED]> wrote:
> This all looks great, except for the fact that the NsRunAtReadyProcs
> proc is never called.

In 4.5, that appears to be the case. In 4.0, it's called from nsd/queue.c. I'd need to look more closely at the code to see if we should remove those functions entirely in 4.5, or if there's actually a missing call to NsRunAtReadyProcs() somewhere. But, you're absolutely right: we need to figure this out, one way or another. Thanks for finding this.

-- Dossy

--
Dossy Shiobara | [EMAIL PROTECTED] | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
"He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on." (p. 70)