Chuck, thanks for the information. The skill set is a bit of a rare one.

Michael, interesting to hear about a deadlock on Linux. Do you mean you were using the 'worker' MPM? You don't have a stack trace by any chance? We may have hit the same bottleneck or deadlock. I have been wondering whether this is a SPARC/Solaris-specific issue.
Not being able to use the worker MPM with WO seems like a serious scalability issue for anyone who isn't 'akamazed'. If I can narrow the problem down to something specific enough we may look to pay someone to fix this.

Sent from my iPhone

On 5 Jul 2012, at 00:10, "Chuck Hill" <[email protected]> wrote:

> Hi James,
>
> On 2012-06-29, at 9:35 AM, Brook, James wrote:
>
>> It's probably bad form to keep answering my own mails but no-one had anything to say about this. Are there still people on the list who are familiar with the adaptor internals? This problem is causing us a lot of pain in production.
>
> At this point in time, you are probably the world's authority on this.
>
>> Does anyone use the MPM worker module with Apache or are we all still with pre-fork? I don't think we could live without the performance gains. Perhaps it doesn't matter.
>
> I would guess that very few of us are using Apache on Solaris.
>
>> I haven't quite proven this but I am pretty certain that my problem is with fcntl. That's what the adaptor uses to lock the shared memory file. It's apparently an outdated way of doing this - APR now has better abstractions for these sorts of mutexes. Even the code that does the locking is in a retry loop with up to 50 attempts! I started trying to rewrite the locking stuff but I am out of my depth.
>
> There are probably a few people here with current C skills, I am not one of them. And then you probably need Apache and Solaris API knowledge too.
>
>> It strikes me that in general this would not be a bad bit of code for the community to have updated. Can anyone help me with that please?
>
> I would, but I can't. I was trying to help one company that had a deployment problem on Solaris that sounds somewhat similar to yours. So yes, it would be good to get this updated. But finding someone else with the skill set is unlikely.
>
> Chuck
>
>> ________________________________________
>> From: Brook, James
>> Sent: 13 June 2012 18:48
>> To: <[email protected]>
>> Subject: Re: Deadlock on Apache 2.2 Adaptor under high load - Solaris 10 - Worker MPM
>>
>> Now I have some detailed adaptor logging from a time close to the deadlock. Here is an example of an error with a lock:
>>
>> Debug: thread 37 locking WOShmem_lock from ../Adaptor/shmem.c:375
>> Debug: thread 37 unlocking WOShmem_lock from ../Adaptor/shmem.c:379
>> Error: lock_file_section(): failed to lock (1 attempts): Deadlock situation detected/avoided
>> Debug: thread 37 locking str_lock from ../Adaptor/wastring.c:93
>> Debug: thread 37 unlocking str_lock from ../Adaptor/wastring.c:100
>> Debug: thread 37 locking str_lock from ../Adaptor/wastring.c:152
>> Debug: thread 37 unlocking str_lock from ../Adaptor/wastring.c:158
>> Debug: thread 37 locking WOShmem_lock from ../Adaptor/shmem.c:391
>> Debug: thread 37 unlocking WOShmem_lock from ../Adaptor/shmem.c:394
>> Error: ac_readConfiguration: WOShmem_lock() failed. Skipping reading config.
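To make the APR suggestion in the quoted 29 June mail concrete, here is a minimal sketch of what an apr_global_mutex-based replacement for the fcntl() locking could look like. It is not taken from the adaptor sources; the hook split, the mutex and the wo_* function names are all invented for illustration.

/* Sketch only -- not adaptor code.  One APR "global" mutex is created in
 * the Apache parent and re-attached in each child, instead of taking
 * fcntl() record locks on the shared memory file. */
#include "apr_global_mutex.h"
#include "apr_errno.h"
#include "apr_pools.h"

static apr_global_mutex_t *wo_config_mutex = NULL;

/* Run once in the parent, e.g. from the module's post_config hook. */
apr_status_t wo_mutex_create(apr_pool_t *pconf, const char *lockfile)
{
    /* APR_LOCK_DEFAULT lets APR pick the best mechanism for the platform;
     * APR_LOCK_PROC_PTHREAD could be requested explicitly where APR was
     * built with process-shared pthread mutex support. */
    return apr_global_mutex_create(&wo_config_mutex, lockfile,
                                   APR_LOCK_DEFAULT, pconf);
}

/* Run in every child process, e.g. from the child_init hook. */
apr_status_t wo_mutex_child_init(apr_pool_t *pchild, const char *lockfile)
{
    return apr_global_mutex_child_init(&wo_config_mutex, lockfile, pchild);
}

/* A global mutex serialises across processes and across the threads of a
 * single worker-MPM child, which fcntl() record locks do not do cleanly. */
apr_status_t wo_with_config_lock(void (*body)(void *ctx), void *ctx)
{
    apr_status_t rv = apr_global_mutex_lock(wo_config_mutex);
    if (rv != APR_SUCCESS)
        return rv;
    body(ctx);
    return apr_global_mutex_unlock(wo_config_mutex);
}

One caveat: lock_file_section() appears to lock byte ranges, so different sections of the shared memory file can be held independently, whereas a single global mutex serialises everything; whether that coarser locking matters under load would need measuring.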
>>
>> On Jun 13, 2012, at 5:30 PM, James Brook wrote:
>>
>>> We have a big problem with the Apache 2.2 WebObjects adaptor on our Solaris 10 web servers. We are using the 'worker' MPM but when the sites get busy nearly every Apache thread is waiting for a shared memory lock to call the function that reads the adaptor config. The remaining threads are in the fcntl function trying to lock a section of shared memory. See below for a couple of example thread stacks.
>>>
>>> I read in several posts that fcntl on Solaris 10 causes deadlocks under high load and that the problem is worse with the 'worker' MPM. The recommended locking mechanism for Solaris seems to be pthreads.
>>>
>>> I know that at least a few list members are running with the Solaris adaptor. My questions:
>>> * Has anyone experienced this problem and found a solution?
>>> * Is anyone using the 'worker' MPM, or do people still use pre-fork? (I don't think this is a thread safety problem.)
>>> * Any help or suggestions? Especially, any tips on rewriting to use pthreads?
>>>
>>> --
>>> James
>>>
>>>
>>> feec5638 fcntl (d8, 7, 2abe588)
>>> feeb8258 fcntl (d8, 1, fefcc200, 4d6880, 1580, 20a58) + 84
>>> febe8570 lock_file_section (d8, 4d6880, 14, 2abe588, 147c, 2) + 58
>>> febe8e14 WOShmem_lock (2abe588, 14, 1, 4d6880, 1580, 1400) + d4
>>> febef410 ac_readConfiguration (1, fffee980, 11400, fec08f74, 1d84, 1c00) + 40
>>> febe71cc _runRequest (fc9fb9c4, 0, 2d77168, 2d18b40, 5, 0) + 260
>>> febe6a0c tr_handleRequest (2d18b40, 27226f0, fc9fbc50, 0, 5, 2) + 30c
>>> febf42a8 WebObjects_handler (2721208, 0, 10000, 0, 2d18b40, fec08f74) + 48c
>>> 00041484 ap_run_handler (2721208, febf3e1c, 7b578, 6b5a10, 2, 8) + 40
>>> 00041ab4 ap_invoke_handler (2721208, 0, 2721208, 0, 6b58bc, 79c00) + ec
>>> 0005132c ap_process_request (2721208, 79400, 4, 1, 0, 2721208) + 54
>>> 0004d9a4 ap_process_http_connection (26b61c0, 7c000, 0, 1, 79548, 5) + 78
>>> 00049654 ap_process_connection (26b61c0, 26b5f10, 6b5d90, 0, 7bd98, 6b5d78) + d4
>>> 00057558 worker_thread (14d888, ad7, fc9fbf98, 7c24c, 2b, 17) + 280
>>> feec5238 _lwp_start (0, 0, 0, 0, 0, 0)
>>>
>>>
>>> feec52d8 lwp_park (0, 0, 0)
>>> feebf350 cond_wait_queue (ef50a8, ef5090, 0, 0, 1c00, 0) + 28
>>> feebf874 cond_wait (ef50a8, ef5090, ef50a8, 0, fec0a8f8, 3) + 10
>>> feebf8b0 pthread_cond_wait (ef50a8, ef5090, ef5090, 0, 1c00, 3a) + 8
>>> febf2730 _WA_lock (ef5088, febf5974, ef50a8, 0, fec0a8f8, 3) + 90
>>> febe9494 sha_lock (100, 4, fffeca64, fec08f74, ef3230, 13400) + 5c
>>> febedd84 ac_findApplication (fe0fb54c, 4, fec0acfc, fec08f74, 0, fec0a474) + 70
>>> febe6794 tr_handleRequest (2402c38, 30bbec0, fe0fb7d8, 798f0, ffffffff, 14400) + 94
>>> febf42a8 WebObjects_handler (30baf40, 0, 10000, 0, 2402c38, fec08f74) + 48c
>>> 00041484 ap_run_handler (30baf40, febf3e1c, 7b578, 6b5a10, 2, 8) + 40
>>> 00041ab4 ap_invoke_handler (30baf40, 0, 2ba5f10, 2ba5348, 30baf40, 2b824d8) + ec
>>> 0003f080 ap_run_sub_req (ffffffff, 30bb0e8, 20, 0, 30bc370, 30baf40) + 3c
>>> fed336d8 handle_include (2ba4d20, 10800, 2ba5f10, 2ba5348, 30baf40, 2b824d8) + 334
>>> fed378f8 send_parsed_content (11a8, 7c021, 2ba4d20, 2c01898, 2ba5f14, 2ba5f10) + 1080
>>> 0003afb0 default_handler (0, 2c01898, 2b91e10, 2b7c748, 2b7e598, 2ba5328) + 4a8
>>> 00041484 ap_run_handler (2c01898, 3ab08, 7b578, 6b5a74, 7, 8) + 40
>>> 00041ab4 ap_invoke_handler (2c01898, 0, 2c01898, 2b9eb80, ffb1b6a0, 4e4960) + ec
>>> 00051a58 ap_internal_redirect (0, 2c01898, fe0fbd10, fe0fbcac, 1, 2c01898) + 44
>>> febab53c handler_redirect (2b9eb80, ffffffff, febbd238, 2c01560, fffefd64, 10000) + 90
>>> 00041484 ap_run_handler (2b9eb80, febab4ac, 7b578, 6b5a4c, 5, 8) + 40
>>> 00041ab4 ap_invoke_handler (2b9eb80, 0, 2b9eb80, 0, 6b58bc, 79c00) + ec
>>> 0005132c ap_process_request (2b9eb80, 79400, 4, 1, 0, 2b9eb80) + 54
>>> 0004d9a4 ap_process_http_connection (2b7c748, 7c000, 0, 1, 79548, 5) + 78
>>> 00049654 ap_process_connection (2b7c748, 2b7c498, 6b5d90, 0, 7bd98, 6b5d78) + d4
>>> 00057558 worker_thread (14d5a8, a00, fe0fbf98, 7c24c, 28, 0) + 280
>>> feec5238 _lwp_start (0, 0, 0, 0, 0, 0)
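On the pthreads question in the quoted 13 June mail: the alternative usually suggested in place of fcntl() record locks on Solaris is a process-shared pthread mutex that lives in the mapped shared memory itself. The fragment below is only a self-contained illustration of the attribute set-up; the struct, the anonymous mapping and every name in it are invented, and in the adaptor the mutex would have to sit inside the existing shared memory file and be initialised exactly once.

/* Sketch only: a process-shared pthread mutex in a shared mapping.
 * This shows the general technique; it is not adaptor code. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

typedef struct {
    pthread_mutex_t lock;   /* shared by every httpd process and thread */
    /* ... the rest of the shared state would follow here ... */
} shared_region_t;

static shared_region_t *create_shared_region(void)
{
    pthread_mutexattr_t attr;
    shared_region_t *r;

    /* An anonymous MAP_SHARED mapping is visible to children after fork();
     * the adaptor would instead mmap() its existing config file. */
    r = mmap(NULL, sizeof(*r), PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANON, -1, 0);
    if (r == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }

    pthread_mutexattr_init(&attr);
    /* This attribute is what makes the mutex usable across processes. */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    /* On Solaris it is also worth making the mutex robust
     * (pthread_mutexattr_setrobust_np) so a child that dies while
     * holding it does not leave it locked forever. */
    pthread_mutex_init(&r->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return r;
}

int main(void)
{
    shared_region_t *r = create_shared_region();

    /* Any process forked after this point, and any thread inside it,
     * contends on this one mutex without going through fcntl() and its
     * kernel-level deadlock detection. */
    pthread_mutex_lock(&r->lock);
    puts("holding the shared lock");
    pthread_mutex_unlock(&r->lock);

    munmap(r, sizeof(*r));
    return 0;
}

The awkward part for the adaptor is initialisation: exactly one process has to run pthread_mutex_init() on the shared mutex before any child touches it. That bookkeeping is what the apr_global_mutex sketched earlier handles for you.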
> --
> Chuck Hill  Senior Consultant / VP Development
>
> Practical WebObjects - for developers who want to increase their overall knowledge of WebObjects or who are trying to solve specific problems.
> http://www.global-village.net/gvc/practical_webobjects
