Hi All, 

I'm resurrecting this discussion again ;-)

Today we had the same "Failed to contact..." symptom, and this time it was 
persistent. We had seen this problem in the past, but only rarely.

After googling "Failed to contact...", I found Kieran's email, and we got the 
same result when running the following command:
ibabar:~ admin$ sudo lsof -i tcp | grep CLOSE_WAIT
java      34524   _appserver  137u  IPv6 0x171e9344      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58973 (CLOSE_WAIT)
java      34524   _appserver  138u  IPv6 0x2148f5a8      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59191 (CLOSE_WAIT)
java      34524   _appserver  140u  IPv6 0x2141d344      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59070 (CLOSE_WAIT)
java      34524   _appserver  144u  IPv6 0x2e28c984      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59114 (CLOSE_WAIT)
java      34524   _appserver  145u  IPv6 0x2db8bb2c      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59074 (CLOSE_WAIT)
java      34524   _appserver  146u  IPv6 0x13509a70      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58845 (CLOSE_WAIT)
java      34524   _appserver  152u  IPv6 0x214440e0      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58853 (CLOSE_WAIT)
java      34524   _appserver  158u  IPv6 0x2db23400      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59155 (CLOSE_WAIT)
java      34524   _appserver  176u  IPv6 0x2e23b19c      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59034 (CLOSE_WAIT)
java      34524   _appserver  178u  IPv6 0x2102f8c8      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59163 (CLOSE_WAIT)
java      34524   _appserver  179u  IPv6 0x21523d90      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59110 (CLOSE_WAIT)
java      34524   _appserver  184u  IPv6 0x20c995a8      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59199 (CLOSE_WAIT)
java      34524   _appserver  187u  IPv6 0x2e1f98c8      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59042 (CLOSE_WAIT)
java      34524   _appserver  190u  IPv6 0x2df27664      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59046 (CLOSE_WAIT)
java      34524   _appserver  191u  IPv6 0x2dd3b4bc      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59086 (CLOSE_WAIT)
java      34524   _appserver  193u  IPv6 0x2e01cf38      0t0  TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59050 (CLOSE_WAIT)
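
For anyone who wants to watch for this condition, here is a minimal sketch (an illustration, not something taken from Kieran's mail) that counts sockets per PID and TCP state in "lsof -i tcp" output. A socket in CLOSE_WAIT means the peer has already closed its end while the local process has not yet closed its own, so a growing count on one instance suggests that instance has stopped servicing its connections. The function name count_tcp_states and the regular expression are assumptions made for this example; the sample reuses two records from the output above, deliberately wrapped across lines to show that wrapped output still parses.

```python
import re
from collections import Counter

# One lsof TCP record: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (STATE).
# \s+ also matches newlines, so records wrapped onto two lines still parse.
LSOF_TCP_RECORD = re.compile(
    r"^(?P<command>\S+)\s+(?P<pid>\d+)\s+(?P<user>\S+)\s+(?P<fd>\d+\w*)\s+"
    r"IPv[46]\s+\S+\s+\S+\s+TCP\s+(?P<name>\S+)\s+\((?P<state>[A-Z_0-9]+)\)",
    re.MULTILINE,
)

def count_tcp_states(lsof_text):
    """Return a Counter keyed by (pid, state) for every TCP record found."""
    counts = Counter()
    for match in LSOF_TCP_RECORD.finditer(lsof_text):
        counts[(int(match["pid"]), match["state"])] += 1
    return counts

# Two records copied from the lsof output in this mail.
sample = """\
java      34524   _appserver  137u  IPv6 0x171e9344      0t0  TCP
ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58973 (CLOSE_WAIT)
java      34524   _appserver  138u  IPv6 0x2148f5a8      0t0  TCP
ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59191 (CLOSE_WAIT)
"""

counts = count_tcp_states(sample)
for (pid, state), n in sorted(counts.items()):
    print(f"pid {pid}: {n} socket(s) in {state}")
```

On a live machine the text could come from subprocess.run(["sudo", "lsof", "-i", "tcp"], capture_output=True, text=True).stdout instead of the sample string.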

After taking a thread dump, we saw that the threads were blocked as follows:
   java.lang.Thread.State: BLOCKED (on object monitor)
        at er.extensions.eof.ERXEnterpriseObjectCache.cache(ERXEnterpriseObjectCache.java:380)
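
To see at a glance where the threads are piling up, a thread dump can be summarised by counting BLOCKED threads per top stack frame. The following is a rough sketch under stated assumptions: the function name blocked_top_frames is invented for this example, and the sample dump is hypothetical (the thread header lines and the RUNNABLE frame are placeholders); only the ERXEnterpriseObjectCache frame comes from our actual dump.

```python
import re
from collections import Counter

# Matches a BLOCKED state line followed by the first "at ..." frame.
# \s+ after "at" also matches a newline, so a frame that was wrapped onto
# the next line is still captured.
BLOCKED_TOP_FRAME = re.compile(
    r"java\.lang\.Thread\.State: BLOCKED[^\n]*\n\s*at\s+(?P<frame>[^\n]+)"
)

def blocked_top_frames(dump_text):
    """Count BLOCKED threads in a thread dump, grouped by their top frame."""
    return Counter(
        m["frame"].strip() for m in BLOCKED_TOP_FRAME.finditer(dump_text)
    )

# The only real frame here; everything else in the sample is a placeholder.
CACHE_FRAME = (
    "er.extensions.eof.ERXEnterpriseObjectCache.cache"
    "(ERXEnterpriseObjectCache.java:380)"
)

sample_dump = f'''\
"WorkerThread1" waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at {CACHE_FRAME}

"WorkerThread2" waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at
{CACHE_FRAME}

"WorkerThread3" runnable
   java.lang.Thread.State: RUNNABLE
        at com.example.Placeholder.run(Placeholder.java:1)
'''

frames = blocked_top_frames(sample_dump)
for frame, n in frames.most_common():
    print(f"{n} thread(s) blocked at {frame}")
```

A frame that dominates this count is a reasonable first place to look for the contended lock.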

My question is about the link between the CLOSE_WAITs and JavaMonitor: why is 
the monitor unable to contact wotaskd when only one instance is locked? I 
presume it is because wotaskd is not able to contact that instance?

I'm resurrecting this mail because it's a useful tip for anyone who gets the 
"Failed to contact..." message in the monitor.

Cheers,

Philippe

On 30 Apr. 2009, at 23:30, Kieran Kelleher wrote:

> Resurrecting this old discussion again :-(
> 
> OK, a while ago, one xserve "omega" (running Leopard Server 10.5.6, WO 5.4.X 
> wotaskd with fully embedded WO 5.3.3 apps) showed up in WOMonitor as Failed 
> to Contact again. Remember WOMonitor is running on Tiger Server 10.4.8 with 
> the wotaskd from WO 5.3.3.
> 
> Rather than assume this is a wotaskd/networking problem this time, I decided 
> to check the WO apps on that server "192.168.3.154" using lsof and jstack to 
> see if I can find anything unusual and I did:
> 
> OK, 192.168.3.154 has 2 apps running on it: pid-479 (port 2001) and pid-947 
> (port 2004). Also, wotaskd is running as pid 43.
> 
> app pid-479 lsof -i tcp:2001 shows nothing unusual
> COMMAND PID       USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
> java    479 _appserver    7u  IPv6 0x830bb2c      0t0  TCP [::192.168.3.154]:dc (LISTEN)
> 
> app pid-947 has unusual output, lsof -i tcp:2004 reveals 256 CLOSE_WAITs!!! 
> .... this app is not allowing logins
> http://67.78.26.66:81/~kieran/misc/lsof_tcp_2004_pid_43.txt
> 
> BTW, the other IP 192.168.3.149 shown on the CLOSE_WAIT lines is the machine 
> that is running WOMonitor/apache, so this would seem to indicate a lot of 
> hung requests? (that's a question, Chuck ;-) )
> 
> lsof for wotaskd itself gives this, which doesn't seem unusual
> bash-3.2# lsof -i tcp:1085
> COMMAND PID       USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
> java     43 _appserver    8u  IPv6 0x6e1d258      0t0  TCP [::192.168.3.154]:webobjects (LISTEN)
> java     43 _appserver   11u  IPv6 0x830b664      0t0  TCP [::192.168.3.154]:webobjects->[::192.168.3.154]:49665 (ESTABLISHED)
> java     43 _appserver   12u  IPv6 0x8e41cd4      0t0  TCP [::192.168.3.154]:webobjects->[::192.168.3.154]:53449 (ESTABLISHED)
> java    479 _appserver   10u  IPv6 0x830b8c8      0t0  TCP [::192.168.3.154]:49665->[::192.168.3.154]:webobjects (ESTABLISHED)
> java    947 _appserver   10u  IPv6 0x8e7d344      0t0  TCP [::192.168.3.154]:53449->[::192.168.3.154]:webobjects (ESTABLISHED)
> 
> 
> Now looking at the jstack outputs, we also have more useful clues.
> 
> jstack on the pid-947 (port 2004) app reveals it has session store 
> deadlocks!! This is the same app with all the CLOSE_WAITs
> http://67.78.26.66:81/~kieran/misc/jstack_pid_947.txt
> 
> So, it would seem that the stupid "Failed to contact" stuff I have been 
> seeing is really caused by Session Store deadlocks. So, the first thing I am 
> going to do now is turn OFF concurrent request handling and turn on Wonder 
> Session Store Deadlock detection for this app ...... however, I would wager 
> that I will not see any Session Store deadlocks with concurrent request 
> handling turned off!
> 
> Any ideas on a strategy for deadlock detection with concurrent request 
> handling ON?
> 
> On Mar 25, 2009, at 10:34 PM, Chuck Hill wrote:
> 
>> 
>> On Mar 25, 2009, at 7:21 PM, Kieran Kelleher wrote:
>> 
>>> Hi again Chuck,
>>> 
>>> If you are going to use the domain name (for example www.website.com, 
>>> which resolves to, say, 67.88.91.233), doesn't that mean you have to 
>>> open port 1085 on the router between the public internet and that 
>>> apache/WOMonitor machine?
>> 
>> Apache is behind the firewall.  Only ports 80 and 443 go through.
>> 
>> 
>> Chuck
>> 
>> 
>> 
>>> -Kieran
>>> 
>>> On Mar 23, 2009, at 12:25 PM, Chuck Hill wrote:
>>> 
>>>> On Mar 21, 2009, at 6:35 PM, Kieran Kelleher wrote:
>>>> 
>>>>> Hi Chuck,
>>>>> 
>>>>> Still getting this problem after a few days of running .... last time we 
>>>>> discussed, I had updated all the WO servers which run leopard to use IP 
>>>>> address for host name...... I still have not touched the only 
>>>>> Tiger machine that is apache and runs the site's WOMonitor and has a 
>>>>> couple tiny insignificant WO apps. I am not ready to upgrade this machine 
>>>>> to a Leopard machine just yet, so I guess that is the next guy to be 
>>>>> updated with IP addresses instead of its Bonjour name ..... but I have a 
>>>>> question for you based on your experience with this:
>>>>> 
>>>>> - For that primary WOMonitor machine which is the main site webserver, 
>>>>> should I change to localhost, 127.0.0.1 or the actual IP address of the 
>>>>> machine in WOMonitor Host settings and wotaskd properties?  (FWIW, for 
>>>>> last couple of years, we have used the Bonjour host.local name style on 
>>>>> that machine)
>>>> 
>>>> We usually use neither.  We use the name that DNS resolves (a working 
>>>> reverse lookup is important too) to the primary IP on that machine.
>>> 
>>> 
>> 
>> -- 
>> Chuck Hill             Senior Consultant / VP Development
>> 
>> Practical WebObjects - for developers who want to increase their overall 
>> knowledge of WebObjects or who are trying to solve specific problems.
>> http://www.global-village.net/products/practical_webobjects
>> 
> 
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list      ([email protected])
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/webobjects-dev/prabier%40mac.com
> 
> This email sent to [email protected]

