Hi All,
I am resurrecting this discussion again ;-)
Today we had the same "Failed to contact..." symptom, and this time it was
persistent. We had seen this problem before, but only rarely.
After googling "Failed to contact..." I found Kieran's email, and we got the
same result when running the following command:
ibabar:~ admin$ sudo lsof -i tcp | grep CLOSE_WAIT
java 34524 _appserver 137u IPv6 0x171e9344 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58973 (CLOSE_WAIT)
java 34524 _appserver 138u IPv6 0x2148f5a8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59191 (CLOSE_WAIT)
java 34524 _appserver 140u IPv6 0x2141d344 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59070 (CLOSE_WAIT)
java 34524 _appserver 144u IPv6 0x2e28c984 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59114 (CLOSE_WAIT)
java 34524 _appserver 145u IPv6 0x2db8bb2c 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59074 (CLOSE_WAIT)
java 34524 _appserver 146u IPv6 0x13509a70 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58845 (CLOSE_WAIT)
java 34524 _appserver 152u IPv6 0x214440e0 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58853 (CLOSE_WAIT)
java 34524 _appserver 158u IPv6 0x2db23400 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59155 (CLOSE_WAIT)
java 34524 _appserver 176u IPv6 0x2e23b19c 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59034 (CLOSE_WAIT)
java 34524 _appserver 178u IPv6 0x2102f8c8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59163 (CLOSE_WAIT)
java 34524 _appserver 179u IPv6 0x21523d90 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59110 (CLOSE_WAIT)
java 34524 _appserver 184u IPv6 0x20c995a8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59199 (CLOSE_WAIT)
java 34524 _appserver 187u IPv6 0x2e1f98c8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59042 (CLOSE_WAIT)
java 34524 _appserver 190u IPv6 0x2df27664 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59046 (CLOSE_WAIT)
java 34524 _appserver 191u IPv6 0x2dd3b4bc 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59086 (CLOSE_WAIT)
java 34524 _appserver 193u IPv6 0x2e01cf38 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59050 (CLOSE_WAIT)
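To watch for this condition without reading the lsof output by eye, the CLOSE_WAIT entries can be counted per process. The following is only a rough sketch: the class and method names are made up for illustration, and it assumes each connection occupies a single line of text, as lsof prints it before any mail client wraps the lines.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of WebObjects or Wonder.
class CloseWaitCounter {

    // Counts lsof entries in the CLOSE_WAIT state, grouped by PID.
    // Assumes unwrapped `lsof -i tcp` output: one connection per line,
    // with the PID in the second whitespace-separated column.
    static Map<Integer, Integer> countByPid(String lsofOutput) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lsofOutput.split("\\R")) {
            if (!line.contains("(CLOSE_WAIT)")) {
                continue; // header line or a socket in another state
            }
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // not a full lsof row (for example a wrapped fragment)
            }
            try {
                counts.merge(Integer.parseInt(cols[1]), 1, Integer::sum);
            } catch (NumberFormatException e) {
                // second column is not a PID; ignore the line
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two rows taken from the output above.
        String sample = String.join("\n",
            "java 34524 _appserver 137u IPv6 0x171e9344 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:58973 (CLOSE_WAIT)",
            "java 34524 _appserver 138u IPv6 0x2148f5a8 0t0 TCP ibabar.sophiacom.fr:dc->ibabar.sophiacom.fr:59191 (CLOSE_WAIT)");
        System.out.println(countByPid(sample));
    }
}
```

A count that keeps growing for one PID is a hint that the instance should get a thread dump before it is restarted.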
After taking a thread dump, we saw that the threads were blocked as follows:
java.lang.Thread.State: BLOCKED (on object monitor)
at er.extensions.eof.ERXEnterpriseObjectCache.cache(ERXEnterpriseObjectCache.java:380)
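The same information that the dump shows from outside can also be read from inside the JVM through the standard java.lang.management API. The sketch below is only an illustration under some assumptions: the class name is invented, it is written for a recent JDK (Java 8 or later), and how it would be triggered inside an application (a timer thread, a direct action, and so on) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of WebObjects or Wonder.
class BlockedThreadReport {

    // One line per thread that is currently BLOCKED on an object monitor,
    // naming the lock it waits for and the thread holding that lock.
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 1)) {
            if (info == null) {
                continue; // thread ended between the two calls
            }
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    // Names of threads that are part of a lock cycle; empty when the JVM
    // finds none. A thread can be BLOCKED for a long time without being
    // deadlocked, so both views are useful.
    static List<String> deadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when there is no deadlock
        List<String> names = new ArrayList<>();
        if (ids != null) {
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
        System.out.println("Deadlocked: " + deadlockedThreads());
    }
}
```

Logging this report periodically would show which thread owns the monitor that the worker threads are queued on, which the single BLOCKED frame above does not tell us.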
My question is about the cause of the CLOSE_WAITs and JavaMonitor: why is the
monitor unable to contact wotaskd when only one instance is locked? I presume
it is because wotaskd is not able to contact the instance above?
I am resurrecting this mail because it is a good tip for anyone who gets the
message "Failed to contact..." in the monitor.
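For background on the state itself: a socket sits in CLOSE_WAIT when the remote end has closed the connection and the local process has not yet closed its own end. That would be consistent with worker threads that are blocked and never get back to their sockets, although this is an inference and not something verified on our server. The small loopback sketch below (invented class name, nothing WebObjects-specific) shows the moment at which a socket enters that state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo for illustration; not part of WebObjects or Wonder.
class CloseWaitDemo {

    // Opens a loopback connection, closes the client end, and returns what
    // the accepting end reads next. -1 means "peer has closed". From the
    // moment the peer closes until the accepting end calls close(), the OS
    // reports the accepted socket as CLOSE_WAIT.
    static int readAfterPeerClose() {
        try (ServerSocket listener =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client =
                new Socket(listener.getInetAddress(), listener.getLocalPort());
            try (Socket accepted = listener.accept()) {
                client.close();              // peer closes its side
                accepted.setSoTimeout(5000); // do not hang if something goes wrong
                // A healthy worker sees end-of-stream here and then closes the
                // socket (the try-with-resources does it in this demo). A worker
                // stuck waiting for a lock never gets this far, so its socket
                // stays in CLOSE_WAIT.
                return accepted.getInputStream().read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```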
Cheers,
Philippe
On 30 Apr 2009, at 23:30, Kieran Kelleher wrote:
> Resurrecting this old discussion again :-(
>
> OK, a while ago, one xserve "omega" (running Leopard Server 10.5.6, WO 5.4.X
> wotaskd with fully embedded WO 5.3.3 apps) showed up in WOMonitor as Failed
> to Contact again. Remember WOMonitor is running on Tiger Server 10.4.8 with
> the wotaskd from WO 5.3.3.
>
> Rather than assume this is a wotaskd/networking problem this time, I decided
> to check the WO apps on that server "192.168.3.154" using lsof and jstack to
> see if I could find anything unusual, and I did:
>
> OK, 192.168.3.154 has 2 apps running on it: pid-479 (port 2001) and pid-947
> (port 2004). Also, wotaskd is running as pid 43.
>
> app pid-479 lsof -i tcp:2001 shows nothing unusual
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 479 _appserver 7u IPv6 0x830bb2c 0t0 TCP [::192.168.3.154]:dc (LISTEN)
>
> app pid-947 has unusual output, lsof -i tcp:2004 reveals 256 CLOSE_WAITs!!!
> .... this app is not allowing logins
> http://67.78.26.66:81/~kieran/misc/lsof_tcp_2004_pid_43.txt
>
> BTW, the other IP 192.168.3.149 shown on the CLOSE_WAIT lines is the machine
> that is running WOMonitor/apache, so this would seem to indicate a lot of
> hung requests? (that's a question, Chuck ;-) )
>
> lsof for wotaskd itself gives this, which doesn't seem unusual
> bash-3.2# lsof -i tcp:1085
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> java 43 _appserver 8u IPv6 0x6e1d258 0t0 TCP [::192.168.3.154]:webobjects (LISTEN)
> java 43 _appserver 11u IPv6 0x830b664 0t0 TCP [::192.168.3.154]:webobjects->[::192.168.3.154]:49665 (ESTABLISHED)
> java 43 _appserver 12u IPv6 0x8e41cd4 0t0 TCP [::192.168.3.154]:webobjects->[::192.168.3.154]:53449 (ESTABLISHED)
> java 479 _appserver 10u IPv6 0x830b8c8 0t0 TCP [::192.168.3.154]:49665->[::192.168.3.154]:webobjects (ESTABLISHED)
> java 947 _appserver 10u IPv6 0x8e7d344 0t0 TCP [::192.168.3.154]:53449->[::192.168.3.154]:webobjects (ESTABLISHED)
>
>
> Now looking at the jstack outputs, we also have more useful clues.
>
> jstack on the pid-947 (port 2004) app reveals it has session store
> deadlocks!! This is the same app with all the CLOSE_WAITs
> http://67.78.26.66:81/~kieran/misc/jstack_pid_947.txt
>
> So, it would seem that the stupid "Failed to contact" messages I have been
> seeing are really caused by Session Store deadlocks. So, the first thing I am
> going to do now is turn OFF concurrent request handling and turn on Wonder
> Session Store Deadlock detection for this app ...... however, I would wager
> that I will not see any Session Store deadlocks with concurrent request
> handling turned off!
>
> Any ideas on a strategy for deadlock detection with concurrent request
> handling ON?
>
> On Mar 25, 2009, at 10:34 PM, Chuck Hill wrote:
>
>>
>> On Mar 25, 2009, at 7:21 PM, Kieran Kelleher wrote:
>>
>>> Hi again Chuck,
>>>
>>> If you are going to use the domain name (for example www.website.com,
>>> which resolves to 67.88.91.233 for example) doesn't that mean you have to
>>> open port 1085 on the router between public internet and that
>>> apache/WoMonitor machine?
>>
>> Apache is behind the firewall. Only ports 80 and 443 go through.
>>
>>
>> Chuck
>>
>>
>>
>>> -Kieran
>>>
>>> On Mar 23, 2009, at 12:25 PM, Chuck Hill wrote:
>>>
>>>> On Mar 21, 2009, at 6:35 PM, Kieran Kelleher wrote:
>>>>
>>>>> Hi Chuck,
>>>>>
>>>>> Still getting this problem after a few days of running .... last time we
>>>>> discussed this, I had updated all the WO servers that run Leopard to use
>>>>> the IP address for the host name...... I still have not touched the only
>>>>> Tiger machine, which runs apache and the site's WOMonitor and has a
>>>>> couple of tiny insignificant WO apps. I am not ready to upgrade this
>>>>> machine to Leopard just yet, so I guess that is the next one to be
>>>>> updated with IP addresses instead of its Bonjour name ..... but I have a
>>>>> question for you based on your experience with this:
>>>>>
>>>>> - For that primary WOMonitor machine which is the main site webserver,
>>>>> should I change to localhost, 127.0.0.1 or the actual IP address of the
>>>>> machine in WOMonitor Host settings and wotaskd properties? (FWIW, for
>>>>> last couple of years, we have used the Bonjour host.local name style on
>>>>> that machine)
>>>>
>>>> We usually use none of those. We use the name that DNS resolves to the
>>>> primary IP on that machine (a working reverse lookup is important too).
>>>
>>>
>>
>> --
>> Chuck Hill Senior Consultant / VP Development
>>
>> Practical WebObjects - for developers who want to increase their overall
>> knowledge of WebObjects or who are trying to solve specific problems.
>> http://www.global-village.net/products/practical_webobjects
>>
>>
>>
>>
>>
>>
>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Webobjects-dev mailing list ([email protected])
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/webobjects-dev/prabier%40mac.com
>
> This email sent to [email protected]