Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-04-04 Thread Dieter Maurer
Alan Milligan wrote at 2006-4-4 21:19 +1000:
>Dieter Maurer wrote:
>
>> Looks as if your Python were compiled without debugging symbols.
>> You will get much better tracebacks (easier to understand),
>> when you rebuild your Python with debugging symbols.
>
>But of course, it's a production system :( - temporarily installing
>python-debuginfo ...

Debug symbols do not increase the runtime. They only make the binaries
(considerably) larger.

You must also recompile the Zope extensions (as they compile the same
way, Python has been compiled).

-- 
Dieter
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-04-04 Thread Alan Milligan
Dieter Maurer wrote:

> Looks as if your Python were compiled without debugging symbols.
> You will get much better tracebacks (easier to understand),
> when you rebuild your Python with debugging symbols.

But of course, it's a production system :( - temporarily installing
python-debuginfo ...

> 
> The following GDB macros can be used to help in the analysis.
> "pfr" can be called in frames "eval_frame" to learn
> about the Python frame.
> 
> def ps
> x/s ({PyStringObject}$arg0)->ob_sval
> end
> 

Dieter, I don't know where you manage to find this stuff!!

Have stuck it in my .gdbinit and am knocking myself out with it...

Cheers, Alan

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-04-03 Thread Dieter Maurer
Alan Milligan wrote at 2006-4-3 12:05 +1000:
> ...
>#36 0xa86a19b0 in ?? ()
>#37 0xa86a19ac in ?? ()
>#38 0xb7a07583 in ?? () from
>/opt/zope2.8/lib/python/Acquisition/_Acquisition.so
>#39 0xb7a83660 in ?? ()
>#40 0x in ?? ()
>
>
>I must confess that I'm quite puzzled that the main thread is also
>blocked - DeadlockDebugger's publish magic is not being invoked,
>concurring with strace.

Looks as if your Python were compiled without debugging symbols.
You will get much better tracebacks (easier to understand),
when you rebuild your Python with debugging symbols.

The following GDB macros can be used to help in the analysis.
"pfr" can be called in frames "eval_frame" to learn
about the Python frame.

def ps
x/s ({PyStringObject}$arg0)->ob_sval
end

def pfr
ps f->f_code->co_filename
ps f->f_code->co_name
#p f->f_lineno
lineno
end

define lineno
set $__co = f->f_code
set $__lasti = f->f_lasti
set $__sz = ((PyStringObject *)$__co->co_lnotab)->ob_size/2
set $__p = (unsigned char *)((PyStringObject *)$__co->co_lnotab)->ob_sval
set $__li = $__co->co_firstlineno
set $__ad = 0
while ($__sz-1 >= 0)
  set $__sz = $__sz - 1
  set $__ad = $__ad + *$__p
  set $__p = $__p + 1
  if ($__ad > $__lasti)
# break -- interpreted as "breakpoint"
set $__sz = -1
  end
  if ($__sz >= 0)
set $__li = $__li + *$__p
set $__p = $__p + 1
  end
end
printf "%d\n", $__li
end

-- 
Dieter
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-04-02 Thread Alan Milligan
Dieter Maurer wrote:
> Alan Milligan wrote at 2006-3-28 10:44 +1000:

> A missing "clearing down" of the client's connection cannot be the 
> cause for this. The (worker) thread must finish (and you
> should see a log entry) long before this connection is expected to
> be closed. The problem you observe seems to indicate a problem
> inside your request processing. Activating the "debuglogger"
> (or similarly spelled) can prove whether or not this assumption
> is correct.
> 

You are correct.  It appears that all the threads are blocked on a
lock.acquire() call (possibly three given the memory addresses) :

[EMAIL PROTECTED] tmp]# strace -fp `cat /opt/zope2.8/instance/var/Z2.pid`
Process 12814 attached with 23 threads - interrupt to quit
[pid 12788] futex(0x83560c0, FUTEX_WAIT, 0, NULL 
[pid 12790] futex(0x83560c0, FUTEX_WAIT, 0, NULL 
[pid 12798] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12802] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12803] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12806] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12808] futex(0x83560c0, FUTEX_WAIT, 0, NULL 
[pid 12809] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12810] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12795] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12813] futex(0x83560c0, FUTEX_WAIT, 0, NULL 
[pid 12807] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12804] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12800] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12797] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12812] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12796] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12805] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12814] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12799] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 
[pid 12792] futex(0xa993760, FUTEX_WAIT, 0, NULL 
[pid 12801] futex(0x93e7f50, FUTEX_WAIT, 0, NULL 


I tried to take Paul's advice to use gdb to get a python stacktrace, but
gbd wouldn't let me:

(gdb)  call PyRun_SimpleString("import sys, traceback;
sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread -1469408336 (LWP 31487)]
0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (malloc) will be
abandoned.


So, setting unwindonsignal 


(gdb) set unwindonsignal on
(gdb)  call PyRun_SimpleString("import sys, traceback;
sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()")

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread -1259328592 (LWP 31465)]
0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
The program being debugged was signaled while in a function called from GDB.
GDB has restored the context to what it was before the call.
To change this behavior use "set unwindonsignal off"
Evaluation of the expression containing the function
(PyRun_SimpleString) will be abandoned


The best I could do was a good old-fashioned where ...


(gdb) where
#0  0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00cb4a24 in [EMAIL PROTECTED] () from /lib/tls/libpthread.so.0
#2  0x0099842c in ?? () from /usr/lib/libpython2.3.so.1.0
#3  0x0096f04e in PyThread_acquire_lock () from /usr/lib/libpython2.3.so.1.0
#4  0x009722f7 in _PyObject_GC_Del () from /usr/lib/libpython2.3.so.1.0
#5  0x0091b921 in PyCFunction_Call () from /usr/lib/libpython2.3.so.1.0
#6  0x0094e63f in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#7  0x009501b6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#8  0x0094ee6b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#9  0x0094fa65 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#10 0x0094fa65 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0
#11 0x009501b6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0
#12 0x0090be4e in PyFunction_SetClosure () from /usr/lib/libpython2.3.so.1.0
#13 0x008f8627 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#14 0x008ffdb8 in PyMethod_New () from /usr/lib/libpython2.3.so.1.0
#15 0x008f8627 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0
#16 0x008f8824 in PyObject_CallMethod () from /usr/lib/libpython2.3.so.1.0
#17 0xb7b3762b in ?? () from
/opt/zope2.8/lib/python/persistent/cPersistence.so
#18 0xa67cf3cc in ?? ()
#19 0xb7b3900a in ?? () from
/opt/zope2.8/lib/python/persistent/cPersistence.so
#20 0xb7b39008 in ?? () from
/opt/zope2.8/lib/python/persistent/cPersistence.so
#21 0x9c10776c in ?? ()
#22 0xb786cce8 in ?? () from
/opt/zope2.8/lib/python/Persistence/_Persistence.so
#23 0x9c10776c in ?? ()
#24 0xa86a1968 in ?? ()
#25 0xb7b38ae4 in ?? () from
/opt/zope2.8/lib/python/persistent/cPersistence.so
#26 0xb7a83660 in ?? ()
#27 0xb7a83660 in ?? ()
#28 0xa86a19a8 in ?? ()
#29 0xb786b934 in ?? () from
/opt/zope2.8/lib/p

Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-04-01 Thread Dieter Maurer
Alan Milligan wrote at 2006-3-28 10:44 +1000:
> ...
>>>It seems something's not properly clearing down the connection to
>>>release the thread.
> ...
>As these threads have not completed, there's no access log entry, and
>the other socket terminal has obviously closed, meaning I can't chase
>the issue down from the other end either.

A missing "clearing down" of the client's connection cannot be the 
cause for this. The (worker) thread must finish (and you
should see a log entry) long before this connection is expected to
be closed. The problem you observe seems to indicate a problem
inside your request processing. Activating the "debuglogger"
(or similarly spelled) can prove whether or not this assumption
is correct.

-- 
Dieter
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-03-29 Thread Paul Winkler
On Thu, Mar 30, 2006 at 02:32:58AM +1000, Alan Milligan wrote:
> I managed to get a DeadlockDebugger trace on this thing, it made very
> interesting reading:
(snip)
>   File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 781, in loadEx
> return data, tid, ver
> 
> *every* thread was block-waiting on zeo (from a wide range of different
> Zope/Plone types)!  It looks to me like Apache has timed out, clearing
> down it's end, Zope however is still having to wait for zeo which is
> completely borked.
> 
> I've consequently ditched zeo and everything is again well-behaved.

Is your zeo server on a separate box? Is there a firewall between them?

The *only* time I've ever had problems like that was in the following
scenario:

* firewall between zope and zeo

* minimal traffic at times (it was a secondary system, most of its
  usage was when our primary data center was down for maintenance)

* firewall was of an evil type that tears down "unused" connections
  without either end being able to know it happened 

In this scenario, after suitably long period of no traffic between Zope
and Zeo, the firewall would disconnect them but they would still think
they were connected, and we would get a problem like yours.

Dieter Maurer observed the same thing and gave me the hint that this
might be the problem.  Implementing his suggested "keepalive" product 
was less trouble than arguing with the firewall administrators.
http://aspn.activestate.com/ASPN/Mail/Message/zope-list/2918584

I used something very close to that, I believe I just saved it as 
Products/ZeoKeepalive/__init__.py.
(I've changed jobs so I'm going by memory.)

-- 

Paul Winkler
http://www.slinkp.com
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-03-29 Thread Alan Milligan
I managed to get a DeadlockDebugger trace on this thing, it made very
interesting reading:

Thread -1578087504 (GET
/VirtualHostBase/http/www.last-bastion.net:80/BastionLedger/ledgerlist/Mail_338253076):
  File "/opt/zope2.8/lib/python/ZServer/PubCore/ZServerPublisher.py",
line 23, in __init__
response=response)
  File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 387, in
publish_module
environ, debug, request, response)
  File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 188, in
publish_module_standard
response = publish(request, module_name, after_list, debug=debug)
  File
"/opt/zope2.8/lib/python/Products/PlacelessTranslationService/PatchStringIO.py",
line 51, in new_publish
x = Publish.old_publish(request, module_name, after_list, debug)
  File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 114, in publish
request, bind=1)
  File "/opt/zope2.8/lib/python/ZPublisher/mapply.py", line 88, in mapply
if debug is not None: return debug(object,args,context)
  File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 40, in
call_object
result=apply(object,args) # Type s to step into published object.
  File "/opt/zope2.8/lib/python/Shared/DC/Scripts/Bindings.py", line
311, in __call__
return self._bindAndExec(args, kw, None)
  File "/opt/zope2.8/lib/python/Shared/DC/Scripts/Bindings.py", line
329, in _bindAndExec
def _bindAndExec(self, args, kw, caller_namespace):
  File "/opt/zope2.8/lib/python/Shared/DC/Scripts/Bindings.py", line
348, in _bindAndExec
return self._exec(bound_data, args, kw)
  File "/opt/zope2.8/lib/python/Products/CMFCore/FSPageTemplate.py",
line 195, in _exec
result = self.pt_render(extra_context=bound_names)
  File "/opt/zope2.8/lib/python/Products/CMFCore/FSPageTemplate.py",
line 134, in pt_render
result = FSPageTemplate.inheritedAttribute('pt_render')(
  File "/opt/zope2.8/lib/python/Products/PageTemplates/PageTemplate.py",
line 104, in pt_render
tal=not source, strictinsert=0)()
  File "/opt/zope2.8/lib/python/TAL/TALInterpreter.py", line 202, in
__call__
def __call__(self):
  File "/opt/zope2.8/lib/python/Products/PageTemplates/ZRPythonExpr.py",
line 47, in __call__
return eval(code, g, {})
  File "Python expression "putils.createTopLevelTabs(actions)"", line 1,
in 
  File "/opt/zope2.8/lib/python/Products/CMFPlone/PloneTool.py", line
701, in createTopLevelTabs
default=data['name'])
  File
"/opt/zope2.8/lib/python/Products/CMFPlone/TranslationServiceTool.py",
line 35, in utranslate
return utranslate(*args, **kw)
  File "/opt/zope2.8/lib/python/Products/CMFPlone/i18nl10n.py", line 53,
in utranslate
return service.utranslate(*args, **kw)
  File
"/opt/zope2.8/lib/python/Products/PlacelessTranslationService/PlacelessTranslationService.py",
line 133, in utranslate
if not service: return default
  File "/opt/zope2.8/lib/python/ZODB/Connection.py", line 704, in setstate
self._setstate(obj)
  File "/opt/zope2.8/lib/python/ZODB/Connection.py", line 760, in _setstate
self._reader.setGhostState(obj, p)
  File "/opt/zope2.8/lib/python/ZODB/serialize.py", line 495, in
setGhostState
state = self.getState(pickle)
  File "/opt/zope2.8/lib/python/ZODB/serialize.py", line 488, in getState
return unpickler.load()
  File "/opt/zope2.8/lib/python/ZODB/serialize.py", line 436, in
_persistent_load
return self._conn.get(oid)
  File "/opt/zope2.8/lib/python/ZODB/Connection.py", line 216, in get
return obj
  File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 746, in load
return self.loadEx(oid, version)[:2]
  File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 781, in loadEx
return data, tid, ver

*every* thread was block-waiting on zeo (from a wide range of different
Zope/Plone types)!  It looks to me like Apache has timed out, clearing
down it's end, Zope however is still having to wait for zeo which is
completely borked.

I've consequently ditched zeo and everything is again well-behaved.

Alan

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-03-28 Thread Paul Winkler
On Tue, Mar 28, 2006 at 10:44:09AM +1000, Alan Milligan wrote:
> Unfortunately, the server is out of threads and I can't use
> DeadlockDebugger to see what exactly Z2 thinks each thread is doing.

Have you tried the gdb instructions that come up if
you google "debug spinning zope"?

-- 

Paul Winkler
http://www.slinkp.com
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-03-27 Thread Alan Milligan
Dieter Maurer wrote:

>>I've been noticing a situation for some time that my 2.8.5 Z2 is running
>>out of threads.  When doing an lsof on it, I see a large number of TCP
>>handles in CLOSE_WAIT state.
>>
>>It seems something's not properly clearing down the connection to
>>release the thread.
> 
> 
> I doubt that this is the case.
> 
>   The IO, including the closing of the connection, is
>   handled by the ZServer thread. This is a single thread.
>   You would immediately observe should it block (Zope would
>   be irresponsive).
> 

This *is* exactly the behaviour I'm observing.

>   When the connection is closed, the Zope worker thread has
>   already been released. By the way, their number is fixed
>   (and usually small).

Unfortunately, the server is out of threads and I can't use
DeadlockDebugger to see what exactly Z2 thinks each thread is doing.

As these threads have not completed, there's no access log entry, and
the other socket terminal has obviously closed, meaning I can't chase
the issue down from the other end either.

I've upgraded a lot of software, and this issue is present on 2.8.6 as
well.  I'm confident this issue is with Zope (or more likely, a badly
behaving product) - it manifests with httpd-2.0.54, and
httpd-2.2.0-5.1.2 (both forked and worker implementations).

Any ideas on how to track it down??

Alan

___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] TCP CLOSE_WAIT leaks

2006-03-22 Thread Dieter Maurer
Alan Milligan wrote at 2006-3-22 10:12 +1100:
>I've been noticing a situation for some time that my 2.8.5 Z2 is running
>out of threads.  When doing an lsof on it, I see a large number of TCP
>handles in CLOSE_WAIT state.
>
>It seems something's not properly clearing down the connection to
>release the thread.

I doubt that this is the case.

  The IO, including the closing of the connection, is
  handled by the ZServer thread. This is a single thread.
  You would immediately observe should it block (Zope would
  be irresponsive).

  When the connection is closed, the Zope worker thread has
  already been released. By the way, their number is fixed
  (and usually small).

-- 
Dieter
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


[Zope-dev] TCP CLOSE_WAIT leaks

2006-03-21 Thread Alan Milligan
Hi,

I've been noticing a situation for some time that my 2.8.5 Z2 is running
out of threads.  When doing an lsof on it, I see a large number of TCP
handles in CLOSE_WAIT state.

It seems something's not properly clearing down the connection to
release the thread.

Unfortunately, these connections were almost guaranteed to have been set
up via Apache redirect processing, and it's quite successfully cleaned
up its end ...

Does anyone have any idea how this situation might occur, and why Zope's
not properly housekeeping these (maybe Medusa needs a timeout set on
close waits??)

Alan
___
Zope-Dev maillist  -  Zope-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )