Re: [Zope-dev] TCP CLOSE_WAIT leaks
Alan Milligan wrote at 2006-4-4 21:19 +1000: >Dieter Maurer wrote: > >> Looks as if your Python were compiled without debugging symbols. >> You will get much better tracebacks (easier to understand), >> when you rebuild your Python with debugging symbols. > >But of course, it's a production system :( - temporarily installing >python-debuginfo ... Debug symbols do not increase the runtime. They only make the binaries (considerably) larger. You must also recompile the Zope extensions (as they compile the same way, Python has been compiled). -- Dieter ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
Dieter Maurer wrote: > Looks as if your Python were compiled without debugging symbols. > You will get much better tracebacks (easier to understand), > when you rebuild your Python with debugging symbols. But of course, it's a production system :( - temporarily installing python-debuginfo ... > > The following GDB macros can be used to help in the analysis. > "pfr" can be called in frames "eval_frame" to learn > about the Python frame. > > def ps > x/s ({PyStringObject}$arg0)->ob_sval > end > Dieter, I don't know where you manage to find this stuff!! Have stuck it in my .gdbinit and am knocking myself out with it... Cheers, Alan ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
Alan Milligan wrote at 2006-4-3 12:05 +1000: > ... >#36 0xa86a19b0 in ?? () >#37 0xa86a19ac in ?? () >#38 0xb7a07583 in ?? () from >/opt/zope2.8/lib/python/Acquisition/_Acquisition.so >#39 0xb7a83660 in ?? () >#40 0x in ?? () > > >I must confess that I'm quite puzzled that the main thread is also >blocked - DeadlockDebugger's publish magic is not being invoked, >concurring with strace. Looks as if your Python were compiled without debugging symbols. You will get much better tracebacks (easier to understand), when you rebuild your Python with debugging symbols. The following GDB macros can be used to help in the analysis. "pfr" can be called in frames "eval_frame" to learn about the Python frame. def ps x/s ({PyStringObject}$arg0)->ob_sval end def pfr ps f->f_code->co_filename ps f->f_code->co_name #p f->f_lineno lineno end define lineno set $__co = f->f_code set $__lasti = f->f_lasti set $__sz = ((PyStringObject *)$__co->co_lnotab)->ob_size/2 set $__p = (unsigned char *)((PyStringObject *)$__co->co_lnotab)->ob_sval set $__li = $__co->co_firstlineno set $__ad = 0 while ($__sz-1 >= 0) set $__sz = $__sz - 1 set $__ad = $__ad + *$__p set $__p = $__p + 1 if ($__ad > $__lasti) # break -- interpreted as "breakpoint" set $__sz = -1 end if ($__sz >= 0) set $__li = $__li + *$__p set $__p = $__p + 1 end end printf "%d\n", $__li end -- Dieter ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
Dieter Maurer wrote: > Alan Milligan wrote at 2006-3-28 10:44 +1000: > A missing "clearing down" of the client's connection cannot be the > cause for this. The (worker) thread must finish (and you > should see a log entry) long before this connection is expected to > be closed. The problem you observe seems to indicate a problem > inside your request processing. Activating the "debuglogger" > (or similarly spelled) can prove whether or not this assumption > is correct. > You are correct. It appears that all the threads are blocked on a lock.acquire() call (possibly three given the memory addresses) : [EMAIL PROTECTED] tmp]# strace -fp `cat /opt/zope2.8/instance/var/Z2.pid` Process 12814 attached with 23 threads - interrupt to quit [pid 12788] futex(0x83560c0, FUTEX_WAIT, 0, NULL [pid 12790] futex(0x83560c0, FUTEX_WAIT, 0, NULL [pid 12798] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12802] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12803] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12806] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12808] futex(0x83560c0, FUTEX_WAIT, 0, NULL [pid 12809] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12810] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12795] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12813] futex(0x83560c0, FUTEX_WAIT, 0, NULL [pid 12807] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12804] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12800] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12797] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12812] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12796] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12805] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12814] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12799] futex(0x93e7f50, FUTEX_WAIT, 0, NULL [pid 12792] futex(0xa993760, FUTEX_WAIT, 0, NULL [pid 12801] futex(0x93e7f50, FUTEX_WAIT, 0, NULL I tried to take Paul's advice to use gdb to get a python stacktrace, but gbd wouldn't let me: (gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()") Program received signal SIGSTOP, Stopped (signal). [Switching to Thread -1469408336 (LWP 31487)] 0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 The program being debugged was signaled while in a function called from GDB. GDB remains in the frame where the signal was received. To change this behavior use "set unwindonsignal on" Evaluation of the expression containing the function (malloc) will be abandoned. So, setting unwindonsignal (gdb) set unwindonsignal on (gdb) call PyRun_SimpleString("import sys, traceback; sys.stderr=open('/tmp/tb','w',0); traceback.print_stack()") Program received signal SIGSTOP, Stopped (signal). [Switching to Thread -1259328592 (LWP 31465)] 0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 The program being debugged was signaled while in a function called from GDB. GDB has restored the context to what it was before the call. To change this behavior use "set unwindonsignal off" Evaluation of the expression containing the function (PyRun_SimpleString) will be abandoned The best I could do was a good old-fashioned where ... (gdb) where #0 0x00a527a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 #1 0x00cb4a24 in [EMAIL PROTECTED] () from /lib/tls/libpthread.so.0 #2 0x0099842c in ?? () from /usr/lib/libpython2.3.so.1.0 #3 0x0096f04e in PyThread_acquire_lock () from /usr/lib/libpython2.3.so.1.0 #4 0x009722f7 in _PyObject_GC_Del () from /usr/lib/libpython2.3.so.1.0 #5 0x0091b921 in PyCFunction_Call () from /usr/lib/libpython2.3.so.1.0 #6 0x0094e63f in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0 #7 0x009501b6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0 #8 0x0094ee6b in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0 #9 0x0094fa65 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0 #10 0x0094fa65 in _PyEval_SliceIndex () from /usr/lib/libpython2.3.so.1.0 #11 0x009501b6 in PyEval_EvalCodeEx () from /usr/lib/libpython2.3.so.1.0 #12 0x0090be4e in PyFunction_SetClosure () from /usr/lib/libpython2.3.so.1.0 #13 0x008f8627 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0 #14 0x008ffdb8 in PyMethod_New () from /usr/lib/libpython2.3.so.1.0 #15 0x008f8627 in PyObject_Call () from /usr/lib/libpython2.3.so.1.0 #16 0x008f8824 in PyObject_CallMethod () from /usr/lib/libpython2.3.so.1.0 #17 0xb7b3762b in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so #18 0xa67cf3cc in ?? () #19 0xb7b3900a in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so #20 0xb7b39008 in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so #21 0x9c10776c in ?? () #22 0xb786cce8 in ?? () from /opt/zope2.8/lib/python/Persistence/_Persistence.so #23 0x9c10776c in ?? () #24 0xa86a1968 in ?? () #25 0xb7b38ae4 in ?? () from /opt/zope2.8/lib/python/persistent/cPersistence.so #26 0xb7a83660 in ?? () #27 0xb7a83660 in ?? () #28 0xa86a19a8 in ?? () #29 0xb786b934 in ?? () from /opt/zope2.8/lib/p
Re: [Zope-dev] TCP CLOSE_WAIT leaks
Alan Milligan wrote at 2006-3-28 10:44 +1000: > ... >>>It seems something's not properly clearing down the connection to >>>release the thread. > ... >As these threads have not completed, there's no access log entry, and >the other socket terminal has obviously closed, meaning I can't chase >the issue down from the other end either. A missing "clearing down" of the client's connection cannot be the cause for this. The (worker) thread must finish (and you should see a log entry) long before this connection is expected to be closed. The problem you observe seems to indicate a problem inside your request processing. Activating the "debuglogger" (or similarly spelled) can prove whether or not this assumption is correct. -- Dieter ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
On Thu, Mar 30, 2006 at 02:32:58AM +1000, Alan Milligan wrote: > I managed to get a DeadlockDebugger trace on this thing, it made very > interesting reading: (snip) > File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 781, in loadEx > return data, tid, ver > > *every* thread was block-waiting on zeo (from a wide range of different > Zope/Plone types)! It looks to me like Apache has timed out, clearing > down it's end, Zope however is still having to wait for zeo which is > completely borked. > > I've consequently ditched zeo and everything is again well-behaved. Is your zeo server on a separate box? Is there a firewall between them? The *only* time I've ever had problems like that was in the following scenario: * firewall between zope and zeo * minimal traffic at times (it was a secondary system, most of its usage was when our primary data center was down for maintenance) * firewall was of an evil type that tears down "unused" connections without either end being able to know it happened In this scenario, after suitably long period of no traffic between Zope and Zeo, the firewall would disconnect them but they would still think they were connected, and we would get a problem like yours. Dieter Maurer observed the same thing and gave me the hint that this might be the problem. Implementing his suggested "keepalive" product was less trouble than arguing with the firewall administrators. http://aspn.activestate.com/ASPN/Mail/Message/zope-list/2918584 I used something very close to that, I believe I just saved it as Products/ZeoKeepalive/__init__.py. (I've changed jobs so I'm going by memory.) -- Paul Winkler http://www.slinkp.com ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
I managed to get a DeadlockDebugger trace on this thing, it made very interesting reading: Thread -1578087504 (GET /VirtualHostBase/http/www.last-bastion.net:80/BastionLedger/ledgerlist/Mail_338253076): File "/opt/zope2.8/lib/python/ZServer/PubCore/ZServerPublisher.py", line 23, in __init__ response=response) File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 387, in publish_module environ, debug, request, response) File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 188, in publish_module_standard response = publish(request, module_name, after_list, debug=debug) File "/opt/zope2.8/lib/python/Products/PlacelessTranslationService/PatchStringIO.py", line 51, in new_publish x = Publish.old_publish(request, module_name, after_list, debug) File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 114, in publish request, bind=1) File "/opt/zope2.8/lib/python/ZPublisher/mapply.py", line 88, in mapply if debug is not None: return debug(object,args,context) File "/opt/zope2.8/lib/python/ZPublisher/Publish.py", line 40, in call_object result=apply(object,args) # Type s to step into published object. File "/opt/zope2.8/lib/python/Shared/DC/Scripts/Bindings.py", line 311, in __call__ return self._bindAndExec(args, kw, None) File "/opt/zope2.8/lib/python/Shared/DC/Scripts/Bindings.py", line 329, in _bindAndExec def _bindAndExec(self, args, kw, caller_namespace): File "/opt/zope2.8/lib/python/Shared/DC/Scripts/Bindings.py", line 348, in _bindAndExec return self._exec(bound_data, args, kw) File "/opt/zope2.8/lib/python/Products/CMFCore/FSPageTemplate.py", line 195, in _exec result = self.pt_render(extra_context=bound_names) File "/opt/zope2.8/lib/python/Products/CMFCore/FSPageTemplate.py", line 134, in pt_render result = FSPageTemplate.inheritedAttribute('pt_render')( File "/opt/zope2.8/lib/python/Products/PageTemplates/PageTemplate.py", line 104, in pt_render tal=not source, strictinsert=0)() File "/opt/zope2.8/lib/python/TAL/TALInterpreter.py", line 202, in __call__ def __call__(self): File "/opt/zope2.8/lib/python/Products/PageTemplates/ZRPythonExpr.py", line 47, in __call__ return eval(code, g, {}) File "Python expression "putils.createTopLevelTabs(actions)"", line 1, in File "/opt/zope2.8/lib/python/Products/CMFPlone/PloneTool.py", line 701, in createTopLevelTabs default=data['name']) File "/opt/zope2.8/lib/python/Products/CMFPlone/TranslationServiceTool.py", line 35, in utranslate return utranslate(*args, **kw) File "/opt/zope2.8/lib/python/Products/CMFPlone/i18nl10n.py", line 53, in utranslate return service.utranslate(*args, **kw) File "/opt/zope2.8/lib/python/Products/PlacelessTranslationService/PlacelessTranslationService.py", line 133, in utranslate if not service: return default File "/opt/zope2.8/lib/python/ZODB/Connection.py", line 704, in setstate self._setstate(obj) File "/opt/zope2.8/lib/python/ZODB/Connection.py", line 760, in _setstate self._reader.setGhostState(obj, p) File "/opt/zope2.8/lib/python/ZODB/serialize.py", line 495, in setGhostState state = self.getState(pickle) File "/opt/zope2.8/lib/python/ZODB/serialize.py", line 488, in getState return unpickler.load() File "/opt/zope2.8/lib/python/ZODB/serialize.py", line 436, in _persistent_load return self._conn.get(oid) File "/opt/zope2.8/lib/python/ZODB/Connection.py", line 216, in get return obj File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 746, in load return self.loadEx(oid, version)[:2] File "/opt/zope2.8/lib/python/ZEO/ClientStorage.py", line 781, in loadEx return data, tid, ver *every* thread was block-waiting on zeo (from a wide range of different Zope/Plone types)! It looks to me like Apache has timed out, clearing down it's end, Zope however is still having to wait for zeo which is completely borked. I've consequently ditched zeo and everything is again well-behaved. Alan ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
On Tue, Mar 28, 2006 at 10:44:09AM +1000, Alan Milligan wrote: > Unfortunately, the server is out of threads and I can't use > DeadlockDebugger to see what exactly Z2 thinks each thread is doing. Have you tried the gdb instructions that come up if you google "debug spinning zope"? -- Paul Winkler http://www.slinkp.com ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
Dieter Maurer wrote: >>I've been noticing a situation for some time that my 2.8.5 Z2 is running >>out of threads. When doing an lsof on it, I see a large number of TCP >>handles in CLOSE_WAIT state. >> >>It seems something's not properly clearing down the connection to >>release the thread. > > > I doubt that this is the case. > > The IO, including the closing of the connection, is > handled by the ZServer thread. This is a single thread. > You would immediately observe should it block (Zope would > be irresponsive). > This *is* exactly the behaviour I'm observing. > When the connection is closed, the Zope worker thread has > already been released. By the way, their number is fixed > (and usually small). Unfortunately, the server is out of threads and I can't use DeadlockDebugger to see what exactly Z2 thinks each thread is doing. As these threads have not completed, there's no access log entry, and the other socket terminal has obviously closed, meaning I can't chase the issue down from the other end either. I've upgraded a lot of software, and this issue is present on 2.8.6 as well. I'm confident this issue is with Zope (or more likely, a badly behaving product) - it manifests with httpd-2.0.54, and httpd-2.2.0-5.1.2 (both forked and worker implementations). Any ideas on how to track it down?? Alan ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] TCP CLOSE_WAIT leaks
Alan Milligan wrote at 2006-3-22 10:12 +1100: >I've been noticing a situation for some time that my 2.8.5 Z2 is running >out of threads. When doing an lsof on it, I see a large number of TCP >handles in CLOSE_WAIT state. > >It seems something's not properly clearing down the connection to >release the thread. I doubt that this is the case. The IO, including the closing of the connection, is handled by the ZServer thread. This is a single thread. You would immediately observe should it block (Zope would be irresponsive). When the connection is closed, the Zope worker thread has already been released. By the way, their number is fixed (and usually small). -- Dieter ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )
[Zope-dev] TCP CLOSE_WAIT leaks
Hi, I've been noticing a situation for some time that my 2.8.5 Z2 is running out of threads. When doing an lsof on it, I see a large number of TCP handles in CLOSE_WAIT state. It seems something's not properly clearing down the connection to release the thread. Unfortunately, these connections were almost guaranteed to have been set up via Apache redirect processing, and it's quite successfully cleaned up its end ... Does anyone have any idea how this situation might occur, and why Zope's not properly housekeeping these (maybe Medusa needs a timeout set on close waits??) Alan ___ Zope-Dev maillist - Zope-Dev@zope.org http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )