Re: [Zope-dev] Re: more on the segfault saga
Hello segfaulters and others interested in Zope instability issues!

Our demi-god Matt Kromer from ZopeCorp has come up with a possible way to corner the instability issue AND give you a stable, cycle-garbage-collecting Zope.

So far the problem seems to be caused by the Python Restricted Compiler (which is used in everything from DTML expressions to Python scripts to other stuff) not fully completing collectable objects before the Python cycle garbage collector finds them, so the solution is to lock out the gc while these objects are being created. The only easy way to do this currently is to disable the automatic gc and run manual garbage collections only when we're pretty sure no one else is running, while not letting anyone else run during the collection.

I can't speak for Matt, but since this is a fairly urgent matter, I believe he agrees that those experiencing segfaults are encouraged to replace Zope/ZServer/PubCore/ZServerPublisher.py with the attached file, which should work on the 2.4.x and 2.5.x series of Zope, and to report their results.

This is the same file that can be found at:

http://zope.org/Members/matt/ZServerPublisher.py

with the difference that my version has some lines removed that are only interesting to those who applied Matt's cprof patches mentioned earlier on this list (which, I bet, means only me :-). The file is small enough that you can read through it and see that I've installed no trojans in it :-) but those of a paranoid nature are encouraged to download Matt's version and remove the two lines that mention 'cprof'.

We're close guys, very close.

Cheers, Leo

PS: standard disclaimers: I don't speak for anyone but myself, and I won't be held responsible for anything you do to your site based on the aforementioned instructions. If you break your site with them, you get to keep both pieces :-)

--
Ideas don't stay in some minds very long because they don't like solitary confinement.
##############################################################################
#
# Copyright (c) 2001 Zope Corporation and Contributors. All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.0 (ZPL). A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################
from ZPublisher import publish_module
import ThreadLock

gc_lock = ThreadLock.allocate_lock()
active_threads = [0,]

class ZServerPublisher:
    def __init__(self, accept):
        import os
        import sys
        import gc
        gc.disable()
        while 1:
            try:
                name, request, response=accept()
                gc_lock.acquire()
                active_threads[0] = active_threads[0] + 1
                gc_lock.release()
                publish_module(
                    name,
                    request=request,
                    response=response)
            finally:
                response._finish()
                request=response=None
                gc_lock.acquire()
                a = active_threads[0] - 1
                if a == 0:
                    #sys.stderr.write("Invoking gc.collect()\n")
                    gc.collect()
                else:
                    #sys.stderr.write("Skipping gc.collect(), %d threads active\n" % a)
                    pass
                active_threads[0] = a
                gc_lock.release()
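
For anyone who wants to play with the idea outside Zope, the pattern used in the attached file can be sketched in present-day Python roughly as follows. This is only an illustration under stated assumptions: GcGate and handle are hypothetical names that do not exist in Zope, threading.Lock stands in for Zope's ThreadLock, and the work callable stands in for publish_module.

```python
import gc
import threading


class GcGate:
    """Hypothetical helper: collect cyclic garbage only when no request is active."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.collections = 0  # number of manual collections, for inspection

    def enter(self):
        # Blocks while another thread is inside gc.collect() in leave().
        with self._lock:
            self._active += 1

    def leave(self):
        with self._lock:
            self._active -= 1
            if self._active == 0:
                # No request is in flight, and none can start while we
                # hold the lock, so a manual collection is safe here.
                gc.collect()
                self.collections += 1


def handle(gate, work):
    """Run one unit of work (a stand-in for publish_module) inside the gate."""
    gate.enter()
    try:
        return work()
    finally:
        gate.leave()


if __name__ == "__main__":
    gc.disable()  # automatic collection off; leave() collects manually
    gate = GcGate()
    print(handle(gate, lambda: "ok"), gate.collections)
```

As in the attached file, the same lock guards both the counter and the collection itself, which is what keeps a new request from starting while gc.collect() is running.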
Re: [Zope-dev] Re: more on the segfault saga
OK, got some data on using these patches.

First of all, for those following along, these patches don't seem to work well if Zope is started as root: gdb gets started as the user Zope switches to, and that gdb won't be able to attach to a process started by root, even if the process has dropped its privileges.

Also, the gdb.cmd script that comes with the patches isn't able to create the trace_dump file for some reason. Below are the URLs of Zope's stdout/stderr for two segfault instances, one triggered by an external method that calls cprof.segfault() and another that happened under normal load.

http://www.ibccrim.org.br/imagens/data-temp/stdout-20020321-ext-method-segfault
http://www.ibccrim.org.br/imagens/data-temp/stdout-20020321-natural-segfault

The 'No such process' message might be caused by the process dying while trying to generate the file in the trace_dump() call, but I don't know why that would be.

I'll see if I can install another Zope instance where everything belongs to another user, so that we can rule out lack of permissions as the cause of this problem.

On Tue, 2002-03-19 at 18:10, Matthew T. Kromer wrote:
> Leonardo Rochael Almeida wrote:
>
> > The official unofficial Zope place on irc is #zope at irc.openprojects.net. Lots of cool and very knowledgeable people hang out there.
>
> OK, I put up a set of patches and a rather frazzled looking README for a profiler patch to Python at http://www.zope.org/Members/matt
>
> You want the C profiler patch; you have to build your OWN python 2.1.2 and it will probably only work under Linux -- don't bother with Windows, parts of the code use mmap() for speed and Windows doesn't provide mmap.
>
> There's a README document inside that has some rather vague and minimal installation instructions. This is very definitely use-at-your-own-risk stuff. I'm posting notice here because others are interested in trying to help diagnose the Zope crashing problem, so this serves as a reminder of where something is as it sits in your inbox waiting for bits to decay.
>
> Here's the readme in its entirety:
>
> To activate python tracing
>
> Rebuild a clean python 2.1.2 with the two patches (included) applied. Patch 1 is for the garbage collector module; it installs a segfault handler which allows an environment variable CRASHCMD to be present to tell python what to do in the event of a segfault. Patch 2 is a patch to ceval.c which builds in additional tracing.
>
> The cprof module must be built; a simple
>
>     make -f Makefile.pre.in PYTHON=/path/to/rebuilt/python2.1.2
>
> will build the cprof module.
>
> Once built, test the cprof module
>
>     /path/to/rebuilt/python2.1.2
>     import cprof
>     cprof.activate()
>     cprof.dump(filename)
>
> and the filename specified should be created.
>
> For the curious, the pb.py program will play back the trace file to get data out of it.
>
> PATCHING ZOPE TO USE THIS
>
> Replace Zope's ZServer/PubCore/ZServerPublisher file with the included one. Patch the line that contains the gdb command to point to your rebuilt python.
>
> Copy the file gdb.cmd to where you start Zope.
>
> Copy the file cprof.so to lib/python in your Zope directory
>
> Start Zope. Wait.
>
> GDB will be invoked to gather crash data; save the gdb output if possible (keep stdout from gdb).
>
> Unfortunately, the README forgets to mention that you need to run Zope under the patched python. Whoops.
>
> --
> Matt Kromer
> Zope Corporation http://www.zope.com/

--
Ideas don't stay in some minds very long because they don't like solitary confinement.

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
http://lists.zope.org/mailman/listinfo/zope-announce
http://lists.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] Re: more on the segfault saga
Leonardo Rochael Almeida wrote:

> The official unofficial Zope place on irc is #zope at irc.openprojects.net. Lots of cool and very knowledgeable people hang out there.

OK, I put up a set of patches and a rather frazzled looking README for a profiler patch to Python at http://www.zope.org/Members/matt

You want the C profiler patch; you have to build your OWN python 2.1.2 and it will probably only work under Linux -- don't bother with Windows, parts of the code use mmap() for speed and Windows doesn't provide mmap.

There's a README document inside that has some rather vague and minimal installation instructions. This is very definitely use-at-your-own-risk stuff. I'm posting notice here because others are interested in trying to help diagnose the Zope crashing problem, so this serves as a reminder of where something is as it sits in your inbox waiting for bits to decay.

Here's the readme in its entirety:

To activate python tracing

Rebuild a clean python 2.1.2 with the two patches (included) applied. Patch 1 is for the garbage collector module; it installs a segfault handler which allows an environment variable CRASHCMD to be present to tell python what to do in the event of a segfault. Patch 2 is a patch to ceval.c which builds in additional tracing.

The cprof module must be built; a simple

    make -f Makefile.pre.in PYTHON=/path/to/rebuilt/python2.1.2

will build the cprof module.

Once built, test the cprof module

    /path/to/rebuilt/python2.1.2
    import cprof
    cprof.activate()
    cprof.dump(filename)

and the filename specified should be created.

For the curious, the pb.py program will play back the trace file to get data out of it.

PATCHING ZOPE TO USE THIS

Replace Zope's ZServer/PubCore/ZServerPublisher file with the included one. Patch the line that contains the gdb command to point to your rebuilt python.

Copy the file gdb.cmd to where you start Zope.

Copy the file cprof.so to lib/python in your Zope directory

Start Zope. Wait.

GDB will be invoked to gather crash data; save the gdb output if possible (keep stdout from gdb).

Unfortunately, the README forgets to mention that you need to run Zope under the patched python. Whoops.

--
Matt Kromer
Zope Corporation http://www.zope.com/
Re: [Zope-dev] Re: more on the segfault saga
Hi Martijn,

We're basically just trying to construct traps to identify a smoking gun. The upside is that, if it works, we'll be able to fix the bug very quickly. However, it's based on assumptions about the exact nature of the bug, so each trap I write is essentially making a hypothesis and then testing it.

Because Leo can get the crash very quickly, if you have a difficult time reproducing it, you don't need to spend a lot of effort trying to keep up with my traps.

On Friday, March 15, 2002, at 06:19 AM, Martijn Jacobs wrote:

> Hello Leo, Matt, Brian,
>
> I'm on it. Will send results when they're available. If anyone wants to talk to me during the period, I'll be on IRC.
>
> If you need any assistance for anything, I'm at your service
>
> Which channel/server are you on IRC?
>
> Did somebody succeed in reproducing the crash? We're trying as best we can to make a reproducible test case, but Zope doesn't want to crash here... The clients who use the production Zope which crashes are all using Active Desktop (I know :( ), could that matter at all? Technically it's insane if it does matter, but you never know...
>
> I've run out of options right now and don't know what to do anymore, so I hope the bug will be found soon.
>
> regards,
> martijn
Re: [Zope-dev] Re: more on the segfault saga
On Fri, 2002-03-15 at 08:19, Martijn Jacobs wrote:

> Hello Leo, Matt, Brian,
>
> I'm on it. Will send results when they're available. If anyone wants to talk to me during the period, I'll be on IRC.
>
> If you need any assistance for anything, I'm at your service
>
> Which channel/server are you on IRC?

The official unofficial Zope place on irc is #zope at irc.openprojects.net. Lots of cool and very knowledgeable people hang out there. I'll be there today while I apply Matt's incref patches and run Zope.

I also have a very demanding client who goes berserk every time the site is down, so I recommend you do the following if you want to help with debugging (this assumes you run Zope behind a proxy server such as apache or squid):

* Install ZEO on your current Zope and configure both the ZEO Client and Server on the same machine serving your site. Only the ZEO Client should get the segfaults, and it restarts much faster (less than 10 secs, usually) than in standalone mode.

* Unpack a Zope source package in another directory. Unpack a Python source package next to it. Configure Python to install its files inside this Zope tree (./configure --prefix=/path/to/Zope-src). Apply Matt's patches, then make and make install. Install ZEO in this instance but only configure the ZEO Client, making it listen on a different port from the other Zope. Copy over all the external methods and extra products, and make it access the other instance's ZEO Server. Don't forget to REDIRECT STDERR TO A FILE (the best way is to redirect stderr to stdout and append stdout to a file). Start it and check that it's working as expected.

* Keep two configuration files for your frontend proxy around: one pointing the site to the original Zope and another pointing the site to the instrumented Zope. When you want to test the crashes, swap the conf. files and reload the proxy.

* Report everything you find in Zope's stderr.

* If you want to increase the perceived stability of your site, put the following two lines somewhere in the original Zope's z2.py:

    import gc
    gc.disable()

  It should stop crashing, but it'll start leaking instead. If the leak is mild enough that you only need to restart once a day, in the period of least traffic, then leave it that way. Having a ZEO Client will ensure you have the least possible downtime during this restart.

--
Ideas don't stay in some minds very long because they don't like solitary confinement.
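
As a rough illustration of the last suggestion above, the sketch below disables the collector and measures how many objects end up stranded in reference cycles, which is one way to judge whether a once-a-day restart is enough. Node, make_cycle and leaked_by are names made up for this example; only the gc module calls are real, and the sketch targets current Python rather than the 2.1 interpreter discussed in this thread.

```python
import gc


class Node:
    """Tiny object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects pointing at each other: reference counting alone can
    # never free them, only the cycle collector can.
    a, b = Node(), Node()
    a.other, b.other = b, a


def leaked_by(func, repeat):
    """Return how many unreachable objects `func` leaves behind with gc off."""
    gc.collect()   # start from a clean slate
    gc.disable()   # what the two lines in z2.py would do
    try:
        for _ in range(repeat):
            func()
        # gc.collect() returns the number of unreachable objects it found.
        return gc.collect()
    finally:
        gc.enable()


if __name__ == "__main__":
    print("objects stranded in cycles:", leaked_by(make_cycle, 1000))
```

The exact count depends on the interpreter version (instance dictionaries may or may not be counted separately), so it is better read as a trend over time than as an absolute figure.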
Re: [Zope-dev] Re: more on the segfault saga
Sorry, the correct URL is http://www.coherence.nl/crash.txt (without the dot)

martijn.
Re: [Zope-dev] Re: more on the segfault saga
On Wed, 2002-03-13 at 10:05, Martijn Jacobs wrote:

> [...]
> I don't know where to start, because attaching GDB doesn't make any sense, since you have to start zope single threaded (according to Matt's Stability Howto) and then no crashes occur.

Actually, at least in Linux, with a recent gdb, you can attach gdb to zope in multithreaded mode. Just remove the -t 1 from the command line suggested by the StabilityHOWTO and you're set. Best results are achieved by compiling everything from source (even python; use --prefix=/path/to/zope-src so as not to mix it up with your installed python, and be careful to use this python when installing zope) and running:

    $ VARIABLE=value gdb path/to/your/python
    (gdb) run z2.py -Z ''

where VARIABLE=value should be replaced by the env vars that are set in the ./start script inside Zope.

> Is this problem solved if I install python 2.2 for example? Are there any bugfixes in this release from Python 2.1.2 ?

Not as far as I know.

> I don't know what the status is right now? Is zope corp. working on it trying to find the bug? Can I be of any help tracking down this bug?

I don't know about Zope Corp. in general, but Matt Kromer has been trying to help as much as his time permits. I think you're helping a lot just by reporting this problem, because it helps raise awareness of the fact that the stability problems aren't all solved by the latest Zope/Python releases.

So far there are three confirmed cases of instability: yours, mine and Dario's. All of them seem to involve PythonScripts, although this might not be related, and all of them are solved by using '-t 1' (is that correct, Dario?), so it looks like a threading issue.

Let's just hope ZC or someone else in the community with more knowledge of the Zope/Python internal arcana can help us debug this, 'cause it's reached the limit of our exploration capability.

Cheers, Leo
Re: [Zope-dev] Re: more on the segfault saga
> Actually, at least in Linux, with a recent gdb, you can attach gdb to zope in multithread mode. Just take the -t 1 from the command line sugested by the StabilityHOWTO and you're set. Best results are achieved by compiling everything from source (python even, use the --prefix=/path/to/zope-src so as not to mix up with your installed python and be careful to use this python when installing zope) and running:

OK, I succeeded in attaching gdb on the production server. I have to wait until tomorrow for results, because in the evening the intranet is not used by the company in question :) Tomorrow it will crash for sure, because it crashes about 20 to 30 times a day, so I will post the results as soon as possible!

It's very frustrating that we cannot reproduce this bug in our own environment, whatever we try. (All workstations requesting like hell, but we cannot get it to crash!)

It's very nice to hear that you people are trying to solve the problem; thanks also to the guys from Zope Corp. who are spending their time on it! I hope the bug will be resolved soon.

kind regards,

martijn jacobs