Re: [Zope-dev] more on the segfault saga
I've applied both patches; however, I've changed the incref part a little. Now it reads:

#define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__), (op)->ob_refcnt++)

It's all on one line, in case my MUA wrapped it. I did this so that it doesn't crash in places other than where it crashed before. I'll report anything I find.

Cheers, Leo

On Thu, 2002-03-14 at 19:44, Matthew T. Kromer wrote:
> Matthew T. Kromer wrote:
> > Attached is another diagnostic patch which you might apply to Python. [...]
>
> extensionclass also brings back the dead; the following patch to Zope's extensionclass will turn off the warning [...]

-- 
Ideas don't stay in some minds very long because they don't like solitary confinement.
___
Zope-Dev maillist - [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )
Re: [Zope-dev] more on the segfault saga
On Mon, 2002-03-18 at 17:44, Leonardo Rochael Almeida wrote:
> I've applied both patches; however, I've changed the incref part a little. Now it reads:
>
> #define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__), (op)->ob_refcnt++)

Scratch that. It should read:

#define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : (fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__), (op)->ob_refcnt++))

The precedence in the previous version would cause a leak. The comma operator binds more loosely than ?:, so the final (op)->ob_refcnt++ ran even when the count was already positive, and live objects were incremented twice.
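Here is a toy reproduction of the two macro versions. It uses a stand-in struct (`toy_object`, a hypothetical name) instead of a real PyObject. Only the grouping of `?:` and the comma operator matters here.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for a Python object header; only the refcount matters here. */
typedef struct {
    long ob_refcnt;
} toy_object;

/* First version: parses as ((cond ? inc : warn), inc), so the trailing
 * increment runs on every call. */
#define TOY_INCREF_V1(op) \
    ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ \
        : fprintf(stderr, "Eeek! Increfing an object from refct 0 at %s:%d\n", \
                  __FILE__, __LINE__), \
     (op)->ob_refcnt++)

/* Corrected version: the extra parentheses keep the warning and the
 * increment together in the false branch, so exactly one increment happens. */
#define TOY_INCREF_V2(op) \
    ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ \
        : (fprintf(stderr, "Eeek! Increfing an object from refct 0 at %s:%d\n", \
                   __FILE__, __LINE__), \
           (op)->ob_refcnt++))
```

Starting from a count of 1, the first macro leaves 3 and the second leaves 2. Starting from 0, both print the warning and end at 1. That is why the bug only shows up as a leak on live objects.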
Re: [Zope-dev] more on the segfault saga
Leonardo Rochael Almeida wrote:
> On Wed, 2002-03-13 at 21:30, Matthew T. Kromer wrote:
> > On Wednesday, March 13, 2002, at 10:40 AM, Leonardo Rochael Almeida wrote:
> > > What about patching Python to report the freed objects like you mentioned on IRC? Also, how about turning on some flags in gc.set_debug()? Do you think we might be able to glean something by seeing what objects were logged as freed, or by storing them in gc.garbage?
> >
> > Setting gc.set_debug(gc.DEBUG_LEAK) floods your stderr in a way you can only believe by seeing it. And it didn't give me any clue. The last object freed was an instance method. Most everything running inside Zope is an instance method or another...

OK, I'm attaching a patch to Python's Modules/gcmodule.c which should set a trap for where the garbage collector trips over bad data. It will grab the bad data and send it to stderr so I can build a better trap. This is ONLY step one in tracking this down.

You will have to rebuild Python to activate this patch. All it basically does is set a SIGSEGV handler and set up a small trace area for the GC to record data into. When the SIGSEGV comes in, it can print out the last thing the code was doing.

This is ONLY going to tell me that the GC tripped over something. But it WILL at least tell me:
- what object it is scanning;
- that object's refcount (which I bet is zero, and which forms the basis for a better trap);
- the object's type and traverse pointers.

The traverse pointer should NOT be null. If it is, then something is wrong with gc being called for that type.

If you apply this patch, run Zope with the patched Python and save stderr to a file. Send me the file, and then you can revert to running Zope without the patch. When the patch triggers, it will exit Python immediately with exit code 999 after it prints its information.
-- 
Matt Kromer
Zope Corporation  http://www.zope.com/

--- Modules/gcmodule.c.orig	Thu Mar 14 10:35:21 2002
+++ Modules/gcmodule.c	Thu Mar 14 11:14:13 2002
@@ -22,6 +22,8 @@
 #include "Python.h"
 
 #ifdef WITH_CYCLE_GC
+#include <signal.h>
+#include <stdarg.h>
 
 /* magic gc_refs value */
 #define GC_MOVED  -1
@@ -34,6 +36,7 @@
 static PyGC_Head generation2 = {&generation2, &generation2, 0};
 static int generation = 0; /* current generation being collected */
 
+
 /* collection frequencies, XXX tune these */
 static int enabled = 1; /* automatic collection enabled? */
 static int threshold0 = 700; /* net new containers before collection */
@@ -60,12 +63,82 @@
                 DEBUG_SAVEALL
 static int debug;
 
+
+static int CRASHTRAP = 0;
+static int CRASHFLAG = 0;
+static char *CRASHTYPE = NULL;
+static int CRASHLOG[16];
+
 /* list of uncollectable objects */
 static PyObject *garbage;
 
 /* Python string to use if unhandled exception occurs */
 static PyObject *gc_str;
 
+static void CRASH_trip(int i, siginfo_t *siginfo, void *p) {
+
+    int n;
+
+    fprintf(stderr,"CRASH %d at %08x\n", (int) siginfo->si_signo,
+        (unsigned int) siginfo->si_addr);
+
+    if (CRASHFLAG == 0) {
+        fprintf(stderr,"\tCrash handler not activated for this!\n");
+    } else {
+        fprintf(stderr,"\tCrash type %s\n", CRASHTYPE ? CRASHTYPE : "(none)");
+        fprintf(stderr,"\tCrash log: %d values: ", CRASHLOG[0]);
+        for (n = 0; n < CRASHLOG[0]; n++) {
+            fprintf(stderr, " %08x", (unsigned int) CRASHLOG[n+1]);
+        }
+        fprintf(stderr,"\n");
+    }
+    exit(999);
+}
+
+static void CRASH_activate(void) {
+
+    struct sigaction sa;
+    struct sigaction oldsa;
+
+    sa.sa_sigaction = CRASH_trip;
+    sigemptyset(&sa.sa_mask);
+    sa.sa_flags = SA_SIGINFO;
+
+    if (CRASHTRAP == 0) {
+        sigaction(SIGSEGV, &sa, &oldsa);
+        CRASHTRAP = 1;
+    }
+
+    CRASHFLAG = 1;
+    CRASHTYPE = NULL;
+    CRASHLOG[0] = 0;
+
+}
+
+static void CRASH_deactivate(void) {
+    CRASHFLAG = 0;
+}
+
+static void CRASH_type(char *s) {
+    CRASHTYPE = s;
+}
+
+static void CRASH_record(int n, ...)
+{
+    va_list ap;
+    int i;
+
+    va_start(ap, n);
+
+    for (i = 0; i < n; i++) {
+        CRASHLOG[i+1] = va_arg(ap, int);
+    }
+
+    va_end(ap);
+
+    CRASHLOG[0] = n;
+}
+
+
 /*** list functions ***/
 
 static void
@@ -164,13 +237,29 @@
 subtract_refs(PyGC_Head *containers)
 {
     traverseproc traverse;
+    PyObject *obj;
+
+    PyGC_Head *gc = containers->gc_next;
+
+    CRASH_activate();
+    CRASH_type("subtract_refs");
+    for (; gc != containers; gc=gc->gc_next) {
+        obj = (PyObject *)PyObject_FROM_GC(gc);
+        CRASH_record(4, obj,
+            obj != 0 ? obj->ob_refcnt : 0,
+            obj != NULL ? obj->ob_type : NULL,
+            obj != NULL && obj->ob_type != NULL ?
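For comparison, here is a minimal sketch of the same "breadcrumbs plus SIGSEGV handler" idea, written for a modern 64-bit POSIX system. The names (`trap_begin`, `trap_record`, `trap_install`, `TRAP_EXIT_STATUS`) are hypothetical and not part of Python. It differs from the patch in three deliberate ways:

- The log stores `uintptr_t` values. Reading a pointer back with `va_arg(ap, int)` truncates it, and is undefined behaviour where pointers are 64 bits wide.
- The handler writes with `write()` rather than `fprintf()`, since only async-signal-safe functions should be called from a signal handler.
- It exits with a status below 256. A process exit status keeps only the low 8 bits, so `exit(999)` shows up in the shell as 231.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdarg.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define TRAP_LOG_MAX 16
#define TRAP_EXIT_STATUS 99   /* hypothetical; any value below 256 survives intact */

/* Breadcrumbs written by the code under suspicion, read by the handler. */
static volatile sig_atomic_t trap_active = 0;
static const char *volatile trap_phase = NULL;
static uintptr_t trap_log[TRAP_LOG_MAX];
static volatile sig_atomic_t trap_log_len = 0;

/* Mark the start of a suspicious phase, e.g. trap_begin("subtract_refs"). */
static void trap_begin(const char *phase)
{
    trap_phase = phase;
    trap_log_len = 0;
    trap_active = 1;
}

/* Record up to TRAP_LOG_MAX values; callers must pass uintptr_t arguments. */
static void trap_record(int n, ...)
{
    va_list ap;
    assert(n >= 0);
    if (n > TRAP_LOG_MAX)
        n = TRAP_LOG_MAX;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        trap_log[i] = va_arg(ap, uintptr_t);
    va_end(ap);
    trap_log_len = n;
}

/* Print one value as fixed-width hex using only write(), which is signal safe. */
static void write_hex(uintptr_t v)
{
    char buf[2 * sizeof(uintptr_t) + 1];
    for (int i = (int)(2 * sizeof(uintptr_t)) - 1; i >= 0; i--) {
        buf[i] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    }
    buf[2 * sizeof(uintptr_t)] = ' ';
    (void)write(STDERR_FILENO, buf, sizeof buf);
}

static void trap_handler(int sig, siginfo_t *info, void *context)
{
    static const char head[] = "CRASH at ";
    (void)sig;
    (void)context;
    (void)write(STDERR_FILENO, head, sizeof head - 1);
    write_hex((uintptr_t)info->si_addr);
    if (trap_active && trap_phase != NULL) {
        const char *phase = trap_phase;
        (void)write(STDERR_FILENO, phase, strlen(phase));
        (void)write(STDERR_FILENO, ": ", 2);
        for (int i = 0; i < trap_log_len; i++)
            write_hex(trap_log[i]);
    }
    (void)write(STDERR_FILENO, "\n", 1);
    _exit(TRAP_EXIT_STATUS);   /* _exit, not exit: no atexit handlers in a signal context */
}

/* Install the handler once; returns 0 on success, -1 on failure (errno set). */
static int trap_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = trap_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

In a collector loop, each iteration would call something like `trap_record(2, (uintptr_t)obj, (uintptr_t)obj->ob_refcnt)` before touching the object. After a fault, the last recorded object is the one being scanned.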
Re: [Zope-dev] more on the segfault saga
On Thu, 2002-03-14 at 13:28, Matthew T. Kromer wrote:
> OK, I'm attaching a patch to Python's Modules/gcmodule.c which should set a trap for where the garbage collector trips over bad data; this will grab the bad data and send it to stderr so I can build a better trap.

I'm on it. I will send results when they're available. If anyone wants to talk to me in the meantime, I'll be on IRC.
Re: [Zope-dev] more on the segfault saga
Don't know if this helps, but the last three segfaults I have seen were right after someone logged in, while loading /manage. This is the Zope-2.5.0 Win32 binary on Win2k. All three times, the pop-up referenced the same instruction 0x1e13490a at the same memory address 0x005c, saying 'memory could not be read.'

-- 
Jim Washington
Re: [Zope-dev] more on the segfault saga
Leonardo Rochael Almeida writes:
> In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his crashes in pure DTML methods, which could mean that PythonScripts are innocent in this case... or not, since the segfault hits inside the gc, which might be collecting something completely unrelated to the current requests.

Just a wild guess: is the GC guaranteed to be thread safe?

Dieter
Re: [Zope-dev] more on the segfault saga
Dieter Maurer wrote:
> Leonardo Rochael Almeida writes:
> > In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his crashes in pure DTML methods [...]
>
> Just a wild guess: is the GC guaranteed to be thread safe?
>
> Dieter

I'm fairly sure it is. Certainly, there's an activity flag which should prevent the collector from being reentered.

-- 
Matt Kromer
Zope Corporation  http://www.zope.com/
Re: [Zope-dev] more on the segfault saga
On Thu, 2002-03-14 at 17:17, Dieter Maurer wrote:
> Leonardo Rochael Almeida writes:
> > In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his crashes in pure DTML methods [...]
>
> Just a wild guess: is the GC guaranteed to be thread safe?

The gc acquires the big interpreter lock before doing its stuff. That is not the same thing, since C code could still be doing bad stuff. Question: how does the gc differentiate between an unreachable object and an object that's reachable only by C code?
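The collector's bookkeeping gives a fairly direct answer to this question. It does not declare something garbage because nothing seems to point at it. It declares something garbage when every reference to it is accounted for by other objects in the set being collected.

Below is a toy sketch of that bookkeeping on a fixed-size graph. `toy_node`, `toy_subtract_refs` and `toy_mark_reachable` are hypothetical names that only loosely mirror CPython's `update_refs` and `subtract_refs` passes. The sketch works in three steps:

1. Copy each refcount into `gc_refs`.
2. Subtract one for every reference that comes from inside the set.
3. Treat anything left above zero as held from outside (a C local or global, for instance). Keep it, together with everything it reaches.

The catch is that this only works if the C code owns a real reference. A pointer held without an incref is invisible to the count.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_REFS 4

/* Toy container: a refcount plus outgoing references to other containers. */
typedef struct toy_node {
    long refcnt;   /* every owned reference, including ones held by C code */
    long gc_refs;  /* scratch count used only during a collection */
    int nrefs;
    struct toy_node *refs[TOY_MAX_REFS];
} toy_node;

/* Position of p in set, or -1 if p is outside the set being collected. */
static long index_of(toy_node *const *set, size_t n, const toy_node *p)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == p)
            return (long)i;
    return -1;
}

/* Copy refcounts, then subtract one for every reference from inside the set.
 * Whatever is left over must come from outside the set. */
static void toy_subtract_refs(toy_node *const *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        set[i]->gc_refs = set[i]->refcnt;
    for (size_t i = 0; i < n; i++)
        for (int j = 0; j < set[i]->nrefs; j++)
            if (index_of(set, n, set[i]->refs[j]) >= 0)
                set[i]->refs[j]->gc_refs--;
}

/* Nodes with gc_refs > 0 are roots; anything reachable from a root survives. */
static void toy_mark_reachable(toy_node *const *set, size_t n, int *reachable)
{
    for (size_t i = 0; i < n; i++)
        reachable[i] = set[i]->gc_refs > 0;
    for (int changed = 1; changed; ) {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!reachable[i])
                continue;
            for (int j = 0; j < set[i]->nrefs; j++) {
                long k = index_of(set, n, set[i]->refs[j]);
                if (k >= 0 && !reachable[k]) {
                    reachable[k] = 1;
                    changed = 1;
                }
            }
        }
    }
}
```

Take two cycles, `a <-> b` and `c <-> d`, where `c` has one extra reference (think of a C local). Then `a` and `b` drop to zero and are unreachable. `c` keeps a count of 1 and also keeps `d` alive.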
Re: [Zope-dev] more on the segfault saga
Attached is another diagnostic patch which you might apply to Python. If you apply this patch, you WILL need to rebuild Zope to include it.

What it will do is complain to stderr if an object is INCREF'd from refcount 0. It also silences the complaint for the one area which I know revives dead objects.

This patch will probably cause a crash after an erroneous incref-from-0 is detected, since it doesn't actually DO the incref in that case. The intent is to find a case in the code where an object is held between threads: one thread decrefs to zero, the other thread increfs, causing a revive, but too late to save the patient.

-- 
Matt Kromer
Zope Corporation  http://www.zope.com/

--- Include/object.h.orig	Thu Mar 14 16:44:36 2002
+++ Include/object.h	Thu Mar 14 16:54:29 2002
@@ -442,7 +442,7 @@
 #define _Py_NewReference(op) ((op)->ob_refcnt = 1)
 #endif
 
-#define Py_INCREF(op) ((op)->ob_refcnt++)
+#define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__) )
 #define Py_DECREF(op)                       \
     if (--(op)->ob_refcnt != 0)             \
         ;                                   \
--- Objects/classobject.c.orig	Thu Mar 14 17:04:40 2002
+++ Objects/classobject.c	Thu Mar 14 17:01:36 2002
@@ -535,7 +535,8 @@
 #endif
 #else /* !Py_TRACE_REFS */
     /* Py_INCREF boosts _Py_RefTotal if Py_REF_DEBUG is defined */
-    Py_INCREF(inst);
+    /* Py_INCREF(inst); */
+    inst->ob_refcnt++; /* we dont want to trap this one */
 #endif /* !Py_TRACE_REFS */
 
     /* Save the current exception, if any. */
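Here is a self-contained sketch of what this trap does. It uses a stand-in `toy_object` rather than a real PyObject, and all names are hypothetical. The check is written as a function rather than a macro for readability.

The checked incref warns and skips the increment when the count is already zero. That is why a crash is expected shortly afterwards. The deallocation path that deliberately resurrects an object bumps the count directly, as the classobject.c hunk does, so it never trips the trap.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for a Python object header. */
typedef struct {
    long ob_refcnt;
} toy_object;

static unsigned long incref_from_zero = 0;   /* how many times the trap fired */

/* Checked incref: warn and do NOT increment when the count is already zero. */
static void toy_incref_checked(toy_object *op, const char *file, int line)
{
    if (op->ob_refcnt > 0) {
        op->ob_refcnt++;
    } else {
        incref_from_zero++;
        fprintf(stderr, "Eeek! Increfing an object from refct 0 at %s:%d\n",
                file, line);
    }
}
#define TOY_INCREF(op) toy_incref_checked((op), __FILE__, __LINE__)

/* Sanctioned revival: a dealloc hook that may call back into other code
 * bumps the count directly, bypassing the trap, then drops it again. */
static void toy_dealloc_with_revival(toy_object *op)
{
    assert(op->ob_refcnt == 0);
    op->ob_refcnt++;   /* temporary lease on life; not trapped */
    /* ... a finalizer or watcher callback would run here ... */
    op->ob_refcnt--;
}
```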
Re: [Zope-dev] more on the segfault saga
Matthew T. Kromer wrote:
> Attached is another diagnostic patch which you might apply to Python. [...]

ExtensionClass also brings back the dead. The following patch to Zope's ExtensionClass turns off the warning for that case. It is meant to be used together with the previous patch I sent out, the one that complains when an object is incref'd from a refcount of zero.

-- 
Matt Kromer
Zope Corporation  http://www.zope.com/

Index: lib/Components/ExtensionClass/src/ExtensionClass.c
===
RCS file: /cvs-repository/Zope/lib/Components/ExtensionClass/src/ExtensionClass.c,v
retrieving revision 1.46.36.1
diff -u -r1.46.36.1 ExtensionClass.c
--- lib/Components/ExtensionClass/src/ExtensionClass.c	4 Oct 2001 14:25:19 -	1.46.36.1
+++ lib/Components/ExtensionClass/src/ExtensionClass.c	14 Mar 2002 22:43:10 -
@@ -3047,8 +3047,9 @@
   fprintf(stderr,"Deallocating a %s\n", self->ob_type->tp_name);
 #endif
 
+  self->ob_refcnt++;
   PyErr_Fetch(&t,&v,&tb);
-  Py_INCREF(self);		/* Give us a new lease on life */
+  /* Py_INCREF(self);		/* Give us a new lease on life */
 
   if (subclass_watcher &&
       ! PyObject_CallMethod(subclass_watcher,"destroying","O",self))
Re: [Zope-dev] more on the segfault saga
Hi Matt,

I'll wait for the patch where you also silence the dead-raising area in ExtensionClass.

What if, instead of detecting this situation, we try to detect whether the incref is happening without the interpreter lock held? Increfs and decrefs shouldn't be happening freely and simultaneously even in C code, right? Is holding the interpreter lock the correct way to signal that you'll be doing increfs and decrefs in C code?

Cheers, Leo

On Thu, 2002-03-14 at 19:10, Matthew T. Kromer wrote:
> Attached is another diagnostic patch which you might apply to Python. [...]
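Leo's alternative can be sketched with a thread-local flag that is set while the current thread holds the lock. The incref then complains if the flag is clear.

This is only an illustration, with a plain pthread mutex standing in for the interpreter lock. `interp_lock_acquire`, `interp_lock_release` and `toy_incref_checked` are hypothetical names. Python 2.x offered no such check, although later CPython releases added `PyGILState_Check()` for a similar purpose.

A thread-local flag is used instead of a shared "owner" field so that the check itself cannot race with other threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Stand-in for the interpreter lock. */
static pthread_mutex_t interp_lock = PTHREAD_MUTEX_INITIALIZER;
/* Nonzero only while *this* thread holds interp_lock. */
static _Thread_local int holds_interp_lock = 0;

static void interp_lock_acquire(void)
{
    int rc = pthread_mutex_lock(&interp_lock);
    assert(rc == 0);
    (void)rc;
    holds_interp_lock = 1;
}

static void interp_lock_release(void)
{
    holds_interp_lock = 0;
    int rc = pthread_mutex_unlock(&interp_lock);
    assert(rc == 0);
    (void)rc;
}

typedef struct {
    long ob_refcnt;
} toy_object;

/* Atomic because offending increfs may come from several threads at once. */
static _Atomic unsigned long unlocked_increfs = 0;

static void toy_incref_checked(toy_object *op, const char *file, int line)
{
    if (!holds_interp_lock) {
        unlocked_increfs++;
        fprintf(stderr, "incref without the interpreter lock at %s:%d\n",
                file, line);
    }
    op->ob_refcnt++;   /* still performed, so behaviour is otherwise unchanged */
}
#define TOY_INCREF(op) toy_incref_checked((op), __FILE__, __LINE__)
```

Unlike the refcount-0 trap, this flags the unlocked access itself. That is the moment of the suspected race, rather than its aftermath.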
Re: [Zope-dev] more on the segfault saga
Dieter Maurer wrote:
> Just a wild guess: is the GC guaranteed to be thread safe?

Yep. The GC is _almost_ certainly not the problem here. It's just the poor bunny that has to walk through the objects in memory, so when something's been mangled, the GC is the thing that falls over and breaks.

I think I've mentioned it before, but looking at the object _before_ the corrupted one in memory might be a useful thing to try...

Anthony
-- 
Anthony Baxter [EMAIL PROTECTED]
It's never too late to have a happy childhood.
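Finding "the object before" needs a sane neighbour to start from. One hedged way to get at it assumes the damage shows up in the collector's doubly linked list of containers. Walk the ring from its head and stop at the last node whose successor still links back to it.

This finds the neighbour in list order, which is not necessarily the neighbour in address order that Anthony suggests inspecting. The `gc_head` struct and the function name are hypothetical simplifications of CPython's `PyGC_Head`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a GC list link (no payload, no gc_refs). */
typedef struct gc_head {
    struct gc_head *next;
    struct gc_head *prev;
} gc_head;

/* Walk the circular list from its head. Return the last node whose forward
 * link is still consistent (its successor points back to it), or NULL if the
 * whole ring is consistent. max_steps bounds the walk in case a corrupted
 * pointer creates a loop that never returns to the head. */
static gc_head *last_good_before_damage(gc_head *head, size_t max_steps)
{
    gc_head *node = head;
    for (size_t i = 0; i < max_steps; i++) {
        gc_head *next = node->next;
        /* Dereferencing a wild `next` can itself fault; run this under a
         * debugger or with a SIGSEGV trap installed. */
        if (next == NULL || next->prev != node)
            return node;
        if (next == head)
            return NULL;
        node = next;
    }
    return node;   /* ran out of steps: treat the current node as suspect */
}
```

With a ring `head -> a -> b -> c -> head`, corrupting `c.prev` makes the walk stop at `b`. That is the node worth dumping next to the damaged one.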
Re: [Zope-dev] more on the segfault saga
On Tuesday, March 12, 2002, at 05:08 PM, Leonardo Rochael Almeida wrote:
> Matthew, thanks for taking the time to gdb the beast with me. Did you come up with any instrumentation I should add to Python or Zope to find out what it is that Python is trying to release twice? If you want, I can arrange a MySQL-less period during production so that we can capture the crash in a cleaner environment.

Sorry, I only dug out what I had and made THAT work; it was a single-thread profiler. It will require some modding to turn it into a useful debug tool, and I've been busy with other things.

> Speaking of instrumentation, since the gremlin seems to be threading related (it stops with '-t 1'), it might be useful to serialize certain parts of the execution path with semaphores, like the path to the restoration or the execution of PythonScripts, the path to the execution of SQL queries, etc. This means running with small locks in certain sections instead of the big '-t 1' lock (which is not really a lock, but you get the picture :-).

Well, if you have the energy to try serializing some of the base parts of the code, by all means, go ahead. I can't even begin to guess where the problem is, though.
Re: [Zope-dev] more on the segfault saga
On Wed, 2002-03-13 at 09:09, Matthew T. Kromer wrote:
> On Tuesday, March 12, 2002, at 05:08 PM, Leonardo Rochael Almeida wrote:
> > Matthew, thanks for taking the time to gdb the beast with me. [...]
>
> Sorry, I only dug out what I had and made THAT work; it was a single-thread profiler. It will require some modding to turn it into a useful debug tool, and I've been busy with other things.

What about patching Python to report the freed objects like you mentioned on IRC? Also, how about turning on some flags in gc.set_debug()? Do you think we might be able to glean something by seeing what objects were logged as freed, or by storing them in gc.garbage?

> > Speaking of instrumentation, since the gremlin seems to be threading related (it stops with '-t 1'), it might be useful to serialize certain parts of the execution path with semaphores [...]
>
> Well, if you have the energy to try serializing some of the base parts of the code, by all means, go ahead. I can't even begin to guess where the problem is, though.

Well, I have the energy; I just don't know where to start. But it's beginning to look like I'll just have to roll up my sleeves and dive into C code to hunt this beast down. And to think that I'd chosen Python as my official programming language to avoid just that... :-)
Re: [Zope-dev] more on the segfault saga
Leonardo Rochael Almeida wrote:
> Well, I have the energy; I just don't know where to start. But it's beginning to look like I'll just have to roll up my sleeves and dive into C code to hunt this beast down. And to think that I'd chosen Python as my official programming language to avoid just that... :-)

I just found out about something that might help. If you compiled against the GNU C library, you can set the environment variable MALLOC_CHECK_:
- set it to 1 to get malloc usage warnings printed to stderr;
- set it to 2 to cause an abort() as soon as an error is detected.

Assuming you're running in production, I'd start with 1 (making sure stderr is connected to something). Then, if any warnings occur but they aren't informative enough, switch to 2.

I learned this here: http://www.gnu.org/manual/glibc-2.2.3/html_node/libc_32.html

Shane
Re: [Zope-dev] more on the segfault saga
I set MALLOC_CHECK_ to 1, and it said it was using the malloc debug hooks. But it didn't report anything else before the crashes, so there's no point in setting it to 2...

On Wed, 2002-03-13 at 13:49, Shane Hathaway wrote:
> Leonardo Rochael Almeida wrote:
> > On Wed, 2002-03-13 at 13:04, Shane Hathaway wrote:
> > > I just found out about something that might help. [...]
> >
> > Thanks Shane, I'll try that. But first I need a way to not supply -D and still get stderr redirected. This site uses cookie authentication (exUserFolder). The traceback ends up in a page that is quickly redirected away from, but some of our client's customers sometimes spot it anyway. They usually call to complain about the Zope error they saw immediately before the login page, so we had to disable '-D'.
>
> -D is actually not related AFAIK. The C library will output to stderr regardless of whether -D is supplied, which means you need to use standard redirection anyway, for example:
>
> ./start > /var/local/log/zope_output 2>&1
Re: [Zope-dev] more on the segfault saga
On Wed, 2002-03-13 at 21:30, Matthew T. Kromer wrote:
> On Wednesday, March 13, 2002, at 10:40 AM, Leonardo Rochael Almeida wrote:
> > What about patching Python to report the freed objects like you mentioned on IRC? Also, how about turning on some flags in gc.set_debug()? Do you think we might be able to glean something by seeing what objects were logged as freed, or by storing them in gc.garbage?
>
> Setting gc.set_debug(gc.DEBUG_LEAK) floods your stderr in a way you can only believe by seeing it. And it didn't give me any clue. The last object freed was an instance method. Most everything running inside Zope is an instance method or another...
>
> Well, what I'm thinking about doing is trying to patch the Py_DECREF macro to record the freed objects in a table and mark the freed memory with a signal value.

Good thing it doesn't involve writing anything to stderr (right?). If gc.DEBUG_LEAK is a flood, I can't even begin to imagine the flood of Py_DECREF messages... If you can produce a patch, I'm more than willing to apply it.

> I'm worried about the PythonScript aspects. It's frustrating, because I am not aware of anything in PythonScripts that should be thread-dependent.

The way the bytecode versions of said PythonScripts are kept in memory, perhaps? That's not likely, though, since each thread keeps its own copy, even when recompilation is needed, right?

In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his crashes in pure DTML methods, which could mean that PythonScripts are innocent in this case... or not, since the segfault hits inside the gc, which might be collecting something completely unrelated to the current requests.

Questions: if I call gc.disable() but run gc.collect() from time to time, I get the same effect, right? In that case, where in the code would I put a call to gc.collect() so that it happens after the second phase of the two-phase commit?

Another approach: would it be possible, from time to time, to put Zope in a state where it:
1. enqueues new connections instead of servicing them;
2. waits for the currently running requests to finish;
3. runs gc.collect();
4. starts servicing requests again?

Cheers, Leo
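The Py_DECREF idea quoted above can be sketched as a debugging replacement for `free()`. The idea is to keep a table of freed objects and fill freed memory with a recognisable value. The sketch works like this:

1. Poison the block.
2. Park it in a small ring.
3. Only release it for real when the ring wraps around.

A stale pointer then reads an obvious pattern instead of recycled data, and nothing is written to stderr. `debug_free`, `was_recently_freed`, the 0xDB poison byte and the ring size are all hypothetical choices. Hooking this into Python's deallocation path is not shown, and the caller has to supply the block size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xDB        /* hypothetical "dead block" marker */
#define QUARANTINE_SLOTS 64     /* how many freed blocks are held back */

typedef struct {
    void *ptr;
    size_t size;
} freed_block;

static freed_block quarantine[QUARANTINE_SLOTS];
static size_t quarantine_next = 0;   /* slot that will be recycled next */

/* Poison a block and hold on to it; the block evicted from the ring is the
 * one that actually gets freed. Writes nothing to stderr. */
static void debug_free(void *ptr, size_t size)
{
    if (ptr == NULL)
        return;
    memset(ptr, POISON_BYTE, size);
    freed_block *slot = &quarantine[quarantine_next];
    free(slot->ptr);               /* oldest entry; free(NULL) is a no-op */
    slot->ptr = ptr;
    slot->size = size;
    quarantine_next = (quarantine_next + 1) % QUARANTINE_SLOTS;
}

/* Nonzero if addr points into a block that is still in quarantine. */
static int was_recently_freed(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < QUARANTINE_SLOTS; i++) {
        uintptr_t p = (uintptr_t)quarantine[i].ptr;
        if (p != 0 && a >= p && a - p < quarantine[i].size)
            return 1;
    }
    return 0;
}
```

Inspecting a block after `debug_free` is legal only while the block is still parked. Once it is evicted from the ring, the same read becomes a real use-after-free. The trade-off is memory: up to `QUARANTINE_SLOTS` blocks stay allocated longer than usual.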
[Zope-dev] more on the segfault saga
<narrator voice="kosh">And so it continues...</narrator>

I've finally recompiled all PythonScripts. (All the scripts and ZCatalog tricks I tried before didn't know how to get at the PythonScripts inside the ZClasses. BTW, if anyone is interested, I can send you the scripts I used to recompile all PythonScripts inside ZClasses.) But it still segfaults, and it doesn't seem to be any more stable, so there, Anthony :-). At least I got a cleaner stupid_log_file, with no more "needs recompilation" messages :-)

Matthew, thanks for taking the time to gdb the beast with me. Did you come up with any instrumentation I should add to Python or Zope to find out what it is that Python is trying to release twice? If you want, I can arrange a MySQL-less period during production so that we can capture the crash in a cleaner environment.

Speaking of instrumentation, the gremlin seems to be threading related (it stops with '-t 1'). So it might be useful to serialize certain parts of the execution path with semaphores, for example:
- the path to the restoration or the execution of PythonScripts;
- the path to the execution of SQL queries;
- etc.

This means running with small locks in certain sections instead of the big '-t 1' lock (which is not really a lock, but you get the picture :-).

Cheers, Leo