Re: [Zope-dev] more on the segfault saga

2002-03-18 Thread Leonardo Rochael Almeida

I've applied both patches, however I've changed the incref part a
little. Now it reads:

#define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__), (op)->ob_refcnt++)

It's all on one line, in case my MUA wrapped it. I did it this way to make
sure it doesn't crash in different places than it crashed before.

I'll report anything I find.

Cheers, Leo

On Thu, 2002-03-14 at 19:44, Matthew T. Kromer wrote:
> [...]
-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.


___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] more on the segfault saga

2002-03-18 Thread Leonardo Rochael Almeida

On Mon, 2002-03-18 at 17:44, Leonardo Rochael Almeida wrote:
> I've applied both patches, however I've changed the incref part a
> little. Now it reads:
> 
> #define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__), (op)->ob_refcnt++)

Scratch that. It should read:

#define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : (fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__), (op)->ob_refcnt++))

The precedence in the previous version would probably cause a leak: the
comma operator binds more loosely than ?:, so the trailing
(op)->ob_refcnt++ also ran when the refcount was already positive,
incrementing it twice.
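The difference is easy to check on a toy struct. The sketch below assumes a stand-in ToyObject type and renames the two versions INCREF_V1 and INCREF_V2 (hypothetical names); the macro bodies are otherwise the two versions quoted above.

```c
#include <assert.h>
#include <stdio.h>

typedef struct { long ob_refcnt; } ToyObject;   /* stand-in for PyObject */

/* First version: the comma operator binds more loosely than ?:, so the
   trailing (op)->ob_refcnt++ runs whichever branch was taken. */
#define INCREF_V1(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : \
    fprintf(stderr, "Eeek! Increfing an object from refct 0 at %s:%d\n", \
            __FILE__, __LINE__), (op)->ob_refcnt++)

/* Corrected version: the extra parentheses keep the warning and the
   increment together in the "refcount is zero" branch. */
#define INCREF_V2(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : \
    (fprintf(stderr, "Eeek! Increfing an object from refct 0 at %s:%d\n", \
             __FILE__, __LINE__), (op)->ob_refcnt++))
```

Starting from a refcount of 1, INCREF_V1 leaves it at 3 (the leak) while INCREF_V2 leaves it at 2; from 0, INCREF_V2 prints the warning and leaves it at 1. Both macros still evaluate op several times, so neither should be given an argument with side effects.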

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.





Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Matthew T. Kromer

Leonardo Rochael Almeida wrote:

> On Wed, 2002-03-13 at 21:30, Matthew T. Kromer wrote:
> 
> > On Wednesday, March 13, 2002, at 10:40 AM, Leonardo Rochael Almeida 
> > wrote:
> > 
> > > What about patching Python to report the freed objects like you
> > > mentioned on IRC? Also, how about turning on some flags in
> > > gc.set_debug()? Do you think we might be able to glean something by
> > > seeing what objects were logged as freed or by storing them in
> > > gc.garbage?
> > 
> 
> Setting gc.set_debug(gc.DEBUG_LEAK) floods your stderr in a way you can
> only believe by seeing it. And it didn't give me any clue. The last
> object freed was an instance method. Most everything running inside Zope
> is an instance method of one sort or another...


OK, I'm attaching a patch to Python's Modules/gcmodule.c which should 
set a trap for where the garbage collector trips over bad data; this 
will grab the bad data and send it to stderr so I can build a better trap.

This is ONLY step one in tracking this down.  You will have to rebuild 
Python to activate this patch.  All it does is install a SIGSEGV 
handler and set up a small trace area for the GC to record data into, 
so that when the SIGSEGV comes in, the handler can print out the last 
thing the code was doing.

This is ONLY going to tell me that the GC tripped over something, but it 
WILL at least tell me what object it is scanning, that object's refcount 
(which I bet is zero, and forms the basis for a better trap), and the 
object's type and traverse pointers.

The traverse pointer should NOT be null.  If it is, then something is 
wrong with the GC being called for that type.

If you apply this patch, run Zope on the patched Python with stderr 
saved to a file, send me the file, and then you can go back to running 
Zope without the patch.

When the patch triggers, it will exit Python immediately after it prints 
its information, with exit code 999 (which the shell will see as 231, 
since exit statuses are truncated to 8 bits).
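The trap described above comes down to two pieces: a trace area that the instrumented code overwrites as it goes, and a SIGSEGV handler that dumps it. A minimal sketch of that technique follows. It is not the patch itself: the names (trap_install, trace_record, TRAP_EXIT_CODE) are hypothetical, and where the patch calls fprintf() and exit() from the handler, this version sticks to write(2) and _exit(2), which POSIX lists as async-signal-safe, and uses an exit code that fits in 8 bits.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() and siginfo_t */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define TRACE_SLOTS 16
#define TRAP_EXIT_CODE 99   /* hypothetical; exit statuses only keep 8 bits */

/* Trace area: written by the instrumented code, read by the handler. */
static const char *volatile trace_tag = NULL;
static volatile uintptr_t trace_log[TRACE_SLOTS];
static volatile sig_atomic_t trace_count = 0;

/* Write a NUL-terminated string with write(2), which is async-signal-safe. */
static void put_str(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

/* Render v as fixed-width lowercase hex into out (no NUL); returns the width. */
static size_t format_hex(uintptr_t v, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t width = 2 * sizeof v;
    for (size_t i = 0; i < width; i++)
        out[width - 1 - i] = digits[(v >> (4 * i)) & 0xf];
    return width;
}

/* Write v in hex followed by a space, without using stdio. */
static void put_hex(uintptr_t v)
{
    char buf[2 * sizeof(uintptr_t) + 1];
    size_t n = format_hex(v, buf);
    buf[n] = ' ';
    ssize_t r = write(STDERR_FILENO, buf, n + 1);
    (void)r;
}

/* SIGSEGV handler: dump the fault address and the trace area, then exit. */
static void trap_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    put_str("CRASH at ");
    put_hex((uintptr_t)info->si_addr);
    put_str("\n\tlast step: ");
    put_str(trace_tag != NULL ? trace_tag : "(none)");
    put_str("\n\tlog: ");
    for (sig_atomic_t i = 0; i < trace_count; i++)
        put_hex(trace_log[i]);
    put_str("\n");
    _exit(TRAP_EXIT_CODE);
}

/* Install the handler; returns 0 on success, -1 on error (errno set). */
static int trap_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = trap_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Record what the code is about to touch (e.g. address, refcount, type). */
static void trace_record(const char *tag, const uintptr_t *values, size_t n)
{
    if (n > TRACE_SLOTS)
        n = TRACE_SLOTS;
    trace_count = 0;            /* hide the log while it is being rewritten */
    trace_tag = tag;
    for (size_t i = 0; i < n; i++)
        trace_log[i] = values[i];
    trace_count = (sig_atomic_t)n;
}
```

In the GC case, the loop in subtract_refs would call trace_record("subtract_refs", ...) with the object's address, refcount and type pointer before touching each object, much as the patch's CRASH_record() does.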


-- 
Matt Kromer
Zope Corporation  http://www.zope.com/ 




--- Modules/gcmodule.c.orig	Thu Mar 14 10:35:21 2002
+++ Modules/gcmodule.c	Thu Mar 14 11:14:13 2002
@@ -22,6 +22,8 @@
 #include "Python.h"
 
 #ifdef WITH_CYCLE_GC
+#include <signal.h>
+#include <stdarg.h>
 
 /* magic gc_refs value */
 #define GC_MOVED -1
@@ -34,6 +36,7 @@
 static PyGC_Head generation2 = {&generation2, &generation2, 0};
 static int generation = 0; /* current generation being collected */
 
+
 /* collection frequencies, XXX tune these */
 static int enabled = 1; /* automatic collection enabled? */
 static int threshold0 = 700; /* net new containers before collection */
@@ -60,12 +63,82 @@
 				DEBUG_SAVEALL
 static int debug;
 
+
+static int CRASHTRAP = 0;
+static int CRASHFLAG = 0;
+static char *CRASHTYPE = NULL;
+static int CRASHLOG[16];
+
 /* list of uncollectable objects */
 static PyObject *garbage;
 
 /* Python string to use if unhandled exception occurs */
 static PyObject *gc_str;
 
+static void CRASH_trip(int i, siginfo_t *siginfo, void *p) {
+
+	int n;
+
+	fprintf(stderr,"CRASH %d at %08x\n", (int) siginfo->si_signo,
+		(unsigned int) siginfo->si_addr);
+
+	if (CRASHFLAG == 0) {
+		fprintf(stderr,"\tCrash handler not activated for this!\n");
+	} else {
+		fprintf(stderr,"\tCrash type %s\n", CRASHTYPE ? CRASHTYPE : "(none)");
+		fprintf(stderr,"\tCrash log: %d values: ", CRASHLOG[0]);
+		for (n = 0; n < CRASHLOG[0]; n++) {
+			fprintf(stderr," %08x", (unsigned int) CRASHLOG[n+1]);
+		}
+		fprintf(stderr,"\n");
+	}
+	exit(999);
+}
+
+static void CRASH_activate(void) {
+
+	struct sigaction sa;
+	struct sigaction oldsa;
+
+	sa.sa_sigaction = CRASH_trip;
+	sigemptyset(&sa.sa_mask);
+	sa.sa_flags = SA_SIGINFO;
+
+	if (CRASHTRAP == 0) {
+		sigaction(SIGSEGV, &sa, &oldsa); 
+		CRASHTRAP = 1;
+	}
+
+	CRASHFLAG = 1;
+	CRASHTYPE = NULL;
+	CRASHLOG[0] = 0;
+
+}
+
+static void CRASH_deactivate(void) {
+	CRASHFLAG = 0;
+}
+
+static void CRASH_type(char *s) {
+	CRASHTYPE = s;
+}
+
+static void CRASH_record(int n, ...) {
+	va_list ap;
+	int i;
+
+	va_start(ap, n);
+
+	for (i = 0; i < n; i++) {
+		CRASHLOG[i+1] = va_arg(ap, int);
+	}
+
+	va_end(ap);
+
+	CRASHLOG[0] = n;
+}
+
+
 /*** list functions ***/
 
 static void
@@ -164,13 +237,29 @@
 subtract_refs(PyGC_Head *containers)
 {
 	traverseproc traverse;
+	PyObject *obj;
+
+
 	PyGC_Head *gc = containers->gc_next;
+
+	CRASH_activate();
+	CRASH_type("subtract_refs");
+
 	for (; gc != containers; gc=gc->gc_next) {
+		obj = (PyObject *)PyObject_FROM_GC(gc);
+		CRASH_record(4, obj,
+			obj != 0 ? obj->ob_refcnt : 0,
+			obj != NULL ? obj->ob_type : NULL,
+			obj != NULL && obj->ob_type != NULL ?
+   

Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Leonardo Rochael Almeida

On Thu, 2002-03-14 at 13:28, Matthew T. Kromer wrote:
> 
> OK, I'm attaching a patch to Python's Modules/gcmodule.c which should 
> set a trap for where the garbage collector trips over bad data; this 
> will grab the bad data and send it to stderr so I can build a better trap.

I'm on it. Will send results when they're available. If anyone wants to
talk to me during the period, I'll be on IRC.

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.





Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Jim Washington

Don't know if this helps, but the last three segfaults I have seen were 
right after someone logged in, while /manage was loading.

Zope-2.5.0 Win32 binary on Win2k. All three times, the pop-up reported 
the same instruction, 0x1e13490a, referencing the same memory address, 
0x005c, and said the memory 'could not be read.'

-- Jim Washington





Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Dieter Maurer

Leonardo Rochael Almeida writes:
 > In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his
 > crashes in pure dtml methods, which could mean that PythonScripts are
 > innocent in this case... or not, since the segfault hits inside the gc,
 > which might be collecting something completely unrelated to the current
 > requests.
Just a wild guess: is the GC guaranteed to be thread safe?


Dieter




Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Matthew T. Kromer

Dieter Maurer wrote:

> Leonardo Rochael Almeida writes:
>  > In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his
>  > crashes in pure dtml methods, which could mean that PythonScripts are
>  > innocent in this case... or not, since the segfault hits inside the gc,
>  > which might be collecting something completely unrelated to the current
>  > requests.
> Just a wild guess: is the GC guaranteed to be thread safe?
>
>
> Dieter


I'm fairly sure it is; certainly, there's an activity flag which should 
prevent the collector from being reentered.
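A plain activity flag of the kind mentioned here guards against re-entry on the same call stack, for example a finalizer that triggers another collection in the middle of a pass. The sketch below is a hypothetical model (collect, collecting and finalizer_hook are illustrative names, not CPython's API). Note that the flag is not a lock: it only keeps two threads from collecting at once because the interpreter lock already serializes them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical activity flag: nonzero while a collection pass is running. */
static int collecting = 0;
static int passes_run = 0;

/* Optional hook run during a pass, standing in for a finalizer (__del__). */
static void (*finalizer_hook)(void) = NULL;

/* Run one pass; returns 0 if it ran, -1 if refused because a pass is
   already in progress further up the call stack. */
static int collect(void)
{
    if (collecting)
        return -1;
    collecting = 1;
    passes_run++;
    if (finalizer_hook != NULL)
        finalizer_hook();          /* may try to start another collection */
    collecting = 0;
    return 0;
}

/* Example hook that tries to re-enter the collector. */
static int nested_result = 0;
static void reentrant_finalizer(void)
{
    nested_result = collect();
}
```

With finalizer_hook set to reentrant_finalizer, the outer collect() runs one pass and the nested call is refused with -1.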

-- 
Matt Kromer
Zope Corporation  http://www.zope.com/ 







Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Leonardo Rochael Almeida

On Thu, 2002-03-14 at 17:17, Dieter Maurer wrote:
> Leonardo Rochael Almeida writes:
>  > In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his
>  > crashes in pure dtml methods, which could mean that PythonScripts are
>  > innocent in this case... or not, since the segfault hits inside the gc,
>  > which might be collecting something completely unrelated to the current
>  > requests.
> Just a wild guess: is the GC guaranteed to be thread safe?

The gc acquires the big interpreter lock before doing its stuff, which
is not the same thing, since C code could be doing bad stuff. Question:
how does the gc differentiate between an unreachable object and an object
that's reachable only by C code?
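One way to see how a collector like CPython's handles this is the trial-deletion step that subtract_refs (the function instrumented in the gcmodule patch above) belongs to: copy each container's refcount, subtract every reference found by traversing the containers themselves, and treat anything with a count left over as referenced from outside, which includes references held by C code. The sketch below is a simplified model with hypothetical types and names (Node, find_unreachable), not the real gcmodule code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 4

/* Toy container object (hypothetical, not PyObject). */
typedef struct Node {
    long refcnt;                   /* all references, incl. ones held by C code */
    size_t nrefs;                  /* outgoing references visible to "traverse" */
    struct Node *refs[MAX_REFS];
    long gc_refs;                  /* scratch copy used during collection */
    int reachable;
} Node;

static int in_set(const Node *p, Node *const *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == p)
            return 1;
    return 0;
}

/* Depth-first marking of everything reachable from p inside the set. */
static void mark_reachable(Node *p, Node *const *set, size_t n)
{
    if (p == NULL || p->reachable || !in_set(p, set, n))
        return;
    p->reachable = 1;
    for (size_t j = 0; j < p->nrefs; j++)
        mark_reachable(p->refs[j], set, n);
}

/* Returns how many nodes in set are unreachable (collectable). */
static size_t find_unreachable(Node *const *set, size_t n)
{
    /* 1. Start from the real reference counts. */
    for (size_t i = 0; i < n; i++) {
        set[i]->gc_refs = set[i]->refcnt;
        set[i]->reachable = 0;
    }
    /* 2. Subtract references that come from inside the set. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < set[i]->nrefs; j++)
            if (in_set(set[i]->refs[j], set, n))
                set[i]->refs[j]->gc_refs--;
    /* 3. Leftover counts are outside references (globals, frames, C code):
          treat those nodes as roots and mark from them. */
    for (size_t i = 0; i < n; i++)
        if (set[i]->gc_refs > 0)
            mark_reachable(set[i], set, n);
    size_t dead = 0;
    for (size_t i = 0; i < n; i++)
        if (!set[i]->reachable)
            dead++;
    return dead;
}
```

A cycle a <-> b with no outside references is reported as garbage, while c, held by C code, keeps c and everything it points to alive. The flip side: if C code kept a pointer to a without owning a reference, the model would still call a garbage, which is exactly the kind of bug that makes a real collector trip later.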

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.





Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Matthew T. Kromer

Attached is another diagnostic patch which you might apply to Python. 
If you apply this patch, you WILL need to rebuild Zope to include it.

What it will do is complain to stderr if an object is INCREF'd from 
refcount 0.  It also silences the complaint for the one area which I 
know revives dead objects.

This patch will probably cause a crash after an erroneous incref-from-0 
is detected, since it doesn't actually DO the incref in that case.

The intent is to find a case in the code where an object is shared between 
threads: one thread decrefs it to zero, the other thread increfs it, causing a 
revive -- but too late to save the patient.
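The race described here comes from an increment that does not check whether the count has already reached zero. Code written today often avoids it with an atomic "increment unless zero" operation; the sketch below shows that idea with C11 atomics. It is a different technique from Python's plain Py_INCREF, and the names (SharedObj, try_incref, decref) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_long refcnt; } SharedObj;   /* toy stand-in */

/* Take a reference only if the object is still alive (count > 0).
   Returns false instead of reviving an object whose count hit 0. */
static bool try_incref(SharedObj *op)
{
    long old = atomic_load(&op->refcnt);
    while (old > 0) {
        /* On failure, old is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&op->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if this call released the last one. */
static bool decref(SharedObj *op)
{
    return atomic_fetch_sub(&op->refcnt, 1) == 1;
}
```

This only helps while the object's memory is still valid when try_incref runs (for example, because the caller found it through a table protected by a lock); it cannot make a dangling pointer safe.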

-- 
Matt Kromer
Zope Corporation  http://www.zope.com/ 




--- Include/object.h.orig	Thu Mar 14 16:44:36 2002
+++ Include/object.h	Thu Mar 14 16:54:29 2002
@@ -442,7 +442,7 @@
 #define _Py_NewReference(op) ((op)->ob_refcnt = 1)
 #endif
 
-#define Py_INCREF(op) ((op)->ob_refcnt++)
+#define Py_INCREF(op) ((op)->ob_refcnt > 0 ? (op)->ob_refcnt++ : fprintf(stderr,"Eeek! Increfing an object from refct 0 at %s:%d\n",__FILE__,__LINE__))
 #define Py_DECREF(op) \
 	if (--(op)->ob_refcnt != 0) \
 		; \
--- Objects/classobject.c.orig	Thu Mar 14 17:04:40 2002
+++ Objects/classobject.c	Thu Mar 14 17:01:36 2002
@@ -535,7 +535,8 @@
 #endif
 #else /* !Py_TRACE_REFS */
 	/* Py_INCREF boosts _Py_RefTotal if Py_REF_DEBUG is defined */
-	Py_INCREF(inst);
+	/* Py_INCREF(inst); */
+	inst->ob_refcnt++;  /* we dont want to trap this one */
 #endif /* !Py_TRACE_REFS */
 
 	/* Save the current exception, if any. */



Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Matthew T. Kromer

Matthew T. Kromer wrote:

> Attached is another diagnostic patch which you might apply to Python. 
> If you apply this patch, you WILL need to rebuild Zope to include it.
>
> What it will do is complain to stderr if an object is INCREF'd from 
> refcount 0.  It also silences the complaint for the one area which I 
> know revives dead objects.
>
> This patch will probably cause a crash after an erroneous 
> incref-from-0 is detected, since it doesn't actually DO the incref in 
> that case.
>
> The intent is to find a case in the code where an object is held 
> between threads; one thread decrefs to zero, the other thread increfs, 
> causing a revive -- but too late to save the patient.


ExtensionClass also brings back the dead. If you have applied the previous 
patch I sent out (the one that complains when an object is incref'd from a 
refcount of zero), the following patch to Zope's ExtensionClass turns off 
the warning for this case.


-- 
Matt Kromer
Zope Corporation  http://www.zope.com/ 




Index: lib/Components/ExtensionClass/src/ExtensionClass.c
===================================================================
RCS file: /cvs-repository/Zope/lib/Components/ExtensionClass/src/ExtensionClass.c,v
retrieving revision 1.46.36.1
diff -u -r1.46.36.1 ExtensionClass.c
--- lib/Components/ExtensionClass/src/ExtensionClass.c	4 Oct 2001 14:25:19 -	1.46.36.1
+++ lib/Components/ExtensionClass/src/ExtensionClass.c	14 Mar 2002 22:43:10 -
@@ -3047,8 +3047,9 @@
   fprintf(stderr,"Deallocating a %s\n", self->ob_type->tp_name);
 #endif
 
+  self->ob_refcnt++;
   PyErr_Fetch(&t,&v,&tb);
-  Py_INCREF(self);		/* Give us a new lease on life */
+ /* Py_INCREF(self);		/* Give us a new lease on life */
 
   if (subclass_watcher &&
       ! PyObject_CallMethod(subclass_watcher,"destroying","O",self))



Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Leonardo Rochael Almeida

Hi Matt,

I'll wait for the patch where you also silence the dead-raising area in
ExtensionClass.

What if, instead of detecting this situation, we try to detect whether the 
incref is happening without the interpreter lock held? Increfs and
decrefs shouldn't be happening freely and simultaneously even in C code,
right? Is holding the interpreter lock the correct way to signal that
you'll be doing increfs and decrefs in C code?
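The check proposed here can be sketched as an incref that complains when the calling thread does not hold a designated lock. This is a hypothetical model, not a Python API: big_lock, holds_big_lock and CHECKED_INCREF are made-up names, and a real version would have to set the per-thread flag wherever the interpreter lock is actually acquired and released. Like the revised Py_INCREF macro earlier in the thread, it still performs the increment after complaining, so the diagnostic itself does not change behavior.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

typedef struct { long refcnt; } Obj;     /* toy stand-in for PyObject */

/* Hypothetical stand-in for the interpreter lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* Per-thread record of whether this thread currently holds big_lock
   (a mutex cannot portably be asked who owns it). */
static _Thread_local int holds_big_lock = 0;

static void big_lock_acquire(void)
{
    pthread_mutex_lock(&big_lock);
    holds_big_lock = 1;
}

static void big_lock_release(void)
{
    holds_big_lock = 0;
    pthread_mutex_unlock(&big_lock);
}

/* Increment op's refcount; report (and return -1) if the caller does not
   hold big_lock. The increment still happens either way. */
static int checked_incref_at(Obj *op, const char *file, int line)
{
    int ok = holds_big_lock;
    if (!ok)
        fprintf(stderr, "incref without the lock at %s:%d\n", file, line);
    op->refcnt++;
    return ok ? 0 : -1;
}

#define CHECKED_INCREF(op) checked_incref_at((op), __FILE__, __LINE__)
```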

Cheers, Leo

On Thu, 2002-03-14 at 19:10, Matthew T. Kromer wrote:
> [...]
-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.





Re: [Zope-dev] more on the segfault saga

2002-03-14 Thread Anthony Baxter


> Dieter Maurer wrote
> Just a wild guess: is the GC guaranteed to be thread safe?

Yep. 

The GC is _almost_ certainly not the problem here - it's just that the
GC is the poor bunny that has to walk through the objects in memory. So
when something's been mangled, the GC is the thing that falls over and
breaks.

I think I've mentioned it before, but looking at the object _before_
the corrupted one in memory might be a useful thing to try...

Anthony

-- 
Anthony Baxter [EMAIL PROTECTED]   
It's never too late to have a happy childhood.





Re: [Zope-dev] more on the segfault saga

2002-03-13 Thread Matthew T. Kromer


On Tuesday, March 12, 2002, at 05:08 PM, Leonardo Rochael Almeida wrote:

> Matthew, thanks for taking the time to gdb the beast with me. Did you
> come up with any instrumentation I should add to Python or Zope to get
> what it is that Python is trying to release twice? If you want, I can
> arrange a MySQL-less period during production so that we can capture the
> crash in a cleaner environment.

Sorry, I only dug out what I had and made THAT work; it was a 
single-thread profiler.  It will require some modding to turn it into a 
useful debug tool instead, and I've been busy with other things.

> Speaking of instrumentation, since the gremlin seems to be threading
> related (it stops with '-t 1'), it might be useful to serialize certain
> parts of the execution path with semaphores, like the path to the
> restoration or the execution of PythonScripts, the path to the execution
> of SQL queries, etc. This means running with small locks in certain
> sections instead of the big '-t 1' lock (which is not really a lock, but
> you get the picture :-).

Well, if you have the energy to try serializing some of the base 
parts of the code, by all means, go ahead.   I can't even begin to guess 
where the problem is though.





Re: [Zope-dev] more on the segfault saga

2002-03-13 Thread Leonardo Rochael Almeida

On Wed, 2002-03-13 at 09:09, Matthew T. Kromer wrote:
> 
> On Tuesday, March 12, 2002, at 05:08 PM, Leonardo Rochael Almeida wrote:
> 
> > Matthew, thanks for taking the time to gdb the beast with me. Did you
> > come up with any instrumentation I should add to Python or Zope to get
> > what it is that Python is trying to release twice? If you want, I can
> > arrange a MySQL-less period during production so that we can capture the
> > crash in a cleaner environment.
> 
> Sorry, I only dug out what I had and made THAT work; it was a 
> single-thread profiler.  It will require some modding to turn it into a 
> useful debug tool instead, and I've been busy with other things.

What about patching Python to report the freed objects like you
mentioned on IRC? Also, how about turning on some flags in
gc.set_debug()? Do you think we might be able to glean something by
seeing what objects were logged as freed or by storing them in
gc.garbage?

> > Speaking of instrumentation, since the gremlin seems to be threading
> > related (it stops with '-t 1'), it might be useful to serialize certain
> > parts of the execution path with semaphores [...]
> 
> Well, if you have the energy to try serializing some of the base 
> parts of the code, by all means, go ahead. I can't even begin to guess 
> where the problem is though.

Well, I have the energy, I just don't know where to start. But it's
beginning to look like I'll just have to roll up my sleeves and dive into
C code to hunt this beast down. And to think that I'd chosen Python as
my official programming language to avoid just that... :-)

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.





Re: [Zope-dev] more on the segfault saga

2002-03-13 Thread Shane Hathaway

Leonardo Rochael Almeida wrote:
> Well, I have the energy, I just don't know where to start. But it's
> beginning to look like I'll just have to roll up my sleeves and dive into
> C code to hunt this beast down. And to think that I'd chosen Python as
> my official programming language to avoid just that... :-)

I just found out about something that might help.  If you compiled 
against the GNU C library, you can set the environment variable 
MALLOC_CHECK_ to 1 to get malloc usage warnings printed to stderr, or 
set it to 2 to cause an abort() as soon as an error is detected. 
Assuming you're running in production, I'd start with 1 (making sure 
stderr is connected to something), then if any warnings occur but they 
aren't informative enough, switch to 2.

I learned this here:

http://www.gnu.org/manual/glibc-2.2.3/html_node/libc_32.html

Shane





Re: [Zope-dev] more on the segfault saga

2002-03-13 Thread Leonardo Rochael Almeida

I set MALLOC_CHECK_ to 1 and it said it was using the malloc debug
hooks, but it didn't report anything else before the crashes, so there's
no point in setting it to 2...

On Wed, 2002-03-13 at 13:49, Shane Hathaway wrote:
> Leonardo Rochael Almeida wrote:
> > On Wed, 2002-03-13 at 13:04, Shane Hathaway wrote:
> > 
> > > I just found out about something that might help.  If you compiled 
> > > against the GNU C library, you can set the environment variable 
> > > MALLOC_CHECK_ to 1 to get malloc usage warnings printed to stderr, or 
> > > set it to 2 to cause an abort() as soon as an error is detected. 
> > > Assuming you're running in production, I'd start with 1 (making sure 
> > > stderr is connected to something), then if any warnings occur but they 
> > > aren't informative enough, switch to 2.
> > > 
> > > I learned this here:
> > > 
> > > http://www.gnu.org/manual/glibc-2.2.3/html_node/libc_32.html
> > > 
> > 
> > Thanks Shane, I'll try that. But first I need a way to not supply -D and
> > still get stderr redirected. This site uses cookie authentication
> > (exUserFolder), and even though the traceback ends up in a page that is
> > quickly redirected away from, some of our client's customers sometimes
> > spot it, and they usually call complaining about the Zope error they
> > saw immediately before the login page, so we had to disable '-D'.
> 
> -D is actually not related AFAIK.  The C library will output to stderr 
> regardless of whether -D is supplied, which means you need to use 
> standard redirection anyway, for example:
> 
> ./start > /var/local/log/zope_output 2>&1

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.





Re: [Zope-dev] more on the segfault saga

2002-03-13 Thread Leonardo Rochael Almeida

On Wed, 2002-03-13 at 21:30, Matthew T. Kromer wrote:
> 
> On Wednesday, March 13, 2002, at 10:40 AM, Leonardo Rochael Almeida 
> wrote:
> 
> > What about patching Python to report the freed objects like you
> > mentioned on IRC? Also, how about turning on some flags in
> > gc.set_debug()? Do you think we might be able to glean something by
> > seeing what objects were logged as freed or by storing them in
> > gc.garbage?
> 

Setting gc.set_debug(gc.DEBUG_LEAK) floods your stderr in a way you can
only believe by seeing it. And it didn't give me any clue. The last
object freed was an instance method. Most everything running inside Zope
is an instance method of one sort or another...

> Well, what I'm thinking about doing is trying to patch the Py_DECREF 
> macro to record the freed objects in a table and mark the freed memory 
> with a signal value.

Good thing it doesn't involve writing anything to stderr (right?);
if gc.DEBUG_LEAK is a flood, I cannot even begin to imagine the
flood of Py_DECREF messages...

If you can produce a patch, I'm more than willing to apply it.
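The Py_DECREF idea above (record freed objects in a table, mark freed memory with a recognizable value) can be sketched as a small quarantine. This variant goes one step beyond the description: it also delays the real free() until the block's slot is reused, so the poison stays visible and a stale reader sees a refcount made of 0xdb bytes instead of whatever the allocator reused the memory for. The names (debug_release, recently_released, POISON_BYTE) and the slot count are hypothetical, and, as hoped, nothing is written to stderr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define QUARANTINE_SLOTS 64      /* hypothetical size */
#define POISON_BYTE 0xDB         /* hypothetical "signal value" */

/* Ring of recently released blocks that have not really been freed yet. */
static void *quarantine[QUARANTINE_SLOTS];
static size_t quarantine_next = 0;

/* Poison a dying object and park it; the block that previously occupied
   this slot is the one that actually goes back to the allocator. */
static void debug_release(void *p, size_t size)
{
    if (p == NULL)
        return;
    memset(p, POISON_BYTE, size);
    free(quarantine[quarantine_next]);          /* free(NULL) is a no-op */
    quarantine[quarantine_next] = p;
    quarantine_next = (quarantine_next + 1) % QUARANTINE_SLOTS;
}

/* Nonzero if p was released recently and is still parked. */
static int recently_released(const void *p)
{
    for (size_t i = 0; i < QUARANTINE_SLOTS; i++)
        if (quarantine[i] == p)
            return 1;
    return 0;
}

/* Nonzero if the first n bytes at p still carry the poison pattern. */
static int looks_poisoned(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != POISON_BYTE)
            return 0;
    return 1;
}
```

A crash handler or a debugger can then ask recently_released() about the object the GC tripped over. The cost is that up to QUARANTINE_SLOTS blocks stay allocated at any time.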

> I'm worried about the python script aspects.  It's frustrating, because 
> I am not aware of anything in pythonscripts that should be 
> thread-dependent.

The way the bytecode versions of said PythonScripts are kept in memory,
perhaps? Although that's not likely, since each thread keeps its own
version of that, even when recompilation is needed, right?

In any event, Martijn Jacobs (a.k.a. instability case #3 :-) sees his
crashes in pure dtml methods, which could mean that PythonScripts are
innocent in this case... or not, since the segfault hits inside the gc,
which might be collecting something completely unrelated to the current
requests.

Questions:

If I call gc.disable() but run gc.collect() from time to time, I get the
same effect, right?

In this case, where in the code would I put a call to gc.collect() to
get it to happen after the second phase of the two-phase commit?

Another approach: would it be possible to, from time to time, put Zope in
a state where it enqueues new connections instead of servicing them while
waiting for the currently running requests to finish, then run
gc.collect(), then start servicing requests again?

Cheers, Leo





[Zope-dev] more on the segfault saga

2002-03-12 Thread Leonardo Rochael Almeida

<narrator voice="kosh">And so it continues...</narrator>

I've finally recompiled all pythonScripts (all scripts and ZCatalog
tricks I tried before didn't know how to get the PythonScripts inside
the ZClasses. BTW, if anyone is interested, I can send you the scripts I
used to recompile all pythonScripts inside ZClasses).

But it still segfaults (and it doesn't seem to be any more stable, so
there Anthony :-). At least I got a cleaner stupid_log_file, no more
needs recompilation messages :-)

Matthew, thanks for taking the time to gdb the beast with me. Did you
come up with any instrumentation I should add to Python or Zope to get
what it is that Python is trying to release twice? If you want, I can
arrange a MySQL-less period during production so that we can capture the
crash in a cleaner environment.

Speaking of instrumentation, since the gremlin seems to be threading
related (it stops with '-t 1'), it might be useful to serialize certain
parts of the execution path with semaphores, like the path to the
restoration or the execution of PythonScripts, the path to the execution
of SQL queries, etc. This means running with small locks in certain
sections instead of the big '-t 1' lock (which is not really a lock, but
you get the picture :-).

Cheers, Leo

-- 
Ideas don't stay in some minds very long because they don't like
solitary confinement.

