Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-13 Thread Joseph Wayne Norton


Matt -

Ok, I installed everything and the system is running fine (or no
worse).  However, we have still had one restart so far.  I have included
the debug information below.

This looks similar to the problem report on SourceForge:

 http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471942&group_id=5470

I posted a comment to see if they have any updates.

One question ... does anyone ever malloc a plain ClassExtension
object?  It seems that every CE-based object has its own struct
typedef.  If so, then I think yesterday's patch probably won't do any
harm, but won't help either.

The currently running process is being monitored by truss, so I will be
able to get at least one more core dump (if we get one).  I won't be
able to get any more information until tomorrow.

Any other ideas?  Thanks for your help.

- joe .

(gdb) info threads
  17 Thread 10  0xef5b9810 in _lwp_sema_wait ()
  16 Thread 9  0xef647cac in _swtch ()
  15 Thread 8  0xef5b9810 in _lwp_sema_wait ()
  14 Thread 7 (LWP 5)  0xcaeb50 in ?? ()
  13 Thread 6  0xef647cac in _swtch ()
  12 Thread 5  0xef5b9810 in _lwp_sema_wait ()
  11 Thread 4  0xef647cac in _swtch ()
  10 Thread 3  0xef647cac in _swtch ()
  9 Thread 2 (LWP 2)  0xef5b9958 in _signotifywait ()
  8 Thread 1 (LWP 6)  0xef5b7488 in _poll ()
  7 LWP8  0xef5b6a24 in door_restart ()
  6 LWP6  0xef5b7488 in _poll ()
  5 LWP5  0xcaeb50 in ?? ()
  4 LWP4  0xef5b9810 in _lwp_sema_wait ()
  3 LWP3  0xef5b9810 in _lwp_sema_wait ()
  2 LWP2  0xef5b9958 in _signotifywait ()
* 1 LWP1  0xef5b9810 in _lwp_sema_wait ()

(gdb) thread 14
[Switching to Thread 7 (LWP 5)]
#0  0xcaeb50 in ?? ()

(gdb) where
#0  0xcaeb50 in ?? ()
#1  0x516bc in collect (young=0x13dec8, old=0x13ded4)
at ./Modules/gcmodule.c:379
#2  0x51984 in collect_generations () at ./Modules/gcmodule.c:484
#3  0x519fc in _PyGC_Insert (op=0xecf7d4) at ./Modules/gcmodule.c:507
#4  0x664ec in PyMethod_New (func=0x3f796c, self=0x11c0d44, class=0x3c7e5c)
at Objects/classobject.c:1834
#5  0x63850 in instance_getattr2 (inst=0x11c0d44, name=0x3d5378)
at Objects/classobject.c:642
#6  0x63750 in instance_getattr1 (inst=0x11c0d44, name=0x3d5378)
at Objects/classobject.c:608
#7  0x63898 in instance_getattr (inst=0x11c0d44, name=0x3d5378)
at Objects/classobject.c:656
#8  0x78330 in PyObject_GetAttr (v=0x11c0d44, name=0x3d5378)
at Objects/object.c:1052
#9  0x895ec in builtin_hasattr (self=0x0, args=0x12ed944)
at Python/bltinmodule.c:886
#10 0x35a44 in call_cfunction (func=0x1609b0, arg=0x12ed944, kw=0x0)
at Python/ceval.c:2854
#11 0x33c5c in eval_code2 (co=0x3cbf80, globals=0x1, locals=0x0, args=0x2,
argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:1948

and so on 

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://lists.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope )



Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-12 Thread Joseph Wayne Norton


Matt -

Well, your patch seems fine in our testing environment.
Unfortunately, we do not see any restarts in the testing environment
... always in production.  I had to rebuild our entire software base
because we are using other products that use extension classes and
are not included in the main zope installation.  It caused a bus
error the first time (when running only wo_pcgi.py).

As I mentioned in my prior e-mail, I modified the patch slightly to
exactly match the struct in Python's object.h.  Please review this
patch.  I will apply the patch in production tomorrow morning, 12/13,
(Japan Standard Time or GMT+9) and monitor the system.  If zope does
not restart during the day, then I think you have fixed the problem.

I'm using Zope 2.4.3 and Python 2.1.1 with pymalloc disabled on the
solaris platform.

thanks and regards,

- joe n.

p.s. I looked **briefly** at the Zope 2.5 source and this patch will
not be compatible since there doesn't seem to be a standard among the
different extension classes on whether to include or not include the
COUNT_ALLOCS define.  The cAccessControl class seems to be the
exception.



*** ExtensionClass.h.bak	Fri Nov 16 10:37:11 2001
--- ExtensionClass.h	Wed Dec 12 15:10:03 2001
***************
*** 136,154 ****
  	PySequenceMethods *tp_as_sequence;
  	PyMappingMethods *tp_as_mapping;
  
! 	/* More standard operations (at end for binary compatibility) */
  
  	hashfunc tp_hash;
  	ternaryfunc tp_call;
  	reprfunc tp_str;
  	getattrofunc tp_getattro;
  	setattrofunc tp_setattro;
! 	/* Space for future expansion */
! 	long tp_xxx3;
! 	long tp_xxx4;
  
  	char *tp_doc; /* Documentation string */
  
  #ifdef COUNT_ALLOCS
  	/* these must be last */
  	int tp_alloc;
--- 136,169 ----
  	PySequenceMethods *tp_as_sequence;
  	PyMappingMethods *tp_as_mapping;
  
! 	/* More standard operations (here for binary compatibility) */
  
  	hashfunc tp_hash;
  	ternaryfunc tp_call;
  	reprfunc tp_str;
  	getattrofunc tp_getattro;
  	setattrofunc tp_setattro;
! 
! 	/* Functions to access object as input/output buffer */
! 	PyBufferProcs *tp_as_buffer;
! 
! 	/* Flags to define presence of optional/expanded features */
! 	long tp_flags;
  
  	char *tp_doc; /* Documentation string */
  
+ 	/* call function for all accessible objects */
+ 	traverseproc tp_traverse;
+ 
+ 	/* delete references to contained objects */
+ 	inquiry tp_clear;
+ 
+ 	/* rich comparisons */
+ 	richcmpfunc tp_richcompare;
+ 
+ 	/* weak reference enabler */
+ 	long tp_weaklistoffset;
+ 
  #ifdef COUNT_ALLOCS
  	/* these must be last */
  	int tp_alloc;
***************
*** 302,308 ****
  	 { PyExtensionClassCAPI->Export(D,N,T); }
  
  /* Convert a method list to a method chain.  */
! #define METHOD_CHAIN(DEF) { DEF, NULL }
  
  /* The following macro checks whether a type is an extension class: */
  #define PyExtensionClass_Check(TYPE) \
--- 317,330 ----
  	 { PyExtensionClassCAPI->Export(D,N,T); }
  
  /* Convert a method list to a method chain.  */
! /* MTK -- make it pad the type structure out -- presumes only use is in
! ** type structure initialization
! */
! #ifdef COUNT_ALLOCS
! #define METHOD_CHAIN(DEF) 0,0,0,0,0,0,0,0,{ DEF, NULL }
! #else
! #define METHOD_CHAIN(DEF) 0,0,0,0,{ DEF, NULL }
! #endif
  
  /* The following macro checks whether a type is an extension class: */
  #define PyExtensionClass_Check(TYPE) \
***************
*** 336,342 ****
  #define PURE_MIXIN_CLASS(NAME,DOC,METHODS) \
  static PyExtensionClass NAME ## Type = { PyObject_HEAD_INIT(NULL) \
  	0, # NAME, sizeof(PyPureMixinObject), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
! 	0, 0, 0, 0, 0, 0, 0, DOC, {METHODS, NULL}, \
  	EXTENSIONCLASS_BASICNEW_FLAG}
  
  /* The following macros provide limited access to extension-class
--- 358,364 ----
  #define PURE_MIXIN_CLASS(NAME,DOC,METHODS) \
  static PyExtensionClass NAME ## Type = { PyObject_HEAD_INIT(NULL) \
  	0, # NAME, sizeof(PyPureMixinObject), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
! 	0, 0, 0, 0, 0, 0, 0, DOC, METHOD_CHAIN(METHODS), \
  	EXTENSIONCLASS_BASICNEW_FLAG}
  
  /* The following macros provide limited access to extension-class




Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-12 Thread Matthew T. Kromer

Joseph Wayne Norton wrote:

> Matt -
>
> Well, your patch seems fine in our testing environment.
> Unfortunately, we do not see any restarts in the testing environment
> ... always in production.  I had to rebuild our entire software base
> because we are using other products that use extension classes and
> are not included in the main zope installation.  It caused a bus
> error the first time (when running only wo_pcgi.py).
>
> As I mentioned in my prior e-mail, I modified the patch slightly to
> exactly match the struct in Python's object.h.  Please review this
> patch.  I will apply the patch in production tomorrow morning, 12/13,
> (Japan Standard Time or GMT+9) and monitor the system.  If zope does
> not restart during the day, then I think you have fixed the problem.
>
> I'm using Zope 2.4.3 and Python 2.1.1 with pymalloc disabled on the
> solaris platform.
>
> thanks and regards,
>
> - joe n.
>
> p.s. I looked **briefly** at the Zope 2.5 source and this patch will
> not be compatible since there doesn't seem to be a standard among the
> different extension classes on whether to include or not include the
> COUNT_ALLOCS define.  The cAccessControl class seems to be the
> exception.


My fingers and toes are crossed for you ;)

I've actually built 2.5 with the modified ExtensionClass.h and it seems
to build OK, and it runs and passes all of its unit tests.  That's not
proof one way or another, but...

Sorry our turnaround times are so laggy; that's the downside of
diagnosing a problem on the other side of the globe.









Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-12 Thread Florent Guillaume

> (gdb) print *((PyObject *) gc)->ob_type
> $1 = {ob_refcnt = 18213696, ob_type = 0x2d70b0, ob_size = 0,
>   tp_name = 0x1 "T", tp_basicsize = 1328272, tp_itemsize = 4156348,
>   tp_dealloc = 0x125865c, tp_print = 0x3c1b04, tp_getattr = 0, tp_setattr = 0,
>   tp_compare = 0x29, tp_repr = 0x3adeb0, tp_as_number = 0xf66198,
>   tp_as_sequence = 0xdf3fa0, tp_as_mapping = 0x0, tp_hash = 0x1,
>   tp_call = 0x144490 <PyMethod_Type>, tp_str = 0x3f0a1c,
>   tp_getattro = 0x125865c, tp_setattro = 0x3c1b04, tp_as_buffer = 0x0,
>   tp_flags = 158561192, tp_doc = 0x29 "", tp_traverse = 0x4c4f4144,
>   tp_clear = 0xd908c0, tp_richcompare = 0x1151300, tp_weaklistoffset = 0}
[...]
> (gdb) x 0x4c4f4144
> 0x4c4f4144: Cannot access memory at address 0x4c4f4144.


0x4c4f4144 is big-endian ASCII for "LOAD". Things were corrupted
before...


Florent
-- 
Florent Guillaume, Nuxeo (Paris, France)
+33 1 40 33 79 10  http://nuxeo.com  mailto:[EMAIL PROTECTED]




Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-12 Thread Matthew T. Kromer

Florent Guillaume wrote:

>> (gdb) print *((PyObject *) gc)->ob_type
>> $1 = {ob_refcnt = 18213696, ob_type = 0x2d70b0, ob_size = 0,
>>   tp_name = 0x1 "T", tp_basicsize = 1328272, tp_itemsize = 4156348,
>>   tp_dealloc = 0x125865c, tp_print = 0x3c1b04, tp_getattr = 0, tp_setattr = 0,
>>   tp_compare = 0x29, tp_repr = 0x3adeb0, tp_as_number = 0xf66198,
>>   tp_as_sequence = 0xdf3fa0, tp_as_mapping = 0x0, tp_hash = 0x1,
>>   tp_call = 0x144490 <PyMethod_Type>, tp_str = 0x3f0a1c,
>>   tp_getattro = 0x125865c, tp_setattro = 0x3c1b04, tp_as_buffer = 0x0,
>>   tp_flags = 158561192, tp_doc = 0x29 "", tp_traverse = 0x4c4f4144,
>>   tp_clear = 0xd908c0, tp_richcompare = 0x1151300, tp_weaklistoffset = 0}
>
> [...]
>
>> (gdb) x 0x4c4f4144
>> 0x4c4f4144: Cannot access memory at address 0x4c4f4144.
>
> 0x4c4f4144 is big-endian ASCII for "LOAD". Things were corrupted
> before...
>
> Florent


Yes, the whole block is bad, so it probably isn't really a Python type
object.  The refcount is a bit high, the name is really low (0x01!), the
basicsize and itemsize are extremely large, the compare function is too
low, the hash function is too low -- i.e. it isn't a type object.

So, I may have been telling him to get the wrong thing; the source code 
that he faulted in reads:

/* Subtract internal references from gc_refs */
static void
subtract_refs(PyGC_Head *containers)
{
    traverseproc traverse;
    PyGC_Head *gc = containers->gc_next;
    for (; gc != containers; gc = gc->gc_next) {
        /* The next line is the line that was active at the time of his fault */
        traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
        (void) traverse(PyObject_FROM_GC(gc),
                        (visitproc)visit_decref,
                        NULL);
    }
}

And PyObject_FROM_GC(gc) is either (gc) or
((PyObject *)(((PyGC_Head *)gc)+1)), depending on whether WITH_CYCLE_GC
is undefined or defined, respectively.  I took the easy route and asked
Joe to assume that the former was true.  If the latter is true, then the
type object is shifted upwards in memory by three words; the new first
three fields are gc_next, gc_prev, and
gc_refs.

That means every value in the type header is off by three fields
(assuming no alignment padding), so the real type object would be:

gc_next = 0x115eb40
gc_prev = 0x2d70b0
gc_refs = 0
ob_refcnt = 0x1
ob_type = 0x144490 (which we actually know is PyMethod_Type -- yay)
ob_size = 0x3f6bbc (which is too large for my comfort)
tp_name = 0x125865c (valid pointer but we don't know what it is)
tp_basicsize=0x3c1b04 (seems high again, but is 0x350b8 less than ob_size)
tp_itemsize = 0
tp_dealloc = 0
tp_print = 0x29 (boo!)
tp_getattr = 0x3adeb0
tp_setattr = 0xf66198
tp_compare = 0xdf3fa0
tp_repr = 0
tp_as_number = 1 (boo!)
tp_as_sequence = 0x144490 PyMethod_Type (boo!)

etc...

Even shifting THESE values by 1 (assuming the compiler takes PyGC_Head
which is three words and pads it up to 4 words for alignment) puts 
garbage values like 0x29 in tp_dealloc.

Ergo, I'm pretty confident that the gc pointer itself is bad.

If I were just a *wee* bit more familiar with how Solaris loads
segments, I'd be able to glean some more information from the addresses
(i.e. are they code or data segment pointers).  Normally I like seeing
OSes use the high nybble or byte of an address as a segment number to
make that sort of diagnosis easier.

It actually looks like page zero is MAPPED on Solaris (I didn't think it
was), which in my book is a baaad thing since it means a null pointer CAN
be dereferenced.











Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-11 Thread Matthew T. Kromer

Hi Joe,

The problem you're seeing is that the fault is happening on a different
thread than the one that received the signal.  That truss syntax is
interesting, though (I have an old SPARC around to test on, but it's
painfully slow).  I'm wondering whether you first need to do an
'info threads' in gdb and then a 'thread N' to switch to the thread
that actually crashed before getting the backtrace.


- Original Message -
From: Joseph Wayne Norton [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, December 11, 2001 2:20 AM
Subject: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on
solaris



> Hello.
>
> We are facing zope restarts on the solaris 5.6 platform with zope
> 2.4.3 and python 2.1.1.  I put together a script based on some
> information from an old posting to the apache mailing list.  The
> following shell/perl script allows one to get a core file from a dying
> zope child process and also allows zope to restart without any side
> effects.
>
>
> The script:
>
> #!/bin/sh
> PATH=$PATH:/usr/local/bin
> export PATH
> cd /tmp
> for PID in `ps -u zfs -f -o pid,comm,args | fgrep z2.py | awk '{print $1}'`
> do
> export PID
> truss -f -l -t\!all -S SIGSEGV,SIGILL -p $PID 2>&1 \
> | perl -pe 'system("gcore $ENV{PID} && sleep 5 && kill -9 $ENV{PID}"), exit($ENV{PID}) if /(SIGSEGV|SIGILL)/;' &
> done
>
>
> Step 1: modify script to match your environment.
>
> Step 2: execute script
>
> Step 3: wait for core file to be dumped in /tmp.
>
> Step 4: analyze with gdb, where $PID is the pid of the dumped process:
>
> #bash gdb /path/to/bin/python /tmp/core.$PID
>
> #0  0xef5b9810 in _lwp_sema_wait ()
> (gdb) where
> #0  0xef5b9810 in _lwp_sema_wait ()
> #1  0xef647ea0 in _park ()
> #2  0xef647b84 in _swtch ()
> #3  0xef6468a4 in cond_wait ()
> #4  0xef6467c8 in _ti_pthread_cond_wait ()
> #5  0x50220 in PyThread_acquire_lock (lock=0xd9d878, waitflag=1)
> at Python/thread_pthread.h:313
> #6  0x51f18 in lock_PyThread_acquire_lock (self=0xda39b8, args=0x0)
> at ./Modules/threadmodule.c:67
> #7  0x35db4 in fast_cfunction (func=0xda39b8, pp_stack=0xed40f828, na=0)
> at Python/ceval.c:2994
> #8  0x33ca0 in eval_code2 (co=0x267848, globals=0x51ec4, locals=0x0, args=0x0,
> argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
> at Python/ceval.c:1951
>
> :
> :
>
>
> It seems that we are facing trouble due to the thread library on
> solaris (unless the truss command has introduced a side-effect).
>
> Anyone else facing similar troubles?  Or maybe I should post
> this to a python mailing list.
>
> - joe






Re: [Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-11 Thread Joseph Wayne Norton


At Tue, 11 Dec 2001 10:42:46 -0500,
Matthew T. Kromer wrote:
 
>> #0  0xef5b9810 in _lwp_sema_wait ()
>> (gdb) info threads
>>   19 Thread 10  0xef5b9810 in _lwp_sema_wait ()
>>   18 Thread 9  0xef5b9810 in _lwp_sema_wait ()
>>   17 Thread 8  0xef5b9810 in _lwp_sema_wait ()
>>   16 Thread 7 (LWP 8)  subtract_refs (containers=0x13dec8) at ./Modules/gcmodule.c:166
>
> Aha!  See?
 

Matthew -

I performed the operations that you recommended and here are the
results (see below). The problem seems to be with the value of the
tp_traverse field.

I am not aware of any "T" type python object.  I'm wondering if this
is an extension class type (just a guess).  I searched through all of
the *.c files in zope, etc., but I was not able to find any type named
"T".

I also ran across a bug posting at SourceForge ...

  http://sourceforge.net/tracker/?func=detail&atid=105470&aid=471942&group_id=5470

This bug report looks very similar.

- j


#0  0xef5b9810 in _lwp_sema_wait ()
(gdb) info threads
  19 Thread 10  0xef5b9810 in _lwp_sema_wait ()
  18 Thread 9  0xef5b9810 in _lwp_sema_wait ()
  17 Thread 8  0xef5b9810 in _lwp_sema_wait ()
  16 Thread 7 (LWP 8)  subtract_refs (containers=0x13dec8) at ./Modules/gcmodule.c:166
  15 Thread 6  0xef647cac in _swtch ()
  14 Thread 5  0xef5b9810 in _lwp_sema_wait ()
  13 Thread 4 (LWP 0)  0xef647b7c in _swtch ()
  12 Thread 3  0xef647cac in _swtch ()
  11 Thread 2 (LWP 2)  0xef5b9958 in _signotifywait ()
  10 Thread 1 (LWP 6)  0xef5b7488 in _poll ()
  9 LWP9  0xef5b6a24 in door_restart ()
  8 LWP8  subtract_refs (containers=0x13dec8) at ./Modules/gcmodule.c:166
  7 LWP7  0xef5b9810 in _lwp_sema_wait ()
  6 LWP6  0xef5b7488 in _poll ()
  5 LWP5  0xef5b9814 in _lwp_sema_wait ()
  4 LWP4  0xef5b9810 in _lwp_sema_wait ()
  3 LWP3  0xef5b9810 in _lwp_sema_wait ()
  2 LWP2  0xef5b9958 in _signotifywait ()
* 1 LWP1  0xef5b9810 in _lwp_sema_wait ()

(gdb) thread 16
[Switching to Thread 7 (LWP 8)]
#0  subtract_refs (containers=0x13dec8) at ./Modules/gcmodule.c:166
./Modules/gcmodule.c:166: No such file or directory.

(gdb) print *((PyObject *) gc)->ob_type
$1 = {ob_refcnt = 18213696, ob_type = 0x2d70b0, ob_size = 0,
  tp_name = 0x1 "T", tp_basicsize = 1328272, tp_itemsize = 4156348,
  tp_dealloc = 0x125865c, tp_print = 0x3c1b04, tp_getattr = 0, tp_setattr = 0,
  tp_compare = 0x29, tp_repr = 0x3adeb0, tp_as_number = 0xf66198,
  tp_as_sequence = 0xdf3fa0, tp_as_mapping = 0x0, tp_hash = 0x1,
  tp_call = 0x144490 <PyMethod_Type>, tp_str = 0x3f0a1c,
  tp_getattro = 0x125865c, tp_setattro = 0x3c1b04, tp_as_buffer = 0x0,
  tp_flags = 158561192, tp_doc = 0x29 "", tp_traverse = 0x4c4f4144,
  tp_clear = 0xd908c0, tp_richcompare = 0x1151300, tp_weaklistoffset = 0}

(gdb) print *((PyObject *) 0x2d70b0)->ob_type
$2 = {ob_refcnt = 2977968, ob_type = 0xff5b80, ob_size = 0, tp_name = 0x1 "T",
  tp_basicsize = 1328272, tp_itemsize = 4155228, tp_dealloc = 0x125865c,
  tp_print = 0x3c1b04, tp_getattr = 0, tp_setattr = 0, tp_compare = 0x29,
  tp_repr = 0, tp_as_number = 0x1212b48, tp_as_sequence = 0xbf8d30,
  tp_as_mapping = 0x, tp_hash = 0x1,
  tp_call = 0x144490 <PyMethod_Type>, tp_str = 0x4ab2cc,
  tp_getattro = 0x1089d5c, tp_setattro = 0x4ab30c, tp_as_buffer = 0x0,
  tp_flags = 0, tp_doc = 0x29 "", tp_traverse = 0, tp_clear = 0x122d140,
  tp_richcompare = 0x11ccd70, tp_weaklistoffset = -1}

(gdb) x 0x4c4f4144
0x4c4f4144: Cannot access memory at address 0x4c4f4144.




[Zope-dev] recipe for trapping SIGSEGV and SIGILL signals on solaris

2001-12-10 Thread Joseph Wayne Norton


Hello.

We are facing zope restarts on the solaris 5.6 platform with zope
2.4.3 and python 2.1.1.  I put together a script based on some
information from an old posting to the apache mailing list.  The
following shell/perl script allows one to get a core file from a dying
zope child process and also allows zope to restart without any side
effects.


The script:

#!/bin/sh
PATH=$PATH:/usr/local/bin
export PATH
cd /tmp
for PID in `ps -u zfs -f -o pid,comm,args | fgrep z2.py | awk '{print $1}'`
do
export PID
truss -f -l -t\!all -S SIGSEGV,SIGILL -p $PID 2>&1 \
| perl -pe 'system("gcore $ENV{PID} && sleep 5 && kill -9 $ENV{PID}"), exit($ENV{PID}) if /(SIGSEGV|SIGILL)/;' &
done


Step 1:  modify script to match your environment.

Step 2: execute script

Step 3: wait for core file to be dumped in /tmp.

Step 4: analyze with gdb, where $PID is the pid of the dumped process:

#bash gdb /path/to/bin/python /tmp/core.$PID 

#0  0xef5b9810 in _lwp_sema_wait ()
(gdb) where
#0  0xef5b9810 in _lwp_sema_wait ()
#1  0xef647ea0 in _park ()
#2  0xef647b84 in _swtch ()
#3  0xef6468a4 in cond_wait ()
#4  0xef6467c8 in _ti_pthread_cond_wait ()
#5  0x50220 in PyThread_acquire_lock (lock=0xd9d878, waitflag=1)
at Python/thread_pthread.h:313
#6  0x51f18 in lock_PyThread_acquire_lock (self=0xda39b8, args=0x0)
at ./Modules/threadmodule.c:67
#7  0x35db4 in fast_cfunction (func=0xda39b8, pp_stack=0xed40f828, na=0)
at Python/ceval.c:2994
#8  0x33ca0 in eval_code2 (co=0x267848, globals=0x51ec4, locals=0x0, args=0x0,
argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at Python/ceval.c:1951

:
:


It seems that we are facing trouble due to the thread library on
solaris (unless the truss command has introduced a side-effect).

Anyone else facing similar troubles?  Or maybe I should post
this to a python mailing list.

- joe


