Hi,
 
  I've been trying to track down the source of the segmentation faults and bus errors I'm getting when running samples with the current CVS OpenDX.  Some things are slowly starting to make sense, so to speed things up I thought I'd toss out a few things I've found (and also a few things I don't quite get).  I have some time available to try to sort through this, but I'll need some guidance.
 
The first thing is just an observation:  If you want to find memory access problems in a program, just run it under IRIX!  I work with a lot of researchers who run code on both an IBM SP and an Origin 2000.  I've seen many codes that run apparently fine on the SP but that die with segmentation faults on the Origin. 
 
One thing I've been puzzling over, but can't tell if it's important or not, is a difference in the behavior of shmctl on IRIX and other systems (I've tried AIX and Linux).  When called with IPC_RMID, like on line 398 of src/exec/libdx/mem.c, AIX and Linux both mark the shared memory segment for deletion, but don't actually delete it right away.  IRIX actually removes the segment immediately.  My understanding of shared memory is dim at best, so I don't know if this difference in behavior is important or not.
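
For anyone who wants to compare platforms, here is a minimal sketch of a probe for the IPC_RMID difference described above.  It is not taken from mem.c; the names probe_rmid and struct rmid_probe are made up for illustration.  It creates a private segment, attaches to it, calls shmctl with IPC_RMID while still attached, and then records whether the id can still be queried with IPC_STAT and whether a second shmat succeeds.  I'm assuming those two checks are enough to separate "marked for deletion" from "removed right away"; they may not capture everything that matters to DX.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical result record: what still works after IPC_RMID. */
struct rmid_probe {
    int stat_ok_after_rmid;    /* 1 if IPC_STAT on the id still succeeds */
    int attach_ok_after_rmid;  /* 1 if a second shmat on the id succeeds */
};

/* Returns 0 if the probe ran, -1 if the segment could not be set up. */
int probe_rmid(struct rmid_probe *out)
{
    struct shmid_ds ds;
    void *first;
    void *second;
    int id;

    /* Create a small private segment and attach to it once. */
    id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    first = shmat(id, NULL, 0);
    if (first == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    /* Ask for removal while the first attachment is still live. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(first);
        return -1;
    }

    /* Is the id still known to the system? */
    out->stat_ok_after_rmid = (shmctl(id, IPC_STAT, &ds) == 0);

    /* Can the segment still be attached by id? */
    second = shmat(id, NULL, 0);
    out->attach_ok_after_rmid = (second != (void *)-1);
    if (second != (void *)-1)
        shmdt(second);

    shmdt(first);
    return 0;
}
```

Calling probe_rmid from a small main and printing the two flags on IRIX, AIX and Linux should show whether the systems really disagree, and if so, whether any DX code path relies on attaching to (or querying) a segment after it has been marked for removal.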
 
I've managed to trace all of the segmentation faults and bus errors I'm getting to routines that work with the dictionary routines in src/exec/dpexec/d.c.  I get a segmentation fault in strcmp on line 299, in the ExDictionarySearchE routine, because e->key is null.  I can reproduce this at will by running the AutoColor.net sample program, switching to rotation mode and then trying to rotate the water molecule.  Does this happen with anyone else running on an IRIX system?  It happens on an o32-built DX running on an Indigo2 and an n32-built DX running on an Octane.  I can also make this happen by opening AutoColor.net and changing the rendering to hardware.
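
As a debugging aid, a guard along the following lines could report the bad entry instead of crashing inside strcmp.  This is only a sketch: struct entry and find_entry are hypothetical stand-ins, not the actual structures or code in d.c, and I'm assuming a simple linked chain of entries with a string key.  A guard like this would hide the symptom rather than fix the cause, but logging the first null key might show how early the dictionary gets corrupted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a dictionary entry; not the real d.c layout. */
struct entry {
    const char *key;
    struct entry *next;
};

/*
 * Walk the chain looking for key.  Entries whose key is NULL are
 * reported on stderr, counted in *null_keys (if non-NULL), and skipped,
 * so strcmp is never called with a null pointer.
 */
const struct entry *find_entry(const struct entry *head, const char *key,
                               int *null_keys)
{
    const struct entry *e;

    for (e = head; e != NULL; e = e->next) {
        if (e->key == NULL) {
            if (null_keys != NULL)
                (*null_keys)++;
            fprintf(stderr, "dictionary entry %p has a null key\n",
                    (const void *)e);
            continue;
        }
        if (strcmp(e->key, key) == 0)
            return e;
    }
    return NULL;  /* not found */
}
```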
 
A second error I can reproduce also uses the AutoColor.net program.  If I open it, then immediately switch to another sample program I get a bus error in dxf_ExDictionaryPurge (in src/exec/dpexec/d.c, line 260).
 
Just to make it clear -- I get these kinds of errors with programs other than AutoColor as well.  (If it were just AutoColor, I wouldn't worry.)  AutoColor.net is simply the quickest and most reliable way to reproduce them.
 
It looks like all of the problems I'm seeing are somehow related to the dictionary routines.  I've tried working through them to track down what's happening, but it's beyond me.  I'm guessing that the problems are actually occurring before the segmentation violation happens, but that it's taking a while before the effects are seen.  I haven't seen many reports of problems like this, probably because most people are running OpenDX on something other than IRIX machines.
 
I don't think I've pinpointed this enough to report it as a bug yet, but I'll be happy to enter it if Those Who Control The Buglist think it should go there.
 
*********************************************************************
Alan M. Ferrenberg ([EMAIL PROTECTED])
Manager, Research and Computational Science Support
UGA Computing and Networking Services
 
*********************************************************************
 
 
