Hi,

I've been trying to track down the source of the segmentation faults and bus errors I'm getting when running samples with the current CVS OpenDX. Some things are slowly starting to make sense, so to speed things up I thought I'd toss out a few things I've found (and a few things I don't quite get). I actually have some time available to try to sort through this, but I'll need some guidance.

The first thing is just an observation: if you want to find memory-access problems in a program, just run it under IRIX! I work with a lot of researchers who run code on both an IBM SP and an Origin 2000, and I've seen many codes that appear to run fine on the SP but die with segmentation faults on the Origin.

One thing I've been puzzling over, though I can't tell whether it's important, is a difference in the behavior of shmctl on IRIX and other systems (I've tried AIX and Linux). When it is called with IPC_RMID, as on line 398 of src/exec/libdx/mem.c, AIX and Linux both mark the shared memory segment for deletion but don't actually delete it right away, whereas IRIX removes the segment immediately. My understanding of shared memory is dim at best, so I don't know whether this difference matters.
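
For anyone who wants to check the shmctl difference directly, here is a minimal standalone C sketch. It is not the code from mem.c, and the helper names create_and_mark_removed and still_usable are hypothetical. It creates a private segment, attaches it, calls shmctl with IPC_RMID while the segment is still attached, and then writes and reads back through the mapping. On AIX and Linux, where destruction is deferred until the last detach, the round trip should succeed; if IRIX really destroys the segment at IPC_RMID time, this is where a fault or a mismatch would be expected to show up.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a private segment, attach it, then mark it
 * for removal with IPC_RMID while it is still attached.
 * Returns the attached address, or NULL on failure. */
static void *create_and_mark_removed(size_t size, int *id_out)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return NULL;

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);   /* don't leak the segment */
        return NULL;
    }

    /* The identifier goes away here; on AIX and Linux the memory itself
     * stays valid until the last process detaches with shmdt(). */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        shmdt(addr);
        return NULL;
    }
    *id_out = id;
    return addr;
}

/* Write a pattern through the still-attached mapping and read it back.
 * Returns 1 if the data round-trips, 0 otherwise. */
static int still_usable(void *addr, size_t size)
{
    assert(addr != NULL && size > 0);
    memset(addr, 0xAB, size);
    const unsigned char *p = addr;
    for (size_t i = 0; i < size; i++)
        if (p[i] != 0xAB)
            return 0;
    return 1;
}
```

Typical use is create_and_mark_removed(4096, &id), then still_usable(addr, 4096), then shmdt(addr). The segment is created with IPC_PRIVATE so it can't collide with anything DX itself allocates.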

I've managed to trace all of the segmentation faults and bus errors I'm getting to code that uses the dictionary routines in src/exec/dpexec/d.c. I get a segmentation fault in strcmp on line 299, in the ExDictionarySearchE routine, because e->key is null. I can reproduce this at will by running the AutoColor.net sample program, switching to rotation mode, and then trying to rotate the water molecule. Does this happen to anyone else running on an IRIX system? It happens with an o32-built DX on an Indigo2 and with an n32-built DX on an Octane. I can also trigger it by opening AutoColor.net and switching the rendering to hardware.
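
Because strcmp has undefined behavior when either argument is a null pointer, a null e->key crashes on the spot. As a diagnostic only, a search loop can check for that case and count it instead of crashing. The sketch below assumes a simple singly linked bucket chain; struct entry and chain_search are hypothetical names, and the real layout in d.c may well differ. Skipping null keys hides the symptom rather than fixing the cause, so this is mainly useful for seeing when the bad entries first appear.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bucket entry; not the actual d.c structure. */
struct entry {
    const char   *key;
    void         *value;
    struct entry *next;
};

/* Search one bucket chain for key. Entries whose key is NULL are skipped
 * (strcmp on NULL is undefined behavior) and counted in *null_keys_seen,
 * if that pointer is non-NULL, so the corruption can be reported. */
static struct entry *chain_search(struct entry *head, const char *key,
                                  int *null_keys_seen)
{
    assert(key != NULL);              /* the search key itself must be valid */
    for (struct entry *e = head; e != NULL; e = e->next) {
        if (e->key == NULL) {
            if (null_keys_seen)
                (*null_keys_seen)++;
            continue;
        }
        if (strcmp(e->key, key) == 0)
            return e;
    }
    return NULL;
}
```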

A second error I can reproduce also involves the AutoColor.net program. If I open it and then immediately switch to another sample program, I get a bus error in dxf_ExDictionaryPurge (src/exec/dpexec/d.c, line 260).
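
One common way a purge loop ends in a bus error or segmentation fault is reading e->next after free(e). This sketch doesn't establish that this is what happens in dxf_ExDictionaryPurge; it only shows the safe pattern for a hypothetical singly linked bucket chain (struct entry and chain_purge are illustrative names, not the d.c code): save the next pointer first, then free, and clear the head pointer when done. It also assumes each entry owns its key string but not its value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bucket entry; not the actual d.c structure.
 * Assumption: the entry owns key, but value is owned elsewhere. */
struct entry {
    char         *key;
    void         *value;
    struct entry *next;
};

/* Free every entry in one bucket chain and return how many were freed.
 * The next pointer is read BEFORE the entry is freed; reading e->next
 * after free(e) is a use-after-free that one allocator may tolerate
 * silently while another faults on it. */
static size_t chain_purge(struct entry **head)
{
    assert(head != NULL);
    size_t freed = 0;
    struct entry *e = *head;
    while (e != NULL) {
        struct entry *next = e->next;  /* save before freeing */
        free(e->key);
        free(e);
        e = next;
        freed++;
    }
    *head = NULL;                      /* leave no dangling head pointer */
    return freed;
}
```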

Just to be clear, I get these kinds of errors with programs other than AutoColor. (If it were just AutoColor, I wouldn't worry.) AutoColor.net is simply the quickest and most reliable way to reproduce them.

It looks like all of the problems I'm seeing are somehow related to the dictionary routines. I've tried working through them to track down what's happening, but it's beyond me. My guess is that the actual problem occurs well before the segmentation violation and that its effects take a while to show up. I haven't seen many reports of problems like this, probably because most people run OpenDX on something other than IRIX machines.

I don't think I've pinpointed this well enough to report it as a bug yet, but I'll be happy to enter it if Those Who Control The Buglist think it should go there.

*********************************************************************
Alan M. Ferrenberg ([EMAIL PROTECTED])
Manager, Research and Computational Science Support
UGA Computing and Networking Services
Get the latest information from UCNS! Subscribe to our weekly e-mail publication "UCNS Weekly News". For more information, check out:
*********************************************************************
