Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-26 Thread Mouhamad Al-Sayed-Ali
Hi Gus, I have done as you suggested, but it still doesn't work! Many thanks for your help, Mouhamad. Gus Correa wrote: Hi Mouhamad, A stack of 10240 kB is probably the Linux default, not necessarily good for HPC and number crunching. I'd suggest that you change it

2011-10-26 Thread Gus Correa
Hi Mouhamad, A stack of 10240 kB is probably the Linux default, which is not necessarily good for HPC and number crunching. I'd suggest that you change it to unlimited, unless your system administrator has a very good reason not to do so. We've seen many atmosphere/ocean/climate models crash because they
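In practice this change is made in the shell (for example `ulimit -s unlimited` in bash) or in limits.conf. As a minimal sketch of what happens at the system-call level, the hypothetical helper `raise_stack_soft_limit` below uses the POSIX `getrlimit`/`setrlimit` calls to lift the soft stack limit up to the hard limit. It assumes an unprivileged process, which can raise its soft limit only as far as its hard limit allows.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft stack limit to its hard
   limit. If the hard limit is RLIM_INFINITY, the soft limit becomes
   "unlimited". Raising the hard limit itself needs privileges.
   Limits are inherited across fork()/exec(), so programs started after
   this call see the new value.
   Returns 0 on success, -1 on failure. */
int raise_stack_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;   /* soft := hard */
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Because limits are inherited, the usual place to apply such a change is before the MPI launcher starts the application processes, not inside the application after it has already started.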

2011-10-26 Thread Mouhamad Al-Sayed-Ali
Hi Gus Correa, the output of ulimit -a is:
    file(blocks)          unlimited
    coredump(blocks)      2048
    data(kbytes)          unlimited
    stack(kbytes)         10240
    lockedmem(kbytes)     unlimited
    memory(kbytes)        unlimited
    nofiles(descriptors)  1024
    processes             256
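To put stack(kbytes) 10240 in perspective: 10240 × 1024 = 10,485,760 bytes (10 MiB). A hypothetical 200 × 200 × 40 array of 8-byte doubles needs 200 · 200 · 40 · 8 = 12,800,000 bytes, which is more than the entire limit. The sketch below (the function name `fits_under_stack_limit` is illustrative, not from the thread) compares a requested size against the current soft limit with `getrlimit`. It is only a rough check, since the stack also holds other frames.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Rough check: is `bytes` below the current soft stack limit?
   Real headroom is smaller, because the stack also holds other call
   frames, the environment and the program arguments.
   Returns 1 if it fits, 0 otherwise. */
int fits_under_stack_limit(size_t bytes) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;                      /* be conservative if the limit is unknown */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;                      /* shown as "unlimited" by ulimit -a */
    return (rlim_t)bytes < rl.rlim_cur;
}
```

With the 10240 kB limit shown above, the hypothetical 12,800,000-byte array does not fit, while it always does when the limit is unlimited.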

2011-10-25 Thread Gus Correa
Hi Mouhamad, The locked memory is set to unlimited, but the lines about the stack are commented out. Have you tried adding this line:
    * - stack -1
and then running wrf again? [Note: no "#" hash character.] Also, if you log in to the compute nodes, what is the output of 'limit' [csh, tcsh] or
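The effect of the "#" can be made concrete with a simplified, hypothetical model of a limits.conf entry (`<domain> <type> <item> <value>`). This is not pam_limits' actual code: it ignores domain matching and most items, and only shows three rules documented for limits.conf. A leading "#" turns the whole line into a comment, type "-" sets both the soft and the hard limit, and a value of -1, unlimited or infinity means no limit.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of one limits.conf entry,
   e.g. "*  -  stack  -1". */
struct limit_entry {
    char domain[64];
    char item[32];
    int sets_soft;   /* type "soft" or "-" */
    int sets_hard;   /* type "hard" or "-" */
    int unlimited;   /* value "-1", "unlimited" or "infinity" */
    long value;      /* numeric value otherwise (kbytes for "stack") */
};

/* Returns 1 for an active entry, 0 for blank lines, comments and lines
   that do not parse. A leading '#' disables the whole line, so
   "#*  hard  stack  100" has no effect at all. */
int parse_limits_line(const char *line, struct limit_entry *e) {
    char type[16], value[32];
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;
    if (sscanf(line, "%63s %15s %31s %31s", e->domain, type, e->item, value) != 4)
        return 0;
    e->sets_soft = strcmp(type, "soft") == 0 || strcmp(type, "-") == 0;
    e->sets_hard = strcmp(type, "hard") == 0 || strcmp(type, "-") == 0;
    if (!e->sets_soft && !e->sets_hard)
        return 0;
    e->unlimited = strcmp(value, "-1") == 0 || strcmp(value, "unlimited") == 0 ||
                   strcmp(value, "infinity") == 0;
    e->value = e->unlimited ? -1 : strtol(value, NULL, 10);
    return 1;
}
```

Under this model, "* - stack -1" yields an unlimited soft and hard stack limit, while the commented "#* hard stack 100" lines are skipped.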

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hi all, I've checked "limits.conf", and it contains these lines:
    # Jcb 29.06.2007 : pbs wrf (Siji)
    #*    hard    stack    100
    #*    soft    stack    100
    # Dr 14.02.2008 : for voltaire mpi
    *    hard    memlock    unlimited
    *    soft    memlock    unlimited
Many thanks for

2011-10-25 Thread Gus Correa
Hi Mouhamad, Ralph, Terry, Very often, big programs like wrf crash with a segfault because they can't allocate memory on the stack: they assume the system imposes no limit on it. This has nothing to do with MPI. Mouhamad: check whether your stack size is set to unlimited on all compute nodes.
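On the code side, the usual way to stop depending on the stack limit is to place large scratch arrays on the heap. wrf itself is Fortran; the C sketch below is not taken from wrf and only illustrates the general stack-versus-heap point. The function `sum_scratch` and the array dimensions are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical work routine. Declaring
       double scratch[200][200][40];
   inside a function would put about 12.8 MB on the stack, more than a
   10240 kB limit, which typically ends in SIGSEGV. Allocating the buffer
   with malloc() makes the size independent of the stack limit.
   Returns the sum of the buffer, or -1.0 if the allocation fails. */
double sum_scratch(size_t nx, size_t ny, size_t nz) {
    size_t n = nx * ny * nz;
    double *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1.0;              /* reported as an error, not a segfault */
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        scratch[i] = 1.0;
        total += scratch[i];
    }
    free(scratch);
    return total;
}
```

A failed heap allocation can be detected and reported, whereas running past the stack limit usually shows up only as a segmentation fault like the one in this thread.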

2011-10-25 Thread Ralph Castain
Looks like you are crashing in wrf; have you asked them for help?
On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
> Hi again,
>
> This is exactly the error I have:
>
> taskid: 0 hostname: part034.u-bourgogne.fr
> [part034:21443] *** Process received signal ***

2011-10-25 Thread TERRY DONTJE
This looks more like a segfault in wrf than in OMPI. Sorry, there's not much I can do here to help you. --td
On 10/25/2011 9:53 AM, Mouhamad Al-Sayed-Ali wrote: Hi again, This is exactly the error I have: taskid: 0 hostname: part034.u-bourgogne.fr [part034:21443] *** Process received signal

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hi again, This is exactly the error I have:
    taskid: 0 hostname: part034.u-bourgogne.fr
    [part034:21443] *** Process received signal ***
    [part034:21443] Signal: Segmentation fault (11)
    [part034:21443] Signal code: Address not mapped (1)
    [part034:21443] Failing at address: 0xfffe01eeb340
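The fields in this report correspond to POSIX `siginfo_t` data: signal 11 is SIGSEGV on Linux, code 1 is `SEGV_MAPERR` ("address not mapped"), and the failing address is `si_addr`. The sketch below is not Open MPI's own handler, and the function names are hypothetical. It shows how a program could print a similar report. If the fault is itself a stack overflow, the handler cannot run on the exhausted stack, so it is installed on an alternate signal stack (`sigaltstack` plus `SA_ONSTACK`).

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGSEGV si_code to a short description. */
const char *describe_segv_code(int code) {
    switch (code) {
    case SEGV_MAPERR: return "Address not mapped";
    case SEGV_ACCERR: return "Invalid permissions";
    default:          return "Unknown";
    }
}

/* Async-signal-safe output: only write(2), no stdio or malloc. */
static void put_str(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

static void put_hex(uintptr_t v) {
    char buf[2 + 2 * sizeof v + 1];    /* "0x" + hex digits + '\n' */
    int i = (int)sizeof buf - 1;
    buf[i] = '\n';
    do {
        buf[--i] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[--i] = 'x';
    buf[--i] = '0';
    ssize_t r = write(STDERR_FILENO, buf + i, sizeof buf - (size_t)i);
    (void)r;
}

static void segv_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    put_str("Signal: Segmentation fault\nSignal code: ");
    put_str(describe_segv_code(info->si_code));
    put_str("\nFailing at address: ");
    put_hex((uintptr_t)info->si_addr);
    /* SA_RESETHAND restored the default action; returning re-executes the
       faulting instruction, so the process still dies with SIGSEGV. */
}

/* Install the reporter on an alternate stack. Returns 0 on success. */
int install_segv_reporter(void) {
    stack_t ss;
    ss.ss_size = SIGSTKSZ;
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK | SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```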

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hello,
> Can you run wrf successfully on one node?
No, it can't run on one node.
> Can you run a simple code across your two nodes? I would try hostname then some simple MPI program like the ring example.
Yes, I can run a simple code.
Many thanks,
Mouhamad

2011-10-25 Thread TERRY DONTJE
Can you run wrf successfully on one node? Can you run a simple code across your two nodes? I would try hostname then some simple MPI program like the ring example. --td
On 10/25/2011 9:05 AM, Mouhamad Al-Sayed-Ali wrote: hello, -What version of ompi are you using I am using ompi

2011-10-25 Thread Mouhamad Al-Sayed-Ali
Hello,
-What version of ompi are you using?
I am using ompi version 1.4.1-1, compiled with gcc 4.5.
-What type of machine and OS are you running on?
I'm using a 64-bit Linux machine.
-What does the machine file look like?
    part033
    part033
    part031
    part031
-Is there a stack trace

2011-10-25 Thread TERRY DONTJE
Some more info would be nice, such as:
-What version of ompi are you using?
-What type of machine and OS are you running on?
-What does the machine file look like?
-Is there a stack trace left behind by the pid that segfaulted?
--td
On 10/25/2011 8:07 AM, Mouhamad Al-Sayed-Ali wrote: Hello, I have
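One way to get a stack trace from inside a C program is glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>`. These are glibc-specific, not POSIX. The helper `dump_backtrace` below is a hypothetical minimal sketch. Function names usually appear only when the program is linked with `-rdynamic`; for file and line information, a debugger on a core file is the more common route. Note that the `ulimit -a` output in this thread shows coredump(blocks) 2048, which caps the size of any core file written.

```c
#include <execinfo.h>
#include <unistd.h>

/* Print the calling thread's stack to stderr. backtrace_symbols_fd()
   writes straight to a file descriptor instead of allocating strings.
   Returns the number of frames captured (at most 64). */
int dump_backtrace(void) {
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```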