Hi Gus,
I have done as you suggested, but it still doesn't work!
Many thanks for your help
Mouhamad
Gus Correa wrote:
Hi Mouhamad
Stack of 10240kB is probably the Linux default,
not necessarily good for HPC and number crunching.
I'd suggest that you change it to unlimited,
unless your system administrator has a very good reason not to do
so.
We've seen many atmosphere/ocean/climate models crash because
they
Hi Gus Correa,
the output of 'ulimit -a' is:
file(blocks)          unlimited
coredump(blocks)      2048
data(kbytes)          unlimited
stack(kbytes)         10240
lockedmem(kbytes)     unlimited
memory(kbytes)        unlimited
nofiles(descriptors)  1024
processes             256
Hi Mouhamad
The locked memory is set to unlimited, but the lines
about the stack are commented out.
Have you tried to add this line:
* - stack -1
then run wrf again? [Note no "#" hash character]
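A quicker test, before touching limits.conf at all, is to raise the
soft limit in the shell or job script that launches wrf (this only
works if the hard limit isn't also capped at 10240; the binary and
machine file names below are just placeholders for whatever you use):

  ulimit -s unlimited                            # raise the stack limit for this shell and its children
  ulimit -s                                      # confirm it took effect
  mpirun -np 4 -machinefile machines ./wrf.exe   # then launch as usual

Under a batch system the ulimit line has to go inside the job script
itself, because the limits of your interactive shell are not
inherited by the job.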
Also, if you login to the compute nodes,
what is the output of 'limit' [csh,tcsh] or 'ulimit -a' [sh,bash]?
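For example, from the head node (part034 is just the node name taken
from your error message, and which command works depends on the
remote login shell):

  ssh part034 'ulimit -s'       # sh/bash login shell
  ssh part034 tcsh -c limit     # csh/tcsh

With the stack lines in limits.conf still commented out, I'd expect
this to report 10240 again.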
Hi all,
I've checked the "limits.conf", and it contains these lines:
# Jcb 29.06.2007 : pbs wrf (Siji)
#*  hard  stack    100
#*  soft  stack    100
# Dr 14.02.2008 : for voltaire mpi
*   hard  memlock  unlimited
*   soft  memlock  unlimited
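If I change it as suggested, I think the file would end up looking
roughly like this (using -1, which as far as I understand is read as
unlimited, and leaving the memlock lines as they are):

  # Jcb 29.06.2007 : pbs wrf (Siji)
  *  soft  stack    -1
  *  hard  stack    -1
  # Dr 14.02.2008 : for voltaire mpi
  *  hard  memlock  unlimited
  *  soft  memlock  unlimited

and I suppose it has to be changed on every compute node.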
Many thanks for your help
Hi Mouhamad, Ralph, Terry
Very often big programs like wrf crash with a segfault because they
can't allocate memory on the stack; they assume the system doesn't
impose any limit on it. This has nothing to do with MPI.
Mouhamad: Check if your stack size is set to unlimited on all compute
nodes.
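One way to check that, which also catches the case where batch or
MPI-launched processes inherit a different limit than an interactive
login, is to run the check through mpirun itself (the machine file
name is a placeholder for whatever you normally pass):

  mpirun -np 4 -machinefile machines sh -c 'echo $(hostname): $(ulimit -s)'

Every line of output should say unlimited; any node still printing
10240 is where the limit never took effect.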
Looks like you are crashing in wrf - have you asked the wrf folks for help?
On Oct 25, 2011, at 7:53 AM, Mouhamad Al-Sayed-Ali wrote:
> Hi again,
>
> This is exactly the error I have:
>
>
> taskid: 0 hostname: part034.u-bourgogne.fr
> [part034:21443] *** Process received signal ***
>
This looks more like a seg fault in wrf and not OMPI.
Sorry, not much I can do here to help you.
--td
On 10/25/2011 9:53 AM, Mouhamad Al-Sayed-Ali wrote:
Hi again,
This is exactly the error I have:
taskid: 0 hostname: part034.u-bourgogne.fr
[part034:21443] *** Process received signal
Hi again,
This is exactly the error I have:
taskid: 0 hostname: part034.u-bourgogne.fr
[part034:21443] *** Process received signal ***
[part034:21443] Signal: Segmentation fault (11)
[part034:21443] Signal code: Address not mapped (1)
[part034:21443] Failing at address: 0xfffe01eeb340
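If it helps, I can also try to get a backtrace by letting the job
dump core and opening it in gdb, something like this (wrf.exe and the
core file name are guesses on my side, and I suppose wrf has to be
compiled with -g to get readable symbols):

  ulimit -c unlimited                            # in the job script, before mpirun
  mpirun -np 4 -machinefile machines ./wrf.exe
  gdb ./wrf.exe core.21443                       # actual core file name depends on the system
  (gdb) bt                                       # print the backtrace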
Hello
Can you run wrf successfully on one node?
No, it can't run on one node.
Can you run a simple code across your two nodes? I would try
hostname then some simple MPI program like the ring example.
Yes, I can run a simple code
Many thanks,
Mouhamad
Can you run wrf successfully on one node?
Can you run a simple code across your two nodes? I would try hostname
then some simple MPI program like the ring example.
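Concretely, something like this (the machine file name is a
placeholder; ring_c.c is the ring example that ships in the examples/
directory of the Open MPI source tree):

  mpirun -np 4 -machinefile machines hostname
  mpicc ring_c.c -o ring_c
  mpirun -np 4 -machinefile machines ./ring_c

If hostname answers from both part031 and part033 and the ring runs
clean, MPI and the interconnect look fine and the problem is inside
wrf; if these fail too, then the Open MPI setup is the place to look
first.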
--td
On 10/25/2011 9:05 AM, Mouhamad Al-Sayed-Ali wrote:
hello,
-What version of ompi are you using
I am using ompi
Hello,
-What version of ompi are you using
I am using ompi version 1.4.1-1, compiled with gcc 4.5.
-What type of machine and os are you running on
I'm using a 64-bit Linux machine.
-What does the machine file look like
part033
part033
part031
part031
-Is there a stack trace
Some more info would be nice, like:
-What version of ompi are you using
-What type of machine and os are you running on
-What does the machine file look like
-Is there a stack trace left behind by the pid that seg faulted?
--td
On 10/25/2011 8:07 AM, Mouhamad Al-Sayed-Ali wrote:
Hello,
I have