On 14.11.2014 at 03:32, Doan Trung Tung wrote:

> Hi Reuti,
> 
> The mpi-ring is the test program from Intel which sends a message around a 
> ring. If I run the program manually using mpirun, it works just fine. The 
> problem only occurs when I use OSG to submit jobs with more than 16 slots 
> (each node has 16 processors).

There is no limit in SGE on the number of slots an application can request. 
Without more details it's hard to say what goes wrong, as I have no access 
to the application in question. Since it runs fine outside of SGE:

- do you request any resources during the submission?
- can you print the $PE_HOSTFILE? With more than 16 slots, SGE has to grant 
the job more than one node, as only 16 slots are available per machine as you 
stated
- you use the PE "orte" but compiled with Intel MPI? Which MPI library are you 
going to use?
- do you also request two nodes when running interactively?
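
To print the $PE_HOSTFILE, a minimal sketch of a job script could look like 
the one below. It assumes the PE "orte" from your script and the usual 
hostfile layout of one line per host (host, slots, queue, processor range). 
The function name summarize_hostfile is only made up for this example:

```shell
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -pe orte 17

# Print "host slots" for every granted host plus the total slot count.
# Assumes each hostfile line has the form: host slots queue processor-range
# (summarize_hostfile is a hypothetical helper name for this sketch.)
summarize_hostfile() {
    awk '{ print $1, $2; total += $2 } END { print "total", total + 0 }' "$1"
}

# $PE_HOSTFILE is set by SGE only inside a parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    summarize_hostfile "$PE_HOSTFILE"
else
    echo "PE_HOSTFILE is not set (not running inside a parallel job?)"
fi
```

With 17 slots requested, the output should list at least two hosts if SGE 
really spread the job across machines.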

- the core file (core.xxxx) is written by the kernel when the application 
segfaults; it can be used to debug the problem in the application

To change this behavior for interactive access:

$ ulimit -Ha
core file size          (blocks, -c) unlimited
...
$ ulimit -Sa
core file size          (blocks, -c) 0
$ ulimit -Sc unlimited

and you get the file too.
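
The same steps as a small script, in case it is easier to reuse. This is 
only a sketch: it raises the soft limit to whatever the hard limit allows 
rather than hard-coding "unlimited", and raise_core_limit is a made-up name:

```shell
#!/bin/bash
# Raise the soft core-file limit to the hard limit and print the result.
# raise_core_limit is a hypothetical helper name, not part of any tool.
raise_core_limit() {
    hard=$(ulimit -Hc)      # hard limit: the ceiling the soft limit may reach
    ulimit -Sc "$hard"      # soft limit: the value that is actually enforced
    ulimit -Sc              # print the new soft limit
}

echo "soft before: $(ulimit -Sc), hard: $(ulimit -Hc)"
raise_core_limit
```

The limit applies to the current shell and is inherited by its child 
processes, so it has to be set in the same shell that later starts mpirun.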

Inside SGE: these are the settings to switch it off:

$ qconf -sq all.q
...
s_core                INFINITY
h_core                0


-- Reuti


> Tung
> 
> On Fri, Nov 14, 2014 at 12:44 AM, Reuti <[email protected]> wrote:
> Hi,
> 
> On 13.11.2014 at 18:09, Doan Trung Tung wrote:
> 
> > I have OSG installed as a roll on a Rocks cluster installation for a 
> > cluster of 16 nodes; each node has 16 processors. I'm new to OSG, so I 
> > left everything at the defaults.
> > When I submit the mpi-ring example using qsub, if the number of slots is 
> > less than or equal to 16, all threads run on a single random node. So I 
> > increased the number of slots to a number larger than 16, hoping that 
> > they would run on different nodes, but instead they fail with errors.
> >
> > Here is the script I used to submit mpi-ring:
> > #!/bin/bash
> >
> > #$ -cwd
> > #$ -S /bin/bash
> > #$ -j y
> > #$ -pe orte 8
> > mpirun $HOME/testmpi/mpi-ring
> 
> Which mpi-ring exactly: where is the source, and which MPI library is it from?
> 
> -- Reuti
> 
> 
> > (orte is one of 4 default parallel environments the system has)
> >
> > If I change the number of slots to 17 instead of 8, I get this error:
> > APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> > Also, a strange file was produced: core.xxxx
> >
> > Why can't I submit more than 16 slots?
> >
> > Thanks.
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
> 
> 
> 
> 
> -- 
> Doan Trung Tung, PhD.
> Researcher, HPC - Hanoi University of Technologies
> Mobile: 0914720240


