Hi,

On 24.05.2016 at 18:01, Manuel Pérez Jigato wrote:

> hello
> 
> I am running my code on an AMD cluster (32 nodes, each with 16 cores) under 
> the Sun Grid Engine scheduler, version 6.2u5p2, alongside Open MPI 1.4.4.
> 
> I have realised that my calculations suffer from a huge performance loss 
> (up to a factor of 40) due to other jobs running on the same nodes.

This sounds as if there is no tight integration of Open MPI into SGE, which is 
what makes Open MPI honor the granted slots. Please try Open MPI 1.6.5 and 
compile it with --with-sge. Later versions of Open MPI do an automatic core 
binding (which may bind several jobs to the same cores, from core 0 upwards, 
if there is more than one Open MPI job on a node) and an extensive network 
scan, which may lead to a startup delay of 1 to 2 minutes while all routes 
between the nodes are determined.
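
A quick way to see whether a given Open MPI build has SGE support is to look 
for the gridengine component in the output of ompi_info. This is only a 
sketch: it assumes the ompi_info in your PATH belongs to the installation your 
jobs actually use, and the configure line in the comment uses a placeholder 
prefix.

```shell
# Build step (run in the unpacked Open MPI source tree; prefix is a placeholder):
#   ./configure --prefix=$HOME/openmpi-1.6.5 --with-sge && make all install

# Check the installed build for the SGE ("gridengine") component.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep -i gridengine || echo "no gridengine component found"
else
    echo "ompi_info not found in PATH"
fi
```

If no gridengine line shows up, mpirun will not pick up the slots granted by 
SGE on its own.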


> I have been unable to control the actual nodes in order to make them use the 
> "fill-up" mode, all 16 cores must be selected in a strict manner

Yes, fill-up will also use partial nodes until all requested slots have been 
collected. But you could define a fixed slot count of 16 as the PE's 
allocation rule. Then only multiples of 16 slots can be requested, though.
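
As a rough sketch of such a PE: the name "mpi16", the total of 512 slots 
(32 nodes x 16 cores) and the job script are assumptions for illustration, and 
only the fields relevant here are shown.

```shell
# Edit (or create) the PE; "mpi16" is a hypothetical name:
#   qconf -mp mpi16        (use "qconf -ap mpi16" to add a new one)
#
# Relevant fields in the PE definition:
#   pe_name            mpi16
#   slots              512
#   allocation_rule    16      <- exactly 16 slots per host
#   control_slaves     TRUE    <- needed for a tight integration

# Request four complete nodes (4 x 16 slots); ./job.sh is a placeholder:
qsub -pe mpi16 64 ./job.sh
```

The PE also has to be listed in the pe_list of the queue the jobs should run 
in, otherwise it cannot be requested there.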


> Previous SGE versions used to have the option "qsub -l exclusive..", but 
> 6.2u5p2 does not.

This is a different issue, but nevertheless: did you define a complex 
"exclusive" with "relop" = "excl" and attach it to the exechosts?
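
For reference, a minimal sketch of such a setup. The host name "node01", the 
urgency value and the job script are placeholders; adjust them to your 
cluster.

```shell
# 1. Add the complex with "qconf -mc" (one line in the complex table):
#   #name      shortcut  type  relop  requestable  consumable  default  urgency
#   exclusive  excl      BOOL  EXCL   YES          YES         0        1000
#
# 2. Attach it to each exechost with "qconf -me node01":
#   complex_values   exclusive=true

# 3. Request exclusive use of the hosts at submission time:
qsub -l exclusive=true -pe mpi16 64 ./job.sh
```

The PE name "mpi16" in the last line is again only an example; any PE can be 
combined with the exclusive request.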

-- Reuti


> Will you please give me a hint of how to solve my loss of performance problem?
> 
> thanks a lot
> 
> Manuel Perez Jigato
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

