Hi,

On 24.05.2016 at 18:01, Manuel Pérez Jigato wrote:

> hello
>
> I am running my code on an AMD cluster (32 nodes, each with 16 cores) under
> the sun grid engine scheduler, version 6.2u5p2, and alongside openMPI 1.4.4
>
> I have realised that my calculations suffer from a huge performance loss
> (even 40 times) due to other jobs running on the same nodes

This sounds like there is no tight integration of Open MPI into SGE, which would honor the granted slots. Please try Open MPI 1.6.5 and compile it with --with-sge.

Later versions of Open MPI do an automatic core binding (which may bind several jobs to cores 0 upwards if there is more than one Open MPI job on a node) and an extensive network scan, which may lead to a startup delay of 1 to 2 minutes to get all routes between the nodes.

> I have been unable to control the actual nodes in order to make them use the
> "fill-up" mode, all 16 cores must be selected in a strict manner

Yes, fill-up will also use partial nodes until all requested slots have been collected. But you could define a fixed slot count of 16 in the PE's allocation rule. Then only multiples of 16 slots can be requested, though.

> Previous SGE versions used to have the option "qsub -l exclusive..", but
> 6.2u5p2 does not.

This is a different issue, but nevertheless: did you define a complex "exclusive" with "relop" = "excl" and attach it to the exechosts?

-- Reuti

> Will you please give me a hint of how to solve my loss of performance problem?
>
> thanks a lot
>
> Manuel Perez Jigato
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
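
For reference, the three suggestions above (a tight-integration check, a fixed allocation rule of 16, and an "exclusive" complex) could look roughly like the sketch below. The PE name "mpi16", the queue "all.q", the host "node01", the slot total of 512 (32 nodes x 16 cores) and the script name "job.sh" are placeholders assumed for illustration, not taken from the original setup; please adapt them to the actual cluster.

```
# 1) After building Open MPI with --with-sge, check that SGE support is there:
#      ompi_info | grep gridengine

# 2) PE with a fixed slot count per host (created with: qconf -ap mpi16).
#    The name "mpi16" and the slot total are assumptions.
pe_name            mpi16
slots              512
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    16
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

#    Attach the PE to a queue, then request a multiple of 16 slots:
#      qconf -aattr queue pe_list mpi16 all.q
#      qsub -pe mpi16 32 job.sh

# 3) "exclusive" complex (add this line with: qconf -mc)
#name       shortcut  type  relop  requestable  consumable  default  urgency
exclusive   excl      BOOL  EXCL   YES          YES         0        1000

#    Attach it to each exechost and request it at submission:
#      qconf -mattr exechost complex_values exclusive=true node01
#      qsub -l exclusive=true -pe mpi16 32 job.sh
```

With control_slaves set to TRUE, a tightly integrated Open MPI should start its remote processes under SGE's control and take the host list from the granted slots, so inside job.sh a plain "mpirun -np $NSLOTS ./a.out" (the binary name is again a placeholder) is usually enough and no hostfile needs to be passed.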