Hi,

On 26.01.2015 at 21:39, Winkler, Ursula ([email protected]) wrote:
> I'm trying to find a solution for an environment running serial jobs as well
> as MPI jobs on 6 hosts, where each host has 32 cores/slots. Due to the small
> number of nodes, assigning each sort of job to separate nodes (e.g. nodes 1-2
> for serial, nodes 3-6 for MPI jobs) is not an option, especially because the
> ratio of serial to MPI jobs varies considerably.
>
> I tried setting up 2 queues with "serial" as a subordinate queue to "mpi".
> But that is only efficient if the MPI job(s) use ~32 slots per host.
> Otherwise, serial jobs that could run stay unnecessarily suspended, because
> the whole "serial" queue is suspended.
>
> The other possible option would be slot-wise subordination, but that doesn't
> work either: with respect to subordination, the scheduler is apparently not
> capable of figuring out how many slots an MPI job actually requests, so it
> stubbornly suspends only one serial job, which of course causes core
> oversubscription.
>
> Does anybody have an idea how to solve this problem in a satisfying way?

Why not submit all jobs to one and the same queue? It might be good to provide
a suitable scheduler configuration:

$ qconf -ssconf
...
max_reservation                   20
default_duration                  8760:00:00

and submit the parallel jobs with "-R y" to avoid starvation. To use
backfilling properly, an h_rt value needs to be provided at submission too.

-- 
Reuti

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
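A minimal sketch of the submission side of the advice above, for one shared queue with resource reservation and backfilling. It assumes a parallel environment named "mpi" and the job script names shown; both are hypothetical, since the original mail only names queues. The QSUB variable is likewise a hypothetical convenience for dry runs, not a Grid Engine feature. As far as sched_conf(5) describes it, default_duration is the run time the scheduler assumes for jobs without an h_rt (or s_rt) limit, so with 8760:00:00 (one year) a job submitted without h_rt will practically never fit into a backfill window ahead of a reservation.

```shell
#!/bin/sh
# Submission helpers for a single shared queue with reservation/backfilling.
# Assumptions (hypothetical, not from the original mail): a parallel
# environment called "mpi" and the job script names passed by the caller.
# Set QSUB=echo in the environment to print the commands instead (dry run).
QSUB="${QSUB:-qsub}"

# Parallel job: "-R y" requests a resource reservation so the job is not
# starved by a steady stream of small serial jobs. The h_rt limit lets the
# scheduler plan the reservation and backfill shorter jobs in front of it.
submit_parallel() {
    # $1 = number of slots, $2 = h_rt (hh:mm:ss), $3 = job script
    "$QSUB" -R y -pe mpi "$1" -l h_rt="$2" "$3"
}

# Serial job: no reservation needed, but a realistic h_rt allows the
# scheduler to backfill it into slots being held free for a pending
# reservation, as long as it finishes before the reservation starts.
submit_serial() {
    # $1 = h_rt (hh:mm:ss), $2 = job script
    "$QSUB" -l h_rt="$1" "$2"
}
```

For example, after sourcing the script, "submit_parallel 64 24:00:00 mpi_job.sh" asks for 64 slots with a reservation and a 24-hour limit, while "submit_serial 02:00:00 serial_job.sh" submits a 2-hour serial job that is eligible for backfilling. The scheduler settings themselves are changed with "qconf -msconf".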
