On 12 Mar 2013, at 15:33, Reuti <[email protected]> wrote:
> 
>> The problem we would like to solve is when users submit a job with software
>> that defaults to starting a number of worker threads equal to the number of
>> cores, thus parasitising on other jobs' allocations within the node.
> 
> Often this comes from the software checking the number of installed cores.
> Even if the job is limited to certain cores, it would be good not to
> oversubscribe them.
Sure, that is the case. The problem, though, is that we have users who are not
aware of this. The goal is, sort of, to protect the well-behaving users from
the ones who are not. The latter are not behaving well simply due to lack of
knowledge. I.e., they might download a program and submit it as a single-slot
job without being aware that the program will check how many cores it can find
and launch that many threads. In this case it is perfectly fine if the
confinement is a "soft limit", as these users don't have the competence to
work around it.

> With a file $SGE_ROOT/default/common/sge_request
> 
> you could add (or similar variables):
> 
> -v OMP_NUM_THREADS=1,MKL_NUM_THREADS=1
> 
> to avoid extra threads altogether. If it's more sophisticated with a varying
> number, it's also possible to use either a JSV or to add them to
> $SGE_JOB_SPOOL_DIR/environment (the script must run as queue prolog under the
> SGE admin account). Sure, this can be overridden by the job.

Ok, so one could basically use the prolog to set OMP_NUM_THREADS to $NSLOTS
for SMP jobs (a rough sketch of such a prolog is at the end of this message).
I had not realised that it was possible to edit $SGE_JOB_SPOOL_DIR/environment
in order to modify the job environment. Thanks for the hint.

>> We have 8 or 16 cores per machine. However, as the workload is a mixture of
>> serial and parallel jobs, it is quite common that parallel jobs get stuck in
>> the queue because no complete node is free. To get around this we quite
>> often spread MPI jobs across nodes to pick up free slots. The software we
>> are using does not pass a lot of data over MPI, but mostly uses it for
>> synchronisation.
> 
> Depending on your workflow, options are:
> 
> - limit certain nodes to serial resp. parallel jobs
> - limit the number of serial and parallel slots per exechost by using two
>   queues

Yes, but that would still leave unused slots in cases where short serial jobs
could otherwise be scheduled. We prefer some jobs having to wait slightly
longer while keeping the hardware busy. It's a small departmental cluster used
for diverse bioinformatics workloads.

Mikael
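P.S. For the archives, here is a rough, untested sketch of the kind of prolog
Reuti describes. The script path and admin account name are just examples, and
it assumes NSLOTS and SGE_JOB_SPOOL_DIR are exported to the prolog environment
(please double-check on your installation). The prolog would be configured in
the queue definition to run as the admin account, e.g.

    prolog  sgeadmin@/opt/sge/scripts/set_threads.sh

and the script could look something like:

    #!/bin/sh
    # Rough prolog sketch: pin the thread-count variables to the number of
    # granted slots by rewriting the job's environment file before it starts.
    # Assumes NSLOTS and SGE_JOB_SPOOL_DIR are set in the prolog environment.

    ENVFILE="$SGE_JOB_SPOOL_DIR/environment"

    # Drop any thread-count variables already present (e.g. a site-wide
    # default of 1 from sge_request), then append the values for this job.
    grep -v -e '^OMP_NUM_THREADS=' -e '^MKL_NUM_THREADS=' "$ENVFILE" > "$ENVFILE.new" \
        && mv "$ENVFILE.new" "$ENVFILE"

    echo "OMP_NUM_THREADS=$NSLOTS" >> "$ENVFILE"
    echo "MKL_NUM_THREADS=$NSLOTS" >> "$ENVFILE"

    exit 0

As Reuti notes, a job can still override these variables itself, so this
remains a soft limit, which is good enough for our users.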
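And for completeness, my understanding of the two-queue option (queue and host
names invented, attribute values to be adapted): two cluster queues share the
hosts, and a per-host slots limit keeps their sum from oversubscribing the
cores. Something like:

    # serial-only queue with a few slots per host
    qconf -aq serial.q        # in the editor: slots 4,  pe_list NONE

    # parallel queue for the remaining slots
    qconf -aq parallel.q      # in the editor: slots 12, pe_list mpi smp

    # cap the total slots per execution host so the two queues together
    # cannot oversubscribe a 16-core node
    qconf -me node01          # in the editor: complex_values slots=16

As said above, though, we would rather keep the slots interchangeable and
accept a bit more queueing time for the parallel jobs.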
