SLURM Devs,

This is probably a FAQ whose answer is "nope", but my search-fu has failed me. We recently had a need to think about something. I'll describe it as a generic thought experiment because I don't want to have to remember all the details of the real names of the qos, etc.

Namely, on our cluster, let's say we have three ways to run:

  1. --partition=limited
  2. --qos=high
  3. Default

Number one is a partition that not many can submit to. It is a dedicated chunk of the cluster, but one can only run 3 jobs in it.

Number two is a qos with a high priority in the general (default) partition of the machine. This might have a limit on the number of jobs (let's say 6, though I don't know if there is a limit) so people don't abuse it.

Number three is when you just sbatch and get whatever the default is.


Obviously, #1 is the gold standard (run there until you limit out); #2 is next best; and #3 is the least attractive.

Now, we have a situation where an experiment needs to run, say, 12 jobs that take 3 hours each. If we had our druthers, we'd submit all 12 to #1 and all 12 would launch at once. Can't do that. You only get 3 in. So now go to #2, where you only get 6 in (assuming the general cluster partition isn't full). If you limit out of #2, then fall over to #3.

I think you get what I want. I'd love to have a single sbatch call that says:

  Take this job and submit such that it runs under #1, #2, or #3, and
  whatever can take it first wins.

In our case, I can see perhaps 3 getting into #1 right away, a few more a bit later in #2, and then the next ones maybe when #1 is free again, or perhaps in #3... I know --constraint has a nice OR operator, but I'm not sure anything else does.


Now, one way we can think to do this (since I don't know if you can do the above) is to submit 12 jobs to *each* queue-config possibility and then underneath, have a lockfile-managed script that holds a MasterList of all the possible jobs. If someone manages to get an allocation, that one pops a job off the MasterList, now there are 11 left, and so on.
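
To make the "submit 12 to each" half of that concrete, here is a rough sketch that only builds the sbatch command lines, one per (config, slot) pair. The partition and qos names are the placeholder ones from the list above, and worker.sh is a hypothetical script name standing in for whatever wrapper would consult the MasterList; nothing here actually calls sbatch.

```python
import shlex

# The three queue configurations from the example above.
# "limited" and "high" are the placeholder names used in this email.
CONFIGS = [
    ["--partition=limited"],  # 1: dedicated partition
    ["--qos=high"],           # 2: high-priority qos in the default partition
    [],                       # 3: plain default submission
]


def build_submissions(n_jobs, script="worker.sh"):
    """Return one sbatch command line per (config, slot) pair.

    'worker.sh' is a hypothetical wrapper script, not a real file.
    """
    commands = []
    for opts in CONFIGS:
        for _ in range(n_jobs):
            commands.append(shlex.join(["sbatch", *opts, script]))
    return commands


if __name__ == "__main__":
    for cmd in build_submissions(12):
        print(cmd)
```

With 12 jobs this yields 36 command lines, which is exactly the queue clutter I'd like to avoid.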

Once the MasterList is empty (i.e., all jobs have run or are running), you could then clean up all the queued jobs that will never run anything useful (and if they do get an allocation, the empty MasterList would just make them return the allocation immediately).
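
A minimal sketch of the "pop a job off the MasterList" step, assuming the MasterList is a plain text file with one job per line and that an exclusive flock on that file serves as the lock. The function name pop_job and the file layout are made up for illustration (this is not our actual script), and whether flock is honored across compute nodes depends on the filesystem and its mount options.

```python
import fcntl
import os


def pop_job(master_list_path):
    """Remove and return the first entry of the MasterList, or None if empty.

    Assumes one job per line; the exclusive lock keeps two allocations
    from popping the same entry.
    """
    with open(master_list_path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until we hold the lock
        try:
            lines = [ln for ln in f.read().splitlines() if ln.strip()]
            if not lines:
                return None  # nothing left: caller should exit right away
            job, rest = lines[0], lines[1:]
            # Rewrite the file in place without the entry we just took.
            f.seek(0)
            f.truncate()
            f.write("".join(line + "\n" for line in rest))
            f.flush()
            os.fsync(f.fileno())
            return job
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Each job that gets an allocation would call pop_job once; a None result means the list is empty and the job should just exit and give the allocation back. The leftover queued jobs could then be removed with scancel.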

We have experience with this lock and masterlist (for other purposes), so we can do it, but as I said, it'd be nice if we could make one big meta sbatch call. Because it's nice to only have 12 jobs in the queue instead of 36 :)

Matt
--
Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
NASA GSFC,    Global Modeling and Assimilation Office
Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
Phone: 301-614-6712                 Fax: 301-614-6246
http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
