Unfortunately Slurm will need modification to do what you ask. It's  
straightforward work (everything would go into the function _run_now  
in the module src/plugins/select/cons_res), but I have no idea when it  
might happen.
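
For context, the two-partition preemption setup described in the quoted message below corresponds roughly to slurm.conf entries like the following. This is a sketch only: the partition names, node list, license name, and counts are placeholders, not taken from the actual configuration being discussed.

```
# Consumable-resource selection, tracking CPUs
SelectType=select/cons_res
SelectTypeParameters=CR_CPU

# Preempt based on relative partition Priority
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE

# Central license pool (placeholder name and count)
Licenses=foo:10

# Placeholder partitions: jobs in "high" preempt jobs running in "low"
PartitionName=low  Nodes=node[1-4] Priority=1  Default=YES
PartitionName=high Nodes=node[1-4] Priority=10
```

Jobs would then request licenses at submission time, e.g. `sbatch --licenses=foo:1 job.sh`; as the message below observes, the license count gates when jobs start, but it is not considered when choosing preemption victims.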

Quoting Mike Donahue <[email protected]>:

> Hi,
>
> I'm relatively new to SLURM, but I'm trying to come up with a
> prototype SLURM configuration for our particular needs.  One of our
> goals is to set up low- and high-priority partitions, such that jobs
> submitted to the high-priority partition will preempt jobs running
> in the lower-priority partition.  I was able to get this basic
> functionality to work fairly easily, using
> PreemptType=preempt/partition_prio.  This works fine when I have
> SelectType=select/cons_res and SelectTypeParameters=CR_CPU.
> Preemption occurs as expected when the number of available CPUs is
> the limiting resource.
>
> However, in our situation, our license pool is by far the most
> limiting resource.  We have many more CPUs available than licenses.
> The simple Licenses specification in the slurm.conf file seems to
> work well to model a central pool of arbitrary license resources,
> with jobs pending when license resources are fully utilized, until
> running jobs which have requested the license complete and
> relinquish the resource.  However, our real goal is to have running
> jobs preempted by higher-priority jobs when licenses are the
> limiting resource.  If we use PreemptType=preempt/partition_prio,
> jobs submitted to the high-priority partition will only preempt
> running jobs when CPU resources are fully utilized.
>
> I tried several experiments defining QOS specifications to try to
> model the license pool, rather than using the Licenses
> specification.  These included creating a high- and a low-priority
> QOS, each with a limited number of total jobs available to users of
> the QOS, which would mimic the license pool, and changing to
> PreemptType=preempt/qos.  Once everything was set up correctly, I
> could get jobs submitted to the high-priority QOS to preempt running
> jobs submitted to the lower-priority QOS.  However, the artificial
> job count limit I'd set up for each QOS was not really a shared pool
> of job slots, but a separate count for each QOS.  Trying to combine
> the Licenses specification with the high- and low-priority QOS did
> not seem to help things.
>
> I also tried setting up a common job count limit in sacctmgr, equal
> to the number of license resources, for the one and only "account"
> we've defined.  This seems to act effectively as a common limit for
> all jobs submitted to any queue.  Still, with
> PreemptType=preempt/partition_prio, SelectType=select/cons_res, and
> SelectTypeParameters=CR_CPU, jobs submitted to the higher-priority
> partition would only preempt other jobs when the total number of
> available CPUs was "consumed".
>
> No matter what I try, it seems that the Licenses resource is
> treated as a second-class criterion: it is taken into account last
> in the selection process, and not at all in the preemption process.
> Ideally, licenses could be promoted to the level of a "consumable
> resource" that would be considered by the preemption algorithm.
>
> Any suggestions would be appreciated!
>
> One note: I'm using simple scripts which execute the "sleep"
> command as my test vehicle.  As such, these jobs use hardly any CPU
> bandwidth or memory resources.  I'm not sure whether this could
> skew the behavior of the scheduler and/or preemption algorithms.
>
> We are currently using SLURM release 2.4.4.
>
> Thanks,
> Mike Donahue
>
>
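
The common per-account job limit mentioned above is typically configured through sacctmgr. In sketch form, under the assumption that the account is named "ouracct" and the license pool holds 10 licenses (both placeholders, not values from the message):

```
# GrpJobs caps the total number of simultaneously running jobs
# across all users of the account (placeholder name and value).
sacctmgr modify account where name=ouracct set GrpJobs=10
```

Since GrpJobs only limits how many jobs may run at once, it can emulate a shared pool only when every job consumes exactly one license, which matches the single-license test jobs described above.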
