So, I have this figured out. I felt pretty dumb when I traced back the debug
function and found that seeing its output was as easy as setting the flag in
slurm.conf.
Below are the sections I added to job_submit_partition.c to accomplish what
I was after.
/* Compare the number of nodes requested with the number of nodes
 * a user can request for the partition. */
static bool _valid_nodes(struct part_record *part_ptr,
                         struct job_descriptor *job_desc)
{
    /* Start from "nothing requested" so job_limit is never read
     * uninitialized when neither field below is defined. */
    uint32_t job_limit = 0;
    uint32_t part_limit = part_ptr->max_nodes;

    /* If a value is undefined in job_desc, it comes back as the
     * maximum value of a 32-bit unsigned int, minus 1. */
    if (job_desc->num_tasks != UINT32_MAX - 1)
        job_limit = job_desc->num_tasks;
    /* An explicit node count overrides the task count. */
    if (job_desc->min_nodes != UINT32_MAX - 1)
        job_limit = job_desc->min_nodes;

    if (job_limit > part_limit) {
        debug("job_submit/partition: skipping partition %s due to "
              "node limit (%u > %u)",
              part_ptr->name, job_limit, part_limit);
        return false;
    }
    return true;
}
/* Compare the number of CPUs requested with the total number of
 * CPUs in the partition. */
static bool _valid_cpu(struct part_record *part_ptr,
                       struct job_descriptor *job_desc)
{
    /* Start from "nothing requested" so job_limit is never read
     * uninitialized when min_cpus is undefined. */
    uint32_t job_limit = 0;
    uint32_t part_limit = part_ptr->total_cpus;

    /* If the value is undefined in job_desc, it comes back as the
     * maximum value of a 32-bit unsigned int, minus 1. */
    if (job_desc->min_cpus != UINT32_MAX - 1)
        job_limit = job_desc->min_cpus;

    if (job_limit > part_limit) {
        debug("job_submit/partition: skipping partition %s due to "
              "cpu limit (%u > %u)",
              part_ptr->name, job_limit, part_limit);
        return false;
    }
    return true;
}
/* This part goes where the plugin tests the job elements. */
    if (!_valid_nodes(part_ptr, job_desc))
        continue;
    if (!_valid_cpu(part_ptr, job_desc))
        continue;
What this adds is a check on how many nodes the job requests, if that value
is defined, and on the number of tasks, if that value is defined, comparing
whichever is set against the partition's node limit. It then does a CPU check
to make sure that the partition actually has enough CPUs to run the job.
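
For anyone who wants to try the logic outside of slurmctld, here is a minimal
standalone sketch of the two checks and the skip-on-failure loop. The structs
mock_part and mock_job, the UNSET_U32 macro, and pick_partition() are
hypothetical stand-ins made up for illustration, not Slurm types or API, and
returning the first acceptable partition in array order is an assumption; how
the real plugin chooses among acceptable partitions is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed "not set" sentinel, mirroring the UINT32_MAX - 1 test above. */
#define UNSET_U32 (UINT32_MAX - 1)

/* Hypothetical, simplified stand-in for the partition record. */
struct mock_part {
    const char *name;
    uint32_t max_nodes;   /* UINT32_MAX models an unlimited partition */
    uint32_t total_cpus;
};

/* Hypothetical, simplified stand-in for the job descriptor. */
struct mock_job {
    uint32_t num_tasks;
    uint32_t min_nodes;
    uint32_t min_cpus;
};

/* Node check: an explicit node count overrides the task count. */
static bool nodes_ok(const struct mock_part *part, const struct mock_job *job)
{
    uint32_t wanted = 0;   /* nothing requested unless a field is set */

    assert(part != NULL && job != NULL);
    if (job->num_tasks != UNSET_U32)
        wanted = job->num_tasks;
    if (job->min_nodes != UNSET_U32)
        wanted = job->min_nodes;
    return wanted <= part->max_nodes;
}

/* CPU check: the partition must hold at least the requested CPUs. */
static bool cpus_ok(const struct mock_part *part, const struct mock_job *job)
{
    uint32_t wanted = 0;

    assert(part != NULL && job != NULL);
    if (job->min_cpus != UNSET_U32)
        wanted = job->min_cpus;
    return wanted <= part->total_cpus;
}

/* Return the first partition that passes both checks, or NULL. */
static const struct mock_part *pick_partition(const struct mock_part *parts,
                                              size_t count,
                                              const struct mock_job *job)
{
    for (size_t i = 0; i < count; i++) {
        if (!nodes_ok(&parts[i], job))
            continue;
        if (!cpus_ok(&parts[i], job))
            continue;
        return &parts[i];
    }
    return NULL;
}
```

With a one-node "serial" entry listed ahead of an unlimited "mpi" entry, a
one-node job lands in serial and a four-node job falls through to mpi, which
is the routing I was after.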
Hope this helps someone doing something similar to what we are.
Buddy.
From: Scharfenberg, Buddy Lee
Sent: Thursday, April 23, 2015 9:10 AM
To: '[email protected]'
Subject: Job Submit plugin help
Hello all,
I've been poring over the slurm-dev archive trying to find out how to modify
the partition plugin to do my bidding, and I've come up empty. It might be
there and I just missed it, but I can't dig any longer and am resigned to
asking for help from someone more familiar with the problem than I am.
Let me start by describing my problem. We run a heterogeneous cluster with
some InfiniBand-enabled nodes and some nodes without InfiniBand. We also have
some researchers who have put money down and bought hardware to put in our
cluster, and in return we provide priority access to those nodes. My ideal
config, to get the best usage out of what is under the head node's purview, is
to have all nodes in either a serial partition or an MPI partition. These
partitions are set to preempt by requeuing the job based on partition
priority, and their priority is set low. I am then putting the researcher
nodes in a high-priority partition that is set to prevent preemption.
So my partitions look like this
Researcher owned MaxNodes=#of nodes owned
^
|
Serial MaxNodes=1
^
|
MPI MaxNodes=infinite
I would like the job_submit partition plugin to route jobs automatically based
on the number of nodes each job requires. From what I can see in the code, out
of the box it only checks that the memory per CPU requested by the user does
not exceed the partition's max mem per CPU, and that the user is a member of
the groups allowed to submit to that partition. I added a node validator block
that attempts to compare the job's min_nodes against the partition's max_nodes
setting.
static bool _valid_nodes(struct part_record *part_ptr,
                         struct job_descriptor *job_desc)
{
    uint32_t job_limit, part_limit;

    job_limit = job_desc->min_nodes;
    part_limit = part_ptr->max_nodes;
    if (job_limit > part_limit) {
        debug("job_submit/partition: skipping partition %s due to "
              "node limit (%u > %u)",
              part_ptr->name, job_limit, part_limit);
        return false;
    }
    return true;
}
Then later on I added it to the iterator alongside the existing one.
    if (!_valid_nodes(part_ptr, job_desc))
        continue;
It compiles and doesn't complain. I put the .so in my slurm libs directory and
set JobSubmitPlugins=partition in slurm.conf.
In testing I have found that my node validator only returns true when the
partition node limit is set to infinite: everything goes into MPI until I
define the max nodes for that partition, and then it simply never returns a
SLURM_SUCCESS. I don't know what I need to set to get the debugging output to
show up somewhere, so that I can look at the values it is trying to compare.
Using the debug flag on slurmctld just gets me the output of the log printed
to stdout.
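
One reading of this symptom, consistent with the sentinel comment in the
follow-up at the top of this thread, is that a job submitted without a node
count carries a "not set" value of UINT32_MAX - 1 in min_nodes, which is larger
than any finite MaxNodes. The sketch below only illustrates that arithmetic;
UNSET_U32, fits_unguarded(), fits_guarded(), and sentinel_demo() are
hypothetical names, and whether the field was actually unset in these tests is
an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed "not set" sentinel: maximum 32-bit unsigned value minus 1. */
#define UNSET_U32 (UINT32_MAX - 1)

/* Unguarded comparison, as in the validator above. */
static bool fits_unguarded(uint32_t job_min_nodes, uint32_t part_max_nodes)
{
    return job_min_nodes <= part_max_nodes;
}

/* Guarded comparison: an unset request never rules a partition out. */
static bool fits_guarded(uint32_t job_min_nodes, uint32_t part_max_nodes)
{
    if (job_min_nodes == UNSET_U32)
        return true;
    return job_min_nodes <= part_max_nodes;
}

/* Small self-check showing the difference between the two versions. */
static void sentinel_demo(void)
{
    /* Unset request against a finite limit: rejected when unguarded. */
    assert(!fits_unguarded(UNSET_U32, 1));
    /* Only an unlimited partition (UINT32_MAX) accepts it. */
    assert(fits_unguarded(UNSET_U32, UINT32_MAX));
    /* With the guard, the unset request passes the finite limit too. */
    assert(fits_guarded(UNSET_U32, 1));
    /* Real requests are still compared normally. */
    assert(!fits_guarded(4, 1));
}
```

If that reading is right, it would explain why only the partition with an
infinite node limit ever passed the check.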
Anyone have anything to offer that might help me get this configured?
Thanks,
Buddy.