Hello everyone,

I'm looking for a way to properly catch an incoming srun/salloc/etc. command, check which node it's supposed to run on, and possibly redirect it to another node of my choosing.
I want to do all of this from within the code.
My current hook point is the scheduler plugin.

My approach so far:

In the slurm_sched_p_newalloc(struct job_record *job_ptr) function of the scheduler wrapper, I'm doing the following:

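/* get the current target; bitmap2node_name() returns an xmalloc'd
 * string, so it is released with xfree() at the end */
char *alloc_node = bitmap2node_name(job_ptr->node_bitmap);

/* [logic to determine whether a node change is due; sets newnode_name] */

struct node_record *oldnode = find_node_record(alloc_node);
if (oldnode) {
        struct node_record *newnode = find_node_record(newnode_name);
        bitstr_t *t_node_bitmap = NULL;
        /* node_name2bitmap() returns 0 on success */
        if (newnode &&
            !node_name2bitmap(newnode_name, true, &t_node_bitmap)) {
                /* swap the allocation, freeing the previous values */
                FREE_NULL_BITMAP(job_ptr->node_bitmap);
                job_ptr->node_bitmap = t_node_bitmap;
                xfree(job_ptr->nodes);
                job_ptr->nodes = xstrdup(newnode_name);
                /* update the per-node counters only once the swap succeeded */
                oldnode->run_job_cnt--;
                oldnode->no_share_job_cnt--;
                newnode->run_job_cnt++;
                newnode->no_share_job_cnt++;
        } else {
                FREE_NULL_BITMAP(t_node_bitmap);
        }
}
xfree(alloc_node);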

However, this
(a) only works occasionally, depending on whether the number of CPUs requested etc. actually matches, and (b) doesn't properly set the node states (which I could also do manually, sure).

But this is neither elegant nor working properly most of the time (no surprise there). Therefore I'd like some pointers on how to properly use the internal RPC system, etc.

Any help? Thanks in advance.

Regards, M. Wagner
