[slurm-dev] Re: Re: Best Way to Schedule Jobs based on predetermined Lists

2017-04-12 Thread dani

On 12/04/2017 02:19, maviko.wag...@fau.de wrote:

> [...]
>
> b) On some of my machines I need to set power caps with RAPL
> (utilizing a binary I usually call via ssh) from within Slurm.

RAPL support exists in Slurm:
https://slurm.schedmd.com/slurm.conf.html#OPT_acct_gather_energy/rapl
You might also wish to take a look at
https://slurm.schedmd.com/slurm.conf.html#OPT_capmc_path
Possibly you can get away with just providing your own wrapper
instead of using Cray's capmc, as long as you conform to the API.
It might also help to take a look at
https://github.com/SchedMD/slurm/blob/master/src/plugins/power/cray/power_cray.c
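If you do end up keeping your own binary for the RAPL side, here is a
minimal sketch of what it might look like on Linux, assuming the
standard intel-rapl powercap sysfs interface (the exact path varies
per machine, so check /sys/class/powercap on yours first):

    /* set_powercap.c -- sketch of a tiny RAPL power-cap setter using
     * the Linux powercap sysfs interface. The path below is a common
     * location of the package-0 long-term limit on Intel machines,
     * but it is an assumption to verify per system.
     * Usage: set_powercap <microwatts>   (needs root)
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define RAPL_LIMIT \
        "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw"

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <microwatts>\n", argv[0]);
            return 1;
        }
        long uw = strtol(argv[1], NULL, 10);
        if (uw <= 0) {
            fprintf(stderr, "invalid power limit: %s\n", argv[1]);
            return 1;
        }
        FILE *f = fopen(RAPL_LIMIT, "w");
        if (!f) {
            perror(RAPL_LIMIT);
            return 1;
        }
        fprintf(f, "%ld\n", uw);        /* kernel applies the new cap */
        return fclose(f) == 0 ? 0 : 1;
    }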
> I suppose I could either try to set up custom prologs, or edit the
> existing one. I'm unsure if that's possible, however.
>
> Any advice on how I could realize that behaviour?
>
> Thanks in advance,
>
> M. Wagner
  
> On 2017-04-10 20:28, Thomas M. Payerle wrote:
>> [...]
[slurm-dev] Re: Re: Best Way to Schedule Jobs based on predetermined Lists

2017-04-11 Thread maviko.wagner


Hello Thomas and others,

Thanks again for the feedback. I agree, I don't actually need Slurm for
my small-scale cluster.
However, it's part of the baseline assignment I'm working on to use as
much established HPC software as possible.


For now I have settled on extending the standard /sched/builtin plugin
to accept advice regarding node selection based on my lists.
I'm aware that this will be working against the established scheduler,
and part of my project is to see how and why.
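(To make the idea concrete, a toy, Slurm-independent sketch of the
selection step I have in mind, assuming a precomputed per-job ranking
and a simple per-node availability flag; all names are made up for the
example:)

    /* Given a per-job ranking of node indices (best first, from the
     * precomputed lists) and a flag per node saying whether it is
     * currently idle, pick the best idle node.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static int pick_node(const int *ranked, int n, const bool *idle)
    {
        for (int i = 0; i < n; i++)
            if (idle[ranked[i]])
                return ranked[i];   /* best currently available node */
        return -1;                  /* nothing free: leave job queued */
    }

    int main(void)
    {
        /* 5 nodes; this job's list says node 3 is best, then 0, ... */
        int  ranked[] = { 3, 0, 4, 1, 2 };
        bool idle[]   = { true, false, false, false, true };
        printf("picked node %d\n", pick_node(ranked, 5, idle)); /* 0 */
        return 0;
    }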


Your suggestion of treating "features" as suggestions is interesting,
though, and I believe it could actually be integrated with the shipped
scheduler.
However, this won't be my focus for now; it might be something worth
looking into for a follow-up project.


Currently I'm looking for advice on two topics, going a bit deeper into
the code:


a) Where/how does Slurm store the information I pass with
"srun --export"?
I found both the job-desc and job-details structs; however, I can't
seem to find any info regarding the "export" envvar strings.
But they have to be stored somewhere internally, right?
I'd like to be able to access/modify them from within the scheduler if
possible.
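(For illustration, the kind of accessor I'd like to end up with -- a
speculative sketch only: the env_sup/env_cnt fields are what I believe
struct job_details in src/slurmctld/slurmctld.h uses to hold the
--export "NAME=value" strings, but the field names need verifying
against the actual tree:)

    #include <string.h>
    /* assumes the slurmctld internals from src/slurmctld/slurmctld.h */

    static char *find_exported_var(struct job_record *job_ptr,
                                   const char *name)
    {
        size_t len = strlen(name);

        if (!job_ptr->details || !job_ptr->details->env_sup)
            return NULL;
        for (int i = 0; i < job_ptr->details->env_cnt; i++) {
            char *kv = job_ptr->details->env_sup[i];
            /* each entry should look like "NAME=value" */
            if (!strncmp(kv, name, len) && kv[len] == '=')
                return kv + len + 1;    /* points at the value part */
        }
        return NULL;
    }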


b) On some of my machines I need to set power caps with RAPL (utilizing
a binary I usually call via ssh) from within Slurm.
I suppose I could either try to set up custom prologs, or edit the
existing one. I'm unsure if that's possible, however.
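(From skimming the documentation, slurm.conf does seem to let me point
at my own prolog script rather than editing anything shipped, and the
prolog runs on the allocated compute nodes at job start, which would
fit the power-cap case; a minimal sketch, path hypothetical:)

    # slurm.conf -- path hypothetical
    Prolog=/etc/slurm/set_powercap_prolog.sh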

Any advice on how I could realize that behaviour?

Thanks in advance,

M. Wagner

On 2017-04-10 20:28, Thomas M. Payerle wrote:

> [...]


[slurm-dev] Re: Re: Best Way to Schedule Jobs based on predetermined Lists

2017-04-10 Thread Thomas M. Payerle




On 2017-04-05 16:00, maviko.wag...@fau.de wrote:

> Hello Dani and Thomas,
>
> [...]
>
> However, I think I did not specify clearly enough what my cluster
> looks like and what I'm trying to achieve.
> Compared to a regular HPC cluster, my testing cluster consists of as
> little as 5 nodes (each having the same "grand-scale" features, so no
> IB nodes etc.; they differ only in hardware details like CPU, amount
> of RAM etc., and include some MCUs akin to a Raspberry Pi).
> The purpose of this cluster is to investigate how smart distribution
> of workloads based on predetermined performance and energy data can
> benefit HPC clusters that consist of heterogeneous systems that
> differ greatly regarding energy consumption and performance.
> It's just a small research project.


If your only intent is to do this on your 5-node test cluster, you
probably do not need Slurm.  If you are looking to have something expand
to real clusters, then you really should be using something like
features and the scheduler.  The scheduler is already taking into
account which nodes provide which resources, what resources on the
nodes are currently available for use, and handling the complex (and
generally, at least I found it to be much more complex than I initially
and naively thought it should be when I first started thinking about
it) task of scheduling the jobs.

My IB example was just an example.  You could just as easily assign
Slurm "features" based on the feature set of the CPU, etc.  E.g., if
only some nodes have CPUs that support AVX, label those nodes "avx",
and jobs requiring AVX can restrict themselves to such nodes.
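(A minimal sketch of what that looks like in practice; node names, CPU
counts, and the binary are illustrative:)

    # slurm.conf -- node names and CPU counts are illustrative
    NodeName=node[01-02] CPUs=8 Feature=avx
    NodeName=node[03-05] CPUs=4

    # a job that must run on AVX-capable nodes
    srun --constraint=avx ./my_avx_binary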

If you start specifying specific nodes in requests to the scheduler,
you are going to find yourself working against the scheduler, and that
is not likely to have a good outcome.  You are better off telling Slurm
which nodes have which features (essentially a one-time configuration
of the cluster), and then have your code translate the requirements
into a list of "features" requested for the job under Slurm.

The only part that I see as a potential major problem is that, as I tried to
explain previously, the "features" requested by a job are REQUIREMENTS to
Slurm, not SUGGESTIONS.  E.g., if a job can run either with or without AVX
support, but runs better with AVX support, requiring the "avx" feature will
force the job to wait for a node supporting AVX, even if all the AVX nodes
are in use and there are plenty of non-AVX nodes which are idle.

I am not aware of anything in Slurm which handles these as SUGGESTIONS,
and doing so would, I believe, greatly complicate an already complex
algorithm; anything done would need to modify the actual C code for the
scheduler.
It probably is not _too_ bad to arrange that, when a job "suggesting"
avx starts to run, it picks any avx nodes currently available to it
first, but that is likely to have only limited success.  The closest
thing currently in the Slurm code base is the logic for attempting to
keep the nodes for a job on the same leaf switch (e.g.
https://slurm.schedmd.com/topology.html), but I suspect that would be
quite complicated to handle across a number of "features".



[slurm-dev] Re: Re: Best Way to Schedule Jobs based on predetermined Lists

2017-04-05 Thread Benjamin Redling

On 05.04.2017 at 15:58, maviko.wag...@fau.de wrote:
[...]
> The purpose of this cluster is to investigate how smart distribution of
> workloads based on predetermined performance and energy data can benefit
> HPC clusters that consist of heterogeneous systems that differ greatly
> regarding energy consumption and performance.
> It's just a small research project.
[...]
> I already have all the data I need and now just need to find a way to
> integrate node selection based on these priority lists into Slurm.
>
> My idea is to write a plugin that, on job submission to Slurm, reads
> those lists, makes a smart selection, based on different criteria, of
> which currently available node would be suited best, and forwards the
> job to that node. Partition selection is not needed since I run all
> nodes in one partition for easier usage.
> The only information my plugin needs to forward besides the node name
> is some small config params in the form of environment variables on
> the target machine.

Sorry, but that sounds like trying to be more clever than the existing
scheduler.
(Occasionally somebody asks on this list for details on scheduler
development -- without deeper knowledge of Slurm -- and, as one might
expect, nothing is heard from them again... Maybe start small?)

How about providing the necessary factors to the scheduler via a
plugin? Then everybody could incorporate them via multifactor to their
heart's content. That would be really useful.
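(For reference, the stock multifactor knobs in slurm.conf look like the
following; the weights are illustrative, and a genuinely new factor
would still mean extending the multifactor plugin itself:)

    PriorityType=priority/multifactor
    PriorityWeightAge=1000
    PriorityWeightFairshare=10000
    PriorityWeightJobSize=1000
    PriorityWeightPartition=1000
    PriorityWeightQOS=1000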

> So far I did those job requests manually via:
> srun -w <node> --export="...",... <binary>
>
> I would like to include functionality into Slurm so that, upon a
> simple "srun <binary>", it supplies node selection and matching
> envVars automatically.
> Based on my current knowledge of Slurm's architecture, a plugin
> (either select or schedule) seems to be the apparent fit for what I'm
> trying to achieve.

> However, as stated in my first mail, I have not dabbled with plugin
> development/editing yet, and I kindly ask for advice from someone more
> experienced as to whether I am indeed pursuing the correct approach,
> or whether a frontend solution, albeit less elegant, would be both
> easier and better suited to the purpose of this project.

Apart from "power save" there is already infrastructure for "power
management":
https://slurm.schedmd.com/power_mgmt.html

Is yours the future non-Cray plugin?
Hope so.

All the best,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321