Re: [slurm-users] Free Gres resources

2018-02-13 Thread Nadav Toledo

  
  
This solution is even better.
I am actually using pestat for my own needs as an admin.
But I originally asked the question in order to enhance slurm_exporter,
a client-side tool for Prometheus/Grafana that exports Slurm statistics
so they can be displayed as graphs. For that purpose, squeue -o %b is
enough.
But I am sure there is a need for pestat to print the GRES info as well;
at the least, you are already helping Yair and myself.

Thanks, Nadav
On 13/02/2018 17:41, Ole Holm Nielsen wrote:

On 02/13/2018 08:13 AM, Nadav Toledo wrote:
> Does anyone know of a way to get the amount of idle GPUs per node, or
> for the whole cluster?
>
> sinfo -o %G gives the total amount of GRES resources for each node. Is
> there a way to get the idle amount, the same as you can get for CPUs
> (%C)?
> Perhaps if one used a lock file like /dev/nvidia# for each GPU, you
> could check their states?

  
  
I think printing the GRES usage for nodes is a neat idea. So I've added
a flag "-G" to my pestat command so that the GRES usage for each job on
each node is printed. The squeue command can print GRES usage using -o %b.

Could you give pestat a try to see if it fits your needs:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat

Just run "pestat -G" on your Slurm cluster.

At the moment pestat doesn't print a column of total configured GRES in
the node, but this could be added if there is interest.

Please send me feedback and comments about pestat.

/Ole


Re: [slurm-users] Free Gres resources

2018-02-13 Thread Ole Holm Nielsen
On 02/13/2018 08:13 AM, Nadav Toledo wrote:
> Does anyone know of a way to get the amount of idle GPUs per node, or
> for the whole cluster?
>
> sinfo -o %G gives the total amount of GRES resources for each node. Is
> there a way to get the idle amount, the same as you can get for CPUs
> (%C)?
> Perhaps if one used a lock file like /dev/nvidia# for each GPU, you
> could check their states?


I think printing the GRES usage for nodes is a neat idea.  So I've added 
a flag "-G" to my pestat command so that the GRES usage for each job on 
each node is printed.  The squeue command can print GRES usage using -o %b.


Could you give pestat a try to see if it fits your needs:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat

Just run "pestat -G" on your Slurm cluster.

At the moment pestat doesn't print a column of total configured GRES in 
the node, but this could be added if there is interest.


Please send me feedback and comments about pestat.

/Ole



Re: [slurm-users] Free Gres resources

2018-02-13 Thread Nadav Toledo

  
  
Thanks, that might be enough. I will check it out.

On 13/02/2018 16:33, Yair Yarom wrote:

Hi,

I haven't found a direct way. Here, I have my own script that parses the
output of "scontrol show node" and "scontrol show job", summing up and
displaying the allocated GRES.

Yair.

On Tue, Feb 13 2018, Nadav Toledo wrote:

Hello everyone,

Does anyone know of a way to get the amount of idle GPUs per node, or
for the whole cluster?

sinfo -o %G gives the total amount of GRES resources for each node. Is
there a way to get the idle amount, the same as you can get for CPUs
(%C)?
Perhaps if one used a lock file like /dev/nvidia# for each GPU, you
could check their states?

Thanks in advance, Nadav

Re: [slurm-users] Free Gres resources

2018-02-13 Thread Yair Yarom

Hi,

I haven't found a direct way. Here, I have my own script that parses the
output of "scontrol show node" and "scontrol show job", summing up and
displaying the allocated GRES.

Yair.
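The approach described above can be sketched roughly as follows. This is
not Yair's actual script, just a minimal illustration of the idea: parse
one-line-per-node output (as produced by "scontrol show node -o"), compare
the configured GPU count against the allocated count, and report the idle
GPUs. The field names (Gres=, gres/gpu= inside AllocTRES=) and the simple
"gpu:N" format are assumptions; real output varies by Slurm version and
GRES configuration (e.g. "gpu:tesla:4"), so the regexes would need
adjusting for your site.

```python
import re

def idle_gpus(scontrol_output):
    """Return {nodename: idle_gpu_count} from one-line-per-node output.

    Assumes fields of the form "Gres=gpu:N" and "gres/gpu=M" (the latter
    inside AllocTRES=); typed GRES like "gpu:tesla:N" would need a
    broader pattern.
    """
    idle = {}
    for line in scontrol_output.splitlines():
        node = re.search(r"NodeName=(\S+)", line)
        if not node:
            continue
        total = re.search(r"Gres=gpu:(\d+)", line)
        alloc = re.search(r"gres/gpu=(\d+)", line)
        if total:
            n_total = int(total.group(1))
            n_alloc = int(alloc.group(1)) if alloc else 0
            idle[node.group(1)] = n_total - n_alloc
    return idle

# Canned example; in practice you would feed in the stdout of
# subprocess.run(["scontrol", "show", "node", "-o"], ...).
sample = (
    "NodeName=gpu01 Gres=gpu:4 AllocTRES=cpu=8,mem=32G,gres/gpu=3\n"
    "NodeName=gpu02 Gres=gpu:4 AllocTRES=\n"
)
print(idle_gpus(sample))  # {'gpu01': 1, 'gpu02': 4}
```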

On Tue, Feb 13 2018, Nadav Toledo  wrote:

> Hello everyone,
>
> Does anyone know of a way to get the amount of idle GPUs per node, or
> for the whole cluster?
>
> sinfo -o %G gives the total amount of GRES resources for each node. Is
> there a way to get the idle amount, the same as you can get for CPUs
> (%C)?
> Perhaps if one used a lock file like /dev/nvidia# for each GPU, you
> could check their states?
>
> Thanks in advance, Nadav



[slurm-users] Free Gres resources

2018-02-12 Thread Nadav Toledo

  
  
Hello everyone,

Does anyone know of a way to get the amount of idle GPUs per node, or
for the whole cluster?

sinfo -o %G gives the total amount of GRES resources for each node. Is
there a way to get the idle amount, the same as you can get for CPUs
(%C)?
Perhaps if one used a lock file like /dev/nvidia# for each GPU, you
could check their states?

Thanks in advance, Nadav
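The /dev/nvidia# idea from the question above could be sketched like this
(Linux only, and purely a local heuristic, not Slurm accounting): a GPU
device node that no process currently holds open is probably idle. This
walks /proc to find open file descriptors pointing at /dev/nvidiaN; it is
an assumption-laden sketch, since processes the caller cannot inspect are
silently skipped, and a job may hold a GPU allocation without keeping the
device open.

```python
import glob
import os

def busy_nvidia_devices():
    """Return the set of /dev/nvidiaN paths some process has open.

    Heuristic only: fds of other users' processes may be unreadable
    without root, and an allocated-but-unopened GPU looks idle.
    """
    busy = set()
    for fd_path in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            target = os.readlink(fd_path)
        except OSError:
            continue  # process exited or permission denied; skip
        # Match /dev/nvidia0, /dev/nvidia1, ... but not nvidiactl/nvidia-uvm
        if target.startswith("/dev/nvidia") and target[len("/dev/nvidia"):].isdigit():
            busy.add(target)
    return busy

print(sorted(busy_nvidia_devices()))
```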