Re: [slurm-users] slurm only looking in "default" partition during scheduling

2020-07-06 Thread Williams, Jenny Avis
You cannot have two default partitions.

Slurm is picking up the last of the partitions flagged Default=YES in
slurm.conf, so compute becomes the default. Because your first srun specifies
no partition, the job is sent to the default partition, compute, and that
partition contains only a node with no gpu GRES defined, so the srun is correct
to fail. The same applies to the srun with --nodelist=slurm-gpu-1: that node is
not in the compute partition.
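One quick way to catch this is to count the PartitionName lines flagged
Default=YES. The shell function below is a minimal sketch; the name
count_default_partitions is hypothetical, and it assumes each partition is
defined on a single PartitionName= line and that Include files need not be
followed.

```shell
# Hypothetical helper: count partitions flagged Default=YES in a slurm.conf.
# Assumes one PartitionName= definition per line; Include files are not followed.
count_default_partitions() {
  # Keep only uncommented partition definitions, then count those in which
  # Default=YES appears as a whole token (case-insensitive).
  grep -iE '^[[:space:]]*PartitionName=' "$1" |
    grep -ciE '(^|[[:space:]])Default=YES([[:space:]]|$)'
}

# Usage (the path is an assumption; it varies by installation):
#   count_default_partitions /etc/slurm/slurm.conf   # anything but 1 needs fixing
```

For the configuration in the question it would report 2. Removing Default=YES
from one of the two PartitionName lines and then applying the change (for
example with scontrol reconfigure) should leave exactly one default.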

What is the output of the command
sinfo

The partition Slurm treats as the default is marked with * after its name; in
the sinfo output quoted below, that is compute*.
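To read the default partition in a script rather than by eye: sinfo's %P
format field prints each partition name with * appended to the default one.
The function below is a minimal sketch that filters such output;
default_partition is a hypothetical name.

```shell
# Hypothetical helper: read `sinfo -h -o "%P"` output on stdin and print the
# partition that carries the trailing "*" default marker, without the "*".
default_partition() {
  # sed -n with the p flag prints only lines where the substitution matched;
  # sort -u collapses repeats if a partition is listed more than once.
  sed -n 's/\*$//p' | sort -u
}

# Usage on a cluster:
#   sinfo -h -o "%P" | default_partition
```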




From: slurm-users On Behalf Of Durai Arasan
Sent: Tuesday, May 12, 2020 10:47 AM
To: slurm-users@lists.schedmd.com
Cc: benjamin.glaes...@uni-tuebingen.de
Subject: [slurm-users] slurm only looking in "default" partition during scheduling

Hi,
We have a cluster with 2 slave nodes. These are the slurm.conf lines describing 
nodes and partitions:

NodeName=slurm-gpu-1 NodeAddr=192.168.0.200  Procs=16 Gres=gpu:2 State=UNKNOWN
NodeName=slurm-gpu-2 NodeAddr=192.168.0.124  Procs=1 Gres=gpu:0 State=UNKNOWN
PartitionName=gpu Nodes=slurm-gpu-1 Default=YES MaxTime=INFINITE State=UP
PartitionName=compute Nodes=slurm-gpu-2 Default=YES MaxTime=INFINITE State=UP

Running sinfo gives the following:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
gpu          up   infinite      1   idle slurm-gpu-1
compute*     up   infinite      1   idle slurm-gpu-2

When I request a gpu job to be run using the following command:

srun --gres=gpu:2 nvidia-smi

I get the error:

srun: error: Unable to allocate resources: Requested node configuration is not available

and in slurmctld.log these are the entries:

[2020-05-12T14:33:47.578] _pick_best_nodes: JobId=55 never runnable in partition compute
[2020-05-12T14:33:47.578] _slurm_rpc_allocate_resources: Requested node configuration is not available

It seems like slurm is looking only in the partition "compute" and not in the 
other partitions.
Even if I explicitly specify the gpu node to srun it fails:

srun --nodelist=slurm-gpu-1 nvidia-smi

I get the same error:

srun: error: Unable to allocate resources: Requested node configuration is not available

and in slurmctld.log:

[2020-05-12T14:38:57.242] No nodes satisfy requirements for JobId=56 in partition compute
[2020-05-12T14:38:57.242] _slurm_rpc_allocate_resources: Requested node configuration is not available

It is still looking in partition "compute" even after specifying the node to 
srun.

But when I specify a partition, it works:

srun -p gpu nvidia-smi

But I would not like to specify the partition and would like slurm to select 
nodes based on the options specified in the srun command. Does anyone 
understand what is wrong in the setup?
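One way to get close to this without naming a single partition: srun's
-p/--partition option accepts a comma-separated list, and Slurm uses whichever
listed partition offers the earliest start, so a GPU request should land in
gpu. The sketch below builds such a list from sinfo's %R field (partition name
without the * marker). partition_list is a hypothetical helper, and the sketch
assumes every partition on the cluster is acceptable for the job.

```shell
# Hypothetical helper: join partition names (one per line, e.g. from
# `sinfo -h -o "%R"`) into the comma-separated form that srun -p accepts.
partition_list() {
  # sort -u drops repeats in case a partition is printed more than once;
  # paste -s joins all lines into one, using "," as the delimiter.
  sort -u | paste -sd, -
}

# Usage on a cluster:
#   srun -p "$(sinfo -h -o '%R' | partition_list)" --gres=gpu:2 nvidia-smi
```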

Thanks,
Durai






Re: [slurm-users] save job comment into job completion data

2020-07-06 Thread Fred Liu
I store job completion info in a MySQL database, so I can keep a complete job
history; the job accounting database (queried without -c) can't hold a long
history.


From: slurm-users On Behalf Of Ole Holm Nielsen
Sent: Monday, July 6, 2020 4:01 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] save job comment into job completion data

On 7/5/20 5:42 PM, Fred Liu wrote:
> It looks like the job comment isn't saved in the job completion data,
> because I can't see it when I use sacct -c, but I can see it when I use
> sacct (without -c).
>
> Is it possible to make it work?

The sacct manual page explains the -c parameter:

-c, --completion
Use job completion data instead of job accounting. The JobCompType
parameter in the slurm.conf file must be defined to a non-none option.

I'm not familiar with the job comment field, but you say that it is
printed without -c with the job accounting data. Why do you want or
expect this to work with the job completion data?


/Ole



Re: [slurm-users] save job comment into job completion data

2020-07-06 Thread Ole Holm Nielsen

On 7/5/20 5:42 PM, Fred Liu wrote:
> It looks like the job comment isn't saved in the job completion data,
> because I can't see it when I use sacct -c, but I can see it when I use
> sacct (without -c).
>
> Is it possible to make it work?


The sacct manual page explains the -c parameter:

  -c, --completion
 Use job completion data instead of job accounting.  The JobCompType 
parameter in the slurm.conf file must be defined to a non-none option.
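Since sacct -c only works when JobCompType is set to something other than
none, it can help to check that value first. On a running cluster,
scontrol show config reports the active setting; the function below is a
minimal sketch that reads it from a slurm.conf file instead, falling back to
the default jobcomp/none when the parameter is absent. jobcomp_type is a
hypothetical name, and Include files are not followed.

```shell
# Hypothetical helper: print the JobCompType set in a slurm.conf file,
# falling back to "jobcomp/none" (Slurm's default) if it is not set.
# Include files are not followed; the last matching line wins.
jobcomp_type() {
  local v
  # Match uncommented JobCompType lines, take the text after "=", and keep
  # only the first word (drops whitespace and trailing comments).
  v=$(grep -iE '^[[:space:]]*JobCompType[[:space:]]*=' "$1" |
        tail -n 1 | cut -d= -f2- | awk '{print $1}')
  echo "${v:-jobcomp/none}"
}

# Usage (the path is an assumption; it varies by installation):
#   jobcomp_type /etc/slurm/slurm.conf
#   scontrol show config | grep JobCompType   # value slurmctld is actually using
```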


I'm not familiar with the job comment field, but you say that it is 
printed without -c with the job accounting data.  Why do you want or 
expect this to work with the job completion data?



/Ole