[slurm-users] How to queue jobs based on non-existent features

2020-07-09 Thread Raj Sahae
Hi all,

My apologies if this is sent twice. The first time I sent it without my 
subscription to the list being complete.

I am attempting to use Slurm as a test automation system for its fairly 
advanced queueing and job control abilities, and also because it scales very 
well.
However, since our use case is a bit outside the standard usage of Slurm, we 
are hitting some issues that don’t appear to have obvious solutions.

In our current setup, the Slurm nodes are hosts attached to a test system. Our 
pipeline (greatly simplified) would be to install some software on the test 
system and then run sets of tests against it.
In our old pipeline this was done in a single job; with Slurm, I was hoping to 
decouple these two actions, since that makes the entire pipeline more robust to 
update failures and gives us finer-grained job control for the actual test run.

I would like to allow users to queue jobs with constraints indicating which 
software version they need. Then separately some automated job would scan the 
queue, see jobs that are not being allocated due to missing resources, and 
queue software installs appropriately. We attempted to do this using the 
Active/Available Features configuration. We use HealthCheck and Epilog scripts 
to scrape the test system for software properties (version, commit, etc.) and 
assign them as Features. Once an install is complete and the Features are 
updated, queued jobs would start to be allocated on those nodes.
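
For concreteness, the Epilog/HealthCheck update amounts to something like the 
following sketch (the node name, version file, and feature strings here are 
just placeholders):

# sketch only -- how an Epilog/HealthCheck script might republish features;
# /opt/testsystem/version and the Version-* naming are hypothetical
VERSION=$(cat /opt/testsystem/version)
scontrol update NodeName="$(hostname -s)" Features="Version-${VERSION}"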

Herein lies the conundrum. If a user submits a job constrained to run on 
Version A, but all nodes in the cluster are currently configured with 
Features=Version-B, Slurm will fail to queue the job, indicating an invalid 
feature specification (a sketch of this failure follows the list below). I 
completely understand why Features are implemented this way, so my question is: 
is there some workaround or other Slurm capability that I could use to achieve 
this behavior? Otherwise my options seem to be:

  1.  Go back to how we did it before. The pipeline would have the same level 
of robustness as before but at least we would still be able to leverage other 
queueing capabilities of Slurm.
  2.  Write our own Feature or Job Submit plugin that customizes this behavior 
just for us. Seems possible but adds lead time and complexity to the situation.
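
For illustration, the failure looks roughly like this (the job script and 
feature name are made up, and the exact error text may differ by Slurm 
version):

$ sbatch --constraint=Version-A run_tests.sh
sbatch: error: Batch job submission failed: Invalid feature specification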

It's not feasible to list every branch/version/commit as an AvailableFeature in 
the config, as our branch ecosystem is quite large and maintaining that 
approach would not scale well.

Thanks,

Raj Sahae  |  Manager, Software QA
3500 Deer Creek Rd, Palo Alto, CA 94304
m. +1 (408) 230-8531  | 
rsa...@tesla.com




Re: [slurm-users] priority/multifactor, sshare, and AccountingStorageEnforce

2020-07-09 Thread Paul Edmon
Try setting RawShares to something greater than 1.  I've seen it be the 
case that when you set it to 1, it creates weirdness like this.
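
Something along these lines, using the account names from your output below 
(an untested sketch -- adjust to your own tree):

# inspect what slurmdbd currently stores for the association tree
sacctmgr show assoc tree format=Account,User,Fairshare
# raise the shares on an account, e.g. from 1 to 100
sacctmgr modify account name=covid set fairshare=100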



-Paul Edmon-


On 7/9/2020 1:12 PM, Dumont, Joey wrote:


Hi,


We recently set up fair tree scheduling (we have 19.05 running), and 
are trying to use sshare to see usage information. Unfortunately, 
sshare reports all zeros, even though there seems to be data in the 
backend DB. Here's an example output:



$ sshare -l
             Account       User  RawShares  NormShares    RawUsage   NormUsage  EffectvUsage  FairShare    LevelFS    GrpTRESMins    TRESRunMins
-------------------- ---------- ---------- ----------- ----------- ----------- ------------- ---------- ---------- -------------- ------------------------------
root                                     0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
 covid                        1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
  covid-01                    1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
  covid-02                    1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
 group1                       1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
  subgroup1                   1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
   othersubgroups             1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
  subgroups                   1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
  subgroups                   4          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
  subgroups                   1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
 SUBGROUP                     1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+
 SUBGROUP                     1          0        0.00        0.00   cpu=0,mem=0,energy=0,node=0,b+




And the slurm.conf config:


ClusterName=trixie
SlurmctldHost=trixie(10.10.0.11)
SlurmctldHost=hn2(10.10.0.12)
GresTypes=gpu
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/gpfs/share/slurm/
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
ReturnToService=2
PrologFlags=x11
TaskPlugin=task/cgroup

# TIMERS
SlurmctldTimeout=60
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
#

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
FastSchedule=1

SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=600,bf_window=2880,bf_max_job_test=5000,bf_max_job_part=1000,bf_max_job_user=10,bf_max_job_start=100

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=10
PriorityWeightAge=1000
PriorityWeightPartition=1
PriorityWeightJobSize=1000
PriorityMaxAge=1-0

# LOGGING
SlurmctldDebug=3

Re: [slurm-users] changes in slurm.

2020-07-09 Thread Brian Andrus

Navin,

1. You will need to restart slurmctld when you make changes to the 
physical definition of a node. This can be done without affecting 
running jobs.


2. You can have a node in more than one partition; that will not hurt 
anything. Jobs are allocated to nodes, not partitions; the partition is 
only used to determine which node(s) are eligible and to filter/order jobs. 
You should add the node to the new partition, but also leave it in the 
'test' partition. If you are looking to remove the 'test' partition, set it 
to DOWN and, once all the running jobs in it have finished, remove it.
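
Roughly, for example (the RealMemory figure and the partition layout here 
are only illustrative):

# slurm.conf: add RealMemory (in MB) to the node line, then restart slurmctld;
# per the above, running jobs are not affected by the restart
NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=192000 State=UNKNOWN

# a node can be listed in several partitions; keep it in 'test' while adding 'normal'
PartitionName=test   Nodes=Node[1-12]
PartitionName=normal Nodes=Node[1-12]

# when retiring 'test', mark it down and remove it once its running jobs have finished
scontrol update PartitionName=test State=DOWN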


Brian Andrus

On 7/8/2020 10:57 PM, navin srivastava wrote:

Hi Team,

I have two small queries. Because of the lack of a testing environment I am 
unable to test these scenarios; I am working on setting up a test environment.


1. In my environment I am unable to pass the #SBATCH --mem=2GB option.
I found the reason is that there is no RealMemory entry in the node 
definition in slurm.conf.


NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] 
Sockets=2 CoresPerSocket=10 State=UNKNOWN


If I add RealMemory, it should be able to pick it up. So my query here 
is: is it possible to add RealMemory to the definition at any time while 
jobs are in progress, then run scontrol reconfigure and reload the daemon 
on the client nodes? Or do we need to take downtime (which I don't think 
we do)?


2. Also, I would like to know what will happen if some jobs are running 
in a partition (say, test) and I move the associated node to some other 
partition (say, normal) without draining the node. Or what happens if I 
suspend the job, change the node's partition, and then resume the job? I 
am not deleting the partition here.


Regards
Navin.

Re: [slurm-users] Automatically stop low priority jobs when submitting high priority jobs

2020-07-09 Thread Durai Arasan
Hi,

Please see job preemption:
https://slurm.schedmd.com/preempt.html
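
A minimal sketch of what that can look like in slurm.conf (partition names 
and priorities are illustrative; the page above covers the options in detail):

# suspend jobs in the lower-priority partition when the higher one needs the nodes
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=low  Nodes=ALL PriorityTier=1 Default=YES
PartitionName=high Nodes=ALL PriorityTier=10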

Best,
Durai Arasan
Zentrum für Datenverarbeitung
Tübingen

On Tue, Jul 7, 2020 at 6:45 PM zaxs84  wrote:

> Hi all.
>
> Is there a scheduler option that allows low priority jobs to be
> immediately paused (or even stopped) when jobs with higher priority are
> submitted?
>
> Related to this question, I am also a bit confused about how "scontrol
> suspend" works; my understanding is that a job that gets suspended receives
> a SIGSTOP signal, and a SIGCONT signal when resumed. If that's the case, is it
> the user's process that needs to implement the code to catch those
> signals and free resources like memory and CPU?
>
> Thanks a lot.
> M.
>

