Re: [slurm-users] About x11 support

2018-11-20 Thread Marcus Wagner
Hi Chris, On 11/20/2018 09:09 PM, Chris Samuel wrote: On Wednesday, 21 November 2018 12:16:04 AM AEDT Mahmood Naderan wrote: So, I am *guessing* that the latest version of slurm is not compatible with 1804 from Centos. In other words, something has been added/fixed in the ssh library which is n

Re: [slurm-users] Slurm Accounting Question

2018-11-20 Thread Loris Bennett
Hi Douglas, Douglas Duckworth writes: > Hi > > We are in the process of migrating several clusters from SGE to Slurm. > > We discovered that accounting does not show what command a previous > job ran. For currently running jobs scontrol for example will show: > > JobId=33028 JobName=bash > Comm

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J. wrote: > We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm > appears to be using backfill scheduling excessively. What are your SchedulerParameters? All the best, Chris -- Chris Samuel : http://www.csamuel.o

Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
On Wednesday, 21 November 2018 12:16:04 AM AEDT Mahmood Naderan wrote: > So, I am *guessing* that the latest version of slurm is not compatible with > 1804 from Centos. In other words, something has been added/fixed in the ssh > library which is now causing some mismatches. It's not getting that f

Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
On Wednesday, 21 November 2018 2:27:15 AM AEDT Christopher Benjamin Coffey wrote: > Are you using the built in slurm x11 support? Or that spank plugin? We > haven't been able to get the right combo of things in place to get the > built in x11 to work. We're using the built in X11 support with SS

[slurm-users] Slurm Accounting Question

2018-11-20 Thread Douglas Duckworth
Hi We are in the process of migrating several clusters from SGE to Slurm. We discovered that accounting does not show what command a previous job ran. For currently running jobs scontrol for example will show: JobId=33028 JobName=bash Command=bash Yet we would like to have that stored in acco

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Loris Bennett
Hi David, We have PriorityType=priority/multifactor PriorityDecayHalfLife=14-0 PriorityWeightFairshare=1000 PriorityWeightAge=1 PriorityWeightPartition=1 PriorityWeightJobSize=0 PriorityWeightQOS=1 PriorityMaxAge=7-0 PriorityCalcPeriod=5 SchedulerType=sched/bac

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Baker D . J .
Hello, Thank you for your reply and for the explanation. That makes sense -- your explanation of backfill is as we expected. I think it's more that we are surprised that almost all our jobs were being scheduled using backfill. We very rarely see any being scheduled normally. It could be that w

Re: [slurm-users] About x11 support

2018-11-20 Thread Christopher Benjamin Coffey
Hi Chris, Are you using the built in slurm x11 support? Or that spank plugin? We haven't been able to get the right combo of things in place to get the built in x11 to work. Best, Chris — Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 11/15/18, 5

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Bjørn-Helge Mevik
It might be unrelated, but I remember we had some similar problems when setting up a new cluster two years ago. I don't remember the details, but I believe it was related to QOSes overriding partition limits. Jobs in these QOSes (with requests that exceeded a partition limit like the minimum num

Re: [slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Loris Bennett
Hi David, Baker D.J. writes: > Hello, > > We are running Slurm 18.08.0 on our cluster and I am concerned that > Slurm appears to be using backfill scheduling excessively. In fact the > vast majority of jobs are being scheduled using backfill. So, for > example, I have just submitted a set of thr

Re: [slurm-users] About x11 support

2018-11-20 Thread Mahmood Naderan
I think I know what is going wrong. Actually, the bug is not related to slurm or rocks itself. It is a result of some mismatches due to updating software, including ssh, centos, rocks and slurm. Recently, I updated my rocks using "yum update". The result was fetching the latest packages o

[slurm-users] Excessive use of backfill on a cluster

2018-11-20 Thread Baker D . J .
Hello, We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm appears to be using backfill scheduling excessively. In fact the vast majority of jobs are being scheduled using backfill. So, for example, I have just submitted a set of three serial jobs. They all started on a c

Re: [slurm-users] About x11 support

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 2:51:26 AM AEDT Mahmood Naderan wrote: > With and without --x11, I am not able to see xclock on a compute node. > > [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-3 -n 1 -c 6 --mem=8G -A > y8 -p RUBY xclock > srun: error: Cannot forward to local display. Can only

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-20 Thread Chris Samuel
On Tuesday, 20 November 2018 10:12:59 PM AEDT Janne Blomqvist wrote: > I reworked the logic so that it should only be required in some special > weird cases. But that patch was several years ago, hopefully whatever > bugs were caused by it have been ironed out by now (*knocking on wood*). It's wo

Re: [slurm-users] Slurm missing non primary group memberships

2018-11-20 Thread Janne Blomqvist
On 10/11/2018 13.17, Douglas Jacobsen wrote: We've had issues getting sssd to work reliably on compute nodes (at least at scale). The reason is not fully understood, but basically if the connection times out, sssd will blacklist the server for 60s, which then causes those kinds of issues.