Hi Chris,
On 11/20/2018 09:09 PM, Chris Samuel wrote:
On Wednesday, 21 November 2018 12:16:04 AM AEDT Mahmood Naderan wrote:
So, I am *guessing* that the latest version of slurm is not compatible with
1804 from CentOS. In other words, something has been added/fixed in the ssh
library which is now causing some mismatches.
Hi Douglas,
Douglas Duckworth writes:
> Hi
>
> We are in the process of migrating several clusters from SGE to Slurm.
>
> We discovered that accounting does not show what command a previous
> job ran. For currently running jobs, scontrol, for example, will show:
>
> JobId=33028 JobName=bash
> Command=bash
On Tuesday, 20 November 2018 11:42:49 PM AEDT Baker D. J. wrote:
> We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm
> appears to be using backfill scheduling excessively.
What are your SchedulerParameters ?
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/
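For reference, the live value can be read back from the running controller
rather than hunting through slurm.conf, e.g.:

    scontrol show config | grep -i SchedulerParameters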
On Wednesday, 21 November 2018 12:16:04 AM AEDT Mahmood Naderan wrote:
> So, I am *guessing* that the latest version of slurm is not compatible with
> 1804 from CentOS. In other words, something has been added/fixed in the ssh
> library which is now causing some mismatches.
It's not getting that f
On Wednesday, 21 November 2018 2:27:15 AM AEDT Christopher Benjamin Coffey
wrote:
> Are you using the built-in Slurm X11 support? Or that SPANK plugin? We
> haven't been able to get the right combo of things in place to get the
> built-in X11 to work.
We're using the built-in X11 support with SS
Hi
We are in the process of migrating several clusters from SGE to Slurm.
We discovered that accounting does not show what command a previous job ran.
For currently running jobs, scontrol, for example, will show:
JobId=33028 JobName=bash
Command=bash
Yet we would like to have that stored in accounting.
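Since sacct does not record the command itself, one workaround (a sketch, not
the only approach; the comment text below is a hypothetical example) is to
note the command in the job's comment, which the accounting database can keep:

    # slurm.conf: store each job's comment in the accounting database
    AccountingStoreJobComment=YES

    # submit with the command recorded in the comment
    sbatch --comment='myprog --input data.txt' job.sh

    # recover it later from accounting
    sacct -j 33028 --format=JobID,JobName,Comment%60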
Hi David,
We have
PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=1000
PriorityWeightAge=1
PriorityWeightPartition=1
PriorityWeightJobSize=0
PriorityWeightQOS=1
PriorityMaxAge=7-0
PriorityCalcPeriod=5
SchedulerType=sched/backfill
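To see how those weights combine for pending jobs, sprio shows the per-job
factor breakdown and the weights currently in effect, e.g.:

    sprio -l     # long listing: age, fairshare, partition, QOS factors per job
    sprio -w     # the configured priority weights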
Hello,
Thank you for your reply and for the explanation. That makes sense -- your
explanation of backfill is as we expected. I think it's more that we are
surprised that almost all our jobs were being scheduled using backfill. We very
rarely see any being scheduled normally. It could be that w
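One way to quantify this is sdiag, which reports the main scheduler and the
backfill scheduler separately (including counts of jobs started by each since
the last restart), e.g.:

    sdiag | grep -i -A 12 'backfill'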
Hi Chris,
Are you using the built-in Slurm X11 support? Or that SPANK plugin? We haven't
been able to get the right combo of things in place to get the built-in X11 to
work.
Best,
Chris
--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
On 11/15/18, 5
It might be unrelated, but I remember we had some similar problems when
setting up a new cluster two years ago. I don't remember the details,
but I believe it was related to QOSes overriding partition limits.
Jobs in these QOSes (with requests that exceeded a partition limit like
the minimum num
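If that is the suspicion, the QOS flags and limits are visible with sacctmgr;
in particular, the OverPartQOS flag allows a QOS limit to override the
partition's, e.g.:

    sacctmgr show qos format=Name,Flags,MaxWall,MaxTRESPU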
Hi David,
Baker D.J. writes:
> Hello,
>
> We are running Slurm 18.08.0 on our cluster and I am concerned that
> Slurm appears to be using backfill scheduling excessively. In fact, the
> vast majority of jobs are being scheduled using backfill. So, for
> example, I have just submitted a set of thr
I think I know what is going wrong. Actually the bug is not related to
Slurm or Rocks itself. It is the result of mismatches caused by updating
software, including ssh, CentOS, Rocks, and Slurm.
Recently, I updated my Rocks installation using "yum update". The result was
fetching the latest packages o
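One way to avoid that on a Rocks front end (a sketch; the package globs are
assumptions for this particular setup) is to pin the cluster-critical packages
so a blanket "yum update" leaves them alone:

    # /etc/yum.conf
    exclude=slurm* openssh*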
Hello,
We are running Slurm 18.08.0 on our cluster and I am concerned that Slurm
appears to be using backfill scheduling excessively. In fact, the vast majority
of jobs are being scheduled using backfill. So, for example, I have just
submitted a set of three serial jobs. They all started on a c
On Tuesday, 20 November 2018 2:51:26 AM AEDT Mahmood Naderan wrote:
> With and without --x11, I am not able to see xclock on a compute node.
>
> [mahmood@rocks7 ~]$ srun --x11 --nodelist=compute-0-3 -n 1 -c 6 --mem=8G -A
> y8 -p RUBY xclock
> srun: error: Cannot forward to local display. Can only
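For what it's worth, the built-in forwarding needs DISPLAY on the submit host
to come from an SSH session rather than a local X server; a sketch of the
usual checklist, reusing the hostnames and options from the post above:

    ssh -X mahmood@rocks7      # log in with X11 forwarding enabled
    echo $DISPLAY              # should report something like localhost:10.0
    srun --x11 --nodelist=compute-0-3 -n 1 -c 6 --mem=8G -A y8 -p RUBY xclock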
On Tuesday, 20 November 2018 10:12:59 PM AEDT Janne Blomqvist wrote:
> I reworked the logic so that it should only be required in some special
> weird cases. But that patch was several years ago, hopefully whatever
> bugs were caused by it have been ironed out by now (*knocking on wood*).
It's wo
On 10/11/2018 13.17, Douglas Jacobsen wrote:
We've had issues getting sssd to work reliably on compute nodes (at
least at scale). The reason is not fully understood, but basically, if
the connection times out, sssd will blacklist the server for 60s,
which then causes those kinds of issues.