I certainly missed that. The documentation seems to still have the --x11 flag
in it (below).
Is there another way to use X11 via sbatch with similar behavior? This has
affected some users and I'd like to find a similarly simple solution.
The man page (19.05.0/share/man/man1/sbatch.1) and
Hi Dean,
On Thu, 30 May 2019 at 07:30, Hidas, Dean
<dhi...@bnl.gov> wrote:
Is there any idea what I might have missed?
The release notes say that the new X11 code will not work with sbatch -
https://github.com/SchedMD/slurm/blob/master/RELEASE_NOTES
NOTE: The X11 forwarding code has
Hello,
I recently upgraded from 18.08.7 to 19.05.0, and it seems that the following
is no longer accepted in an sbatch script, although it was working fine for
us on 18.x:
#SBATCH --x11=all
(or batch, first, last). sbatch exits with the following message:
sbatch: unrecognized option
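For anyone hitting the same wall: the 19.05 native X11 code only supports interactive use, so the usual workaround is to forward the display through srun or salloc instead of sbatch. A hedged sketch — option availability depends on how your site built Slurm, and the native forwarding needs PrologFlags=X11 in slurm.conf:

```shell
# Interactive alternatives to "#SBATCH --x11" after the 19.05 change.
# Verify against your site's build; PrologFlags=X11 must be set in slurm.conf.
srun --x11 --pty xterm      # forward X11 for a single interactive step
salloc srun --x11 xclock    # or run X11 steps inside an allocation
```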
I think this error usually means that your node cn7 has either the
wrong /etc/hosts or the wrong /etc/slurm/slurm.conf.
E.g. try 'srun --nodelist=cn7 ping -c 1 cn7'
On Wed, May 29, 2019 at 6:00 AM Alexander Åhman
wrote:
> Hi,
> Have a very strange problem. The cluster has been working
I believe it is still the case, but I haven't tested it. I put this in
way back when partition_job_depth was first introduced (which was eons
ago now). We run about 100 or so partitions, so this has served us well
as a general rule. What happens is that if you set partition_job_depth too
Hi Chad,
for us (also running slurm 17.11), the crucial point was the balance
between PriorityWeightFairshare, PriorityWeightAge and PriorityMaxAge.
We set the PriorityWeightAge high (higher than PriorityWeightFairshare,
in fact), so that even a job by some power user will eventually be the
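To make that balance concrete, a slurm.conf sketch; the numbers here are made up for illustration and would need tuning per site:

```
PriorityType=priority/multifactor
PriorityWeightAge=10000        # deliberately higher than fairshare, per above
PriorityWeightFairshare=5000
PriorityMaxAge=7-0             # age factor saturates after 7 days
```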
Hi Paul,
I'm wondering about this part in your SchedulerParameters:
### default_queue_depth should be some multiple of the partition_job_depth,
### ideally number_of_partitions * partition_job_depth, but typically the
### main loop exits prematurely if you go over about 400. A
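Reading those comments literally, with hypothetical numbers (say 40 partitions at depth 10, capped near the ~400 ceiling the comment mentions), the line would look like:

```
SchedulerParameters=default_queue_depth=400,partition_job_depth=10
```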
I have tried to find a network error but can't see anything. Every node
I've tested has the same (and correct) view of things.
_On node cn7:_ (the problematic one)
em1: link/ether 50:9a:4c:79:31:4d inet 10.28.3.137/24
_On login machine:_
[alex@li1 ~]$ host cn7
cn7.ydesign.se has address
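A quick way to compare what each host actually resolves (a sketch; the reverse lookup uses the cn7 address shown above):

```shell
# Run these on both the login node (li1) and the slurmctld host:
host cn7            # forward lookup: name -> address
host 10.28.3.137    # reverse lookup: address -> name
getent hosts cn7    # what NSS (/etc/hosts + DNS order) actually returns
```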
For reference we are running 18.08.7
-Paul Edmon-
On 5/29/19 10:39 AM, Paul Edmon wrote:
Sure. Here is what we have:
## Scheduling
#
### This section is specific to scheduling
### Tells the scheduler to enforce limits for all
Sure. Here is what we have:
## Scheduling
#
### This section is specific to scheduling
### Tells the scheduler to enforce limits for all partitions
### that a job submits to.
EnforcePartLimits=ALL
### Lets Slurm know that we have a
All,
We rushed our Slurm install due to a short timeframe and missed some important
items. We are now looking to implement a better system than the first in,
first out we have now. My question: are the defaults listed in the slurm.conf
file a good start? Would anyone be willing to share
I might look at these options:
*preempt_reorder_count=#*
Specify how many attempts should be made in reordering preemptable
jobs to minimize the count of jobs preempted. The default value is
1. High values may adversely impact performance. The logic to
support this option is only
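As a hedged example, this goes into SchedulerParameters in slurm.conf alongside its companion option; the value 3 is arbitrary:

```
SchedulerParameters=preempt_reorder_count=3,preempt_strict_order
```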
Ok, thanks, we will look into that! We thought we were the only ones who had
the problem, and yes, it's like Windows 98SE: you can try all you want, but
eventually we end up rebooting the nodes. Interns are starting to show up, and
you know they can bend a cluster in ways you've never seen before. We
Hi Alexander,
The error "can't find address for host cn7" would indicate a DNS
problem. What is the output of "host cn7" from the srun host li1?
How many network devices are in your subnet? It may be that the Linux
kernel is doing "ARP cache thrashing" if the number of devices approaches
Hi,
Have a very strange problem. The cluster has been working just fine
until one node died and now I can't submit jobs to 2 of the nodes using
srun from the login machine. Using sbatch works just fine and also if I
use srun from the same host as slurmctld.
All the other nodes work just fine
I am relatively new to SLURM, and am having difficulty configuring our
scheduling to behave as we'd like.
Partition based job preemption is configured as follows:
PreemptType=preempt/partition_prio
PreemptMode=suspend,gang
This has been working fine. However, we recently added an older
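For context, a minimal partition_prio setup looks roughly like this; the node lists and tier values are placeholders, with the higher-PriorityTier partition able to suspend jobs in the lower one:

```
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG
PartitionName=high Nodes=cn[1-4] PriorityTier=2 PreemptMode=off
PartitionName=low  Nodes=cn[1-4] PriorityTier=1 PreemptMode=suspend
```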
Paddy Doyle writes:
> Hi Jacob,
>
> On Tue, May 28, 2019 at 11:38:23AM -0400, Jacob Chappell wrote:
>
>> Hello all,
>>
>> Is it possible in Slurm to check RawUsage against GrpTRESMins and prevent a
>> job from being submitted if the RawUsage exceeds the GrpTRESMins? My center
>> needs this
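If I understand the question, the usual route is not a submit-time check but limit enforcement through accounting; a sketch, with the account name and limit value purely hypothetical:

```shell
# slurm.conf must enforce limits; the "safe" flag keeps jobs pending
# rather than starting them once the association would exceed its limits:
#   AccountingStorageEnforce=limits,safe
# Then set the limit on the account (name and value are placeholders):
sacctmgr modify account mylab set GrpTRESMins=cpu=100000
```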
Hi,
Check the UnkillableStepProgram and UnkillableStepTimeout options in
slurm.conf.
We use it to drain the stuck nodes and mail us - as here, usually stuck
processes will require a reboot. As the drained strigger will never get
triggered, we also set a finished trigger for the next RUNNING job.
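For anyone searching later, a sketch of the relevant slurm.conf lines; the script path is a placeholder for a site-local script that drains the node and mails the admins:

```
UnkillableStepProgram=/usr/local/sbin/notify_unkillable.sh
UnkillableStepTimeout=120    # seconds before the program fires
```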