Thank you! It appears that setting SelectTypeParameters=CR_CPU does what I
want.
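For anyone who finds this later, a minimal sketch of the lines involved (the node name, CPU count, and device paths are illustrative assumptions, not Marc's actual config):

```
# slurm.conf (fragment)
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
GresTypes=gpu
NodeName=gpunode01 CPUs=16 Gres=gpu:2 State=UNKNOWN

# gres.conf on the node
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

With CR_CPU, CPUs are the consumable resource, so two jobs that each request a subset of the node's CPUs (and one GPU each via --gres=gpu:1) can run side by side.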
On Thu, Feb 9, 2017 at 11:55 AM, Allan Streib wrote:
>
> I just got something similar working. I used CR_CPU for
> SelectTypeParameters.
>
> Marc Rollins writes:
>
>
I just got something similar working. I used CR_CPU for SelectTypeParameters.
Marc Rollins writes:
> Hello,
>
> I am attempting to run multiple jobs simultaneously on a node with two GPUs.
> However, all my attempts fail. Both
> jobs are queued, but only one runs at
Hello,
I am attempting to run multiple jobs simultaneously on a node with two
GPUs. However, all my attempts fail. Both jobs are queued, but only one
runs at a time while the other remains in the queue. My SLURM configuration
is below. Any assistance is greatly appreciated.
# slurm.conf
I have used ulimits in the past to limit users to 768MB of RAM per process.
This seemed to be enough to run anything they were actually supposed to be
running. I would use cgroups on a more modern system (this was RHEL5).
A related question: we used cgroups on a CentOS 6 system, but then switched our
If you're interested in the programmatic method I mentioned to increase
limits for file transfers,
https://github.com/BYUHPC/uft/tree/master/cputime_controls might be
worth looking at. It works well for us, though a user will occasionally
start using a new file transfer program that you
That reminds me, we also don't allow file transfers through the head node:
chmod 750 /usr/bin/sftp /usr/bin/scp /usr/bin/rsync
All file transfer operations must go through one of the file servers.
On 02/09/17 12:13, Nicholas McCollum wrote:
While this isn't a SLURM issue, it's something we all face. Due to my
system being primarily students, it's something I face a lot.
I second the use of ulimits, although this can kill off long-running
file transfers. What you can do to help out users is set a low soft
limit and a somewhat
We limit CPU time in /etc/security/limits.conf so that user processes get a
maximum of 10 minutes. It doesn't eliminate the problem completely, but it's
fairly effective on users who misunderstood the role of login nodes.
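Such an entry might look like the following (values are illustrative; limits.conf expresses CPU time in minutes):

```
# /etc/security/limits.conf (fragment)
# hard CPU-time cap of 10 minutes per process, root exempted
*       hard    cpu     10
root    hard    cpu     unlimited
```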
On Thu, Feb 9, 2017 at 6:38 PM +0100, "Jason Bacon"
We simply make it impossible to run computational software on the head
nodes.
1. No scientific software packages are installed on the local disk.
2. Our NFS-mounted application directory is mounted with noexec.
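The noexec mount might look like this in /etc/fstab (server name and paths are hypothetical):

```
# /etc/fstab (fragment)
apps-server:/export/apps  /usr/local/apps  nfs  ro,noexec,nosuid,nodev  0 0
```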
Regards,
Jason
On 02/09/17 07:09, John Hearns wrote:
Thanks to Ryan, Sarlo and Sean.
> "Killed" isn't usually a helpful error message that they understand.
Au contraire, I usually find that is a message they understand. Pour
encourager les autres, you understand.
-Original Message-
From: Ryan Cox [mailto:ryan_...@byu.edu]
Sent: 09
John,
We use /etc/security/limits.conf to set CPU-time limits on processes:
* hard cpu 60
root hard cpu unlimited
It works pretty well, but long-running file transfers can get killed. We
have a script that periodically looks for whitelisted programs and removes
the limit from them. We
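A rough Python sketch of such a periodic whitelist pass (the program names are assumptions, and `resource.prlimit` is Linux-only, Python 3.4+; the real BYU script linked above may work quite differently):

```python
import os
import resource

# Hypothetical whitelist, matched against /proc/<pid>/comm
WHITELIST = {"scp", "sftp", "rsync"}

def whitelisted_pids(proc_root="/proc"):
    """Yield the pid of every running process whose command name is whitelisted."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm in WHITELIST:
            yield int(entry)

def lift_cpu_limit(pid):
    """Remove the CPU-time rlimit from a running process (needs root)."""
    unlimited = (resource.RLIM_INFINITY, resource.RLIM_INFINITY)
    resource.prlimit(pid, resource.RLIMIT_CPU, unlimited)
```

Run from cron every few minutes as root: `for pid in whitelisted_pids(): lift_cpu_limit(pid)`.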
Hi Daniel,
Daniel Ruiz Molina writes:
> Hi,
>
> I'm adding user to accounts in accounting information. However, some users in
> my
> system have capital letters and when I try to add them to their account,
> sacctmgr returns this message: "There is no uid for user
Hi,
We use cgroups to limit usage to 3 cores and 4G of memory on the head nodes. I
didn't set this up myself, but I'll copy and paste our documentation below.
Those limits (3 cores and 4G) are global to all non-root users, I think, as they
apply to a group. We obviously don't do this on the compute nodes.
We also
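A sketch of what such a setup might look like with the CentOS 6 cgroup tools (the group name and values are assumptions, not the poster's actual files):

```
# /etc/cgconfig.conf (fragment)
group interactive {
    cpuset {
        cpuset.cpus = "0-2";
        cpuset.mems = "0";
    }
    memory {
        memory.limit_in_bytes = 4294967296;  # 4 GiB
    }
}

# /etc/cgrules.conf -- first match wins: root is exempt, everyone else confined
root    cpuset,memory   /
*       cpuset,memory   interactive
```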
You need the nvidia kernel module to have been loaded since machine boot. This
script worked for me.
I use CentOS. You can add it to init.d.
Regards
El 09/02/17 a las 13:56, Christian Goll escribió:
Hello Daniel,
do /dev/nvidia[0-1] exist on the machines?
If not see under
Hi,
I'm adding users to accounts in the accounting information. However, some
users in my system have capital letters and when I try to add them to
their account, sacctmgr returns this message: "There is no uid for user
'MY_USER' Are you sure you want to continue?".
Then, if I answer "y", the user is
Does anyone have a good suggestion for this problem?
On a cluster I am implementing I noticed a user is running a code on 16 cores,
on one of the login nodes, outside the batch system.
What are the accepted techniques to combat this? Other than applying a LART, if
you all know what this means.
Hello Daniel,
do /dev/nvidia[0-1] exist on the machines?
If not see under
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/
there is a shell script which creates the device nodes for you. They are not
always created during startup, especially if X is not running on the system.
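As a rough sketch of what that script does (a hedged reimplementation, not NVIDIA's actual script; 195 is the NVIDIA character-device major number, and actually creating the nodes requires root):

```python
import os
import stat

def nvidia_device_nodes(gpu_count):
    """Return (path, major, minor) for each device node the driver expects."""
    nodes = [("/dev/nvidia%d" % i, 195, i) for i in range(gpu_count)]
    nodes.append(("/dev/nvidiactl", 195, 255))
    return nodes

def create_missing_nodes(gpu_count):
    """Create any /dev/nvidia* nodes that are missing (must run as root)."""
    for path, major, minor in nvidia_device_nodes(gpu_count):
        if not os.path.exists(path):
            os.mknod(path, 0o666 | stat.S_IFCHR, os.makedev(major, minor))
```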
Kind regards,
Sean, much thankyou.
Guinness owed if I am ever in Temple Bar soon.
-Original Message-
From: Sean McGrath [mailto:smcg...@tchpc.tcd.ie]
Sent: 09 February 2017 11:58
To: slurm-dev
Subject: [slurm-dev] Re: Abaqus with Slurm
Hi,
We have slurm 16.05.4 and the latest version of Abaqus we use is 6.14. I
remember running into a similar problem with Abaqus so I wrote some bad bash to
populate the host list file; http://www.tchpc.tcd.ie/node/1261
The GitHub script seems to be doing something similar, but in a better way.
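For what it's worth, the same idea could be sketched in Python (an assumption-laden sketch, not Sean's script: it presumes the usual SLURM_* environment variables and that scontrol is on PATH):

```python
import os
import subprocess

def mp_host_list(hosts, cpus_per_node):
    """Format expanded hostnames the way Abaqus' env file expects mp_host_list,
    e.g. [['node001', 16], ['node002', 16]]."""
    return [[host, cpus_per_node] for host in hosts]

def slurm_mp_host_list():
    """Expand SLURM_JOB_NODELIST with scontrol and pair each host with its CPU count."""
    hosts = subprocess.run(
        ["scontrol", "show", "hostnames", os.environ["SLURM_JOB_NODELIST"]],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return mp_host_list(hosts, int(os.environ.get("SLURM_CPUS_ON_NODE", "1")))
```

The resulting list can be appended as `mp_host_list=...` to the job's Abaqus environment file before launching the solver.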
Hi,
In my GPU cluster, the slurmd daemon doesn't start correctly because, when the
daemon starts, it doesn't find the /dev/nvidia[0-1] devices (mapped in
gres.conf). To solve this problem, I have added the attribute
"ExecStartPre=@/usr/bin/nvidia-smi >/dev/null" to the service file, and now
the daemon starts
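An alternative to editing the unit file itself is a systemd drop-in (a sketch; the path shown is the conventional override location, and the "-" prefix tells systemd to ignore a non-zero exit from the pre-start command):

```
# /etc/systemd/system/slurmd.service.d/nvidia.conf
[Service]
ExecStartPre=-/usr/bin/nvidia-smi
```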
I would guess quite a few sites are using Abaqus with Slurm. I would be
grateful for some pointers on the submission scripts for MPI parallel Abaqus
runs.
I am setting up Abaqus version 6.14-1 on a system with Slurm 16.05 and an
Omnipath interconnect.
Specifically I am using this script to