--task-prolog and --task-epilog, which should
be short, otherwise they will be killed.
Also you could have a look into slurm.conf options like: KillWait, WaitTime and
all the Timeouts (not all of them are useful for your case though).
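For illustration only (the values below are just examples, not recommendations), the relevant lines in slurm.conf could look like:

    # slurm.conf -- example values, tune them for your site
    KillWait=30     # seconds between SIGTERM and SIGKILL when a job hits its time limit
    WaitTime=0      # default for srun --wait (0 disables the feature)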
Best Regards,
Chrysovalantis Paschoulas
On 12.04.2017 23:40, Roger Moye wrote:
privileges are again in effect.
This patch was created for Slurm git tag slurm-16-05-8-1:
{{{
From 141474a9fae4cec475bdab09b01205b395ec176d Mon Sep 17 00:00:00 2001
From: Chrysovalantis Paschoulas <c.paschou...@fz-juelich.de>
Date: Mon, 3 Apr 2017 15:14:39 +0200
Subject: [PATCH 1/1] Bring
when you get a shell
on the compute nodes after calling salloc. Could you please give us the
output of the command "cat /proc/$$/status | grep Groups" after calling
salloc? Try this with and without CacheGroups enabled.
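For example, something along these lines (node count is a placeholder):

    salloc -N1
    cat /proc/$$/status | grep Groups   # supplementary groups seen by the current shell
    id                                  # compare with what the name service reports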
Thanks!
Valantis
On 05/25/2016 04:10 PM, Chrysovalantis Paschoulas wrote:
017(thekla) gid=5000(cstrc) groups=5000(cstrc),10257(build)
We have had this problem before and we have upgraded a few days ago to
version 15.08.11 to see if a newer version resolves the problem but
unfortunately the result is the same.
Thanks,
Thekla
On 25/05/2016 04:48 PM, Chrysovalantis Paschoulas wrote:
SLURM) and the only differences are the SLURM environment variables
and nothing else.
Thanks,
Thekla
On 25/05/2016 02:07 PM, Chrysovalantis Paschoulas wrote:
Hi Thekla! :)
For me it looks like it's a configuration issue of the client LDAP name
service on the compute nodes. Which service
) but shouldn't change the behavior of commands
like id.
Best Regards,
Chrysovalantis Paschoulas
On 05/25/2016 10:32 AM, Thekla Loizou wrote:
Dear all,
We have noticed a very strange problem every time we add an existing
user to a secondary group.
We manage our users in LDAP. When we add a user
have to handle the IPC message.
3. Just use the the --signal option of sbatch where slurm will send the
signal you specify to your c++ program when you are close to the
walltime. Your c++ program must handle that signal.
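A rough sketch of option 3 (the signal name, the 300-second lead time and the program name are placeholders):

    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --signal=USR1@300     # Slurm sends SIGUSR1 to the job steps ~300s before the walltime
    srun ./my_cpp_program         # the program must install a handler for SIGUSR1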
Best Regards,
Chrysovalantis Paschoulas
On 04/12/2016 01:03 PM, Amit Goel wrote:
should try some
sacctmgr commands to check if it works correctly).
You should always be careful with the (major) versions of slurm you jump
with your upgrades. Check the available online documentation.
Best Regards,
Chrysovalantis Paschoulas
PS: a good idea would also be to backup your mysql DB
. In any case follow the online
documentation for upgrading.
Cheers
Chrysovalantis Paschoulas
PS: Always upgrade and start slurmdbd and let it do the updates in the
database and then start slurmctld. And when everything is OK with the
controllers, then you can upgrade Slurm also on the compute nodes.
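A rough sketch of that order (service names and the database name slurm_acct_db are assumptions based on common defaults):

    mysqldump slurm_acct_db > slurm_acct_db.backup.sql   # backup before touching anything
    systemctl stop slurmdbd       # then upgrade the slurmdbd package
    systemctl start slurmdbd      # slurmdbd converts the database schema on startup
    systemctl restart slurmctld   # only after slurmdbd is healthy again
    # finally upgrade and restart slurmd on the compute nodes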
Hi!
In the man page of slurmdbd.conf you can find the parameter PrivateData.
So I guess you should set: PrivateData=accounts,usage,users in file
slurmdbd.conf.
It is the same parameter in both slurm.conf and slurmdbd.conf ;)
Regards,
Chrysovalantis Paschoulas
PS: RTFM :P
On
-in-slurm.conf - /sharedfs/slurm/run/master2.pid
Cheers,
Chrysovalantis Paschoulas
On 06/10/2015 04:47 AM, Qianqian Sha wrote:
Hi,
We store all Slurm logs/pids/tmpfiles on a shared disk. The logs/pids/tmpfiles
of different slurmds can be distinguished by nodename (%n) or hostname (%h). But
it seems
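As a sketch (the paths are assumptions), per-node files can be configured in slurm.conf with %n, for example:

    SlurmdLogFile=/sharedfs/slurm/log/slurmd-%n.log    # %n expands to the node name
    SlurmdPidFile=/sharedfs/slurm/run/slurmd-%n.pid
    SlurmdSpoolDir=/sharedfs/slurm/spool/slurmd-%n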
Hi Loris!
What I would do in your case is the following:
- First option:
I would use 2 QoSs: the first one would be the default normal QoS with
the normal limits and then second would be the restrict QoS with the
more restricting limits for the specific partition. In each user
association I
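A possible sketch of the first option's setup (the QoS name, the limit and the user name are placeholders):

    sacctmgr add qos restrict                                  # the stricter QoS
    sacctmgr modify qos restrict set MaxWall=02:00:00          # example limit only
    sacctmgr modify user where name=alice set qos+=restrict    # add it to the user's association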
Hello!
Having a quick look in your email I found a simple mistake:
CPUs should be the total number of virtual cores per node, not the number
of sockets.
So set: CPUs=32 and maybe your problems will be solved ;)
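A hedged example of the node definition (socket/core/thread counts and memory are assumptions for a 32-thread node):

    NodeName=node01 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 CPUs=32 RealMemory=64000 State=UNKNOWN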
Best Regards,
Chrysovalantis Paschoulas
On 11/27/2014 12:41 PM, Michael Lush wrote:
Try the option --overcommit from srun.
Also check the man page. You cannot exceed the MAX_TASKS_PER_NODE limit of
your configuration.
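For example (node/task counts and the program name are placeholders):

    srun -N2 -n128 --overcommit ./my_app   # more tasks than CPUs, still subject to MAX_TASKS_PER_NODE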
Cheers,
Valantis
On 11/27/2014 01:19 PM, Michael Lush wrote:
So set: CPUs=32 and maybe your problems will be solved ;)
Thank you for answering :-)
That
Hi!
You need 2 packages installed on the system (in my case it is a RHEL-based
distro) where you build Slurm: hwloc and hwloc-devel. Also, you don't need
the .rpmmacros entry for hwloc if you have installed these packages; by default
this option is enabled. ;) Btw, I have never used a custom
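A minimal sketch of the build steps on a RHEL-based host (the tarball name is a placeholder):

    yum install hwloc hwloc-devel           # needed so hwloc support gets built
    rpmbuild -ta slurm-<version>.tar.bz2    # hwloc support is then enabled by default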
enough I am sure you will get the desired result.
In our case we had set for example: I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1
which solved similar problems if I remember correctly.
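If it helps, a sketch of where we set it (inside the job script, before launching the MPI program; the program name is a placeholder):

    export I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1
    srun ./my_mpi_app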
Best Regards,
Chrysovalantis Paschoulas
On 10/20/2014 06:46 PM, Tingyang Xu wrote:
To whom it may concern,
Hello
which
processes are started? In the end I would suggest you use a different
machine as a compute node. If you use VMs then it shouldn't be difficult
to set up a second VM ;)
I hope this will help you!
Best Regards,
Chrysovalantis Paschoulas
On 10/09/2014 09:11 PM, Uwe Sauter wrote:
And port
Hello!
You have to check if the file /usr/lib64/slurm/job_submit_defaults.so
exists on your system. This is included in package/rpm slurm-plugins.
If it is not there, you have to rebuild with the right options (I am not
sure right now what the required options are, I have to check).
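To check, something like the following (assuming a standard RPM installation):

    ls -l /usr/lib64/slurm/job_submit_defaults.so   # is the plugin there?
    rpm -ql slurm-plugins | grep job_submit         # which job_submit plugins the rpm provides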
Best
on the compute nodes then you should
execute:
salloc -N1 -p active srun -N1 --pty sh
or directly an srun command (without salloc involved):
srun -N1 -p active --pty sh
Best Regards,
Chrysovalantis Paschoulas
On 09/12/2014 09:19 AM, Sergio Iserte wrote:
Hello Eva,
you must remove
Regards,
Chrysovalantis Paschoulas
On 09/10/2014 03:50 PM, Erica Riello wrote:
Hi,
The manual says the default value is 300.
Erica
2014-09-09 14:28 GMT-03:00 Hill, Marti Torrey
mh...@lanl.gov:
I found it MinJobAge.
From: Hill, Marti Torrey [mailto:mh...@lanl.gov]
/application.
Best Regards,
Chrysovalantis Paschoulas
On 09/09/2014 05:14 PM, Michal Zielinski wrote:
Phil,
I believe that 1 core per node is correct.
Maybe let me ask this first: is it possible for a single task to use specific CPUs
across several nodes?
Thanks,
Mike
On Tue, Sep 9, 2014 at 10:38 AM
helpful to understand the reasons for your
problems ;)
Best Regards,
Chrysovalantis Paschoulas
Forschungszentrum Juelich - Juelich Supercomputing Centre
On 09/08/2014 02:53 PM, Erica Riello wrote:
Hello all,
I have 2 machines running Slurm 14.03.07, called torquepbs and torquepbsno1.
Slurmctld
? We want a good solution so that even the accounting will always be HA.
I am waiting for your interesting answers and thanks in advance!!!
Best Regards,
Chrysovalantis Paschoulas
Juelich Supercomputing Centre
Forschungszentrum Juelich
by Chrysovalantis Paschoulas, Juelich
Supercomputing Centre - Forschungszentrum Juelich
# Initial Script by (C) Olli-Pekka Lehto - CSC IT Center for Science Ltd.
# **
# Usage message
USAGE=
USAGE
$(basename $0) [ [-h] | [-v] [-x
/c667a9952e6674b85289cb92c62fd5b7bece9adf
Quoting Chrysovalantis Paschoulas cpaschou...@gmail.com:
Dear developers,
we have a small testing cluster with 44 compute nodes, the OS is
Scientific
Linux 6.4 and we are using SLURM. The first installed version of Slurm was
2.5.7 and we have upgraded Slurm
version 2.6.4?? It
could be also possible that the default init script has some other issues
that we haven't found yet, so it would be nice if you could check it more
deeply. Thanks in advance!
Best Regards,
Chrysovalantis Paschoulas
Juelich Supercomputing Centre
Forschungszentrum Juelich, Germany