[slurm-dev] Re: How to allow Epilog script to run for job that is cancelled

2017-04-13 Thread Chrysovalantis Paschoulas
ask-prolog and --task-epilog which should be short otherwise killed. Also you could have a look into slurm.conf options like: KillWait, WaitTime and all the Timeouts (not all of them are useful for your case though). Best Regards, Chrysovalantis Paschoulas On 12.04.2017 23:40, Roger Moye wro

[slurm-dev] src/slurmctld/job_mgr.c: Bring back coordinator privileges for updating jobs

2017-04-03 Thread Chrysovalantis Paschoulas
privileges are again in effect. This patch was created for Slurm git tag slurm-16-05-8-1: {{{ From 141474a9fae4cec475bdab09b01205b395ec176d Mon Sep 17 00:00:00 2001 From: Chrysovalantis Paschoulas <c.paschou...@fz-juelich.de> Date: Mon, 3 Apr 2017 15:14:39 +0200 Subject: [PATCH 1/1] Bring

[slurm-dev] Re: Problem when adding user to secondary group

2016-05-25 Thread Chrysovalantis Paschoulas
when you get a shell on the compute nodes after calling salloc. Could you please give us the output of the command "cat /proc/$$/status | grep Groups" after calling salloc? Try this with and without CacheGroups enabled. Thanks! Valantis On 05/25/2016 04:10 PM, Chrysovalantis Pascho

[slurm-dev] Re: Problem when adding user to secondary group

2016-05-25 Thread Chrysovalantis Paschoulas
017(thekla) gid=5000(cstrc) groups=5000(cstrc),10257(build) We have had this problem before and we have upgraded a few days ago to version 15.08.11 to see if a newer version resolves the problem but unfortunately the result is the same. Thanks, Thekla On 25/05/2016 04:48 PM, Chrysovalantis Pascho

[slurm-dev] Re: Problem when adding user to secondary group

2016-05-25 Thread Chrysovalantis Paschoulas
SLURM) and the only differences are the SLURM environment variables and nothing else. Thanks, Thekla On 25/05/2016 02:07 PM, Chrysovalantis Paschoulas wrote: Hi Thekla! :) For me it looks like it's a configuration issue of the client LDAP name service on the compute nodes. Which service

[slurm-dev] Re: Problem when adding user to secondary group

2016-05-25 Thread Chrysovalantis Paschoulas
) but shouldn't change the behavior of commands like id.. Best Regards, Chrysovalantis Paschoulas On 05/25/2016 10:32 AM, Thekla Loizou wrote: Dear all, We have noticed a very strange problem every time we add an existing user to a secondary group. We manage our users in LDAP. When we add a user

[slurm-dev] Re: Finding out time in process

2016-04-12 Thread Chrysovalantis Paschoulas
have to handle the IPC message. 3. Just use the --signal option of sbatch, where Slurm will send the signal you specify to your C++ program when you are close to the walltime. Your C++ program must handle that signal. Best Regards, Chrysovalantis Paschoulas On 04/12/2016 01:03 PM, Amit Goel

[slurm-dev] Re: Problem while updating to new slurm version

2015-10-16 Thread Chrysovalantis Paschoulas
should try some sacctmgr commands to check if it works correctly). You should always be careful with the (major) versions of slurm you jump with your upgrades. Check the available online documentation. Best Regards, Chrysovalantis Paschoulas PS: a good idea would also be to backup your mysql DB

[slurm-dev] Re: Update order for HA configuration

2015-10-16 Thread Chrysovalantis Paschoulas
. In any case follow the online documentation for upgrading. Cheers Chrysovalantis Paschoulas PS: Always upgrade and start slurmdbd and let it do the updates in the database and then start slurmctld. And when everything is OK with the controllers then you can upgrade Slurm also on the compute

[slurm-dev] Re: PrivateData - slurm.conf

2015-07-03 Thread Chrysovalantis Paschoulas
Hi! In the man page of slurmdbd.conf you can find the parameter PrivateData. So I guess you should set: PrivateData=accounts,usage,users in file slurmdbd.conf. It is the same parameter in both slurm.conf and slurmdbd.conf ;) Regards, Chrysovalantis Paschoulas PS: RTFM :P On

[slurm-dev] Re: How to distinguish slurmctld.pid of server and backup-server on a shared disk(slurm 14.11)

2015-06-10 Thread Chrysovalantis Paschoulas
-in-slurm.conf - /sharedfs/slurm/run/master2.pid Cheers, Chrysovalantis Paschoulas On 06/10/2015 04:47 AM, Qianqian Sha wrote: Hi, We store all slurm logs/pids/tmpfiles on a shared disk. logs/pids/tmpfiles of different slurmds can be distinguished by nodename(%n) or hostname(%h). But it seems

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Chrysovalantis Paschoulas
Hi Loris! What I would do in your case is the following: - First option: I would use 2 QoSs: the first one would be the default normal QoS with the normal limits and then second would be the restrict QoS with the more restricting limits for the specific partition. In each user association I

[slurm-dev] Re: overcommit on independent jobs

2014-11-27 Thread Chrysovalantis Paschoulas
Hello! Having a quick look at your email I found a simple mistake: CPUs should be the total number of Virtual Cores per node and not the number of Sockets. So set: CPUs=32 and maybe your problems will be solved ;) Best Regards, Chrysovalantis Paschoulas On 11/27/2014 12:41 PM, Michael
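As a rough illustration of the arithmetic behind that advice: the total number of virtual cores on a node is the product of sockets, cores per socket and hardware threads per core. The layout used below (2 sockets, 8 cores each, 2 threads per core) is a hypothetical one chosen only because it yields 32; the original message does not state the actual hardware.

```python
def total_cpus(sockets, cores_per_socket, threads_per_core):
    """Return the number of virtual cores (logical CPUs) on one node."""
    return sockets * cores_per_socket * threads_per_core


# Hypothetical node layout, not taken from the thread.
cpus = total_cpus(sockets=2, cores_per_socket=8, threads_per_core=2)
print(f"CPUs={cpus}")
```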

[slurm-dev] Re: overcommit on independent jobs

2014-11-27 Thread Chrysovalantis Paschoulas
Try the option --overcommit from srun. Check also the man page. You cannot exceed the MAX_TASKS_PER_NODE limit of your configuration. Cheers, Valantis On 11/27/2014 01:19 PM, Michael Lush wrote: So set: CPUs=32 and maybe your problems will be solved ;) Thank you for answering :-) That

[slurm-dev] Re: building slurm with rpmbuild and hwloc support

2014-10-27 Thread Chrysovalantis Paschoulas
Hi! You need 2 packages installed on the system (in my case it is a RHEL-based distro) where you build Slurm: hwloc and hwloc-devel. You also don't need the .rpmmacros for hwloc if you have installed these packages. By default this option is enabled. ;) Btw, I have never used a custom

[slurm-dev] Re: slurm cannot work with Infiniband after rebooting

2014-10-27 Thread Chrysovalantis Paschoulas
enough I am sure you will get the desired result. In our case we had set for example: I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1 which solved similar problems if I remember correctly. Best Regards, Chrysovalantis Paschoulas On 10/20/2014 06:46 PM, Tingyang Xu wrote: To whom it may concern, Hello

[slurm-dev] Re: Sample slurm.conf

2014-10-10 Thread Chrysovalantis Paschoulas
which processes are started? In the end I would suggest that you use a different machine as a compute node. If you use VMs then it shouldn't be difficult to set up a second VM ;) I hope this will help you! Best Regards, Chrysovalantis Paschoulas On 10/09/2014 09:11 PM, Uwe Sauter wrote: And port

[slurm-dev] Re: jobfilter plugin

2014-09-25 Thread Chrysovalantis Paschoulas
Hello! You have to check if the file /usr/lib64/slurm/job_submit_defaults.so exists on your system. This is included in package/rpm slurm-plugins. If it is not there, you have to rebuild with the right options (I am not sure right now what are the required options, I have to check). Best

[slurm-dev] Re: slurm salloc

2014-09-12 Thread Chrysovalantis Paschoulas
on the compute nodes then you should execute: salloc -N1 -p active srun -N1 --pty sh or directly an srun command (without salloc involved): srun -N1 -p active --pty sh Best Regards, Chrysovalantis Paschoulas On 09/12/2014 09:19 AM, Sergio Iserte wrote: Hello Eva, you must remove

[slurm-dev] RE: completed job information

2014-09-10 Thread Chrysovalantis Paschoulas
Regards, Chrysovalantis Paschoulas On 09/10/2014 03:50 PM, Erica Riello wrote: Hi, The manual says the default value is 300. Erica 2014-09-09 14:28 GMT-03:00 Hill, Marti Torrey <mh...@lanl.gov>: I found it: MinJobAge. From: Hill, Marti Torrey [mailto:mh...@lanl.govmailto:mh

[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-09 Thread Chrysovalantis Paschoulas
/application. Best Regards, Chrysovalantis Paschoulas On 09/09/2014 05:14 PM, Michal Zielinski wrote: Phil, I believe that 1 core per node is correct. Maybe let me ask this first: is it possible for a single task to use specific CPUs across several nodes? Thanks, Mike On Tue, Sep 9, 2014 at 10:38 AM

[slurm-dev] Re: cluster nodes down

2014-09-08 Thread Chrysovalantis Paschoulas
helpful to understand the reasons of your problems ;) Best Regards, Chrysovalantis Paschoulas Forschungszentrum Juelich - Juelich Supercomputing Centre On 09/08/2014 02:53 PM, Erica Riello wrote: Hello all, I have 2 machines running Slurm 14.03.07, called torquepbs and torquepbsno1. Slurmctld

[slurm-dev] Open discussion about high availability

2013-11-22 Thread Chrysovalantis Paschoulas
? We want a good solution so even the accounting will be also HA always. I am waiting for your interesting answers and thanks in advance!!! Best Regards, Chrysovalantis Paschoulas Juelich Supercomputing Centre Forschungszentrum Juelich

[slurm-dev] new version of mpirun-mic script

2013-11-15 Thread Chrysovalantis Paschoulas
by Chrysovalantis Paschoulas, Juelich Supercomputing Centre - Forschungszentrum Juelich # Initial Script by (C) Olli-Pekka Lehto - CSC IT Center for Science Ltd. # ** # Usage message USAGE= USAGE $(basename $0) [ [-h] | [-v] [-x

[slurm-dev] Re: Problem with init-script

2013-10-17 Thread Chrysovalantis Paschoulas
/c667a9952e6674b85289cb92c62fd5b7bece9adf Quoting Chrysovalantis Paschoulas cpaschou...@gmail.com: Dear developers, we have a small testing cluster with 44 compute nodes, the OS is Scientific Linux 6.4 and we are using SLURM. The first installed version of Slurm was 2.5.7 and we have upgraded Slurm

[slurm-dev] Problem with init-script

2013-10-16 Thread Chrysovalantis Paschoulas
version 2.6.4?? It could be also possible that the default init script has some other issues that we haven't found yet, so it would be nice if you could check it more deeply. Thanks in advance! Best Regards, Chrysovalantis Paschoulas Juelich Supercomputing Centre Forschungszentrum Juelich, Germany