[slurm-dev] Re: slurm-dev SLURM PMI2 performance vs. mpirun/mpiexec (was: Re: Re: more detailed installation guide)

2016-01-07 Thread Novosielski, Ryan
Thanks for all of that, Ralph. I was getting a lot of "help" from users 
debugging a performance problem, and they were pointing to the use of srun. The 
more concrete info I had, the better (and for my own edification, as I'd really 
prefer not to switch otherwise, since it's easier to be using one software 
package to launch this stuff).

 *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS  |-*O*-
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
`'

On Jan 7, 2016, at 18:32, Ralph Castain wrote:

Just following up as promised with some data. The graphs below were generated 
using the SLURM master with the PMIx plugin based on PMIx v1.1.0, running 64 
procs/node, using a simple MPI_Init/MPI_Finalize app. The blue line used srun 
to start the job, and used PMI-2. The red line also was started by srun, but 
used PMIx. As you can see, there is some performance benefit from use of PMIx.

The gray line used srun to start the job and the PMIx plugin, but also used the 
new optional features to reduce the startup time. There are two features:

(a) we only do a modex “recv” (i.e., a PMI-get) upon first communication to a 
specific peer

(b) the modex itself (i.e., pmi_fence) operation simply drops thru - we do not 
execute a barrier. Instead, there is an async exchange of the data. We only 
block when the proc requests a specific piece of data


The final yellow line is mpirun (which uses PMIx) using the new optional 
features. As you can see, it’s a little faster than srun-based launch.

We are extending these tests to larger scale, and continuing to push the 
performance as discussed before.
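
For reference, a minimal sketch of the kind of MPI_Init/MPI_Finalize app used for these
measurements. The structure and the internal timing below are assumptions (the message does
not say how launch times were actually measured); it is only meant to show how small the
application under test is:

#include <mpi.h>
#include <stdio.h>
#include <time.h>

/* wall-clock helper */
static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    double t0 = now();
    MPI_Init(&argc, &argv);        /* wire-up / modex cost lands here */
    double t1 = now();

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        printf("MPI_Init took %.3f s\n", t1 - t0);

    MPI_Finalize();
    return 0;
}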

HTH
Ralph





On Jan 6, 2016, at 11:58 PM, Ralph Castain wrote:



On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan wrote:


On Jan 6, 2016, at 23:31, Christopher Samuel wrote:

On 07/01/16 01:03, Novosielski, Ryan wrote:

Since this is an audience that might know, and this is related (but
off-topic, sorry): is there any truth to the suggestions on the Internet
that using srun is /slower/ than mpirun/mpiexec?

In our experience Open-MPI 1.6.x and earlier (PMI-1 support) is slower
with srun than with mpirun.  This was tested with NAMD.

Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with
mpirun.

Thanks very much to both of you who have responded with an answer to this 
question. Both of you have said "about the same" if I'm not mistaken. So I 
guess there still is a very slight performance penalty to using PMI2 instead 
of mpirun? Probably worth it anyway, but I'm just curious to know the real 
score. Not a lot of info about this other than the mailing list.

FWIW: the reason the gap closed when going from the (1.6 vs srun+PMI1) to the 
(1.8 vs srun+PMI2) scenario is partly because of the PMI-1 vs PMI-2 difference, 
but also because OMPI’s mpirun slowed down significantly between the 1.6 and 
1.8 series. We didn’t catch the loss of performance in time, but are addressing 
it for the upcoming 2.0 series.

In 2.0, mpirun will natively use PMIx, and you can additionally use two new 
optional features to dramatically improve the launch time. I’ll provide a graph 
tomorrow to show the different performance vs PMI-2 even at small scale. Those 
features may become the default behavior at some point - hasn’t fully been 
decided yet as they need time to mature.

However, the situation is fluid. Using the SLURM PMix plugin (in master now and 
tentatively scheduled for release later this year) will effectively close the 
gap. Somewhere in that same timeframe, OMPI will be implementing further 
improvements to mpirun (using fabric instead of mgmt Ethernet to perform 
barriers, distributing the launch mapping procedure, etc.) and will likely move 
ahead again - and then members of the PMIx community are already planning to 
propose some of those changes for SLURM. If accepted, you’ll see the gap close 
again.

So I expect this surge-and-recover pattern to continue for the next 
couple of years, with mpirun ahead for a while and then even with SLURM when 
using the PMIx plugin.

HTH - and I’ll provide the graph in the morning.
Ralph




Thanks again.



[slurm-dev] Re: slurm-dev SLURM PMI2 performance vs. mpirun/mpiexec

2016-01-07 Thread Christopher Samuel

On 07/01/16 16:31, Novosielski, Ryan wrote:

> Thanks very much to both of you who have responded with an answer to
> this question. Both of you have said "about the same" if I'm not
> mistaken. So I guess there still is a very slight performance
> penalty to using PMI2 instead of mpirun? Probably worth it anyway,
> but I'm just curious to know the real score. Not a lot of info about
> this other than the mailing list.

The only thing I'll add to what Ralph has said overnight here is that an
awful lot will depend on your local config, so the best way to work it
out is to do what we did and benchmark it yourself and see what happens
for you with the code you care about.

To paraphrase Winston Churchill: there are lies, damn lies and other
people's benchmarks... ;-)

Best of luck!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: slurm-dev SLURM PMI2 performance vs. mpirun/mpiexec (was: Re: Re: more detailed installation guide)

2016-01-07 Thread Ralph Castain
Just following up as promised with some data. The graphs below were generated 
using the SLURM master with the PMIx plugin based on PMIx v1.1.0, running 64 
procs/node, using a simple MPI_Init/MPI_Finalize app. The blue line used srun 
to start the job, and used PMI-2. The red line also was started by srun, but 
used PMIx. As you can see, there is some performance benefit from use of PMIx.

The gray line used srun to start the job and the PMIx plugin, but also used the 
new optional features to reduce the startup time. There are two features:

(a) we only do a modex “recv” (i.e., a PMI-get) upon first communication to a 
specific peer

(b) the modex itself (i.e., pmi_fence) operation simply drops thru - we do not 
execute a barrier. Instead, there is an async exchange of the data. We only 
block when the proc requests a specific piece of data


The final yellow line is mpirun (which uses PMIx) using the new optional 
features. As you can see, it’s a little faster than srun-based launch.

We are extending these tests to larger scale, and continuing to push the 
performance as discussed before.

HTH
Ralph





> On Jan 6, 2016, at 11:58 PM, Ralph Castain  wrote:
> 
> 
> 
>> On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan wrote:
>> 
>> 
>>> On Jan 6, 2016, at 23:31, Christopher Samuel  wrote:
>>> 
 On 07/01/16 01:03, Novosielski, Ryan wrote:
 
 Since this is an audience that might know, and this is related (but
 off-topic, sorry): is there any truth to the suggestions on the Internet
 that using srun is /slower/ than mpirun/mpiexec?
>>> 
>>> In our experience Open-MPI 1.6.x and earlier (PMI-1 support) is slower
>>> with srun than with mpirun.  This was tested with NAMD.
>>> 
>>> Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with
>>> mpirun.
>> 
>> Thanks very much to both of you who have responded with an answer to this 
>> question. Both of you have said "about the same" if I'm not mistaken. So I 
>> guess there still is a very slight performance penalty to using PMI2 
>> instead of mpirun? Probably worth it anyway, but I'm just curious to know 
>> the real score. Not a lot of info about this other than the mailing list.
> 
> FWIW: the reason the gap closed when going from the (1.6 vs srun+PMI1) to the 
> (1.8 vs srun+PMI2) scenario is partly because of the PMI-1 vs PMI-2 
> difference, but also because OMPI’s mpirun slowed down significantly between 
> the 1.6 and 1.8 series. We didn’t catch the loss of performance in time, but 
> are addressing it for the upcoming 2.0 series.
> 
> In 2.0, mpirun will natively use PMIx, and you can additionally use two new 
> optional features to dramatically improve the launch time. I’ll provide a 
> graph tomorrow to show the different performance vs PMI-2 even at small 
> scale. Those features may become the default behavior at some point - hasn’t 
> fully been decided yet as they need time to mature.
> 
> However, the situation is fluid. Using the SLURM PMix plugin (in master now 
> and tentatively scheduled for release later this year) will effectively close 
> the gap. Somewhere in that same timeframe, OMPI will be implementing further 
> improvements to mpirun (using fabric instead of mgmt Ethernet to perform 
> barriers, distributing the launch mapping procedure, etc.) and will likely 
> move ahead again - and then members of the PMIx community are already 
> planning to propose some of those changes for SLURM. If accepted, you’ll see 
> the gap close again.
> 
> So I expect this surge-and-recover pattern to continue for the next 
> couple of years, with mpirun ahead for a while and then even with SLURM when 
> using the PMIx plugin.
> 
> HTH - and I’ll provide the graph in the morning.
> Ralph
> 
> 
> 
>> 
>> Thanks again.



[slurm-dev] Problem building on Fedora 23 and above

2016-01-07 Thread Adam Huffman

Hello

I've been trying to build Slurm RPMs using Fedora's COPR build service
(at https://copr.fedoraproject.org/coprs/verdurin/slurm/) and a change
in RPM means that builds are failing for Fedora 23 and Rawhide.

The problem is with the perlapi package and the handling of
optimization. Here are the errors when building:

gcc -c  -I. -I../../../.. -I../../../../contribs/perlapi/common
-I../../../.. -g -static -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong
--param=ssp-buffer-size=4 -grecord-gcc-switches
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic
-pthread -fno-gcse -Wall -g -O0 -fno-strict-aliasing  -g
-DVERSION=\"0.02\" -DXS_VERSION=\"0.02\" -fPIC
"-I/usr/lib64/perl5/CORE"   Slurm.c
In file included from /usr/include/sys/types.h:25:0,
 from /usr/lib64/perl5/CORE/perl.h:699,
 from Slurm.xs:2:
/usr/include/features.h:328:4: warning: #warning _FORTIFY_SOURCE
requires compiling with optimization (-O) [-Wcpp]
 #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
^

repeated for the other files in that directory, leading  to the
following at the end of the RPM build:

RPM build errors:
error: Empty %files file /builddir/build/BUILD/slurm-15.08.6/perlapi.files
Macro expanded in comment on line 10: # --prefix
%_prefixpathinstall path for commands, libraries, etc.
Macro expanded in comment on line 30: #  Allow defining --with and
--without build options or %_with and %without in .rpmmacros
Macro expanded in comment on line 121: #%if %{slurm_with blcr}
Macro expanded in comment on line 849: #%{_mandir}/man3/slurmdb_*
Empty %files file /builddir/build/BUILD/slurm-15.08.6/perlapi.files

(the macro expansion lines are just warnings, I think)

The problem with the empty perlapi.files did exist in Fedora 22, but
it was only treated as a warning, not an error.

Best Wishes,
Adam


[slurm-dev] Re: slurm job array limit?

2016-01-07 Thread Daniel Letai


Your MaxJobCount/MinJobAge combo might be too high, and the slurmctld is 
exhausting physical memory and resorting to swap, which slows it down and thus 
exceeds its scheduling loop time window.
You might wish to increase the scheduling loop duration as per 
http://slurm.schedmd.com/slurm.conf.html#OPT_SchedulerParameters

and specifically:
http://slurm.schedmd.com/slurm.conf.html#OPT_max_sched_time=#
possibly also
http://slurm.schedmd.com/slurm.conf.html#OPT_bf_yield_interval=#
http://slurm.schedmd.com/slurm.conf.html#OPT_build_queue_timeout=#
Although the last 2 seem less likely (sleep has no dependencies, and 
backfill is likely not playing a role).


Other options - From http://slurm.schedmd.com/job_array.html
The sched/backfill plugin has been modified to improve performance with 
job arrays. Once one element of a job array is discovered to not be 
runnable or impact the scheduling of pending jobs, the remaining elements 
of that job array will be quickly skipped.


Have you enabled backfill debugging flags to verify this is not 
happening for some reason?
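
A minimal slurm.conf sketch of the knobs referenced in this message; the parameter names 
are real options, but the values below are placeholders, so check the slurm.conf man page 
for units and sensible defaults:

# scheduling-loop tuning -- values are illustrative only
SchedulerParameters=max_sched_time=4,bf_yield_interval=2000000,build_queue_timeout=120000
# backfill debug logging, to check why array elements are being skipped
DebugFlags=Backfill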




On 01/06/2016 08:12 PM, CB wrote:

slurm job array limit?
Hi,

I'm running Slurm 15.08.1 version.

When I submitted a job array with 5000 tasks, it only scheduled the 
first 102 tasks although there are plenty of slots available.


sbatch --array=1-5000 -o /dev/null --wrap="/bin/sleep 120"

The slurmctld log says:

[2016-01-06T12:43:43.496] debug:  sched: already tested 102 jobs, 
breaking out


Then, after a while, the scheduler dispatched some 1000 tasks and says

[2016-01-06T12:44:24.003] debug:  sched: loop taking too long, 
breaking out
[2016-01-06T12:44:24.004] debug:  Note large processing time from 
schedule: usec=1439516 began=12:44:22.564
[2016-01-06T12:44:24.070] debug:  Note large processing time from 
_slurmctld_background: usec=1531381 began=12:44:22.538


After that, Slurm schedules the remaining tasks on only one compute node.

Has anyone seen this behavior?

Currently we've set the following Slurm parameters:
MaxArraySize=10
MaxJobCount=250

Thanks,
- Chansup


[slurm-dev] MaxTRESMins limit on a job kills a running job -- is it meant to?

2016-01-07 Thread Lennart Karlsson


We have set the MaxTRESMins limit on accounts and users, to make it
impossible to start what we think are outrageously large jobs.

But we have found an unwanted side effect:
When the user asks for a longer timelimit, we often allow that, and
when we increase the timelimit, sometimes jobs run into the
MaxTRESMins limit and die:
Dec 28 17:20:18 milou-q slurmctld: [2015-12-28T17:20:09.072] Job 6574528 timed 
out, the job is at or exceeds assoc 10056(b2013086/ansgar/(null)) max tres(cpu) 
minutes of 60 with 61

For us, this looks like a bug.

Please, we would prefer the MaxTRESMins limit not to kill already
running jobs.
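
For context, the timelimit extension described above is typically applied with something 
like the following (the job id is taken from the log line; the new limit is made up):

scontrol update JobId=6574528 TimeLimit=2:00:00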

Cheers,
-- Lennart Karlsson
   UPPMAX, Uppsala University, Sweden
   http://www.uppmax.uu.se


[slurm-dev] slurmctld/reservation.c: Fix bug in computation of top_suffix

2016-01-07 Thread Dorian Krause

Dear Slurm developers,

please find below a patch for a slurmctld bug that results in incorrect 
reservation ids after the restart or reconfiguration of slurmctld if a 
reservation with a specific name pattern has been created previously. This may 
(I have not verified that but consider it to be very likely) result in sacct 
reporting incorrect reservation names for jobs when the reservation id wraps 
around.
The patch applies to the current HEAD. We originally found this problem with 
version 14.03.4 to which the identical patch applies. All other versions in 
between are most likely affected as well.

Thank you for your consideration.

Best regards,
Dorian



From 321b6998aa7ecd7bae10e5efbaf7a2bcf309dca4 Mon Sep 17 00:00:00 2001
From: Dorian Krause 
Date: Thu, 7 Jan 2016 13:37:15 +0100
Subject: [PATCH] slurmctld/reservation.c: Fix bug in computation of top_suffix

The top_suffix variable is used to store and update the reservation id when 
creating
new reservations. As the name suggests the top_suffix value is also used as a 
suffix
for reservations without a user-provided name. In _validate_all_reservations() the
top_suffix is updated after a restart or reconfiguration based on the suffices 
of
existing reservations. This logic does not take into account the possibility of 
users
specifying reservation names that end with "_[0-9]+". If users create such a
reservation this will result in bogus top_suffix values and thus bogus (e.g.,
non-unique) reservation ids.
Instead of retrieving the reservation id from the name it is faster and easier 
to
use the resv_id member of slurmctld_resv_t.
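
For illustration, a sketch of the simplified logic described here, assuming the 
replacement is a direct comparison against the stored id (the actual patch follows below):

/* Sketch only: track the highest reservation id via resv_id instead of
 * parsing a trailing "_[0-9]+" out of the (possibly user-chosen) name. */
if (resv_ptr->resv_id > top_suffix)
        top_suffix = resv_ptr->resv_id;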

Prior to the commit:

> cat p_top_suffix
p top_suffix
quit
> scontrol create reservation=test1 nodes=node1 start=now duration=1 users=root
Reservation created: test1
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 1
A debugging session is active.

Inferior 1 [process 1166] will be detached.

> scontrol create reservation=test2 nodes=node1 start=now+60 duration=1 
> users=root
Reservation created: test2
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 2
A debugging session is active.

Inferior 1 [process 1166] will be detached.

> scontrol create reservation=test3_100 nodes=node1 start=now+120 duration=1 
> users=root
Reservation created: test3_100
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 3
A debugging session is active.

Inferior 1 [process 1166] will be detached.

> scontrol reconfigure
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 100
A debugging session is active.

Inferior 1 [process 1166] will be detached.

With the fix:

> scontrol create reservation=test1 nodes=node1 start=now duration=1 users=root
Reservation created: test1
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 1
A debugging session is active.

Inferior 1 [process 5854] will be detached.

> scontrol create reservation=test2 nodes=node1 start=now+60 duration=1 
> users=root
Reservation created: test2
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 2
A debugging session is active.

Inferior 1 [process 5854] will be detached.

> scontrol create reservation=test3_100 nodes=node1 start=now+120 duration=1 
> users=root
Reservation created: test3_100
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 3
A debugging session is active.

Inferior 1 [process 5854] will be detached.

> scontrol reconfigure
> gdb --pid=$(pgrep slurmctld) -x ./p_top_suffix
0x0039884c8a5d in nanosleep () from /lib64/libc.so.6
$1 = 3
A debugging session is active.

Inferior 1 [process 5854] will be detached.
---
 src/slurmctld/reservation.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/src/slurmctld/reservation.c b/src/slurmctld/reservation.c
index 9d9107a..41258d2 100644
--- a/src/slurmctld/reservation.c
+++ b/src/slurmctld/reservation.c
@@ -3301,8 +3301,6 @@ static void _validate_all_reservations(void)
ListIterator iter;
slurmctld_resv_t *resv_ptr;
struct job_record *job_ptr;
-   char *tmp;
-   uint32_t res_num;

iter = list_iterator_create(resv_list);
while ((resv_ptr = (slurmctld_resv_t *) list_next(iter))) {
@@ -3314,11 +3312,7 @@ static void _validate_all_reservations(void)
list_delete_item(iter);
} else {
_set_assoc_list(resv_ptr);
-   tmp = strrchr(resv_ptr->name, '_');
-   if (tmp) {
-   res_num = atoi(tmp + 1);
-   

[slurm-dev] Re: slurm-dev SLURM PMI2 performance vs. mpirun/mpiexec (was: Re: Re: more detailed installation guide)

2016-01-07 Thread Artem Polyakov
I'd like to jump in with one fact. In OMPI, the s2 component has one problem
that slows down applications when running under PMI2: s2 will always push a
couple of keys, making any fence non-zero-byte. This slows down the 2
empty Fences in OMPI.
I believe that if this is fixed we will have comparable performance from
mpirun and srun --mpi=pmi2.
Ralph, I'll point to the place in the code in the next email.
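
For readers following the thread, a sketch of the launch paths being compared (the binary 
name and process count are placeholders):

srun --mpi=pmi2 ./a.out      # srun with the PMI-2 plugin
srun --mpi=pmix ./a.out      # srun with the PMIx plugin (SLURM master)
mpirun -np 64 ./a.out        # Open MPI's own launcher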

On Friday, January 8, 2016, Novosielski, Ryan wrote:

> Thanks for all of that, Ralph. I was getting a lot of "help" from users
> debugging a performance problem, and they were pointing to the use of srun.
> The more concrete info I had, the better (and for my own edification, as I'd
> really prefer not to switch otherwise, since it's easier to be using one
> software package to launch this stuff).
>
>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS  |-*O*-
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novos...@rutgers.edu
> - 973/972.0922
> (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
>
> On Jan 7, 2016, at 18:32, Ralph Castain wrote:
>
> Just following up as promised with some data. The graphs below were
> generated using the SLURM master with the PMIx plugin based on PMIx v1.1.0,
> running 64 procs/node, using a simple MPI_Init/MPI_Finalize app. The blue
> line used srun to start the job, and used PMI-2. The red line also was
> started by srun, but used PMIx. As you can see, there is some performance
> benefit from use of PMIx.
>
> The gray line used srun to start the job and the PMIx plugin, but also
> used the new optional features to reduce the startup time. There are two
> features:
>
> (a) we only do a modex “recv” (i.e., a PMI-get) upon first communication
> to a specific peer
>
> (b) the modex itself (i.e., pmi_fence) operation simply drops thru - we do
> not execute a barrier. Instead, there is an async exchange of the data. We
> only block when the proc requests a specific piece of data
>
>
> The final yellow line is mpirun (which uses PMIx) using the new optional
> features. As you can see, it’s a little faster than srun-based launch.
>
> We are extending these tests to larger scale, and continuing to push the
> performance as discussed before.
>
> HTH
> Ralph
>
>
> 
>
>
> On Jan 6, 2016, at 11:58 PM, Ralph Castain wrote:
>
>
>
> On Jan 6, 2016, at 9:31 PM, Novosielski, Ryan wrote:
>
>
> On Jan 6, 2016, at 23:31, Christopher Samuel wrote:
>
> On 07/01/16 01:03, Novosielski, Ryan wrote:
>
> Since this is an audience that might know, and this is related (but
> off-topic, sorry): is there any truth to the suggestions on the Internet
> that using srun is /slower/ than mpirun/mpiexec?
>
>
> In our experience Open-MPI 1.6.x and earlier (PMI-1 support) is slower
> with srun than with mpirun.  This was tested with NAMD.
>
> Open-MPI 1.8.x and later with PMI-2 is about the same with srun as with
> mpirun.
>
>
> Thanks very much to both of you who have responded with an answer to this
> question. Both of you have said "about the same" if I'm not mistaken. So I
> guess there still is a very slight performance penalty to using PMI2
> instead of mpirun? Probably worth it anyway, but I'm just curious to know
> the real score. Not a lot of info about this other than the mailing list.
>
>
> FWIW: the reason the gap closed when going from the (1.6 vs srun+PMI1) to
> the (1.8 vs srun+PMI2) scenario is partly because of the PMI-1 vs PMI-2
> difference, but also because OMPI’s mpirun slowed down significantly
> between the 1.6 and 1.8 series. We didn’t catch the loss of performance in
> time, but are addressing it for the upcoming 2.0 series.
>
> In 2.0, mpirun will natively use PMIx, and you can additionally use two
> new optional features to dramatically improve the launch time. I’ll provide
> a graph tomorrow to show the different performance vs PMI-2 even at small
> scale. Those features may become the default behavior at some point -
> hasn’t fully been decided yet as they need time to mature.
>
> However, the situation is fluid. Using the SLURM PMix plugin (in master
> now and tentatively scheduled for release later this year) will effectively
> close the gap. Somewhere in that same timeframe, OMPI will be implementing
> further improvements to mpirun (using fabric instead of mgmt Ethernet to
> perform barriers, distributing the launch mapping procedure, etc.) and will
> likely move ahead again - and then members of the PMIx community are
> already planning to propose some of those 

[slurm-dev] Re: MaxTRESMins limit on a job kills a running job -- is it meant to?

2016-01-07 Thread Douglas Jacobsen
I think you probably want to add "safe" to AccountingStorageEnforce in
slurm.conf;  that should prevent it from starting jobs that would exceed
association limits.
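
For concreteness, a sketch of that slurm.conf change; the exact option list depends on 
what the site already enforces:

# "safe" keeps a job from starting if it could not run to completion
# within its association/QOS limits (e.g. MaxTRESMins)
AccountingStorageEnforce=associations,limits,safe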


Doug Jacobsen, Ph.D.
NERSC Computer Systems Engineer
National Energy Research Scientific Computing Center 
dmjacob...@lbl.gov

- __o
-- _ '\<,_
--(_)/  (_)__


On Thu, Jan 7, 2016 at 7:15 AM, Lennart Karlsson 
wrote:

>
> We have set the MaxTRESMins limit on accounts and users, to make it
> impossible to start what we think are outrageously large jobs.
>
> But we have found an unwanted side effect:
> When the user asks for a longer timelimit, we often allow that, and
> when we increase the timelimit, sometimes jobs run into the
> MaxTRESMins limit and die:
> Dec 28 17:20:18 milou-q slurmctld: [2015-12-28T17:20:09.072] Job 6574528
> timed out, the job is at or exceeds assoc 10056(b2013086/ansgar/(null)) max
> tres(cpu) minutes of 60 with 61
>
> For us, this looks like a bug.
>
> Please, we would prefer the MaxTRESMins limit not to kill already
> running jobs.
>
> Cheers,
> -- Lennart Karlsson
>UPPMAX, Uppsala University, Sweden
>http://www.uppmax.uu.se
>


[slurm-dev] Slurm not using nodes short name

2016-01-07 Thread vaibhav pol
Hi ,

  We are planning to migrate from Maui to Slurm, so we are testing Slurm on a few
nodes. I am facing a problem: Slurm uses only the FQDN of nodes in the "NodeName"
parameter, and because of that I am unable to use a regex in the configuration.
Currently we are not using a DNS server; we use the "/etc/hosts" file. Whenever
we put the short name in "NodeName" in slurm.conf, slurmctld is not able to register
the compute node. Can we force slurmctld to use the short name rather than the
FQDN? Details of the environment are below.



slurm version = slurm-15.08.06

OS = CentOS 6.5
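
Not from the original message, but for illustration, one possible slurm.conf arrangement 
that keeps NodeName short while letting nodes register under the name they actually 
resolve to (hostnames and node counts here are invented):

# NodeName stays short so hostlist expressions work; NodeHostname/NodeAddr
# carry the resolvable (FQDN or /etc/hosts) name.
NodeName=node[01-04] NodeHostname=node[01-04].example.com NodeAddr=node[01-04].example.com State=UNKNOWN
PartitionName=test Nodes=node[01-04] Default=YES MaxTime=INFINITE State=UP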



Thanks and regards,
Vaibhav Pol
Senior Technical Officer
National PARAM Supercomputing Facility
Centre for Development of Advanced Computing
Ganeshkhind Road
Pune University Campus
PUNE-Maharashtra
Phone +91-20-25704183 ext: 183
Cell Phone : +919850466409
---
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
---



[slurm-dev] Re: slurm-dev Memory Limits on CentOS 7/RHEL 7 (was: Re: Re: default memory limit (15.08.5)?)

2016-01-07 Thread Wiegand, Paul
I believe that is the case, yes; however, we should let one of the experts 
chime in.  I could not find a PAM-oriented solution.  This was the only way I 
got it to work ... which is not to say it is the only solution.

Paul.


> On Jan 7, 2016, at 15:34, Novosielski, Ryan  wrote:
> 
> Yes — I am running CentOS 7. So to be clear, PAM has no impact on solving 
> this problem on CentOS/RHEL 7, and this is the only way to make sure things 
> are right?
> 
>> On Jan 7, 2016, at 3:01 PM, Wiegand, Paul  wrote:
>> 
>> Are you running systemd (e.g., CentOS 7 or RHEL 7)?  If so, the ulimit 
>> solution mentioned in the FAQ does not work.  Instead, you need to put these 
>> lines in the [Service] section of your service files:
>> 
>> LimitMEMLOCK=infinity
>> LimitSTACK=infinity
>> LimitCPU=infinity
>> 
>> Paul.
>> 
>> 
>>> On Jan 7, 2016, at 14:53, Novosielski, Ryan  wrote:
>>> 
>>> I read this awhile ago and it was on my list of things to do to ask a 
>>> clarifying question — does this mean that if I’m running SLURM with PAM, 
>>> I’m good on this just automatically? Is there a smart way to check? Running 
>>> a job that just does “ulimit -a” maybe?
>>> 
 On Jan 7, 2016, at 2:18 PM, je...@schedmd.com wrote:
 
 
 See FAQ:
 http://slurm.schedmd.com/faq.html#rlimit
 
 Quoting Michael Richard Colonno :
>>> 
>>>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>>> || \\UTGERS  |-*O*-
>>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>>> || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
>>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>>   `'
>>> 
>> 
> 
>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS  |-*O*-
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
> 



[slurm-dev] Re: default memory limit (15.08.5)?

2016-01-07 Thread Wiegand, Paul
Are you running systemd (e.g., CentOS 7 or RHEL 7)?  If so, the ulimit solution 
mentioned in the FAQ does not work.  Instead, you need to put these lines in 
the [Service] section of your service files:

LimitMEMLOCK=infinity
LimitSTACK=infinity
LimitCPU=infinity
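
A sketch of one way to apply those lines, assuming a systemd drop-in for slurmd (the 
drop-in path is an example, not from the original message):

# /etc/systemd/system/slurmd.service.d/limits.conf  (example path)
[Service]
LimitMEMLOCK=infinity
LimitSTACK=infinity
LimitCPU=infinity

followed by "systemctl daemon-reload" and a restart of the service; a drop-in also 
survives package updates of the bundled unit file.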

Paul.


> On Jan 7, 2016, at 14:53, Novosielski, Ryan  wrote:
> 
> I read this awhile ago and it was on my list of things to do to ask a 
> clarifying question — does this mean that if I’m running SLURM with PAM, I’m 
> good on this just automatically? Is there a smart way to check? Running a job 
> that just does “ulimit -a” maybe?
> 
>> On Jan 7, 2016, at 2:18 PM, je...@schedmd.com wrote:
>> 
>> 
>> See FAQ:
>> http://slurm.schedmd.com/faq.html#rlimit
>> 
>> Quoting Michael Richard Colonno :
> 
>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS  |-*O*-
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
> 



[slurm-dev] slurm-dev Memory Limits on CentOS 7/RHEL 7 (was: Re: Re: default memory limit (15.08.5)?)

2016-01-07 Thread Novosielski, Ryan
Yes — I am running CentOS 7. So to be clear, PAM has no impact on solving this 
problem on CentOS/RHEL 7, and this is the only way to make sure things are 
right?

> On Jan 7, 2016, at 3:01 PM, Wiegand, Paul  wrote:
> 
> Are you running systemd (e.g., CentOS 7 or RHEL 7)?  If so, the ulimit 
> solution mentioned in the FAQ does not work.  Instead, you need to put these 
> lines in the [Service] section of your service files:
> 
> LimitMEMLOCK=infinity
> LimitSTACK=infinity
> LimitCPU=infinity
> 
> Paul.
> 
> 
>> On Jan 7, 2016, at 14:53, Novosielski, Ryan  wrote:
>> 
>> I read this awhile ago and it was on my list of things to do to ask a 
>> clarifying question — does this mean that if I’m running SLURM with PAM, I’m 
>> good on this just automatically? Is there a smart way to check? Running a 
>> job that just does “ulimit -a” maybe?
>> 
>>> On Jan 7, 2016, at 2:18 PM, je...@schedmd.com wrote:
>>> 
>>> 
>>> See FAQ:
>>> http://slurm.schedmd.com/faq.html#rlimit
>>> 
>>> Quoting Michael Richard Colonno :
>> 
>>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>> || \\UTGERS  |-*O*-
>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>> || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>`'
>> 
> 

 *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS  |-*O*-
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
 `'





[slurm-dev] RE: default memory limit (15.08.5)?

2016-01-07 Thread jette


See FAQ:
http://slurm.schedmd.com/faq.html#rlimit

Quoting Michael Richard Colonno :


Hi ~

Still struggling with a subtle version of this memory size issue in version 
15.08.5. I don’t get explicit errors, but I believe it’s the root cause of the 
seg faults in parallel codes. SLURM seems to “intercept” the max memory limit 
(-m) and ignore / override any system settings.


On the master node:

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 515700
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

On the node (interactively):

$ ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127880
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 1024
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Now run through SLURM on the same node:

$ srun -n1 bash -c "ulimit -a"
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 127880
max locked memory   (kbytes, -l) unlimited
max memory size (kbytes, -m) 1024
open files  (-n) 4096
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) unlimited
cpu time   (seconds, -t) unlimited
max user processes  (-u) 127880
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

I have added a “ulimit -m unlimited” to all the SLURM init.d scripts but no 
change. Any advice on how to eliminate this permanently? I’ve tried various 
PropagateResourceLimitsExcept settings in slurm.conf without any effect on the 
max memory size.
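
For reference, the kind of slurm.conf line being described; the parameter name is real, 
but which limits to except is site policy, so the values are only examples:

# limits listed here are NOT propagated from the submit shell to the job
PropagateResourceLimitsExcept=RSS,MEMLOCK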


Thanks,
~MC

From: Colonno, Michael Richard
Sent: Thursday, April 9, 2015 11:10 AM
To: slurm-dev 
Subject: [slurm-dev] Re: default memory limit (14.11.5)?

Nope – my slurm.conf is very basic (been using it for  
several versions).


# COMPUTE NODES
NodeName=node[1-8]   Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=IDLE
PartitionName=all   Nodes=node[1-8]   Default=YES MaxTime=INFINITE State=UP


Perhaps a system-level limit or something not set in the  
slurm init.d script? This all looks pretty normal:


# ulimit -a
core file size  (blocks, -c) 0
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 256422
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 1024
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 10240
cpu time   (seconds, -t) unlimited
max user processes  (-u) 256422
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited

Thanks,
~Mike C.

From: Morris Jette [mailto:je...@schedmd.com]
Sent: Thursday, April 09, 2015 10:59 AM
To: slurm-dev
Subject: [slurm-dev] Re: default memory limit (14.11.5)?

Do you have a DefMemPerCPU or DefMemPerNode configured in slurm.conf?
On April 9, 2015 10:52:37 AM PDT, Michael Colonno wrote:


Hi ~



I just upgraded my cluster to SLURM 14.11.5. Everything  
went smoothly but when I run a test case it seems there is now a  
(very small) memory limit on jobs:




$ srun -n4 date

slurmstepd: Step 19293.0 exceeded memory limit (3324 > 1024), being killed

srun: Exceeded 

[slurm-dev] RE: default memory limit (15.08.5)?

2016-01-07 Thread Novosielski, Ryan
I read this awhile ago and it was on my list of things to do to ask a 
clarifying question — does this mean that if I’m running SLURM with PAM, I’m 
good on this just automatically? Is there a smart way to check? Running a job 
that just does “ulimit -a” maybe?

> On Jan 7, 2016, at 2:18 PM, je...@schedmd.com wrote:
> 
> 
> See FAQ:
> http://slurm.schedmd.com/faq.html#rlimit
> 
> Quoting Michael Richard Colonno :

 *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS  |-*O*-
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
 `'





[slurm-dev] Re: slurm-dev Memory Limits on CentOS 7/RHEL 7 (was: Re: Re: default memory limit (15.08.5)?)

2016-01-07 Thread Michael Richard Colonno
These are CentOS 6 (latest) systems. I'm familiar with the FAQ entry 
and I've used the /etc/init.d/slurm (and /etc/security/limits.conf) method, but it 
has no effect on the specific memory limit I mentioned originally. 

Thanks,
~MC

-Original Message-
From: Wiegand, Paul [mailto:wieg...@ist.ucf.edu] 
Sent: Thursday, January 7, 2016 12:46 PM
To: slurm-dev 
Subject: [slurm-dev] Re: slurm-dev Memory Limits on CentOS 7/RHEL 7 (was: Re: 
Re: default memory limit (15.08.5)?)

I believe that is the case, yes; however, we should let one of the experts 
chime in.  I could not find a PAM-oriented solution.  This was the only way I 
got it to work ... which is not to say it is the only solution.

Paul.


> On Jan 7, 2016, at 15:34, Novosielski, Ryan  wrote:
> 
> Yes — I am running CentOS 7. So to be clear, PAM has no impact on solving 
> this problem on CentOS/RHEL 7, and this is the only way to make sure things 
> are right?
> 
>> On Jan 7, 2016, at 3:01 PM, Wiegand, Paul  wrote:
>> 
>> Are you running systemd (e.g., CentOS 7 or RHEL 7)?  If so, the ulimit 
>> solution mentioned in the FAQ does not work.  Instead, you need to put these 
>> lines in the [Service] section of your service files:
>> 
>> LimitMEMLOCK=infinity
>> LimitSTACK=infinity
>> LimitCPU=infinity
>> 
>> Paul.
>> 
>> 
>>> On Jan 7, 2016, at 14:53, Novosielski, Ryan  wrote:
>>> 
>>> I read this awhile ago and it was on my list of things to do to ask a 
>>> clarifying question — does this mean that if I’m running SLURM with PAM, 
>>> I’m good on this just automatically? Is there a smart way to check? Running 
>>> a job that just does “ulimit -a” maybe?
>>> 
 On Jan 7, 2016, at 2:18 PM, je...@schedmd.com wrote:
 
 
 See FAQ:
 http://slurm.schedmd.com/faq.html#rlimit
 
 Quoting Michael Richard Colonno :
>>> 
>>>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
>>> || \\UTGERS  |-*O*-
>>> ||_// Biomedical | Ryan Novosielski - Senior Technologist
>>> || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
>>> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
>>>   `'
>>> 
>> 
> 
>  *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
> || \\UTGERS  |-*O*-
> ||_// Biomedical | Ryan Novosielski - Senior Technologist
> || \\ and Health | novos...@rutgers.edu - 973/972.0922 (2x0922)
> ||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
> `'
>