[slurm-dev] Added spank_item.

2014-03-04 Thread Magnus Jonsson
I have made a patch for SPANK that allows fetching SLURM_RESTART_COUNT from within my SPANK plugin. The patch is attached (against 2.6.6). Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet diff a/slurm/spank.h b/slurm/spank.h --- a/slurm/spank.h +++ b/slurm/spank.h

[slurm-dev] Change node weight based on partition? QOS? other?

2014-05-19 Thread Magnus Jonsson
? Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Killing the backfill...

2014-05-20 Thread Magnus Jonsson
sed on events that actually affect the scheduling of the queue? Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: Killing the backfill...

2014-05-20 Thread Magnus Jonsson
On 2014-05-20 14:54, Tommi T wrote: On Tuesday, May 20, 2014 1:51 PM, Magnus Jonsson wrote: Hi! While investigating another matter I found that if you have lots of jobs running with short job steps, they kill the backfill very effectively. Hi, Do you use the bf_continue flag? http

[slurm-dev] Changed behaviour of --exclusive in srun (job step context)

2014-09-11 Thread Magnus Jonsson
://www.hpc2n.umu.se/staff/magnus/slurm/stdout.2.6.4 3, http://www.hpc2n.umu.se/staff/magnus/slurm/stderr.2.6.4 4, http://www.hpc2n.umu.se/staff/magnus/slurm/stdout.14.03.7 5, http://www.hpc2n.umu.se/staff/magnus/slurm/stderr.14.03.7 -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: Changed behaviour of --exclusive in srun (job step context)

2014-10-02 Thread Magnus Jonsson
Is no one else affected by this? /Magnus On 2014-09-11 14:46, Magnus Jonsson wrote: Hi! A user found a "strange" new behaviour when using --exclusive with srun. I have an example submit-script[1] that shows this. I have tested this on 2.6.4 with the output [2] & [3] (stderr)

[slurm-dev] Re: Job on wrong node

2015-02-04 Thread Magnus Jonsson
problem after that then indeed something else is the matter. Perhaps routing tables or something else. U -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Two patches for jobacct_gather.

2015-02-05 Thread Magnus Jonsson
, Magnus Jonsson -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet diff --git a/doc/man/man5/slurm.conf.5 b/doc/man/man5/slurm.conf.5 index 29f730d..cb85598 100644 --- a/doc/man/man5/slurm.conf.5 +++ b/doc/man/man5/slurm.conf.5 @@ -1046,6 +1046,9 @@ Exclude shared memory from accounting. .TP

[slurm-dev] Slurm restart count in SPANK

2015-02-27 Thread Magnus Jonsson
was changed but I'm pretty sure it worked in 2.6 (that was when we developed our tmpdir spank plugin). "SLURM_RESTART_COUNT" is available in the job user environment. /Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: Slurm restart count in SPANK

2015-02-27 Thread Magnus Jonsson
n spank_get_item(sp, S_SLURM_RESTART_COUNT, &restart_count); Hope that helps! Sent from my iPhone On Feb 27, 2015, at 8:14 AM, Magnus Jonsson wrote: It seems that the restart count in SPANK (prolog) is missing in recent versions of Slurm. It always returns 0 even if the job has restarted.

[slurm-dev] A way of abuse the priority option in Slurm?

2015-03-31 Thread Magnus Jonsson
hange the priority at all? The user can use the 'nice' option to alter the priority of a job within a small limit that does not alter the priority as defined above. Please let me be wrong :-) /Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: prevent slurm from parsing the full script

2015-04-21 Thread Magnus Jonsson
bscript "job1.sh": #!/bin/bash #SBATCH --nodes=1 #SBATCH --ntasks=1 #SBATCH --job-name=job1 srun -l echo "slurm jobid $SLURM_JOB_ID named: $SLURM_JOB_NAME" cat > job2.sh < -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: prevent slurm from parsing the full script

2015-04-21 Thread Magnus Jonsson
le jobscript for #SBATCH statements? Assume I have the following jobscript "job1.sh": #!/bin/bash #SBATCH --nodes=1 #SBATCH --ntasks=1 #SBATCH --job-name=job1 srun -l echo "slurm jobid $SLURM_JOB_ID named: $SLURM_JOB_NAME" cat > job2.sh < -- Magnus Jonsson, Develo

[slurm-dev] sbcast, prolog and SPANK.

2015-04-29 Thread Magnus Jonsson
use "srun cp ${PATH_TO_FILES}/* $TMPDIR/" Also, this has the side effect that the prolog on the node is not run until you actually send a job to the node. That is, you can send data to a node with sbcast before the prolog has run; this might not be expected or wanted behaviour. Best, Magnus --

[slurm-dev] Re: As a user how can I re-order my job submissions

2015-08-28 Thread Magnus Jonsson
n perspective I wouldn’t want this, because users could misuse this feature. But from a user perspective I could genuinely have some dependencies that I would like to have it addressed before beginning my batch of thousands of jobs. Any help here is greatly appreciated. Regards, Amit -- Magnus Jon

[slurm-dev] sacct vs sacct -X

2016-03-23 Thread Magnus Jonsson
t now)? Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: sacct vs sacct -X

2016-03-23 Thread Magnus Jonsson
0 sacct -X --format=JobID,Elapsed,AllocCPUS,CPUTimeRaw -j 7364851
JobID    Elapsed   AllocCPUS  CPUTimeRAW
-------  --------  ---------  ----------
7364851  00:00:00  16         0
/Magnus On 2016-03-23 09:29, Magnus Jonsson wrote: Hi! From this simple example

[slurm-dev] Re: sacct vs sacct -X

2016-04-12 Thread Magnus Jonsson
CPUTimeRAW -- -- -- 7364851 00:00:00 16 0 /Magnus On 2016-03-23 09:29, Magnus Jonsson wrote: Hi! From this simple example could someone explain to me if this is the expected behaviour or a bug? $ srun -n1 --exclusive hostname srun: job 4232239 queued and waiting for reso

[slurm-dev] Re: Restrict access for a user group to certain nodes

2016-12-01 Thread Magnus Jonsson
rators can submit jobs to those certain nodes to perform some tests, which might be disturbed by users submitting their jobs to those nodes. Various Search Engines didn't offer answers to my question, which is why I'm writing you here. Looking forward to some answers! Best, Felix Willenborg

[slurm-dev] Bugs in CR_ALLOCATE_FULL_SOCKET.

2013-01-18 Thread Magnus Jonsson
I can fix this somehow, otherwise I can bug-test patches. I have a small part of our cluster available for testing right now (2 nodes, 8 sockets/node, 6 cores/socket). Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: Bugs in CR_ALLOCATE_FULL_SOCKET.

2013-01-18 Thread Magnus Jonsson
Err... Wrong... On 2013-01-18 13:59, Magnus Jonsson wrote: Hi! I'm experimenting with CR_ALLOCATE_FULL_SOCKET and found some weird behaviour. Currently running git/master but have seen the same behaviour on 2.4.3 with the #define. My slurm.conf: SelectType=select/con

[slurm-dev] Re: Bugs in CR_ALLOCATE_FULL_SOCKET.

2013-01-18 Thread Magnus Jonsson
This patch fixes the behaviour with allocating 2 cores instead of one with --ntasks-per-socket=1. /Magnus On 2013-01-18 13:59, Magnus Jonsson wrote: Hi! I'm experimenting with CR_ALLOCATE_FULL_SOCKET and found some weird behaviour. Currently running git/master but have seen the

[slurm-dev] Re: Bugs in CR_ALLOCATE_FULL_SOCKET.

2013-01-18 Thread Magnus Jonsson
I have CR_ALLOCATE_FULL_SOCKET working correctly on block allocation. Will fix cyclic after the weekend and supply a patch.. Best regards, Magnus On 2013-01-18 16:00, Magnus Jonsson wrote: This patch fixes the behaviour with allocating 2 cores instead of one with --ntasks-per-socket=1

[slurm-dev] Re: Is it possible to get hold of parameters from sbatch/salloc in a spank plugin?

2013-01-30 Thread Magnus Jonsson
src/common/slurm_errno.c). We have also discussed adding a mechanism to return an arbitrary string to the user, but this is not possible today. Quoting Magnus Jonsson : Hi! I'm looking for a way to look at users' submitted parameters and, if they are using them in a "bad" way, inform them that this

[slurm-dev] task_affinity bug in 2.5.1 and after..

2013-02-01 Thread Magnus Jonsson
, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Patch for partition based SelectType (CR_Socket/CR_Core).

2013-02-07 Thread Magnus Jonsson
, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet diff --git a/src/common/read_config.c b/src/common/read_config.c index 2a54f69..b8d981b 100644 --- a/src/common/read_config.c +++ b/src/common/read_config.c @@ -903,6 +903,7 @@ static int _parse_partitionname(void **dest

[slurm-dev] Re: Patch for partition based SelectType (CR_Socket/CR_Core).

2013-02-07 Thread Magnus Jonsson
d historic reason for that. /Magnus On 2013-02-07 16:42, Aaron Knister wrote: That's awesome! (How) does it handle the case of nodes in multiple partitions? Sent from my iPhone On Feb 7, 2013, at 8:24 AM, Magnus Jonsson wrote: Hi everybody! Here attached is a patch that enables

[slurm-dev] Re: Disable black hole nodes automatically

2013-02-08 Thread Magnus Jonsson
ay have practical reasons, but that's not why we do it" -- Richard P. Feynman -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Preemptation bug

2013-02-12 Thread Magnus Jonsson
es: eval_nodes:0 consec c=48 n=1 b=0 e=0 r=-1 [2013-02-12T14:54:48+01:00] cons_res: cr_job_test: test 1 pass - idle resources found [2013-02-12T14:54:48+01:00] no job_resources info for job 241 [2013-02-12T14:54:48+01:00] debug2: Testing job time limits and checkpoints 8<--- -- Magnus Jonsso

[slurm-dev] task/affinity, --cpu_bind=socket and -c > 1

2013-02-15 Thread Magnus Jonsson
ock? Or should SLURM_DIST_CYCLIC, SLURM_DIST_BLOCK be the same as default? Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: task/affinity, --cpu_bind=socket and -c > 1

2013-02-18 Thread Magnus Jonsson
Hi! This does not make a difference. And I think it might not do either according to the man page. /Magnus On 2013-02-15 18:32, Moe Jette wrote: Have you tried the --ntasks-per-socket or --ntasks-per-core options? Quoting Magnus Jonsson : Hi! I have noticed strange behaviour in the task

[slurm-dev] Re: slurmctld prolog delays job start

2013-02-18 Thread Magnus Jonsson
fig generator. Any pointers would be greatly appreciated- I'm out of ideas... Thanks Michael -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: task/affinity, --cpu_bind=socket and -c > 1

2013-02-18 Thread Magnus Jonsson
-- Have you tried the --ntasks-per-socket or --ntasks-per-core options? Quoting Magnus Jonsson : > Hi! > > I have noticed strange behaviour in the task/affinity plugin if I > use --cpu_bind=socket and -c > 1. > > My task are distributed one on each sock

[slurm-dev] Buffer overflow bug + patch.

2013-02-22 Thread Magnus Jonsson
Hi! I just found a bug in Slurm that causes a buffer overflow if you run 'scontrol show config'. Patch attached to fix the problem. /Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet diff --git a/src/common/slurm_protocol_defs.c b/src/common/slurm_protocol_de

[slurm-dev] Problem with backfill and patch for solution

2013-03-01 Thread Magnus Jonsson
idle nodes short jobs will not start because of this. I have made a patch for backfill with a configuration option (bf_continue) to let backfill continue from the last JobID of the last cycle. This will make backfill look at the whole queue eventually. Best regards, Magnus -- Magnus Jo
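
The resume-where-you-stopped idea behind bf_continue can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code from the patch: the names `bf_state` and `bf_cycle` are hypothetical, the queue is modelled as a plain index range rather than by JobID, and each cycle is assumed to have a fixed budget of jobs it may examine.

```c
#include <stddef.h>

/* Hypothetical state kept between backfill cycles (not a Slurm structure). */
struct bf_state {
    size_t next_index;  /* queue position where the next cycle starts */
};

/*
 * Examine at most `budget` jobs out of a queue of `queue_len` jobs,
 * starting where the previous cycle stopped and wrapping around.
 * `visit` (may be NULL) is called with each examined queue position.
 * Returns the number of jobs examined in this cycle.
 */
size_t bf_cycle(struct bf_state *st, size_t queue_len, size_t budget,
                void (*visit)(size_t job_index, void *arg), void *arg)
{
    if (queue_len == 0) {
        st->next_index = 0;
        return 0;
    }
    /* The queue may have shrunk since the last cycle. */
    if (st->next_index >= queue_len)
        st->next_index = 0;

    size_t n = budget < queue_len ? budget : queue_len;
    for (size_t i = 0; i < n; i++) {
        if (visit)
            visit(st->next_index, arg);
        st->next_index = (st->next_index + 1) % queue_len;
    }
    return n;
}
```

With a budget smaller than the queue, repeated calls visit every position after a bounded number of cycles, which is the "look at the whole queue eventually" property described above.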

[slurm-dev] Re: Problem with backfill and patch for solution

2013-03-01 Thread Magnus Jonsson
speed with that. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: Licenses verification mechanism

2013-03-08 Thread Magnus Jonsson
d again). Does anybody have experience with the case where a job (or some script) checks some condition periodically and stays in the queue if the condition has not been met yet? -- Taras -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Re: Licenses verification mechanism

2013-03-08 Thread Magnus Jonsson
, this solution probably will work for us as well. Also, when a user does not use the -L option, then this could be checked (I believe) in contribs/lua/job_submit.lua in several lines of code (in the slurm_job_submit function). -- Taras On 03/08/2013 09:37 AM, Magnus Jonsson wrote: We have solved this by u

[slurm-dev] cons_res select_p_select_nodeinfo_set_all problem with multiple partitions.

2013-03-12 Thread Magnus Jonsson
bitstring to count the number of bits in a range (bit_set_count_range) and made a minor improvement to (bit_set_count) while reviewing the range version. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet diff -ru site/src/common/bitstring.c amd64_ubuntu1004/src/common
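
For readers unfamiliar with the operation, counting the set bits within a range of a bit array can be sketched as follows. This is a minimal illustration and not the code from the patch or from Slurm's bitstring.c: the function name `count_bits_in_range`, the 64-bit word layout, and the half-open range [start, end) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the set bits in the half-open range [start, end) of a bit array.
 * Assumed layout: bit i lives in words[i / 64], at position i % 64
 * (least significant bit first). The caller guarantees that `end` does
 * not exceed the number of bits stored in `words`.
 */
size_t count_bits_in_range(const uint64_t *words, size_t start, size_t end)
{
    size_t count = 0;
    for (size_t bit = start; bit < end; bit++) {
        if (words[bit / 64] & ((uint64_t)1 << (bit % 64)))
            count++;
    }
    return count;
}
```

The loop tests one bit at a time for clarity; a production version would more likely mask the partial first and last words and use a word-wide population count for the words in between.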

[slurm-dev] issue with task/affinity and srun --exclusive

2013-03-21 Thread Magnus Jonsson
nk_ldom sh -c "hwloc-bind --get | ./hex2bin" results in: 00 00 01 01 01 01 01 01 = 0x41041041 This also looks like the bitmask that task/affinity gets from slurm. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] sbatch --exclusive --mem-per-cpu

2013-03-21 Thread Magnus Jonsson
As I understand it, this will give the wrong input to the fair share scheduler and result in the wrong priority (too high) for the user. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] strftime issue(s).

2013-06-17 Thread Magnus Jonsson
buffer. But this might need to be looked into in more depth. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet diff -ru site/src/common/parse_time.c amd64_ubuntu1004/src/common/parse_time.c --- site/src/common/parse_time.c 2013-06-05 21:43:00.0 +0200 +++ amd

[slurm-dev] Re: strftime issue(s).

2013-06-17 Thread Magnus Jonsson
be a better solution than changing the code in various places. See attached variation of your patch. Moe Quoting Magnus Jonsson : Hi! We found an issue in sacct that we pinned down to a strftime call in 'src/common/parse_time.c' (slurm_make_time_str). Reproducible with (in 2.5.{

[slurm-dev] Re: cons_res: Can't use Partition SelectType

2013-08-07 Thread Magnus Jonsson
Why does slurm complain? I HAVE set CR_Core and I even checked the spelling. Thanks Eva -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Bug in squeue command.

2013-08-22 Thread Magnus Jonsson
uired. Reverting params.max_cpus code I get the expected behaviour. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Expected start time far, far away...

2013-10-04 Thread Magnus Jonsson
e expected start times that are that far in the future should not be allowed to appear. If the start time is more than a week or two into the future, it will probably not be that accurate anyway. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] Bad behaviour of slurm with -c

2013-10-04 Thread Magnus Jonsson
mctld[28426]: _slurm_rpc_job_step_create for job 1313514: More processors requested than permitted Oct 4 15:23:03 t-mn02 slurmctld[28426]: completing job 1313514 Oct 4 15:23:03 t-mn02 slurmctld[28426]: sched: job_complete for JobId=1313514 successful, exit code=256 -- Magnus Jonsson, Developer, H

[slurm-dev] Re: Bad behaviour of slurm with -c

2013-10-22 Thread Magnus Jonsson
rmctld[28426]: _slurm_rpc_job_step_create for job 1313514: More processors requested than permitted Oct 4 15:23:03 t-mn02 slurmctld[28426]: completing job 1313514 Oct 4 15:23:03 t-mn02 slurmctld[28426]: sched: job_complete for JobId=1313514 successful, exit code=256 -- Magnus Jonsson, Develope

[slurm-dev] Only allow some nodes in an partition to run jobs that stay within one node.

2014-02-05 Thread Magnus Jonsson
but this might also confuse the users. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet

[slurm-dev] --exclusive together with --ntasks-per-node not working as expected.

2014-02-19 Thread Magnus Jonsson
r more information about how the job was submitted. We are currently running version 2.6.3. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet JobId=1603907 Name=submit_e UserId=magnus(2066) GroupId=folk(3001) Priority=658834 Account=sysop QOS=normal JobState=R

[slurm-dev] RE: --exclusive together with --ntasks-per-node not working as expected.

2014-02-20 Thread Magnus Jonsson
case srun -l -N2 --tasks-per-node=2 hostname 1: trek0 0: trek0 2: trek1 3: trek1 -Original Message- From: Magnus Jonsson [mailto:mag...@hpc2n.umu.se] Sent: Wednesday, February 19, 2014 1:28 AM To: slurm-dev Subject: [slurm-dev] --exclusive together with --ntasks-per-node not worki