[slurm-dev] Re: Send notification email

2016-09-28 Thread Eckert, Phil
If I understand your question, you can set it in the slurm.conf file; the 
default is:

MailProg = /usr/bin/mail
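
If the goal is to route the message through a different mail server, one hedged 
approach is to point MailProg at a small wrapper script instead of /usr/bin/mail. 
A minimal sketch, assuming slurmctld invokes the program as "prog -s <subject> 
<recipient>" and that relay.example.com is only a placeholder for your relay:

#!/bin/bash
# /usr/local/bin/slurm-mail.sh -- hand Slurm's notification to an alternate
# SMTP relay (requires a mailx/s-nail build with SMTP support)
subject="$2"
recipient="$3"
echo "" | mailx -S smtp=smtp://relay.example.com -s "$subject" "$recipient"

and then in slurm.conf:

MailProg=/usr/local/bin/slurm-mail.sh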

From: Fanny Pagés Díaz
Reply-To: slurm-dev
Date: Wednesday, September 28, 2016 at 11:45 AM
To: slurm-dev
Subject: [slurm-dev] Send notification email

I need to send notification email from Slurm using a mail server other than the 
standard one. Can anyone help me?


[slurm-dev] Re: slum in the nodes not working

2015-12-21 Thread Eckert, Phil
Make sure the slurm.conf file is identical on all nodes.
If the slurmctld is running and all the slurmd's are running, take a look at 
the slurmctld.log; it should provide some clues. If not, you might want to post 
the content of your slurm.conf file.
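
For example, a quick consistency check, assuming passwordless ssh to the nodes 
and that the config lives in /etc/slurm (paths and node names are illustrative):

for host in node01 node02 node03; do
    ssh $host md5sum /etc/slurm/slurm.conf   # mismatched sums mean a stale copy
done
tail -n 100 /var/log/slurm/slurmctld.log     # then look for clues on the controller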

Phil Eckert
LLNL

From: Fany Pagés Díaz
Reply-To: slurm-dev
Date: Monday, December 21, 2015 at 12:39 PM
To: slurm-dev
Subject: [slurm-dev] Re: slum in the nodes not working

When I start the server, the nodes are down. I start /etc/init.d/slurm on the 
server and it's fine, but the nodes are still down. I restarted the nodes again 
and nothing changed. Any idea?

From: Carlos Fenoy [mailto:mini...@gmail.com]
Sent: Monday, December 21, 2015 12:59
To: slurm-dev
Subject: [slurm-dev] Re: slum in the nodes not working

You should not start the slurmctld on all the nodes, only on the head node of 
the cluster; on the compute nodes, start the slurmd with service slurm start.
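
Roughly, on a SysV-init style setup like the one described (service names and 
paths vary by distribution, so treat this as a sketch):

/etc/init.d/slurm start    # head node only (controller)
service slurm start        # each compute node: starts slurmd
scontrol ping              # verify the controller responds
sinfo                      # check that the nodes leave the down state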

On Mon, Dec 21, 2015 at 6:27 PM, Fany Pagés Díaz wrote:
I had to turn off my cluster because of electricity problems, and now Slurm is 
not working. The nodes are down and the slurm daemons on the nodes fail.
When I run the slurmctld -D command on the nodes, I get the following error:

slurmctld: error: this host (compute-0-0) not valid controller (cluster or 
(null))

How can I fix that? Can anyone help me, please?
Ing. Fany Pages Diaz



--
--
Carles Fenoy


[slurm-dev] Re: How can I send a mail when I finished a job?

2015-12-18 Thread Eckert, Phil

I believe that all that is happening in regard to mail is that the
slurmctld is executing the mail utility with the standard arguments. Is
mail set up on the node the slurmctld is running on?  A quick test would
be to log in there and manually send yourself email.
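
For example, from the node running slurmctld (the address is a placeholder):

echo "test body" | /usr/bin/mail -s "slurm mail test" you@example.com

If that message never arrives, the problem is the mail setup on that host 
rather than anything in Slurm.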

Phil Eckert
LLNL

On 12/18/15, 9:42 AM, "Fany Pagés Díaz"  wrote:

>
>I send my job like this:
>
>salloc -n 2 -N 2 --gres=gpu:2  --mail-type=ALL --mail-user=fpa...@citi.cu
>mpirun job1
>
>The job finished fine, but it never sent the email. Don't I have to do
>anything for Slurm to know how to send the email?
>
>-Original Message-
>From: Wiegand, Paul [mailto:wieg...@ist.ucf.edu]
>Sent: Friday, December 18, 2015 12:00
>To: slurm-dev
>Subject: [slurm-dev] Re: How can I send a mail when I finished a job?
>
>You have to tell it which events you want to receive email about, too.
>Like this in your submit script:
>
>#SBATCH --mail-type=FAIL
>#SBATCH --mail-type=BEGIN
>#SBATCH --mail-type=END
>#SBATCH --mail-user myem...@address.net
>
>
>
>
>> On Dec 18, 2015, at 11:26, Fany Pagés Díaz  wrote:
>> 
>> I need to know the status of the job, but I used the --mail-user=myemail
>> parameter and it is not working. Do I have to do some configuration on
>> the server?
>>  
>> Can anyone help me?
>> Ing. Fany Pagés Díaz


[slurm-dev] Re: A floating exclusive partition

2015-11-19 Thread Eckert, Phil

A possibility might be to do this using reservations.

You could create a 5-node reservation with all concerned users having
access, then have a script run by cron that periodically checks the state
of the nodes in the reservation; if any go down, update the reservation,
replacing the down nodes with up nodes. If there are no up nodes, determine
the soonest a node will be free and add it to the reservation using the
IGNORE_JOBS flag. A rough sketch of such a cron job is below.
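
A minimal sketch, assuming a reservation named proj_resv (hypothetical) and 
accepting simple text parsing of scontrol/sinfo output; the replacement-node 
selection here is deliberately simplistic:

#!/bin/bash
# replace-down-nodes.sh -- run periodically from cron, e.g. */10 * * * *
RESV=proj_resv
# current members of the reservation
members=$(scontrol show reservation $RESV | grep -o 'Nodes=[^ ]*' | cut -d= -f2)
# members that are down or drained, one hostname per line
bad=$(sinfo -h -N -n "$members" -t down,drain -o '%N' | sort -u)
[ -z "$bad" ] && exit 0
# healthy members to keep
keep=$(sinfo -h -N -n "$members" -t idle,alloc,mix -o '%N' | sort -u | paste -sd,)
# pick one idle node outside the reservation as a replacement (simplistic)
new=$(sinfo -h -N -t idle -o '%N' | sort -u \
      | grep -vxF "$(sinfo -h -N -n "$members" -o '%N')" | head -1)
scontrol update reservation ReservationName=$RESV Nodes="$keep,$new"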

Phil Eckert
LLNL

On 11/19/15, 8:09 AM, "Paul Edmon"  wrote:

>
>Yeah, I guess QoS won't really work for overflow.  I was more thinking
>of the QoS as a way to create a floating partition of 5 nodes with the
>rest being in the public queue.  They would send jobs to the QoS to hit
>that and then when it is full they would submit to public as normal.
>That's at least my thinking, but it's less seamless to the users as they
>will have to consciously monitor what is going on.
>
>-Paul Edmon-
>
>On 11/19/2015 10:50 AM, Daniel Letai wrote:
>>
>> Can you elaborate a little? I'm not sure what kind of QoS will help,
>> nor how to implement one that will satisfy the requirements.
>>
>> On 11/19/2015 04:52 PM, Paul Edmon wrote:
>>>
>>> You might consider a QoS for this.  It may not do everything you want
>>> but it will give you the flexibility.
>>>
>>> -Paul Edmon-
>>>
>>> On 11/19/2015 04:49 AM, Daniel Letai wrote:

 Hi,

 Suppose I have a 100 node cluster with ~5% nodes down at any given
 time (maintenance/hw failure/...).

 One of the projects requires exclusive use of 5 nodes, and should be able
 to use the entire cluster when available (when other projects aren't
 running).

 I can do this easily if I maintain a static list of the exclusive
 nodes in slurm.conf:

 PartitionName=public Nodes=tux0[01-95] Default=YES
 PartitionName=special Nodes=tux[001-100] Default=NO

 And allowing only that project to use partition special.

 However, due to the 5% downtime, I'd like to maintain a dynamic set of
 5 exclusive nodes.
 Any suggestions?

 The project is serial and deployed as array of single node jobs, so
 I can run it even when the other 95 nodes are full.

 Thanks,
 --Dani_L.


[slurm-dev] Re: User Control of WallTime for running job

2015-11-17 Thread Eckert, Phil
The reason this has a higher permission level is that a user could game the 
system by submitting a job with a 1 minute time limit, which will generally get 
it started very quickly because of backfill, and then they could increase it to 
whatever they wanted. I believe almost all batch systems disallow this.
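
For reference, once someone with the necessary privilege runs it, the change 
itself is a single command (job id and limit are placeholders):

scontrol update JobId=12345 TimeLimit=08:00:00

Recent releases also accept a relative form such as TimeLimit=+2:00:00, though 
whether plain users may ever run this is exactly the restriction discussed above.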

Phil Eckert
LLNL

From: Jay Sullivan
Reply-To: slurm-dev
Date: Tuesday, November 17, 2015 at 10:51 AM
To: slurm-dev
Subject: [slurm-dev] User Control of WallTime for running job

Hello,

I apologize if I missed the answer on how to do this, but I am hoping there is 
a way.

Scenario: A job is in the RUN state, and the job is taking longer than 
expected. The user needs to increase the wall time of the job, to allow it to 
complete. The user cannot increase the wall time, because they do not have 
“operator” or “admin” privileges.

For many reasons, I do not want to give even “operator” control to all users, 
just to give them the ability to adjust their wall time.

So a few questions:

1)  Is there a way to do this with the stock configuration?

2)  If 1 is not possible is there a way to add a custom AdminLevel? One 
where I can set just the commands that users have access to?

3)  If neither of these are possible, can we file an RFE?

Thanks,
-Jay

Jay Sullivan
HPC Systems Administrator
Office: 310-970-3866
Mobile: 424-255-2713



[slurm-dev] Re: Requested node configuration is not available when using -c

2014-09-09 Thread Eckert, Phil
Mike,

In your slurm.conf you have Procs=1 (which is the same as CPUs=1), and Sockets 
(if omitted, inferred from CPUs; default is 1) and CoresPerSocket (default is 1).

So at this point the slurm.conf has a default configuration of 1 core per node.
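
For comparison, a hedged node definition that would let srun -c 2 succeed might 
look like this (names and counts are illustrative, and node definition changes 
generally need a daemon restart rather than just scontrol reconfigure):

NodeName=linux-slurm[1-4] Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN

or simply CPUs=2 if the socket/core layout doesn't matter.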

Phil Eckert
LLNL

From: Michal Zielinski michal.zielin...@uconn.edu
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, September 9, 2014 at 6:35 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Requested node configuration is not available when 
using -c

Josh,

I believe that -n sets the number of tasks. I only want a single task, as when 
a single process uses multiple cores. srun -n 2 hostname returns

linux-slurm2
linux-slurm3

which is definitely not what I want.

Thanks,
Mike


On Mon, Sep 8, 2014 at 8:07 PM, Josh McSavaney mcsa...@csh.rit.edu wrote:
I believe your slurm.conf is defining 4 nodes with a single logical processor 
each. You are then trying to allocate two CPUs on a single node with srun, 
which (according to your slurm.conf) you do not have.

You may want to consider `srun -n 2 hostname` and see where that lands you.

Regards,

Josh McSavaney
Bit Flipper
Rochester Institute of Technology



On Mon, Sep 8, 2014 at 7:42 PM, Christopher Samuel sam...@unimelb.edu.au wrote:

On 09/09/14 07:26, Michal Zielinski wrote:

I have a small test cluster (node[1-4]) running slurm 14.03.0 setup with
CR_CPU and no usage restrictions. Each node has just 1 CPU.
[...]
But, *srun -c 2 hostname* does not work, and it returns the above error.

I have no idea why I can't dedicate 2 cores to a single job if I can
dedicate each core individually to a job.

What does scontrol show node say?

cheers,
Chris
--
 Christopher Samuel    Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au    Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci






[slurm-dev] Re: Fwd: Can I stop slurm from copying a script to execution node

2014-07-10 Thread Eckert, Phil
If you don’t wish to do the submission from the “somepath” directory you can 
use the following sbatch option to achieve what you are looking for.

  -D, --workdir=directory
  Set the working directory of the batch script to directory before 
it is executed.
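
For example, with the paths from the original post:

sbatch -D /somepath /somepath/test.sh

The script is still spooled and executed from the slurmd directory, but its 
working directory will be /somepath, so relative references to the config files 
and helper scripts resolve as expected.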

Phil Eckert
LLNL


From: Thomas Johnson tho...@outdoorsnewzealand.co.nz
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Wednesday, July 9, 2014 at 7:31 PM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Fwd: Can I stop slurm from copying a script to execution 
node




I am submitting a job with sbatch /somepath/test.sh

test.sh looks for config files and other scripts in the same path, e.g. 
/somepath/
/somepath/ is available to all submit and compute nodes.

but slurm copies the script to /var/lib/slurm-llnl/slurmd/etc/ before 
executing it. Thus test.sh can't find the required config and scripts.

I'm changing over from sge where adding the -b y flag to qsub would stop sge 
from copying the script to the execution host.

Is there a similar solution for slurm?








[slurm-dev] Re: pbsdsh -u equivalent

2014-06-30 Thread Eckert, Phil
Hartley,

Sounds like you might be wanting srun.

If I ask for 5 nodes on our rzmerl system:

 salloc -p pdebug -N 5
salloc: Granted job allocation 1966117

 srun hostname
rzmerl1
rzmerl2
rzmerl4
rzmerl3
rzmerl5
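
If you want exactly one task per node no matter how the allocation was 
requested, a hedged variant is:

srun --nodes=$SLURM_NNODES --ntasks-per-node=1 hostname

(SLURM_NNODES is set inside the salloc/sbatch allocation.)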

Phil Eckert
LLNL

From: Hartley Greenwald jhgreenw...@gmail.com
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Monday, June 30, 2014 at 2:23 PM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] pbsdsh -u equivalent

Hi,

Is there an equivalent command on slurm for the pbs command pbsdsh -u?  That is 
to say, is there some command which will give one copy of a command to each 
node in a given allocation?  I've combed through the documentation and there 
doesn't seem to be, but it struck me as odd that there wouldn't be, so that's 
why I'm asking.

Thank you,
Hartley



[slurm-dev] Re: moab/slurm question

2014-04-23 Thread Eckert, Phil
Marti,

If the job is submitted using msub, the release of the dependency would need 
to be:

mjobctl -m depend=none jobid

If you use:

mjobctl -m depend= jobid

it only removes the dependency in Moab, not Slurm. This works fine if you are 
using just-in-time scheduling, since the jobs only migrate to Slurm when they 
have resources to run and dependencies have been met. But using the first 
method should work in both cases.

Phil Eckert
LLNL


From: Hill, Marti T mh...@lanl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Wednesday, April 23, 2014 at 8:45 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] moab/slurm question


It seems that I can remove a dependency from a job using mjobctl  -m , but it 
does not remove the dependency as far as slurm goes. Squeue still shows the job 
held…

What can we do?

Marti



[slurm-dev] Re: backfill scheduler look ahead?

2014-02-21 Thread Eckert, Phil

Bill,

In addition to what Alejandro said, there is another consideration.

You indicated the top two high priority jobs and the 30 core job, I'm
assuming that the ... indicated a number of other queued jobs ahead of
the 30 core job. Also, you didn't state it, but I'm also assuming there
were other jobs running at the time.

If both of these assumptions are true, then you would need to consider the
completion time of all the running jobs in relation to the needs of the
jobs ahead of the 30 core job in the queue. The 60 cores may be needed by
a higher priority job that is waiting for a currently running job, or
jobs, that will complete in less than two hours and provide the number of
cores it needs.

We have been using backfill batch systems, including SLURM, here at LLNL
for over 20 years, and trying to answer this question for our users is
never easy. A conclusive way of determining when a job will either start
or be backfilled is to do an squeue and an sinfo, then map out X-Y
coordinates with time and nodes to represent the blocks that jobs will
use. This is a bit painful, but it will provide a lot of insight into backfill.
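
As a hedged starting point for that mapping, the scheduler's own estimates can 
be pulled directly (the format strings are only examples):

squeue --start -o '%.10i %.9P %.8u %.19S %.6D %R'   # expected start time of each pending job
sinfo -o '%.12P %.5a %.10l %.6D %.6t'               # partition time limits and node states
scontrol show job <jobid> | grep -E 'Time|NumNodes' # per-job detail

Plotting requested node counts against those start and end times gives the 
block picture described above.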

I hope this is helpful.

Phil Eckert
LLNL




On 2/21/14 2:57 AM, Alejandro Lucero Palau alejandro.luc...@bsc.es
wrote:


Hi Bill,

I think Moe gives you the right answer but it was so concise it can be
easily misunderstood.

If we take the situation you describe with a simple analysis from
backfilling algorithm point of view, the answer is job 300 should be
scheduled without any impact on jobs 201 and 202. However, what I think
Moe tried to say is there are other details to take into account, not
just total number of free cores. Those cores could be really free but,
for example, due to per-node memory requirements they can not be used.
Or maybe you have reservations which are reserving some cores but you
can not see it just by looking at free cores. Or you have some license or
partition limitations. Or your system does not allow sharing nodes, so
free cores do not mean you can use them. All this assumes you do not
have other pending jobs between job 201 and job 300. There is a
backfill parameter, max_job_bf, which limits the number of jobs
processed by the algorithm; the default is 50. Also, as
backfilling is so demanding, it is suspended after some time. Before
resuming, if something changed in the system, the backfill algorithm
will start from scratch. You can avoid this using the bf_continue parameter.
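
For reference, these knobs live on the SchedulerParameters line of slurm.conf; 
a hedged example (values are purely illustrative and option names differ 
slightly between releases):

SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_max_job_test=200,bf_interval=60,bf_window=2880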

As you can see there are a lot of details which could have an impact. We
have suffered this situation in the past and it is not always trivial to
see the reason behind scheduling decisions. I added extra debug
information for backfilling algorithm to see how resources were being
reserved by pending jobs and it was helpful. Maybe it would be
interesting to have some way for knowing why a job can not be scheduled.
There are other resource managers giving this detailed information but
it would have a cost, of course.

On 02/21/2014 12:45 AM, Bill Wichser wrote:

 Moe,

 That's quite an obfuscated answer!  I was looking for a yes, this is
 the expected behavior or no, something is amuck.

 In the case presented, again I'll say, it is clearly evident that the
 job waiting, number 300, can run.  It has free cores, the job
 currently waiting will have plenty of cores available when the job it
 is waiting on finishes, yet it does not start simply because the time
 it requires would interfere with the current start time of the
 currently waiting job, #201.

 But the assertion that job 201 would be held up by starting job 300 is
 completely incorrect in this case.

 Now if this is the way the scheduler works, by being simple minded
 about time constraints,  then it is what it is.  I'm asking only if
 this behavior is the expected behavior.  I think you are trying to say
 that indeed this is the case.

 Sincerely,
 Bill


 On 2/20/2014 1:21 PM, Moe Jette wrote:

 Slurm uses what is known as a conservative backfill scheduling
 algorithm. No job will be started that adversely impacts the expected
 start time of _any_ higher priority job. The scheduling can also be
 effected by a job's requirements for memory, generic resources,
 licenses, and resource limits.

 Moe Jette
 SchedMD LLC


 Quoting Bill Wichser b...@princeton.edu:


 Just a question on expected behavior of the backfill scheduler. This
 is an SMP machine if that matters.  Scheduler is backfill with no
 preemption.

 I have a number of jobs queued.  There are three which matter,
 ordered by priority.  In the current state I have 60 free cores.

 job 201 needs 200 cores and will start in 1 hour requiring 24 hours
 of runtime
 job 202 needs 250 cores and will start in 5 hours requiring 24 hours
 of runtime
 ...
 job 300 needs 30 cores and will start in 300 hours requiring 2 hours
 of runtime

 The job completing in 1 hour will free 252 cores.

 Clearly, starting job 300 will not impact job 201's start time in
 any way.  Yet 

[slurm-dev] Re: Can't use sbatch with cron

2013-11-22 Thread Eckert, Phil

A lot of suggestions of what to check for here:

https://groups.google.com/forum/#!topic/slurm-devel/qduhQ5EbjaQ

Phil Eckert
LLNL

On 11/21/13 5:00 PM, Arun Durvasula arun.durvas...@gmail.com wrote:

Zero Bytes were transmitted or received


[slurm-dev] Re: Admin reservation on busy nodes

2013-11-12 Thread Eckert, Phil
I see the nodes busy message only if I am trying to create a reservation on top 
of another reservation that includes the same nodes. You might try adding the 
overlap flag if this is the case.

Phil Eckert
LLNL

From: Jacqueline Scoggins jscogg...@lbl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, November 12, 2013 9:27 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Admin reservation on busy nodes

I tried that and it stated that the nodes were busy.

Jackie


On Tue, Nov 12, 2013 at 9:16 AM, Paul Edmon ped...@cfa.harvard.edu wrote:
Include the ignore_jobs flag.  That will force the reservation.

-Paul Edmon-


On 11/12/2013 12:11 PM, Jacqueline Scoggins wrote:
Running slurm 2.5.7 and tried to reserve the nodes of the cluster because of 
hardware issues that needed to be repaired.  Some of the nodes were allocated 
with jobs and others were not. Tried to do the following but got an error that 
the Nodes were busy and the reservation was not set.

scontrol create reservation flags=ignore_jobs,maint starttime=now 
endtime=yyyy-mm-ddThh:mm partition=blah

It would not work.

Is there a way of setting system reservations on a partition even if there are 
running jobs allocated to nodes?

Thanks

Jackie






[slurm-dev] Re: Admin reservation on busy nodes

2013-11-12 Thread Eckert, Phil
Jackie,

I was trying this with an earlier version of SLURM. I just built a 2.5.7 test 
system and tried it again, and I am seeing the same failures that you do when 
any of the nodes in the partition are allocated. A workaround is to use the 
nodes= option, i.e.:

scontrol create reservation flags=ignore_jobs nodes=tnodes[32-591] 
starttime=now endtime=tomorrow partition=pbatch user=eckert

Phil Eckert
LLNL

From: Jacqueline Scoggins jscogg...@lbl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, November 12, 2013 10:58 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Admin reservation on busy nodes

I also believe I tried that one as well as the other two, and each time I got 
the Nodes busy message. If the nodes are in the alloc state, will either of 
these flags work?  From what I saw they would not work in this case.

Jackie


On Tue, Nov 12, 2013 at 9:34 AM, Eckert, Phil ecke...@llnl.gov wrote:
I see the nodes busy message only if I am trying to create a reservation on top 
of another reservation that includes the same nodes. You might try adding the 
overlap flag if this is the case.

Phil Eckert
LLNL

From: Jacqueline Scoggins jscogg...@lbl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, November 12, 2013 9:27 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Admin reservation on busy nodes

I tried that and it stated that the nodes were busy.

Jackie


On Tue, Nov 12, 2013 at 9:16 AM, Paul Edmon ped...@cfa.harvard.edu wrote:
Include the ignore_jobs flag.  That will force the reservation.

-Paul Edmon-


On 11/12/2013 12:11 PM, Jacqueline Scoggins wrote:
Running slurm 2.5.7 and tried to reserve the nodes of the cluster because of 
hardware issues that needed to be repaired.  Some of the nodes were allocated 
with jobs and others were not. Tried to do the following but got an error that 
the Nodes were busy and the reservation was not set.

scontrol create reservation flags=ignore_jobs,maint starttime=now 
endtime=yyyy-mm-ddThh:mm partition=blah

It would not work.

Is there a way of setting system reservations on a partition even if there 
running jobs allocated to nodes?

Thanks

Jackie








[slurm-dev] Re: Admin reservation on busy nodes

2013-11-12 Thread Eckert, Phil
Jackie,

it looks like in 2.5.7, according to the scontrol man page,  the correct syntax 
would be:

scontrol create reservation flags=PART_NODES,IGNORE_JOBS nodes=ALL 
starttime=now endtime=tomorrow partitionname=pbatch  user=eckert

but unfortunately, that doesn't work either.

Phil


From: Jacqueline Scoggins jscogg...@lbl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, November 12, 2013 1:29 PM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Admin reservation on busy nodes

Here is my problem Phil,

My node names are not like n00[00-91]; instead we have a suffix added to our 
hostname, like n.jackie0.  Since we have multiple n nodes, we had to add 
the cluster they are associated with to the FQDN.  And when I tried 
nodes='n00[00-91].jackie0' I got a message that the node names were not 
valid.  So I tried only the partition and it still did not work.

Thanks

Jackie



On Tue, Nov 12, 2013 at 11:49 AM, Eckert, Phil ecke...@llnl.gov wrote:
Jackie,

I was trying this with an earlier version of SLURM, I just build a 2.5.7 test 
system and tried it again, and I am seeing the same failures that you do when 
any of the nodes in the partition are allocated. A workaround is to use the 
nodes= option, ie:

scontrol create reservation flags=ignore_jobs nodes=tnodes[32-591] 
starttime=now endtime=tomorrow partition=pbatch user=eckert

Phil Eckert
LLNL

From: Jacqueline Scoggins jscogg...@lbl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, November 12, 2013 10:58 AM

To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Admin reservation on busy nodes

I also believe I tried that one as well as the other two and each time I got 
Nodes busy message. If the nodes are in alloc state will either of these flags 
work?  From what I saw they would not work in this case.

Jackie


On Tue, Nov 12, 2013 at 9:34 AM, Eckert, Phil ecke...@llnl.gov wrote:
I see the nodes busy message only if I am trying to create a reservation on top 
of another reservation that includes the same nodes. You might try adding the 
overlap flag if this is the case.

Phil Eckert
LLNL

From: Jacqueline Scoggins jscogg...@lbl.gov
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, November 12, 2013 9:27 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] Re: Admin reservation on busy nodes

I tried that and it stated that the nodes were busy.

Jackie


On Tue, Nov 12, 2013 at 9:16 AM, Paul Edmon ped...@cfa.harvard.edu wrote:
Include the ignore_jobs flag.  That will force the reservation.

-Paul Edmon-


On 11/12/2013 12:11 PM, Jacqueline Scoggins wrote:
Running slurm 2.5.7 and tried to reserve the nodes of the cluster because of 
hardware issues that needed to be repaired.  Some of the nodes were allocated 
with jobs and others were not. Tried to do the following but got an error that 
the Nodes were busy and the reservation was not set.

scontrol create reservation flags=ignore_jobs,maint starttime=now 
endtime=yyyy-mm-ddThh:mm partition=blah

It would not work.

Is there a way of setting system reservations on a partition even if there 
running jobs allocated to nodes?

Thanks

Jackie










[slurm-dev] Re: Job count exceeds limit

2013-08-09 Thread Eckert, Phil

I believe you have exceeded the MaxJobCount specified in your slurm.conf,
or have reached the default of 10000 jobs.

   MaxJobCount
      The maximum number of jobs SLURM can have in its active
      database at one time. Set the values of MaxJobCount and MinJobAge
      to insure the slurmctld daemon does not exhaust its memory
      or other resources. Once this limit is reached, requests to
      submit additional jobs will fail. The default value is
      10000 jobs. This value may not be reset via scontrol reconfig.
      It only takes effect upon restart of the slurmctld daemon.
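
A hedged way to check and raise the limit (values are illustrative):

scontrol show config | grep -i MaxJobCount   # current setting

# in slurm.conf -- requires a slurmctld restart, as noted above
MaxJobCount=50000
MinJobAge=300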

Phil Eckert

LLNL

On 8/9/13 9:08 AM, Mario Kadastik mario.kadas...@cern.ch wrote:


Hi,

lately we've started to see this:

[2013-08-09T18:57:12+03:00] error: create_job_record: job_count exceeds
limit
[2013-08-09T18:57:13+03:00] error: create_job_record: job_count exceeds
limit
[2013-08-09T18:57:16+03:00] error: create_job_record: job_count exceeds
limit

and I can't quite understand where it comes from.

Mario Kadastik, PhD
Senior researcher

---
  Physics is like sex, sure it may have practical reasons, but that's
not why we do it 
 -- Richard P. Feynman


[slurm-dev] Re: Job submit plugin to improve backfill

2013-06-28 Thread Eckert, Phil
Another route that could be taken is to set the DefaultTime for a
partition to 0; the small patch attached to this email will reject a job
when it has no time limit specified and the default_time limit is 0. I also
modified the ESLURM_INVALID_TIME_LIMIT message to include information that
the error might be because of a missing time limit.
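
For the partition route, a hedged slurm.conf fragment (partition and node 
names are illustrative):

PartitionName=batch Nodes=tux[001-100] DefaultTime=0 MaxTime=24:00:00 State=UP

With the patch applied, a submission that carries no --time/-t value is then 
rejected with the extended ESLURM_INVALID_TIME_LIMIT message instead of 
silently inheriting the maximum.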

Phil Eckert
LLNL


On 6/28/13 7:29 AM, Daniel M. Weeks week...@rpi.edu wrote:

At CCNI, we use backfill scheduling on all our systems. However, we have
found that users typically do not specify a time limit for their job so
the scheduler assumes the maximum from QoS/user limits/partition
limits/etc. This really hurts backfilling since the scheduler remains
ignorant of short jobs.

Attached is a small patch I wrote containing a job submit plugin and a
new error message. The plugin rejects a job submission when it is
missing a time limit and will provide the user with a clear and distinct
error.

I've just re-tested and the patch applies and builds cleanly on the
slurm-2.5, slurm-2.6, and master branches.

Please let me know if you find this useful, run across problems, or have
suggestions/improvements. Thanks.

-- 
Daniel M. Weeks
Systems Programmer
Computational Center for Nanotechnology Innovations
Rensselaer Polytechnic Institute
Troy, NY 12180
518-276-4458



spatch
Description: spatch


[slurm-dev] Re: fairshare usage

2013-01-22 Thread Eckert, Phil
Have you looked at sshare?
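
For example, a quick look at the whole tree with the long output (exact columns 
vary by version):

sshare -a -l

-a lists every user under each account, and -l adds the normalized shares, raw 
and effective usage, and the resulting FairShare factor used by the multifactor 
plugin.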

Phil Eckert
LLNL

From: Mario Kadastik mario.kadas...@cern.ch
Reply-To: slurm-dev slurm-dev@schedmd.com
Date: Tuesday, January 22, 2013 11:17 AM
To: slurm-dev slurm-dev@schedmd.com
Subject: [slurm-dev] fairshare usage

Hi,

is there some decent way to get multifactor fairshare current state? Something 
akin to maui's diagnose -f output that shows groups (accounts for slurm) and 
users with their fairshare target as well as their historic usage over the past 
N days. This would seriously help understand how the fairshare is computed 
based on the actual usage statistics and current cluster state.

For example we have all user fairshares set as parent and for the accounts:

   Account     Share
---------- ---------
      root         1
      grid         1
  grid-ops         1
  hepusers       100
 kbfiusers         1

Now let's assume one of the users in hepusers spends the past N days computing 
with the full cluster, and then another user submits a number of jobs. It would 
be logical to assume that, as there is no distinction between the users in an 
account, the newcomer's priority would be higher, as (s)he hasn't had any 
allocated time.

[root@slurm-1 slurm]# sreport cluster accountutilizationbyuser start=2013-01-08

Cluster/Account/User Utilization 2013-01-08T00:00:00 - 2013-01-21T23:59:59 
(1209600 secs)
Time reported in CPU Minutes

  Cluster   Account     Login       Proper Name        Used
--------- --------- --------- ----------------- ----------
t2estonia      root                                 7801048
t2estonia      grid                                       0
t2estonia      grid    cms134   mapped user fo+           0
t2estonia      grid sgmcms000   mapped user fo+           0
t2estonia  hepusers                                 7801048
t2estonia  hepusers    andres       Andres Tiko       85048
t2estonia  hepusers     mario    Mario Kadastik     7716000

so according to this, Mario (me) has computed a huge amount of time in 
comparison to andres. However, if I look at the priorities from sprio -nl I see 
this:

[root@slurm-1 slurm]# sprio -nl|head -3
  JOBID   USER  PRIORITY        AGE  FAIRSHARE    JOBSIZE  PARTITION    QOS
  53498  mario    0.3497  0.2404977  0.4897101  0.9919238      1.000  0.000
  53499  mario    0.3497  0.2404977  0.4897101  0.9919238      1.000  0.000
[root@slurm-1 slurm]# sprio -nl|grep andres|head -1
  53835 andres    0.3497  0.2396412  0.4897101  0.9919238      1.000  0.000

so in fact the fairshare factor is equivalent for both users no matter that one 
has been getting a lot of the resource while the other has not.

or do I misunderstand the =parent part?  I tried also setting all users shares 
to 1 and have no clue how long it will take for sprio to recompute this, but 
right now it's showing the same priorities.

That's one of the reasons why I'd like to be able to see how the actual usage 
and decay over time affect the factor so that I can better understand the 
algorithm and tune the weights.

Thanks,

Mario Kadastik, PhD
Researcher

---
  Physics is like sex, sure it may have practical reasons, but that's not why 
we do it
 -- Richard P. Feynman




[slurm-dev] Re: Problem submitting jobs from a non-compute node

2012-12-11 Thread Eckert, Phil

I have scp'd it as moab.log.invalid.gz

On 12/11/12 1:00 PM, Moe Jette je...@schedmd.com wrote:


I would guess that your machine can communicate with the cluster's
head node (where the slurmctld daemon executes and creates the job
allocation), but not the compute nodes (where the slurmd daemons
execute and spawn your tasks). It's probably a network issue.

Quoting Reza Ramazani-Rend r.ramaz...@gmail.com:

 Hi,

  I am trying to set up a machine for submitting jobs to a cluster that
uses
 slurm. But, when I try to submit a job, for example, using srun command,
 despite the job being allocated resources (for example using squeue
shows
 the job running with the correct amount of resources allocated), it
fails
 to run the application, and I have to terminate the srun process by a
kill
 command on the local machine or use scancel to cancel the job and free
the
 resources for other users. I tried to follow the instructions given on
the
 mailing list for similar problems, and it seems that the machine that
 submits the job fails to receive signals from the compute node. I am
 attaching the output from "scontrol show config", the srun command log
 (logsrunlocal from "srun -v -p partitionname date 2>&1 | tee log"),
 and the output of strace (from "strace -r -f -o logfile srun ...").

  Other machines on the network with similar configurations can submit
jobs
 without a problem. The log file from the "srun -v ..." command does not
 indicate any problems that I could see until I terminate the job to free
 the resources (for comparison, logsrun301 is the log file from a
successful
 run from one of the compute nodes). The strace log, however, shows that
the
 client is waiting for a signal that it never receives (line 744,
 futex(0x4724ba4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1,
 {1355239853, 0}, <unfinished ...>, and line 745, <... rt_sigtimedwait
 resumed> ) = 15).

  The munge daemon is running on the client, and the permissions to all
the
 directories and files are set up as instructed in the installation
 document. I also thought selinux might be blocking the communications,
but
  disabling it didn't help.

  I was wondering if you can identify any problems that I have
overlooked or
 if anything is wrong with the set-up.

  Thank you.



[slurm-dev] Re: Job name env var not set correctly

2012-10-09 Thread Eckert, Phil

In the sbatch code, it checks to see whether a job name is provided; if so, it
sets the SLURM_JOB_NAME environment variable. But since the overwrite
argument of the call is 0, it doesn't do so if the variable is already
set, which is the case you are running into once the first job is
submitted.

if (opt.job_name)
	setenv("SLURM_JOB_NAME", opt.job_name, 0);

Each successive job is using the already-set environment value as the job
name.

One way to accomplish what you are seeking would be to unset the
SLURM_JOB_NAME environment variable prior to making the sbatch call in the
script.
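
For example, in the script from the message below (options copied from it):

unset SLURM_JOB_NAME
sbatch --partition=debug --time=0:1:0 --dependency=afterany:$SLURM_JOB_ID \
    --nodes=1 --job-name=bar /path/to/this/script

With the variable unset, sbatch's setenv() call is free to populate 
SLURM_JOB_NAME from the -J/--job-name value for the child job.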



On 10/9/12 9:10 AM, Carl Schmidtmann carl.schmidtm...@rochester.edu
wrote:


We've run into a foible of how sbatch sets up the environment for
scripts...

SLURM_JOB_NAME is supposed to reflect the currently assigned name (-J or
SBATCH_JOB_NAME), but if you queue up a script from within an executing
queue script, slurm does not overwrite SLURM_JOB_NAME with a new one
regardless of what SBATCH_JOB_NAME is set to or the passed -J option.

For example,

#!/bin/bash

echo My name is $SLURM_JOB_NAME
if [ -e halt ] ; then rm halt ; exit 0 ; fi
sbatch --partition=debug --time=0:1:0 --dependency=afterany:$SLURM_JOB_ID
--nodes=1 --job-name=bar /path/to/this/script
touch halt
sleep 10


If I queue this up with,
   sbatch -p debug -t 0:1:0 --nodes 1 --job-name=foo script

The first output will say My name is foo, but the second one will also
say My name is foo.

It turns out that you can change SLURM_JOB_NAME and it will be propagated
to the next queued script.  So inserting,
   SLURM_JOB_NAME=bar
before the sbatch command works as expected.


The other oddity is that the name that shows up in squeue will change.
Just not the env var...

-- 
Carl Schmidtmann 
Center for Integrated Research Computing
University of Rochester 


[slurm-dev] Re: Problem with quotes in sched/wiki2 plugin

2012-06-06 Thread Eckert, Phil

According to adaptive this change was introduced in:

5_4 branch as of the .0 version changeset 7922ced7105a79a3


Phil Eckert
LLNL

On 6/6/12 1:29 PM, Eckert, Phil ecke...@llnl.gov wrote:


In Moab 6.1 and later the Moab wiki does filter out the quotes in the data
it gets from SLURM. We currently use SLURM 2.3.3 and Moab 6.1 and see none
of the issues that Jon is seeing. Looking through the Moab wiki code I
found the change that does this, and I have a query into Adaptive as to
which release they first implemented it in. I will post the version when I
hear back from them.

Phil Eckert
LLNL

On 6/6/12 12:47 PM, Jon Bringhurst j...@lanl.gov wrote:


I think a good end result would be this:

* Use quotes in the wiki2 syntax to avoid the # issue.
* Have the moab folks update their wiki specification to allow quotes,
or at least find out why it doesn't already support quotes.
* Update the slurm docs to replace Use Moab version 5.0.0 or higher
with whatever version of moab supports quotes in wiki.

In the meantime, I'm going to try to figure out what version of moab we
need to upgrade to for when we upgrade to slurm newer than April 2012 on
production clusters. It's probably overdue to make a push for upgrading
from 5.3.5 anyway.

-Jon

On 06/06/2012 01:12 PM, Moe Jette wrote:
 
 My recollection is this change was made to address someone submitting
 a job in which the working directory contained a #. When Moab read
 job state information from SLURM, it interpreted the # as a job
 separator and could not parse anything after that point. Quoting the
 working directory name fixed the problem. This same problem could
 happen with several other fields that could contain a #. Removing
 this patch will restore this parsing problem.
 
 Moe
 
 
 Quoting Jon Bringhurst j...@lanl.gov:
 

 I'd like to propose backing out a patch, as well as removing quotes
from
 SUBMITHOST in wiki2.

 http://bugs.schedmd.com/show_bug.cgi?id=29

 
https://github.com/SchedMD/slurm/commit/6cd20848dc3ed5375b637cbf34a6ba6af5fe9653

 It's breaking several things when used with moab 5.3.5, including
 classes and accounts.

 For example, we're getting this error:

 NOTE:  job violates constraints for partition slurm (partition
 slurm does not support requested class "standard")

 note that "standard" should be standard.

 Here's a patch to back it out as well as remove the quotes from
SUBMITHOST:

diff --git a/src/plugins/sched/wiki2/get_jobs.c b/src/plugins/sched/wiki2/get_jobs.c
index 3b6153e..ec5d75b 100644
--- a/src/plugins/sched/wiki2/get_jobs.c
+++ b/src/plugins/sched/wiki2/get_jobs.c
@@ -326,7 +326,7 @@ static char *	_dump_job(struct job_record *job_ptr, time_t update_time)

 	if (!IS_JOB_FINISHED(job_ptr) && job_ptr->details &&
 	    job_ptr->details->work_dir) {
-		snprintf(tmp, sizeof(tmp), "IWD=\"%s\";",
+		snprintf(tmp, sizeof(tmp), "IWD=%s;",
 			 job_ptr->details->work_dir);
 		xstrcat(buf, tmp);
 	}
@@ -335,17 +335,17 @@ static char *	_dump_job(struct job_record *job_ptr, time_t update_time)
 		xstrcat(buf, "FLAGS=INTERACTIVE;");

 	if (job_ptr->gres) {
-		snprintf(tmp, sizeof(tmp), "GRES=\"%s\";", job_ptr->gres);
+		snprintf(tmp, sizeof(tmp), "GRES=%s;", job_ptr->gres);
 		xstrcat(buf, tmp);
 	}

 	if (job_ptr->resp_host) {
-		snprintf(tmp, sizeof(tmp), "SUBMITHOST=\"%s\";",
+		snprintf(tmp, sizeof(tmp), "SUBMITHOST=%s;",
 			 job_ptr->resp_host);
 		xstrcat(buf, tmp);
 	}

 	if (job_ptr->wckey) {
-		snprintf(tmp, sizeof(tmp), "WCKEY=\"%s\";", job_ptr->wckey);
+		snprintf(tmp, sizeof(tmp), "WCKEY=%s;", job_ptr->wckey);
 		xstrcat(buf, tmp);
 	}

@@ -373,7 +373,7 @@ static char *	_dump_job(struct job_record *job_ptr, time_t update_time)
 	else
 		pname = "UNKNOWN";	/* should never see this */
 	snprintf(tmp, sizeof(tmp),
-		 "QUEUETIME=%u;STARTTIME=%u;RCLASS=\"%s\";",
+		 "QUEUETIME=%u;STARTTIME=%u;RCLASS=%s;",
 		 _get_job_submit_time(job_ptr),
 		 (uint32_t) job_ptr->start_time, pname);
 	xstrcat(buf, tmp);
@@ -407,7 +407,7 @@ static char *	_dump_job(struct job_record *job_ptr, time_t update_time)

 	if (job_ptr->account) {
 		snprintf(tmp, sizeof(tmp),
-			 "ACCOUNT=\"%s\";", job_ptr->account);
+			 "ACCOUNT=%s;", job_ptr->account);
 		xstrcat(buf, tmp);
 	}

 -Jon

 


[slurm-dev] Re: Implementing soft limits and notifications with Slurm/Moab

2012-06-05 Thread Eckert, Phil

Michael,

I was curious, so I tried the:

RESOURCELIMITPOLICY:ALWAYS,EXTENDEDVIOLATION:NOTIFY,CANCEL:12:00:00

parameter on my test cluster so that I could observe the behavior, and I
also used the OverTimeLimit parameter in my SLURM test system. When the
initial time limit is reached, I see that the job remaining time in Moab
goes negative. From what I've read, Torque supports a hard and soft limit,
so when the initial time is used up, the time remaining reflects the
extended value; the fact that SLURM shows a negative value is at
least an indication that the job is running on the extended time
allotment.

You are saying the job shows cancelled after using the initial time, but
I have found that if I use the Moab parameter:

JOBMAXOVERRUN 12:00:00

in my moab.cfg, the job will stay in the system and showq will display the
job (reflecting a negative time value) until completion.
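
For reference, the combination described above amounts to two settings (12 
hours shown; both values are illustrative):

# slurm.conf -- OverTimeLimit is given in minutes, or UNLIMITED
OverTimeLimit=720

# moab.cfg
JOBMAXOVERRUN 12:00:00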

Phil Eckert
LLNL



On 6/5/12 8:33 AM, Michael Gutteridge michael.gutteri...@gmail.com
wrote:


On Mon, Jun 4, 2012 at 1:48 PM, Lipari, Don lipa...@llnl.gov wrote:

 What appears to be happening is that Moab is sending the canceljob
message to SLURM when the job's time limit expires.  It should email the
user at that point, but hold off issuing the canceljob command to SLURM
until Moab's EXTENDEDVIOLATION grace period - 12 hours in this case -
has transpired.


I didn't go into this in detail, but it is slurm that is issuing the
cancel command to the job at the originally specified end time, which is why
I originally set OverTimeLimit=UNLIMITED.  Moab is not sending the
cancel command until it reaches EXTENDEDVIOLATION.

 By setting SLURM's OverTimeLimit to match Moab's grace period, Michael
has solved the problem.

What happens at that point is that the job's EndTime is set to the
time at which EXTENDEDVIOLATION was reached.  That's when the
OverTimeLimit timer takes over; thus, slurm won't cancel the job until
StartTime + WallTime + EXTENDEDVIOLATION + OverTimeLimit.  It works,
but Moab is confused about the job state after EXTENDEDVIOLATION
(i.e., it thinks the job has been cancelled, but the RM reports it
active).

So yes, eventually this works, but has undesirable side effects (i.e.
the job isn't visible in showq, I don't know how the resources would
be  scheduled, etc.)


 If the above changes to Moab behavior are not made, I would recommend
using SLURM's OverTimeLimit as Michael described.  However, I don't see
the need to eliminate _timeout_job function from the wiki*/cancel_job.c
modules.


What I've put together (but haven't tried out yet) is leaving the
_timeout_job module as is, but adding the job cancel code from
_cancel_job.  So it both sets EndTime (which I'm guessing might be
good for accounting purposes) and cancels the job.  Might be
redundant, but likely harmless anyway.

 Don

 -Original Message-
 From: Moe Jette [mailto:je...@schedmd.com]
 Sent: Monday, June 04, 2012 11:29 AM
 To: slurm-dev
 Subject: [slurm-dev] Re: Implementing soft limits and notifications
 with Slurm/Moab


 The code in question dates back about six years to the first
 SLURM/Moab integration. I have no idea what the reason is for the
 reason for the different treatment of job cancellation for time limit
 and an administrator cancellation. I can understand the problem caused
 by the current SLURM code and your configuration. It seems that
 removing the _timeout_job function and calling the _cancel_job()
 function in all cases is reasonable.  If you want to validate that and
 respond to the list, we can change the SLURM code.

 Quoting Michael Gutteridge michael.gutteri...@gmail.com:

 
  I have kind of an interesting situation.  We'd like to enable jobs to
  overrun their requested time by some amount as well as provide
  notifications when that wall time is close to used up.  We've got
 Moab
  Workload Manager (6.1.6) and Slurm 2.3.5 installed.  I'd originally
  attempted to use Moab's resource limit policy:
 
  RESOURCELIMITPOLICY:ALWAYS,EXTENDEDVIOLATION:NOTIFY,CANCEL:12:00:00
 
  Meaning that when the job goes over time, moab notifies the user but
  then cancels the job after it's gone 12 hours past its wall time.
  Now, this initially didn't work; Slurm just kills the job.  I set
  OverTimeLimit=UNLIMITED and then I got the notifications OK... but
  when the job reaches its overtime limit, the job isn't cancelled.
  Moab cancels the job.  I see it send Slurm the message via wiki2:
 
  05/31 11:15:31  INFO: message sent: 'CMD=CANCELJOB ARG=1508
  TYPE=WALLCLOCK'
 
  And I see slurm acknowledge the event:
 
  112785 05/31 11:15:31  INFO: received message
 'CK=8512712decedc584
  TS=1338488131 AUTH=slurm DT=SC=0 RESPONSE=job 1508 cancelled
  successfully' from wiki server
  112786 05/31 11:15:31  MSUDisconnect(9)
  112787 05/31 11:15:31  INFO: job '1508' cancelled through WIKI RM
 
  At higher log levels I see that Slurm sets the end time for the job
 to
  the current time.  In