[slurm-dev] Re: Installing SLURM locally on Ubuntu 16.04

2017-11-05 Thread Benjamin Redling

Hi Will,

looking at your stackoverflow postings there doesn't seem to be anything
helpful. Did you solve your problem in the meantime?

Am 30.10.2017 um 03:12 schrieb Will L:
> I am trying to install SLURM 15.08.7 locally on an Ubuntu 16.04 machine.
> In my case, the master and worker nodes are the same. 
[...]

Have you tried starting both slurmctld and slurmd in the foreground (-D)?
When I have real trouble with a cluster I open two terminals
side by side and set the debug level in slurm.conf to something reasonably
high. Then I start...
... one with: slurmctld -D -f 
... another with: slurmd -D -f 

(I only remember one case where that wasn't helpful: a seemingly random
"user unknown" file access problem)

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: Head Node Hardware Requirements

2017-10-18 Thread Benjamin Redling

Am 17. Oktober 2017 23:12:35 MESZ, schrieb Daniel Barker :
>Hi, All,
>
>I am gathering hardware requirements for head nodes for my next
>cluster. The new cluster will have ~1500 nodes. We ran 5 million jobs
>last year. I plan to run the slurmctld on one node and the slurmdbd on
>another. I also plan to write the StateSaveLocation to an NFS
>appliance. Does the following configuration look sufficient?
>
>Node1:
>slurmctld
>128GB ram
>2TB local disk
>12 core high clock rate CPU
>
>Node2:
>slurmdbd
>slurmctld backup
>128GB ram
>2TB local disk
>mirrored 500GB SSD for database
>12 core high clock rate CPU
>
>Do I need more RAM in either node? Is 12 cores enough? Is 500GB large
>enough for the Slurm database?

What do you do when Node2 goes down?
Regards, BR
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Camacho Barranco, Roberto <rcamachobarra...@utep.edu> ssirimu...@utep.edu

2017-10-10 Thread Benjamin Redling


Hello everybody,

On 10/10/17 8:25 AM, Marcus Wagner wrote:

For a quick view, manually starting the controller

slurmctld -D -vvv


Good advice; for beginners (or a tired help-seeker) a hint about "-f" might
be necessary.
Without the current configuration, running the central management daemon
is IMO useless.


Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: Setting up Environment Modules package

2017-10-05 Thread Benjamin Redling


Hello Mike,

On 10/4/17 6:10 PM, Mike Cammilleri wrote:

I'm in search of a best practice for setting up Environment Modules for our 
Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet). 
We're a small group and had no explicit need for this in the beginning, but as 
we are growing larger with more users we clearly need something like this.

what are your needs that brought you to "Environment Modules"?

Have you seen Singularity containers?
We are a small group and they seem to be less of a burden for keeping 
reproducible environments and letting users get a certain setup 
relatively easily.


(Main motivation here is the use of TensorFlow, which has the best 
support on Ubuntu. But using that as the host OS proved to be a major 
pain because of its ridiculous package quality compared to a stable Debian.
With Singularity I can provide whatever container/distribution is needed 
on top of a stable host OS.
And thanks to NeuroDebian and tensorflow/tensorflow:latest-gpu-py3 even 
the newest version with access to Nvidia GPUs -- all ready to use, no 
messing around with dependencies.)
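
A sketch of what that looks like under Slurm (assuming the GPUs are defined
as a gres on the node and Singularity is installed there):

srun --gres=gpu:1 \
  singularity exec --nv docker://tensorflow/tensorflow:latest-gpu-py3 \
  python3 -c 'import tensorflow as tf; print(tf.__version__)'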


Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: How user can requeue an old job?

2017-09-14 Thread Benjamin Redling


On 14.09.2017 11:12, Merlin Hartley wrote:
> I wonder: what would be the ramifications of setting this to 0 in 
production? "A value of zero prevents any job record purging”

> Or is that option only really there for debugging?

(just guessing) it should be horrible: once "MaxJobCount" (see the
slurm.conf help again) is reached, nobody will be able to submit any jobs?


BR,
BR
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: How user can requeue an old job?

2017-09-14 Thread Benjamin Redling




On 14.09.2017 10:52, Taras Shapovalov wrote:

Hey guys!

As far as I know there is a built-in 5 min time interval after a job 
is finished, which leads to the job's removal from Slurm's "memory" (not 
from accounting). This is ok until users need to requeue the job for some 
reason. Thus if 5 minutes have already passed, the requeue command does 
not work.


[...]


Is there any way to extend this 5 minute period?


https://slurm.schedmd.com/slurm.conf.html

see MinJobAge
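
For example (the value is in seconds and chosen arbitrarily here; mind the
MaxJobCount interplay mentioned in the other mail):

# slurm.conf: keep finished job records around for an hour instead of 300 s
MinJobAge=3600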

BR,
BR

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: Fwd: using slurm at diffrent VLANS

2017-08-24 Thread Benjamin Redling

On 2017-08-24 09:18, nir wrote:
[...]
> slurm server ip 192.168.10.1
> compute nodes 10.2.2.3-40
> 
> Until yesterday the compute nodes were in the same VLAN as the slurm ,
> but i had to move them to new VLAN.
> After i moved them there is ping connection between slurm server and the
> compute nodes and reverse.
> slurm.conf was updated at slurm server and compute nodes with the new IP
> Address.
> 
> When changing the state of the compute nodes to IDLE they look ok for
> some time - 1, 2, 5 minutes - and then go DOWN.
> 
> The error at the slurm server: slurmctld reports the nodes as not responding.
> 
> Has anyone had such an issue, or any idea how it can be solved?

Have you seen
https://slurm.schedmd.com/troubleshoot.html
and worked through all the steps (and links)?
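
A few first checks from that guide, as a rough sketch (host and node names
are placeholders, 6817 is the default SlurmctldPort):

# on the slurm server
scontrol ping                      # is slurmctld answering?
scontrol show node <nodename>      # look at State and Reason
# on a compute node
systemctl status slurmd            # or check the slurmd log
nc -zv <slurm-server> 6817         # can the node reach slurmctld at all?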

BR,
BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: Can you run SLURM on a single node ?

2017-08-10 Thread Benjamin Redling

Am 10. August 2017 13:47:21 MESZ, schrieb Sean McGrath :
>
>Yes, you can run slurm on a single node. There is no need for for a
>different
>head and compute node(s).
>
>You will need to set Shared=Yes if you want multiple people to be able
>to run on
>the machine simultaneously. 
>
>The slurm.conf will have a single node defined in it.
>
>Best
>
>Sean
>
>On Thu, Aug 10, 2017 at 05:39:29AM -0600, Carlos Lijeron wrote:
>
>> Hi Everyone,
>> 
>> In order to use resources more efficiently on a server that has 64
>CPU Cores and 1 TB of RAM, is it possible to use SLURM on a stand alone
>server, or do you always need a head node and compute nodes to setup
>the clients?   Please advise.
>> 
>> Thank you.
>> 
>> 
>> Carlos.
>> 
>> 

AFAIK "Shared" is about resources and known as  "OverSubscribe" in newer 
versions.
As long as constrains are resource based and nodes are not reserved exclusively 
to a single user multiple jobs from different users are possible even without 
oversubscription.
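
Not knowing the rest of the setup, a bare-bones single-node sketch of what
that can look like (hostname, CPU count and memory are placeholders):

ControlMachine=localhost
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# with consumable resources, jobs from different users can share the node
# without any oversubscription
NodeName=localhost CPUs=64 RealMemory=1000000 State=UNKNOWN
PartitionName=work Nodes=localhost Default=YES MaxTime=INFINITE State=UP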

BR
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Reason for job state CANCELLED

2017-07-29 Thread Benjamin Redling

Am 29. Juli 2017 08:07:44 MESZ, schrieb Florian Pommerening 
:
>
>Hi everyone,
>
>is there a way to find out why a job was canceled by slurm? I would
>like 
>to distinguish the cases where a resource limit was hit from all other 
>reasons (like a manual cancellation). In case a resource limit was hit,
>
>I also would like to know which one.
>
>Thank you
>Florian

Hello,

@ https://slurm.schedmd.com/squeue.html
search for: Job State Codes
compare canceled, failed, timeout
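
For a job that has already left the queue, sacct shows the final state, so
e.g. a TIMEOUT can be told apart from a plain CANCELLED; a small sketch
(the job id is made up):

sacct -j 12345 --format=JobID,State,ExitCode,Elapsed,Timelimit,ReqMem,MaxRSS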

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Slurm with High Availabilty/Automatic failover

2017-07-26 Thread Benjamin Redling
Hello,

Am 25.07.2017 um 16:19 schrieb J. Smith:
> Does anyone has any suggestions in setting up high availability and
> automatic failover between two servers that run a Controller daemon,
> Database daemon and Mysql Database (i.e replication vs galera cluster)?
> 
> Any input would be appreciated.

we use Ganeti instances for most services, in our case KVM (configurable
on a per-cluster basis) + DRBD (instance storage).
On Debian they are rock solid.
While HA is experimentally possible, the default is intentionally going
without automatic fail-over:
http://docs.ganeti.org/ganeti/2.15/html/design-linuxha.html#risks

From my point of view a failing Slurm controller is such a rare event
that I prefer to have a look first and only then do a manually
triggered fast fail-over.
  On the other hand, the (unwritten) expected SLA for most services here
is 90% per week & month, 95% per year
-- sure, relaxed; not knowing your needs, that might just be
HPC kindergarten from your perspective.


Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323





[slurm-dev] Re: Announcing Slurm Job Packs

2017-07-14 Thread Benjamin Redling
On 2017-07-13 18:51, Perry, Martin wrote:
> This email is to announce the latest version of the job packs feature
> (heterogeneous resources and MPI-MPMD tight integration support) as
> open-source code.
[...]
> The code can be cloned from this branch:
> _https://github.com/RJMS-Bull/slurm/tree/dev-job-pack-17.02_
> and the documentation can be found here:
> _https://github.com/RJMS-Bull/slurm/blob/dev-job-pack-17.02/doc/html/job_packs.shtml_

Thanks for sharing!
Small issue: the links cause a 404

Probably
https://github.com/RJMS-Bull/slurm/tree/slurm-17.02-jobpacks
and "doc" accordingly now.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323





[slurm-dev] Re: Multifactor priority plugin

2017-06-06 Thread Benjamin Redling

Hello Sourabh,

On 2017-06-06 10:52, sourabh shinde wrote:
> Problem :
> As per my understanding, high priority jobs are executed first and takes
> all of the available nodes.
> I need that atleast one low or normal priority job should be executed in
> parallel with the high priority jobs. I want to have some kind of
> proportion 90:10 to execute jobs so that my low priority jobs are not
> starved for weeks till it reaches some greater priority.
> 
> How can i achieve this ?

I would think about how to increase the priority of the jobs you call
"low priority" over time (age?!).
"High vs. low" sounds like a binary view of what is happening, as if a
job is stuck in that category once and for all.
With proper weights your jobs' priorities change again and again.

I think you should revisit the six available factors:
https://slurm.schedmd.com/priority_multifactor.html

and most Parameters containing "Frequency" or "Priority"
https://slurm.schedmd.com/slurm.conf.html
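
For illustration only (the numbers are made-up weights, not a
recommendation), the kind of slurm.conf knobs meant here:

PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityMaxAge=14-0          # the age factor keeps growing for two weeks
PriorityWeightAge=2000       # give waiting time real influence
PriorityWeightFairshare=10000
PriorityWeightJobSize=1000
PriorityWeightQOS=1000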

If you still have starving jobs you might want to look into gang scheduling.

AFAIK for the "90:10": either you are able to preempt long running jobs
or you have to deny long running jobs that allocate the whole cluster
from the beginning.


Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: Multinode MATLAB jobs

2017-05-31 Thread Benjamin Redling

Hi,

Am 31.05.2017 um 10:39 schrieb Loris Bennett:
> Does any one know whether one can run multinode MATLAB jobs with Slurm
> using only the Distributed Computing Toolbox?  Or do I need to be
> running a Distributed Computing Server too?

if you can get your hands on the overpriced and underwhelming DCS (at least
up to the 2016b Linux variant the mdce service has neither startup
scripts with LSB tags nor systemd units; only the very first annoyance),
the following might be a consolation:
"
Access to all eligible licensed toolboxes or blocksets with a single
server license on the distributed computing resource
"
https://www.mathworks.com/products/distriben/features.html


(We currently use DCS without Slurm integration and are thus bad
citizens considering the license pool we have to share.
But running DCS without scheduler integration is bad in many ways, e.g.
proper security levels don't cooperate with plain LDAP, and the default
security level runs jobs as root [hello, inaccessible NFS shares]. So it
seems users either start single-node parallel jobs apart from DCS, or
DCS-Slurm integration is mandatory and you get all the benefits --
license count, security level, multi-node.)

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: SLURM terminating jobs before they finish

2017-04-17 Thread Benjamin Redling

Hi Batsirai,

Am 17.04.2017 um 14:54 schrieb Batsirai Mabvakure:
> SLURM has been running okay until recently my jobs are terminating before
> they finish.
> I have tried increasing memory using --mem, but still the jobs stop
> halfway with an error in the slurm.out file.
> I then tried running again a job which once ran and completed a week
> ago, it also terminated halfway. [...]

are you allowed to post (relevant parts of) the slurm.out file?

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Slurm with Torque

2017-04-16 Thread Benjamin Redling

Hello Mahmood,

Am 16.04.2017 um 16:11 schrieb Mahmood Naderan:
> Hi,
> Currently, Torque is running on our cluster. I want to know, is it
> possible to install Slurm, create some test partitions, submit some test
> jobs and be sure that it is working while Torque is running?
> Then we are able to tell the users to use Slurm scripts.  Any feedback
> is welcomed.

I would recommend partitioning the hosts (whatever is easy to implement
for you: namespaces, containers, virtual machines, ...).
Out of necessity we have Slurm running with Ganeti KVM instances (type
"plain") as virtual Slurm nodes to protect infrastructure VMs (types
"DRBD" & "plain") on the same hosts
-- with no obvious performance penalties.

I don't know how Torque tracks resource usage (could it handle an
/external/ process -- your host suddenly having load from /somewhere/
else?).
AFAIK Slurm can only track resources it scheduled itself.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: LDAP required?

2017-04-11 Thread Benjamin Redling

AFAIK most requests never hit the LDAP servers.
In production there is always a cache on the client side -- nscd might
have issues, but that's another story.

Regards,
Benjamin

On 2017-04-11 15:32, Grigory Shamov wrote:
> On a larger cluster, deploying NIS, LDAP etc. might require some
> thought, because you will be testing performance of your LDAP server’s
> in worts case of a few hundred simultaneous requests, no? Thats why many
> of specialized cluster tools like ROCKS, Perceus etc. would rather
> synchronize files than doing LDAP.
> 
> -- 
> Grigory Shamov
> 
> 
> 
> 
> From: Marcin Stolarek
> Reply-To: slurm-dev
> Date: Tuesday, April 11, 2017 at 12:48 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: LDAP required?
> 
> Re: [slurm-dev] Re: LDAP required?
> but... is LDAP such a big issue?
> 
> 2017-04-10 22:03 GMT+02:00 Jeff White  >:
> 
> Using Salt/Ansible/Chef/Puppet/Engine is another way to get it
> done.  Define your users in states/playbooks/whatever and don't
> bother with painful LDAP or ancient NIS solutions.
> 
> -- 
> Jeff White
> HPC Systems Engineer
> Information Technology Services - WSU
> 
> On 04/10/2017 09:39 AM, Alexey Safonov wrote:
>> If you don't want to share passwd and setup LDAP which is complex
>> task you can setup NIS. It will take 30 minutes of your time
>>
>> Alex
>>
>> On 11 Apr 2017 at 0:35, "Raymond Wan"
>> wrote:
>>
>>
>> Dear all,
>>
>> I'm trying to set up a small cluster of computers (i.e., less
>> than 5
>> nodes).  I don't expect the number of nodes to ever get larger
>> than
>> this.
>>
>> For SLURM to work, I understand from web pages such as
>> https://slurm.schedmd.com/accounting.html
>> 
>> 
>> that UIDs need to be shared
>> across nodes.  Based on this web page, it seems sharing
>> /etc/passwd
>> between nodes appears sufficient.  The word LDAP is mentioned
>> at the
>> end of the paragraph as an alternative.
>>
>> I guess what I would like to know is whether it is acceptable to
>> completely avoid LDAP and use the approach mentioned there?  The
>> reason I'm asking is that I seem to be having a very nasty time
>> setting up LDAP.  It doesn't seem as "easy" as I thought it
>> would be
>> [perhaps it was my fault for thinking it would be easy...].
>>
>> If I can set up a small cluster without LDAP, that would be great.
>> But beyond this web page, I am wondering if there are
>> suggestions for
>> "best practices".  For example, in practice, do most
>> administrators
>> use LDAP?  If so and if it'll pay off in the end, then I can
>> consider
>> continuing with setting it up...
>>
>> Thanks a lot!
>>
>> Ray
>>
> 
> 

-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
☎ +49 3641 9 44323


[slurm-dev] Re: LDAP required?

2017-04-11 Thread Benjamin Redling

Am 11. April 2017 08:21:31 MESZ, schrieb Uwe Sauter :
>
>Ray,
>
>if you're going with the easy "copy" method just be sure that the nodes
>are all in the same state (user management-wise) before
>you do your first copy. Otherwise you might accidentally delete already
>existing users.
>
>I also encourage you to have a look into Ansible which makes it easy to
>copy files between nodes (and which helps not to forget a
>node when updateing the files).
>
>
>Regards,
>
>   Uwe
>
>Am 11.04.2017 um 08:17 schrieb Raymond Wan:
>> 
>> Dear all,
>> 
>> Thank you all of you for the many helpful alternatives!
>> 
>> Unfortunately, system administration isn't my main responsibility so
>> I'm (regrettably) not very good at it and have found LDAP on Ubuntu
>to
>> be very unfriendly to set up.  I do understand that it must be a good
>> solution for a larger setup with a full-time system administrator.
>> But, if I can get away with something simpler for a cluster of just a
>> few nodes, then I might try that instead.
>> 
>> So far, no one seems to discourage me from simply copying /etc/passwd
>> between servers.  I can understand that this solution seems a bit
>> ad-hoc, but if it works and there are no "significant" downsides, I
>> might give that a try.  In fact, perhaps I'll give this a try now,
>get
>> the cluster up (since others are waiting for it) and while it is
>> running play with one of the options that have been mentioned and see
>> if it is worth swapping out /etc/passwd for this alternative...  I
>> guess this should work?
>> 
>> I suppose this isn't "urgent", but yes...getting the cluster set up
>> with SLURM soon will allow others to use it.  Then, I can take my
>time
>> with other options.  I guess I was worried if copying /etc/passwd
>will
>> limit what I can do later.  I guess if Linux-based UIDs and GIDs
>> match, then I shouldn't have any surprises?
>> 
>> Thank you for your replies!  They were most helpful!  I thought I had
>> only two options for SLURM:  /etc/passwd vs LDAP.  I didn't realise
>of
>> other choices available to me.  Thank you!
>> 
>> Ray
>> 
>> 
>> 
>> 
>> On Tue, Apr 11, 2017 at 2:05 PM, Lachlan Musicman 
>wrote:
>>> On 11 April 2017 at 02:36, Raymond Wan  wrote:


 For SLURM to work, I understand from web pages such as
 https://slurm.schedmd.com/accounting.html that UIDs need to be
>shared
 across nodes.  Based on this web page, it seems sharing /etc/passwd
 between nodes appears sufficient.  The word LDAP is mentioned at
>the
 end of the paragraph as an alternative.

 I guess what I would like to know is whether it is acceptable to
 completely avoid LDAP and use the approach mentioned there?  The
 reason I'm asking is that I seem to be having a very nasty time
 setting up LDAP.  It doesn't seem as "easy" as I thought it would
>be
 [perhaps it was my fault for thinking it would be easy...].

 If I can set up a small cluster without LDAP, that would be great.
 But beyond this web page, I am wondering if there are suggestions
>for
 "best practices".  For example, in practice, do most administrators
 use LDAP?  If so and if it'll pay off in the end, then I can
>consider
 continuing with setting it up...
>>>
>>>
>>>
>>> We have had success with a FreeIPA installation to manage auth -
>every node
>>> is enrolled in a domain and each node runs SSSD (the FreeIPA
>client).
>>>
>>> Our auth actually backs onto an Active Directory domain - I don't
>even have
>>> to manage the users. Which, to be honest, is quite a relief.
>>>
>>> cheers
>>> L.
>>>
>>> --
>>> The most dangerous phrase in the language is, "We've always done it
>this
>>> way."
>>>
>>> - Grace Hopper
>>>

Indeed, look into Ansible and avoid "ad hoc".
Ansible has a "user" module to handle that case with grace -- no accidental 
overwriting.
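
A sketch of that module in use (user names and UIDs invented):

- hosts: cluster
  become: true
  tasks:
    - name: ensure cluster users exist with identical UIDs everywhere
      user:
        name: "{{ item.name }}"
        uid: "{{ item.uid }}"
        state: present
      with_items:
        - { name: alice, uid: 5001 }
        - { name: bob,   uid: 5002 }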

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Re:Best Way to Schedule Jobs based on predetermined Lists

2017-04-05 Thread Benjamin Redling

Am 05.04.2017 um 15:58 schrieb maviko.wag...@fau.de:
[...]
> The purpose of this cluster is to investigate how smart distribution of
> workloads based on predetermined performance and energy data can benefit
> hpc-clusters that consist of heterogenous systems that differ greatly
> regarding energy consumption and performance.
> Its just a small research project.
[...]
> I already have all the data i need and now just need to find a way to
> integrate node selection based on these priority lists into Slurm.
> 
> My idea is to write a plugin that, on job submission to slurm, reads
> those lists, makes a smart selection based on different criteria which
> currently available node would be suited best, and forwards the job to
> that note. Partition-Selection is not needed since i run all Nodes in
> one partition for easier usage.
> The only information my plugin needs to forward besides nodename is some
> small config params in the form of Environment Variables on the target
> machine.

Sorry, but that sounds like trying to be more clever than the existing
scheduler.
(Occasionally somebody asks on this list for any details on scheduler
development -- without deeper knowledge of slurm -- and as to expect:
nothing to be heard again... Maybe start small?)

How about providing the necessary factors to the scheduler via a plugin.
Than everybody could incorporate that via multifactor to one owns heart.
That would be really useful.

> So far i did those job requests manually via:
> srun -w --export="",... 
> 
> I would like to include functionality into slurm so upon a simple "srun
> " it supplies node selection and matching envVars automatically.
> Based on my current knowledge of slurms architecture, a plugin (either
> select, or schedule) seems to be the apparent fit for what i'm trying to
> achieve.

> However, as stated in my first mail, i have not dabbled with plugin
> development/editing yet and kindly ask for advice from someone more
> experienced with that if indeed i pursue the correct approach.
> Or if a frontend solution, albeit less elegant, would be both easier and
> better fitting for the purpose of this project.

Apart from "power save" there is already infrastructure for "power
management":
https://slurm.schedmd.com/power_mgmt.html

Is yours the future non-Cray plugin?
Hope so.

All the best,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Fwd: Scheduling jobs according to the CPU load

2017-03-21 Thread Benjamin Redling

re hi,

your script will occasionally fail because the number of fields in the
output of "uptime" is variable.
I was reminded of that by this one:
http://stackoverflow.com/questions/11735211/get-last-five-minutes-load-average-using-ksh-with-uptime

All the more reason to use /proc...

Regards,
Benjamin

Am 21.03.2017 um 21:15 schrieb kesim:
> There is an error in the script. It could be:
> 
> scontrol update node=your_node_name WEIGHT=`echo 100*$(uptime | awk
> '{print $12}')/1 | bc`
> 
> 
> On Tue, Mar 21, 2017 at 8:41 PM, kesim  > wrote:
> 
> Dear SLURM Users,
> 
> My response here is for those who are trying to solve the simple
> problem of nodes ordering according to the CPU load. Actually,
> Markus was right and he gave me the idea (THANKS!!!)
> The solution is not pretty but it works and it has a lot of
> flexibility. Just put into crone a script:
>  
> #!/bin/sh
> scontrol update node=your_node_name WEIGHT=`echo 100*$(uptime | awk
> -F'[, ]' '{print $21}')/1 | bc`
> 
> Best Regards,
> 
> Ketiw
> 
> 
> 
> 
> On Mon, Mar 20, 2017 at 3:31 PM, Markus Koeberl
> > wrote:
> 
> 
> On Monday 20 March 2017 05:38:29 Christopher Samuel wrote:
> >
> > On 19/03/17 23:25, kesim wrote:
> >
> > > I have 11 nodes and declared 7 CPUs per node. My setup is
> such that all
> > > desktop belongs to group members who are using them mainly
> as graphics
> > > stations. Therefore from time to time an application is
> requesting high
> > > CPU usage.
> >
> > In this case I would suggest you carve off 3 cores via cgroups for
> > interactive users and give Slurm the other 7 to parcel out to
> jobs by
> > ensuring that Slurm starts within a cgroup dedicated to those
> 7 cores..
> >
> > This is similar to the "boot CPU set" concept that SGI came up
> with (at
> > least I've not come across people doing that before them).
> >
> > To be fair this is not really Slurm's problem to solve, Linux
> gives you
> > the tools to do this already, it's just that people don't
> realise that
> > you can use cgroups to do this.
> >
> > Your use case is valid, but it isn't really HPC, and you can't
> really
> > blame Slurm for not catering to this.  It can use cgroups to
> partition
> > cores to jobs precisely so it doesn't need to care what the
> load average
> > is - it knows the kernel is ensuring the cores the jobs want
> are not
> > being stomped on by other tasks.
> 
> You could additionally define a higher "Weight" value for a host
> if you know that the load is usually higher on it than on the
> others.
> 
> 
> regards
> Markus Köberl
> --
> Markus Koeberl
> Graz University of Technology
> Signal Processing and Speech Communication Laboratory
> E-mail: markus.koeb...@tugraz.at 
> 
> 
> 


-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Fwd: Scheduling jobs according to the CPU load

2017-03-21 Thread Benjamin Redling

Hi,

if you don't want to depend on the whitespace in the output of "uptime"
(the number of fields depends on the locale) you can improve that via "awk
'{print $3}' /proc/loadavg" (for the 15 min avg) -- it's always better to
avoid programmatically parsing output made for humans whenever possible.
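
Put together, a cron-able sketch of that variant (node name taken from the
local hostname, the scaling factor is arbitrary):

#!/bin/sh
# 15-minute load average from /proc, scaled to an integer node weight
LOAD15=$(awk '{print $3}' /proc/loadavg)
WEIGHT=$(echo "$LOAD15 * 100 / 1" | bc)
scontrol update NodeName=$(hostname -s) Weight=$WEIGHT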

Nice hack anyway!

Regards,
Benjamin


Am 21.03.2017 um 21:15 schrieb kesim:
> There is an error in the script. It could be:
> 
> scontrol update node=your_node_name WEIGHT=`echo 100*$(uptime | awk
> '{print $12}')/1 | bc`
> 
> 
> On Tue, Mar 21, 2017 at 8:41 PM, kesim  > wrote:
> 
> Dear SLURM Users,
> 
> My response here is for those who are trying to solve the simple
> problem of nodes ordering according to the CPU load. Actually,
> Markus was right and he gave me the idea (THANKS!!!)
> The solution is not pretty but it works and it has a lot of
> flexibility. Just put into crone a script:
>  
> #!/bin/sh
> scontrol update node=your_node_name WEIGHT=`echo 100*$(uptime | awk
> -F'[, ]' '{print $21}')/1 | bc`
> 
> Best Regards,
> 
> Ketiw
> 
> 
> 
> 
> On Mon, Mar 20, 2017 at 3:31 PM, Markus Koeberl
> > wrote:
> 
> 
> On Monday 20 March 2017 05:38:29 Christopher Samuel wrote:
> >
> > On 19/03/17 23:25, kesim wrote:
> >
> > > I have 11 nodes and declared 7 CPUs per node. My setup is
> such that all
> > > desktop belongs to group members who are using them mainly
> as graphics
> > > stations. Therefore from time to time an application is
> requesting high
> > > CPU usage.
> >
> > In this case I would suggest you carve off 3 cores via cgroups for
> > interactive users and give Slurm the other 7 to parcel out to
> jobs by
> > ensuring that Slurm starts within a cgroup dedicated to those
> 7 cores..
> >
> > This is similar to the "boot CPU set" concept that SGI came up
> with (at
> > least I've not come across people doing that before them).
> >
> > To be fair this is not really Slurm's problem to solve, Linux
> gives you
> > the tools to do this already, it's just that people don't
> realise that
> > you can use cgroups to do this.
> >
> > Your use case is valid, but it isn't really HPC, and you can't
> really
> > blame Slurm for not catering to this.  It can use cgroups to
> partition
> > cores to jobs precisely so it doesn't need to care what the
> load average
> > is - it knows the kernel is ensuring the cores the jobs want
> are not
> > being stomped on by other tasks.
> 
> You could additionally define a higher "Weight" value for a host
> if you know that the load is usually higher on it than on the
> others.
> 
> 
> regards
> Markus Köberl
> --
> Markus Koeberl
> Graz University of Technology
> Signal Processing and Speech Communication Laboratory
> E-mail: markus.koeb...@tugraz.at 
> 
> 
> 


-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Scheduling jobs according to the CPU load

2017-03-19 Thread Benjamin Redling

Am 19.03.2017 um 15:36 schrieb kesim:
> ... I only want to find
> the solution for the trivial problem. I also think that slurm was design
> for HPC and it is performing well in such env. I agree with you that my
> env. hardly qualifies as HPC but still one of the simplest concept
> behind any scheduler is to not overload some nodes when the others are
> idling - can it really be by design? I cannot also speak for developers
> but it probably needs a few lines of code to add this feature
> considering that the data is already collected.

(A lot of [rarely used] features might be just a few extra lines of code
away [which nobody contributes, or even pays for].)

If you want to utilize the resources of desktops you might want to have
a look at HTCondor.

BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Fwd: Dependency Problem In Full Queue

2017-03-17 Thread Benjamin Redling

Good examples:
https://hpc.nih.gov/docs/job_dependencies.html
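
The basic pattern from there, as a small sketch (the script names are
placeholders):

jid=$(sbatch --parsable first_step.sh)    # --parsable prints only the job id
sbatch --dependency=afterany:"$jid" second_step.sh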

BR

On 2017-03-15 17:37, Álvaro pc wrote:
> Hi again!
> 
> I would really like to know about the behaviour of --dependency argument..
> 
> Nobody know anything?
> 
> *Álvaro Ponce Cabrera.*
> 
> 
> 2017-03-14 12:31 GMT+01:00 Álvaro pc  >:
> 
> Hi,
> 
> I'm having problems trying to launch jobs with dependency of another
> one.
> 
> I'm using '--dependency=afterany:Job_ID' argument. 
> 
> The problem happens when the queue is full and the new job which
> depends on another one (already running) can't enter in the queue
> and need to wait.
> Instead of wait properly to enter in the queue, the job try to enter
> thousands of times per minute. 
> 
> All the tries seems to be waiting to enter  in the queue... Here you
> can see a piece of the queue where you can see the problem:
> 
>  20217   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20218   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20219   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20220   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20221   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20222   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20223   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20224   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>  20225   UPO Macs2_DM alvaropc PD   0:00  1
> (Dependency)
>   4907   UPO notebookpanos  R 64-01:48:56  1
> nodo01
>   6454   UPO valinomy jraviles  R 7-05:45:32  1
> nodo10
>   6492   UPO input_ra  rbueper  R 13-08:44:42  1
> nodo01
>   6493   UPO input_ra  rbueper  R 13-08:44:42  1
> nodo05
>   6823   UPO FELIX-No said  R 13-09:34:42  1
> nodo06
>   7219   UPO input_ra  rbueper  R 13-08:44:42  1
> nodo05
> 
> 
> 
> In addition I'm obtaining this error from the log/out file: 'sbatch:
> error: Slurm temporarily unable to accept job, sleeping and retrying'. 
> The error is repeated thousands of times too, obviously, one per
> each try of the job entering the queue...
> 
> I just want to launch ONE job  which waits untill another one ends... 
> 
> Any ideas?
> 
> Thank you so much.
> 
> 
> 
> *Álvaro Ponce Cabrera.*
> 
> 
> 

-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] RE: MaxJobs on association not being respected

2017-03-17 Thread Benjamin Redling

Re hi,

On 2017-03-17 03:01, Will Dennis wrote:
> My slurm.conf:
> https://paste.fedoraproject.org/paste/RedFSPXVlR2auRlevS5t~F5M1UNdIGYhyRLivL9gydE=/raw
> 
>> Are you sure the current running config is the one in the file?
>> Did you double check via "scontrol show config"
> 
> Yes, all params set in slurm.conf are showing correctly.

the sacctmgr output from your first mail ("ml-cluster") doesn't fit the
slurm.conf you provided ("test-cluster"). Can you clarify that?

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] RE: MaxJobs on association not being respected

2017-03-16 Thread Benjamin Redling

Hello Will,

On 2017-03-15 18:13, Will Dennis wrote:
> Here are their definitions in slurm.conf:
> 
> # PARTITIONS
> PartitionName=batch Nodes=[nodelist] Default=YES DefMemPerCPU=2048 
> DefaultTime=01:00:00 MaxTime=05:00:00 PriorityTier=100 PreemptMode=off 
> State=UP
> PartitionName=long Nodes=[nodelist] Default=NO DefMemPerCPU=2048 
> DefaultTime=1-00:00:00 MaxTime=UNLIMITED PriorityTier=100 PreemptMode=off 
> State=UP
> PartitionName=scavenger Nodes=[nodelist] Default=NO DefMemPerCPU=2048 
> DefaultTime=1-00:00:00 MaxTime=UNLIMITED PriorityTier=10 PreemptMode=requeue 
> State=UP
> 
> Considering the ‘long’ partition, what is the best way to set up limits of 
> how many jobs can be submitted to it concurrently by a user, or how to limit 
> number of CPUs used? 
> 
> As can be seen from my prior post, we are utilizing job accounting via 
> slurmdbd.

in case you didn't make any progress in the meantime:
are you allowed to post the full slurm.conf of the test setup?
Would be nice. Just to make sure nobody misses a seemingly irrelevant
part. Skimming your posts didn't reveal to me any obvious flaws in the
parts you provided.

Are you sure the currently running config is the one in the file?
Did you double-check via "scontrol show config"?
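
Not knowing the full config, only a rough sketch of one common way to
express such per-user limits on a partition (names and numbers invented;
it needs working accounting and AccountingStorageEnforce=limits,qos):

sacctmgr add qos longlimit
sacctmgr modify qos longlimit set MaxJobsPerUser=10 MaxTRESPerUser=cpu=64
# then, in slurm.conf, attach it to the partition:
PartitionName=long Nodes=[nodelist] Default=NO DefMemPerCPU=2048 DefaultTime=1-00:00:00 MaxTime=UNLIMITED PriorityTier=100 PreemptMode=off State=UP QOS=longlimit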

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: How to sent jobs for all nodes automatically

2017-03-06 Thread Benjamin Redling

Hi David,

Am 06.03.2017 um 12:05 schrieb David Ramírez:
> I have little problem. Slurm allocated job allocated nodes (When a nodes
> is full, sent job to next one).
> 
> I need use all nodes without order (customer like that)

I don't know "without order", but you can spread the load with "least
loaded node (LLN / CR_LLN)".

If that might be a valid solution, you might want to read about it in
the docs or at least skim the following discussion:
https://groups.google.com/forum/#!topic/slurm-devel/i6SjRwFQCK8
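
A sketch of the two places where least-loaded scheduling can be switched on
(the node list and partition name are made up):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_LLN
# or only for a single partition:
PartitionName=spread Nodes=node[01-10] LLN=YES Default=YES State=UP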

Did the customer tell you why so that you can judge the pros and cons?
You esp. might want to consider the difference between "per job" and
"generally" -- it's discussed there too.


Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Error showing in slurmd daemon startup

2016-12-24 Thread Benjamin Redling

Hi Will,

Am 24.12.2016 um 21:10 schrieb Will Dennis:
> Thanks for helping to interpret the error message… Clear enough to me now.

You're welcome! I wrote a bit briefly because I was using my mobile.

> I was told (by one of my researchers) that setting “FastSchedule=0” would 
> "tell Slurm to get the hardware info from the node instead of from 
> slum.conf”. A read of the relevant section in 
> https://slurm.schedmd.com/slurm.conf.html shows me it’s a bit more nuanced 
> than that ;)

Good to hear you looked it up yourself and didn't just /consume/ my
answer -- I like that spirit!
I hope that way you'll pick up more and more over time and might one day
help me or someone else too :)


> My node configs in slurm.conf are currently very simple:
> 
> NodeName=host01 CPUs=12 State=UNKNOWN
> NodeName=host02 CPUs=12 State=UNKNOWN
> NodeName=host03 CPUs=12 State=UNKNOWN
> NodeName=host04 CPUs=12 State=UNKNOWN
> 
> So maybe I could rewrite them as:
> 
> NodeName=host01,host02,host03,host04 CPUs=12 SocketsPerBoard=2 
> CoresPerSocket=6 ThreadsPerCore=1

Yes, there's no need to list hosts with identical parameters on separate
lines.

And even that can be condensed -- if you like:
NodeName=host0[1-4] ...

s. https://slurm.schedmd.com/slurm.conf.html
(Easy to miss, because in-front of NodeName...)
--- %< ---
Multiple node names may be comma separated (e.g. "alpha,beta,gamma")
and/or a simple node range expression may optionally be used to specify
numeric ranges of nodes to avoid building a configuration file with
large numbers of entries. The node range expression can contain one pair
of square brackets with a sequence of comma separated numbers and/or
ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or
"lx[15,18,32-33]"). Note that the numeric ranges can include one or more
leading zeros to indicate the numeric portion has a fixed number of
digits (e.g. "linux[0000-1023]"). Up to two numeric ranges can be
included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or
more numeric expressions are included, one of them must be at the end of
the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can
always be used in a comma separated list.
--- %< ---

And the hidden gem from the slurm download section:
https://www.nsc.liu.se/~kent/python-hostlist/

Merry Christmas!
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Error showing in slurmd daemon startup

2016-12-24 Thread Benjamin Redling

Am 24. Dezember 2016 04:43:36 MEZ, schrieb Will Dennis :
>I see the following in the systemctl status ouput of the slurmd service
>on my compute nodes:
>
>Dec 23 21:31:58 host01 slurmd[32101]: error: You are using cons_res or
>gang scheduling with Fastschedule=0 and node configuration differs from
>hardware.  The node configuration used will be what is in the
>slurm.conf because of the bitmaps the slurmctld must create before the
>slurmd registers.
>CPUs=12:12(hw) Boards=1:1(hw) SocketsPerBoard=12:2(hw)
>CoresPerSocket=1:6(hw) ThreadsPerCore=1:1(hw)

Sockets: 12 configured vs. 2 found in hardware.
Cores per socket: 1 configured vs. 6 found.

Either stay with just CPUs or configure your nodes correctly.
The node definitions should be in the section you didn't provide.

>
>Here’s the relevant chunk of my slurm.conf:
>
>FastSchedule=0
>SchedulerType=sched/backfill
>PriorityType=priority/multifactor
>PriorityWeightAge=1000
>PriorityWeightFairshare=1
>PriorityWeightJobSize=1000
>PriorityWeightPartition=1000
>PriorityWeightQOS=0
>SelectType=select/cons_res
>SelectTypeParameters=CR_CPU_Memory
>
>
>How do I go about rectifying this? And, is it not possible to use
>"FastSchedule=0” with "SelectType=select/cons_res”?

It did work. ... and tells you that your HW differs from your config.

Regards,
BR

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: SLURM reports much higher memory usage than really used

2016-12-15 Thread Benjamin Redling

Am 15. Dezember 2016 14:48:24 MEZ, schrieb Stefan Doerr :
>$ sinfo --version
>slurm 15.08.11
>
>$ sacct --format="CPUTime,MaxRSS" -j 72491
>   CPUTime MaxRSS
>-- --
>  00:27:06
>  00:27:06  37316236K
>
>
>I will have to ask the sysadms about cgroups since I'm just a user
>here.
>
>On Thu, Dec 15, 2016 at 12:05 PM, Merlin Hartley <
>merlin-sl...@mrc-mbu.cam.ac.uk> wrote:
>
>> I’ve been having the very same problem since I tried to enable
>Accounting
>> (slurmdbd) - so I have now had to disable accounting.
>>
>> It would seem therefore that this part of the documentation should be
>> updated:
>> https://slurm.schedmd.com/archive/slurm-15.08-latest/accounting.html
>> "To enable any limit enforcement you must at least have
>> *AccountingStorageEnforce=limits* in your slurm.conf, otherwise, even
>if
>> you have limits set, they will not be enforced. "
>>
>> I did not set that option at all in my slurm.conf and yet memory
>limits
>> started to be enforced - and again I don’t believe the memory
>estimate was
>> anything like correct.
>>
>> In the new year I may try accounting again but with
>"MemLimitEnforce=no”
>> set as well :)
>>
>>
>> Merlin
>>
>>
>> --
>> Merlin Hartley
>> IT Systems Engineer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 15 Dec 2016, at 10:32, Uwe Sauter  wrote:
>>
>>
>> You are correct. Which version do you run? Do you have cgroups
>enabled?
>> Can you enable debugging for slurmd on the nodes? The
>> output should contain what Slurm calculates as maximum memory for a
>job.
>>
>> One other option is do configure MemLimitEnforce=no (which defaults
>to yes
>> since 14.11).
>>
>>
>> Am 15.12.2016 um 11:26 schrieb Stefan Doerr:
>>
>> But this doesn't answer my question why it reports 10 times as much
>memory
>> usage as it is actually using, no?
>>
>> On Wed, Dec 14, 2016 at 1:00 PM, Uwe Sauter <
>> mailto:uwe.sauter...@gmail.com >> wrote:
>>
>>
>>There are only two memory related options "--mem" and
>"--mem-per-cpu".
>>
>>--mem tells slurm the memory requirement of the job (if used with
>> sbatch) or the step (if used with srun). But not the requirement
>>of each process.
>>
>>--mem-per-cpu is used in combination with --ntasks and
>--cpus-per-task.
>> If only --mem-per-cpu is used without other options the
>>memory requirement is calculated using the configured number of
>cores
>> (NOT the number of cores requested), as far as I can tell.
>>
>>You might want to play a bit more with the additionall options.
>>
>>
>>
>>Am 14.12.2016 um 12:09 schrieb Stefan Doerr:
>>
>> Hi, I'm running a python batch job on SLURM with following options
>>
>> #!/bin/bash
>> #
>> #SBATCH --job-name=metrics
>> #SBATCH --partition=xxx
>> #SBATCH --cpus-per-task=6
>> #SBATCH --mem=2
>> #SBATCH --output=slurm.%N.%j.out
>> #SBATCH --error=slurm.%N.%j.err
>>
>> So as I understand each process will have 20GB of RAM dedicated to
>it.
>>
>> Running it I get:
>>
>> slurmstepd: Job 72475 exceeded memory limit (39532832 > 2048),
>being
>> killed
>> slurmstepd: Exceeded job memory limit
>>
>> This however cannot be true. I've run the same script locally and it
>uses
>> 1-2GB of RAM. If it was using 40GB I would have
>>
>>gone to
>>
>> swap and definitely noticed.
>>
>> So I put some prints in my python code to see how much memory is used
>and
>> indeed it shows a max usage of 1.7GB and before the
>> error 1.2GB usage.
>>
>> What is happening here? I mean I could increase the mem option but
>then I
>> will be able to run much fewer jobs on my machines
>>
>>which
>>
>> seems really limiting.
>>
>>
>>
>>
>>

Hi, AFAIK memory usage with cgroups is more than plain RSS: + file cache, ...
So the plugins in use would be really interesting.
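
A quick, sketched way to see those on the cluster in question:

scontrol show config | grep -Ei 'JobAcctGather|ProctrackType|TaskPlugin|MemLimitEnforce'
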
Regards, Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-30 Thread Benjamin Redling

Am 31.10.2016 um 00:47 schrieb Vlad Firoiu:
> What do you mean the ScheduleType is not explicit? I see
> `SchedulerType=sched/backfill`. (I don't know too much about slurm so I
> am probably misunderstanding something.)

Vlad, you are right: ScheduleType _is_ set. I meant SelectType.
See my other mail where I corrected the typo in the meantime and added a
few uncurated ideas.

Cheers, Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-30 Thread Benjamin Redling

Am 31.10.2016 um 00:23 schrieb Benjamin Redling:
> Are you aware that as long as SchedulerType 

Sorry, typo. I meant *SelectType*

(The rest I wrote next is just unfiltered noise from my brain while
skimming the conf:)

>is not set to anything explicitly, select/linear is the default? 

> Am 30. Oktober 2016 19:00:28 MEZ, schrieb Vlad Firoiu <vlad...@gmail.com>:
[...]
> ## SelectType=select/linear

This seems bad for good utilization.
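
For comparison, a consumable-resource sketch (just an example, not a
drop-in replacement):

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory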

> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityUsageResetPeriod=MONTHLY

I'm undecided about that...

> #
[...]
> PriorityWeightQOS   = 10
> PriorityWeightFairShare = 1000

Clearly dominant QOS/Fairshare: I hope accounting is set up right -- in
the "portion" of your slurm.conf you didn't provide?

> PriorityWeightAge   = 10

Does this even matter compared to QOS and Fairshare?

> PriorityWeightJobSize   = 1

Reasons for not using PriorityFavorSmall (with or without
SMALL_RELATIVE_TO_TIME) and considering the job size? I think that would
improve utilization in combination with a SelectType that takes
consumable resources into account.
The 10-minute job you mentioned would not starve behind the big ones.


> PriorityWeightPartition = 1
> PriorityMaxAge=1-0

This way the age factor maxes out after one day -- fast compared to the
default of a week. But age doesn't really matter compared to QOS and
Fairshare anyway... what's the idea behind that?


> The particular job in question has 0 priority.

My fault: it would have been nice to ask for "sprio -w" with the
individual weights in the first place.
Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: poor utilization, jobs not being scheduled

2016-10-29 Thread Benjamin Redling
Are you allowed and able to post the slurm.conf? What does sprio -o %Q -j 
 say about that job? BR, Benjamin

Am 29. Oktober 2016 20:56:18 MESZ, schrieb Vlad Firoiu :
>I'm trying to figure out why utilization is low on our university
>cluster.
>It appears that many cores are available, but a minimal resource 10
>minute
>job has been waiting in queue for days. There happen to be some big
>high
>priority jobs at the front of the queue, and I've noticed that these
>are
>being constantly scheduled and unscheduled. Is this expected behavior?
>Might it be causing slurm to never reach lower priority jobs and
>consider
>them for scheduling/backfill?

-- 
This message was sent from my Android mobile phone with K-9 Mail.

[slurm-dev] Re: Set Limit Time Per Job

2016-10-26 Thread Benjamin Redling

Hi,

Am 26.10.2016 um 15:35 schrieb Achi Hamza:
> But when i run a job more than 3 minutes it does not stop, like:
> srun -n1 sleep 300
> 
> I also set MaxWall parameter but to no avail:
> sacctmgr show qos format=MaxWall
> MaxWall 
>  --- 
>  00:03:00 
> 
> Please advice where i am doing wrong.

I think you need to check your (job) accounting.

I would check JobAcctGatherType first:
http://slurm.schedmd.com/slurm.conf.html
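
For comparison, a sketch of the pieces that usually have to be in place
before QOS limits like MaxWall get enforced (the plugin choices here are
examples, not a prescription):

JobAcctGatherType=jobacct_gather/linux
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=limits,qos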

Btw. it would be way easier to tell what you might have missed if you
could post (more parts of) your slurm.conf

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] RE: slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-25 Thread Benjamin Redling

Hi,

are you both working on the same cluster as the OP?

On 10/25/2016 08:12, suprita.bot...@wipro.com wrote:
> I have installed slurm on a 2 node cluster.
> 
> On the master node when I run sinfo command I get below output.
[...]

> But on compute node:Slurmd daemon is also running but it gives the error:
> 
> Unable to contact slurm controller (connect failure).

> I am not able to understand the error , why this error exists.Although
> in the master node sinfo output state of this node is coming out to be idle.


Have you copied the _exact_ slurm.conf from the master to the compute node?

Regards,
Benjamin

-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: set maximum CPU usage per user

2016-10-24 Thread Benjamin Redling

Hi,

On 10/21/2016 18:58, Steven Lo wrote:
> Is MaxTRESPerUser a better option to use?

if you only ever want to restrict every user alike, that seems reasonable.
I would choose whatever fits your needs right now and in the not so
distant future. That way you gain time to learn about the options slurm
provides.

Anyway, did you make any progress with your former setup?
Did you understand what happened?

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: set maximum CPU usage per user

2016-10-20 Thread Benjamin Redling

Hi Steven,

On 10/20/2016 00:22, Steven Lo wrote:
> We have the attribute commented out:
> #AccountingStorageEnforce=0

I think the best is to (re)visit "Accounting and Resource Limits":
http://slurm.schedmd.com/accounting.html

Right now I have no setup that needs accounting, but as far as I
currently understand you'll need AccountingStorageEnforce=limits,qos to
get your examples to work.
And just in case you didn't already set it, for QOS
(http://slurm.schedmd.com/qos.html) the "PriorityWeightQOS"
configuration parameter must be defined in the slurm.conf file and
assigned an integer value greater than zero.


What I am unsure about -- esp. not knowing your config -- is whether
there are any other unmet dependencies.
It would be nice if somebody with real experience with accounting could
confirm this or give a pointer.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: set maximum CPU usage per user

2016-10-19 Thread Benjamin Redling
Hi, what are your AccountingStorage settings? Esp. AccountingStorageEnforce. 
Did limits work before, or is this a first try?
Regards, Benjamin

Am 19. Oktober 2016 22:14:27 MESZ, schrieb Steven Lo :
>
>
>By the way, we do have the following attribute set:
>
> PriorityType=priority/multifactor
>
>
>Thanks
>
>Steven.
>
>
>On 10/19/2016 12:55 PM, Steven Lo wrote:
>>
>>
>> Hi Chris,
>>
>> When we try the command as suggested, it said that nothing modified:
>>
>> [root@pauling ~]# sacctmgr modify account normal set Grpcpus=300
>>  Nothing modified
>>
>> Do you know if there is other method?
>>
>> Thanks
>>
>> Steven.
>>
>>
>> On 10/19/2016 07:25 AM, Christopher Benjamin Coffey wrote:
>>> Hi Steven,
>>>
>>> If you are trying to restrict the cpus for a group, I believe you 
>>> need to set the account value:
>>>
>>> sacctmgr modify account normal set Grpcpus=300
>>>
>>> Best,
>>> Chris
>>>
>>> —
>>> Christopher Coffey
>>> High-Performance Computing
>>> Northern Arizona University
>>> 928-523-1167
>>>
>>> On 10/18/16, 4:04 PM, "Steven Lo"  wrote:
>>>
>>>Hi,
>>>   We are trying to limit 300 CPU usage per user in our
>cluster.
>>>   We have tried:
>>>   sacctmgr modify qos normal set Grpcpus=300
>>>   and
>>>   sacctmgr modify user username set GrpCPUs=300
>>>Both seems to allow job to run which asking for 308
>CPUs.
>>>Is there other way to implement this requirement?
>>>Thanks advance for your suggestion.
>>>Steven.
>>>
>>

-- 
This message was sent from my Android mobile phone with K-9 Mail.

[slurm-dev] Re: Slurm Drain/Down Issue

2016-10-04 Thread Benjamin Redling
ThreadsPerCore should be 1; you set it to 4. BR, Benjamin
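
PS: for illustration, one of the node lines with matching hardware would
read something like (RealMemory left out of this sketch):

NodeName=dragonsdenN3 NodeAddr=192.168.0.7 Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN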

Am 4. Oktober 2016 16:41:33 MESZ, schrieb evan clark :
>
>I am not sure if this is the correct place to share this, but maybe 
>someone can point me in the correct directions. I recently setup a 
>Centos 7 based slurm cluster, however my nodes continuously show an 
>either down or drained state. The reason for the drained state is =Low 
>socket*core*thread count. The nodes are composed of dual Quad core xeon
>
>processors w/o hyperthreading and the conf file has the configuration
>of 
>2 sockets, 4 cores per socket and 1 thread per core. Below is the node 
>information and the slurm conf file.
>
>NodeName=dragonsdenN3 Arch=x86_64 CoresPerSocket=4
>CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=0.23
>AvailableFeatures=(null)
>ActiveFeatures=(null)
>Gres=(null)
>NodeAddr=192.168.0.7 NodeHostName=dragonsdenN3 Version=16.05
>  OS=Linux RealMemory=3 AllocMem=0 FreeMem=31205 Sockets=2 Boards=1
>State=IDLE+DRAIN ThreadsPerCore=4 TmpDisk=0 Weight=1 Owner=N/A 
>MCS_label=N/A
>BootTime=2016-10-04T10:25:42 SlurmdStartTime=2016-10-04T10:26:16
>CapWatts=n/a
>CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>Reason=Low socket*core*thread count, Low CPUs 
>[slurm@2016-10-04T10:13:23]
>
>ControlMachine=dragonsden
>ControlAddr=192.168.0.1
>#
>#MailProg=/bin/mail
>MpiDefault=none
>#MpiParams=ports=#-#
>ProctrackType=proctrack/pgid
>ReturnToService=1
>SlurmctldPidFile=/var/run/slurmctld.pid
>#SlurmctldPort=6817
>SlurmdPidFile=/var/run/slurmd.pid
>#SlurmdPort=6818
>SlurmdSpoolDir=/var/spool/slurmd
>SlurmUser=slurm
>#SlurmdUser=root
>StateSaveLocation=/var/spool
>SwitchType=switch/none
>TaskPlugin=task/none
>#
>#
># TIMERS
>#KillWait=30
>#MinJobAge=300
>#SlurmctldTimeout=120
>#SlurmdTimeout=300
>#
>#
># SCHEDULING
>FastSchedule=1
>SchedulerType=sched/backfill
>#SchedulerPort=7321
>SelectType=select/linear
>#
>#
># LOGGING AND ACCOUNTING
>
>#
># SCHEDULING
>FastSchedule=1
>SchedulerType=sched/backfill
>#SchedulerPort=7321
>SelectType=select/linear
>#
>#
># LOGGING AND ACCOUNTING
>AccountingStorageType=accounting_storage/none
>ClusterName=dragonsden
>#JobAcctGatherFrequency=30
>JobAcctGatherType=jobacct_gather/none
>#SlurmctldDebug=3
>#SlurmctldLogFile=
>#SlurmdDebug=3
>#SlurmdLogFile=
>#
>#
>ClusterName=dragonsden
>#JobAcctGatherFrequency=30
>JobAcctGatherType=jobacct_gather/none
>#SlurmctldDebug=3
>#SlurmctldLogFile=
>#SlurmdDebug=3
>#SlurmdLogFile=
>#
>#
># COMPUTE NODES
>NodeName=dragonsden NodeAddr=192.168.0.1 RealMemory=2 Sockets=2 
>CoresPerSocket=4 ThreadsPerCore=4 State=UNKNOWN
>NodeName=dragonsdenN1 NodeAddr=192.168.0.5 RealMemory=3 Sockets=2 
>CoresPerSocket=4 ThreadsPerCore=4 State=UNKNOWN
>NodeName=dragonsdenN2 NodeAddr=192.168.0.6 RealMemory=3 Sockets=2 
>CoresPerSocket=4 ThreadsPerCore=4 State=UNKNOWN
>NodeName=dragonsdenN3 NodeAddr=192.168.0.7 RealMemory=3 Sockets=2 
>CoresPerSocket=4 ThreadsPerCore=4 State=UNKNOWN
>NodeName=dragonsdenN4 NodeAddr=192.168.0.8 RealMemory=3 Sockets=2 
>CoresPerSocket=4 ThreadsPerCore=4 State=UNKNOWN
>NodeName=dragonsdenN5 NodeAddr=192.168.0.10 RealMemory=3 Sockets=2 
>CoresPerSocket=4 ThreadsPerCore=4 State=UNKNOWN
>PartitionName=debug Nodes=dragonsdenN[1-5] Default=YES MaxTime=INFINITE
>
>State=UP

-- 
This message was sent from my Android mobile phone with K-9 Mail.

[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Benjamin Redling

On 09/12/2016 18:57, andrealphus wrote:
> It doesnt seem like changing it to a different resource allocation
> method makes a difference, and almost seems buggy to me, but I guess
> is just a quirk of multithread systems.

Your issue ("using all hyperthreads") was discussed multiple times on
the list in the not so distant past.

The resource allocation method alone won't make it:
http://slurm.schedmd.com/faq.html#cpu_count

Anyway, I think you are on the right track.
BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Benjamin Redling

On 09/12/2016 16:55, Uwe Sauter wrote:
> 
> Also. CPUs=32 is wrong. You need
> 
> Sockets=2 CoresPerSocket=8 ThreadsPerCore=2

Setting "CPU" is not wrong according to the FAQ:
http://slurm.schedmd.com/faq.html#cpu_count

BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Configuring slurm to use all CPUs on a node

2016-09-13 Thread Benjamin Redling



On 09/12/2016 16:48, Uwe Sauter wrote:
> 
> Try SelectTypeParameters=CR_Core instead of CR_CPU

That alone is not sufficient:
http://slurm.schedmd.com/faq.html#cpu_count

BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: single node workstation

2016-09-09 Thread Benjamin Redling

Hi,

I think your case is mentioned in FAQ Q30, in the "NOTE"
-- according to that you set CR_CPU and a plain "CPUs" count only; no
sockets, no cores, no threads:

http://slurm.schedmd.com/faq.html
[...]
30.  Slurm documentation refers to CPUs, cores and threads. What exactly
is considered a CPU?
If your nodes are configured with hyperthreading, then a CPU is
equivalent to a hyperthread. Otherwise a CPU is equivalent to a core.
You can determine if your nodes have more than one thread per core using
the command "scontrol show node" and looking at the values of
"ThreadsPerCore".

Note that even on systems with hyperthreading enabled, the resources
will generally be allocated to jobs at the level of a core (see NOTE
below). Two different jobs will not share a core except through the use
of a partition OverSubscribe configuration parameter. For example, a job
requesting resources for three tasks on a node with ThreadsPerCore=2
will be allocated two full cores. Note that Slurm commands contain a
multitude of options to control resource allocation with respect to base
boards, sockets, cores and threads.

(NOTE: An exception to this would be if the system administrator
configured SelectTypeParameters=CR_CPU and each node's CPU count without
its socket/core/thread specification. In that case, each thread would be
independently scheduled as a CPU. This is not a typical configuration.)
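Untested here, but following that NOTE a minimal sketch for your box
(36 hyperthreads, as your sinfo shows; select/cons_res assumed) would be:

--- %< ---
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=localhost CPUs=36 State=UNKNOWN
--- %< ---

i.e. CPUs only, with no Sockets/CoresPerSocket/ThreadsPerCore on the
node line, so each hyperthread gets scheduled as a CPU.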

Regards,
Benjamin

On 09/09/2016 02:06, andrealphus wrote:
> 
> p.s. same issue on v16
> 
> On Wed, Sep 7, 2016 at 9:57 AM, andrealphus  wrote:
>>
>> p.s. it's listing 36 processors with sinfo, and that theyre all being
>> used, but it only running 18 jobs. So it looks like while it can see
>> the 36 "processors" its only allocating on the core level and not the
>> thread level;
>>
>>  squeue
>>  JOBID PARTITION NAME USER ST   TIME  NODES
>> NODELIST(REASON)
>>  3850_[19-1000%25] debug slurm_ex   ashton PD   0:00  1 
>> (Resources)
>> 3850_1 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_2 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_3 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_4 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_5 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_6 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_7 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_8 debug slurm_ex   ashton  R   0:05  1 localhost
>> 3850_9 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_10 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_11 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_12 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_13 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_14 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_15 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_16 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_17 debug slurm_ex   ashton  R   0:05  1 localhost
>>3850_18 debug slurm_ex   ashton  R   0:05  1 localhost
>> sinfo -o %C
>> CPUS(A/I/O/T)
>> 36/0/0/36
>>
>> On Wed, Sep 7, 2016 at 9:41 AM, andrealphus  wrote:
>>>
>>> I tried changing the CPU flag int eh compute node section of the conf
>>> file to 36, but it didnt make a difference, still limited to 18. Also
>>> tried removing the flag and letting slurm calculate it from the other
>>> info, e.g.;
>>>  Sockets=1 CoresPerSocket=18 ThreadsPerCore=2
>>>
>>> also no change. Could it be a non configuration issue, e.g. a slurm
>>> bug related to the processor type? I only say that because I am
>>> normally a torque user, but there is an open bug with Adaptive that
>>> seems to be related to some of the newer intel
>>> processsors/glibc/elision locking
>>>
>>>
>>> On Tue, Sep 6, 2016 at 7:30 PM, andrealphus  wrote:

 a..I'll give that a try. Thanks Lachlan, feel better!

 On Tue, Sep 6, 2016 at 6:49 PM, Lachlan Musicman  wrote:
> No, sorry, I meant that your config file line needs to change:
>
>
> NodeName=localhost CPUs=36 RealMemory=12 Sockets=1 CoresPerSocket=18
> ThreadsPerCore=2 State=UNKNOWN
>
> --
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On 7 September 2016 at 11:34, andrealphus  wrote:
>>
>>
>> Yup, thats what I expect too! Since Im brand new to slurm, not sure if
>> there is some other config option or srun flag to enable
>> 

[slurm-dev] Re: resource usage, TRES and --exclusive option

2016-09-01 Thread Benjamin Redling

Hi

On 09/01/2016 10:16, Christof Koehler wrote:

> Now, the point we are not sure about is what happens if a user allocates
> 10 out of 40 and sets "--exclusive" (if possible). Is the usage of that
> user (job) actually computed with 40 CPUs as most people would expect ?
> As described before other systems appear to bill only 10 while 40 are in
> fact used up.

>> Also, have you found this page? http://slurm.schedmd.com/mc_support.html

> That is certainly a very useful resource explaining the different
> concepts and how they relate to the terminology used by slurm. However,
> skimming through it there does not appear to be a direct connection to
> out usage computation question.

You might want to (re)read
1. http://slurm.schedmd.com/cons_res.html
2. http://slurm.schedmd.com/cons_res_share.html

[1] right at the beginning explains the default behaviour (exclusive mode)

[2] should explain best that OverSubscribe (fka. Shared) is dependent on
the "Selection Setting".
  If you skim the column "Resulting Behavior" you should know really
fast what you aim for.

We share nodes between jobs of different users (but: all one trusted
group, my colleagues -- really easy case) without any conflict and
neither use OverSubscribe nor cgroups (but cgroups is a ticket).
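For the shape of it -- not our literal config; node name, core count
and memory are just placeholders:

--- %< ---
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
NodeName=node01 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64000
PartitionName=main Nodes=node01 Default=YES MaxTime=INFINITE State=UP
--- %< ---

With cores and memory as consumable resources, jobs of different users
simply land next to each other on a node as long as cores and memory
are free.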


Regards,
Benjamin

-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Multiple simultaneous jobs on a single node on SLURM 15.08.

2016-08-30 Thread Benjamin Redling

Hi,

I didn't see an answer so far, so I try to reason:

On 08/29/2016 19:40, Luis Torres wrote:
> We have recently deployed SLURM v 15.08.7-build1 on Ubuntu 16.04
> submission and execution nodes with apt-get; we built and installed the
> source packages of the same release on Ubuntu 14.04 for the controller.
> 
> Our primary issue is that we’re not able to run multiple jobs in a
> single node despite closely following the configuration options
> suggested in the SLURM documentation.  Namely we’ve added “Shared=YES”

Shared=Yes -- "OverSubscribe" on newer versions -- has no effect for
select/cons_res + CR_CPU_MEMORY (what you configured).
Either "NO" or "FORCE".

> to our NodeName definition and have configured the scheduling Type to
> select/cons_res and CR_CPU_Memory parameters.

http://slurm.schedmd.com/cons_res.html
says:

All CR_s assume OverSubscribe=No or OverSubscribe=Force EXCEPT for
CR_MEMORY which assumes OverSubscribe=Yes


The table at
http://slurm.schedmd.com/cons_res_share.html
gives a good oversight.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: configuring slurm nodes to use less than the total number of processes on a machine

2016-08-24 Thread Benjamin Redling

Hi,

On 08/23/2016 22:29, Tom G wrote:
> I have some slurm nodes with 8 core processors and hyperthreading, so 16
> CPUs in effect.  I'd like to restrict slurm to only use 12 CPUs on this
> machine.  What are the right slurm.conf settings to do this?

> Doing 8 or 16 CPUs seems straightforward since i can set
> CoresPerSocket=8 + ThreadsPerCore=1, or CoresPerSocket=8 +
> ThreadsPerCore=2 .  But if I want to do 12, does that mean i do
> ThreadsPerCore=1.5 ?  Seems strange.  

Wouldn't 6*2=12 be more natural[sic!] than 8*1.5=12?

Anyway http://slurm.schedmd.com/mc_support.html says:

Slurm will automatically detect the architecture of the nodes used by
examining /proc/cpuinfo. If, for some reason, the administrator wishes
to override the automatically selected architecture, the NodeName
parameter can be used in combination with FastSchedule:

FastSchedule=1
NodeName=dualcore[01-16] CPUs=4 CoresPerSocket=2 ThreadsPerCore=1

For a more complete description of the various node configuration
options see the slurm.conf man page.


So, according to this

Fastschedule=1
NodeName= CPUs=12 CoresPerSocket=6 ThreadsPerCore=2

should be enough.

I didn't try myself because I currently prefer to partition hosts via KVM.

Maybe that could be solved more lightweight via cgroup/cpusets.

Side note: if you only want slurmd and slurmstepd to be confined from
user jobs that can be done via CoreSpecCount, CPUSpecList +
TaskPlugin/cgroup ConstrainCores=yes
s. http://slurm.schedmd.com/slurm.conf.html
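A sketch of that variant, untested here -- node name and counts are
only an example:

--- %< ---
# slurm.conf
TaskPlugin=task/cgroup
NodeName=node01 Sockets=1 CoresPerSocket=8 ThreadsPerCore=2 CoreSpecCount=1
# cgroup.conf
ConstrainCores=yes
--- %< ---

That reserves one core for slurmd/slurmstepd and the system and keeps
user jobs off it.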


Best regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Single CPU job consuming 8 CPUs

2016-07-12 Thread Benjamin Redling

Hi Yuri,

On 2016-07-12 20:53, Yuri wrote:
> In slurm.conf I have CPUs=4 for each node (but each node actually has a
> Intel Core i7). My question is: why is slurm assigning only one job per
> node and each job is consuming 8 CPUs?

considering that you only provide "CPU... for each node" the sbatch
command seems secondary to me -- I bet it is a basic configuration issue.

For what you seem to aim for, setting CPUs for each node alone is not
sufficient.

E.g. The default SelectType is select/linear (whole nodes).

The selection of the node and resources can depend on a lot of stuff.
Maybe this one first:
http://slurm.schedmd.com/cpu_management.html

And IMO the most important in your case SelectType=select/cons_res and
SelectTypeParameters:
https://slurm.schedmd.com/cons_res.html
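The two lines I mean look like this (CR_Core is only one possible
choice of consumable resource):

--- %< ---
SelectType=select/cons_res
SelectTypeParameters=CR_Core
--- %< ---

plus a NodeName line that matches the real hardware of the i7 boxes.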

If I misunderstood your posting and you think you have a proper setup
that is misbehaving, can you post the other settings in your slurm.conf?

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Allocation Error

2016-06-14 Thread Benjamin Redling

Hi,

On 2016-06-14 20:19, Martin Kohn wrote:
> As you can see even with an job array only one job runs. Below you can find 
> the script I submit and my configuration.

> SchedulerType=sched/buildin
> #SchedulerType=sched/backfill
> #SchedulerPort=7321
> #SelectType=select/linear
> SelectType=select/cons_res

as a non-developer the first thing I notice:
you are not selecting a consumable resource via
SelectTypeParameters=

I couldn't find in the documentation what happens if select/cons_res
is missing its SelectTypeParameters, but my wild guess is that your
behavior is still that of select/linear (the default; selecting entire
nodes).
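I.e. something along these lines (CR_CPU is just an example; pick
whatever consumable resource fits your setup):

--- %< ---
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
--- %< ---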

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: How to setup node sequence

2016-06-13 Thread Benjamin Redling



On 06/13/2016 09:50, Husen R wrote:
> Hi all,
> 
> How to setup node sequence/order in slurm ?
> I configured nodes in slurm.conf like this -> Nodes = head,compute,spare.
> 
> Using that configuration, if I use one node in my job, I hope slurm will
> choose head as computing node (as it is in a first order). However slurm
> always choose compute, not head.
> 
> how to fix this ?

http://slurm.schedmd.com/slurm.conf.html

"
Weight
The priority of the node for scheduling purposes. All things being
equal, jobs will be allocated the nodes with the lowest weight which
satisfies their requirements. For example, a heterogeneous collection of
nodes might be placed into a single partition for greater system
utilization, responsiveness and capability. It would be preferable to
allocate smaller memory nodes rather than larger memory nodes if either
will satisfy a job's requirements. The units of weight are arbitrary,
but larger weights should be assigned to nodes with more processors,
memory, disk space, higher processor speed, etc. Note that if a job
allocation request can not be satisfied using the nodes with the lowest
weight, the set of nodes with the next lowest weight is added to the set
of nodes under consideration for use (repeat as needed for higher weight
values). If you absolutely want to minimize the number of higher weight
nodes allocated to a job (at a cost of higher scheduling overhead), give
each node a distinct Weight value and they will be added to the pool of
nodes being considered for scheduling individually. The default value is 1.
"

Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: NFSv4

2016-05-26 Thread Benjamin Redling

Hi,

On 05/25/2016 13:21, Mike Johnson wrote:
> I know this is a long-standing question, but thought it was worth
> asking.  I am in an environment that uses NFSv4, which obviously needs
> user credentials to grant access to filesystems.  Has anyone else
> tackled the issue of unattended batch jobs successfully?  I'm aware of
> AUKS.

isn't $subject misleading? Isn't Kerberos the problem?

> Is there any other method anyone has used?

we use NFSv4 and OpenLDAP without Kerberos.
IMO for a small group of [trusted] users this is good enough. YMMV.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: How to get command of a running/pending job

2016-05-17 Thread Benjamin Redling

On 2016-05-17 12:19, Loris Bennett wrote:
> 
> Benjamin Redling
> <benjamin.ra...@uni-jena.de> writes:
> 
>> On 05/17/2016 10:02, Loris Bennett wrote:
>>>
>>> Benjamin Redling
>>> <benjamin.ra...@uni-jena.de> writes:
>>>
>>>> On 2016-05-13 05:58, Husen R wrote:
[...]
>>> Which version does this? 
[...]
>> An older one. [...]
>> I assumed slurm commands/parameters don't change (all over the board).
[...]
> I haven't really been bitten by such changes.  My main gripe with the
> Slurm tools is the inconsistency of the interfaces, e.g. output columns:
> 
>   squeue -o " %.18i"
>   sacct -o jobid%18
> 
> or selection according to nodes
> 
>   squeue -w node001
>   sacct -N node001
> 
> This is obviously not a real problem, but it is a daily annoyance.  So
> in that sense, I do think that changing the parameter semantics would be
> a good idea, but only once and only if the options become harmonised
> across all the tools!

I agree. That would be a good reason -- if prominently advertised in the
changelog and highlighted in the documentation.
I wrote it before: I miss an easy way to look up the documentation for a
specific slurm version on the official site, like ganeti does.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: How to get command of a running/pending job

2016-05-17 Thread Benjamin Redling

On 2016-05-17 12:28, Carlos Fenoy wrote:
> On Tue, May 17, 2016 at 10:02 AM, Loris Bennett 
> wrote:
[...]

>> Which version does this? 15.08.8 just seems to show the 'Command' entry,
>> which is the file containing the actual command.

> You will only see the script in the output of the scontrol if the job was
> submitted with sbatch. If the job has been submitted with srun you will not
> be able to see the script, as it is not stored by slurm and also it may be
> a binary file.

Good to know. (Calms me down and makes me regret the slight rant.)

As far as I know nobody here is using srun for important stuff, and since
the parameters still seem to work in newer versions there is still hope.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: More than one job/task per node?

2016-04-29 Thread Benjamin Redling

On 2016-04-29 07:36, Lachlan Musicman wrote:
> I'm finding this a little confusing.
> 
> We have a very simple script we are using to test/train staff how to use
> SLURM (16.05-pre2). They are moving from an old Torque/Maui system.
> 
> I have a test partition set up,
> 
> from slurm.conf
> 
> NodeName=slurm-[01-02] CPUs=8 RealMemory=32000 Sockets=1 CoresPerSocket=4
> ThreadsPerCore=2 State=UNKNOWN
> PartitionName=debug Nodes=slurm-[01-02] Default=YES MaxTime=48:0:0
> DefaultTime=0:40:0 State=UP

[...]

If the above is the complete slurm.conf you might be better off
generating a config via the simple or advanced configuration tool.

Slurm uses "select/linear" as default -- according to:
[1] http://slurm.schedmd.com/cons_res.html

Albeit
"
All CR_s assume Shared=No or Shared=Force EXCEPT for CR_MEMORY which
assumes Shared=Yes
"
talks about consumable resources -- of which you define none -- I think
Shared=No is also the default (which I couldn't look up properly)

But http://slurm.schedmd.com/cons_res_share.html and, at [1] below
"Example of Node Allocations Using Consumable Resource Plugin", the
section "Using Slurm's Default Node Allocation (Non-shared Mode)" seem
to describe your case exactly.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: scontrol update not allowing jobs

2016-04-15 Thread Benjamin Redling

On 2016-04-15 16:54, Benjamin Redling wrote:
> 
> On 04/15/2016 16:22, Glen MacLachlan wrote:
>> I tried that already by leaving the field blank as in  "flags=" but that
>> has no effect. Should I change it to something else?
> 
> I set my nodes to State=IDLE after maintenance (from DOWN, DRAIN/DOWN).
> 
> Depending on your cases you might have to look at the scontrol man page
> under "State=" and read about ALLOCATED and MIXED.

Sorry for the pointless posting...
I read the thread again and I was wondering what I wrote there only
hours ago.

Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: scontrol update not allowing jobs

2016-04-15 Thread Benjamin Redling

On 04/15/2016 16:22, Glen MacLachlan wrote:
> I tried that already by leaving the field blank as in  "flags=" but that
> has no effect. Should I change it to something else?

I set my nodes to State=IDLE after maintenance (from DOWN, DRAIN/DOWN).

Depending on your cases you might have to look at the scontrol man page
under "State=" and read about ALLOCATED and MIXED.

Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: PySlurm for SLURM 2.3.2 API

2016-04-14 Thread Benjamin Redling

On 04/14/2016 11:08, Naajil Aamir wrote:
> Hi hope you are doing well. I am currently working on a scheduling policy
> of slurm 2.3.2 for that i need *PYSLURM* version that is compatible with
> slurm 2.3.3 which i am unable to find on internet. It would be a great help
> if you could provide a link to PYSLURM for Slurm 2.3.2 repository.

Maybe the stale branches of pyslurm are what you are looking for?
https://github.com/PySlurm/pyslurm/branches

2.3.3 seems to be the oldest

Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Benjamin Redling

15.08 is in Debian testing. A bit risky, but with pinning I would have a look
at what else would need an upgrade as a dependency. BR

Am 25. März 2016 11:01:20 MEZ, schrieb Diego Zuccato :
>
>Il 25/03/2016 09:59, Diego Zuccato ha scritto:
>
>> I'm using SLURM 14.03.9 (the one packaged in Debian 8) and slurmtop
>5.00
>> (from schedtop 'sources'). I'll try 5.02 as soon as some new jobs
>gets
>> submitted.
>Seems I found the problem. Searching schedtop, I found the
>announcement,
>where Dennis says:
>> schedtop/slurmtop requires Slurm 15.08 (or a recent beta)
>
>So I'm out of luck till Debian decides to upgrade to 15.08 :(
>Too many other machines to manage to even *think* compiling SLURM from
>sources!
>
>--
>Diego Zuccato
>Servizi Informatici
>Dip. di Fisica e Astronomia (DIFA) - Università di Bologna
>V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>tel.: +39 051 20 95786
>mail: diego.zucc...@unibo.it
>
>5x1000 AI GIOVANI RICERCATORI
>DELL'UNIVERSITÀ DI BOLOGNA
>Codice Fiscale: 80007010376

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.HTML
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-25 Thread Benjamin Redling

On 2016-03-24 17:22, John DeSantis wrote:
>>> What I'm looking for is a tool that gives me, for every node/cpu the
>>> corresponding job.
>>
>> squeue -n 
>>
>> As the man page explicitly mentions:  can be a single node and
>> either a NodeName or a NodeHostname
> 
> I believe this is a typo, as we use "squeue -w " which gives
> us all corresponding jobs running on the host(s) in question.  It is
> actually quite useful if you live on the command line.

Not a typo. An older slurm version.

I am quite surprised that something like that changed _and_ "-n" has
another meaning now.

I hope decisions like that don't come light-hearted.

This means helping others is limited to one's own version or requires a
lookup of the newer, current documentation.
In the case of slurm that's really bad as it is missing versioning --
unlike ganeti, where it is quite easy to point anyone to whichever
version one or the other is using.
Other projects like ansible or ganeti really shine here:
* every command and its parameters has a remark like "requires 1.6"
where needed
* there is a clear structure from getting started to advanced features
* semantic changes are easy to spot -- prominently highlighted
* in the case of ganeti the design drafts add a lot to understanding the
changes to expect

Slurm documentation is scattered, not versioned, and now I know I have
to watch out for changing semantics.
Expecting everyone to be on the newest release is unrealistic.

/BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Overview of jobs in the cluster

2016-03-24 Thread Benjamin Redling

On 2016-03-24 13:59, Diego Zuccato wrote:
> Is there an equivalent of torque's pbstop for SLURM?

There are a lot of "rosetta" websites for workload schedulers. Most of
the time Slurm, Torque, PBS and  Sun Gridengine & variants are listed.

> I already tried slurmtop, but it seems something is not right (nodes are
> shown as fully allocated with '@' even if only one CPU is really
> allocated, there is no color mapping, etc).

What are you missing from "squeue"?

> What I'm looking for is a tool that gives me, for every node/cpu the
> corresponding job.

squeue -n 

As the man page explicitly mentions:  can be a single node and
either a NodeName or a NodeHostname


Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] slurm-dev please fix your quoting first (Re: Slurm Error)

2016-03-24 Thread Benjamin Redling

On 2016-03-24 04:25, Helmi Azizan wrote:

You wrote
> https://groups.google.com/d/msg/slurm-devel/LXmU3BoWGQw/ULqmA85qKAAJ

I wrote:
> hopefully the correct version, fitting to the 2.6 version you are using.

You wrote:
>> helmi@Dellrackmount:~$ srun -N1 /bin/hostname
>> srun: error: Unable to allocate resources: Unable to contact slurm
>> controller (connect failure)

I wrote:
> Before you go into debugging:
> what does sinfo say about available partitions and nodes?
> helmi@Dellrackmount:~$ sinfo
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> debug*   up   infinite  0n/a

You wrote:
> After I allocate the NodeName to Dellrackmount I ran :-


Please, can you fix the quoting in your mail client or use a proper
mail service?
In your mails your lines and mine have the same level of indentation.
The first time I ignored it -- can happen once in a while for whatever
reason -- but in the meantime you should have noticed.

Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Multiple Python versions

2016-03-24 Thread Benjamin Redling

+1
If a user needs a newer Python version and a paper submission is due, I don't
want to be a blocker as an admin.
The conservative packages of server distributions are of no value to my users.
They are free to run whatever they need from their home directories; otherwise
the clusters would be pointless! BR

Am 23. März 2016 23:29:19 MEZ, schrieb Craig Yoshioka :
>
>Modules are good for most things but if you do a lot of work with
>Python you'll want a python specific system for managing different
>environments. I prefer conda. 
>
>Sent from my iPhone
>
>> On Mar 23, 2016, at 3:04 PM, Carlos Lijeron
> wrote:
>> 
>> Everyone,
>> 
>> This is not a SLURM specific question, but rather a Python question
>in a cluster environment supported by SLURM.   Does anyone of you have
>any suggestions on managing multiple Python versions (2.6.6, 2.7.9 and
>3.0) on a cluster, including package management and allowing users to
>install their own Python modules in their home directories?
>> 
>> Any ideas will be greatly appreciated.   We are having trouble
>installing Python modules for the different versions.
>> 
>> 
>> Carlos.
>> 

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.HTML
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Slurm error

2016-03-22 Thread Benjamin Redling

Hi Helmi Azizan,

On 03/22/2016 05:11, Helmi Azizan wrote:
> I created a new slurm.conf using the easy configurator but am still facing
> the following error:

hopefully the correct version, fitting to the 2.6 version you are using.

> helmi@Dellrackmount:~$ srun -N1 /bin/hostname
> srun: error: Unable to allocate resources: Unable to contact slurm
> controller (connect failure)

Before you go into debugging:
what does sinfo say about available partitions and nodes?


> helmi@Dellrackmount:~$ sudo slurmctld -Dvvv
> password for helmi:
> slurmctld: pidfile not locked, assuming no running daemon
> slurmctld: error: Configured MailProg is invalid
^^^
at least for the older 2.3 in Ubuntu 12.04 I had to install (something
like) mailutils that provides /usr/bin/mail to get it working

> slurmctld: Accounting storage NOT INVOKED plugin loaded
[...]
> slurmctld: fatal: No NodeName information available!
^^ Are you allowed
to post at least your partitions and nodes?
That would make things easier.


Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: job can not requeue after preempted

2016-03-19 Thread Benjamin Redling

On 03/17/2016 04:01, 温圣召 wrote:
> The preempted job1 show a PD reason of  BeginTime
> my job invocation at  the info of them as follow:
> [root@szwg]#  sbatch --gres=gpu:4 -N 1 --partition=low  mybatch.sh

You ask for _4_ GPUs and 1 node.
Your config says each node has Gres=gpu:2

> Submitted batch job 103
> 
> 
> [root@szwg]# squeue
>  JOBID PARTITION NAME USER ST   TIME  NODES 
> NODELIST(REASON)
>103   low mybatch. root  R   0:10  1 
> cp01-sys-hic-gpu-00.cp01.baidu.com
> 
> 
> [root@szwg]#  sbatch --gres=gpu:4 -N 1 --partition=hig  mybatch.sh
> Submitted batch job 104
> 
> 
> [root@szwg]# squeue
>  JOBID PARTITION NAME USER ST   TIME  NODES 
> NODELIST(REASON)
>103   low mybatch. root PD   0:00  1 
> (BeginTime)
>104   hig mybatch. root  R   0:45  1 
> cp01-sys-hic-gpu-00.cp01.baidu.com

We are neither using preemption nor gres and maybe I am wrong, but I
think "BeginTime" is misleading.
As far as I understand there aren't enough free gpus (none) in your
partition with the idle node; requeue can't happen as long as 104 is running.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: job can not requeue after preempted

2016-03-19 Thread Benjamin Redling

On 2016-03-16 13:54, 温圣召 wrote:
> my job ... can not be requeue when it preempted ...

Can you please post the job invocation too?
Does the preempted job1 show a PD reason (%R) in the queue?

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: job can not requeue after preempted

2016-03-18 Thread Benjamin Redling

On 03/17/2016 04:01, 温圣召 wrote:
> The preempted job1 show a PD reason of  BeginTime
> my job invocation at  the info of them as follow:
> [root@szwg]#  sbatch --gres=gpu:4 -N 1 --partition=low  mybatch.sh

You ask for _4_ GPUs and 1 node.
Your config says each node has Gres=gpu:2

> Submitted batch job 103
> 
> 
> [root@szwg]# squeue
>  JOBID PARTITION NAME USER ST   TIME  NODES 
> NODELIST(REASON)
>103   low mybatch. root  R   0:10  1 
> cp01-sys-hic-gpu-00.cp01.baidu.com
> 
> 
> [root@szwg]#  sbatch --gres=gpu:4 -N 1 --partition=hig  mybatch.sh
> Submitted batch job 104
> 
> 
> [root@szwg]# squeue
>  JOBID PARTITION NAME USER ST   TIME  NODES 
> NODELIST(REASON)
>103   low mybatch. root PD   0:00  1 
> (BeginTime)
>104   hig mybatch. root  R   0:45  1 
> cp01-sys-hic-gpu-00.cp01.baidu.com

We are neither using preemption nor gres and maybe I am wrong, but I
think "BeginTime" is misleading.
As far as I understand there aren't enough free gpus (none) in your
partition with the idle node; requeue can't happen as long as 104 is
running.
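If the nodes really only have Gres=gpu:2 each, a request that can be
satisfied would look more like this (sketch):

--- %< ---
sbatch --gres=gpu:2 -N 1 --partition=low  mybatch.sh
# or, since --gres counts per node, 4 GPUs spread over two nodes:
sbatch --gres=gpu:2 -N 2 --partition=low  mybatch.sh
--- %< ---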

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: slurm-dev Job not using cores from different nodes

2016-02-12 Thread Benjamin Redling

Am 10.02.2016 um 09:04 schrieb Pierre Schneeberger:
> I submitted the job with sbatch and the following command:
> #!/bin/bash
> #SBATCH -n 80 # number of cores
> #SBATCH -o
> /mnt/nfs/bio/HPC_related_material/Jobs_STDOUT_logs/slurm.%N.%j.out # STDOUT
> #SBATCH -e
> /mnt/nfs/bio/HPC_related_material/Jobs_STDERR_logs/slurm.%N.%j.err # STDERR
> perl /mnt/nfs/bio/Script_test_folder/Mira_script.pl
> 
> And the mira manifest file (don't know if you have experience with this
> assembler?) is written in a way that the software should use the total
> amount of allocated cores:

No direct experience with MIRA. As soon as you use more cores than a
node has, multi-threading seems secondary to me, and I thought it would
be nice to see your script -- too many people, including me, tend to
call commands ad hoc.

When you are sure you called both jobs with the same configuration and
one is working across nodes I currently have no idea.
  I thought it could be possible you submitted one job with proper MPI
parameters and the other one without. Now that you provided the script I
hope for you that other, more competent list members can help you.

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Setting up SLURM for a single multi-core node

2016-02-11 Thread Benjamin Redling

On 2016-02-11 07:36, Rohan Garg wrote:
> 
> Hello,
> 
> I'm trying to set up SLURM-15.08.1 on a single multi-core node to
> manage multi-threaded jobs. The machine has 16 physical cores
> on 2 sockets with HyperThreading enabled. I'm using the EASY
> scheduling algorithm with backfilling. The goal is to fully utilize all
> the available cores at all times.
> 
> Given a list of three jobs with requirements of 8 cores, 2 cores,
> and 4 cores, the expectation is that the jobs should be co-scheduled
> to utilize 14 of the 16 available cores.  However, I can't seem to
> get SLURM to work as expected. SLURM runs the latter two jobs
> together but refuses to schedule the first job until they finish.
> (Is this the expected behavior of the EASY-backfilling algorithm?)
> 
> Here's the list of jobs:
> 
>   $ cat job1.batch
> 
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=8
> #SBATCH --threads-per-core=1
> srun /path/to/application1
>   
>   $ cat job2.batch
>   
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=2
> #SBATCH --threads-per-core=1
> srun /path/to/application2
>   
>   $ cat job3.batch
>   
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=4
> #SBATCH --threads-per-core=1
> srun /path/to/application3

At a quick glance:

In general let the scheduler do the work. Don't micro-manage.
Be aware that your SBATCH settings are constraints -- not hints.

You have 3 jobs that each request 1 socket = 3 sockets.
You have 2 phys. sockets.

/Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Setting up SLURM for a single multi-core node

2016-02-11 Thread Benjamin Redling

On 2016-02-11 07:36, Rohan Garg wrote:
> [...] The machine has 16 physical cores
> on 2 sockets with HyperThreading enabled. I'm using the EASY
> scheduling algorithm with backfilling. The goal is to fully utilize all
> the available cores at all times.

> Given a list of three jobs with requirements of 8 cores, 2 cores,
> and 4 cores, the expectation is that the jobs should be co-scheduled
> to utilize 14 of the 16 available cores.  However, I can't seem to
> get SLURM to work as expected. SLURM runs the latter two jobs
> together but refuses to schedule the first job until they finish.

The job should be shown in squeue with ST: PD and Reason: (Resources).
In your case you can never allocate more than 8+4 cores because you
request 3 sockets (1 per job).

>   $ cat job1.batch
> 
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=8
> #SBATCH --threads-per-core=1
> srun /path/to/application1
>   
>   $ cat job2.batch
>   
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=2
> #SBATCH --threads-per-core=1
> srun /path/to/application2
>   
>   $ cat job3.batch
>   
> #!/bin/bash
> #SBATCH --sockets-per-node=1
> #SBATCH --cores-per-socket=4
> #SBATCH --threads-per-core=1
> srun /path/to/application3

At a quick glance:

In general let the scheduler do the work. Don't micro-manage.
Be aware that your SBATCH settings are constraints. In this case
"--sockets-per-node=1" might not be what you want:

http://slurm.schedmd.com/sbatch.html
<--- %< --->
-B --extra-node-info=
Request a specific allocation of resources with details as to the
number and type of computational resources within a cluster: number of
sockets (or physical processors) per node, cores per socket, and threads
per core. The total amount of resources being requested is the product
of all of the terms. Each value specified is considered a minimum. An
asterisk (*) can be used as a placeholder indicating that all available
resources of that type are to be utilized. As with nodes, the individual
levels can also be specified in separate options if desired:

--sockets-per-node=
--cores-per-socket=
--threads-per-core=

If SelectType is configured to select/cons_res, it must have a
parameter of CR_Core, CR_Core_Memory, CR_Socket, or CR_Socket_Memory for
this option to be honored.
<--- %< --->
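Untested, but to let the scheduler pack the three jobs I would drop the
socket/thread constraints and only ask for cores, e.g. for the 8-core
job (the other two analogous):

--- %< ---
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
srun /path/to/application1
--- %< ---

With select/cons_res and CR_Core that requests 8 cores without pinning
the job to a full socket layout.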

/Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: slurm-dev Job not using cores from different nodes

2016-02-04 Thread Benjamin Redling

Can you post how you submitted the job?
Mira on 60 cores needs MPI in your case. Multi-threading alone works
without it, but only within a single node.

BTW, your config says 31 CPUs. Generated without incrementing the index,
or intended?

Am 4. Februar 2016 18:02:15 MEZ, schrieb Pierre Schneeberger 
:
>Hi there,
>
>I'm setting up a small cluster composed of 4 blades with 32 (physical)
>cores and 750 Gb RAM each (so a total of 128 cores and approx 3 Tb
>RAM). A
>CentOS 7 VM is running on each blade.
>The slurm controller service is up and running on one of the blade, and
>the
>daemon service has been installed on each of the four blades (up and
>running as well).
>
>A few days ago, I submitted a job using the MIRA assembler
>(multithreaded)
>on 60 cores and it worked well, using all the resources I allocated to
>the
>job. At that point, only 2 blades (including the one with the
>controller)
>were running and the job was completed successfully using 60 cores when
>needed.
>
>The problem appeared when I added the 2 last blades and it seems that
>it
>doesn't matter how much resources (cores) I allocate to a job, it now
>runs
>on a maximum of 32 cores (the number of physical cores per node).
>I tried it with 60, 90 and 120 cores but MIRA, according to the system
>monitor from CentOS, seem to use only a maximum of 32 cores (all cores
>from
>one node but none of the others that were allocated). Is it possible
>that
>there is a communication issue between the nodes? (although all seem
>available when using the sinfo command).
>
>I tried to restart the different services (controller/slaves) but it
>doesn't seem to help.
>
>I would be grateful if someone could give me a hint on how to solve
>this
>issue,
>
>Many thanks in advance,
>Pierre
>
>Here is the *slurm.conf* information:
>
># slurm.conf file generated by configurator easy.html.
># Put this file on all nodes of your cluster.
># See the slurm.conf man page for more information.
>#
>ControlMachine=hpc-srvbio-03
>ControlAddr=192.168.12.12
>#
>#MailProg=/bin/mail
>MpiDefault=none
>#MpiParams=ports=#-#
>ProctrackType=proctrack/pgid
>ReturnToService=1
>SlurmctldPidFile=/var/run/slurmctld.pid
>#SlurmctldPort=6817
>SlurmdPidFile=/var/run/slurmd.pid
>#SlurmdPort=6818
>SlurmdSpoolDir=/var/spool/slurmd
>SlurmUser=root
>#SlurmdUser=root
>StateSaveLocation=/var/spool/slurmctld
>SwitchType=switch/none
>TaskPlugin=task/none
>#
>#
># TIMERS
>#KillWait=30
>#MinJobAge=300
>#SlurmctldTimeout=120
>#SlurmdTimeout=300
>#
>#
># SCHEDULING
>FastSchedule=1
>SchedulerType=sched/backfill
>#SchedulerPort=7321
>SelectType=select/linear
>#
>#
># LOGGING AND ACCOUNTING
>AccountingStorageType=accounting_storage/filetxt
>ClusterName=cluster
>#JobAcctGatherFrequency=30
>JobAcctGatherType=jobacct_gather/none
>#SlurmctldDebug=3
>#SlurmctldLogFile=
>#SlurmdDebug=3
>#SlurmdLogFile=
>#
>#
># COMPUTE NODES
>#NodeName=Nodes[1-4] CPUs=31 State=UNKNOWN
>PartitionName=HPC_test Nodes=hpc-srvbio-0[3-4],HPC-SRVBIO-0[1-2]
>Default=YES MaxTime=INFINITE State=UP
>NodeName=DEFAULT CPUs=31 RealMemory=75 TmpDisk=36758
>NodeName=hpc-srvbio-03 NodeAddr=192.168.12.12
>NodeName=hpc-srvbio-04 NodeAddr=192.168.12.13
>NodeName=HPC-SRVBIO-02 NodeAddr=192.168.12.11
>NodeName=HPC-SRVBIO-01 NodeAddr=192.168.12.10

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.HTML
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Job limits

2016-02-04 Thread Benjamin Redling

On 2016-02-04 16:43, Skouson, Gary B wrote:
> The GrpJobs limits the total number of jobs allowed to be running.  Let's say 
> I want to allow 70 jobs per users.  The GrpJobs would work fine for that.  

> However, I'd like to limit the number of jobs able to reserve resources in 
> the backfill schedule's map. 

sorry for not considering "fit in whatever backfill window is available" in
your first mail.

Am I closer with:
GrpJobs/MaxJobs=70 + bf_max_job_user= ?
http://slurm.schedmd.com/sched_config.html
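A sketch of what I mean -- the user name and the backfill depth are
just examples, and the association limit needs accounting (slurmdbd):

--- %< ---
sacctmgr modify user where name=adam set MaxJobs=70
# slurm.conf:
SchedulerParameters=bf_max_job_user=4
--- %< ---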

> Setting a QOS flag with the NoReserve does this for all jobs in the QOS, I 
> was looking for a way to only apply something like that once a user has some 
> number of running jobs.
> I'd like to force NoReserve once the user's running jobs reaches some 
> threshold.

/Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Job limits

2016-02-03 Thread Benjamin Redling

I haven't understood why qos GrpJobs= -- assoc per user? -- won't work for you.

Am 4. Februar 2016 01:50:22 MEZ, schrieb "Skouson, Gary B" 
:
>
>I'd like a way to be able to limit the number of jobs that a user is
>allowed to run before we only allow them to run by backfilling.
>
>For example, let's say we'd like to allow users to run lots of jobs,
>but only allow them to "reserve" resources for their first few jobs. 
>That way, a user with no jobs running submitting a job requesting 50
>nodes will have their job start reserving nodes until the job starts. 
>Once some number of jobs have started for this user, I'd still like to
>allow them to run, but I'd like to only allow them to run if their jobs
>can fit in whatever backfill window is available.  Once the number of
>running jobs falls below the threshold, another of their jobs (or maybe
>several) would be allowed to begin reserving resources in the backfill
>schedule's map, until another of their jobs starts, at which point the
>rest would be relegated to backfilling only again.
>
>We've been using the Moab soft/hard limits for this functionality, but
>I'd like to be able to do the same thing using Slurm directly.
>
>I've looked at QOS and while the NoReserve flag kind of describes what
>I'm looking for, I only want it to apply to the jobs in the queue once
>they've reached their running jobs threshold, I couldn't see how to
>make that work though.
>
>Any thoughts or other options to accomplish something like this with
>Slurm?
>
>-
>Gary Skouson

--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.HTML
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Ressouces allocation problem

2016-02-01 Thread Benjamin Redling

On 2016-02-01 11:08, David Roman wrote:
> The both nodes are the same. They are virtual machine (VMWARE) to do some 
> tests.

That makes me wonder why changing fastschedule=0 to 1 results in
comprehensible behavior.

Have you looked into the log files on the master and the node?

(Apart from that I really don't like proctrack/pgid [marked as
unreliable for process tracking], and especially not combined with
preemption.
But I can't underpin that feeling with facts -- you said both jobs were
in state RUNNING. Have you checked with something other than squeue that
they really are?
Could proctrack/pgid mess up while squeue keeps reporting the jobs as
running? Where in this scenario is the link to FastSchedule=0?)

/Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Ressouces allocation problem

2016-01-29 Thread Benjamin Redling



Am 29.01.2016 um 15:08 schrieb David Roman:
> I created 2 jobs
> Job_A uses 8 CPUS in partion DEV
> Job_B uses 16 CPUS in partion LOW
> 
> If I start Job_A before Job_B, all is ok. Job_A is in RUNNING state and Job_B 
> is in PENDING state
> 
> BUT, If I start Job_B before Job_A. The both jobs are in RUNNING state.

[...]

> FastSchedule=0

Can you set this to "1"?

Whatever you post about your partitions' resource configuration, it is
not taken into consideration.
I think I could construct a case where your behaviour is fine considering
your _actual configuration_ and not your _configured resources_.

https://computing.llnl.gov/linux/slurm/slurm.conf.html
There section *FastSchedule*
<--- %< --->
0
Base scheduling decisions upon the actual configuration of each
individual node except that the node's processor count in SLURM's
configuration must match the actual hardware configuration if
SchedulerType=sched/gang or SelectType=select/cons_res are configured
(both of those plugins maintain resource allocation information using
bitmaps for the cores in the system and must remain static, while the
node's memory and disk space can be established later).
<--- %< --->

Regards, Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Share free cpus

2016-01-29 Thread Benjamin Redling

Am 29.01.2016 um 15:33 schrieb Andy Riebs:
> 
> Slurmd needs to run as root so that it can start jobs for any of the
> cluster users.

Sure, I just wanted to correct my former statement that slurm_d_ is
running as a dedicated user here -- it isn't. It's only slurmctld which
is running as "slurm".

I went back to this thread because Brian Freed's "slurmd node state down"
on 28th Jan. got a reply from Trey Dockendorf that gave me the hint that
I had mixed things up a few days before.

/B


> On 01/29/2016 08:10 AM, Benjamin Redling wrote:
>> Am 18.01.2016 um 18:42 schrieb Benjamin Redling:
>>> Am 18.01.2016 um 01:39 schrieb Jordan Willis:
>>>>  CompleteWait=60
>>>>  SlurmdUser=root
>>>    side note: really root? Why not a dedicated user?
>> It is at least the Debian default and I just didn't see the "d" of
>> SlurmdUser in Jordan Willi's config.
>>
>> Running fine:
>> slurm.conf
>> --- %< ---
>> SlurmUser=slurm
>> # SlurmdUser=root
>> --- %< ---
>>
>> Not uncommented but seems to be the default. Slurmd is running as root
>> here.
>>
>> /Benjamin

-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Share free cpus

2016-01-29 Thread Benjamin Redling

Am 18.01.2016 um 18:42 schrieb Benjamin Redling:
> Am 18.01.2016 um 01:39 schrieb Jordan Willis:
>> CompleteWait=60
>> SlurmdUser=root
>   side note: really root? Why not a dedicated user?

It is at least the Debian default and I just didn't see the "d" of
SlurmdUser in Jordan Willis's config.

Running fine:
slurm.conf
--- %< ---
SlurmUser=slurm
# SlurmdUser=root
--- %< ---

Left commented out, but root seems to be the default. Slurmd is running as root here.

/Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Ressouces allocation problem

2016-01-29 Thread Benjamin Redling

On 2016-01-29 17:04, David Roman wrote:
> My problem is simple. I have 2 nodes, each  with 8 cpus. I can use at the 
> same time a maximum of 16 cpus. In the first case Job_A use 8 cpus and Job_B 
> wait to use 16 cpus. But, in the other case, Job_B use 16 cpus, and Job_A use 
> 8 cpus in the same time. But 16+8 = 24 and it is great than 16 !

Can you cat /proc/cpuinfo? I still think one of the nodes might not
fit your configuration.

As I tried to explain: depending on your real hardware, FastSchedule=0
will consider that and not your configured resources, and suddenly the
sequence of job submission becomes relevant.

Anyway, can you have a look via scontrol show -d job  and
scontrol show  at the details of both running jobs, for a quick
glimpse.

After that you can try to raise SlurmdDebug on the compute node and
SlurmctldDebug on the master up to 9, and inspect SlurmdLogFile on the
compute node and SlurmctldLogFile on the master, to really get _all_ the
details of the job allocation.
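For example -- the job id is made up, the node names are from your
config:

--- %< ---
grep -c ^processor /proc/cpuinfo      # on slurm_node1 and slurm_node2
scontrol show node slurm_node1
scontrol show -d job 123              # 123 = one of the two running job ids
# slurm.conf, then restart/reconfigure:
SlurmctldDebug=9
SlurmdDebug=9
--- %< ---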

Benjamin


> David
> 
> 
> De : Dennis Mungai [mailto:dmun...@kemri-wellcome.org]
> Envoyé : vendredi 29 janvier 2016 16:18
> À : slurm-dev <slurm-dev@schedmd.com>
> Objet : [slurm-dev] Re: Ressouces allocation problem
> 
> 
> Can you change your consumable resources from CR_Core_Memory to CR_CPU_Memory?
> On Jan 29, 2016 5:42 PM, Benjamin Redling 
> <benjamin.ra...@uni-jena.de<mailto:benjamin.ra...@uni-jena.de>> wrote:
> 
> Am 29.01.2016 um 15:31 schrieb Dennis Mungai:
>> Add SHARE=FORCE to your partition settings for each partition entry in
>> the configuration file.
> 
> https://computing.llnl.gov/linux/slurm/cons_res_share.html
> 
> selection setting was:
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> 
> Shared=FORCE as you recommend leads to:
> "
> Cores are allocated to jobs. A core may run more than one job.
> "
> 
> What does that have to do with the problem?
> Can you elaborate on that?
> 
> /Benjamin
> 
> 
>> On Jan 29, 2016 5:08 PM, David Roman 
>> <david.ro...@noveltis.fr<mailto:david.ro...@noveltis.fr>> wrote:
>> Hello,
>>
>> I'm a newbies with SLURM. Perhaps could you help me to understand my
>> mistake.
>>
>> I have 2 nodes (2 sockets with 4 core per socket = 8 CPUs per node) I
>> created 3 partitions
>>
>> DEV with node2
>> OPwith node1
>> LOW with node1 and node2
>>
>> I created 2 jobs
>> Job_A uses 8 CPUS in partion DEV
>> Job_B uses 16 CPUS in partion LOW
>>
>> If I start Job_A before Job_B, all is ok. Job_A is in RUNNING state and
>> Job_B is in PENDING state
>>
>> BUT, If I start Job_B before Job_A. The both jobs are in RUNNING state.
>>
>> Thanks for your help,
>>
>> David.
>>
>>
>> Here my slurm.conf without comments
>>
>> ClusterName=Noveltits
>> ControlMachine=slurm
>> SlurmUser=slurm
>> SlurmctldPort=6817
>> SlurmdPort=6818
>> AuthType=auth/munge
>> StateSaveLocation=/tmp
>> SlurmdSpoolDir=/tmp/slurmd
>> SwitchType=switch/none
>> MpiDefault=none
>> SlurmctldPidFile=/var/run/slurmctld.pid
>> SlurmdPidFile=/var/run/slurmd.pid
>> ProctrackType=proctrack/pgid
>> CacheGroups=0
>> ReturnToService=0
>> SlurmctldTimeout=300
>> SlurmdTimeout=300
>> InactiveLimit=0
>> MinJobAge=300
>> KillWait=30
>> Waittime=0
>> SchedulerType=sched/backfill
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_CORE_Memory
>> FastSchedule=0
>> SlurmctldDebug=3
>> SlurmdDebug=3
>> JobCompType=jobcomp/none
>>
>> PreemptMode=SUSPEND,GANG
>> PreemptType=preempt/partition_prio
>>
>>
>> NodeName=slurm_node[1-2] CPUs=8 SocketsPerBoard=2 CoresPerSocket=4
>> ThreadsPerCore=1
>> PartitionName=op  Nodes=slurm_node1 Priority=100 Default=No
>> MaxTime=INFINITE State=UP
>> PartitionName=dev Nodes=slurm_node2 Priority=1   Default=yes
>> MaxTime=INFINITE State=UP PreemptMode=OFF
>> PartitionName=low Nodes=slurm_node[1-2] Priority=1   Default=No
>> MaxTime=INFINITE State=UP
>>
>>
>> __
>>
>> This e-mail contains information which is confidential. It is intended
>> only for the use of the named recipient. If you have received this
>> e-mail in error, please let us know by replying to the sender, and
>> immediately delete it from your system. Please note, that in these
>> circumstances, the use, disclosure, distribution or copying of this
>> information is strictly prohibited. KEMRI-Wellcome Trust Programme
>>

[slurm-dev] Re: Ressouces allocation problem

2016-01-29 Thread Benjamin Redling

As far as I understand Slurm, with Shared=FORCE you risk
over-committing.

/Benjamin

On 2016-01-29 16:10, Dennis Mungai wrote:
> And with the SHARE=FORCE:8 parameter, each consumable processor, socket or 
> core can be shared by 8 jobs, as an example.
> 
> On Jan 29, 2016 5:08 PM, David Roman  wrote:
> Hello,
> 
> I'm a newbies with SLURM. Perhaps could you help me to understand my mistake.
> 
> I have 2 nodes (2 sockets with 4 core per socket = 8 CPUs per node) I created 
> 3 partitions
> 
> DEV with node2
> OPwith node1
> LOW with node1 and node2
> 
> I created 2 jobs
> Job_A uses 8 CPUS in partion DEV
> Job_B uses 16 CPUS in partion LOW
> 
> If I start Job_A before Job_B, all is ok. Job_A is in RUNNING state and Job_B 
> is in PENDING state
> 
> BUT, If I start Job_B before Job_A. The both jobs are in RUNNING state.
> 
> Thanks for your help,
> 
> David.
> 
> 
> Here my slurm.conf without comments
> 
> ClusterName=Noveltits
> ControlMachine=slurm
> SlurmUser=slurm
> SlurmctldPort=6817
> SlurmdPort=6818
> AuthType=auth/munge
> StateSaveLocation=/tmp
> SlurmdSpoolDir=/tmp/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> ProctrackType=proctrack/pgid
> CacheGroups=0
> ReturnToService=0
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_CORE_Memory
> FastSchedule=0
> SlurmctldDebug=3
> SlurmdDebug=3
> JobCompType=jobcomp/none
> 
> PreemptMode=SUSPEND,GANG
> PreemptType=preempt/partition_prio
> 
> 
> NodeName=slurm_node[1-2] CPUs=8 SocketsPerBoard=2 CoresPerSocket=4 
> ThreadsPerCore=1
> PartitionName=op  Nodes=slurm_node1 Priority=100 Default=No  
> MaxTime=INFINITE State=UP
> PartitionName=dev Nodes=slurm_node2 Priority=1   Default=yes 
> MaxTime=INFINITE State=UP PreemptMode=OFF
> PartitionName=low Nodes=slurm_node[1-2] Priority=1   Default=No  
> MaxTime=INFINITE State=UP
> 
> 
> __
> 
> This e-mail contains information which is confidential. It is intended only 
> for the use of the named recipient. If you have received this e-mail in 
> error, please let us know by replying to the sender, and immediately delete 
> it from your system.  Please note, that in these circumstances, the use, 
> disclosure, distribution or copying of this information is strictly 
> prohibited. KEMRI-Wellcome Trust Programme cannot accept any responsibility 
> for the  accuracy or completeness of this message as it has been transmitted 
> over a public network. Although the Programme has taken reasonable 
> precautions to ensure no viruses are present in emails, it cannot accept 
> responsibility for any loss or damage arising from the use of the email or 
> attachments. Any views expressed in this message are those of the individual 
> sender, except where the sender specifically states them to be the views of 
> KEMRI-Wellcome Trust Programme.
> __
> 


-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Ressouces allocation problem

2016-01-29 Thread Benjamin Redling

On 2016-01-29 16:08, Bruce Roberts wrote:
> Not really related to the question, but the documentation you are referencing 
> is years old.  You should probably reference the current documentation at 
> SchedMD 
> 
> http://slurm.schedmd.com
> 
> In this case 
> 
> http://slurm.schedmd.com/cons_res_share.html

Full ACK. Was leaving the office and just used the highest page ranking
to make a point.


> On January 29, 2016 6:42:24 AM PST, Benjamin Redling 
> <benjamin.ra...@uni-jena.de> wrote:
>>
>> Am 29.01.2016 um 15:31 schrieb Dennis Mungai:
>>> Add SHARE=FORCE to your partition settings for each partition entry
>> in
>>> the configuration file.
>>
>> https://computing.llnl.gov/linux/slurm/cons_res_share.html
>>
>> selection setting was:
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>>
>> Shared=FORCE as you recommend leads to:
>> "
>> Cores are allocated to jobs. A core may run more than one job.
>> "
>>
>> What does that have to do with the problem?
>> Can you elaborate on that?
>>
>> /Benjamin
>>
>>
>>> On Jan 29, 2016 5:08 PM, David Roman <david.ro...@noveltis.fr> wrote:
>>> Hello,
>>>
>>> I'm a newbies with SLURM. Perhaps could you help me to understand my
>>> mistake.
>>>
>>> I have 2 nodes (2 sockets with 4 core per socket = 8 CPUs per node) I
>>> created 3 partitions
>>>
>>> DEV with node2
>>> OPwith node1
>>> LOW with node1 and node2
>>>
>>> I created 2 jobs
>>> Job_A uses 8 CPUS in partion DEV
>>> Job_B uses 16 CPUS in partion LOW
>>>
>>> If I start Job_A before Job_B, all is ok. Job_A is in RUNNING state
>> and
>>> Job_B is in PENDING state
>>>
>>> BUT, If I start Job_B before Job_A. The both jobs are in RUNNING
>> state.
>>>
>>> Thanks for your help,
>>>
>>> David.
>>>
>>>
>>> Here my slurm.conf without comments
>>>
>>> ClusterName=Noveltits
>>> ControlMachine=slurm
>>> SlurmUser=slurm
>>> SlurmctldPort=6817
>>> SlurmdPort=6818
>>> AuthType=auth/munge
>>> StateSaveLocation=/tmp
>>> SlurmdSpoolDir=/tmp/slurmd
>>> SwitchType=switch/none
>>> MpiDefault=none
>>> SlurmctldPidFile=/var/run/slurmctld.pid
>>> SlurmdPidFile=/var/run/slurmd.pid
>>> ProctrackType=proctrack/pgid
>>> CacheGroups=0
>>> ReturnToService=0
>>> SlurmctldTimeout=300
>>> SlurmdTimeout=300
>>> InactiveLimit=0
>>> MinJobAge=300
>>> KillWait=30
>>> Waittime=0
>>> SchedulerType=sched/backfill
>>> SelectType=select/cons_res
>>> SelectTypeParameters=CR_CORE_Memory
>>> FastSchedule=0
>>> SlurmctldDebug=3
>>> SlurmdDebug=3
>>> JobCompType=jobcomp/none
>>>
>>> PreemptMode=SUSPEND,GANG
>>> PreemptType=preempt/partition_prio
>>>
>>>
>>> NodeName=slurm_node[1-2] CPUs=8 SocketsPerBoard=2 CoresPerSocket=4
>>> ThreadsPerCore=1
>>> PartitionName=op  Nodes=slurm_node1 Priority=100 Default=No 
>>> MaxTime=INFINITE State=UP
>>> PartitionName=dev Nodes=slurm_node2 Priority=1   Default=yes
>>> MaxTime=INFINITE State=UP PreemptMode=OFF
>>> PartitionName=low Nodes=slurm_node[1-2] Priority=1   Default=No 
>>> MaxTime=INFINITE State=UP
>>>
>>>
>>>
>> __
>>>
>>> This e-mail contains information which is confidential. It is
>> intended
>>> only for the use of the named recipient. If you have received this
>>> e-mail in error, please let us know by replying to the sender, and
>>> immediately delete it from your system. Please note, that in these
>>> circumstances, the use, disclosure, distribution or copying of this
>>> information is strictly prohibited. KEMRI-Wellcome Trust Programme
>>> cannot accept any responsibility for the accuracy or completeness of
>>> this message as it has been transmitted over a public network.
>> Although
>>> the Programme has taken reasonable precautions to ensure no viruses
>> are
>>> present in emails, it cannot accept responsibility for any loss or
>>> damage arising from the use of the email or attachments. Any views
>>> expressed in this message are those of the individual sender, except
>>> where the sender specifically states them to be the views of
>>> KEMRI-Wellcome Trust Programme.
>>>
>> __
>>
>> -- 
>> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
>> vox: +49 3641 9 44323 | fax: +49 3641 9 44321
> 


-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Ressouces allocation problem

2016-01-29 Thread Benjamin Redling

On 2016-01-29 16:45, David Roman wrote:
> Hi, 
> 
> With "FastSchedule=1" in two cases, the second job wait the end of the first 
> job.
> But in my script I need to remove the parameter : #SBATCH --mem=2048 else I 
> have this error 
> 
> sbatch: error: Memory specification can not be satisfied
> sbatch: error: Batch job submission failed: Requested node configuration is 
> not available

Makes sense
http://slurm.schedmd.com/cons_res.html
"
In the cases where Memory is the consumable resource or one of the two
consumable resources the RealMemory parameter, which defines a node's
amount of real memory in slurm.conf, must be set when FastSchedule=1.
"
So you can avoid setting RealMemory with FastSchedule=0,
but then -- as I posted earlier:

FastSchedule
0
Base scheduling decisions upon the actual configuration of each
individual node except that the node's processor count in Slurm's
configuration must match the actual hardware configuration if
PreemptMode=suspend,gang or SelectType=select/cons_res are configured
(both of those plugins maintain resource allocation information using
bitmaps for the cores in the system and must remain static, while the
node's memory and disk space can be established later).

(And you are using PreemptMode=suspend,gang...)

And you didn't /prove/ your actual configuration -- not to discredit
you, but I know how often I have failed myself to actually show it and _not
just assume_.
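
If you want to keep FastSchedule=1 with CR_Core_Memory, a minimal sketch --
the RealMemory value below is made up, take the real one from your node:

--- %< ---
# slurm.conf: the node line has to carry the node's real memory (in MB)
NodeName=slurm_node[1-2] CPUs=8 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=32000

# show the actual hardware as Slurm sees it -- run on the node itself;
# newer slurmd versions print a ready-to-paste node line:
slurmd -C
# and the controller's view (configured vs. actual):
scontrol show node slurm_node1
--- %< ---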

/Benjamin

> 
> 
> I will try the other solutions that you gave me, and I will tell you what happens.
> 
> PS : I'm sorry, but my English is not very good.
> 
> David
> 
> 
> -----Original Message-----
> From: Benjamin Redling [mailto:benjamin.ra...@uni-jena.de] 
> Sent: Friday, 29 January 2016 15:32
> To: slurm-dev <slurm-dev@schedmd.com>
> Subject: [slurm-dev] Re: Ressouces allocation problem
> 
> 
> 
> 
> On 29.01.2016 at 15:08, David Roman wrote:
>> I created 2 jobs
>> Job_A uses 8 CPUs in partition DEV
>> Job_B uses 16 CPUs in partition LOW
>>
>> If I start Job_A before Job_B, all is OK: Job_A is in RUNNING state
>> and Job_B is in PENDING state.
>>
>> BUT, if I start Job_B before Job_A, both jobs are in RUNNING state.
> 
> [...]
> 
>> FastSchedule=0
> 
> Can you set this to "1"?
> 
> Whatever you post about your partitions' resource configuration, it is not
> taken into consideration.
> I think I could construct a case where your behaviour is fine considering your
> _actual configuration_ and not your _configured resources_.
> 
> https://computing.llnl.gov/linux/slurm/slurm.conf.html
> There section *FastSchedule*
> <--- %< --->
> 0
> Base scheduling decisions upon the actual configuration of each 
> individual node except that the node's processor count in SLURM's 
> configuration must match the actual hardware configuration if 
> SchedulerType=sched/gang or SelectType=select/cons_res are configured (both 
> of those plugins maintain resource allocation information using bitmaps for 
> the cores in the system and must remain static, while the node's memory and 
> disk space can be established later).
> <--- %< --->
> 
> Regards, Benjamin
> --
> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
> vox: +49 3641 9 44323 | fax: +49 3641 9 44321
> 


-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] slurm-dev check health and trigger actions via monitoring (Re: Re: NHC and disk / dell server health)

2016-01-28 Thread Benjamin Redling

On 27.01.2016 at 09:53, Ole Holm Nielsen wrote:
> On 01/27/2016 09:12 AM, Johan Guldmyr wrote:
>> has anybody already made some custom NHC checks that can be used to
>> check disk health or perhaps even hardware health on a dell server?

>> I've been thinking of using smartctl + NHC to test if the local disks
>> on the compute node is healthy.
>>
>> Or for Dell hardware then "omreport" something or perhaps one could
>> call for example the check_openmanage nagios check from NHC..

> We're extremely happy with NHC (Node Health Check was moved to
> https://github.com/mej/nhc recently) due to its numerous checks and its
> lightweight resource usage.

When I first read about NHC I wondered what improvements it gives me
over (standard) monitoring.
Not that I don't like NHC per se: I found the sample configuration on
github really nice, because I could immediately implement a handful of
checks in a short time that would have prevented me from running into a
list of failures from the last months.

But: when I fail to implement the proper monitoring rules (centrally) I
will fail to implement the proper checks in NHC, I guess(?)
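
A couple of lines in the spirit of that sample configuration, just to show
how little it takes -- written from memory, so double-check the exact check
names and flags against the nhc.conf shipped in the repo; mount points and
limits here are invented:

--- %< ---
# every node: / and /tmp have to be mounted read-write
 * || check_fs_mount_rw -f /
 * || check_fs_mount_rw -f /tmp
# at least 3% free space on /tmp
 * || check_fs_free /tmp 3%
# sshd has to be running (start it if it is not)
 * || check_ps_service -u root -S sshd
--- %< ---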

> I haven't been able to find any command for checking disk health, since
> smartctl is completely unreliable for checking failing disks (a bad disk
> will usually have a PASSED SMART status).  What I've seen many times is
> that a disk fails partly, so the kernel remounts file systems read-only.
>  This prevents any further health checks from running, including NHC,
> and all batch jobs running on a system with read-only disks are going to
> fail (almost) silently :-(  Normally I discover this scenario due to
> user complaints.

check_mk (we use it as part of OMD) complains automatically if mount
options change -- and monitors really _a lot_ of other parameters of a node
(everything IPMI sensors provide, DRBD status, network, ...) out of the
box.

Running a custom script
https://mathias-kettner.de/checkmk_mkeventd_actions.html
that drains that node / puts jobs on hold / mails your users before they
complain should be quite easy.
Other monitoring solutions provide triggering actions after such an
event too.
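
A sketch of such an action script -- the variable names are placeholders,
the mkeventd documentation has the ones that are really exported; the
monitoring host needs permission to run scontrol against the controller:

--- %< ---
#!/bin/bash
# called by the event daemon when a node check goes critical
node="${NODE_FROM_MONITORING:?no host given}"   # placeholder variable name
reason="${CHECK_OUTPUT:-monitoring alert}"      # placeholder variable name

# keep new jobs off the node, let running jobs finish
scontrol update NodeName="$node" State=DRAIN Reason="$reason"

# tell the admins (and/or the users)
echo "drained $node: $reason" | mail -s "[slurm] $node drained" root
--- %< ---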

Before a colleague of mine introduced it I liked to keep tests minimal
(KISS done wrong?),
but since testing OMD on a Ganeti cluster and getting warnings about
things I wouldn't have been able to figure out how to monitor -- or how
important they are -- I'm totally sold and can highly recommend it.


/BR
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] slurm-dev output dir missing (Re: Re: jobs vanishing w/o trace(?))

2016-01-27 Thread Benjamin Redling

On 25.01.2016 at 16:41, Benjamin Redling wrote:
> I could fire several hundred jobs with a dummy shell script against that
> node, but as soon as one of my users tries a complex pipeline, jobs get
> lost with a slurm-*.out
    typo: lost _without_ a .out-file

Question:

> What do I fail to understand?

Answer: attentively reading slurmd log output from the affected node.

<--- %< --->
[2016-01-26T09:43:55] [3154] Could not open stdout file /var/[...].out:
No such file or directory
<--- %< --->

After reading that, it was rather obvious that all the successful test
cases by the user and me had been launched from different, accessible
directories.
So until now we didn't even recognize we had changed test cases from time
to time.
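
One way to make such a pipeline independent of the directory it happens to
be submitted from is to pin the output path in the batch script -- a sketch,
the log directory is made up and has to exist on a filesystem the node can
reach:

--- %< ---
#!/bin/bash
#SBATCH --job-name=pipeline
# %u = user name, %j = job id; slurmd does not create the directory,
# it has to exist beforehand (older versions may not know %u yet --
# then spell the path out)
#SBATCH --output=/home/%u/slurm-logs/slurm-%j.out
#SBATCH --error=/home/%u/slurm-logs/slurm-%j.err

srun ./runAllPipeline.sh
--- %< ---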

Apparently the KVM instance running Slurm was not properly set up via FAI
(provisioning) or ansible (post-inst).

Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: jobs vanishing w/o trace(?)

2016-01-25 Thread Benjamin Redling


On 16.01.2016 at 21:10, Benjamin Redling wrote:

[...] how is it at all possible that the jobs get lost? What
happened that the slurm master thinks all went well? (Does it? Am I just
missing something?)
Where can I start to investigate next?


I could fire several hundred jobs with a dummy shell script against that
node, but as soon as one of my users tries a complex pipeline, jobs get
lost with a slurm-*.out

What do I fail to understand?

--- %< ---
3003|runAllPipelinePmc.sh|MC20GBplus||4|00:05:25|FAILED|1:0
3004|runAllPipelinePmc.sh|MC20GBplus||4|00:00:40|CANCELLED|0:0
3005|runAllPipelinePmc.sh|MC20GBplus||8|00:07:25|CANCELLED|0:0
3006|runAllPipelinePmc.sh|MC20GBplus||11|00:00:00|CANCELLED|0:0
3008|runAllPipelinePmc.sh|MC20GBplus||11|00:00:00|CANCELLED|0:0
3007|runAllPipelinePmc.sh|MC20GBplus||11|00:00:00|CANCELLED|0:0
3009|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3010|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3011|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3012|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3013|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3014|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3015|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3016|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3017|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|PENDING|0:0
3018|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3019|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3020|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3021|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3022|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3023|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3024|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3025|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3026|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
3027|runAllPipelinePmc.sh|MC20GBplus||4|00:00:00|FAILED|1:0
--- %< ---

But 3003, 3024, 3025 and 3027 get a "job_complete ... success".
fgrep job_complete /var/log/slurm-llnl/slurmctld.log:
-
Jan 25 09:31:01 darwin slurmctld[12198]: sched: job_complete for 
JobId=3003 successful
Jan 25 10:25:06 darwin slurmctld[12198]: sched: job_complete for 
JobId=3024 successful
Jan 25 10:25:06 darwin slurmctld[12198]: sched: job_complete for 
JobId=3025 successful
Jan 25 10:25:25 darwin slurmctld[12198]: sched: job_complete for 
JobId=3027 successful



The slurm-3*.out files for all failed jobs are missing.

SlurmctldDebug=9 gives me
--- %< ---
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: cr_job_test: 
evaluating job 3027 on 1 nodes
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: _can_job_run_on_node: 
16 cpus on s17(1), mem 0/64000
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: eval_nodes:0 consec 
c=16 n=1 b=14 e=14 r=-1
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: cr_job_test: test 0 
pass - job fits on given resources
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: _can_job_run_on_node: 
16 cpus on s17(1), mem 0/64000
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: eval_nodes:0 consec 
c=16 n=1 b=14 e=14 r=-1
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: cr_job_test: test 1 
pass - idle resources found
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: cr_job_test: 
distributing job 3027
Jan 25 10:25:25 darwin slurmctld[12198]: cons_res: cr_job_test: job 3027 
ncpus 4 cbits 16/16 nbits 1
Jan 25 10:25:25 darwin slurmctld[12198]: DEBUG: job 3027 node s17 vpus 1 
cpus 4

Jan 25 10:25:25 darwin slurmctld[12198]: 
Jan 25 10:25:25 darwin slurmctld[12198]: job_id:3027 nhosts:1 ncpus:4 
node_req:1 nodes=s17

Jan 25 10:25:25 darwin slurmctld[12198]: Node[0]:
Jan 25 10:25:25 darwin slurmctld[12198]:   Mem(MB):23000:0  Sockets:2 
Cores:8  CPUs:4:0

Jan 25 10:25:25 darwin slurmctld[12198]:   Socket[0] Core[0] is allocated
Jan 25 10:25:25 darwin slurmctld[12198]:   Socket[0] Core[1] is allocated
Jan 25 10:25:25 darwin slurmctld[12198]:   Socket[1] Core[0] is allocated
Jan 25 10:25:25 darwin slurmctld[12198]:   Socket[1] Core[1] is allocated
Jan 25 10:25:25 darwin slurmctld[12198]: 
Jan 25 10:25:25 darwin slurmctld[12198]: cpu_array_value[0]:4 reps:1
Jan 25 10:25:25 darwin slurmctld[12198]: 
Jan 25 10:25:25 darwin slurmctld[12198]: DEBUG: Dump job_resources: 
nhosts 1 cb 0-1,8-9

Jan 25 10:25:25 darwin slurmctld[12198]: DEBUG: _add_job_to_res (after):
Jan 25 10:25:25 darwin slurmctld[12198]: part:MC20GBplus rows:1 pri:50
Jan 25 10:25:25 dar

[slurm-dev] Re: jobs vanishing w/o trace(?)

2016-01-22 Thread Benjamin Redling


On 16.01.2016 at 21:10, Benjamin Redling wrote:

I lose every job that gets allocated on a certain node (KVM instance).

[...]

Now I had to change the default route of the host because of a brittle
non-Slurm instance with a web app.


After starting the unchanged instance several days later for another
investigation, the problem is gone.


Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: NodeName and PartitionName format in slurm.conf

2016-01-20 Thread Benjamin Redling


On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:

I am testing our slurm to replace our torque/moab setup here.

The issue I have is to try and put all our node names in the NodeName
and PartitionName entries.
In our cluster, we name our nodes compute--
That seems to be problem enough with the abilities to use ranges in
slurm, but it is compounded with the fact that the folks put the nodes
in keeping 1u of space in between.
So I have compute-1-[1,3,5,7,9,11...41]


Why not simply use a comma-separated list _generated_ from your
inventory / DNS / /etc/hosts / etc.?


When you have outliers (2U, 4U -- do they have more resources too!?) it 
would make sense to group/partition by resources anyway.
What are you using to manage inventory? Most configuration management 
and provisioning tools I know provide you with the necessary tools -- 
have a look at puppetlabs facter (or alternatives).


http://slurm.schedmd.com/slurm.conf.html

Multiple node names may be comma separated (e.g. "alpha,beta,gamma") 
and/or a simple node range expression may optionally be used to specify 
numeric ranges of nodes to avoid building a configuration file with 
large numbers of entries. The node range expression can contain one pair 
of square brackets with a sequence of comma separated numbers and/or 
ranges of numbers separated by a "-" (e.g. "linux[0-64,128]", or 
"lx[15,18,32-33]"). Note that the numeric ranges can include one or more 
leading zeros to indicate the numeric portion has a fixed number of 
digits (e.g. "linux[-1023]"). Up to two numeric ranges can be 
included in the expression (e.g. "rack[0-63]_blade[0-41]"). If one or 
more numeric expressions are included, one of them must be at the end of 
the name (e.g. "unit[0-31]rack" is invalid), but arbitrary names can 
always be used in a comma separated list.



Complicating that logic wouldn't make much sense to me.
Mapping host names to partitions shouldn't be too hard to script.
In the worst case you copy the full/per-rack/per-resources host list to 
partitions and manually cherry-pick afterwards.
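
For the naming scheme above, generating the list is a one-liner -- a sketch
assuming GNU seq and that scontrol is in the path:

--- %< ---
# comma separated list of the odd units: compute-1-1,compute-1-3,...,compute-1-41
nodes=$(seq -s, -f 'compute-1-%g' 1 2 41)

# paste it into a partition definition
echo "PartitionName=low Nodes=${nodes} Default=No MaxTime=INFINITE State=UP"

# or let Slurm fold it back into its bracket notation
scontrol show hostlist "$nodes"
--- %< ---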


Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: NodeName and PartitionName format in slurm.conf

2016-01-20 Thread Benjamin Redling


On 20.01.2016 at 11:00, Benjamin Redling wrote:

On 19.01.2016 at 20:37, Andrus, Brian Contractor wrote:

I am testing our slurm to replace our torque/moab setup here.

The issue I have is to try and put all our node names in the NodeName
and PartitionName entries.
In our cluster, we name our nodes compute--
That seems to be problem enough with the abilities to use ranges in
slurm, but it is compounded with the fact that the folks put the nodes
in keeping 1u of space in between.
So I have compute-1-[1,3,5,7,9,11...41]


Why not simply use a comma-separated list _generated_ from your
inventory / DNS / /etc/hosts / etc.?

< --- 8< --->

P.S.
Totally forgot, you can configure a NodeName different from its 
NodeHostname:


Node names can have up to three name specifications: NodeName is the 
name used by all Slurm tools when referring to the node, NodeAddr is the 
name or IP address Slurm uses to communicate with the node, and 
NodeHostname is the name returned by the command /bin/hostname -s. Only 
NodeName is required (the others default to the same name), although 
supporting all three parameters provides complete control over naming 
and addressing the nodes. See the slurm.conf man page for details on all 
configuration parameters.
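
A made-up example of what that looks like (names and address are invented):

--- %< ---
# Slurm-facing name, transport address and what `hostname -s` returns may all differ:
NodeName=node17 NodeAddr=10.0.1.33 NodeHostname=compute-1-33 CPUs=16 RealMemory=64000
--- %< ---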



But I wouldn't do that. IMHO in case of an erroneous node it is just one 
more level of indirection -- cumbersome to find the culprit.

Then again my host names don't depend on rack units.

Regards, Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] Re: Share free cpus

2016-01-16 Thread Benjamin Redling

Hello Jordan,

On 2016-01-16 01:21, Jordan Willis wrote:
> If my partition is used up according to the node configuration, but still has
> available CPUs, is there a way to allow onto that node a user who only has a
> task that takes 1 CPU?
> 
> For instance here is my partition:
> 
> NODELIST    NODES  PARTITION  STATE  NODES(A/I)  CPUS  CPUS(A/I/O/T)    MEMORY
> loma-node[  38     all*       mix    38/0        16+   981/171/0/1152   64+
> 
> 
> According to the nodes, there is nothing Idling, but there are 171 available 
> cpus. Does anyone know what’s going on? When a new user asks for 1 task, why 
> can’t they get on one of those free cpus? What should I change in my 
> configuration.

Without seeing your configuration that's just guesswork.
Are you using "select/linear" and "Shared=NO"?

Apart from that you might want to see the column "Resulting Behavior" to
get an idea what you have to check in your config:
http://slurm.schedmd.com/cons_res_share.html
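
For reference, the two ends of that table boil down to this -- a sketch, not
your actual config:

--- %< ---
# select/linear: whole nodes are handed to jobs, leftover CPUs on a busy
# node stay unused
SelectType=select/linear

# cons_res: individual cores (and memory) are the consumable resource, so
# a 1-CPU job can still be placed on a node with free cores
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
--- %< ---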

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321


[slurm-dev] jobs vanishing w/o trace(?)

2016-01-16 Thread Benjamin Redling

Hello everybody,

I lose every job that gets allocated on a certain node (KVM instance).

Background:
to enable and test the resources of a cluster of new machines I run
Slurm 2.6 inside a Debian 7 KVM instance, mainly because the hosts run
Debian 8 and the old cluster is Debian 7. I prefer the Debian packages
and do not want to build Slurm 2.6 from source, and on top of that I need
easy resource isolation because I don't have the luxury of using that
cluster exclusively for Slurm: Ganeti is running on the hosts.
I don't see how to handle Slurm and Ganeti side by side reasonably well
with cgroups.
So far that setup has worked reasonably well. Performance loss is negligible.

Now I had to change the default route of the host because of a brittle
non-Slurm instance with a web app.
Since then, jobs that are allocated to that KVM instance disappear.
And there isn't even an error log.
Every other job I scancel on other hosts will give me the log file (in
the user's home on an NFS4 file server, from where the job is submitted).

job accounting says
[...]
2253 MC20GBplus 1451848294 1451848294 10064 50 - - 0 runAllPipeline.sh 1
2 4 s2 (null)
2254 MC20GBplus 1451848298 1451848298 10064 50 - - 0 runAllPipeline.sh 1
2 4 s3 (null)
2255 MC20GBplus 1451848302 1451848302 10064 50 - - 0 runAllPipeline.sh 1
2 4 s4 (null)
2256 MC20GBplus 1451848306 1451848306 10064 50 - - 0 runAllPipeline.sh 1
2 4 s5 (null)
2257 MC20GBplus 1451848310 1451848310 10064 50 - - 0 runAllPipeline.sh 1
2 4 s7 (null)
2258 MC20GBplus 1451848313 1451848313 10064 50 - - 0 runAllPipeline.sh 1
2 4 s9 (null)
2259 MC20GBplus 1451848317 1451848317 10064 50 - - 0 runAllPipeline.sh 1
2 4 s10 (null)
2260 MC20GBplus 1451848320 1451848320 10064 50 - - 0 runAllPipeline.sh 1
2 4 s11 (null)
2261 MC20GBplus 1451848323 1451848323 10064 50 - - 0 runAllPipeline.sh 1
2 4 s12 (null)
2262 MC20GBplus 1451848326 1451848326 10064 50 - - 0 runAllPipeline.sh 1
2 4 s13 (null)
2263 MC20GBplus 1451848329 1451848329 10064 50 - - 0 runAllPipeline.sh 1
2 4 s15 (null)
2265 express 1451848341 1451848341 10064 50 - - 0 runAllPipeline.sh 1 2
4 darwin (null)
2267 MC20GBplus 1451848349 1451848349 10064 50 - - 0 runAllPipeline.sh 1
2 4 stemnet1 (null)
2268 express 1451900653 1451900653 10064 50 - - 0 runAllPipeline.sh 1 2
4 darwin (null)
2270 MC20GBplus 1451983401 1451983401 10064 50 - - 0 runAllPipeline.sh 1
2 4 s17 (null)
2270 MC20GBplus 1451983401 1451983401 10064 50 - - 0 runAllPipeline.sh 1
2 4 s17 (null)
2270 MC20GBplus 1451983401 1451983401 10064 50 - - 0 runAllPipeline.sh 1
2 4 s17 (null)
2270 MC20GBplus 1451983401 1451983401 10064 50 - - 3 0 5 4294967295 256
2271 MC20GBplus 1451983452 1451983452 10064 50 - - 0 runAllPipeline.sh 1
2 4 s17 (null)
2271 MC20GBplus 1451983452 1451983452 10064 50 - - 0 runAllPipeline.sh 1
2 4 s17 (null)
2271 MC20GBplus 1451983452 1451983452 10064 50 - - 0 runAllPipeline.sh 1
2 4 s17 (null)
[...]

But s17 (the KVM instance) _never_ gives results. The jobs more or less
immediately disappear.
Now I wonder: how is it at all possible that the jobs get lost? What
happened that the slurm master thinks all went well? (Does it? Am I just
missing something?)
Where can I start to investigate next?

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321