Hello Jordan,
On 2016-01-16 01:21, Jordan Willis wrote:
> If my partition is used up according to the node configuration, but still has
> available CPUs, is there a way to allow a user who only has a task that
> takes 1 CPU onto that node?
>
> For instance here is my partition:
>
> NODELIST
Hello everybody,
I lose every job that gets allocated on a certain node (KVM instance).
Background:
to enable and test the resources of a cluster of new machines I run
Slurm 2.6 inside a Debian 7 KVM instance. Mainly because the hosts run
Debian 8 and the old cluster is Debian 7. I prefer the D
mem-per-cpu as part of SallocDefaultCommand in the
slurm.conf and go with
DefMemPerCPU, DefMemPerNode, MaxMemPerCPU and MaxMemPerNode as mentioned
in the second last paragraph and let the user set --mem-per-cpu.
As recommended.
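The scheme described above could look roughly like this in slurm.conf (a sketch; the numbers are invented placeholders, not values from this thread):

```
# slurm.conf (fragment) -- cluster-wide memory defaults and caps, in MB
DefMemPerCPU=1024        # applied when a user passes no --mem-per-cpu
MaxMemPerCPU=4096        # upper bound on what --mem-per-cpu may request
```

Users then set `--mem-per-cpu` per job, within the cap.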
Regards, Benjamin
On Jan 16, 2016, at 7:34 AM, Benjamin Redling
On 19.01.2016 20:37, Andrus, Brian Contractor wrote:
I am testing our slurm to replace our torque/moab setup here.
The issue I have is to try and put all our node names in the NodeName
and PartitionName entries.
In our cluster, we name our nodes compute--
That seems to be problem enough wit
On 20.01.2016 11:00, Benjamin Redling wrote:
On 19.01.2016 20:37, Andrus, Brian Contractor wrote:
I am testing our slurm to replace our torque/moab setup here.
The issue I have is to try and put all our node names in the NodeName
and PartitionName entries.
In our cluster, we name our
On 16.01.2016 21:10, Benjamin Redling wrote:
I lose every job that gets allocated on a certain node (KVM instance).
[...]
Now I had to change the default route of the host because of a brittle
non-slurm instance with a web app.
after starting the unchanged instance several days later
On 16.01.2016 21:10, Benjamin Redling wrote:
[...] how is it at all possible that the jobs get lost? What
happened that the slurm master thinks all went well? (Does it? Am I just
missing something?)
Where can I start to investigate next?
I could fire several hundred jobs with a dummy
On 25.01.2016 16:41, Benjamin Redling wrote:
> I could fire several hundred jobs with a dummy shell script against that
> node but as soon as one of my users tries a complex pipeline jobs get
> lost with a slurm-*.out
typo: lost _without_ a .out-file
Question:
> Wh
On 27.01.2016 09:53, Ole Holm Nielsen wrote:
> On 01/27/2016 09:12 AM, Johan Guldmyr wrote:
>> has anybody already made some custom NHC checks that can be used to
>> check disk health or perhaps even hardware health on a dell server?
>> I've been thinking of using smartctl + NHC to test if th
On 18.01.2016 18:42, Benjamin Redling wrote:
> On 18.01.2016 01:39, Jordan Willis wrote:
>> CompleteWait=60
>> SlurmdUser=root
> side note: really root? Why not a dedicated user?
It is at least the Debian default and I just didn't se
On 29.01.2016 15:08, David Roman wrote:
> I created 2 jobs
> Job_A uses 8 CPUs in partition DEV
> Job_B uses 16 CPUs in partition LOW
>
> If I start Job_A before Job_B, all is ok. Job_A is in RUNNING state and Job_B
> is in PENDING state
>
> BUT, if I start Job_B before Job_A, both jobs a
On 29.01.2016 15:31, Dennis Mungai wrote:
> Add SHARE=FORCE to your partition settings for each partition entry in
> the configuration file.
https://computing.llnl.gov/linux/slurm/cons_res_share.html
selection setting was:
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
Share
s running as "slurm".
I went back to this thread because Brian Freed's "slurmd node state down"
on 28th Jan. got a reply from Trey Dockendorf that gave me the hint that
I mixed things up a few days before
/B
> On 01/29/2016 08:10 AM, Benjamin Redling wrote:
>> On 18.0
As far as I understand Slurm, with Share=FORCE set you risk
over-committing.
/Benjamin
On 2016-01-29 16:10, Dennis Mungai wrote:
> And with the SHARE=FORCE:8 parameter, each consumable processor, socket or
> core can be shared by 8 jobs, as an example.
>
> On Jan 29, 2016 5:08 PM, David Rom
> http://slurm.schedmd.com/cons_res_share.html
Full ACK. Was leaving the office and just used the highest page ranking
to make a point.
> On January 29, 2016 6:42:24 AM PST, Benjamin Redling
> wrote:
>>
>> On 29.01.2016 15:31, Dennis Mungai wrote:
>>> Add SHARE=FORCE to
sbatch: error: Batch job submission failed: Requested node configuration is
> not available
> I try the other solutions that you give me, and I tell you what happens.
>
> PS : I'm sorry, but my English is not very good.
>
> David
>
>
> -Original Message
llocation problem
>
>
> Can you change your consumable resources from CR_Core_Memory to CR_CPU_Memory?
> On Jan 29, 2016 5:42 PM, Benjamin Redling
> mailto:benjamin.ra...@uni-jena.de>> wrote:
>
> On 29.01.2016 15:31, Dennis Mungai wrote:
>> Add SHARE=F
ume_.
/Benjamin
>
>
> I try the other solutions that you give me, and I tell you what happens.
>
> PS : I'm sorry, but my English is not very good.
>
> David
>
>
> -Original Message-
> From: Benjamin Redling [mailto:benjamin.ra...@uni-jena.
On 2016-02-01 11:08, David Roman wrote:
> Both nodes are the same. They are virtual machines (VMware) to do some
> tests.
That makes me wonder why changing FastSchedule from 0 to 1 results in
comprehensible behavior.
Have you looked into the log files on the master and the node?
(Apart from that
I haven't understood why qos GrpJobs= -- assoc per user? -- won't work for you.
On 4 February 2016 01:50:22 CET, "Skouson, Gary B" wrote:
>
>I'd like a way to be able to limit the number of jobs that a user is
>allowed to run before we only allow them to run by backfilling.
>
>For example, le
Can you post how you submitted the job?
Mira on 60 cores needs MPI in your case. Multithreading works without it,
BTW. Your config says 31 CPUs. Generated without incrementing the index, or intended?
On 4 February 2016 18:02:15 CET, Pierre Schneeberger wrote:
>Hi there,
>
>I'm setting up a small cluster composed
On 2016-02-04 16:43, Skouson, Gary B wrote:
> The GrpJobs limits the total number of jobs allowed to be running. Let's say
> I want to allow 70 jobs per users. The GrpJobs would work fine for that.
> However, I'd like to limit the number of jobs able to reserve resources in
> the backfill s
On 2016-02-11 07:36, Rohan Garg wrote:
> [...] The machine has 16 physical cores
> on 2 sockets with HyperThreading enabled. I'm using the EASY
> scheduling algorithm with backfilling. The goal is to fully utilize all
> the available cores at all times.
> Given a list of three jobs with requireme
On 2016-02-11 07:36, Rohan Garg wrote:
>
> Hello,
>
> I'm trying to set up SLURM-15.08.1 on a single multi-core node to
> manage multi-threaded jobs. The machine has 16 physical cores
> on 2 sockets with HyperThreading enabled. I'm using the EASY
> scheduling algorithm with backfilling. The goal
On 10.02.2016 09:04, Pierre Schneeberger wrote:
> I submitted the job with sbatch and the following command:
> #!/bin/bash
> #SBATCH -n 80 # number of cores
> #SBATCH -o
> /mnt/nfs/bio/HPC_related_material/Jobs_STDOUT_logs/slurm.%N.%j.out # STDOUT
> #SBATCH -e
> /mnt/nfs/bio/HPC_related_materi
On 03/17/2016 04:01, 温圣召 wrote:
> The preempted job1 shows a PD reason of BeginTime
> my job invocation and the info for them follow:
> [root@szwg]# sbatch --gres=gpu:4 -N 1 --partition=low mybatch.sh
You ask for _4_ GPUs and 1 node.
Your config says each node has Gres=gpu:2
> Submitted b
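The mismatch between the request and the node definition can be shown side by side (a sketch; the node names are invented, not from this thread):

```
# slurm.conf (fragment): each node exposes only two GPUs
NodeName=node[01-04] Gres=gpu:2 ...

# The submission asks for four GPUs on a single node:
#   sbatch --gres=gpu:4 -N 1 --partition=low mybatch.sh
# With gpu:2 per node this can never be satisfied; either request
# --gres=gpu:2, or spread the job over two nodes.
```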
On 2016-03-16 13:54, 温圣召 wrote:
> my job ... cannot be requeued when it is preempted ...
Can you please post the job invocation too?
Does the preempted job1 show a PD reason (%R) in the queue?
Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49
+1
I have been bitten by MySQL many times in my career.
* Constraints silently ignored (luckily MyISAM is long gone)
* Dumps that are coming from one instance not going back into the same
* Huge BLOBs that never get out via a dump
* BLOBs destroying the whole integrity of a DB
...
I don't trust it
Hi Helmi Azizan,
On 03/22/2016 05:11, Helmi Azizan wrote:
> I created a new slurm.conf using the easy configurator but am still facing
> the following error:
hopefully the correct version, fitting to the 2.6 version you are using.
> helmi@Dellrackmount:~$ srun -N1 /bin/hostname
> srun: error: U
+1
If a user needs a newer Python version and a paper submission is due, I don't
want to be a blocker as an admin.
The conservative packages of server distributions are of no value to my users.
They are free to run whatever they need from their homes; otherwise the
clusters would be pointless! BR
On 2016-03-24 04:25, Helmi Azizan wrote:
You wrote
> https://groups.google.com/d/msg/slurm-devel/LXmU3BoWGQw/ULqmA85qKAAJ
I wrote:
> hopefully the correct version, fitting to the 2.6 version you are using.
You wrote:
>> helmi@Dellrackmount:~$ srun -N1 /bin/hostname
>> srun: error: Unable to all
On 2016-03-24 13:59, Diego Zuccato wrote:
> Is there an equivalent of torque's pbstop for SLURM?
There are a lot of "rosetta" websites for workload schedulers. Most of
the time Slurm, Torque, PBS and Sun Gridengine & variants are listed.
> I already tried slurmtop, but it seems something is not
On 2016-03-24 17:22, John DeSantis wrote:
>>> What I'm looking for is a tool that gives me, for every node/cpu the
>>> corresponding job.
>>
>> squeue -n
>>
>> As the man page explicitly mentions: can be a single node and
>> either a NodeName or a NodeHostname
>
> I believe this is a typo, as w
15.08 is in Debian testing. A bit risky, but I would have a look, with pinning,
at what else would need an upgrade as a dependency. BR
On 25 March 2016 11:01:20 CET, Diego Zuccato wrote:
>
>On 25/03/2016 09:59, Diego Zuccato wrote:
>
>> I'm using SLURM 14.03.9 (the one packaged in Debian 8) an
On 04/14/2016 11:08, Naajil Aamir wrote:
> Hi hope you are doing well. I am currently working on a scheduling policy
> of slurm 2.3.2 for that i need *PYSLURM* version that is compatible with
> slurm 2.3.3 which i am unable to find on internet. It would be a great help
> if you could provide a lin
On 04/15/2016 16:22, Glen MacLachlan wrote:
> I tried that already by leaving the field blank as in "flags=" but that
> has no effect. Should I change it to something else?
I set my nodes to State=IDLE after maintenance (from DOWN, DRAIN/DOWN).
Depending on your cases you might have to look at
On 2016-04-15 16:54, Benjamin Redling wrote:
>
> On 04/15/2016 16:22, Glen MacLachlan wrote:
>> I tried that already by leaving the field blank as in "flags=" but that
>> has no effect. Should I change it to something else?
>
> I set my nodes to State=IDLE a
On 2016-04-29 07:36, Lachlan Musicman wrote:
> I'm finding this a little confusing.
>
> We have a very simple script we are using to test/train staff how to use
> SLURM (16.05-pre2). They are moving from an old Torque/Maui system.
>
> I have a test partition set up,
>
> from slurm.conf
>
> Nod
On 2016-05-13 05:58, Husen R wrote:
> Does slurm provide feature to get command that being executed/will be
> executed by running/pending jobs ?
scontrol show --detail job
or
scontrol show -d job
Benjamin
On 05/17/2016 10:02, Loris Bennett wrote:
>
> Benjamin Redling
> writes:
>
>> On 2016-05-13 05:58, Husen R wrote:
>>> Does slurm provide feature to get command that being executed/will be
>>> executed by running/pending jobs ?
>>
>> scontrol
On 2016-05-17 12:28, Carlos Fenoy wrote:
> On Tue, May 17, 2016 at 10:02 AM, Loris Bennett
> wrote:
[...]
>> Which version does this? 15.08.8 just seems to show the 'Command' entry,
>> which is the file containing the actual command.
> You will only see the script in the output of the scontrol
On 2016-05-17 12:19, Loris Bennett wrote:
>
> Benjamin Redling
> writes:
>
>> On 05/17/2016 10:02, Loris Bennett wrote:
>>>
>>> Benjamin Redling
>>> writes:
>>>
>>>> On 2016-05-13 05:58, Husen R wrote:
[...]
>>>
Hi,
On 05/25/2016 13:21, Mike Johnson wrote:
> I know this is a long-standing question, but thought it was worth
> asking. I am in an environment that uses NFSv4, which obviously needs
> user credentials to grant access to filesystems. Has anyone else
> tackled the issue of unattended batch job
On 05/26/2016 12:16, Per Lönnborg wrote:
> Example from logfile below. LOTS of info saying that one or several
> nodes have incorrect time. I want to see which node(s)!
> Of course I can ask all nodes about the time, but it's a bit dull. Even
> if we do it in parallel.
A monitoring application i
On 2016-06-03 21:25, Jason Bacon wrote:
> It might be worth mentioning that the calcpi-parallel jobs are run with
> --array (no srun).
>
> Disabling the task/affinity plugin and using "mpirun --bind-to core"
> works around the issue. The MPI processes bind to specific cores and
> the embarrassin
On 06/13/2016 09:50, Husen R wrote:
> Hi all,
>
> How to setup node sequence/order in slurm ?
> I configured nodes in slurm.conf like this -> Nodes = head,compute,spare.
>
> Using that configuration, if I use one node in my job, I hope slurm will
> choose head as computing node (as it is in a
Hi,
On 2016-06-14 20:19, Martin Kohn wrote:
> As you can see even with an job array only one job runs. Below you can find
> the script I submit and my configuration.
> SchedulerType=sched/buildin
> #SchedulerType=sched/backfill
> #SchedulerPort=7321
> #SelectType=select/linear
> SelectType=sele
Hallo Martin,
On 2016-06-16 20:21, Martin Kohn wrote:
> Hello Benjamin,
>
> thanks for your answer. I tried to set SelectTypeParameters=CR_Core but no
> success.
>
> Good to know, at least for me, that it is the default behavior of slurm to
> take the entire node. As I'm coming from Torque
Hi,
On 07/06/2016 11:17, Laurent Facq wrote:
> i would like to use only one partition with the 80 nodes,
> and that users who need OPA nodes could add a constraint "OPA+IB" to
> choose OPA+IB nodes
> and, that users who don't need OPA are given IB nodes if some are free,
> and OPA+IB nodes ONLY if
Hi Yuri,
On 2016-07-12 20:53, Yuri wrote:
> In slurm.conf I have CPUs=4 for each node (but each node actually has a
> Intel Core i7). My question is: why is slurm assigning only one job per
> node and each job is consuming 8 CPUs?
considering that you only provide "CPU... for each node" the sbat
Hi,
On 2016-07-25 22:46, Joshua Baker-LePain wrote:
> I think that my initial question was too complex/detailed. Let me ask a
> more open-ended one. Do folks have any strategies they'd like to share
> on partition setups that favor paying customers while also allowing for
> usage of spare resou
On 2016-07-29 11:15, Kolodiev, Vladimir wrote:
> Hello,
> I am Vladimir Kolodiev, a SW engineer from Intel Corp.
>
> I work with SLURM now and I have a question about SLERM_STEP_NODELIST and
> SLURM_TASKS_PER_NODE formats.
>
> I understood that their formats are "hostA[1-18,22],hostB,hostC[001-
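The compact `N(xM)` repeat syntax used by SLURM_TASKS_PER_NODE can be expanded with plain shell. This is a sketch for illustration; `expand_tasks_per_node` is a hypothetical helper, not part of Slurm:

```shell
# Expand a SLURM_TASKS_PER_NODE-style value such as "2(x3),1" into one
# task count per node: "2 2 2 1". An "N(xM)" element means "N tasks on
# each of the next M nodes".
expand_tasks_per_node() {
    out=""
    save_ifs=$IFS
    IFS=','
    for item in $1; do
        case "$item" in
            *"(x"*)
                count=${item%%\(*}       # tasks per node, e.g. "2"
                reps=${item##*x}         # repeat factor plus ")", e.g. "3)"
                reps=${reps%\)}
                i=0
                while [ "$i" -lt "$reps" ]; do
                    out="$out $count"
                    i=$((i + 1))
                done
                ;;
            *)
                out="$out $item"
                ;;
        esac
    done
    IFS=$save_ifs
    printf '%s\n' "${out# }"
}

expand_tasks_per_node "2(x3),1"   # prints: 2 2 2 1
```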
Hi,
On 08/23/2016 22:29, Tom G wrote:
> I have some slurm nodes with 8 core processors and hyperthreading, so 16
> CPUs in effect. I'd like to restrict slurm to only use 12 CPUs on this
> machine. What are the right slurm.conf settings to do this?
> Doing 8 or 16 CPUs seems straightforward sin
Hi,
I didn't see an answer so far, so I try to reason:
On 08/29/2016 19:40, Luis Torres wrote:
> We have recently deployed SLURM v 15.08.7-build1 on Ubuntu 16.04
> submission and execution nodes with apt-get; we built and installed the
> source packages of the same release on Ubuntu 14.04 for th
Hi
On 09/01/2016 10:16, Christof Koehler wrote:
> Now, the point we are not sure about is what happens if a user allocates
> 10 out of 40 and sets "--exclusive" (if possible). Is the usage of that
> user (job) actually computed with 40 CPUs as most people would expect ?
> As described before oth
Hi,
I think your case is mentioned in the FAQ Q30 in the "NOTE".
-- according to this you set CR_CPU and "CPU" only; no cores, no
threads, ...:
http://slurm.schedmd.com/faq.html
[...]
30. Slurm documentation refers to CPUs, cores and threads. What exactly
is considered a CPU?
If your nodes are
On 09/12/2016 16:48, Uwe Sauter wrote:
>
> Try SelectTypeParameters=CR_Core instead of CR_CPU
That alone is not sufficient:
http://slurm.schedmd.com/faq.html#cpu_count
BR
On 09/12/2016 16:55, Uwe Sauter wrote:
>
> Also. CPUs=32 is wrong. You need
>
> Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
Setting "CPU" is not wrong according to the FAQ:
http://slurm.schedmd.com/faq.html#cpu_count
BR
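The two ways of describing such a node in slurm.conf, per the FAQ (a sketch; node name and memory value are invented placeholders):

```
# Coarse: give only the total CPU count; Slurm schedules 32 "CPUs"
NodeName=n01 CPUs=32 RealMemory=64000

# Detailed: spell out the topology so Slurm can tell cores from hyperthreads
NodeName=n01 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64000
```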
On 09/12/2016 18:57, andrealphus wrote:
> It doesn't seem like changing it to a different resource allocation
> method makes a difference, and almost seems buggy to me, but I guess
> it is just a quirk of multithread systems.
Your issue ("using all hyperthreads") was discussed multiple times on
the l
ThreadsPerCore should be 1; you set it to 4. BR, Benjamin
On 4 October 2016 16:41:33 CEST, evan clark wrote:
>
>I am not sure if this is the correct place to share this, but maybe
>someone can point me in the correct directions. I recently setup a
>Centos 7 based slurm cluster, however my nodes
Hi, what are your AccountingStorage settings? Esp. AccountingStorageEnforce.
Did limits work before, or is this a first try?
Regards, Benjamin
On 19 October 2016 22:14:27 CEST, Steven Lo wrote:
>
>
>By the way, we do have the following attribute set:
>
> PriorityType=priority/multifactor
Hi Steven,
On 10/20/2016 00:22, Steven Lo wrote:
> We have the attribute commented out:
> #AccountingStorageEnforce=0
I think the best is to (re)visit "Accounting and Resource Limits":
http://slurm.schedmd.com/accounting.html
Right now I have no setup that needs accounting but as far as I
curr
Hi,
On 10/21/2016 18:58, Steven Lo wrote:
> Is MaxTRESPerUser a better option to use?
if you only ever want to restrict every user alike, that seems reasonable.
I would choose whatever fits your needs right now and in the not so
distant future. That way you gain time to learn about the options s
Hi,
are you both working on the same cluster as the OP?
On 10/25/2016 08:12, suprita.bot...@wipro.com wrote:
> I have installed slurm on a 2 node cluster.
>
> On the master node when I run sinfo command I get below output.
[...]
> But on the compute node: the slurmd daemon is also running but it gives
Hi,
On 26.10.2016 15:35, Achi Hamza wrote:
> But when I run a job for more than 3 minutes it does not stop, like:
> srun -n1 sleep 300
>
> I also set MaxWall parameter but to no avail:
> sacctmgr show qos format=MaxWall
> MaxWall
> ---
> 00:03:00
>
> Please advise where
Hi,
On 25.10.2016 13:20, Patrice Peterson wrote:
> is there a built-in way to queue LoadLeveler-like job steps in SLURM?
> Something like this:
>
> #!/bin/bash
> #SBATCH --num-tasks=1
> echo "prepping data, simple stuff"
> #SBATCH --- END STEP
>
> #SBATCH --num-tasks
Are you allowed and able to post the slurm.conf? What does sprio -o %Q -j
say about that job? BR, Benjamin
On 29 October 2016 20:56:18 CEST, Vlad Firoiu wrote:
>I'm trying to figure out why utilization is low on our university
>cluster.
>It appears that many cores are available, but a minima
tyMaxAge=1-0
>
>The particular job in question has 0 priority.
>
>On Sat, Oct 29, 2016 at 7:50 PM, Benjamin Redling <
>benjamin.ra...@uni-jena.de> wrote:
>
>> Are you allowed and able to post the slurm.conf? What does sprio -o
>%Q -j
>> say about that job? BR, B
On 31.10.2016 00:23, Benjamin Redling wrote:
> Are you aware that as long as SchedulerType
Sorry, typo. I meant *SelectType*
(The rest I wrote next is just unfiltered noise from my brain while
skimming the conf:)
>is not set to anything explicitly, select/linear is the default?
>
On 31.10.2016 00:47, Vlad Firoiu wrote:
> What do you mean the ScheduleType is not explicit? I see
> `SchedulerType=sched/backfill`. (I don't know too much about slurm so I
> am probably misunderstanding something.)
Vlad, you are right: ScheduleType _is_ set. I meant SelectType.
See my other
Hi,
try adding "-N 10" to explicitly ask for ten nodes too.
If you have access to the slurm.conf and don't have to share the
cluster, or share it with people with the same needs, you might like
SelectTypeParameters=CR_LLN or LLN on a per partition basis.
(see http://slurm.schedmd.com/slurm.conf)
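Both suggestions could look like this (a sketch; partition and node names are invented placeholders):

```
# Ask explicitly for ten nodes, one task each:
#   srun -N 10 -n 10 ./my_prog

# slurm.conf: least-loaded-node placement, either globally ...
SelectTypeParameters=CR_LLN
# ... or per partition:
PartitionName=batch Nodes=n[01-10] LLN=YES Default=YES State=UP
```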
On 15 December 2016 14:48:24 CET, Stefan Doerr wrote:
>$ sinfo --version
>slurm 15.08.11
>
>$ sacct --format="CPUTime,MaxRSS" -j 72491
>   CPUTime     MaxRSS
>---------- ----------
>  00:27:06
>  00:27:06  37316236K
>
>
>I will have to ask the sysadms about cgroups since I'm just a user
>here.
On 24 December 2016 04:43:36 CET, Will Dennis wrote:
>I see the following in the systemctl status ouput of the slurmd service
>on my compute nodes:
>
>Dec 23 21:31:58 host01 slurmd[32101]: error: You are using cons_res or
>gang scheduling with Fastschedule=0 and node configuration differs from
Hi Will,
On 24.12.2016 21:10, Will Dennis wrote:
> Thanks for helping to interpret the error message… Clear enough to me now.
You're welcome! I wrote a bit briefly because I was on my mobile.
> I was told (by one of my researchers) that setting “FastSchedule=0” would
> "tell Slurm to get the h
Hi David,
On 06.03.2017 12:05, David Ramírez wrote:
> I have a little problem. Slurm allocates jobs to nodes in order (when a node
> is full, it sends the job to the next one).
>
> I need to use all nodes without ordering (the customer likes that)
I don't know "without order", but you can spread the load with "least
Hello Will,
On 2017-03-15 18:13, Will Dennis wrote:
> Here are their definitions in slurm.conf:
>
> # PARTITIONS
> PartitionName=batch Nodes=[nodelist] Default=YES DefMemPerCPU=2048
> DefaultTime=01:00:00 MaxTime=05:00:00 PriorityTier=100 PreemptMode=off
> State=UP
> PartitionName=long Nodes=[
Re hi,
On 2017-03-17 03:01, Will Dennis wrote:
> My slurm.conf:
> https://paste.fedoraproject.org/paste/RedFSPXVlR2auRlevS5t~F5M1UNdIGYhyRLivL9gydE=/raw
>
>> Are you sure the current running config is the one in the file?
>> Did you double check via "scontrol show config"
>
> Yes, all params se
Good examples:
https://hpc.nih.gov/docs/job_dependencies.html
BR
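A minimal chain in the spirit of those examples (a sketch; the job script names are invented):

```shell
# Submit a pipeline where each step waits for the previous one to finish
# successfully (afterok); --parsable makes sbatch print just the job id.
jid1=$(sbatch --parsable prep.sh)
jid2=$(sbatch --parsable --dependency=afterok:$jid1 compute.sh)
sbatch --dependency=afterok:$jid2 postprocess.sh
```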
On 2017-03-15 17:37, Álvaro pc wrote:
> Hi again!
>
> I would really like to know about the behaviour of --dependency argument..
>
> Nobody know anything?
>
> *Álvaro Ponce Cabrera.*
>
>
> 2017-03-14 12:31 GMT+01:00 Álvaro pc
On 19.03.2017 15:36, kesim wrote:
> ... I only want to find
> the solution for the trivial problem. I also think that slurm was designed
> for HPC and it is performing well in such an env. I agree with you that my
> env. hardly qualifies as HPC but still one of the simplest concept
> behind any sch
Hi,
if you don't want to depend on the whitespace in the output of "uptime"
(the number of fields depends on the locale) you can improve that via "awk
'{print $3}' /proc/loadavg" (for the 15min avg) -- it's always better to
avoid programmatically accessing output made for humans as long as possibl
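The /proc/loadavg approach can be sketched as follows; the sample line is an invented example of the file's contents:

```shell
# Pull the 15-minute load average out of /proc/loadavg. Its five fields
# are fixed, unlike uptime(1), whose output varies with locale and how
# long the machine has been up.
sample="0.42 0.36 0.30 1/123 4567"        # example contents of /proc/loadavg
load15=$(printf '%s\n' "$sample" | awk '{print $3}')
printf '%s\n' "$load15"                   # prints: 0.30
# On a live system:  awk '{print $3}' /proc/loadavg
```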
re hi,
your script will occasionally fail because the number of fields in the
output of "uptime" is variable.
I was reminded by this one:
http://stackoverflow.com/questions/11735211/get-last-five-minutes-load-average-using-ksh-with-uptime
Even more a reason to use /proc...
Regards,
Benjamin
On
On 05.04.2017 15:58, maviko.wag...@fau.de wrote:
[...]
> The purpose of this cluster is to investigate how smart distribution of
> workloads based on predetermined performance and energy data can benefit
> hpc-clusters that consist of heterogeneous systems that differ greatly
> regarding energy
On 11 April 2017 08:21:31 CEST, Uwe Sauter wrote:
>
>Ray,
>
>if you're going with the easy "copy" method just be sure that the nodes
>are all in the same state (user management-wise) before
>you do your first copy. Otherwise you might accidentally delete already
>existing users.
>
>I also enco
AFAIK most requests never hit LDAP servers.
In production there is always a cache on the client side -- nscd might
have issues, but that's another story.
Regards,
Benjamin
On 2017-04-11 15:32, Grigory Shamov wrote:
> On a larger cluster, deploying NIS, LDAP etc. might require some
> thought, becau
Hello Mahmood,
On 16.04.2017 16:11, Mahmood Naderan wrote:
> Hi,
> Currently, Torque is running on our cluster. I want to know, is it
> possible to install Slurm, create some test partitions, submit some test
> jobs and be sure that it is working while Torque is running?
> Then we are able to
Hi Batsirai,
On 17.04.2017 14:54, Batsirai Mabvakure wrote:
> SLURM has been running okay until recently my jobs are terminating before
> they finish.
> I have tried increasing memory using --mem, but still the jobs stop
halfway with an error in the slurm.out file.
> I then tried running ag
Hi,
On 31.05.2017 10:39, Loris Bennett wrote:
> Does any one know whether one can run multinode MATLAB jobs with Slurm
> using only the Distributed Computing Toolbox? Or do I need to be
> running a Distributed Computing Server too?
if you can get your hands on the overpriced and underwhelming D
Hello Sourabh,
On 2017-06-06 10:52, sourabh shinde wrote:
> Problem :
> As per my understanding, high priority jobs are executed first and take
> all of the available nodes.
> I need at least one low or normal priority job to be executed in
> parallel with the high priority jobs. I want
On 2017-07-13 18:51, Perry, Martin wrote:
> This email is to announce the latest version of the job packs feature
> (heterogeneous resources and MPI-MPMD tight integration support) as
> open-source code.
[...]
> The code can be cloned from this branch:
> _https://github.com/RJMS-Bull/slurm/tree/dev
Hello,
On 25.07.2017 16:19, J. Smith wrote:
> Does anyone have any suggestions for setting up high availability and
> automatic failover between two servers that run a Controller daemon,
> Database daemon and Mysql Database (i.e replication vs galera cluster)?
>
> Any input would be appreciated
On 29 July 2017 08:07:44 CEST, Florian Pommerening wrote:
>
>Hi everyone,
>
>is there a way to find out why a job was canceled by slurm? I would
>like
>to distinguish the cases where a resource limit was hit from all other
>reasons (like a manual cancellation). In case a resource limit was
On 10 August 2017 13:47:21 CEST, Sean McGrath wrote:
>
>Yes, you can run slurm on a single node. There is no need for a
>different
>head and compute node(s).
>
>You will need to set Shared=Yes if you want multiple people to be able
>to run on
>the machine simultaneously.
>
>The slurm.conf
On 2017-08-24 09:18, nir wrote:
[...]
> slurm server ip 192.168.10.1
> compute nodes 10.2.2.3-40
>
> Until yesterday the compute nodes were in the same VLAN as the slurm server,
> but I had to move them to a new VLAN.
> After i moved them there is ping connection between slurm server and the
> compute no
Re hi,
On 2017-08-24 12:55, nir wrote:
> Thank you for your answer.
> Yes I went over this guide.
> didn't find any problem since compute nodes communicate with slurm server.
if you did so, what does "scontrol show node "
give as a reason for "DOWN"?
BR,
On 13.09.2017 02:56, Christopher Samuel wrote:
On 13/09/17 10:47, Lachlan Musicman wrote:
Chris how does this sacrifice performance? If none of my software
(bioinformatics/perl) is HT, surely I'm sacrificing capacity by leaving
one thread unused as jobs take an entire core?
A HT is not a cor
On 14.09.2017 10:52, Taras Shapovalov wrote:
Hey guys!
As far as I know, there is now a built-in 5 min time interval after a job
is finished, which leads to the job's removal from Slurm "memory" (not
from accounting). This is OK until users need to requeue the job for some
reason. Thus if 5 mi
On 14.09.2017 11:12, Merlin Hartley wrote:
> I wonder: what would be the ramifications of setting this to 0 in
> production? "A value of zero prevents any job record purging"
> Or is that option only really there for debugging?
(just guessing) it should be horrible: once "MaxJobCount" (see slurm.con
Hello Mike,
On 10/4/17 6:10 PM, Mike Cammilleri wrote:
I'm in search of a best practice for setting up Environment Modules for our
Slurm 16.05.6 installation (we have not had the time to upgrade to 17.02 yet).
We're a small group and had no explicit need for this in the beginning, but as
we