sure users explicitly requesting slow nodes
instead of just dumping them on ancient Opterons). Also, each user
gets their own Account, so the QoS Grp limits apply to each human
separately. Accounts would also have absolute core limits.
Thank you for your thoughts!
Corey
--
Ryan Cox
Operations Director
OS: Ubuntu 16.04
Lua: lua5.2 and liblua5.2-dev (I can use Lua interactively)
SLURM version: 17.02.5, compiled from source (after installing
Lua) using ./configure --prefix=/usr --sysconfdir=/etc/slurm
Any guidance to get me up and running would be greatly appreciated!
Thanks,
Nathan
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
Regards.
Could you set AllowedRAMSpace/AllowedSwapSpace in
/etc/slurm/cgroup.conf to some big number? That way the job memory
limit will be the cgroup soft limit, and the cgroup hard limit
(the point at which the kernel will OOM-kill the job) would be
job_memory_limit * AllowedRAMSpace, i.e., some large value.
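A minimal cgroup.conf sketch of that suggestion (note the parameter is
spelled AllowedRAMSpace and is a percentage of the job's memory request;
1000 is just an arbitrarily large example value):

```
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
# hard limit becomes 10x the job's memory request; the soft limit
# stays at the request itself
AllowedRAMSpace=1000
AllowedSwapSpace=1000
```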
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || janne.blomqv...@aalto.fi
blazed this trail before, but this is how
I am going about it.
(example rjobstat output row: 22.2 20.4 17 16.9799)
* denotes the node where the batch script executes (node 0)
CPU usage is cumulative since the start of the job
Ryan
On 09/19/2016 11:13 AM, Ryan Cox wrote:
We use this script that we cobbled together:
https://github.com/BYUHPC/slurm-random/blob/master/rjobstat. It assumes
that you're using cgroups. It uses ssh to connect to each node so it's
not very scalable but it works well enough for us.
Ryan
On 09/18/2016 06:42 PM, Igor Yakushin wrote:
rge Washington University
725 21st Street
Washington, DC 20052
Suite 211, Corcoran Hall
==
On Fri, Apr 15, 2016 at 1:07 PM, Ryan Cox <ryan_...@byu.edu> wrote:
Did you try this: --reservation=root_13
On 04/15/2016 08:10 AM, Glen MacLachlan wrote:
scontrol update not allowing jobs
Dear all,
Wrapping up a maintenance period and I want to run some test jobs
before I release the reservation and allow regular user jobs to start
running. I've modified
Coincidentally, I asked about that yesterday in a bug report:
http://bugs.schedmd.com/show_bug.cgi?id=2465. The short answer is to use
SchedulerParameters=assoc_limit_continue, which was introduced in
15.08.8. It only works if the job's Reason is something like
Assoc*Limit.
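For reference, the setting looks like this in slurm.conf (15.08.8 or later):

```
# keep considering lower-priority jobs when a job is blocked
# by an association/QOS limit, instead of stopping the scheduling loop
SchedulerParameters=assoc_limit_continue
```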
Ryan
ug slurm_ar user1 R 8:53:26 1 compute48*
In my config, I have:
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
What am I missing to get more than one job to run on a node?
Thanks in advance,
Brian Andrus
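One possible culprit, assuming defaults elsewhere in the config (the full
slurm.conf isn't shown): with CR_Core_Memory, a job whose memory request
or default covers the whole node blocks other jobs even when cores are
free. A hedged slurm.conf sketch, where 2000 MB is only an example value:

```
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# give jobs a per-CPU default memory so a small job
# doesn't implicitly claim all of a node's memory
DefMemPerCPU=2000
```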
there, and slurmctld decided the data was invalid and killed all jobs.
(I don't know if this is still a problem.)
We have seen similar issues on 14.11.8 but haven't bothered to diagnose
or report it. I think I've seen it twice so far out of dozens of new users.
Ryan
On 09/07/2015 09:16 AM, Loris Bennett wrote:
Hi,
This problem occurs with 14.11.8.
A user I set up today got the following error when
Be sure to test it first before trying anything else:
https://stackoverflow.com/questions/18661976/reading-dev-cpu-msr-from-userspace-operation-not-permitted.
We ran into this issue once when we had a trusted person and we
couldn't easily grant him access to the MSRs. We couldn't find a good
Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treyd...@tamu.edu
Jabber: treyd...@tamu.edu
On Thu, Jun 4, 2015 at 11:51 AM, Ryan Cox <ryan_...@byu.edu> wrote:
Trey,
In http
- Senior Technologist
and Health Sciences | novos...@rutgers.edu - 973/972.0922 (2x0922)
OIRT/High Perf Res Comp - MSB C630, Newark
On Apr 6, 2015, at 20:17, Ryan Cox ryan_...@byu.edu wrote:
Chris,
Just have GPU users request the numbers of CPU cores that they need and
don't lie to Slurm about the number of cores. If a GPU user needs 4
cores and 4 GPUs, have them request that. That leaves 20 cores for
others to use.
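In batch-script terms that request might look like this (a sketch; the
gres name and counts are examples for the 4-core, 4-GPU case):

```
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --gres=gpu:4
```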
Ryan
On 04/06/2015 03:43 PM, Christopher B Coffey wrote:
On 01/21/2015 09:23 AM, Bill Wichser wrote:
A user underneath gets the expected 0.009091 normalized shares since
there are a lot of Fairshare=1 users there. User3 gets roughly 25x this
value, since Fairshare=25 for user3.
Yet the normalized shares are actually MORE than the
in the SLURM accounting DB? I could
not find any value for the JOB that corresponds to this RAW usage.
Roshan
From: Ryan Cox ryan_...@byu.edu
Sent: 25 November 2014 17:43
To: slurm-dev
Subject: [slurm-dev] Re: [ sshare ] RAW Usage
Raw usage is a long double and the time added by jobs can
Raw usage is a long double and the time added by jobs can be off by a
few seconds. You can take a look at _apply_new_usage() in
src/plugins/priority/multifactor/priority_multifactor.c to see exactly
what happens.
Ryan
On 11/25/2014 10:34 AM, Roshan Mathew wrote:
Hello SLURM users,
Dave,
I have done testing on 5-6 year old hardware with 100,000 users randomly
distributed in 10,000 accounts with semi-random depths with most being
between 1-4 levels from root but some much deeper than that, plus
100,000 jobs pending. slurmctld startup time was really long but, after
George,
Wouldn't a QOS with GrpNodes=10 accomplish that?
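A sketch of that setup with sacctmgr (the QOS name "tennodes" is
hypothetical; attach it to whichever association should be capped, or
have jobs request it with --qos):

```
sacctmgr add qos tennodes
sacctmgr modify qos tennodes set GrpNodes=10
```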
Ryan
On 10/30/2014 11:47 AM, Brown George Andrew wrote:
Hi,
I would like to have a partition of N nodes without statically
defining which nodes should belong to a partition and I'm trying to
work out the best way to achieve this.
Trey,
I'm not sure why your jobs aren't starting. Someone else will have to
answer that question.
You can model an organizational hierarchy a lot better in 14.11 due to
changes in Fairshare=parent for accounts. If you only want fairshare to
matter at the research group and user levels but
wishing the values would be between 0.0 and
1.0, but I can work with 0.5 as the max value. It just means that I
need to double the PriorityWeightFairshare factor in order to achieve
the intended relative weighting between Fairshare, QOS, Partitions,
JobSize, Age.
Ed
From: Ryan Cox [mailto:ryan_
I assume you are using the default fairshare algorithm since you didn't
specify otherwise. F=2**(-U/S) where U is Effectv Usage (often
displayed in documentation as UE) and S is Norm Shares. See
http://slurm.schedmd.com/priority_multifactor.html under the heading
The SLURM Fair-Share
.
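As a quick sanity check, the formula can be evaluated directly (a
Python sketch; the usage and share values are purely illustrative):

```python
# Hedged sketch of the default multifactor fairshare formula F = 2**(-U/S),
# where U is the association's effective usage and S its normalized shares.
def fairshare_factor(U, S):
    # Each doubling of usage relative to your share halves the factor.
    return 2.0 ** (-U / S)

print(fairshare_factor(0.5, 0.5))   # usage exactly equal to share -> 0.5
print(fairshare_factor(1.0, 0.5))   # twice your share -> 0.25
print(fairshare_factor(0.0, 0.5))   # unused association -> 1.0
```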
On 09/23/2014 11:27 AM, Trey Dockendorf wrote:
Has anyone used the Lua job_submit plugin while also allowing multiple
partitions? I'm not even sure what the partition value would be in the Lua
code when a job is submitted with --partition=general,background, for example.
We do. We use the
have to use accounting to enforce the limits? Or is there another way
that I don't see?
Best regards,
Uwe Sauter
different
from the DRF ones? Or if one does specify different rates, it might end up
breaking some of the fairness properties that are described in the DRF paper
and opens up the algorithm for gaming?
--
Janne Blomqvist
From: Ryan Cox [ryan_...@byu.edu]
Sent
Thanks. I can certainly call it that. My understanding is that this
would be a slightly different implementation from Moab/Maui, but I don't
know those as well so I could be wrong. Either way, the concept is
similar enough that a more recognizable term might be good.
Does anyone else
in a bug report from the University of
Chicago: http://bugs.schedmd.com/show_bug.cgi?id=858.
Ryan
On 07/25/2014 10:31 AM, Ryan Cox wrote:
Bill and Don,
We have wondered about this ourselves. I just came up with this idea
and haven't thought it through completely, but option two seems like
just need to figure out a database
query to cull this information?
Thanks,
Bill
management system.
to our use case), see
http://tech.ryancox.net/2014/06/problems-with-slurm-prioritization.html.
student to have administrative control over the subaccount since he
actually knows the students but not have it affect priority calculations.
Ryan
',
'SelectTypeParameters' = 'CR_Core_Memory',
'SelectType'= 'select/cons_res',
--
Perfection is just a word I use occasionally with mustard.
--Atom Powers--
the exceeding
50 MB or so... they would actually fit in the swap area
and the job should not be killed...
What am I missing here?
Should the code itself be aware of the given mem.limit=9000MB?
Thanks for any explanation.
MG
for Science Ltd.
E-Mail: olli-pekka.le...@csc.fi
Tel: +358 50 381 8604
skype: oplehto // twitter: ople
is being imposed about 5 minutes into the job.
Thanks
--
andy wettstein
hpc system administrator
research computing center
university of chicago
773.702.1104
. Is this
amount realistic?
Is there a more efficient method to control memory usage on nodes which
are shared?
Thank you for any advice,
Kevin
at the documentation I don't see
any way to do this other than what I stated above.
-Paul Edmon-
, Ryan Cox wrote:
Paul,
We were discussing this yesterday due to a user not limiting the amount
of jobs hammering our storage. A QOS with a GrpJobs limit sounds like
the best approach for both us and you.
Ryan
On 06/19/2013 09:36 AM, Paul Edmon wrote:
I have a group here that wants to submit
and then launches
sub-processes to solve (MPI on a local system) without connecting
the process IDs(?) In any event, I'm guessing I'm not the first
person to run into this. Is there a recommended solution to
configure SLURM to track codes like this?
Thanks,
~Mike C.