On Sunday, 5 November 2017 11:09:29 AM AEDT ايمان wrote:
> I want to run parallel Java code on more than one node, but it executes
> only on one node?
Java is not magically able to span nodes; you need to ensure your program can
handle that and has the necessary supporting libraries (such as an MPI
implementation) to do so.
On Thursday, 2 November 2017 8:02:47 PM AEDT Rajiv Nishtala wrote:
> And also using cgroups; https://slurm.schedmd.com/cgroup.conf.html
That will constrain the memory a job can use to what it has asked for, but I
think that the original poster was asking how to stop a user asking for that
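For reference, enforcing the requested limit with cgroups needs roughly this
in cgroup.conf (parameter names are the standard cgroup.conf options; whether
you also constrain swap is site policy):

```
CgroupAutomount=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
```

Remember that slurm.conf also needs TaskPlugin=task/cgroup for these
constraints to take effect.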
On Tuesday, 24 October 2017 7:51:23 PM AEDT Rajiv Nishtala wrote:
> I'm trying to play with the part of the code that is responsible for killing
> a job if it exceeds a memory limit, for instance via cgroups or so.
With cgroups it is the Linux kernel, not Slurm, that is responsible for
killing processes that exceed their memory limit.
On Friday, 20 October 2017 9:53:06 AM AEDT Lachlan Musicman wrote:
> Latest version of sssd can take shortnames and search through domains.
I'm not sure that works, though, if you've got two different people with the
same username in different domains.
cheers,
Chris
--
Christopher Samuel
On Thursday, 19 October 2017 7:41:37 PM AEDT Nadav Toledo wrote:
> running: id -u domain_name\\username does return its uid
So your system is not finding users as just "username", but only as
domain_name\\username, which is probably not ideal.
You probably want to see if you can configure sssd to resolve the short
username form instead.
On Thursday, 19 October 2017 4:43:13 PM AEDT Nadav Toledo wrote:
> so adding manually only works if I don't restart slurmctld...
That usually points to a communication problem for slurmdbd trying to tell the
slurmctld about these changes via an RPC.
What does this say?
sacctmgr list clusters
On Monday, 9 October 2017 8:46:21 PM AEDT Sysadmin CAOS wrote:
> Mmmm, yes... CentOS only offers PMIX packages and I don't know where I
> can find PMI{1,2} packages... How should I compile SLURM?
To compile Slurm to support PMIx you need to have this in your configure:
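The actual line was truncated in the archive; the usual form (the install
paths here are assumptions, substitute your own) looks like:

```
./configure --prefix=/opt/slurm --with-pmix=/opt/pmix
```

where /opt/pmix is the prefix your PMIx was installed under (it should
contain include/pmix.h and lib/libpmix.so).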
On Monday, 9 October 2017 8:11:29 PM AEDT Chris Samuel wrote:
> Do you mean --with-pmix=${PATH_TO_PMIX} instead?
Sorry, I thought you were configuring Slurm with PMIx support there!
On Monday, 9 October 2017 7:11:06 PM AEDT Sysadmin CAOS wrote:
> I have compiled OpenMPI 1.8.1 with --with-pmi=/usr/lib64 (where is
> located libpmix.so file)
Do you mean --with-pmix=${PATH_TO_PMIX} instead?
On Tuesday, 22 August 2017 9:46:17 PM AEST 刘科 wrote:
> python ~/opt/script/auto_run_te_by_steps.py {} `%SLURM_JOB_ID`
Try this instead:
python ~/opt/script/auto_run_te_by_steps.py '{}' ${SLURM_JOB_ID}
That uses single quotes to stop the shell from trying to expand whatever that
'{}' is meant to be, and ${SLURM_JOB_ID} to expand the job ID properly.
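The quoting behaviour is easy to demonstrate in a plain shell, independent of
Slurm (the job ID value below is made up):

```shell
# SLURM_JOB_ID is normally set by Slurm inside a job; we fake it here
# purely to show the quoting behaviour.
SLURM_JOB_ID=12345
# Single quotes pass '{}' through untouched; ${...} expands the variable.
echo '{}' ${SLURM_JOB_ID}
# prints: {} 12345
# Backticks, as in `%SLURM_JOB_ID`, would instead try to *run*
# %SLURM_JOB_ID as a command, which is not what was intended.
```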
On Tuesday, 15 August 2017 4:34:55 PM AEST John Hearns wrote:
> For the /proc/self you need to start an interactive job under Slurm.
You can actually use srun to join an existing job via the --jobid option.
[samuel@barcoo ~]$ srun --jobid 6821761 --pty -u /bin/bash -i -l
[samuel@barcoo033 ~]$
On Tuesday, 15 August 2017 1:16:08 PM AEST Lachlan Musicman wrote:
> Oh, that explains more.
>
> Now it looks like:
>
[...]
OK, so that looks better!
So what does this say:
cat /sys/fs/cgroup/cpuset/slurm/uid_1506/job_1998/step_batch/cpuset.cpus
> I seem to have a lot of guff in there that
On Thursday, 25 May 2017 6:51:26 PM AEST Baker D. J. wrote:
> Thank you for your response to my email. I've taken a look at one of the
> compute nodes that has been drained by the SLURM system -- please see
> below. It appears to suggest the node was drained due to a job failing
> (running out
On Friday, 10 March 2017 2:26:08 PM AEDT Grigory Shamov wrote:
> Another newbie question: does SLURM report any used memory (as well as
> other resource usage, other than wall time) statistics for jobs, as part
> of either Accounting or Completion records ?
If you use the slurmdbd accounting storage then yes: sacct can report the
peak memory (MaxRSS) and other usage for each job and step.
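For example (the job ID is made up; the fields come from sacct's --format
list, and MaxRSS/MaxVMSize are only populated when a jobacct_gather plugin
is configured):

```
sacct -j 12345 --format=JobID,Elapsed,State,MaxRSS,MaxVMSize
```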
On Saturday, 5 November 2016 12:04:03 AM AEDT Peter van Heusden wrote:
> Thanks! That was the problem - I misunderstood from my NICD colleagues what
> the machine config is and it has only a single CPU:
No worries, glad that helped!
On Friday, 4 November 2016 11:39:30 PM AEDT Peter van Heusden wrote:
> Nov 5 02:12:09 bio-linux slurmctld[27239]: error: Node localhost has low
> socket*core*thread count (4 < 8)
> Nov 5 02:12:09 bio-linux slurmctld[27239]: error: Node localhost has low cpu
> count (4 < 8)
So Slurm is complaining that the node has fewer CPUs (4) than it has been
configured with in slurm.conf (8).
On Friday, 23 September 2016 12:17:26 AM AEST Lachlan Musicman wrote:
> Is there a description of what each field is in the slurmdbd?
I don't think the Slurm developers support direct access to the database like
that and fields are liable to change on major releases.
Tools like XDMoD get
On Thursday, 26 May 2016 11:37:02 PM AEST Christopher Samuel wrote:
> Which is really odd as the code already calls into the Slurm libraries
> for other functions such as slurm_hostlist_create(), etc.
Solved! The slurm_showq code I'm hacking on is C++, not straight C, so I had
to wrap the Slurm header includes in extern "C" { ... }.
On Fri, 18 Mar 2016 06:53:38 AM Doguparthi, Subramanyam wrote:
> We are from Hewlett Packard Enterprise and evaluating SLURM
> for one of our requirements. Database our application uses is Postgres and
> we don’t see any working plugin available. Is it possible to help us with
>
On Tue, 15 Mar 2016 05:40:29 AM Bjørn-Helge Mevik wrote:
> I apologize for the slightly off-topic subject, but I could not think of
> a better forum to ask. If you know of a more proper place to ask this,
> I'd be happy to know about it.
http://beowulf.org/
There's actually a very recent
Hi Loris,
On Tue, 1 Mar 2016 12:29:12 AM Loris Bennett wrote:
> To help the user understand their current fairshare/priority status, I
> usually point them to 'sprio', generally in the following incantation:
>
> sprio -l | sort -nk3
>
> to get the jobs sorted by priority.
Thanks for that,
On Mon, 29 Feb 2016 11:50:18 PM Uwe Sauter wrote:
> Did you configure the RebootProgram parameter in slurm.conf and is that
> script working? Remember: this script is run on the compute node, therefore
> it must be available on the compute node and must be executable.
Yes, all our clusters are
On Thu, 18 Feb 2016 02:46:20 PM Skouson, Gary B wrote:
> As far as I can tell, there isn’t really a way to set a default combined
> limit across the cluster for the jobs a particular user can run
You can! We do:
sacctmgr modify cluster ${CLUSTER} set maxjobs=192
We thought we had to use
On Mon, 28 Dec 2015 08:25:04 PM Simpson Lachlan wrote:
> Out of interest – and because I don’t really see any docs about this – is
> XDMod just a front end to slurmdbd?
>
> If I go with XDMod do I even need slurmdbd?
One of our staff is playing with XDMod at the moment and what you do is
On Tue, 22 Dec 2015 12:57:04 AM Janne Blomqvist wrote:
> 1. When using systemd, or some other tool that mounts the cgroup file
> systems early in the boot process (e.g. cgconfig), you should not try to
> mount the cgroup filesystems from slurmd. That is, in
> /etc/slurm/cgroup.conf put
On Sat, 14 Nov 2015 04:09:03 PM Apolinar Martinez Melchor wrote:
> We want to update SLURM 2.6.7 to SLURM 15.08.4
Be aware that you cannot upgrade directly from 2.6 to 15.08; it's too big a
jump.
A version of Slurm only supports upgrades from the last two releases, so
15.08 only supports upgrades from 14.03 and 14.11.
On Mon, 19 Oct 2015 12:44:59 PM Ghislain LE MEUR wrote:
> Is it possible to start a job on one of these 2 nodes with 512G and also
> compute on others nodes with 128G of memory ? The job needs only more
> memory on the first node where the job start.
I suspect what you are after is the support
On Wed, 7 Oct 2015 12:50:53 AM g...@cines.fr wrote:
> #!/bin/bash
> #SBATCH --nodes=4
> #SBATCH --ntasks=7
> #SBATCH --ntasks-per-node=2
[...]
> With slurm version 14.11.9, when the job is submitted with sbatch command we
> get :
>
> SLURM_NTASKS=8
> SLURM_NPROCS=8
>
On Thu, 1 Oct 2015 02:23:30 AM vaibhav pol wrote:
>Does in Slurm, can we route jobs between partition. Previously I was
> working on PBS (Torque) it has routing queue functionality these queues
> are able to route jobs to different queue.
In Slurm you can submit to multiple partitions at once by passing a
comma-separated list to --partition; the job runs in whichever can start it
first.
Hi there,
We've upgraded our slurmdbd to 14.11.8 as part of our prep for a new cluster
and I'm having issues with the creation of the new cluster.
We can add the new cluster snowy to slurmdbd with sacctmgr, but any changes
to accounts after that now fail with:
# sacctmgr modify user set
On Fri, 21 Aug 2015 02:59:56 AM Danny Auble wrote:
You need to add the accounts to the cluster. If you want it like your other
cluster an easy way to do that is use sacctmgr to dump the cluster and then
change the cluster name in the file and load it in with sacctmgr.
Thanks Danny - it's odd
On Fri, 21 Aug 2015 02:59:56 AM Danny Auble wrote:
Dumped another cluster,
On Thu, 25 Jun 2015 02:03:47 AM Bjørn-Helge Mevik wrote:
Christopher Samuel sam...@unimelb.edu.au writes:
http://karaage.readthedocs.org/en/latest/introduction.html
Karaage looks interesting for managing projects and users. Can it
manage usage limits?
Sadly that's one thing it doesn't do.
On Thu, 7 May 2015 04:01:25 AM Igor Kozin wrote:
My real question is why running
salloc --mem-per-cpu=1000 --ntasks=1 bash
does not create cgroups and therefore gets you an unlimited interactive
session?
My understanding is that salloc will give you a session on the node where you
run it (usually the login node), not on the allocated compute node, so no job
cgroup is created around your shell.
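A common workaround, an assumption on my part rather than anything from the
original reply, is to have salloc launch the shell with srun so it runs on
the allocated node inside the job's cgroup:

```
salloc --mem-per-cpu=1000 --ntasks=1 srun --pty bash -i
```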
Hi folks,
There's been discussion in the past about Slurm and docker and for the first
time today I was asked by a user if it was possible yet to run docker
containers inside Slurm.
Their use case is they want to distribute bioinformatics tools inside a docker
container and want to be able
On Thu, 5 Feb 2015 03:27:25 PM Peter A Ruprecht wrote:
I ask because some of our users have started reporting a 10x increase in
run-times of OpenMPI jobs since we upgraded to 14.11.3 from 14.3. It's
possible there is some other problem going on in our cluster, but all of
our hardware checks out fine.
On Tue, 27 Jan 2015 10:11:23 PM Dennis Zheleznyak wrote:
Anyone?
Don't run it in an sbatch as it will be disconnected from your login session
and any $DISPLAY you may have won't make any sense.
I believe you need to run the srun from the login node itself.
On Fri, 9 Jan 2015 05:11:39 AM Hendryk Bockelmann wrote:
Do you have any idea if this is an intended behaviour or a bug?
We hit this same issue when we were bringing up our Intel Sandybridge cluster
back in 2013 (as SMT for SB was better than previous generations) and it was a
significant
Hi Andrew,
On Thu, 11 Dec 2014 07:06:13 AM Andrew J. Prout wrote:
Looks like the groups and id commands actually mix reporting of the current
process and what's in /etc/group, so they don't work to debug this issue.
Ah, sorry! I should have strace'd groups first to confirm it wasn't talking to
On Wed, 10 Dec 2014 09:20:27 AM Andrew J. Prout wrote:
Below are some examples of what I'm seeing with SLURM 14.11.0. Notice that
group 1000 disappears in the sleep process.
I can't replicate this on Slurm 14.03.10, FWIW, it seems to do the right
thing.
Also remember that you can grab your
On Thu, 2 Oct 2014 12:15:10 AM Dennis Zheleznyak wrote:
Hi Chris,
Hiya,
Those all look fine, but I've just noticed you're building RPMs there and
that's something we don't do with Slurm so I'm afraid I'm not sure I can help
there, sorry! :-(
All the best,
Chris
On Mon, 29 Sep 2014 02:10:07 AM Alan Orth wrote:
Other users wouldn't have noticed because we updated all of our
infrastructure in one go using ansible[0] last Friday.
We use xCAT to manage our clusters and whilst we could have done that if we
had wished, it would have caused any jobs queued
On Fri, 29 Aug 2014 11:03:10 AM Martin Perry wrote:
Thanks for investigating this. It looks like some work will be required to
fully integrate Slurm cgroups with systemd.
I suspect that's going to depend mightily on the version of systemd and the
kernel you are using (for points already
On Fri, 29 Aug 2014 06:02:10 AM Janne Blomqvist wrote:
Haven't tested anything yet, but with RHEL/CentOS 7 already available, I
suspect it won't be long before people are starting to roll out clusters
based on those OS'es. So the topic certainly deserves some attention,
thanks for bringing
On Sun, 24 Aug 2014 01:58:05 AM Dennis Zheleznyak wrote:
I'm upgrading Slurm from 2.4.4 to the latest 14.X version, when I tried to
simulate it in a virtual environment the running jobs were deleted every
single time.
As Uwe said I suspect that's too large a jump to be supported, you might
On Fri, 25 Jul 2014 01:34:09 AM Bastian Krüger wrote:
I recently began working with a cluster that consists of 1 control node and
several computation node and it was set up a couple of years ago by someone
else. In this current setup, there is only one actual slurm installation,
which is
On Mon, 19 May 2014 04:37:03 AM Teshome Dagne Mulugeta wrote:
Is there a way to keep running jobs going after a networking issue
between the slurm daemon and the nodes?
I suspect the answer is the --no-kill option for sbatch.
Best of luck!
Chris
Hi folks,
As we are transitioning from Torque+Moab to a pure Slurm setup I'm trying to
wrap my head around how we can implement similar scheduling limits on our x86
systems to the rules we were using with Moab.
One thing that I've found is that GrpCPURunMins, GrpCPUs and GrpJobs are what
we
Hi there,
On Sat, 20 Jul 2013 02:53:52 AM Bjørn-Helge Mevik wrote:
With the recent changes in glibc in how virtual memory is allocated for
threaded applications, limiting virtual memory usage for threaded
applications is IMO not a good idea. (One example: our slurmctld has
allocated 16.1