[slurm-dev] Re: slurm allows submission of job whose node requirements cannot be satisfied

2012-11-06 Thread Marcin Stolarek
immediately. I can't fully understand what's going on. Is it possible that NumNodes is changed during scheduling? cheers marcin Quoting Marcin Stolarek stolarek.mar...@gmail.com: Hi, I'm using slurm-2.3.3 and I've noticed a problem. If a user specifies for example: srun -N 1 -n 64 -C Intel

[slurm-dev] Re: Task distribution in the accounting database

2012-12-05 Thread Marcin Stolarek
2012/12/4 Danny Auble d...@schedmd.com It should be possible to regenerate the task layout based on the distribution method and calling the slurm_step_layout_create function should be able to generate the layout for you in most cases outside of arbitrary. While most of the pieces are here,

[slurm-dev] Re: Lustre as a licence

2012-12-16 Thread Marcin Stolarek
control which filesystems are accessible to each namespace. chroot proposition sounds interesting, I don't know how to use cgroups for filesystem IO operations limiting. thank you, marcin Sent from my iPhone On Dec 16, 2012, at 10:12 AM, Marcin Stolarek stolarek.mar...@gmail.com wrote: Hi

[slurm-dev] Re: node switching / selection

2013-03-22 Thread Marcin Stolarek
2013/3/22 Michael Colonno mcolo...@stanford.edu Hi Folks ~ Hi, A couple (hopefully) simple questions; I can't find anything that obviously / easily solves these in the man pages. I have a fairly ordinary deployment in which scheduling is done by core so some high-memory

[slurm-dev] Re: Interesting queries / plots for slurmdbd?

2013-03-31 Thread Marcin Stolarek
Hi all, I'd like to join this question. I remember someone presenting web based application allowing visualisation of accounting data from slurmdbd. Does anyone know a running solution? cheers, marcin 2013/3/30 Alan Orth alan.o...@gmail.com All, I've got a small cluster using MySQL for

[slurm-dev] Re: Problems with slurm during many sacctmgr

2013-04-17 Thread Marcin Stolarek
2013/4/17 Danny Auble d...@schedmd.com Hi Danny, You should really look at using the 'sacctmgr load' command. It can load a whole cluster in one pull. It sounds really interesting, but I'm afraid that this format can change with slurm version? Am I right? thanks marcin

[slurm-dev] Re: Job Groups

2013-06-19 Thread Marcin Stolarek
2013/6/19 Paul Edmon ped...@cfa.harvard.edu: I have a group here that wants to submit a ton of jobs to the queue, but want to restrict how many they have running at any given time so that they don't torch their fileserver. They were using bgmod -L in LSF to do this, but they were wondering

[slurm-dev] Small development problems

2013-08-02 Thread Marcin Stolarek
Hi all, I have a few questions related to our recent slurm experience. 1) Is it possible to print text to the user's standard output/standard error from a job_submit plugin while returning SLURM_SUCCESS at the same time? 2) Is it possible to run mysqld for slurmdbd on an ndb backend?

[slurm-dev] Re: Quota system

2013-08-05 Thread Marcin Stolarek
2013/8/5 Marco Passerini marco.passer...@csc.fi Hi, Would it be possible to add in Slurm the capability of running a script on submission time? Check documentation, you need JobSubmitPlugins. cheers, marcin

[slurm-dev] Re: Slurm User Group Meeting and New releases: v2.6.1, v13.12.0-pre1

2013-08-19 Thread Marcin Stolarek
Hi guys, 2013/8/19 Moe Jette je...@schedmd.com Quoting Bjørn-Helge Mevik b.h.me...@usit.uio.no: Moe Jette je...@schedmd.com writes: * Changes in Slurm 13.12.0pre1 == Just curious: Why the sudden jump in version numbering? year.month? Correct. We're

[slurm-dev] Re: Managing the SLURM database

2013-08-24 Thread Marcin Stolarek
2013/8/22 Chauvin Antoine antoine.chau...@synchrotron-soleil.fr Hi Hi, As recommended by SLURM I use a mysql database with the slurmdbd daemon. Slurm includes a failover system, but mysql does not; I would like to know how you manage the mysql failover? I was dealing with this

[slurm-dev] Re: Slurm User Group Meeting Regisration

2013-08-26 Thread Marcin Stolarek
2013/8/26 Moe Jette je...@schedmd.com Registration for the Slurm User Group Meeting ends soon. Please see the agenda and registration information at the site listed below. http://slurm.schedmd.com/slurm_ug_agenda.html Is it possible for you to

[slurm-dev] Re: Strangely a job requeued, when a node failed

2013-10-21 Thread Marcin Stolarek
2013/10/21 Lennart Karlsson lennart.karls...@it.uu.se Hi, I have set the configuration parameter JobRequeue to zero, so failed jobs should not automatically requeue and rerun: # scontrol show config|grep -i requeue JobRequeue = 0 # But still jobs are rerun:

[slurm-dev] Problem with reservations

2013-11-05 Thread Marcin Stolarek
Hi Guys, I'm currently experiencing a problem with a reservation. The job has been submitted with the appropriate --reservation parameter, the reservation is active and all nodes in the reservation are in idle state. Despite these conditions the job remains in pending state. You can find output from

[slurm-dev] Problem with setting default account.

2013-12-03 Thread Marcin Stolarek
Hi, I've noticed a problem with setting the default account in slurm. I'm creating the account database with my own script, and then loading everything with: sacctmgr load clean file=./cluster.cfg The listing describing the problem is: [plgstolarek@hpc ~]$ sacctmgr -p show user plgstolarek User|Def

[slurm-dev] Re: Reserving a node for short jobs between specific hours

2014-02-14 Thread Marcin Stolarek
2014-02-14 4:50 GMT+01:00 Christopher Samuel sam...@unimelb.edu.au: Hi there, I'm looking at tweaking our Slurm config to get it to handle our workload better and it seems to be doing pretty well, but there's one thing missing that I'd like

[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-04 Thread Marcin Stolarek
2014-03-03 23:46 GMT+01:00 Christopher Samuel sam...@unimelb.edu.au: On 04/03/14 04:53, Lyn Gerner wrote: Have you also set AccountingStorageEnforce appropriately, as described here: http://slurm.schedmd.com/resource_limits.html ? We've

[slurm-dev]

2014-03-13 Thread Marcin Stolarek
Hi guys, On our cluster we ran into a situation where we want to change the SlurmdSpoolDir location. Do you know any way to do this without draining the whole cluster? cheers, marcin Marcin Stolarek Interdisciplinary Center for Mathematical and Computational Modeling

[slurm-dev] select plugins doesn't correctly deal with PreemptMode=off

2014-03-18 Thread Marcin Stolarek
I was working on 2.6.7, but a quick review of the code shows that there were no changes in the 14.03-rc1 version. cheers, marcin === Marcin Stolarek Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Poland

[slurm-dev] Re: Guidance on planning a slurmdbd outage

2014-04-07 Thread Marcin Stolarek
important to have enough free memory but I assume you are not using 32MB of RAM. cheers, marcin -- Marcin Stolarek Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Poland

[slurm-dev] Re: Shared TmpFS

2014-04-25 Thread Marcin Stolarek
2014-04-25 15:25 GMT+02:00 Barbara Krasovec barba...@arnes.si: Hello! As far as I know SlurmSpoolDir is a directory for slurmd state information, slurm TMPFS is the directory that the daemon should use when SlurmSpoolDir runs out of space. man slurm.conf /TmpFs TmpFS Fully qualified

[slurm-dev] Re: slurm_update error

2014-05-05 Thread Marcin Stolarek
2014-04-30 21:33 GMT+02:00 jeff wang slurmbegin...@gmail.com: Hello, I am using slurm 2.6.6. When I tried to change a job's size using the command: scontrol update JobId=2440 NumCPUs=3. It gave me an error: slurm_update error: Requested operation is presently disabled Can anyone

[slurm-dev] job_submit plugin init function

2014-05-08 Thread Marcin Stolarek
this doesn't work. My question is, does any mechanism to load job_submit plugin configuration file only once on slurmctld start exist? cheers marcin -- Marcin Stolarek Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Poland

[slurm-dev] How is job_suspend function defined?

2014-05-29 Thread Marcin Stolarek
( in my case probably with _slurm_cgroup_suspend) is made in code. thanks in advance, marcin -- Marcin Stolarek Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Poland

[slurm-dev] Re: problem with slurm job step creation

2014-05-30 Thread Marcin Stolarek
. www.wipro.com -- Marcin Stolarek Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Poland

[slurm-dev] Re: Segfault in gres.c:2945 with 14.03.5

2014-07-15 Thread Marcin Stolarek
2014-07-15 15:43 GMT+02:00 Markus Blank-Burian bur...@muenster.de: Hi, after job 436172 completed, the slurmctld daemon segfaulted. Starting slurmctld again reproduces the segfault. Debugging with gdb shows the following backtrace. How can i fix this without losing the complete state?

[slurm-dev] Job Submit plugin API

2014-07-16 Thread Marcin Stolarek
Hi guys, I've been checking the job submit plugins API description: http://slurm.schedmd.com/job_submit_plugins.html And I was surprised that the third argument of the job_modify function is: part_list (input) List of pointer to partitions which this user is authorized to use. I've checked job_modify

[slurm-dev] Re: Limiting job array concurrency

2014-07-22 Thread Marcin Stolarek
2014-07-21 14:43 GMT+02:00 Yuri D'Elia wav...@thregr.org: Is there a way for an user to specify an upper bound for the number of jobs running simultaneously on an array? A practical example for this scenario is to limit concurrent connections to a shared resource that is external to the

[slurm-dev] Re: even CPU load

2014-07-28 Thread Marcin Stolarek
2014-07-28 8:00 GMT+02:00 Леонид Коньков fatle...@gmail.com: Hi. I want my CPUs to be loaded as evenly as possible. I have 22 nodes (motherboards) with 1 CPU each with 8 CPU cores each. ( My English is far from perfect, and I'm not sure what is what in your terminology.) I want to load 100

[slurm-dev] Enforcing qos limits without associations limits

2014-07-31 Thread Marcin Stolarek
Hi guys, In our installation we have a separate job_submit plugin which checks the account validity directly in LDAP. We would like to disable associations enforcement, but in current configuration we are using qos limits which are limiting number of jobs and cores per user (user can choose to

[slurm-dev] Re: Checkpoint support using BLCR - Steps and needed packages

2014-08-05 Thread Marcin Stolarek
2014-08-05 19:11 GMT+02:00 Trey Dockendorf treyd...@tamu.edu: I have found that in order to support SUSPEND preemption we can not use CR_Memory or Memory as a consumable resource. I've seen that if a preemptable partition has requested 15900MB of RAM on a 16GB node then the job will not be

[slurm-dev] Re: How to size the controller systems

2014-08-18 Thread Marcin Stolarek
On Monday, 18 August 2014, Jason Bacon jwba...@tds.net wrote: The controller generally shouldn't require much, but if you're running Linux, be aware that the way memory use is measured in recent kernels makes it look like slurmctld is using a lot of RAM Can you point me to

[slurm-dev] Re: Storing the job submission script in the accounting database

2014-08-22 Thread Marcin Stolarek
may be also interested in using slurmmon whitespace's, you can find slurmmon project on github. cheers, marcin On 21/08/2014 19:18, Marcin Stolarek wrote: On Thursday, 21 August 2014, Antony Cleave antony.cle...@gmail.com wrote: Is it possible to store the job submission

[slurm-dev] Re: Upgrading and not losing jobs

2014-08-25 Thread Marcin Stolarek
2014-08-25 7:28 GMT+02:00 Dennis Zheleznyak den...@eshkol.com.co: Both running and queued From theoretical point of view it's possible to upgrade from 2.4.4 to 14.11 version, for sure you have to do that step by step, since slurm protocol is consistent only between three subsequent

[slurm-dev] Re: can't see completed jobs with squeue

2014-09-10 Thread Marcin Stolarek
2014-09-10 15:10 GMT+02:00 Erica Riello ericaflrie...@gmail.com: Hi all, I'm running Slurm 14.03.07 and I'd like to configure it in order to preserve completed jobs for 5 minutes so that if I run squeue a short while after a job completion, it would show me the job. What do I have to add

[slurm-dev] Re: job pending, not starting

2014-09-23 Thread Marcin Stolarek
2014-09-23 20:23 GMT+02:00 Eva Hocks ho...@sdsc.edu: How can I get a job started after it was pending with JobState=PENDING Reason=AssociationJobLimit I removed the qos job limit with no success, I removed the user from the qos with no success. I tried the scontrol StartTime=now with no

[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-09-28 Thread Marcin Stolarek
2014-09-27 0:30 GMT+02:00 John Brunelle john_brune...@harvard.edu: Though I hope everyone is putting the bash shellshock patching in their rearview mirror, it might still help to be aware of a change to function exports that the latest version introduced. Instead of the corresponding

[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-09-29 Thread Marcin Stolarek
2014-09-29 11:10 GMT+02:00 Alan Orth alan.o...@gmail.com: Wow, well spotted. I came here to see if anyone had reported this same issue with environment modules, as I noticed several of my jobs failing on our cluster this morning. Turns out, I'm probably the only one who had failed jobs, as

[slurm-dev] Re: OpenMPI, mpirun and suspend/gang

2014-11-09 Thread Marcin Stolarek
On Sunday, 9 November 2014, Ralph Castain r...@open-mpi.org wrote: What stop signal is being sent, and where? We will catch and suspend the job on receipt of a SIGTSTP signal by mpirun. On Nov 9, 2014, at 6:47 AM, Jason Bacon jwba...@tds.net wrote: Does

[slurm-dev] Re: How to get SLURM to honor TmpDisk reservations?

2014-11-13 Thread Marcin Stolarek
2014-11-12 17:06 GMT+01:00 David Lipowitz david.lipow...@seagate.com: Hi, We're doing a SLURM proof-of-concept and the management of temp space is really important for what we want to do. We set up a few virtual machines as a test with the following slurm.conf settings: FastSchedule=2

[slurm-dev] Re: Number of running jobs as a priority factor

2014-12-15 Thread Marcin Stolarek
2014-12-16 4:08 GMT+01:00 je...@schedmd.com: Quoting Skouson, Gary B gary.skou...@pnnl.gov: I'm looking for a way to prioritize jobs so that users with jobs running get lower priority than those without jobs running. I'd like the priority to be independent of the job size or past usage.

[slurm-dev] Re: Environment in prolog/epilog

2015-01-19 Thread Marcin Stolarek
2015-01-19 17:35 GMT+01:00 Uwe Sauter uwe.sauter...@gmail.com: Hi, is there a list of SLURM environment variables which I can access in the different prolog/epilog scripts? Specifically is it possible to get a list of nodes for a job in the PrologSlurmctld script although this runs on

[slurm-dev] Re: Rounding up of resource requests

2015-02-14 Thread Marcin Stolarek
you may also be interested in using this: https://github.com/cinek810/misc/tree/master/jobsubmit/job_sane this plugin allows you to reject jobs that are trying to use more than one node, and not the full nodes. It uses separate configuration file to allow specific users jobs not to be checked and

[slurm-dev] Separate slurm-realdev list

2015-03-10 Thread Marcin Stolarek
Hi guys, One of the ideas that came on the last slurm user group was to create a separate list for more advanced topics. Any news on this? cheers, marcin

[slurm-dev] Re: Segment Fault in Slurmctld

2015-04-23 Thread Marcin Stolarek
2015-04-21 19:16 GMT+02:00 Dinesh Kumar dinesh121...@gmail.com: Hi Everyone, I am supporting cons_res plugin but my code is segment faulted, so please give me the hints to use gdb with core. Thanks for ur help in advance. check this page:

[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Marcin Stolarek
2015-06-24 16:43 GMT+02:00 Veronique Legrand vlegr...@pasteur.fr: On 24/06/15 16:04, Bjørn-Helge Mevik wrote: (Apologies for this slightly off-topic question.) We are currently using Gold (http://www.adaptivecomputing.com/products/open-source/gold/) to manage allocations and accounting,

[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Marcin Stolarek
2015-06-24 23:12 GMT+02:00 Marcin Stolarek stolarek.mar...@gmail.com: 2015-06-24 16:43 GMT+02:00 Veronique Legrand vlegr...@pasteur.fr: On 24/06/15 16:04, Bjørn-Helge Mevik wrote: (Apologies for this slightly off-topic question.) We are currently using Gold (http

[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Marcin Stolarek
Sorry for the previous mails (keyboard problem). We are using slurm accounting with xdmod (http://xdmod.sourceforge.net/) for graphical presentation. It's nice and I hope the group of people using and developing this tool will make it even better :) cheers, marcin

[slurm-dev] Re: Job name truncated in email

2015-06-24 Thread Marcin Stolarek
2015-06-24 22:27 GMT+02:00 Moe Jette je...@schedmd.com: It's open source. Help yourself to it. Like it! :)

[slurm-dev] Changing /dev file permissions for particular user

2015-06-24 Thread Marcin Stolarek
Hey! I've got one user I trust and know that he isn't going to do anything malicious; he needs direct access to a file in dev (/dev/cpu/*/msr in particular). Has anybody checked how to do such a thing in slurm? We are thinking about doing it in prologue and changing back in epilogue, checking if

[slurm-dev] Re: problem starting slurm on stateless node

2015-08-12 Thread Marcin Stolarek
2015-08-12 19:46 GMT+02:00 Trevor Gale tre...@snowhaven.com: Thank you for your reply! I found that the error was being caused by the var/log/* directories being excluded, as well as the hostname being changed on the node when I switched to Warewulf. I thought about using the file store to

[slurm-dev] Re: node renaming

2015-07-28 Thread Marcin Stolarek
2015-07-28 15:47 GMT+02:00 Andrew E. Bruno aebru...@buffalo.edu: On Tue, Jul 28, 2015 at 06:30:09AM -0700, Marcin Stolarek wrote: 2015-07-28 15:08 GMT+02:00 Andrew E. Bruno aebru...@buffalo.edu: We need to rename all the nodes in our cluster. Our thinking is to put in a full-system

[slurm-dev] Re: node renaming

2015-07-28 Thread Marcin Stolarek
2015-07-28 15:08 GMT+02:00 Andrew E. Bruno aebru...@buffalo.edu: We need to rename all the nodes in our cluster. Our thinking is to put in a full-system reservation: scontrol create reservation nodes=ALL .. Take the nodes down and rename them. Then bring slurm back up configured with

[slurm-dev] Re: Logging job executing time.

2015-08-10 Thread Marcin Stolarek
2015-08-10 17:35 GMT+02:00 Zentz, Scott C. ze...@email.unc.edu: Hello Everyone, The emails from SLURM contain the job completion time but I was wondering if there was a way to get the job completion time from either an srun or sbatch command and have the time logged to a

[slurm-dev] Re: Problem using slurm 14.11.9 :

2015-10-19 Thread Marcin Stolarek
2015-10-19 18:21 GMT+02:00 : > Hello, > is there someone who can explain such kind of message in *slurmctld.log* : > > *debug: not the right user 2279 != 1761* > Have you enabled debug4 messages? This message can just mean that the association being iterated is not added

[slurm-dev] Re: extend user ssh permissions

2015-09-01 Thread Marcin Stolarek
2015-09-01 19:50 GMT+02:00 Jan Schulze : > > Dear all, > > this is slurm 14.11.6 on a ROCKS 6.2 cluster. > > I have a perhaps trivial question concerning the user permissions for the > ssh login to computing nodes. The default setting allows our users to login > to

[slurm-dev] Re: Disk I/O as consumable?

2015-09-08 Thread Marcin Stolarek
2015-09-08 12:55 GMT+02:00 Raymond Wan : > > Dear all, > > I'm trying to figure out how to configure a "cluster" with a single > computer (i.e., execution and master node is the same). After I > figure this out, I hope that setting up a cluster with multiple nodes > is not

[slurm-dev] Re: Disk I/O as consumable?

2015-09-10 Thread Marcin Stolarek
2015-09-10 22:10 GMT+02:00 Kilian Cavalotti <kilian.cavalotti.w...@gmail.com >: > > On Tue, Sep 8, 2015 at 5:01 AM, Marcin Stolarek > <stolarek.mar...@gmail.com> wrote: > > using specified mountpoint, but... thats not real IOPS threshold. > Currently >

[slurm-dev] Re: Limit access to reconfiguration of Slurm (p.ex. accounting, limits) to certain hosts?

2015-09-30 Thread Marcin Stolarek
As far as I remember the easy way was to modify auth/munge not to trust root from a particular host. Cheers, Marcin On Wednesday, 30 September 2015, Christopher Samuel wrote: > > On 30/09/15 01:07, Thomas Orgis wrote: > > > Given the traffic on this list on other topics,

[slurm-dev] Re: pam_slurm: how can I exclude some users from pam_slurm?

2015-09-25 Thread Marcin Stolarek
pam_listfile before pam_slurm with "sufficient" key word in pam.d/ssh configuration? cheers, marcin 2015-09-25 6:18 GMT+02:00 Koji Tanaka : > Hello Slurm Community, > > Is there a way to exclude some users from pam_slurm? > > I've successfully set up ssh restriction with

[slurm-dev] Re: TMPDIR, clean up and prolog/epilog

2016-06-26 Thread Marcin Stolarek
This was discussed a number of times before. You can check the list archive, or start for instance with: https://github.com/fafik23/slurm_plugins/tree/master/bindtmp cheers marcin 2016-06-24 7:22 GMT+02:00 Lachlan Musicman : > We are transitioning from Torque/Maui to SLURM and

[slurm-dev] Re: DB on worker nodes

2016-03-24 Thread Marcin Stolarek
2016-03-24 1:26 GMT+01:00 Lachlan Musicman : > Hi, > > I'm just configuring a script to deploy worker nodes. I've realised that > version #1, made many moons ago, installed MySQL/MariaDB. > > But now that I look at my worker nodes, I don't think that they need mysql > on them.

[slurm-dev] Re: Get triggering node as command line argument in the triggered program?

2016-05-19 Thread Marcin Stolarek
check here: http://slurm.schedmd.com/strigger.html 1st paragraph ends with: "A hostlist expression for the nodelist or job ID is passed as an argument to the program." 2016-05-19 0:50 GMT+02:00 Michael Basilyan : > Looks like it already does pass it as an argument -- it's
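A trigger program therefore receives something like `node[01-03,07]` as its first argument and usually has to expand it into individual host names. On a cluster, `scontrol show hostnames` does this; the following is only a minimal Python sketch for a program that wants to do it itself. The function name `expand_hostlist` is made up for this example, and it assumes at most one bracket group per name, which is a simplification of the real hostlist syntax.

```python
import re


def expand_hostlist(expr):
    """Expand a simple hostlist such as 'node[01-03,07],login1'.

    Assumption: at most one bracket group per name; nested or multiple
    bracket groups are not handled by this sketch.
    """
    hosts = []
    # Split on commas that are not inside a bracket group.
    for part in re.findall(r"[^,\[]+(?:\[[^\]]*\])?[^,]*", expr):
        match = re.fullmatch(r"([^\[]*)\[([^\]]*)\](.*)", part)
        if not match:
            hosts.append(part)  # plain host name, nothing to expand
            continue
        prefix, ranges, suffix = match.groups()
        for item in ranges.split(","):
            low, _, high = item.partition("-")
            high = high or low  # a single number is a range of one
            width = len(low)  # preserve zero padding, e.g. 01 stays 01
            for number in range(int(low), int(high) + 1):
                hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts
```

Inside the triggered program this would typically be called as `expand_hostlist(sys.argv[1])`, assuming the hostlist arrives as the first argument as the quoted sentence describes.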

[slurm-dev] Re: Change in RealMemory after OS upgrade

2016-05-24 Thread Marcin Stolarek
Hi Ray, Can you for example check if sys/systemcfg.h is available/absent on new and old system? Is hwloc installed on both ? cheers, marcin >

[slurm-dev] Re: slurm and ldap

2016-05-10 Thread Marcin Stolarek
2016-05-10 20:02 GMT+02:00 remi marchal : > dear Marcin > > here is the output of the getent > > remarche:*:1050:502: remarche:/home/users/remarche:/bin/bash > have you restarted slurmd after configuring ldap authentication? If you have "CacheGroups" enabled in

[slurm-dev] Re: Change in RealMemory after OS upgrade

2016-05-16 Thread Marcin Stolarek
2016-05-17 5:22 GMT+02:00 Raymond Wan <rwan.w...@gmail.com>: > > Hi Marcin, > > > On Mon, May 16, 2016 at 5:51 PM, Marcin Stolarek > <stolarek.mar...@gmail.com> wrote: > > 2016-05-14 16:49 GMT+02:00 Raymond Wan <rwan.w...@gmail.com>: > >> Anyway,

[slurm-dev] Re: Change in RealMemory after OS upgrade

2016-05-16 Thread Marcin Stolarek
2016-05-14 16:49 GMT+02:00 Raymond Wan : > > Hi all, > > I've recently upgraded the Ubuntu OS on a few independent (i.e., they > are not part of the same cluster) servers. This brought me up from > version 14.11 of SLURM to 15.08 . > > While things have gone fine, on at

[slurm-dev] Re: Difficulty using reboot_nodes or similar for maintenance, SLURM 15.08

2016-05-10 Thread Marcin Stolarek
2016-05-07 6:43 GMT+02:00 Ryan Novosielski : > > Hi all, > > What I want to do is to be able to use reboot_nodes as it is described in > the manual. The trouble is that my nodes return to service before the user > filesystems are mounted. I haven't been able to resolve that

[slurm-dev] Re: slurm and ldap

2016-05-10 Thread Marcin Stolarek
2016-05-10 17:43 GMT+02:00 remi marchal : > Dear Slurm users, > > I am quite new in the slurm community. > > I set up a slurm cluster using munge authentication and would like to > allow ldap users to submit jobs. > > ldap authentication works perfectly on the

[slurm-dev] Storage accounting, with web presentation

2016-07-28 Thread Marcin Stolarek
Hi, This is not related to slurm, but I don't know a better place to ask such a question. For sure on clusters you manage there is a need to present space used by projects and users on particular file systems. I'm not aware of any open-source solution providing something like an "accounting portal" for storage,

[slurm-dev] Re: SlurmUser question

2016-07-21 Thread Marcin Stolarek
2016-07-21 6:59 GMT+02:00 Barbara Krasovec : > > I re-checked, it is recommended to run slurm controller as slurm user and > slurmd on worker nodes can run under any user.. By default they all run as > root. Slurm user doesn't need login privileges. > Slurmd needs to fork

[slurm-dev] Re: SPANK plugin to access job info at submission stage

2016-07-19 Thread Marcin Stolarek
check this: http://slurm.schedmd.com/job_submit_plugins.html However for directly accessing the options specified you probably need to work with wrapper. Inside the plugin you can work on job structure. cheers, marcin 2016-07-19 7:53 GMT+02:00 Yong Qin : > Hi, > > I'm

[slurm-dev] Process finished but jobs still "R" in squeue

2016-07-13 Thread Marcin Stolarek
Hi guys, I have a cluster with a few nodes. Users are submitting job arrays with around 50k tasks per array, and some jobs finish in less than a second. I observed that a lot of tasks stay in running state for a few minutes even if the user process finished after a second. I'm using 14.11.6, anyone with

[slurm-dev] RE: Remote Visualization and Slurm

2016-08-17 Thread Marcin Stolarek
You can try http://www.qoscosgrid.org/trac/qcg cheers, Marcin 2016-08-17 17:31 GMT+02:00 John Hearns : > Nicholas, > As you say there are several solutions out there. > > The one I have has experience with is NICE Software, which I admit I > integrated with PBS Pro. >

[slurm-dev] Re: Power outage causes wrong reports

2017-02-21 Thread Marcin Stolarek
If you are on Slurm 16, you can try: sacctmgr show RunawayJobs. From the man page: Used only with the list or show command to report current jobs that have been orphaned on the local cluster and are now runaway. If there are jobs in this state it will also give you an option to "fix" the 2017-01-23

[slurm-dev] Re: Rename a SLURM account?

2017-02-14 Thread Marcin Stolarek
Maybe setup a test account and try with simply update on database: update clustername_assoc_table set acct="new" where id_assoc=YOURASSOCID ? I haven't checked the details, but I think that only id_assoc may be used as foreign key, or assumed to be a key in slurm and acct should only be a string

[slurm-dev] Re: Stopping compute usage on login nodes

2017-02-10 Thread Marcin Stolarek
On the cluster I've been managing we had a solution with pam_script that chose two random cores for each user and bound their session to those (if this is a second session, use the same cores). I think it's quite a good solution, since 1) User is not able to take all server resources 2) The
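The core-selection step can be sketched in a few lines of Python. This is not the pam_script from the post: the function name `cores_for_user` is invented here, and instead of remembering the first session's random choice it seeds the generator from the user name, so every session of the same user lands on the same cores without any stored state.

```python
import hashlib
import random


def cores_for_user(username, total_cores, count=2):
    """Pick `count` distinct cores for a user, stable across sessions.

    Seeding from the user name is a stand-in for "remember the first
    session's choice": the same name always yields the same cores.
    """
    if count > total_cores:
        raise ValueError("not enough cores on this node")
    digest = hashlib.sha256(username.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(range(total_cores), count))
```

The resulting list could then be applied to the session's processes, for example with `taskset` or `os.sched_setaffinity` on Linux; how that hooks into PAM is left out of this sketch.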

[slurm-dev] Re: Numbering of physical and hyper cores

2017-02-14 Thread Marcin Stolarek
I don't think you can specify it. I believe slurm makes use of an external library to recognize the core topology. To check if this is a bug or designed behaviour, you can check: --hint=nomultithread option to srun cheers, Marcin 2017-02-09 8:21 GMT+01:00 Ulf Markwardt

[slurm-dev] Re: build slurm on musl?

2017-02-14 Thread Marcin Stolarek
can't you just use different libc (glibc) for slurm? 2017-01-16 20:31 GMT+01:00 Rowan, Jim : > > Hi, > > We're trying to bring up slurm (compute nodes, not the master) on a > platform that uses musl for libc. Musl doesn’t support lazy binding of > symbols in dynamic objects —

[slurm-dev] Re: Removing partition killed jobs

2017-02-14 Thread Marcin Stolarek
I think that generally you should set this partition to drain state. This will prevent new jobs submission, but allow all running and pending jobs execution. I don't think it's possible to change the partition for running jobs, for pending you could have done simply: scontrol update job=JOBID

[slurm-dev] RE: A little bit help from my slurm-friends

2017-02-16 Thread Marcin Stolarek
You can also have a submit plugin that will put the job in multiple partitions if none is specified. This should reduce the drawback of multiple partitions. However I think that with features and the topology plugin you should be able to avoid a multiple-partition setup. cheers Marcin 2017-01-17 9:49

[slurm-dev] Re: Standard suspend/resume scripts?

2017-02-16 Thread Marcin Stolarek
I don't think that there is something good for everyone. It depends on the way you manage the cluster. If you have diskless/stateless nodes it can be also good to force power off with ipmitool. 2017-02-16 8:19 GMT+01:00 Loris Bennett : > > Lachlan Musicman

[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread Marcin Stolarek
Just run slurmctld into the foreground, check the output. If you still don't know the cause of the problem paste a few lines here. cheers, Marcin

[slurm-dev] RE: Combating idle interactive sessions

2016-08-30 Thread Marcin Stolarek
Maybe you can create a partition/qos for interactive jobs and use job_submit plugin to force all interactive jobs to use it? cheers, Marcin 2016-08-30 8:09 GMT+02:00 John Hearns : > I worked on the same problem in my last job, where engineers had > interactive sessions

[slurm-dev] Re: Send mail from SLURM

2016-09-16 Thread Marcin Stolarek
yes, you need to set up mail sending :) check your MTA configuration. cheers marcin 2016-09-16 21:37 GMT+02:00 Fanny Pagés Díaz : > I need to send notifications from slurm to any mail? It works only for my > local mail. I have to make some settings? thanks >

[slurm-dev] Re: auto detect Node definition details

2016-10-02 Thread Marcin Stolarek
Yes, I think this is how it normally works, however I always preferred specifying resources in slurm.conf. Check the manual. Cheers Marcin On Thursday, 29 September 2016, Tus wrote: > > Hi All, > > Is there a way to auto detect node details that go in slurm.conf? If I >

[slurm-dev] Re: sreport empty

2016-10-19 Thread Marcin Stolarek
Account field is empty, have you enabled AccountingStorageEnforce ? cheers, Marcin 2016-10-19 16:50 GMT+02:00 Russell Jones : > Hi all, > > This is on Slurm 14.11.7. > > When trying to get a report via sreport, regardless of if I run it as root > or a user, and on the

[slurm-dev] Re: prioritize based on walltime request

2016-11-27 Thread Marcin Stolarek
Probably the solution for you is to use ostrich priority plugin: http://www.mimuw.edu.pl/~krzadca/ostrich/ Maybe you have to play with campaign setup in the plugin code. cheers, Marcin

[slurm-dev] Re: Priority blocking jobs despite idle machines

2017-03-25 Thread Marcin Stolarek
Shorten your time specification for this job if possible. Ask your admins :) 2017-03-24 16:32 GMT+01:00 Stefan Doerr : > Hm I don't know how since only the admins have access to that stuff. I > could ask them if you could be a bit more specific :) > > On Fri, Mar 24, 2017

[slurm-dev] sreport inconsistency

2017-03-17 Thread Marcin Stolarek
I've observed that the utilization and top users listings look inconsistent to me. Do I understand correctly that the percent used by users should sum to the percent allocated in cluster utilization? cheers, Marcin # sreport cluster utilization Start=2017-03-01 -t percent

[slurm-dev] Re: Fwd: job requeued in held state

2017-04-03 Thread Marcin Stolarek
Have you checked slurmd/prologue logs? It looks like your job was eligible to run, but it failed to start on the computing node. If it failed in prologue you can requeue the job without holding it with SchedulerParameters=nohold_on_prolog_fail. cheers, Marcin 2017-04-03 21:31 GMT+02:00 Chris Woelkers

[slurm-dev] Re: LDAP required?

2017-04-10 Thread Marcin Stolarek
but... is LDAP such a big issue? 2017-04-10 22:03 GMT+02:00 Jeff White : > Using Salt/Ansible/Chef/Puppet/Engine is another way to get it done. > Define your users in states/playbooks/whatever and don't bother with > painful LDAP or ancient NIS solutions. > > -- > Jeff White

[slurm-dev] RE: Job-Specific Working Directory on Local Scratch

2017-03-14 Thread Marcin Stolarek
What about setting a random named working directory in submit plugin and creation of this directory in prologue?
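The two halves of that idea can be illustrated with a short Python sketch. A real job_submit plugin would be written in C or Lua and the prologue is often a shell script, so this only shows the logic; the helper names `random_workdir` and `create_workdir` and the directory naming scheme are assumptions made for this example.

```python
import os
import secrets


def random_workdir(base, prefix="job"):
    """Submit side: return a hard-to-guess path under `base`.

    Nothing is created here; the path would only be stored as the
    job's working directory.
    """
    return os.path.join(base, f"{prefix}-{secrets.token_hex(8)}")


def create_workdir(path, uid, gid):
    """Prologue side: create the directory, accessible to its owner only.

    Intended to run as root on the compute node before the job starts.
    """
    os.makedirs(path, mode=0o700)
    os.chown(path, uid, gid)
```

A matching epilogue would remove the directory again once the job has finished.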

[slurm-dev] Re: Can you run SLURM on a single node ?

2017-08-17 Thread Marcin Stolarek
However I'd advise you to create a VM with dedicated CPUs as the "login node". If you allow people to log in to the one node which is also the computing node, you have to bind ssh processes to dedicated CPUs to prevent resource usage outside of slurm. 2017-08-10 15:15 GMT+02:00 Benjamin Redling

[slurm-dev] Re: How to print information message at submission time.

2017-06-19 Thread Marcin Stolarek
I don't think it's possible to print a message if the job is accepted to the queue. 2017-06-19 18:16 GMT+02:00 : > Hello, > > I'm using *job_submit* plugin (C language) to manage users job submission > on our systems. > > I would like to print an information message at the terminal

[slurm-dev] Re: Looking for distributions of wait times for jobs submitted over the past year

2017-06-15 Thread Marcin Stolarek
My advice would be to export accounting data to xdmod Cheers Marcin On Thu, 15 June 2017 at 18:17, Barry Moore wrote: > Hey All, > > Does anyone have a script or knowledge of how to query wait times for > Slurm jobs in the last year or so? > > Thank you, > > Barry >
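For a quick look without XDMoD, wait times can also be computed directly from accounting output. The sketch below is an alternative to the advice above, not part of it: it assumes the input comes from something like `sacct -a -X -n -P -S 2016-06-15 -o JobID,Submit,Start`, that is, pipe-separated lines with ISO timestamps and no header. The function name `wait_seconds` is made up for this example.

```python
from datetime import datetime

# Timestamp layout assumed for the Submit and Start columns.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def wait_seconds(sacct_lines):
    """Map job ID to the seconds between Submit and Start.

    Expects lines of the form 'JobID|Submit|Start'. Jobs whose start
    time cannot be parsed (for example 'Unknown') are skipped.
    """
    waits = {}
    for line in sacct_lines:
        line = line.strip()
        if not line:
            continue
        job_id, submit, start = line.split("|")[:3]
        try:
            delta = (datetime.strptime(start, TIME_FORMAT)
                     - datetime.strptime(submit, TIME_FORMAT))
        except ValueError:
            continue  # job never started, or unexpected time format
        waits[job_id] = delta.total_seconds()
    return waits
```

The values can then be binned or plotted to get the distribution asked about in the thread.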

[slurm-dev] Re: Looking for distributions of wait times for jobs submitted over the past year

2017-06-17 Thread Marcin Stolarek
On Thu, 15 June 2017 at 21:58, Barry Moore wrote: > Thanks a lot! These will be very helpful! > > On Thu, Jun 15, 2017 at 3:52 PM, Kilian Cavalotti < > kilian.cavalotti.w...@gmail.com> wrote: > >> >> Hi Barry, >> >> On Thu, Jun 15, 2017 at 9:16 AM, Barry Moore

[slurm-dev] Re: Accounting using LDAP ?

2017-09-20 Thread Marcin Stolarek
Christopher, If you want to use advanced slurm features you'll have to disable slurm management in Bright. It provides really basic functionality for those who would like to start with the cluster very fast. However, when your configuration complexity grows, you have to manage slurm

[slurm-dev] Re: defaults, passwd and data

2017-09-24 Thread Marcin Stolarek
2017-09-24 9:13 GMT+02:00 Lachlan Musicman : > > On 24 September 2017 at 16:20, Daniel Letai wrote: > >> Hello, >> >> B. We have active directory(AD) in our faculty, and We prefer manage >> users/groups from there , is it possible? any guide available

[slurm-dev] Re: How to limit number of running jobs in a partition?

2017-09-28 Thread Marcin Stolarek
You may consider specifying licenses in slurm.conf and for jobs. For example Licenses=storage1*3 and #SBATCH -L storage1. If you cannot rely on users you can use submit plugins and namespaces to umount storage for jobs without the specific license. cheers, Marcin 2017-09-28 17:12 GMT+02:00 E V
