[slurm-dev] submitting 100k job array causes slurmctld to socket timeout

2015-03-15 Thread Daniel Letai
Hi, Testing a new Slurm 14.11.4 installation on a 1k-node cluster. Several things we've tried: increase slurmctld threads (8-port range), increase munge threads (threads=10), increase MessageTimeout to 30. We are using accounting (DB on a different server). Thanks for any help
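A sketch of the slurmctld-side settings referenced above, in slurm.conf form; the values are illustrative, not a recommendation:

  # slurm.conf (illustrative values)
  SlurmctldPort=6817-6824      # a port range lets slurmctld service more concurrent RPCs
  MessageTimeout=30            # seconds before an RPC such as the array submission times out
  MaxJobCount=120000           # must exceed the number of array elements plus other queued jobs
  MaxArraySize=100001          # largest permitted array index + 1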

[slurm-dev] stdscr and other symbols from libncurses are actually defined in libtinfo

2015-05-09 Thread Daniel Letai
FYI If using an unpatched gnu binutils to link slurm, it will not build as the symbols are undefined in libncurses. Adding -ltinfo should solve the issue. Another option is to use -Wl,--copy-dt-needed-entries, as that option's default has changed in recent gnu ld. Red Hat in Fedora provide their
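Two ways the fix could be applied at build time, as a sketch; LIBS and LDFLAGS are standard autoconf inputs rather than slurm-specific flags:

  # explicitly pull in libtinfo at link time
  ./configure LIBS="-ltinfo"
  # or restore the old linker behaviour of following indirect DT_NEEDED entries
  ./configure LDFLAGS="-Wl,--copy-dt-needed-entries"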

[slurm-dev] Re: stdscr and other symbols from libncurses are actually defined in libtinfo

2015-05-09 Thread Daniel Letai
ference to 'stdscr' smap.c:319: error: undefined reference to 'nodelay' smap.c:366: error: undefined reference to 'curs_set' collect2: ld returned 1 exit status and stdscr reference: $ readelf -Ws /usr/lib64/libncurses.so | grep stdscr 12:
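To confirm where the symbol actually lives on a system with a split ncurses/tinfo, one can compare both libraries (the /usr/lib64 paths assume a RHEL-style layout):

  readelf -Ws /usr/lib64/libncurses.so | grep ' stdscr'   # expect UND where ncurses defers to tinfo
  readelf -Ws /usr/lib64/libtinfo.so   | grep ' stdscr'   # expect the defining entry here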

[slurm-dev] Re: stdscr and other symbols from libncurses are actually defined in libtinfo

2015-05-09 Thread Daniel Letai
"; then - NCURSES="-lncurses" + NCURSES="-ltinfo -lncurses" NCURSES_HEADER="ncurses.h" ac_have_some_curses="yes" elif test "$ac_have_curses" = "yes"; then On 05/09/2015 11:32 AM, Daniel Letai wrote: Fo

[slurm-dev] Re: stdscr and other symbols from libncurses are actually defined in libtinfo

2015-05-09 Thread Daniel Letai
then - NCURSES="-lncurses" + NCURSES="-ltinfo -lncurses" NCURSES_HEADER="ncurses.h" ac_have_some_curses="yes" elif test "$ac_have_curses" = "yes"; then On 05/09/2015 12:31 PM, Daniel Letai wrote: Here's a sample patch:
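Reassembled from the two fragments above, the proposed patch appears to be the following hunk against slurm's curses autoconf check; the file path and the surrounding if-condition are assumptions, not quoted from the original message:

  --- a/auxdir/x_ac_ncurses.m4
  +++ b/auxdir/x_ac_ncurses.m4
     if test "$ac_have_ncurses" = "yes"; then
  -    NCURSES="-lncurses"
  +    NCURSES="-ltinfo -lncurses"
       NCURSES_HEADER="ncurses.h"
       ac_have_some_curses="yes"
     elif test "$ac_have_curses" = "yes"; then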

[slurm-dev] Re: Can slurm signal the end of jobs?

2015-05-13 Thread Daniel Letai
Have you looked into epilog as a means to start your analysis automatically? On 05/13/2015 05:33 PM, Trevor Gale wrote: Hey all, I was just wondering if there is any mechanism built into slurm to signal to the user when jobs are done (other than email). I’m making a script to run a series of
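A minimal sketch of the epilog approach; the script path, log location and notification method are illustrative, and SLURM_JOB_ID / SLURM_JOB_USER are variables slurm exports to the epilog environment:

  # slurm.conf
  Epilog=/etc/slurm/epilog.d/notify.sh

  # /etc/slurm/epilog.d/notify.sh - runs on each allocated node when the job ends
  #!/bin/bash
  echo "$(date) job ${SLURM_JOB_ID} of ${SLURM_JOB_USER} finished on $(hostname -s)" \
      >> /var/log/slurm/job_done.log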

[slurm-dev] Re: Can slurm signal the end of jobs?

2015-05-17 Thread Daniel Letai
? Thanks, Trevor > On May 13, 2015, at 3:21 PM, Daniel Letai wrote: > > > Have you looked into epilog as a means to start your analysis automatically? > > On 05/13/2015 05:33 PM, Trevor Gale wrote: >> Hey all, >> >>

[slurm-dev] slurm and mariadb-galera

2015-05-27 Thread Daniel Letai
I know this was previously discussed but I just ran into the same issue today - 3 tables do not have a primary key: *_last_ran_table (I use hourly_rollup), *_suspend_table (I use job_db_inx) and jobcomp_table. I'm thinking of using jobid, but would like your input. Thanks in advance, --Dani_L.

[slurm-dev] Re: slurm and mariadb-galera

2015-05-27 Thread Daniel Letai
I had to use a composite PK as jobid is not unique (a requeue recreates it). I used jobid,starttime,endtime. Is there a better solution? On 05/27/2015 06:13 PM, Daniel Letai wrote: I know this was previously discussed but I just ran into the same issue today - 3 tables do not have a primary
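A sketch of what the keys described in this thread could look like; the database name and the column names are taken from the messages and are assumptions, so verify them against the actual schema before running anything like this on a Galera cluster. <cluster> is a placeholder for the cluster name used in the accounting DB:

  mysql slurm_acct_db -e "ALTER TABLE jobcomp_table            ADD PRIMARY KEY (jobid, starttime, endtime);"
  mysql slurm_acct_db -e "ALTER TABLE <cluster>_last_ran_table ADD PRIMARY KEY (hourly_rollup);"
  mysql slurm_acct_db -e "ALTER TABLE <cluster>_suspend_table  ADD PRIMARY KEY (job_db_inx);"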

[slurm-dev] Re: Compute nodes not taking more then 1 jobs

2015-06-17 Thread Daniel Letai
Are you sure you set SelectType=select/cons_res? It seems from your description that slurm allocates entire nodes for jobs. On 06/17/2015 10:28 AM, Saerda Halifu wrote: Hi, I just updated to slurm 14.11.7, and am having the following issue. I have nodes with 32 cores, they all have 1 core job alloca
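The relevant slurm.conf lines, as a sketch; CR_Core is only one possible consumable-resource setting:

  SelectType=select/cons_res
  SelectTypeParameters=CR_Core      # or CR_CPU / CR_Core_Memory, depending on policy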

[slurm-dev] Re: Set 1 job per core.

2015-06-17 Thread Daniel Letai
What should be the default? Suppose I have 10G mem nodes, each with a 10 core socket. Should I allocate 1G==1000M in DefMemPerCPU, or just 1M to force users to actually use --mem in their job submissions? On 06/09/2015 06:12 PM, Moe Jette wrote: Slurm has always allocated all of the memory o

[slurm-dev] Re: forward pending jobs automatically

2015-06-17 Thread Daniel Letai
Without looking at the code, I'd assume the slurmctld is responsible for allocating job ids. Consider submitting arrays - the individual elements are not assigned a job id until they enter the running state. Given that, I'd look for job id assignment in the slurmctld code (e.g. job_scheduler

[slurm-dev] setting DefMemPerCPU in a heterogeneous cluster

2015-06-17 Thread Daniel Letai
Currently I have 2 types of nodes: old = 2 sockets, 4 cores per socket, 64GB mem; new = 2 sockets, 6 cores per socket, 128GB mem. Since I'm using select/cons_res with CR_CPU_Memory, I thought I'd assign as the default the relative amount of memory per core: old - DefMemPerCPU = 8000, new - DefMem
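One way to express per-node-type defaults is to set DefMemPerCPU at the partition level rather than globally, assuming the running version supports the partition-level option; node and partition names are illustrative and the 'new' value simply divides 128GB by 12 cores:

  NodeName=old[001-050] Sockets=2 CoresPerSocket=4 RealMemory=64000
  NodeName=new[001-050] Sockets=2 CoresPerSocket=6 RealMemory=128000
  PartitionName=old Nodes=old[001-050] DefMemPerCPU=8000
  PartitionName=new Nodes=new[001-050] DefMemPerCPU=10600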

[slurm-dev] Re: setting DefMemPerCPU in a heterogeneous cluster

2015-06-18 Thread Daniel Letai
u.edu On Wed, Jun 17, 2015 at 9:22 AM, Daniel Letai <d...@letai.org.il> wrote: Currently I have 2 types of nodes: old = 2 sockets, 4 cores per socket, 64GB mem; new = 2 sockets, 6 cores per socket, 128GB mem. Since I'm using sele

[slurm-dev] building slurm 15.08.0 with json support

2015-09-08 Thread Daniel Letai
Hi, I've noticed configure checks for json parser availability; however, on RHEL6-based systems the json-c-devel rpm from EPEL(6) installs to /usr/include/json while the configure check is for /usr/include/json-c (configure:19424). BTW, when building without packaging (i.e. tar xf slurm-15
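Two generic workarounds for the header-location mismatch, assuming the EPEL path above; the symlink simply provides the directory the configure test looks for, while CPPFLAGS helps only if the test honours preprocessor flags:

  # point the preprocessor at the EPEL location
  ./configure CPPFLAGS="-I/usr/include/json"
  # or provide the path configure expects
  ln -s /usr/include/json /usr/include/json-c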

[slurm-dev] Re: building slurm 15.08.0 with json support

2015-09-10 Thread Daniel Letai
On 09/09/2015 01:53 AM, Moe Jette wrote: Quoting Daniel Letai : Hi, I've noticed configure checks for json parser availability, however in rhel6 based systems the json-c-devel rpm from epel(6) installs to /usr/include/json while the configure check is for /usr/include/json-c (conf

[slurm-dev] Re: building slurm 15.08.0 with json support

2015-09-10 Thread Daniel Letai
Sorry, was sent to list by mistake, please delete On 09/10/2015 04:23 PM, Daniel Letai wrote: On 09/09/2015 01:53 AM, Moe Jette wrote: Quoting Daniel Letai : Hi, I've noticed configure checks for json parser availability, however in rhel6 based systems the json-c-devel rpm from e

[slurm-dev] Re: Dividing a 2cpu+2gpu machine into two independent blocks of 1cpu+1gpu each.

2015-09-17 Thread Daniel Letai
Basically, set up socket-based scheduling (using sockets instead of cores) and build a gres configuration for the GPUs: 2 lines, 1 with CPUs=0-9, the other with CPUs=10-19. See http://slurm.schedmd.com/gres.html and http://slurm.schedmd.com/slurm.conf.html (search for CR_Socket). I'm not sure
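A sketch of the two-line gres.conf described above plus the matching slurm.conf fragments; the device files, node name and core ranges are illustrative:

  # gres.conf on the node
  Name=gpu File=/dev/nvidia0 CPUs=0-9
  Name=gpu File=/dev/nvidia1 CPUs=10-19

  # slurm.conf
  GresTypes=gpu
  SelectType=select/cons_res
  SelectTypeParameters=CR_Socket
  NodeName=gpunode01 Sockets=2 CoresPerSocket=10 Gres=gpu:2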

[slurm-dev] Re: Slurm, job preemption, and swap.

2015-10-26 Thread Daniel Letai
It would be easy if there were a way to force TRES allocation/reconfiguration, e.g. add the swap as GRES/swap, and on suspend transfer the allocation from TRES=mem=64,GRES/swap=0 to TRES=mem=0,GRES/swap=64. Then you could start the new job which requires available mem. Will it be possible to

[slurm-dev] Re: Fwd: SLURM : how to have a round-robin across nodes based on load average?

2015-11-18 Thread Daniel Letai
I'm curious - what would be the point of such scheduling? I tried to think of a scenario in which such a setting would gain me anything significant and came up with nothing. What is the advantage of this distribution? On 11/18/2015 08:37 AM, cips bmkg wrote: Fwd: SLURM : how to have a round-

[slurm-dev] A floating exclusive partition

2015-11-19 Thread Daniel Letai
Hi, Suppose I have a 100-node cluster with ~5% of nodes down at any given time (maintenance/hw failure/...). One of the projects requires exclusive use of 5 nodes, and needs to be able to use the entire cluster when available (when other projects aren't running). I can do this easily if I maintain a stati

[slurm-dev] Re: A floating exclusive partition

2015-11-19 Thread Daniel Letai
-Paul Edmon- On 11/19/2015 04:49 AM, Daniel Letai wrote: Hi, Suppose I have a 100 node cluster with ~5% nodes down at any given time (maintenance/hw failure/...). One of the projects requires exclusive use of 5 nodes, and be able to use entire cluster when available (when other projects a

[slurm-dev] Re: A floating exclusive partition

2015-11-19 Thread Daniel Letai
's less seamless to the users as they will have to consciously monitor what is going on. -Paul Edmon- On 11/19/2015 10:50 AM, Daniel Letai wrote: Can you elaborate a little? I'm not sure what kind of QoS will help, nor how to implement one that will satisfy the requirements.

[slurm-dev] Re: A floating exclusive partition

2015-11-21 Thread Daniel Letai
so that the project cannot dominate the partition; Reservations could be used too, but you'd need to define at a minimum a start time and duration - and when not in use the hardware would be idle and unavailable to other users. John DeSantis 2015-11-19 13:31 GMT-05:00 Daniel Letai :

[slurm-dev] Re: A floating exclusive partition

2015-11-23 Thread Daniel Letai
ail nodes in special, and the second would run 1 element on special. Would it then use public for the other 3 elements (provided public has some idle nodes)? HTH! John DeSantis Thanks for your input, it's very helpful :) --Dani_L. 2015-11-21 11:29 GMT-05:00 Daniel Letai : John, That's cor

[slurm-dev] Re: Help: SLURM will not start on either nodes after setup.

2015-11-23 Thread Daniel Letai
Hi, I've run into the same issue with slurm-15.08.3, OS: RHEL 6.5 x64. slurmctld is reading the SlurmUser setting and starts as user slurm, however slurmd doesn't respect the SlurmdUser config. If SlurmdUser is commented out, slurmd starts as user root (in accordance with documentation) - co

[slurm-dev] Building slurm rpms with netloc installed fails.

2015-12-20 Thread Daniel Letai
Hi, I'm trying to build slurm rpms using: rpmbuild -ta slurm-15.08.6.tar.bz2 And I'm getting errors in netloc_to_topology.c (from contrib/sgi) I've tried rpmbuild -ta --define '%_without_netloc 1' ... but I got the same errors (not surprising, as the spec file has no provision for netloc in th

[slurm-dev] Re: slurm cluster design

2016-01-05 Thread Daniel Letai
On 01/05/2016 05:25 PM, GOLPAYEGANI, NAVID (GSFC-6190) wrote: Thank you for the quick response. See below for my reply. On 1/4/16, 6:25 PM, "je...@schedmd.com" wrote: Quoting "GOLPAYEGANI, NAVID (GSFC-6190)" : Hi, SLURM newbie here. Anybody have suggestions on how to do the scheduling f

[slurm-dev] Re: problem with start slurm

2016-01-05 Thread Daniel Letai
MS AD to control linux nodes? Too much overhead and susceptible to errors. If the headnode is a VM, you must time-sync the host, not the guest, as in most paravirt environments the guest will automatically adjust its clock to the host. On 01/04/2016 10:54 PM, Dennis Mungai wrote: Hello Fany,

[slurm-dev] Re: more detailed installation guide

2016-01-05 Thread Daniel Letai
Just one comment regarding openmpi building: https://wiki.fysik.dtu.dk/niflheim/SLURM#mpi-setup - At least with regard to openmpi, it should be built with --with-pmi On 01/05/2016 01:26 PM, Ole Holm Nielsen wrote: On 01/05/2016 12:12 PM, Randy Bin Lin wrote: I was wondering if anyone has a more
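A sketch of an Open MPI build against slurm's PMI; --with-pmi and --with-slurm are Open MPI configure options, while the /usr prefix assumes slurm's pmi.h was installed under /usr/include/slurm:

  ./configure --prefix=/opt/openmpi --with-slurm --with-pmi=/usr
  make -j && make install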

[slurm-dev] Re: Multiple Clusters with different OS can I consolidate

2016-01-05 Thread Daniel Letai
Another way to do this would be with features. Then mpi jobs must specify the feature=OSstring to run, and others can run without the feature. I'd use partitions to differentiate HW, not OS, but that's just my personal bias. The only issue you'd have is building the slurm rpms 3 times each ti

[slurm-dev] Re: "User not found on host" while user exists on LDAP

2016-01-05 Thread Daniel Letai
What's the nsswitch like on the node? From the node, can you do: # getent passwd | grep On 01/05/2016 08:31 PM, Koji Tanaka wrote: Hello Slurm Community, I get the following errors when I run a job as an LDAP user. However, as a local user, everything works fine. $ srun -N1 hostname srun: error:
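The checks being suggested, spelled out as a sketch; the username is a placeholder, and whether 'ldap' or 'sss' appears in nsswitch.conf depends on how the node authenticates:

  getent passwd <username>              # should return the LDAP user on the compute node
  grep '^passwd' /etc/nsswitch.conf     # should list ldap (or sss) after files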

[slurm-dev] Re: SLURM, hyperthreads, and hints

2016-01-06 Thread Daniel Letai
Just a couple of observations 1) Naively you can create a skeleton sbatch template which normal jobs would use by default, and another that multi-thread jobs must specifically request. Populate the templates via a wrapper script or submit plugin - use only sbatch, not srun. Other option would b

[slurm-dev] Re: slurm job array limit?

2016-01-07 Thread Daniel Letai
Your MaxJobCount/MinJobAge combo might be too high, and the slurmctld is exhausting physical memory and resorting to swap, which slows it down enough to exceed its scheduling loop time window. You might wish to increase the scheduling loop duration as per http://slurm.schedmd.com/slurm.conf.html#OP

[slurm-dev] dynamic gres scheduling

2016-02-03 Thread Daniel Letai
A similar question has been asked before (not by me), without an answer: https://groups.google.com/forum/?hl=en#!topic/slurm-devel/4xkvs0dgYu8 Specifically - suppose I have a gpu cluster, 2 gpus per node, where some gpus might or might not function correctly (due to heat/fw issues/malfunction/

[slurm-dev] Re: How to limit number nodes used at the same time in a partition

2016-02-18 Thread Daniel Letai
1. Regarding QOS - did you set slurm.conf to enforce QOS limits? AccountingStorageEnforce=limits,qos 2. Regarding your original question "limit the number of nodes allocated at the same time to a partition" - I'm not sure what you mean b

[slurm-dev] Re: [slurm-announce] Slurm version 15.08.8 and 16.05.0-pre1 now available

2016-02-19 Thread Daniel Letai
On 02/19/2016 01:44 AM, je...@schedmd.com wrote: Slurm version 15.08.8 is now available and includes about 30 bug fixes developed over the past four weeks. Slurm version 16.05.0-pre1 is also available and includes new develo

[slurm-dev] Where should I report bugs/issues ?

2016-03-01 Thread Daniel Letai
Hi, I have recently opened a bug in https://bugs.schedmd.com/ but I have now discovered that GitHub (https://github.com/SchedMD/slurm) also seems quite active - should I have opened the issue on GitHub? What's the correct/preferred way to report

[slurm-dev] Re: Where should I report bugs/issues ?

2016-03-01 Thread Daniel Letai
Please disregard - slurm on github has "issues" disabled. My mistake. On 03/01/2016 05:39 PM, Daniel Letai wrote: Where should I report bugs/issues ? Hi, I hav

[slurm-dev] Re: One CPU always reserved for one GPU

2016-03-03 Thread Daniel Letai
Correct me if I'm wrong, but I don't see any NUMA-based reservation of the CPUs - do you ensure that each reserved CPU is from a different socket, and that GPU jobs' affinity is to the correct NUMA node? On 03/02/2016 12:30 AM, Lachele Foley wrote:

[slurm-dev] Re: Patch for health check during slurmd start

2016-03-10 Thread Daniel Letai
We went a different route - a healthcheck agent script modifies the node's features, adding "functional" once all pertinent metrics are met, including the GPFS mount. In slurm.conf the feature doesn't exist, and the sbatch template has --constrain
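A minimal sketch of such an agent, assuming the check is a GPFS mountpoint and that the running slurm version accepts scontrol update ... Features=; the mount path, feature name and failure handling are illustrative:

  #!/bin/bash
  # periodic node agent (e.g. run from cron or HealthCheckProgram)
  node=$(hostname -s)
  if mountpoint -q /gpfs; then
      scontrol update NodeName=$node Features=functional
  else
      scontrol update NodeName=$node State=DRAIN Reason="gpfs not mounted"
  fi

Jobs that need the checks to have passed then submit with --constraint=functional, or the constraint is baked into the sbatch template as described above.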

[slurm-dev] Re: What cluster provisioning system do you use?

2016-03-15 Thread Daniel Letai
Another vote for xCAT here - been using it for ~3 years now, on installations ranging from 8 to 1+k nodes. Once you get to know xCAT it's quite easy to manage, although familiarity with perl will help in any troubleshooting or customization (

[slurm-dev] Re: Question from Vladivostok.

2016-04-15 Thread Daniel Letai
This is somewhat convoluted, but you might achieve this with a gres.conf similar to: Name=gpu CPUs=0,1 Name=gpu CPUs=10,11 Name=cpu CPUs=2-9,12-19 Count=16 and when submitting a job: sbatch --gres=gpu:1, or sbatch --gres=cpu:16, or sbatch

[slurm-dev] Re: Bug report from Vladivostok.

2016-05-06 Thread Daniel Letai
Looking at your patch, and without reviewing the code, I have one question - is it possible for core 'c' not to be in core_map, nor in part_core_map? I'm only asking because that case doesn't seem to be covered by your patch (A private case wo

[slurm-dev] Re: How to setup slurm database accounting feature

2016-05-21 Thread Daniel Letai
Does the socket file exist? What's in your /etc/my.cnf (or my.cnf.d/some other config file) under [mysqld]? [mysqld] socket=/path/to/datadir/mysql/mysql.sock If a socket value doesn't exist, either create one, or create a link between the actual socket file and /var/run/mysqld/mysqld.sock BT

[slurm-dev] Re: How to setup slurm database accounting feature

2016-05-22 Thread Daniel Letai
his ? Thank you in advance. Regards, Husen On Sat, May 21, 2016 at 6:28 PM, Daniel Letai wrote: Does the socket file exists? What's in your /etc/my.cnf (or my.cnf.d/s

[slurm-dev] Re: Prevision to make a RedHat repo?

2016-05-24 Thread Daniel Letai
The tar file contains a spec, so it's easy to just rpmbuild -ta slurm-XXX.tar.bz2 and run createrepo on the rpms. If you require any special options during build, this is preferable to using end-result rpms, as it's quite easy to use defines

[slurm-dev] Re: Tree Topology & Node Weights

2016-05-26 Thread Daniel Letai
How about setting them up as multiple clusters instead of multiple partitions? Use sbatch -M cluster1,cluster2 to submit to both clusters; the first one that accepts the job cancels it on the other. I don't think it's possible to

[slurm-dev] Re: Tree Topology & Node Weights

2016-05-26 Thread Daniel Letai
Another option might be to use constraints: sbatch -C "part1|part2" will ensure the job runs on only one of the partitions, and then use node weights as normal, without topology. On 05/26/2016 12:22 PM, Stuart Franks wrote: Hi There,

[slurm-dev] Re: Tree Topology & Node Weights

2016-05-26 Thread Daniel Letai
Forgot the square brackets for the constraints option: sbatch -C "[part1|part2]" On 05/26/2016 12:22 PM, Stuart Franks wrote: Hi There, I've recently setup SLURM at our office and have been struggling to get weights to w

[slurm-dev] Re: 16.05?

2016-05-29 Thread Daniel Letai
You could just retag it 16.06 and remove some of the "deadline" pressure ;-) On 05/30/2016 09:03 AM, Lachlan Musicman wrote: Fantastic - thanks. I am also going to presume that Tuesday is PDT (

[slurm-dev] Re: Building SLURM

2016-06-02 Thread Daniel Letai
IIRC slurmdb-direct is only used when accounting doesn't use dbd, and the slurmctld accesses a db directly. I might be wrong, though. On 06/02/2016 04:06 AM, Lachlan Musicman wrote: Actually, I've now also n

[slurm-dev] Re: Processes sharing cores

2016-06-07 Thread Daniel Letai
As a workaround - can you test srun --cpu_bind=verbose,map_cpu: mpirun -slot-list $SBATCH_CPU_BIND_LIST I'm thinking -slot-list doesn't handle cpu masks, and slurm should provide an explicit list of IDs. On 06/07/2016 04:14 PM, Jason Bac

[slurm-dev] Re: Looking for a helping hand: multiple partitions with and without preemption

2016-06-07 Thread Daniel Letai
On 06/07/2016 05:24 PM, Steffen Grunewald wrote: On Tue, 2016-06-07 at 05:43:19 -0700, Steffen Grunewald wrote: Good afternoon, I'm looking for a (simple) set of preemption rules for the following planned setup: - three partitions: "urgent"

[slurm-dev] Re: ibrun's substitute on SLURM

2016-06-14 Thread Daniel Letai
Possibly with `--multi-prog` as per http://slurm.schedmd.com/srun.html#SECTION_MULTIPLE-PROGRAM-CONFIGURATION On 06/13/2016 06:26 PM, Akhilesh Mishra wrote: Dear Developers and SLURM users,
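A sketch of srun's MPMD usage; the config file maps task ranks to executables, and the programs and rank counts below are placeholders:

  $ cat multi.conf
  0     ./master
  1-15  ./worker
  $ srun -n 16 --multi-prog multi.conf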

[slurm-dev] Re: Accounting, users, associations and Partitions

2016-06-29 Thread Daniel Letai
Seems like it should: http://slurm.schedmd.com/sacctmgr.html#SECTION_SPECIFICATIONS-FOR-USERS Add a 'partition=' to the user creation. On 06/29/2016 10:03 AM, Lachlan Musicman wrote: Is it possible to set a Default Partition against a
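A sketch of that sacctmgr invocation; the user, account and partition names are placeholders:

  sacctmgr add user <username> account=<account> partition=<partition>
  sacctmgr show association user=<username> format=User,Account,Partition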

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Daniel Letai
You should use HDF5: http://slurm.schedmd.com/hdf5_profile_user_guide.html On 09/19/2016 03:41 AM, Igor Yakushin wrote: Hi All, I'd like to be able to see for a given jobid how much resources are used b
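A sketch of the HDF5 profiling pieces referred to above; the profile directory and the sampling frequency are illustrative:

  # slurm.conf
  AcctGatherProfileType=acct_gather_profile/hdf5
  JobAcctGatherFrequency=30

  # acct_gather.conf
  ProfileHDF5Dir=/var/spool/slurm/profile

  # submit with profiling enabled, then merge/inspect the per-node files
  sbatch --profile=task job.sh
  sh5util -j <jobid>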

[slurm-dev] Re: how to monitor CPU/RAM usage on each node of a slurm job? python API?

2016-09-18 Thread Daniel Letai
;,')" On 09/19/2016 08:28 AM, Daniel Letai wrote: Re: [slurm-dev] how to monitor CPU/RAM usage on each node of a slurm job? python API? You should use HDF5 http://slurm.schedmd.com/hdf5_profile_user_guide.html On 09/19/2016 03:41 AM, Igor Yakushin wrote: how to monitor CPU

[slurm-dev] Re: Best way to control synchronized clocks in cluster?

2016-10-06 Thread Daniel Letai
One simple thing to do is enable HealthCheckProgram (http://slurm.schedmd.com/slurm.conf.html#OPT_HealthCheckProgram) and use a simple script along the lines of: #!/bin/bash ntpdate -u ntpsserver.cluster.local ; rc=$? [[ $rc -ne 0 ]] && scontrol update NodeName=$HOSTNAME State=drain Reason=ntp_

[slurm-dev] Gres function node_config_load() clarification

2016-10-06 Thread Daniel Letai
In the gres.conf man page it's mentioned that "If generic resource counts are set by the gres plugin function node_config_load(), this file may be optional." When looking at http://slurm.schedmd.com/gres_plugins.html I can't figure out from the description for node_config_load() how to remove

[slurm-dev] Re: How to account how many cpus/gpus per node has been allocated to a specific job?

2016-11-15 Thread Daniel Letai
You should be able to do this with profiling data: https://slurm.schedmd.com/hdf5_profile_user_guide.html Just use the jobacct_gather plugin. This is very probably an overki

[slurm-dev] Re: Slurm version 17.02.0 is now available

2017-02-27 Thread Daniel Letai
$ git diff
diff --git a/slurm.spec b/slurm.spec
index 941b360..6bb3014 100644
--- a/slurm.spec
+++ b/slurm.spec
@@ -346,6 +346,7 @@ Includes the Slurm proctrack/lua and job_submit/lua plugin
 Summary: Perl tool to print Slurm job state information
 Group: Development/System

[slurm-dev] Re: GUI Job submission portal

2017-04-21 Thread Daniel Letai
Looks like an XY question. What do you wish the scheduler to do, specifically? SLURM is CLI-centric, but it's possible to use a web portal through a 3rd-party extension. On 04/20/2017 01:22 PM, Parag Khuraswar wrote: Hi All,

[slurm-dev] Gres no_consume help required

2017-05-28 Thread Daniel Letai
Hello all, I'm having some trouble with no_consume gres. Specifically, I'm trying to use a certain gres as a feature, but prefer it to be a gres so I can track its usage/consumption. Relevant portion of slurm.conf: GresTypes=b AccountingStorageTRES=gres/b NodeName=n01
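A sketch of how a no_consume gres is typically declared, following the pattern in the gres documentation; the gres name 'b', the count and the node name mirror the message and are otherwise illustrative, so this is not the poster's actual configuration:

  # slurm.conf
  GresTypes=b
  AccountingStorageTRES=gres/b
  NodeName=n01 Gres=b:no_consume:1

  # gres.conf on n01
  Name=b Count=1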

[slurm-dev] Re: slurm.conf for single node

2017-06-28 Thread Daniel Letai
Here's a complete slurm.conf which I often use for testing/debugging. You can safely drop the GresTypes,DebugFlags lines, or add any others from man slurm.conf. ControlMachine=localhost AuthType=au
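The attached file is truncated above; a minimal single-node sketch along the same lines (not the poster's file - hostname, CPU count and plugin choices are illustrative):

  ClusterName=test
  ControlMachine=localhost
  AuthType=auth/munge
  SlurmUser=slurm
  SelectType=select/cons_res
  SelectTypeParameters=CR_Core
  NodeName=localhost CPUs=4 State=UNKNOWN
  PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP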

[slurm-dev] Re: Change in srun ?

2017-07-19 Thread Daniel Letai
Did you rebuild mpi with the "--with-pmi" flag pointing at slurm's include dir, after making sure slurm has pmi.h there? I usually build with both --with-pmi and --with-slurm, although the latter should be enabled by default.

[slurm-dev] Re: Change in srun ?

2017-07-19 Thread Daniel Letai
Take a look at https://www.cyberciti.biz/faq/rebuilding-ubuntu-debian-linux-binary-package/ Make sure slurm is already installed before rebuilding the package, and pass the correct configure flags (--with-pmi= --with-slurm

[slurm-dev] Re: Exceeded job memory limit problem

2017-09-06 Thread Daniel Letai
Do you have enough memory on your nodes? What's the output of sinfo -n -O nodelist,memory:20 You might not have enough memory on the nodes for the dataset. On 09/06/2017 10:36 AM, Sema Atasever

[slurm-dev] Re: defaults, passwd and data

2017-09-23 Thread Daniel Letai
Hello, On 09/24/2017 08:35 AM, Nadav Toledo wrote: Hey all, We are trying to setup a Slurm cluster for both cpu and gpu partition