[slurm-dev] Slurm version 17.11.0-0rc1 is now available

2017-10-24 Thread Moe Jette
We are pleased to announce the availability of Slurm version 17.11.0-0rc1 (release candidate 1). Production release of version 17.11 is expected in November. Interested parties are invited to test this pre-release. Slurm can be downloaded from https://www.schedmd.com/downloads.php Major changes

[slurm-dev] Re: Slurm API thread safety and concurrency

2017-10-16 Thread Moe Jette
All of the calls are thread safe. On October 15, 2017 11:37:55 AM MDT, "Frank Ramirez, Alvaro" wrote: >Hi all, > > >New user but long time user and plugin developer here. > > >I am currently porting an automated set of virtualization triggers for >slurm to c/c++. > > >As

[slurm-dev] Slurm versions 17.02.7 and 17.11.0-pre2 are now available

2017-08-15 Thread Moe Jette
Slurm version 17.02.7 contains about 35 bug fixes developed over the past six weeks. Slurm version 17.11.0-pre2 is the second pre-release of version 17.11, to be released in November 2017. Slurm downloads are available from http://www.schedmd.com/#repos Details about the changes in each

[slurm-dev] Re: Job priority calculation when submitted to multiple partitions with different priorities

2017-08-15 Thread Moe Jette
Per-partition priority information will be available in Slurm version 17.11. On Tue, Aug 15, 2017, at 09:33 AM, Skouson, Gary B wrote: > I've also seen that. I'm not sure it's a "bug". It's just a result of > the current structure of the code. > > The job structure doesn't have a place to put

[slurm-dev] Slurm version 17.02.6 is now available

2017-07-05 Thread Moe Jette
Slurm version 17.02.6 is now available. It contains several bug fixes, including one which can result in communications between the slurmctld and slurmdbd daemons stopping. A description of all changes follows. Slurm downloads are available from http://www.schedmd.com/#repos * Changes in Slurm

[slurm-dev] Re: #SBATCH --time= not always overriding default?

2017-06-30 Thread Moe Jette
Wrong, it's passed as an argument to the batch script. On Fri, Jun 30, 2017, at 06:14 AM, Nathan Vance wrote: > L, > I discovered earlier this week (after embarking on a crusade to squash > nonexistent bugs) that sbatch will ignore any flag that comes after > the filename of the script to be
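The point made in the reply above (anything after the script name on the sbatch command line goes to the batch script instead of being parsed as an sbatch option) can be sketched roughly as follows. This is only an illustration, not sbatch's actual parser: the function name split_sbatch_command is hypothetical, and it assumes every option is a single token beginning with a dash (for example --time=10), so the first token without a leading dash is taken to be the script.

```python
def split_sbatch_command(argv):
    """Split an sbatch-style argument list into (options, script, script_args).

    Simplifying assumption (not sbatch's real parsing): every option is a
    single token starting with '-', so the first token without a leading
    dash is the script name. Everything after it belongs to the script.
    """
    for index, token in enumerate(argv):
        if not token.startswith("-"):
            return list(argv[:index]), token, list(argv[index + 1:])
    # No script name found: all tokens are options.
    return list(argv), None, []


if __name__ == "__main__":
    options, script, script_args = split_sbatch_command(
        ["--ntasks=4", "job.sh", "--time=10"]
    )
    print("sbatch options:", options)
    print("script:", script)
    print("passed to script:", script_args)
```

In this sketch, --time=10 ends up in the list handed to the script, which is why a time limit given after the script name does not override the default.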

[slurm-dev] Slurm versions 17.02.5 and 17.11.0-pre1 are now available

2017-06-22 Thread Moe Jette
Slurm version 17.02.5 contains 18 bug fixes developed over the past month. Slurm version 17.11.0-pre1 is the first pre-release of version 17.11, to be released in November 2017. This version contains the support for scheduling of a workload across a set (federation) of clusters which is

[slurm-dev] Slurm version 15.08.3 now available

2015-11-04 Thread Moe Jette
tions. -- Modifications to pam_slurm_adopt to work correctly for the "extern" step. -- Alphabetize debugflags when printing them out. -- Fix systemd's slurmd service from killing slurmstepds on shutdown. -- Fixed counter of not indexed jobs, error_cnt post-increment changed to pre-increment. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm version 15.08.2 now available, SC15 News

2015-10-22 Thread Moe Jette
t variables accordingly. -- MYSQL - Make sure suspended time is only subtracted from the CPU TRES as it is the only TRES that can be given to another job while suspended. -- Clarify how TRESBillingWeights operates on memory and burst buffers. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: slurm-dev SLURM environment variable SLURM_JOB_NODELIST

2015-09-29 Thread Moe Jette
DELIST be used as an alternative of SLURM_JOB_NODELIST in all described cases? Sorry for the inconsistency. I believe the answer is yes. -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Nodes dropping offline after replacing switch

2015-09-25 Thread Moe Jette
changed anything, only just rebooted cluster. -- Bob Healey Systems Administrator Biocomputation and Bioinformatics Constellation and Molecularium hea...@rpi.edu (518) 276-4407 -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: jobs arrays & slurm_load_job changes in 14.11+

2015-09-25 Thread Moe Jette
onfirmation from them at the moment). All the best, Chris -- Christopher Samuel Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci -- Morris "

[slurm-dev] Slurm version 15.08.1 is now available

2015-09-25 Thread Moe Jette
option. -- Fix TRES counts on GRES on a clean start of the slurmctld. -- Add ability to change a job array's maximum running task count: "scontrol update jobid=# arraytaskthrottle=#" -- For pending jobs have sacct print 0 for nnodes instead of the bogus 2. -- Morris "Moe" J
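The changelog entry above quotes the form "scontrol update jobid=# arraytaskthrottle=#" for changing a job array's maximum running task count. A minimal sketch of assembling that command line from a script is shown below. The helper name throttle_command is hypothetical, the job ID and limit are made-up example values, and the sketch only builds the argument list without running scontrol.

```python
import shlex


def throttle_command(job_id, max_running):
    """Build the argument list for the scontrol form quoted in the changelog.

    job_id and max_running fill the two '#' placeholders. The command is
    only constructed here, not executed.
    """
    max_running = int(max_running)
    if max_running < 0:
        raise ValueError("max_running must not be negative")
    return [
        "scontrol",
        "update",
        f"jobid={job_id}",
        f"arraytaskthrottle={max_running}",
    ]


if __name__ == "__main__":
    # Example values only; 1234 is not a real job ID.
    print(shlex.join(throttle_command(1234, 10)))
```

Returning a list rather than a single string keeps the result suitable for subprocess.run without shell quoting concerns, should the caller decide to execute it.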

[slurm-dev] Re: clarification of resource limits

2015-09-24 Thread Moe Jette
we get to #4, we find that the limit is 2 and thus this becomes the new limiting factor. Is this the correct interpretation? The final goal here is to have a fairly lenient policy but have the ability to penalize certain users who abuse the system. Thanks, Bill -- Morris "Moe

[slurm-dev] Re: Compile of 15.08.0 fails on trusty missing mpio.h

2015-09-22 Thread Moe Jette
All the best, Chris -- Christopher Samuel Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Large job socket timed out errors.

2015-09-22 Thread Moe Jette
I suspect that you are hitting some Linux system limit, such as open files, or socket backlog. For information on how to address, see: http://slurm.schedmd.com/big_sys.html Quoting Timothy Brown <timothy.brow...@colorado.edu>: Hi Moe, On Sep 21, 2015, at 10:02 PM, Moe Jet

[slurm-dev] Re: Large job socket timed out errors.

2015-09-21 Thread Moe Jette
tion (and possibly not increase the logging of slurm, which is my only next idea). Thanks for your thoughts Chris. Timothy= -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm User Group Meeting, presentations online

2015-09-21 Thread Moe Jette
Thanks to everyone who helped make the Slurm User Group Meeting last week a big success. Copies of the presentations are now on-line here: http://slurm.schedmd.com/publications.html -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: building slurm 15.08.0 with json support

2015-09-08 Thread Moe Jette
tches the header file that I work with too (no build errors that you see). That particular module was designed specifically for SGI systems. Perhaps you could directly send me (i.e. not to the list) your netloc.h file? -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slu

[slurm-dev] Re: jobs arrays & slurm_load_job changes in 14.11+

2015-09-02 Thread Moe Jette
ob IDs for individual tasks are selected at resource allocation time. I think the drmaa wrapper I use assumes we get back a list of job id's from the array submission. Any way to query this information? Generally not before resource allocation time. On Tue, Sep 1, 2015 at 5:56 PM, Moe Jet

[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-02 Thread Moe Jette
u.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting, 15-16 September 2015, Washington D.C. http://slurm.schedmd.com/slurm_ug_agenda.html

[slurm-dev] Re: jobs arrays & slurm_load_job changes in 14.11+

2015-09-02 Thread Moe Jette
er API calls do. I'm not an expert on the drmaa code base though so might be more gotchas lurking, however it certainly sounds less daunting to me than redesigning the drmaa api's ;-) On Wed, Sep 2, 2015 at 3:32 PM, Moe Jette <je...@schedmd.com> wrote: The Slurm APIs recognize two diffe

[slurm-dev] Re: Issues with --switches option

2015-09-02 Thread Moe Jette
ches=v[1-8] -Mark -- Morris "Moe" Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting, 15-16 September 2015, Washington D.C. http://slurm.schedmd.com/slurm_ug_agenda.html

[slurm-dev] Re: jobs arrays & slurm_load_job changes in 14.11+

2015-09-01 Thread Moe Jette
ew().) I guess this behaviour has changed in 14.11+, because record_count is now == 1. I see this new job_array_resp_msg_t type, but I don't see a corresponding load_job function in slurm.h using it. How does the handling of job arrays work now in slurm >14.03? -- Morris "Moe"

[slurm-dev] Re: MPI-OpenMP jobs on SLURM fail ORTE_ERROR_LOG: Not found in file ess_slurmd_module.c

2015-08-27 Thread Moe Jette
=UNKNOWN NodeName=DEFAULT Procs=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=129009 State=UNKNOWN NodeName=erik[001-044] PartitionName=erik Nodes=erik[001-044] Default=YES MaxTime=INFINITE State=UP -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re:

2015-08-27 Thread Moe Jette
-- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting, 15-16 September 2015, Washington D.C. http://slurm.schedmd.com/slurm_ug_agenda.html

[slurm-dev] Re: srun unable to start in tight loop

2015-08-24 Thread Moe Jette
of successful runs varies by 10 or 20. Am I using up some resource with each run? For completeness, I am running on a Cray system with SLURM 14.11.8 Thanks, Bob -- Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227 -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development

[slurm-dev] Re: sview: Unselect job?

2015-08-21 Thread Moe Jette
is selected, only the nodes corresponding to the job are shown. With Slurm 2.4.5 I could unselect the job by pressing either Alt or Ctl while clicking the job. With Slurm 14.11.08 this doesn't seem to work. Any ideas? Loris -- This signature is currently under construction. -- Morris Moe

[slurm-dev] Re: Non-default settings of AuthInfo not being consistently propagated

2015-08-21 Thread Moe Jette
these two patches and probably each replacement should be checked by someone who knows more of the surrounding context. Best regards, Daniel On Fri, Aug 14, 2015 at 12:57 AM, Moe Jette je...@schedmd.com wrote: Hi Daniel, You seem to have found two places where the AuthInfo configuration

[slurm-dev] Slurm version 15.08.0-rc1 is now available

2015-08-20 Thread Moe Jette
join us at the Slurm User Group meeting: http://slurm.schedmd.com/slurm_ug_agenda.html -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting, 15-16 September 2015, Washington D.C. http

[slurm-dev] Re: Non-default settings of AuthInfo not being consistently propagated

2015-08-13 Thread Moe Jette
Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting, 15-16 September 2015, Washington D.C. http://slurm.schedmd.com/slurm_ug_agenda.html

[slurm-dev] Re: --comment on srun within salloc

2015-07-31 Thread Moe Jette
--comment also works fine. However, srun --comment from within an salloc appears as an empty field when viewed with sacct. Martin -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support === Slurm User Group Meeting

[slurm-dev] Re: Slurm versions 14.11.8 and 15.08.0-pre6 are now available

2015-07-08 Thread Moe Jette
...@buffalo.edu: Moe, On 7/7/15 7:04 PM, Moe Jette wrote: -- Backfill scheduler: The configured backfill_interval value (default 30 seconds) is now interpreted as a maximum run time for the backfill scheduler. Once reached, the scheduler will build a new job queue and start over, even if not all

[slurm-dev] Re: PATCH - update job QOS before partition

2015-07-07 Thread Moe Jette
Texas A&M University Academy for Advanced Telecommunications and Learning Technologies Phone: (979)458-2396 Email: treyd...@tamu.edu Jabber: treyd...@tamu.edu -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: job requeued in held state

2015-07-07 Thread Moe Jette
RPC: REQUEST_SUSPEND(resume) from uid=0 [2015-07-06T20:31:18.469] _slurm_rpc_suspend(resume) for 8 Job is pending execution What can we do to continue execution without breaking or cancelling it? -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm versions 14.11.8 and 15.08.0-pre6 are now available

2015-07-07 Thread Moe Jette
allocation. -- Add association usage information to scontrol show cache command output. -- MPI/MVAPICH plugin now requires Munge for authentication. -- job_submit/lua: Add default_qos fields. Add job record qos. Add partition record allow_qos and qos_char fields. -- Morris Moe Jette CTO

[slurm-dev] Re: [RFC PATCH] srun: Enable output processing on stdout in pty mode

2015-07-01 Thread Moe Jette
. Sebastian M. Schmidt -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Job name truncated in email

2015-06-24 Thread Moe Jette
I look at squeue's output, it is truncated in the same way, but not when I issue `scontrol show jobid -d` wkr, Kenny On 24 June 2015 at 16:57, Moe Jette je...@schedmd.com wrote: There is no truncation in current code (src/slurmctld/agent.c): if (job_ptr->array_task_id != NO_VAL

[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Moe Jette
for Research Computing, University of Oslo -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Scheduling of GPU resources

2015-06-22 Thread Moe Jette
with how to achieve my desired setup would be greatly appreciated, because I am unsure how to carry on with troubleshooting. Best, Antonia -- Dr. Antonia Mey University of Edinburgh Department of Chemistry Joseph Black Building -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development

[slurm-dev] Re: Question about running interactive job on all cores on a list of heterogeneous nodes.

2015-06-11 Thread Moe Jette
inconsistent. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Slow powering up nodes seen as rebooted nodes (ReturnToService=1)

2015-06-10 Thread Moe Jette
) { node_ptr->node_state = NODE_STATE_ALLOCATED | node_flags; Didier -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Set 1 job per core.

2015-06-09 Thread Moe Jette
SlurmctldLogFile=/site/slurm/log/slurmctld.log SlurmdDebug=3 SlurmdLogFile=/site/slurm/log/slurmd.log -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Limit user to use not more than N number of ntasks/cpu's on specific partition

2015-06-08 Thread Moe Jette
to configure it like that so there will be no situation that one user is using all resources in the partition. I want that my partition will not allow to user use more than N number of cpus per day or per partition. Thanks, Igor. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm

[slurm-dev] Re: scancel job_id.step_id fails in Slurm 14.11.3

2015-05-27 Thread Moe Jette
] Andy -- Andy Riebs Hewlett-Packard Company High Performance Computing +1 404 648 9024 My opinions are not necessarily those of HP -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm User Group Meeting 2015, Abstracts for talks due 1 June

2015-05-25 Thread Moe Jette
) -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Elastic Computing Question

2015-05-14 Thread Moe Jette
-- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Elastic Computing Question

2015-05-14 Thread Moe Jette
git code, so I should be able to try that out once I get the latest code installed. Eric On 5/14/15 3:10 PM, Moe Jette wrote: There were some changes made to Slurm version 15.08 to support this type of problem, but they are not available with earlier versions. With the new version (not yet

[slurm-dev] Re: (Custom) warnings from job_submit.lua?

2015-05-11 Thread Moe Jette
: warning: foobar Is that possible? (This is slurm 14.03.7, btw.) -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: bug fix in task error reporting

2015-05-08 Thread Moe Jette
); } -- Jon Nelson Dyn / Senior Software Engineer p. +1 (603) 263-8029 -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: slurmd on first node not responding and is running

2015-05-06 Thread Moe Jette
node is not available (down or drained). The issue is in Munge's configuration, which Slurm uses for authentication. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm User Group Meeting 2015, CFP

2015-05-05 Thread Moe Jette
2015: Slurm User Group Meeting 2015 *Program Committee:* Yiannis Georgiou (Bull) Brian Gilmer (Cray) Matthieu Hautreux (CEA) Morris Jette (SchedMD) Bruce Pfaff (NASA Goddard Space Flight Center) Tim Wickberg (The George Washington University) -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm

[slurm-dev] Re: How to ship a SPANK plugin

2015-05-01 Thread Moe Jette
Voice: (651) 605-9034 Programming Environment -- Debugger Dev. FAX: (651) 605-8972 Cray Inc. 380 Jackson St. Suite 210 Email: r...@cray.com St. Paul, MN 55101 URL: http://www.cray.com/ -- Morris Moe Jette CTO, SchedMD LLC

[slurm-dev] Re: How to ship a SPANK plugin

2015-05-01 Thread Moe Jette
it implements: slurm_spank_local_user_init slurm_spank_exit and uses: slurm_debug slurm_error spank_get_item spank_context // perhaps unnecessarily Is that still simple enough? Bob On Fri, 1 May 2015, Moe Jette wrote: If your plugin is as simple as you describe, then you

[slurm-dev] Re: Spreading Job Array Across Different Nodes

2015-05-01 Thread Moe Jette
a non-issue. The motivation for this is network-related. It could be advantageous to spread job arrays across multiple nodes (and, more importantly, multiple racks/broods connected to different switches) if the tasks are network-bound. Thanks, Will -- Morris Moe Jette CTO, SchedMD LLC

[slurm-dev] Re: since version 14.11.6 srun takes 2 cpu by default

2015-04-29 Thread Moe Jette
mail: veronique.legr...@pasteur.fr Tel: 01 44 38 95 03 -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: sbcast, prolog and SPANK.

2015-04-29 Thread Moe Jette
you actually send a job to the node. I.e. you can send data to a node with sbcast before the prolog this might not be an expected/wanted behaviour. Best, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development

[slurm-dev] Re: Truncated array_task_str from slurm_job_info_t api

2015-04-27 Thread Moe Jette
,... Is there any method to obtain the full list of tasks actually scheduled via the slurm_job_info_t struct? Cheers, -JX -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm version 14.11.6 is now available

2015-04-23 Thread Moe Jette
a release on a reservation on the slurmctld for a batch job. This is already handled on the stepd when the script finishes. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: What is SPANK logging function slurm_debug?

2015-04-21 Thread Moe Jette
to change config for DebugFlags. scontrol show config says it is (null). So that doesn't seem to explain how the other debug entries have been enabled. Any idea what else I am missing? Thanks, Bob On Mon, 20 Apr 2015, Moe Jette wrote: You'll need to increase the verbosity of messages

[slurm-dev] Re: prevent slurm from parsing the full script

2015-04-21 Thread Moe Jette
Gesellschaft: Hamburg Amtsgericht Hamburg HRB 39784 -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: What is SPANK logging function slurm_debug?

2015-04-20 Thread Moe Jette
terminal, but not the slurm_debug. Neither show up in slurmctld.log, where I was expecting to see the slurm_debug. Any idea what I am doing wrong? Thanks, Bob On Fri, 17 Apr 2015, Moe Jette wrote: Messages printed with those functions are only seen if someone has the daemons configured to print

[slurm-dev] Re: Single node configuration- CPU resources

2015-04-20 Thread Moe Jette
the number of jobs that can be run concurrently on a single shared node (assuming available resources)? Thanks, Peter. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: What is SPANK logging function slurm_debug?

2015-04-17 Thread Moe Jette
? The man page and Schedmd documentation are silent on these. Thanks, Bob -- Bob Moench (rwm); PE Debugger Development; 605-9034; 354-7895; SP 24227 -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Some work and thoughts about the backfill scheduler in Slurm.

2015-04-13 Thread Moe Jette
for Slurm to be usable for us in the future. Not just small patches that add some tweaks to get us through the day. Best regards, Magnus -- Magnus Jonsson, Developer, HPC2N, Umeå Universitet -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: requeue all jobs

2015-04-08 Thread Moe Jette
immediately notify the sender via telephone or return mail. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: scancel and old data.

2015-04-07 Thread Moe Jette
. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: scancel and old data.

2015-04-07 Thread Moe Jette
quite good for sending an email to the user when a task is finished. That is why we need a different instrument to do it. Thank you for your help. 2015-04-07 20:47 GMT+03:00 Moe Jette je...@schedmd.com: Configure an Epilog script to do that. Quoting Anatoliy Kovalenko tolik.kovale...@gmail.com

[slurm-dev] Re: --reboot

2015-04-06 Thread Moe Jette
computing center university of chicago 773.702.1104 -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Upgrade Rollbacks

2015-04-02 Thread Moe Jette
. What is the procedure for rolling back a minor and major release of slurm in case something goes wrong? -Paul Edmon- -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Handle cancel from job submit plugin

2015-04-02 Thread Moe Jette
, is it possible to catch that in a job submit plugin? Or some other plugin type? Thanks Martins -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: CG state forever?

2015-04-02 Thread Moe Jette
page. Quoting Moe Jette je...@schedmd.com: Slurm can't kill the process, so does not reallocate those resources. See: http://slurm.schedmd.com/troubleshoot.html#completing Quoting Michael Colonno mcolo...@stanford.edu: Hi ~ I've run into this issue with several different versions

[slurm-dev] Re: CG state forever?

2015-04-02 Thread Moe Jette
on updating versions to the latest but is there anything I can do to prevent or circumvent this? Thanks, ~Mike C. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: job req fields in lua plugin

2015-04-02 Thread Moe Jette
-- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] A way of abuse the priority option in Slurm?

2015-03-31 Thread Moe Jette
/commit/4454316ef527b8700743d94c958811a39609e7d5.patch -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Error connecting slurm stream socket at IP:6817: Connection refused

2015-03-30 Thread Moe Jette
: debug2: _slurm_connect failed: Connection refused slurmd: debug2: Error connecting slurm stream socket at 172.16.40.42:6817: Connection refused slurmd: debug: Failed to contact primary controller: Connection refused . . . Someone can help? Bests, Jorge Góis -- Morris Moe Jette CTO, SchedMD

[slurm-dev] Re: Array tasks log files

2015-03-25 Thread Moe Jette
? Thank you for your help ! Philippe -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Summary of Slurm commands and options online

2015-03-25 Thread Moe Jette
There is a two-page summary of Slurm commands and options available online: http://slurm.schedmd.com/pdfs/summary.pdf We plan to make cards available with this information at upcoming conferences. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm versions 14.11.5 and 15.08.0-pre3 are now available

2015-03-19 Thread Moe Jette
commands. Added new partition configuration parameter ExclusiveUser=yes|no. -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Propagation of limits using ulimit for SLURM daemons

2015-03-17 Thread Moe Jette
script on the compute node. Curiosity is piqued as to why the SLURM daemon doesn't obtain the correct values from the environment. Does anyone know? Kelly -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller

2015-03-17 Thread Moe Jette
: No acct_gather.conf file (/etc/slurm-llnl/acct_gather.conf) -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: query only submission times

2015-03-17 Thread Moe Jette
Office: 211A | Phone: 617-496-7468 == -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: limits for GRES resources?

2015-03-13 Thread Moe Jette
Sorry, not with the current code. Quoting Bill Wichser b...@princeton.edu: Is there any way to set a limit in a QOS for GRES resources? Bill -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Array job: get number of array tasks in batch script

2015-03-13 Thread Moe Jette
/opt.c seems beyond my reach to be frank. Is there a way I can submit a feature request? On Fri, Mar 13, 2015 at 4:48 PM, Moe Jette je...@schedmd.com wrote: That information

[slurm-dev] Re: Array job: get number of array tasks in batch script

2015-03-13 Thread Moe Jette
in the array job handling a different slice. Thanks for your help, jc -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: confused by some values in `scontrol show job`

2015-03-12 Thread Moe Jette
of the output: * SecsPreSuspend * ReqB:S:C:T * NtasksPerN:B:S:C * Socks/Node (I think I get it; but it’s not present in scontrol manpage) * CoreSpec Could someone explain these to me? ~jonathon -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Prolog for requeued jobs not run on all nodes

2015-03-05 Thread Moe Jette
restarts by setting begin_time. Unfortunately I probably will not have time to look at this more myself. Pär Lindfors, NSC -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Bug in Slurm (14.11.3) when running under debugger and executable not existing?

2015-03-02 Thread Moe Jette
: step_launch_notify_io_failure: aborting, io error with slurmstepd on node 0 Regards, Dirk -- Dirk Schubert - Lead Software Developer || Allinea Software -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Exporting a variable in a TaskProlog

2015-03-02 Thread Moe Jette
? Would you have a better / more elegant solution? -- DANY TELLO -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: mixing mpi and per node tasks

2015-03-02 Thread Moe Jette
. However, in that case to use srun and use all the cores, extra options were needed. I know that, but wanted to provide you with a more general solution. -Original Message- From: Moe Jette [mailto:je...@schedmd.com] Sent: Tuesday, 3 March 2015 3:42 AM To: slurm-dev Subject: [slurm

[slurm-dev] Re: mixing mpi and per node tasks

2015-03-02 Thread Moe Jette
Quoting gareth.willi...@csiro.au: -Original Message- From: Moe Jette [mailto:je...@schedmd.com] Sent: Tuesday, 3 March 2015 9:54 AM -snip- The options for srun, sbatch, and salloc are almost identical with respect to specification of a job's allocation requirements. Yes. Part

[slurm-dev] Re: mixing mpi and per node tasks

2015-03-02 Thread Moe Jette
any solution to work with hybrid mpi/openmp with one openmp task per node or per socket. Thanks, Gareth -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Re: Prolog for requeued jobs not run on all nodes

2015-02-27 Thread Moe Jette
at all. I have not done any detailed analysis of this case, but I would guess something similar is causing this. Regards, Pär Lindfors, NSC -- Morris Moe Jette CTO, SchedMD LLC Commercial Slurm Development and Support

[slurm-dev] Slurm version 14.03.0 is now available

2014-03-26 Thread Moe Jette
Slurm version 14.03.0 is now available. This is a major Slurm release with many new features. See the RELEASE_NOTES and NEWS files in the distribution for detailed descriptions of the changes, a few of which are noted below. Upgrading from Slurm versions 2.5 or 2.6 should proceed without

[slurm-dev] Re: fix select_nodeinfo_set_all in select/linear

2014-03-25 Thread Moe Jette
Your patch will be in the next release of version 14.03. Thank you! https://github.com/SchedMD/slurm/commit/18ca8adf9437cb1d7756537785ee6ee573249f66 Quoting Hongjia Cao hj...@nudt.edu.cn: allocated but drained node will be shown mixed by sinfo.

[slurm-dev] Re: slurm_jobid2pid

2014-03-24 Thread Moe Jette
The closest thing available today is the scontrol listpids command described on the scontrol man page. Quoting Ulf Markwardt ulf.markwa...@tu-dresden.de: Dear developers, in the API, I can find a function slurm_pid2jobid, that's fine. For our monitoring, we need the inverse function,

[slurm-dev] Re: job_submit.lua and custom user message

2014-03-15 Thread Moe Jette
[0] == 0x42) failed Best Regards, Tommi Tervo CSC On Tuesday, March 11, 2014 4:35 PM, Moe Jette je...@schedmd.com wrote: I don't have time to test this right now, but believe the commit below  will fix the problem by initializing a variable to NULL. https://github.com/SchedMD/slurm/commit

[slurm-dev] Re: job_submit.lua and custom user message

2014-03-11 Thread Moe Jette
I don't have time to test this right now, but believe the commit below will fix the problem by initializing a variable to NULL. https://github.com/SchedMD/slurm/commit/e3363b95b0cedd4972c8c7b8dc87a1750f6bc3dd Quoting Marco Passerini marco.passer...@csc.fi: Hi, I'm trying Slurm 14.03.0,

[slurm-dev] Re: HA : not switching fast from master to backup server

2014-03-05 Thread Moe Jette
See the configuration parameter SlurmctldTimeout as described here: http://slurm.schedmd.com/slurm.conf.html Quoting Marc Vecsys vecsys@gmail.com: Hi, it takes 5 min for the backup controller to start after the master failed. Is there any setting to make the switch faster? Thanks, Marc

[slurm-dev] Re: Prevent interactive session longer than x minutes

2014-03-05 Thread Moe Jette
The job submit data structure does not have a batch_flag. Check if script is NULL or not. Quoting Oriol Mula-Valls omv.li...@gmail.com: Hi, I am creating a LUA plugin and I am trying to prevent the interactive jobs longer than 8h. How can I know if a job is interactive or not? I've tried
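The check suggested in the reply above (treat a job as interactive when its script is NULL) can be sketched roughly as follows. This is not Lua plugin code and not the Slurm API: the job description is modelled as a plain Python dictionary, the keys "script" and "time_limit" are stand-ins chosen for illustration, the time limit is assumed to be in minutes, and the 8 hour cap is taken from the question being answered.

```python
MAX_INTERACTIVE_MINUTES = 8 * 60  # the 8 h limit mentioned in the question


def is_interactive(job_desc):
    """A job with no batch script is treated as interactive."""
    return job_desc.get("script") is None


def accept_job(job_desc):
    """Return True if the job should be accepted, False if rejected.

    Assumption: job_desc["time_limit"] is a number of minutes. A missing
    limit is treated as 0 here; a real plugin would also need to handle
    whatever "unset" value the scheduler uses.
    """
    time_limit = job_desc.get("time_limit", 0)
    if is_interactive(job_desc) and time_limit > MAX_INTERACTIVE_MINUTES:
        return False
    return True


if __name__ == "__main__":
    print(accept_job({"script": None, "time_limit": 600}))            # interactive, 10 h
    print(accept_job({"script": "#!/bin/sh\n", "time_limit": 600}))   # batch, 10 h
    print(accept_job({"script": None, "time_limit": 120}))            # interactive, 2 h
```

Keeping the interactivity test in its own function means the same check can be reused for other policies, such as routing interactive jobs to a particular partition.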
