We are pleased to announce the formal release of SLURM 2.4.0, as well as the
first development release of SLURM 2.5.
Both are available now for download at http://www.schedmd.com/#repos.
If you are developing new code, please develop against the master git repository
at https://github.com/SchedMD/slurm; it is updated constantly, so working against
it will minimize conflicts.
*Note to BGQ early adopters:* Recent changes require the runjob_mux to run as
your SLURM user, and the plugin_flags setting must also be updated to avoid a
possible runjob_mux crash if a job is being started while the slurmctld is
being shut down at the same time. Please see the updated bluegene web page at
http://schedmd.com/slurmdocs/bluegene.html, under "System Administration for
BlueGene/Q only", for full instructions.
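
As a rough sketch, the relevant settings live in the runjob_mux section of
bg.properties; the section name, plugin path, and flag value shown below are
placeholders only, so take the actual values from the bluegene page above:

    # Placeholder sketch of the runjob_mux plugin settings in bg.properties;
    # see "System Administration for BlueGene/Q only" for the real values.
    [runjob.mux]
        plugin       = /path/to/slurm/runjob_plugin.so
        plugin_flags = <value given in the SLURM bluegene documentation>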
Thanks for all your help and support. Among other things, 2.4 brings
substantial performance enhancements and many other improvements, which are
summarized in the RELEASE_NOTES file included with the code.
As always, if you find any bugs, please let us know through
http://bugs.schedmd.com or the slurm-dev list.
Below are changes for 2.4.0 and 2.5.0-pre1 since the last tag.
* Changes in SLURM 2.4.0
========================
-- Cray - Improve support for zero compute node resource allocations.
   The partition used can now be configured with no nodes.
-- BGQ - make it so srun -i<taskid> works correctly.
-- Fix parse_uint32/16 to complain if a non-digit is given.
-- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by
   Jon Bringhurst (LANL).
-- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
   compiling with --enable-debug.
-- Modify scontrol to require the "-dd" option to report a batch job's script
   (see the example after this list). Patch from Don Albert, Bull.
-- Modify SchedulerParameters option to match documentation: "bf_res=" changed
   to "bf_resolution=" (see the example after this list). Patch from
   Rod Schultz, Bull.
-- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
-- In etc/init.d/slurm move check for scontrol after sourcing
/etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
-- Fix scheduling logic that could delay jobs with min/max node counts.
-- BGQ - Fix issue so that if one step uses the entire allocation and the next
   step uses only part of the allocation, that next step gets the correct
   cnodes.
-- BGQ - Fix checking for IO on a block with the new IBM driver V1R1M1; the
   previous function didn't always work correctly.
-- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
   to make a larger small block and are running with sub-blocks.
-- BLUEGENE - Better logic for making small blocks around bad
nodeboard/card.
-- BGQ - When using an old IBM driver cnodes that go into error because of
a job kill timeout aren't always reported to the system. This is now
handled by the runjob_mux plugin.
-- BGQ - Added information on how to setup the runjob_mux to run as
SlurmUser.
-- Reduce memory consumption of step layouts with a high task count.
-- BGQ - Quieter debug output when the real time server comes back but there
   are still messages found while polling that haven't yet been given back to
   the real time server.
-- BGQ - Fix for the case where a request comes in smaller than the smallest
   block and a small block must be used instead of a shared midplane block.
-- Fix issue on large jobs (>64k tasks) to use the correct counter type when
   packing the step layout structure.
-- BGQ - Fix issue so that if a user asks for a task count and ntasks-per-node
   but not a node count, the node count is correctly figured out.
-- Move logic to always use the first node in alphanumeric order as the batch
   host for batch jobs.
-- BLUEGENE - Fix race condition that can occur if a nodeboard/card goes down
   at the same time a block is destroyed, and that block happens to be the
   smallest overlapping block over the bad hardware.
-- Fix bug when querying accounting for a job's node size.
-- BLUEGENE - fix possible race condition if cleaning up a block and the
removal of the job on the block failed.
-- BLUEGENE - Fix issue so that if a cable is in an error state, we can still
   check whether a block could be created if the cable were not in error.
-- Put node names in alphabetic order in the node table.
-- If a preempted job should have a grace time and the preempt mode is not
   cancel, but the job is going to be canceled anyway (because it is
   interactive or for some other reason), it now receives the grace time.
-- BGQ - Modified documents to explain the new plugin_flags needed in
   bg.properties in order for the runjob_mux to run correctly.
-- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid
warning.
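
The scontrol and SchedulerParameters changes noted above can be exercised
roughly as follows; the job id and resolution value here are illustrative only:

    # Display a batch job's script (now requires the -dd option):
    scontrol -dd show job 1234

    # slurm.conf: the backfill resolution parameter is now spelled out in full:
    SchedulerParameters=bf_resolution=60
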
* Changes in SLURM 2.5.0.pre1
=============================
-- Add new output to "scontrol show configuration" of LicensesUsed. Output is
   in the form "name:used/total" (see the example after this list).
-- Changed jobacct_gather plugin infrastructure to be cleaner and easier to
   maintain.
-- Change license option count separator from "*" to ":" for consistency with
   the gres option (e.g. "--licenses=foo:2 --gres=gpu:2"); see the example
   after this list. The "*" will still be accepted, but is no longer
   documented.
-- Permit more than 100 jobs to be scheduled per node (new limit is 10,000
jobs).
-- Restructure srun code to allow outside programs to utilize the existing
   logic.
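
To illustrate the new license syntax and the LicensesUsed output, a
hypothetical example (the license name, counts, and script name are made up):

    # Request two "foo" licenses using the new ":" separator:
    sbatch --licenses=foo:2 job.sh

    # "scontrol show configuration" then reports usage as name:used/total, e.g.:
    LicensesUsed = foo:2/10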