We are pleased to announce the formal release of SLURM 2.4.0, along with the first development release of 2.5.

Both are available now for download at http://www.schedmd.com/#repos.

If you are developing new code, please work against the master git repo https://github.com/SchedMD/slurm. It is updated constantly, so tracking it keeps conflicts to a minimum.

*Note to BGQ early adopters:* Recent changes require the runjob_mux to run as your SLURM user. The plugin_flags must also be updated to avoid a possible runjob_mux crash if you start a job and turn off the slurmctld at the same time. Please read the updated bluegene web page http://schedmd.com/slurmdocs/bluegene.html and look for "System Administration for BlueGene/Q only" for full instructions.

Thanks for all your help and support. Among other things, 2.4 brings substantial performance enhancements and many other improvements, many of which are described in the RELEASE_NOTES file in the code.

As always, if you find any bugs, let us know through http://bugs.schedmd.com or the slurm-dev list.

Below are changes for 2.4.0 and 2.5.0-pre1 since the last tag.

* Changes in SLURM 2.4.0
========================
 -- Cray - Improve support for zero compute node resource allocations.
    Partition used can now be configured with no nodes.
 -- BGQ - make it so srun -i<taskid> works correctly.
 -- Fix parse_uint32/16 to complain if a non-digit is given.
 -- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon
    Bringhurst (LANL).
 -- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without
    compiling with --enable-debug
 -- Modify scontrol to require "-dd" option to report batch job's script. Patch
    from Don Albert, Bull.
 -- Modify SchedulerParameters option to match documentation: "bf_res="
    changed to "bf_resolution=". Patch from Rod Schultz, Bull.
 -- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
 -- In etc/init.d/slurm move check for scontrol after sourcing
    /etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
 -- Fix in scheduling logic that can delay jobs with min/max node counts.
 -- BGQ - fix issue so that if a step uses the entire allocation and the
    next step in the allocation only uses part of the allocation, the next
    step gets the correct cnodes.
 -- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1; the
    previous function didn't always work correctly.
 -- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks
    to make a larger small block and are running with sub-blocks.
 -- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
 -- BGQ - When using an old IBM driver cnodes that go into error because of
    a job kill timeout aren't always reported to the system.  This is now
    handled by the runjob_mux plugin.
 -- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
 -- Improve memory consumption on step layouts with high task count.
 -- BGQ - quieter debug when the real time server comes back but there are
    still messages we find when we poll but haven't given it back to the real
    time yet.
 -- BGQ - fix for if a request comes in smaller than the smallest block and
    we must use a small block instead of a shared midplane block.
 -- Fix issues on large jobs (>64k tasks) to have the correct counter type when
    packing the step layout structure.
 -- BGQ - fix issue so that if a user asks for tasks and ntasks-per-node
    but not a node count, the node count is correctly figured out.
 -- Move logic to always use the 1st alphanumeric node as the batch host for
    batch jobs.
 -- BLUEGENE - fix race condition where if a nodeboard/card goes down at the
    same time a block is destroyed and that block just happens to be the
    smallest overlapping block over the bad hardware.
 -- Fix bug when querying accounting looking for a job node size.
 -- BLUEGENE - fix possible race condition if cleaning up a block and the
    removal of the job on the block failed.
 -- BLUEGENE - fix issue if a cable was in an error state; make it so we can
    check if a block is still makable if the cable wasn't in error.
 -- Put nodes names in alphabetic order in node table.
 -- If a preempted job should have a grace time and the preempt mode is not
    cancel, but the job is going to be canceled because it is interactive or
    for another reason, it now receives the grace time.
 -- BGQ - Modified documents to explain new plugin_flags needed in bg.properties
    in order for the runjob_mux to run correctly.
 -- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.
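
To illustrate the parse_uint32/16 item above, here is a minimal sketch in Python of what "complain if a non-digit is given" means in practice. This is not SLURM's C code: the helper name parse_uint and its bits parameter are hypothetical, and the range check is an assumption about what a fixed-width parser would reasonably enforce.

```python
def parse_uint(text, bits=32):
    """Parse a decimal string as an unsigned integer of the given width.

    Hypothetical helper for illustration only; not SLURM's implementation.
    """
    # Reject empty strings and anything that is not a plain ASCII digit,
    # e.g. "64k", "-1" or " 12".
    if not (text.isascii() and text.isdigit()):
        raise ValueError("not an unsigned integer: %r" % text)
    value = int(text)
    # Assumption: values that do not fit in the target width are also errors.
    if value >= (1 << bits):
        raise ValueError("%r does not fit in %d bits" % (text, bits))
    return value
```

With this sketch, parse_uint("64") returns 64, while parse_uint("64k") raises ValueError instead of silently returning a partial value.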

* Changes in SLURM 2.5.0.pre1
=============================
 -- Add new output to "scontrol show configuration" of LicensesUsed. Output is
    "name:used/total"
 -- Changed jobacct_gather plugin infrastructure to be cleaner and easier to
    maintain.
 -- Change license option count separator from "*" to ":" for consistency
    with the gres option (e.g. "--licenses=foo:2 --gres=gpu:2"). The "*" will
    still be accepted, but is no longer documented.
 -- Permit more than 100 jobs to be scheduled per node (new limit is 10,000
    jobs).
 -- Restructure of srun code to allow outside programs to utilize existing
    logic.
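
For anyone with scripts that read or generate license strings, the two license-related items above can be sketched roughly as follows. This is a Python illustration, not SLURM code: the function names are hypothetical, and it assumes that multiple entries are comma-separated and that a license given without a count means a count of 1.

```python
import re

def parse_license_spec(spec):
    """Parse a --licenses value such as "foo:2,bar" into {name: count}.

    Hypothetical helper. Accepts the new ":" separator and the older "*".
    """
    counts = {}
    for item in spec.split(","):  # assumption: comma-separated entries
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"([^:*]+)(?:[:*](\d+))?", item)
        if match is None:
            raise ValueError("bad license entry: %r" % item)
        name, count = match.groups()
        # Assumption: a missing count means one license.
        counts[name] = int(count) if count is not None else 1
    return counts

def parse_licenses_used(value):
    """Parse a LicensesUsed value of the form "name:used/total".

    Hypothetical helper. Returns {name: (used, total)}.
    """
    usage = {}
    for item in value.split(","):  # assumption: comma-separated entries
        item = item.strip()
        if not item:
            continue
        match = re.fullmatch(r"([^:]+):(\d+)/(\d+)", item)
        if match is None:
            raise ValueError("bad LicensesUsed entry: %r" % item)
        name, used, total = match.groups()
        usage[name] = (int(used), int(total))
    return usage
```

Both "foo:2" and "foo*2" parse to the same result here, which mirrors the note that "*" is still accepted.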
