SLURM version 2.4.2 is now available from
http://www.schedmd.com/#repos

It includes many bug fixes, most of which are specific to IBM BlueGene.

* Changes in SLURM 2.4.2
========================
  -- BLUEGENE - Correct potential deadlock issue when hardware goes bad and
     there are jobs running on that hardware.
  -- If a job is submitted to more than one partition, its partition pointer
     can be set to an invalid value. This can result in an incorrect count of
     CPUs allocated on a node, resulting in over- or under-allocation of its
     CPUs.
     Patch by Carles Fenoy, BSC.
  -- Fix bug in task layout with select/cons_res plugin and --ntasks-per-node
     option. Patch by Martin Perry, Bull.
  -- BLUEGENE - Remove race condition in which the number of unused CPUs was
     not updated correctly if a block was removed while waiting for a job to
     finish on it.
  -- BGQ - make sure we have a valid block when creating or finishing a step
     allocation.
  -- BLUEGENE - If a large block (> 1 midplane) is in error and the underlying
     hardware is marked bad, remove the larger block and create a block over
     just the bad hardware, making the other hardware available to run on.
  -- BLUEGENE - Handle job completion correctly if an admin removes a block
     where other blocks on an overlapping midplane are running jobs.
  -- BLUEGENE - correctly remove running jobs when freeing a block.
  -- BGQ - correct logic to place multiple (< 1 midplane) steps inside a
     multi midplane block allocation.
  -- BGQ - Make it possible for a multi midplane allocation to run on more
     than 1 midplane but not the entire allocation.
  -- BGL - Fix syncing of users on a block. Patch from Tim Wickberg.
  -- Fix initialization of protocol_version for some messages to make sure it
     is always set when sending or receiving a message.
  -- Reset backfilled job counter only when explicitly cleared using scontrol.
     Patch from Alejandro Lucero Palau, BSC.
  -- BLUEGENE - Fix handling of blocks when a larger block will not free and,
     while it is attempting to free, underlying hardware is marked in error,
     making small blocks that overlap with the freeing block. This only
     applies to dynamic layout mode.
  -- Cray and BlueGene - Do not treat lack of usable front-end nodes when
     slurmctld daemon starts as a fatal error. Also preserve correct front-end
     node for jobs when there is more than one front-end node and the slurmctld
     daemon restarts.
  -- Correct parsing of srun/sbatch input/output/error file names so that only
     the name "none" is mapped to /dev/null and not any file name starting
     with "none" (e.g. "none.o").
  -- BGQ - Added version string to the load of the runjob_mux plugin to verify
     that the current plugin has been loaded when using
     runjob_mux_refresh_config.
  -- CGROUPS - Use the system mount/umount function calls instead of doing a
     fork/exec of mount/umount. Patch from Janne Blomqvist.
  -- BLUEGENE - Correct start time setup when no jobs are blocking the way.
     Patch from Mark Nelson.
  -- Fix sacct --state=S query to return information about jobs that are
     currently suspended or were suspended in the past.
  -- FRONTEND - Made error warning more apparent if a frontend node isn't
     configured correctly.
  -- BGQ - update documentation about runjob_mux_refresh_config which works
     correctly as of IBM driver V1R1M1 efix 008.
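
The srun/sbatch file-name parsing fix listed above can be illustrated with a
small sketch. This is not SLURM's code (SLURM is written in C); it is a
minimal Python illustration of the behavior the entry describes, and the
function names resolve_io_path_buggy and resolve_io_path_fixed are
hypothetical. It assumes the old behavior amounted to a prefix match, which
is what "any file name starting with none" suggests.

```python
DEV_NULL = "/dev/null"


def resolve_io_path_buggy(name: str) -> str:
    """Prefix comparison (assumed old behavior): any name that merely
    starts with "none", such as "none.o", is sent to /dev/null."""
    if name.startswith("none"):
        return DEV_NULL
    return name


def resolve_io_path_fixed(name: str) -> str:
    """Whole-string comparison: only the exact name "none" is special;
    every other name is returned unchanged."""
    if name == "none":
        return DEV_NULL
    return name


if __name__ == "__main__":
    # Compare both variants on the example from the changelog entry.
    for candidate in ("none", "none.o", "job.out"):
        print(candidate,
              "| prefix match:", resolve_io_path_buggy(candidate),
              "| exact match:", resolve_io_path_fixed(candidate))
```

With the exact comparison, a file named "none.o" is kept as a regular output
file, while "none" alone still discards output.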
