You can try utilizing SLURM_HINT=nomultithread in the user's environment. This allows you to have multithreading turned on in the BIOS but not in use by default with Slurm. Just keep in mind it's a hint.

You would configure Slurm as if it had hyperthreading on, like so:

    NodeName=compute-[029-083] RealMemory=64000 Sockets=2 CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN

Then when you wanted to test with hyperthreading you would use --hint=multithread on the srun or sbatch command line.

-----------------------------------------------------------------------------------

On 05/18/16, Davide Vanzo <[email protected]> wrote:

The thing is that disabling HT via the OS or via the BIOS may not be the same, as you can see in this thread:
https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/480007

Moreover, I wouldn't be surprised if hwloc (which SLURM uses for affinity binding) were "insensitive" to OS-disabled HT. However, when you disable it via the BIOS there will be no ambiguity.

Davide

On Wed, 2016-05-18 at 07:58 -0700, Jason Bacon wrote:

No, opted against that in case we want to experiment with hyperthreading in the future without having to reboot.

How might that affect SLURM?

Thanks,
JB

On 05/18/16 09:24, Davide Vanzo wrote:

Jason,

Have you tried disabling HT from the BIOS instead of doing it from the OS?

Davide

On Wed, 2016-05-18 at 06:02 -0700, Jason Bacon wrote:

Just leaving a trail for future Googlers. My colleague did an extensive search for answers and came up empty.

We ran into an issue after disabling hyperthreading on one of our CentOS clusters. Here's the scenario:

- Our compute nodes had hyperthreading enabled while we evaluated the costs and benefits.
- SLURM was configured to schedule only one job per real core. For example, nodes with 24 real cores / 48 virtual are configured as follows:

      NodeName=compute-[029-083] RealMemory=64000 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 State=UNKNOWN

- I added a command to /etc/rc.d/rc.local to disable hyperthreading on the next reboot.
- No changes were made to slurm.conf.
- After rebooting with hyperthreading disabled, certain jobs landing on the node would fail with the following error:

      slurmstepd: Failed task affinity setup

- Restarting the scheduler cleared up the issue.

Does anybody know what would cause this? My best hypothesis is that slurmctld is caching some probed hardware info from slurmd that changed when hyperthreading was disabled.

Cheers,
Jason
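The hint-based setup suggested at the top of the thread can be sketched as follows. The node definition is the one quoted in the thread; the profile-script path and the job name are assumptions for illustration:

```shell
# slurm.conf: describe the hardware with hyperthreading visible
# (the ThreadsPerCore=2 line from the reply above):
#
#   NodeName=compute-[029-083] RealMemory=64000 Sockets=2 \
#       CoresPerSocket=12 ThreadsPerCore=2 State=UNKNOWN

# Make "no hyperthreading" the default for users, e.g. via a
# login profile script (the path is an assumption):
#   /etc/profile.d/slurm_hint.sh
export SLURM_HINT=nomultithread

# Jobs submitted now are placed one task per physical core.
# To experiment with hyperthreading, override the hint per job
# (./my_app is a hypothetical workload):
srun --hint=multithread -n 48 ./my_app
# or in a batch script:
#   #SBATCH --hint=multithread
```

This keeps hyperthreading enabled in the BIOS while letting individual jobs opt in, which is exactly the "no reboot needed" flexibility Jason wanted to preserve.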

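The thread doesn't show the rc.local command Jason used. One common way to disable hyperthreading from the OS is to offline the sibling thread of each physical core via sysfs; a hedged sketch is below (the helper names are mine, and the exact sysfs entries depend on the kernel and machine):

```shell
#!/bin/sh
# Given a thread_siblings_list entry such as "0,24" or "0-1",
# print the first (primary) CPU id in the sibling group.
first_sibling() {
    printf '%s\n' "$1" | cut -d',' -f1 | cut -d'-' -f1
}

# Offline every logical CPU that is not the primary thread of its
# core. Requires root; intended for something like /etc/rc.d/rc.local.
disable_hyperthreading() {
    for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
        siblings="$cpu/topology/thread_siblings_list"
        [ -r "$siblings" ] || continue
        id=${cpu##*/cpu}
        if [ "$id" != "$(first_sibling "$(cat "$siblings")")" ]; then
            echo 0 > "$cpu/online"   # offline the sibling hyperthread
        fi
    done
}
```

Note that, per the discussion above, offlining CPUs this way changes the topology slurmd probes at startup, which is consistent with Jason's observation that restarting the scheduler cleared the affinity errors.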