The thing is that disabling HT from the OS and disabling it in the BIOS
may not be equivalent, as you can see in this thread:
https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/480007
Moreover, I wouldn't be surprised if hwloc (which SLURM uses for
affinity binding) were "insensitive" to HT being disabled from the OS.
However, when you disable it in the BIOS there is no ambiguity.
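
If you want to see what the OS currently exposes and compare it with
what slurm.conf promises, something along these lines might help. It is
only a rough sketch: the helper names (parse_cpu_list, expected_cpus,
online_report) are made up for illustration, it assumes the standard
Linux sysfs layout under /sys/devices/system/cpu, and it assumes the
node line spells out Sockets, CoresPerSocket and ThreadsPerCore.

```python
import re
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-11,24-35' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A single CPU ("5") has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def expected_cpus(node_line):
    """CPU count implied by a slurm.conf node line.

    Assumes Sockets, CoresPerSocket and ThreadsPerCore are all present;
    a missing key raises KeyError rather than guessing a default.
    """
    fields = dict(re.findall(r"(\w+)=(\S+)", node_line))
    return (int(fields["Sockets"])
            * int(fields["CoresPerSocket"])
            * int(fields["ThreadsPerCore"]))


def online_report(sysfs=Path("/sys/devices/system/cpu")):
    """Return (online CPU count, online CPUs that list more than one thread sibling)."""
    online = parse_cpu_list((sysfs / "online").read_text())
    with_sibling = 0
    for cpu in online:
        sib = sysfs / f"cpu{cpu}" / "topology" / "thread_siblings_list"
        if len(parse_cpu_list(sib.read_text())) > 1:
            with_sibling += 1
    return len(online), with_sibling


if __name__ == "__main__":
    # Node line taken from the original message quoted below.
    line = ("NodeName=compute-[029-083] RealMemory=64000 Sockets=2 "
            "CoresPerSocket=12 ThreadsPerCore=1 State=UNKNOWN")
    print("slurm.conf expects", expected_cpus(line), "CPUs")
    if Path("/sys/devices/system/cpu/online").exists():
        n_online, n_sibling = online_report()
        print("kernel reports", n_online, "online CPUs,",
              n_sibling, "of which still list a thread sibling")
```

For the node line from the original message, expected_cpus gives
2 * 12 * 1 = 24. If the kernel reports a different number of online
CPUs, or some of them still list a thread sibling, the OS-level change
did not produce the layout SLURM was told about.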
Davide
On Wed, 2016-05-18 at 07:58 -0700, Jason Bacon wrote:
> No, I opted against that in case we want to experiment with hyperthreading
> in the future without having to reboot.
>
> How might that affect SLURM?
>
> Thanks,
>
> JB
>
> On 05/18/16 09:24, Davide Vanzo wrote:
> >
> > Jason,
> > have you tried disabling HT from the BIOS instead of doing it from the OS?
> >
> > Davide
> >
> >
> >
> > On Wed, 2016-05-18 at 06:02 -0700, Jason Bacon wrote:
> > >
> > > > Just leaving a trail for future Googlers. My colleague did an extensive
> > > > search for answers and came up empty.
> > >
> > > > We ran into an issue after disabling hyperthreading on one of our CentOS
> > > > clusters.
> > >
> > > Here's the scenario:
> > >
> > > > - Our compute nodes had hyperthreading enabled while we evaluated the
> > > > costs and benefits.
> > >
> > > > - SLURM was configured to schedule only one job per real core. For
> > > > example, nodes with 24 cores / 48 virtual are configured as follows:
> > >
> > > > NodeName=compute-[029-083] RealMemory=64000 Sockets=2 CoresPerSocket=12 ThreadsPerCore=1 State=UNKNOWN
> > >
> > > > - I added a command to /etc/rc.d/rc.local to disable hyperthreading
> > > > on the next reboot.
> > >
> > > - No changes were made to slurm.conf.
> > >
> > > > - After rebooting with hyperthreading disabled, certain jobs landing
> > > > on the node would fail with the following error:
> > >
> > > slurmstepd: Failed task affinity setup
> > >
> > > - Restarting the scheduler cleared up the issue.
> > >
> > > > Does anybody know what would cause this? My best hypothesis is that
> > > > slurmctld is caching some probed hardware info from slurmd that changed
> > > > when hyperthreading was disabled.
> > >
> > > Cheers,
> > >
> > > Jason
> > >
>