Re: [PATCH v3 1/2] x86/fpu: track AVX-512 usage of tasks

2018-11-20 Thread Samuel Neves
On Tue, Nov 20, 2018 at 1:32 PM Li, Aubrey  wrote:
> Thanks for your program, Samuel, it's very helpful. But I saw a different
> output on my side. May I have your glibc version?
>

This system is running glibc 2.27, and kernel 4.18.7. The Xeon Gold
5120 also happens to be one of the Skylake-SP models with a single
512-bit FMA unit, instead of 2.

Samuel.


Re: [PATCH v3 1/2] x86/fpu: track AVX-512 usage of tasks

2018-11-18 Thread Samuel Neves
On 11/17/18 12:36 AM, Li, Aubrey wrote:
> On 2018/11/17 7:10, Dave Hansen wrote:
>> Just to be clear: there are 3 AVX-512 XSAVE states:
>>
>>  XFEATURE_OPMASK,
>>  XFEATURE_ZMM_Hi256,
>>  XFEATURE_Hi16_ZMM,
>>
>> I honestly don't know what XFEATURE_OPMASK does.  It does not appear to
>> be affected by VZEROUPPER (although VZEROUPPER's SDM documentation isn't
>> looking too great).

XFEATURE_OPMASK refers to the additional 8 mask registers used in
AVX512. These are more similar to general purpose registers than
vector registers, and should not be too relevant here.

>>
>> But, XFEATURE_ZMM_Hi256 is used for the upper 256 bits of the
>> registers ZMM0-ZMM15.  Those are AVX-512-only registers.  The only way
>> to get data into XFEATURE_ZMM_Hi256 state is by using AVX512 instructions.
>>
>> XFEATURE_Hi16_ZMM is the same.  The only way to get state in there is
>> with AVX512 instructions.
>>
>> So, first of all, I think you *MUST* check XFEATURE_ZMM_Hi256 and
>> XFEATURE_Hi16_ZMM.  That's without question.
>
> No, XFEATURE_ZMM_Hi256 does not request turbo license 2, so it's less
> interesting to us.
>

I think Dave is right, and it's easy enough to check this. See the
attached program. For the "high current" instruction vpmuludq
operating on zmm0--zmm3 registers, we have (on a Skylake-SP Xeon Gold
5120)

   175,097  core_power.lvl0_turbo_license:u  ( +-  2.18% )
    41,185  core_power.lvl1_turbo_license:u  ( +-  1.55% )
83,928,648  core_power.lvl2_turbo_license:u  ( +-  0.00% )

while for the same code operating on zmm28--zmm31 registers, we have

   163,507  core_power.lvl0_turbo_license:u  ( +-  6.85% )
    47,390  core_power.lvl1_turbo_license:u  ( +- 12.25% )
83,927,735  core_power.lvl2_turbo_license:u  ( +-  0.00% )

In other words, the register index does not seem to matter at all for
turbo license purposes (this makes sense, considering these chips have
168 vector registers internally; zmm16--zmm31 are simply newly exposed
architectural registers).

We can also see that XFEATURE_Hi16_ZMM does not imply license 1 or 2;
we may be using xmm16--xmm31 purely for the convenient extra register
space. For example, cases 4 and 5 of the sample program:

84,064,239  core_power.lvl0_turbo_license:u  ( +-  0.00% )
         0  core_power.lvl1_turbo_license:u
         0  core_power.lvl2_turbo_license:u

84,060,625  core_power.lvl0_turbo_license:u  ( +-  0.00% )
         0  core_power.lvl1_turbo_license:u
         0  core_power.lvl2_turbo_license:u

So what's most important is the width of the vectors being used, not
the instruction set or the register index. Second to that is the
instruction type, namely whether those are "heavy" instructions.
Neither of these things can be accurately captured by the XSAVE state.

>>
>> It's probably *possible* to run AVX512 instructions by loading state
>> into the YMM register and then executing AVX512 instructions that only
>> write to memory and never to register state.  That *might* allow
>> XFEATURE_Hi16_ZMM and XFEATURE_ZMM_Hi256 to stay in the init state, but
>> for the frequency to be affected since AVX512 instructions _are_
>> executing.  But, there's no way to detect this situation from XSAVE
>> states themselves.
>>
>
> Andi should have more details on this. FWICT, not all AVX512 instructions
> have high current; those only touching memory do not cause a notable
> frequency drop.

According to section 15.26 of the Intel optimization reference manual,
"heavy" instructions consist of floating-point and integer
multiplication. Moves, adds, logical operations, etc., will request at
most turbo license 1 when operating on zmm registers.

>
> Thanks,
> -Aubrey
>
#include 

#define INSN_LOOP_LO(insn, reg) do {   \
  asm volatile(\
"mov $1<<24,%%rcx;"\
".align 32;"   \
"1:"   \
#insn " " "%%" #reg "0" "," "%%" #reg "0" "," "%%" #reg "0" ";"\
#insn " " "%%" #reg "1" "," "%%" #reg "1" "," "%%" #reg "1" ";"\
#insn " " "%%" #reg "2" "," "%%" #reg "2" "," "%%" #reg "2" ";"\
#insn " " "%%" #reg "3" "," "%%" #reg "3" "," "%%" #reg "3" ";"\
"dec %%rcx;"   \
"jnz 1b;"  \
::: "rcx"  \
  );   \
} while(0);

#define INSN_LOOP_HI(insn, reg) do {

Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-12 Thread Samuel Neves
On Fri, Oct 12, 2018 at 2:26 PM Jann Horn  wrote:
>
> On Fri, Oct 12, 2018 at 11:41 AM Samuel Neves  wrote:
> >
> > On Thu, Oct 11, 2018 at 8:25 PM Andy Lutomirski  wrote:
> > > What exactly is this trying to protect against?  And how many cycles
> > > should we expect L1D_FLUSH to take?
> >
> > As far as I could measure, I got 1660 cycles per wrmsr 0x10b, 0x1 on a
> > Skylake chip, and 1220 cycles on a Skylake-SP.
>
> Is that with L1D mostly empty, with L1D mostly full with clean lines,
> or with L1D full of dirty lines that need to be written back?

Mostly empty, as this is flushing repeatedly without bothering to
refill L1d with anything.

On Skylake the (averaged) uops breakdown is something like

port 0: 255
port 1: 143
port 2: 176
port 3: 177
port 4: 524
port 5: 273
port 6: 616
port 7: 182

The number of port 4 dispatches is very close to the number of cache
lines, suggesting one write per line (with the corresponding 176+177+182
port {2, 3, 7} address generations).

Furthermore, I suspect it also clears the L1i cache. For 2^20 wrmsr
executions, we see around 2^20 frontend_retired_l1i_miss events, but
a negligible number of frontend_retired_l2_miss ones.


Re: [PATCH] x86: entry: flush the cache if syscall error

2018-10-12 Thread Samuel Neves
On Thu, Oct 11, 2018 at 8:25 PM Andy Lutomirski  wrote:
> What exactly is this trying to protect against?  And how many cycles
> should we expect L1D_FLUSH to take?

As far as I could measure, I got 1660 cycles per wrmsr 0x10b, 0x1 on a
Skylake chip, and 1220 cycles on a Skylake-SP.



[tip:x86/urgent] x86/vdso: Fix lsl operand order

2018-09-01 Thread tip-bot for Samuel Neves
Commit-ID:  e78e5a91456fcecaa2efbb3706572fe043766f4d
Gitweb: https://git.kernel.org/tip/e78e5a91456fcecaa2efbb3706572fe043766f4d
Author: Samuel Neves 
AuthorDate: Sat, 1 Sep 2018 21:14:52 +0100
Committer:  Thomas Gleixner 
CommitDate: Sat, 1 Sep 2018 23:01:56 +0200

x86/vdso: Fix lsl operand order

In the __getcpu function, lsl is using the wrong source and destination
registers. Luckily, the compiler tends to choose %eax for both variables,
so it has been working so far.

Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Samuel Neves 
Signed-off-by: Thomas Gleixner 
Acked-by: Andy Lutomirski 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20180901201452.27828-1-sne...@dei.uc.pt

---
 arch/x86/include/asm/vgtod.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index fb856c9f0449..53748541c487 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -93,7 +93,7 @@ static inline unsigned int __getcpu(void)
 *
 * If RDPID is available, use it.
 */
-   alternative_io ("lsl %[p],%[seg]",
+   alternative_io ("lsl %[seg],%[p]",
".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
X86_FEATURE_RDPID,
[p] "=a" (p), [seg] "r" (__PER_CPU_SEG));


[PATCH] x86/vdso: fix lsl operand order

2018-09-01 Thread Samuel Neves
In the __getcpu function, lsl was using the wrong source
and destination registers. Luckily, the compiler tends to
choose %eax for both variables, so it has been working
so far.

Cc: x...@kernel.org
Cc: sta...@vger.kernel.org
Signed-off-by: Samuel Neves 
---
 arch/x86/include/asm/vgtod.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index fb856c9f0449..53748541c487 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -93,7 +93,7 @@ static inline unsigned int __getcpu(void)
 *
 * If RDPID is available, use it.
 */
-   alternative_io ("lsl %[p],%[seg]",
+   alternative_io ("lsl %[seg],%[p]",
".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
X86_FEATURE_RDPID,
[p] "=a" (p), [seg] "r" (__PER_CPU_SEG));
-- 
2.17.1



[tip:x86/urgent] x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations

2018-02-23 Thread tip-bot for Samuel Neves
Commit-ID:  4596749339e06dc7a424fc08a15eded850ed78b7
Gitweb: https://git.kernel.org/tip/4596749339e06dc7a424fc08a15eded850ed78b7
Author: Samuel Neves <sne...@dei.uc.pt>
AuthorDate: Wed, 21 Feb 2018 20:50:36 +
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Fri, 23 Feb 2018 08:47:47 +0100

x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across 
CPU hotplug operations

Without this fix, /proc/cpuinfo will display an incorrect number
of CPU cores after bringing them offline and online again, as
exemplified below:

  $ cat /proc/cpuinfo | grep cores
  cpu cores : 4
  cpu cores : 8
  cpu cores : 8
  cpu cores : 20
  cpu cores : 4
  cpu cores : 3
  cpu cores : 2
  cpu cores : 2

Fix this by always zeroing the booted_cores variable
when turning off a logical CPU.

Tested-by: Dou Liyang <douly.f...@cn.fujitsu.com>
Signed-off-by: Samuel Neves <sne...@dei.uc.pt>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: jgr...@suse.com
Cc: l...@kernel.org
Cc: pra...@redhat.com
Cc: vkuzn...@redhat.com
Link: http://lkml.kernel.org/r/20180221205036.5244-1-sne...@dei.uc.pt
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/kernel/smpboot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9eee25d07586..ff99e2b6fc54 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1437,6 +1437,7 @@ static void remove_siblinginfo(int cpu)
cpumask_clear(topology_sibling_cpumask(cpu));
cpumask_clear(topology_core_cpumask(cpu));
c->cpu_core_id = 0;
+   c->booted_cores = 0;
cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
recompute_smt_state();
 }


[PATCH] smpboot: correctly update number of booted cores

2018-02-21 Thread Samuel Neves
Without this fix, /proc/cpuinfo will display an incorrect number
of CPU cores after bringing them offline and online again, as
exemplified below:

$ cat /proc/cpuinfo | grep cores
cpu cores   : 4
cpu cores   : 8
cpu cores   : 8
cpu cores   : 20
cpu cores   : 4
cpu cores   : 3
cpu cores   : 2
cpu cores   : 2

Fix this by always zeroing the booted_cores variable
when turning off a logical CPU.

Signed-off-by: Samuel Neves <sne...@dei.uc.pt>
---
 arch/x86/kernel/smpboot.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9eee25d07586..ff99e2b6fc54 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1437,6 +1437,7 @@ static void remove_siblinginfo(int cpu)
cpumask_clear(topology_sibling_cpumask(cpu));
cpumask_clear(topology_core_cpumask(cpu));
c->cpu_core_id = 0;
+   c->booted_cores = 0;
cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
recompute_smt_state();
 }
-- 
2.14.3


