Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-10 Thread Charles (Chas) Williams



On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:

On 11/10/2016 06:13 AM, Thomas Gleixner wrote:

On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:


I have found that your patch unfortunately does not improve the situation
for me. Here is an excerpt obtained from the dmesg of a kernel compiled
with this patch *as well as* Sebastian's patch:
[0.002561] CPU: Physical Processor ID: 0
[0.002566] CPU: Processor Core ID: 0
[0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware:  CPUID: 2

So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
translates to a bogus package id. And looking at the XEN code:

   xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,

and xen_cpu_present_to_apicid does:

static int xen_cpu_present_to_apicid(int cpu)
{
if (cpu_present(cpu))
return xen_get_apic_id(xen_apic_read(APIC_ID));
else
return BAD_APICID;
}

So independent of which present CPU we query we get just some random
information, in the above case we get BAD_APICID from xen_apic_read() not
from the else path as this CPU _IS_ present.

What's so wrong with storing the fricking firmware supplied APICid as
everybody else does and report it back when queried?


By firmware you mean ACPI? It is most likely not available to PV guests.
How about returning cpu_data(cpu).initial_apicid?

And what was the original problem?


The original issue I found was that VMware was returning a different set
of APIC id's in the ACPI tables than what it advertised on the CPU's.

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1266716.html



This damned attitude of we just hack the code into submission and let
everybody else deal with the outcoming is utterly annoying.

Thanks,

tglx





Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-10 Thread Charles (Chas) Williams



On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:

On 11/10/2016 06:13 AM, Thomas Gleixner wrote:

On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:


I have found that your patch unfortunately does not improve the situation
for me. Here is an excerpt obtained from the dmesg of a kernel compiled
with this patch *as well as* Sebastian's patch:
[0.002561] CPU: Physical Processor ID: 0
[0.002566] CPU: Processor Core ID: 0
[0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware:  CPUID: 2

So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
translates to a bogus package id. And looking at the XEN code:

   xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,

and xen_cpu_present_to_apicid does:

static int xen_cpu_present_to_apicid(int cpu)
{
if (cpu_present(cpu))
return xen_get_apic_id(xen_apic_read(APIC_ID));
else
return BAD_APICID;
}

So independent of which present CPU we query we get just some random
information, in the above case we get BAD_APICID from xen_apic_read() not
from the else path as this CPU _IS_ present.

What's so wrong with storing the fricking firmware supplied APICid as
everybody else does and report it back when queried?


By firmware you mean ACPI? It is most likely not available to PV guests.
How about returning cpu_data(cpu).initial_apicid?

And what was the original problem?


The original issue I found was that VMware was returning a different set
of APIC id's in the ACPI tables than what it advertised on the CPU's.

http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1266716.html



This damned attitude of we just hack the code into submission and let
everybody else deal with the outcoming is utterly annoying.

Thanks,

tglx





Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-10 Thread Charles (Chas) Williams



On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote:

[0.002000] mvb: CPU: Physical Processor ID: 0
[0.002000] mvb: CPU: Processor Core ID: 0
[0.002000] mvb: identify_cpu:1112: c: 880013b0a040, c->logical_proc_id: 
65535
[0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! 
mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
[0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095
[0.002000] smpboot: APIC() Converting physical 4095 to logical package 0
[0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, 
cpu_data(cpu).logical_proc_id: 0


This seems strange.  0x is BAD_APICID.  Why didn't this fail here:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))  
<<
continue;
if (!topology_update_package_map(apicid, cpu))
continue;

topology_update_package_map() should never have been called?


Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-10 Thread Charles (Chas) Williams



On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote:

[0.002000] mvb: CPU: Physical Processor ID: 0
[0.002000] mvb: CPU: Processor Core ID: 0
[0.002000] mvb: identify_cpu:1112: c: 880013b0a040, c->logical_proc_id: 
65535
[0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! 
mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
[0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095
[0.002000] smpboot: APIC() Converting physical 4095 to logical package 0
[0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, 
cpu_data(cpu).logical_proc_id: 0


This seems strange.  0x is BAD_APICID.  Why didn't this fail here:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))  
<<
continue;
if (!topology_update_package_map(apicid, cpu))
continue;

topology_update_package_map() should never have been called?


Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-09 Thread Charles (Chas) Williams

On 11/09/2016 10:35 AM, Thomas Gleixner wrote:

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map.

There exist Virtualbox and Xen implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. If the physical
package IDs differ use the package information from the ACPI/MP tables so
the existing logical package map just works.

Reported-by: "Charles (Chas) Williams" <ciwil...@brocade.com>,
Reported-by: M. Vefa Bicakci <m@runbox.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>



For 4 virtual sockets, 1 core per socket VM:

[0.235459]  node  #0, CPUs:  #1
[0.238579] Disabled fast string operations
[0.238620] mce: CPU supports 0 MCE banks
[0.238864] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[0.238878] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
[0.239502]  #2
[0.241298] Disabled fast string operations
[0.241356] mce: CPU supports 0 MCE banks
[0.241429] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 CPUID: 4
[0.241431] [Firmware Bug]: CPU2: Using firmware package id 2 instead of 4
[0.241631]  #3
[0.244075] Disabled fast string operations
[0.244112] mce: CPU supports 0 MCE banks
[0.244284] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 CPUID: 6
[0.244293] [Firmware Bug]: CPU3: Using firmware package id 3 instead of 6

For a 2 virtual sockets, 2 cores per socket, VMware seems to get its
APIC table correct as this fixup code isn't triggered.  The mapping looks like:

[0.028911] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.029068] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0
[0.029072] topology_update_package_map:  apicid 1  pkg 0  cpu 1
[0.029220] topology_update_package_map:  apicid 2  pkg 1  cpu 2
[0.029376] smpboot: APIC(2) Converting physical 1 to logical package 1, cpu 
2
[0.029381] topology_update_package_map:  apicid 3  pkg 1  cpu 3
[0.029525] smpboot: Max logical packages: 2

For a VM with 1 virtual socket and 4 cores, the behavior is again correct.

[0.016198] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.016271] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 
0 (88023fc0a040)
[0.016273] topology_update_package_map:  apicid 1  pkg 0  cpu 1
[0.016336] topology_update_package_map:  apicid 2  pkg 0  cpu 2
[0.016397] topology_update_package_map:  apicid 3  pkg 0  cpu 3

It looks like VMware might have some assumption about the minimum number
of cores on a virtual socket.  Regardless, the fix solves my problem!

Thanks!


Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-09 Thread Charles (Chas) Williams

On 11/09/2016 10:35 AM, Thomas Gleixner wrote:

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map.

There exist Virtualbox and Xen implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. If the physical
package IDs differ use the package information from the ACPI/MP tables so
the existing logical package map just works.

Reported-by: "Charles (Chas) Williams" ,
Reported-by: M. Vefa Bicakci 
Signed-off-by: Thomas Gleixner 



For 4 virtual sockets, 1 core per socket VM:

[0.235459]  node  #0, CPUs:  #1
[0.238579] Disabled fast string operations
[0.238620] mce: CPU supports 0 MCE banks
[0.238864] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[0.238878] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
[0.239502]  #2
[0.241298] Disabled fast string operations
[0.241356] mce: CPU supports 0 MCE banks
[0.241429] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 CPUID: 4
[0.241431] [Firmware Bug]: CPU2: Using firmware package id 2 instead of 4
[0.241631]  #3
[0.244075] Disabled fast string operations
[0.244112] mce: CPU supports 0 MCE banks
[0.244284] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 CPUID: 6
[0.244293] [Firmware Bug]: CPU3: Using firmware package id 3 instead of 6

For a 2 virtual sockets, 2 cores per socket, VMware seems to get its
APIC table correct as this fixup code isn't triggered.  The mapping looks like:

[0.028911] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.029068] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0
[0.029072] topology_update_package_map:  apicid 1  pkg 0  cpu 1
[0.029220] topology_update_package_map:  apicid 2  pkg 1  cpu 2
[0.029376] smpboot: APIC(2) Converting physical 1 to logical package 1, cpu 
2
[0.029381] topology_update_package_map:  apicid 3  pkg 1  cpu 3
[0.029525] smpboot: Max logical packages: 2

For a VM with 1 virtual socket and 4 cores, the behavior is again correct.

[0.016198] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.016271] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 
0 (88023fc0a040)
[0.016273] topology_update_package_map:  apicid 1  pkg 0  cpu 1
[0.016336] topology_update_package_map:  apicid 2  pkg 0  cpu 2
[0.016397] topology_update_package_map:  apicid 3  pkg 0  cpu 3

It looks like VMware might have some assumption about the minimum number
of cores on a virtual socket.  Regardless, the fix solves my problem!

Thanks!


Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-09 Thread Charles (Chas) Williams



On 11/09/2016 11:03 AM, Peter Zijlstra wrote:

On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote:

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map.

There exist Virtualbox and Xen implementations which violate the spec. As a


ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy
virt stuff.


Yes, this was VMware in particular.  It would be good to get this comment
right so as not to mislead anyone.


 /*
+ * The physical to logical package id mapping is initialized from the
+ * acpi/mptables information. Make sure that CPUID actually agrees with
+ * that.
+ */
+static void sanitize_package_id(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+   unsigned int pkg, apicid, cpu = smp_processor_id();
+
+   apicid = apic->cpu_present_to_apicid(cpu);
+   pkg = apicid >> boot_cpu_data.x86_coreid_bits;
+
+   if (apicid != c->initial_apicid) {
+   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: 
%x\n",
+  cpu, apicid, c->initial_apicid);


Should we not also 'fix' c->initial_apicid ?


Since we have c->apicid and c->initial_apicid it seems reasonable to keep one
set to the "correct" value.  I don't think c->initial_apicid is used past
this.

I should have some tests on this patch later today.


Re: [PATCH] x86/cpuid: Deal with broken firmware once more

2016-11-09 Thread Charles (Chas) Williams



On 11/09/2016 11:03 AM, Peter Zijlstra wrote:

On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote:

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map.

There exist Virtualbox and Xen implementations which violate the spec. As a


ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy
virt stuff.


Yes, this was VMware in particular.  It would be good to get this comment
right so as not to mislead anyone.


 /*
+ * The physical to logical package id mapping is initialized from the
+ * acpi/mptables information. Make sure that CPUID actually agrees with
+ * that.
+ */
+static void sanitize_package_id(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+   unsigned int pkg, apicid, cpu = smp_processor_id();
+
+   apicid = apic->cpu_present_to_apicid(cpu);
+   pkg = apicid >> boot_cpu_data.x86_coreid_bits;
+
+   if (apicid != c->initial_apicid) {
+   pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: 
%x\n",
+  cpu, apicid, c->initial_apicid);


Should we not also 'fix' c->initial_apicid ?


Since we have c->apicid and c->initial_apicid it seems reasonable to keep one
set to the "correct" value.  I don't think c->initial_apicid is used past
this.

I should have some tests on this patch later today.


Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-08 Thread Charles (Chas) Williams



On 11/08/2016 09:31 AM, Thomas Gleixner wrote:

On Tue, 8 Nov 2016, Charles (Chas) Williams wrote:

[0.016335] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.016398] smpboot: APIC(0) Converting physical 0 to logical
package 0, cpu 0 (88023fc0a040)
[0.016399] topology_update_package_map:  apicid 1  pkg 1  cpu 1
[0.016462] smpboot: APIC(1) Converting physical 1 to logical
package 1, cpu 1 (88023fd0a040)

So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
apicid but it certainly doesn't seem to the match the apicid in the
CPU's registers.  For whatever reason, my VMware system is reporting
that the second CPU has a local APIC ID of 2:


The initial information comes from MP tables or ACPI.


[0.009115] identify_cpu: cpu_index 0  phys_proc_id is now 0,
apicid 0, initial_apicid 0
...
[0.237401] identify_cpu: cpu_index 1  phys_proc_id is now 2,
apicid 2, initial_apicid 2


And the CPUID emulation tells something different. Sigh!


I was thinking it might be better to call topology_update_package_map()
at the bottom of identify_cpu() to setup the secondary CPU's.  The boot
cpu could be setup during smp_init_package_map().


Perhaps, but that does not make the inconsistencies go away


By the time I know it's not consistent, there isn't anything I can do
about it.  I can't update the table to remove the bad information.

The other alternative, is to trust the ACPI and just update the
cpu_data's apicid in identify_cpu() to the value from the table.
The earlier kernels didn't seem to rely as much on this information.
But it does appear to be "wrong" in the APIC table.  From acpidump:

[02Ch 0044   1]Subtable Type : 00 [Processor Local APIC]
[02Dh 0045   1]   Length : 08
[02Eh 0046   1] Processor ID : 00
[02Fh 0047   1]Local Apic ID : 00
[030h 0048   4]Flags (decoded below) : 0001
   Processor Enabled : 1

[034h 0052   1]Subtable Type : 00 [Processor Local APIC]
[035h 0053   1]   Length : 08
[036h 0054   1] Processor ID : 01
[037h 0055   1]Local Apic ID : 01
[038h 0056   4]Flags (decoded below) : 0001
   Processor Enabled : 1





Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-08 Thread Charles (Chas) Williams



On 11/08/2016 09:31 AM, Thomas Gleixner wrote:

On Tue, 8 Nov 2016, Charles (Chas) Williams wrote:

[0.016335] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.016398] smpboot: APIC(0) Converting physical 0 to logical
package 0, cpu 0 (88023fc0a040)
[0.016399] topology_update_package_map:  apicid 1  pkg 1  cpu 1
[0.016462] smpboot: APIC(1) Converting physical 1 to logical
package 1, cpu 1 (88023fd0a040)

So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
apicid but it certainly doesn't seem to the match the apicid in the
CPU's registers.  For whatever reason, my VMware system is reporting
that the second CPU has a local APIC ID of 2:


The initial information comes from MP tables or ACPI.


[0.009115] identify_cpu: cpu_index 0  phys_proc_id is now 0,
apicid 0, initial_apicid 0
...
[0.237401] identify_cpu: cpu_index 1  phys_proc_id is now 2,
apicid 2, initial_apicid 2


And the CPUID emulation tells something different. Sigh!


I was thinking it might be better to call topology_update_package_map()
at the bottom of identify_cpu() to setup the secondary CPU's.  The boot
cpu could be setup during smp_init_package_map().


Perhaps, but that does not make the inconsistencies go away


By the time I know it's not consistent, there isn't anything I can do
about it.  I can't update the table to remove the bad information.

The other alternative, is to trust the ACPI and just update the
cpu_data's apicid in identify_cpu() to the value from the table.
The earlier kernels didn't seem to rely as much on this information.
But it does appear to be "wrong" in the APIC table.  From acpidump:

[02Ch 0044   1]Subtable Type : 00 [Processor Local APIC]
[02Dh 0045   1]   Length : 08
[02Eh 0046   1] Processor ID : 00
[02Fh 0047   1]Local Apic ID : 00
[030h 0048   4]Flags (decoded below) : 0001
   Processor Enabled : 1

[034h 0052   1]Subtable Type : 00 [Processor Local APIC]
[035h 0053   1]   Length : 08
[036h 0054   1] Processor ID : 01
[037h 0055   1]Local Apic ID : 01
[038h 0056   4]Flags (decoded below) : 0001
   Processor Enabled : 1





Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-08 Thread Charles (Chas) Williams

On 11/07/2016 03:20 PM, Thomas Gleixner wrote:

On Mon, 7 Nov 2016, Charles (Chas) Williams wrote:

On 11/07/2016 11:19 AM, Thomas Gleixner wrote:

On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:

I don't know why the CPU's phys_proc_id is 2.


max_physical_pkg_id gets initialized via:

cpus = boot_cpu_data.x86_max_cores;
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?


I have discovered that that is not the problem.  smp_init_package_map()
is calculating the physical core id using the following:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

...
if (!topology_update_package_map(apicid, cpu))
continue;

...
int topology_update_package_map(unsigned int apicid, unsigned int cpu)
{
unsigned int new, pkg = apicid >>
boot_cpu_data.x86_coreid_bits;

But later when the secondary CPU's are identified they use a different
calculation using the local APIC ID from the CPU's registers:

static void generic_identify(struct cpuinfo_x86 *c)
...
if (c->cpuid_level >= 0x0001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
...
c->phys_proc_id = c->initial_apicid;

So at the end of identify_cpu() when the boot/hotplug assignment is
put back:

c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

topology_phys_to_logical_pkg() is returning an invalid logical processor
since one isn't configured.

It's not clear to me what the right thing to do is or which is right.


Nice detective work! So the issue is that the package mapping code honours
boot_cpu_data.x86_coreid_bit, while generic_identify does
not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative
fix below. I still need to gow through that maze and figure out what could
go wrong with that :(


I don't think that will fix the issue.  My deubugging leads me to believe
that boot_cpu_data.x86_coreid_bits is probably 0:

[0.016335] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.016398] smpboot: APIC(0) Converting physical 0 to logical 
package 0, cpu 0 (88023fc0a040)
[0.016399] topology_update_package_map:  apicid 1  pkg 1  cpu 1
[0.016462] smpboot: APIC(1) Converting physical 1 to logical 
package 1, cpu 1 (88023fd0a040)

So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
apicid but it certainly doesn't seem to the match the apicid in the
CPU's registers.  For whatever reason, my VMware system is reporting
that the second CPU has a local APIC ID of 2:

[0.009115] identify_cpu: cpu_index 0  phys_proc_id is now 0, apicid 
0, initial_apicid 0
...
[0.237401] identify_cpu: cpu_index 1  phys_proc_id is now 2, apicid 
2, initial_apicid 2

I was thinking it might be better to call topology_update_package_map()
at the bottom of identify_cpu() to setup the secondary CPU's.  The boot
cpu could be setup during smp_init_package_map().

topology_update_package_map() is also poorly named it can create the
assignment, but can't update it (since it doesn't know the previous
mapping).



8<
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str

 static void generic_identify(struct cpuinfo_x86 *c)
 {
+   unsigned int pkg;
+
c->extended_cpuid_level = 0;

if (!have_cpuid_p())
@@ -929,7 +931,8 @@ static void generic_identify(struct cpui
c->apicid = c->initial_apicid;
 # endif
 #endif
-   c->phys_proc_id = c->initial_apicid;
+   pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits;
+   c->phys_proc_id = pkg;
}

get_model_name(c); /* Default name */





Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-08 Thread Charles (Chas) Williams

On 11/07/2016 03:20 PM, Thomas Gleixner wrote:

On Mon, 7 Nov 2016, Charles (Chas) Williams wrote:

On 11/07/2016 11:19 AM, Thomas Gleixner wrote:

On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:

I don't know why the CPU's phys_proc_id is 2.


max_physical_pkg_id gets initialized via:

cpus = boot_cpu_data.x86_max_cores;
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?


I have discovered that that is not the problem.  smp_init_package_map()
is calculating the physical core id using the following:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

...
if (!topology_update_package_map(apicid, cpu))
continue;

...
int topology_update_package_map(unsigned int apicid, unsigned int cpu)
{
unsigned int new, pkg = apicid >>
boot_cpu_data.x86_coreid_bits;

But later when the secondary CPU's are identified they use a different
calculation using the local APIC ID from the CPU's registers:

static void generic_identify(struct cpuinfo_x86 *c)
...
if (c->cpuid_level >= 0x0001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
...
c->phys_proc_id = c->initial_apicid;

So at the end of identify_cpu() when the boot/hotplug assignment is
put back:

c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

topology_phys_to_logical_pkg() is returning an invalid logical processor
since one isn't configured.

It's not clear to me what the right thing to do is or which is right.


Nice detective work! So the issue is that the package mapping code honours
boot_cpu_data.x86_coreid_bit, while generic_identify does
not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative
fix below. I still need to gow through that maze and figure out what could
go wrong with that :(


I don't think that will fix the issue.  My deubugging leads me to believe
that boot_cpu_data.x86_coreid_bits is probably 0:

[0.016335] topology_update_package_map:  apicid 0  pkg 0  cpu 0
[0.016398] smpboot: APIC(0) Converting physical 0 to logical 
package 0, cpu 0 (88023fc0a040)
[0.016399] topology_update_package_map:  apicid 1  pkg 1  cpu 1
[0.016462] smpboot: APIC(1) Converting physical 1 to logical 
package 1, cpu 1 (88023fd0a040)

So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
apicid but it certainly doesn't seem to the match the apicid in the
CPU's registers.  For whatever reason, my VMware system is reporting
that the second CPU has a local APIC ID of 2:

[0.009115] identify_cpu: cpu_index 0  phys_proc_id is now 0, apicid 
0, initial_apicid 0
...
[0.237401] identify_cpu: cpu_index 1  phys_proc_id is now 2, apicid 
2, initial_apicid 2

I was thinking it might be better to call topology_update_package_map()
at the bottom of identify_cpu() to setup the secondary CPU's.  The boot
cpu could be setup during smp_init_package_map().

topology_update_package_map() is also poorly named it can create the
assignment, but can't update it (since it doesn't know the previous
mapping).



8<
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str

 static void generic_identify(struct cpuinfo_x86 *c)
 {
+   unsigned int pkg;
+
c->extended_cpuid_level = 0;

if (!have_cpuid_p())
@@ -929,7 +931,8 @@ static void generic_identify(struct cpui
c->apicid = c->initial_apicid;
 # endif
 #endif
-   c->phys_proc_id = c->initial_apicid;
+   pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits;
+   c->phys_proc_id = pkg;
}

get_model_name(c); /* Default name */





Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-07 Thread Charles (Chas) Williams

On 11/07/2016 11:19 AM, Thomas Gleixner wrote:

On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:


On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:

I am not sure if this a race with the new hotplug code or something that was
always there. Both (M. Vefa Bicakc and Charles) say that the box boots
sometimes fine (without the patch).  smp_store_boot_cpu_info() should have
run
before the notofoert and thus should have set the info properly. However I
got
the following bootlog from Charles with this patch:


I don't this this is a race.  Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket).  In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[0.228306] identify_cpu: c 88023fd0a040  logical_proc_id 65535
c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.


max_physical_pkg_id gets initialized via:

cpus = boot_cpu_data.x86_max_cores;
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?


I have discovered that that is not the problem.  smp_init_package_map()
is calculating the physical core id using the following:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

...
if (!topology_update_package_map(apicid, cpu))
continue;

...
int topology_update_package_map(unsigned int apicid, unsigned int cpu)
{
unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits;

But later when the secondary CPU's are identified they use a different
calculation using the local APIC ID from the CPU's registers:

static void generic_identify(struct cpuinfo_x86 *c)
...
if (c->cpuid_level >= 0x0001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
...
c->phys_proc_id = c->initial_apicid;

So at the end of identify_cpu() when the boot/hotplug assignment is
put back:

c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

topology_phys_to_logical_pkg() is returning an invalid logical processor
since one isn't configured.

It's not clear to me what the right thing to do is or which is right.


Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-07 Thread Charles (Chas) Williams

On 11/07/2016 11:19 AM, Thomas Gleixner wrote:

On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:


On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:

I am not sure if this a race with the new hotplug code or something that was
always there. Both (M. Vefa Bicakc and Charles) say that the box boots
sometimes fine (without the patch).  smp_store_boot_cpu_info() should have
run
before the notofoert and thus should have set the info properly. However I
got
the following bootlog from Charles with this patch:


I don't this this is a race.  Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket).  In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[0.228306] identify_cpu: c 88023fd0a040  logical_proc_id 65535
c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.


max_physical_pkg_id gets initialized via:

cpus = boot_cpu_data.x86_max_cores;
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?


I have discovered that that is not the problem.  smp_init_package_map()
is calculating the physical core id using the following:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

...
if (!topology_update_package_map(apicid, cpu))
continue;

...
int topology_update_package_map(unsigned int apicid, unsigned int cpu)
{
unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits;

But later when the secondary CPU's are identified they use a different
calculation using the local APIC ID from the CPU's registers:

static void generic_identify(struct cpuinfo_x86 *c)
...
if (c->cpuid_level >= 0x0001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
...
c->phys_proc_id = c->initial_apicid;

So at the end of identify_cpu() when the boot/hotplug assignment is
put back:

c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

topology_phys_to_logical_pkg() is returning an invalid logical processor
since one isn't configured.

It's not clear to me what the right thing to do is or which is right.


Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-04 Thread Charles (Chas) Williams

On 11/04/2016 02:03 PM, Sebastian Andrzej Siewior wrote:

On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote:

The initial CPU boots and is identified:

[0.009018] identify_boot_cpu
[0.009174] generic_identify: phys_proc_id is now 0
...
[0.009427] identify_cpu: before c 81ae2680  logical_proc_id 0  
c->phys_proc_id 0
[0.009506] identify_cpu: after c 81ae2680  logical_proc_id 65535  
c->phys_proc_id 0

So, this is fine because the APIC hasn't been scanned yet.  APIC
now gets scanned:

[0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 
0 (88023fc0a040)
[0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 
1 (88023fd0a040)
[0.015797] smpboot: Max logical packages: 2


where is the APICID here is comming from?


This comes from here:

unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
continue;
if (!topology_update_package_map(apicid, cpu))

And I think this is the part that is "wrong".  The apicid appears to
be a logical CPU id.  I believe that in most cases this mapping comes
from x86_bios_cpu_apicid (or x86_cpu_to_apicid) which is generated in
generic_processor_info() which maps apicid's to logical cpu indexes.

Note that apic->cpu_present_to_apicid() is using just the cpu_index.

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
continue;
if (!topology_update_package_map(apicid, cpu))
continue;
pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
set_cpu_possible(cpu, false);
set_cpu_present(cpu, false);
}


So, at this point, I think everything is correct.  But now the secondary
CPU's "boot":

[0.236569] identify_secondary_cpu
[0.236620] generic_identify: phys_proc_id is now 2


so here is where fun starts. Xen has also
arch/x86/xen/smp.c::cpu_bringup() where the phys_proc_id is changed. But
isn't done for vmware but it might a place where they duct tape things.

How is this APIC id different from the earlier? I guess based on your
output that generic_identify() changes the content of phys_proc_id.


[0.236745] identify_cpu: before c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2
[0.236747] identify_cpu: after c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
my second CPU, 2.  This is >= max_physical_pkg_id, so it is going to get
set to -1.


Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is
set to -1 on init so slot two has the value -1. That is what you see -
not the -1 because of ">= max_physical_pkg_id".


The comment at the end of identfy_cpu() says:

/* The boot/hotplug time assigment got cleared, restore it */

So, logical_proc_id being wrong here before restoration doesn't bother
me since I assume something in booting the secondary CPU's clears any
existing cpu data.

I know detect_extended_topology() is likely being called for both CPU's
and getting the right values (checking this now).  I don't know why
generic_identify() is resetting this value.


I don't know either. But it is clearly reading the apic id twice and
second approach is different from the first which leads to different
results. So if you figure out how the first APICID for the second CPU is
retrieved and then you see how it happens for the second time. There
must be a difference.


The phys core id from generic_identify() comes from the CPU's EBX register
so we _know_ this is right.

   if (c->cpuid_level >= 0x0001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
#ifdef CONFIG_X86_32
# ifdef CONFIG_SMP
c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
# else
c->apicid = c->initial_apicid;
# endif
#endif
c->phys_proc_id = c->initial_apicid;
}

The intel docs http://x86.renejeschke.de/html/file_module_x86_id_45.html
claims this is the Local APIC ID.  So it seems likely this is correct
value.  It's not clear it matter if this is the right value or not
though.  Even if this is the correct apicid, nothing knows about it.

An argument could be made that instead of checking the cpuid level, we
could just use the apicid based on the cpu index just like the other code.
It would be consistent at least then.



Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-04 Thread Charles (Chas) Williams

On 11/04/2016 02:03 PM, Sebastian Andrzej Siewior wrote:

On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote:

The initial CPU boots and is identified:

[0.009018] identify_boot_cpu
[0.009174] generic_identify: phys_proc_id is now 0
...
[0.009427] identify_cpu: before c 81ae2680  logical_proc_id 0  
c->phys_proc_id 0
[0.009506] identify_cpu: after c 81ae2680  logical_proc_id 65535  
c->phys_proc_id 0

So, this is fine because the APIC hasn't been scanned yet.  APIC
now gets scanned:

[0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 
0 (88023fc0a040)
[0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 
1 (88023fd0a040)
[0.015797] smpboot: Max logical packages: 2


where is the APICID here is comming from?


This comes from here:

unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
continue;
if (!topology_update_package_map(apicid, cpu))

And I think this is the part that is "wrong".  The apicid appears to
be a logical CPU id.  I believe that in most cases this mapping comes
from x86_bios_cpu_apicid (or x86_cpu_to_apicid) which is generated in
generic_processor_info() which maps apicid's to logical cpu indexes.

Note that apic->cpu_present_to_apicid() is using just the cpu_index.

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
continue;
if (!topology_update_package_map(apicid, cpu))
continue;
pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
set_cpu_possible(cpu, false);
set_cpu_present(cpu, false);
}


So, at this point, I think everything is correct.  But now the secondary
CPU's "boot":

[0.236569] identify_secondary_cpu
[0.236620] generic_identify: phys_proc_id is now 2


so here is where fun starts. Xen has also
arch/x86/xen/smp.c::cpu_bringup() where the phys_proc_id is changed. But
isn't done for vmware but it might a place where they duct tape things.

How is this APIC id different from the earlier? I guess based on your
output that generic_identify() changes the content of phys_proc_id.


[0.236745] identify_cpu: before c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2
[0.236747] identify_cpu: after c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
my second CPU, 2.  This is >= max_physical_pkg_id, so it is going to get
set to -1.


Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is
set to -1 on init so slot two has the value -1. That is what you see -
not the -1 because of ">= max_physical_pkg_id".


The comment at the end of identfy_cpu() says:

/* The boot/hotplug time assigment got cleared, restore it */

So, logical_proc_id being wrong here before restoration doesn't bother
me since I assume something in booting the secondary CPU's clears any
existing cpu data.

I know detect_extended_topology() is likely being called for both CPU's
and getting the right values (checking this now).  I don't know why
generic_identify() is resetting this value.


I don't know either. But it is clearly reading the apic id twice and
second approach is different from the first which leads to different
results. So if you figure out how the first APICID for the second CPU is
retrieved and then you see how it happens for the second time. There
must be a difference.


The phys core id from generic_identify() comes from the CPU's EBX register
so we _know_ this is right.

   if (c->cpuid_level >= 0x0001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
#ifdef CONFIG_X86_32
# ifdef CONFIG_SMP
c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
# else
c->apicid = c->initial_apicid;
# endif
#endif
c->phys_proc_id = c->initial_apicid;
}

The intel docs http://x86.renejeschke.de/html/file_module_x86_id_45.html
claims this is the Local APIC ID.  So it seems likely this is correct
value.  It's not clear it matter if this is the right value or not
though.  Even if this is the correct apicid, nothing knows about it.

An argument could be made that instead of checking the cpuid level, we
could just use the apicid based on the cpu index just like the other code.
It would be consistent at least then.



Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-04 Thread Charles (Chas) Williams



On 11/03/2016 01:47 PM, Sebastian Andrzej Siewior wrote:

On 2016-11-02 18:47:49 [-0400], Charles (Chas) Williams wrote:

I don't this this is a race.  Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket).  In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[0.228306] identify_cpu: c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.


This is the physical ID. You have two logical IDs (on your two sockets
machine). What is max_physical_pkg_id? In order to get that -1 you would
have to max_physical_pkg_id of 1 but code does
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

and I would be a little surprised if this is 1.

Sebastian


The initial CPU boots and is identified:

[0.009018] identify_boot_cpu
[0.009174] generic_identify: phys_proc_id is now 0
...
[0.009427] identify_cpu: before c 81ae2680  logical_proc_id 0  
c->phys_proc_id 0
[0.009506] identify_cpu: after c 81ae2680  logical_proc_id 65535  
c->phys_proc_id 0

So, this is fine because the APIC hasn't been scanned yet.  APIC
now gets scanned:

[0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 
0 (88023fc0a040)
[0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 
1 (88023fd0a040)
[0.015797] smpboot: Max logical packages: 2

So, at this point, I think everything is correct.  But now the secondary
CPU's "boot":

[0.236569] identify_secondary_cpu
[0.236620] generic_identify: phys_proc_id is now 2

[0.236745] identify_cpu: before c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2
[0.236747] identify_cpu: after c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
my second CPU, 2.  This is >= max_physical_pkg_id, so it is going to get
set to -1.

The comment at the end of identfy_cpu() says:

/* The boot/hotplug time assigment got cleared, restore it */

So, logical_proc_id being wrong here before restoration doesn't bother
me since I assume something in booting the secondary CPU's clears any
existing cpu data.

I know detect_extended_topology() is likely being called for both CPU's
and getting the right values (checking this now).  I don't know why
generic_identify() is resetting this value.


Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-04 Thread Charles (Chas) Williams



On 11/03/2016 01:47 PM, Sebastian Andrzej Siewior wrote:

On 2016-11-02 18:47:49 [-0400], Charles (Chas) Williams wrote:

I don't this this is a race.  Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket).  In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[0.228306] identify_cpu: c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.


This is the physical ID. You have two logical IDs (on your two sockets
machine). What is max_physical_pkg_id? In order to get that -1 you would
have to max_physical_pkg_id of 1 but code does
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

and I would be a little surprised if this is 1.

Sebastian


The initial CPU boots and is identified:

[0.009018] identify_boot_cpu
[0.009174] generic_identify: phys_proc_id is now 0
...
[0.009427] identify_cpu: before c 81ae2680  logical_proc_id 0  
c->phys_proc_id 0
[0.009506] identify_cpu: after c 81ae2680  logical_proc_id 65535  
c->phys_proc_id 0

So, this is fine because the APIC hasn't been scanned yet.  APIC
now gets scanned:

[0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 
0 (88023fc0a040)
[0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 
1 (88023fd0a040)
[0.015797] smpboot: Max logical packages: 2

So, at this point, I think everything is correct.  But now the secondary
CPU's "boot":

[0.236569] identify_secondary_cpu
[0.236620] generic_identify: phys_proc_id is now 2

[0.236745] identify_cpu: before c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2
[0.236747] identify_cpu: after c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
my second CPU, 2.  This is >= max_physical_pkg_id, so it is going to get
set to -1.

The comment at the end of identfy_cpu() says:

/* The boot/hotplug time assigment got cleared, restore it */

So, logical_proc_id being wrong here before restoration doesn't bother
me since I assume something in booting the secondary CPU's clears any
existing cpu data.

I know detect_extended_topology() is likely being called for both CPU's
and getting the right values (checking this now).  I don't know why
generic_identify() is resetting this value.


Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-02 Thread Charles (Chas) Williams

On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:

I am not sure if this a race with the new hotplug code or something that was
always there. Both (M. Vefa Bicakc and Charles) say that the box boots
sometimes fine (without the patch).  smp_store_boot_cpu_info() should have run
before the notofoert and thus should have set the info properly. However I got
the following bootlog from Charles with this patch:


I don't this this is a race.  Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket).  In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[0.228306] identify_cpu: c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.


Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

2016-11-02 Thread Charles (Chas) Williams

On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:

I am not sure if this a race with the new hotplug code or something that was
always there. Both (M. Vefa Bicakc and Charles) say that the box boots
sometimes fine (without the patch).  smp_store_boot_cpu_info() should have run
before the notofoert and thus should have set the info properly. However I got
the following bootlog from Charles with this patch:


I don't this this is a race.  Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket).  In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[0.228306] identify_cpu: c 88023fd0a040  logical_proc_id 65535  
c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.


Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

2016-11-02 Thread Charles (Chas) Williams

On 10/28/2016 04:03 AM, Sebastian Andrzej Siewior wrote:

On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:

I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
and 3. That -1 comes probably from topology_update_package_map(). Could
you please send a complete boot log and try the following patch? This
one should fix your boot problem and disable RAPL if the info is
invalid.


But sometimes the topology info is correct and if I get lucky, the
package id could be valid for all the CPU's.  Given the behavior,
I have seen so far it makes me thing the RAPL isn't being emulated.
So even if I did boot onto a "valid" set of cores, would I always be
certain that I will be on those cores?


I don't what vmware does here. Nor do they ship source to check. So if
you have a big HW box with say two packages, it might make sense to give
this information to the guest _if_ the CPUs are pinned and the guest
never migrates.


Yes, I agree _if_.  That's why it simply isn't clear to me that we should
attempt do any RAPL at all for VMWare.  The current behavior doesn't seem
to make sense and I don't expect it to suddenly start acting reasonable.
Since I don't understand why some package id's are valid and others
are not, I would prefer not to trust any of the information as far as
enabling/disabling the RAPL monitoring.




Per your request in your next email:


One thing I forgot to ask: Could you please check if you get the same
pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
rework).


Our previous kernel was 4.4, and didn't use the logical package id:

I see.

Did the patch I sent fixed it for you and were you not able to test?


Yes, it does prevent RAPL from starting and loading.  From the boot log:

[2.711481] RAPL PMU: rapl pmu error: max package: 4 but CPU2 belongs to 
65535
[2.711639] rapl pmu error: max package: 4 but CPU2 belongs to 65535

This was consistent across several reboots.  I poked around in the
VM settings.  Apparently this guest is configured for four virtual
sockets with one core per socket.  Testing with two virtual sockets,
one core per socket:

[2.163177] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 
65535
[2.163304] rapl pmu error: max package: 2 but CPU1 belongs to 65535

Booting with 1 virtual socket, 1 core per socket:

[1.750311] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 
10737418240 ms ovfl timer
[1.750312] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[1.750313] RAPL PMU: hw unit of domain package 2^-0 Joules
[1.750314] RAPL PMU: hw unit of domain dram 2^-0 Joules

Booting with 1 virtual socket, 4 cores per socket:

[3.527298] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 
10737418240 ms ovfl timer
[3.527302] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[3.527304] RAPL PMU: hw unit of domain package 2^-0 Joules
[3.527307] RAPL PMU: hw unit of domain dram 2^-0 Joules

So, it looks like VMWare tends to always get something wrong if you have
more than one virtual socket.  The above behavior was consistent across
several reboots.





Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

2016-11-02 Thread Charles (Chas) Williams

On 10/28/2016 04:03 AM, Sebastian Andrzej Siewior wrote:

On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote:

I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
and 3. That -1 comes probably from topology_update_package_map(). Could
you please send a complete boot log and try the following patch? This
one should fix your boot problem and disable RAPL if the info is
invalid.


But sometimes the topology info is correct and if I get lucky, the
package id could be valid for all the CPU's.  Given the behavior,
I have seen so far it makes me thing the RAPL isn't being emulated.
So even if I did boot onto a "valid" set of cores, would I always be
certain that I will be on those cores?


I don't what vmware does here. Nor do they ship source to check. So if
you have a big HW box with say two packages, it might make sense to give
this information to the guest _if_ the CPUs are pinned and the guest
never migrates.


Yes, I agree _if_.  That's why it simply isn't clear to me that we should
attempt do any RAPL at all for VMWare.  The current behavior doesn't seem
to make sense and I don't expect it to suddenly start acting reasonable.
Since I don't understand why some package id's are valid and others
are not, I would prefer not to trust any of the information as far as
enabling/disabling the RAPL monitoring.




Per your request in your next email:


One thing I forgot to ask: Could you please check if you get the same
pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
rework).


Our previous kernel was 4.4, and didn't use the logical package id:

I see.

Did the patch I sent fixed it for you and were you not able to test?


Yes, it does prevent RAPL from starting and loading.  From the boot log:

[2.711481] RAPL PMU: rapl pmu error: max package: 4 but CPU2 belongs to 
65535
[2.711639] rapl pmu error: max package: 4 but CPU2 belongs to 65535

This was consistent across several reboots.  I poked around in the
VM settings.  Apparently this guest is configured for four virtual
sockets with one core per socket.  Testing with two virtual sockets,
one core per socket:

[2.163177] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 
65535
[2.163304] rapl pmu error: max package: 2 but CPU1 belongs to 65535

Booting with 1 virtual socket, 1 core per socket:

[1.750311] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 
10737418240 ms ovfl timer
[1.750312] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[1.750313] RAPL PMU: hw unit of domain package 2^-0 Joules
[1.750314] RAPL PMU: hw unit of domain dram 2^-0 Joules

Booting with 1 virtual socket, 4 cores per socket:

[3.527298] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 
10737418240 ms ovfl timer
[3.527302] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules
[3.527304] RAPL PMU: hw unit of domain package 2^-0 Joules
[3.527307] RAPL PMU: hw unit of domain dram 2^-0 Joules

So, it looks like VMWare tends to always get something wrong if you have
more than one virtual socket.  The above behavior was consistent across
several reboots.





Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

2016-10-27 Thread Charles (Chas) Williams

On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote:

On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote:

[3.107126] init_rapl_pmus: maxpkg 4

there! vmware bug. It probably worked by chance.


Yes, the behavior is a bit random.


I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
and 3. That -1 comes probably from topology_update_package_map(). Could
you please send a complete boot log and try the following patch? This
one should fix your boot problem and disable RAPL if the info is
invalid.


But sometimes the topology info is correct and if I get lucky, the
package id could be valid for all the CPU's.  Given the behavior,
I have seen so far it makes me thing the RAPL isn't being emulated.
So even if I did boot onto a "valid" set of cores, would I always be
certain that I will be on those cores?


diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
int maxpkg = topology_max_packages();
size_t size;
+   unsigned int cpu;
+
+   for_each_possible_cpu(cpu) {
+   if (topology_logical_package_id(cpu) >= maxpkg) {
+   pr_err("rapl pmu error: max package: %u but CPU%d belongs to 
%u\n",
+  maxpkg, cpu, topology_logical_package_id(cpu));
+   return -EINVAL;
+   }
+   }

size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
rapl_pmus = kzalloc(size, GFP_KERNEL);


Per your request in your next email:


One thing I forgot to ask: Could you please check if you get the same
pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
rework).


Our previous kernel was 4.4, and didn't use the logical package id:

/* check if phys_is is already covered */
for_each_cpu(i, _cpu_mask) {
if (phys_id == topology_physical_package_id(i))
return;



Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

2016-10-27 Thread Charles (Chas) Williams

On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote:

On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote:

[3.107126] init_rapl_pmus: maxpkg 4

there! vmware bug. It probably worked by chance.


Yes, the behavior is a bit random.


I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning
topology_max_packages(). So it says 4 but then returns 65535 for CPU 2
and 3. That -1 comes probably from topology_update_package_map(). Could
you please send a complete boot log and try the following patch? This
one should fix your boot problem and disable RAPL if the info is
invalid.


But sometimes the topology info is correct and if I get lucky, the
package id could be valid for all the CPU's.  Given the behavior,
I have seen so far it makes me thing the RAPL isn't being emulated.
So even if I did boot onto a "valid" set of cores, would I always be
certain that I will be on those cores?


diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
 {
int maxpkg = topology_max_packages();
size_t size;
+   unsigned int cpu;
+
+   for_each_possible_cpu(cpu) {
+   if (topology_logical_package_id(cpu) >= maxpkg) {
+   pr_err("rapl pmu error: max package: %u but CPU%d belongs to 
%u\n",
+  maxpkg, cpu, topology_logical_package_id(cpu));
+   return -EINVAL;
+   }
+   }

size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
rapl_pmus = kzalloc(size, GFP_KERNEL);


Per your request in your next email:


One thing I forgot to ask: Could you please check if you get the same
pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug
rework).


Our previous kernel was 4.4, and didn't use the logical package id:

/* check if phys_is is already covered */
for_each_cpu(i, _cpu_mask) {
if (phys_id == topology_physical_package_id(i))
return;



Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

2016-10-21 Thread Charles (Chas) Williams

On 10/21/2016 06:56 AM, Sebastian Andrzej Siewior wrote:

On 2016-10-20 16:27:55 [-0400], Charles (Chas) Williams wrote:

Recent 4.8 kernels have been oopsing when running under VMWare:


can you reproduce this on bare metal?


I can't get dedicated access to the specific bare metal since it is
running as a dedicated hypervisor.  I haven't seen this issue anywhere
else though with the 4.8 kernel.


[2.270203] BUG: unable to handle kernel NULL pointer dereference at 
0408
[2.270325] IP: [] rapl_cpu_online+0x59/0x70


can you check if pmu is NULL?


It's not.  The dereference at 0x408 and pmu->cpu being fairly early in
the struct seems to indicate that pmu wasn't pointing to 0 at the time
(but fairly close).  I should have noticed that earlier.


Is there a particular order guaranteed by the callbacks?  Will
rapl_cpu_prepare() always happen before online/offline?  Additionally,


yes, see include/linux/cpuhotplug.h. On CPU-up the array ids are invoked
from CPUHP_OFFLINE till CPUHP_ONLINE.


Yes, I see that now.  Thanks for the pointer!


If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fail then we rollback
to the starting point (in case of CPU up it would be CPUHP_OFFLINE.


You'll like this, I just did a little printk debugging because it was
easier than trying to get a debugger running:

[3.107126] init_rapl_pmus: maxpkg 4
[3.107263] rapl_cpu_prepare: pmu 880234faa540  cpu 0  pkgid 0
[3.107400] rapl_cpu_prepare: pmu 880234faa600  cpu 1  pkgid 2
[3.107537] rapl_cpu_prepare: pmu 880234faa6c0  cpu 2  pkgid 
65535
[3.107662] rapl_cpu_online: pmu 880234faa540 cpu 0 pkgid 0
[3.107907] rapl_cpu_online: pmu 880234faa600 cpu 1 pkgid 2
[3.108133] rapl_cpu_online: pmu 880234faa6c0 cpu 2 pkgid 65535
[3.108333] rapl_cpu_online: pmu 880234faa6c0 cpu 3 pkgid 65535

where pkgid is topology_logical_package_id(cpu).

I can't understand why I don't see a cpu 3 during cpu prepare, when I
see one later.  The 65535 is a -1 from topology_phys_to_logical_pkg()
getting assigned to the logical_proc_id apparently.

So this is pretty puzzling.  Since this is a guest running under VMWare, I
don't know that there is any particular CPU pinning or emulation of RAPL.

It looks there was a proposal to not run in guests:

https://lkml.org/lkml/2015/12/3/559



Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()

2016-10-21 Thread Charles (Chas) Williams

On 10/21/2016 06:56 AM, Sebastian Andrzej Siewior wrote:

On 2016-10-20 16:27:55 [-0400], Charles (Chas) Williams wrote:

Recent 4.8 kernels have been oopsing when running under VMWare:


can you reproduce this on bare metal?


I can't get dedicated access to the specific bare metal since it is
running as a dedicated hypervisor.  I haven't seen this issue anywhere
else though with the 4.8 kernel.


[2.270203] BUG: unable to handle kernel NULL pointer dereference at 
0408
[2.270325] IP: [] rapl_cpu_online+0x59/0x70


can you check if pmu is NULL?


It's not.  The dereference at 0x408 and pmu->cpu being fairly early in
the struct seems to indicate that pmu wasn't pointing to 0 at the time
(but fairly close).  I should have noticed that earlier.


Is there a particular order guaranteed by the callbacks?  Will
rapl_cpu_prepare() always happen before online/offline?  Additionally,


yes, see include/linux/cpuhotplug.h. On CPU-up the array ids are invoked
from CPUHP_OFFLINE till CPUHP_ONLINE.


Yes, I see that now.  Thanks for the pointer!


If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fail then we rollback
to the starting point (in case of CPU up it would be CPUHP_OFFLINE.


You'll like this, I just did a little printk debugging because it was
easier than trying to get a debugger running:

[3.107126] init_rapl_pmus: maxpkg 4
[3.107263] rapl_cpu_prepare: pmu 880234faa540  cpu 0  pkgid 0
[3.107400] rapl_cpu_prepare: pmu 880234faa600  cpu 1  pkgid 2
[3.107537] rapl_cpu_prepare: pmu 880234faa6c0  cpu 2  pkgid 
65535
[3.107662] rapl_cpu_online: pmu 880234faa540 cpu 0 pkgid 0
[3.107907] rapl_cpu_online: pmu 880234faa600 cpu 1 pkgid 2
[3.108133] rapl_cpu_online: pmu 880234faa6c0 cpu 2 pkgid 65535
[3.108333] rapl_cpu_online: pmu 880234faa6c0 cpu 3 pkgid 65535

where pkgid is topology_logical_package_id(cpu).

I can't understand why I don't see a cpu 3 during cpu prepare, when I
see one later.  The 65535 is a -1 from topology_phys_to_logical_pkg()
getting assigned to the logical_proc_id apparently.

So this is pretty puzzling.  Since this is a guest running under VMWare, I
don't know that there is any particular CPU pinning or emulation of RAPL.

It looks there was a proposal to not run in guests:

https://lkml.org/lkml/2015/12/3/559



Oops in rapl_cpu_prepare()

2016-10-20 Thread Charles (Chas) Williams

Recent 4.8 kernels have been oopsing when running under VMWare:

[2.270203] BUG: unable to handle kernel NULL pointer dereference at 
0408
[2.270325] IP: [] rapl_cpu_online+0x59/0x70
[2.270448] PGD 0
[2.270570] Oops: 0002 [#1] SMP
[2.270693] Modules linked in:
[2.270815] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 4.8.2-1-amd64-vyatta #1
[2.270938] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 04/14/2014
[2.271060] task: 8802361fc2c0 task.stack: 880236208000
[2.271183] RIP: 0010:[]  [] 
rapl_cpu_online+0x59/0x70
[2.271306] RSP: :88023620be68  EFLAGS: 00010246
[2.271428] RAX: 0004 RBX: 88023fd0d940 RCX: 
[2.271551] RDX: 0040 RSI: 0004 RDI: 0004
[2.271673] RBP: 0002 R08: fffc R09: 
[2.271796] R10:  R11:  R12: 0400
[2.271918] R13: 8802361fc2c0 R14: 8802361fc2c0 R15: 8802361fc2c0
[2.272041] FS:  () GS:88023fd0() 
knlGS:
[2.272163] CS:  0010 DS:  ES:  CR0: 80050033
[2.272286] CR2: 0408 CR3: 01a06000 CR4: 000406e0
[2.272408] Stack:
[2.272531]  88023fd0d940 0002 81a38240 
81061231
[2.272654]  8802361fc2c0 880237002180 8107ddcf 

[2.272776]  8802361a5a80 880237002180 8107dcb0 
81a6a380
[2.272899] Call Trace:
[2.273021]  [] ? cpuhp_thread_fun+0x31/0x100
[2.273144]  [] ? smpboot_thread_fn+0x11f/0x180
[2.273266]  [] ? sort_range+0x20/0x20
[2.273389]  [] ? kthread+0xca/0xe0
[2.273511]  [] ? ret_from_fork+0x1f/0x40
[2.273634]  [] ? kthread_park+0x50/0x50
[2.273757] Code: 00 00 48 83 c0 22 4c 8b 24 c1 48 c7 c0 30 a1 00 00 48 8b 
14 10 e8 a8 61 26 00 3b 05 b6 56 ae 00 7c 0e f0 48 0f a
[2.279445] RIP  [] rapl_cpu_online+0x59/0x70
[2.279568]  RSP 
[2.279690] CR2: 0408
[2.279813] ---[ end trace c95da920748eb432 ]---


gdb tells me:

(gdb) info line *(rapl_cpu_online+0x59)
Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0x81012bb9 

   and ends at 0x81012bbe .

Which is:


target = cpumask_any_and(_cpu_mask, topology_core_cpumask(cpu));
if (target < nr_cpu_ids)
return 0;

cpumask_set_cpu(cpu, _cpu_mask);
pmu->cpu = cpu;  <<
return 0;

This code was recently changed by commit 8b5b773d6245138c
"perf/x86/intel/rapl: Convert to hotplug state machine" and it
appears that the setup is done as a callback:

/*
 * Install callbacks. Core will call them for each online cpu.
 */

ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP",
rapl_cpu_prepare, NULL);
if (ret)
goto out;

ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
"AP_PERF_X86_RAPL_ONLINE",
rapl_cpu_online, rapl_cpu_offline);

Is there a particular order guaranteed by the callbacks?  Will
rapl_cpu_prepare() always happen before online/offline?  Additionally,
rapl_cpu_prepare() can fail to allocate pmu,

static int rapl_cpu_prepare(unsigned int cpu)
{
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);

if (pmu)
return 0;

pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
if (!pmu)
return -ENOMEM;

But rapl_cpu_online() would have no idea about this.  What should be
done in this case?


Oops in rapl_cpu_prepare()

2016-10-20 Thread Charles (Chas) Williams

Recent 4.8 kernels have been oopsing when running under VMWare:

[2.270203] BUG: unable to handle kernel NULL pointer dereference at 
0408
[2.270325] IP: [] rapl_cpu_online+0x59/0x70
[2.270448] PGD 0
[2.270570] Oops: 0002 [#1] SMP
[2.270693] Modules linked in:
[2.270815] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 4.8.2-1-amd64-vyatta #1
[2.270938] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 04/14/2014
[2.271060] task: 8802361fc2c0 task.stack: 880236208000
[2.271183] RIP: 0010:[]  [] 
rapl_cpu_online+0x59/0x70
[2.271306] RSP: :88023620be68  EFLAGS: 00010246
[2.271428] RAX: 0004 RBX: 88023fd0d940 RCX: 
[2.271551] RDX: 0040 RSI: 0004 RDI: 0004
[2.271673] RBP: 0002 R08: fffc R09: 
[2.271796] R10:  R11:  R12: 0400
[2.271918] R13: 8802361fc2c0 R14: 8802361fc2c0 R15: 8802361fc2c0
[2.272041] FS:  () GS:88023fd0() 
knlGS:
[2.272163] CS:  0010 DS:  ES:  CR0: 80050033
[2.272286] CR2: 0408 CR3: 01a06000 CR4: 000406e0
[2.272408] Stack:
[2.272531]  88023fd0d940 0002 81a38240 
81061231
[2.272654]  8802361fc2c0 880237002180 8107ddcf 

[2.272776]  8802361a5a80 880237002180 8107dcb0 
81a6a380
[2.272899] Call Trace:
[2.273021]  [] ? cpuhp_thread_fun+0x31/0x100
[2.273144]  [] ? smpboot_thread_fn+0x11f/0x180
[2.273266]  [] ? sort_range+0x20/0x20
[2.273389]  [] ? kthread+0xca/0xe0
[2.273511]  [] ? ret_from_fork+0x1f/0x40
[2.273634]  [] ? kthread_park+0x50/0x50
[2.273757] Code: 00 00 48 83 c0 22 4c 8b 24 c1 48 c7 c0 30 a1 00 00 48 8b 
14 10 e8 a8 61 26 00 3b 05 b6 56 ae 00 7c 0e f0 48 0f a
[2.279445] RIP  [] rapl_cpu_online+0x59/0x70
[2.279568]  RSP 
[2.279690] CR2: 0408
[2.279813] ---[ end trace c95da920748eb432 ]---


gdb tells me:

(gdb) info line *(rapl_cpu_online+0x59)
Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0x81012bb9 

   and ends at 0x81012bbe .

Which is:


target = cpumask_any_and(_cpu_mask, topology_core_cpumask(cpu));
if (target < nr_cpu_ids)
return 0;

cpumask_set_cpu(cpu, _cpu_mask);
pmu->cpu = cpu;  <<
return 0;

This code was recently changed by commit 8b5b773d6245138c
"perf/x86/intel/rapl: Convert to hotplug state machine" and it
appears that the setup is done as a callback:

/*
 * Install callbacks. Core will call them for each online cpu.
 */

ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP",
rapl_cpu_prepare, NULL);
if (ret)
goto out;

ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE,
"AP_PERF_X86_RAPL_ONLINE",
rapl_cpu_online, rapl_cpu_offline);

Is there a particular order guaranteed by the callbacks?  Will
rapl_cpu_prepare() always happen before online/offline?  Additionally,
rapl_cpu_prepare() can fail to allocate pmu,

static int rapl_cpu_prepare(unsigned int cpu)
{
struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);

if (pmu)
return 0;

pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
if (!pmu)
return -ENOMEM;

But rapl_cpu_online() would have no idea about this.  What should be
done in this case?


Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint

2015-12-03 Thread Charles (Chas) Williams
On Thu, 2015-12-03 at 13:58 +0100, LABBE Corentin wrote:
> On Thu, Dec 03, 2015 at 06:26:31AM -0500, Charles (Chas) Williams wrote:
> > On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote:
> > > @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, 
> > > int port, struct sk_buff *skb
> > >   if (!str)
> > >   return -EIO;
> > >  
> > > - ver = simple_strtol(str, NULL, 10);
> > > - if (ver < 1) {
> > > + err = kstrtoint(str, 10, );
> > > + if (err || ver < 1) {
> > >   dev_warn(>dev->dev, "Unexpected status interrupt version 
> > > %d\n",
> > >ver);
> > > - return -EIO;
> > > + return err;
> > 
> > 
> > If ver < 1 then you might return a 0 here.  Always returning -EIO is
> > probably just fine.
> > 
> 
> Hello
> 
> I think the best solution is to split the test, since returning error code 
> from kstrtoint was asked by David Miller.
> if (err)
>   return err;
> if (ver < 1)
>   return -EIO;
> Thanks
> Regards

That's fine.  You just shouldn't return 0 if the ver < 1.  This isn't
timing critical code.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint

2015-12-03 Thread Charles (Chas) Williams
On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote:
> @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, int 
> port, struct sk_buff *skb
>   if (!str)
>   return -EIO;
>  
> - ver = simple_strtol(str, NULL, 10);
> - if (ver < 1) {
> + err = kstrtoint(str, 10, );
> + if (err || ver < 1) {
>   dev_warn(>dev->dev, "Unexpected status interrupt version 
> %d\n",
>ver);
> - return -EIO;
> + return err;


If ver < 1 then you might return a 0 here.  Always returning -EIO is
probably just fine.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint

2015-12-03 Thread Charles (Chas) Williams
On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote:
> @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, int 
> port, struct sk_buff *skb
>   if (!str)
>   return -EIO;
>  
> - ver = simple_strtol(str, NULL, 10);
> - if (ver < 1) {
> + err = kstrtoint(str, 10, );
> + if (err || ver < 1) {
>   dev_warn(>dev->dev, "Unexpected status interrupt version 
> %d\n",
>ver);
> - return -EIO;
> + return err;


If ver < 1 then you might return a 0 here.  Always returning -EIO is
probably just fine.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint

2015-12-03 Thread Charles (Chas) Williams
On Thu, 2015-12-03 at 13:58 +0100, LABBE Corentin wrote:
> On Thu, Dec 03, 2015 at 06:26:31AM -0500, Charles (Chas) Williams wrote:
> > On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote:
> > > @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, 
> > > int port, struct sk_buff *skb
> > >   if (!str)
> > >   return -EIO;
> > >  
> > > - ver = simple_strtol(str, NULL, 10);
> > > - if (ver < 1) {
> > > + err = kstrtoint(str, 10, );
> > > + if (err || ver < 1) {
> > >   dev_warn(>dev->dev, "Unexpected status interrupt version 
> > > %d\n",
> > >ver);
> > > - return -EIO;
> > > + return err;
> > 
> > 
> > If ver < 1 then you might return a 0 here.  Always returning -EIO is
> > probably just fine.
> > 
> 
> Hello
> 
> I think the best solution is to split the test, since returning error code 
> from kstrtoint was asked by David Miller.
> if (err)
>   return err;
> if (ver < 1)
>   return -EIO;
> Thanks
> Regards

That's fine.  You just shouldn't return 0 if the ver < 1.  This isn't
timing critical code.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] atm: iphase: Fix misleading indention and return -ENOMEM on error

2015-10-12 Thread Charles (Chas) Williams
On Sat, 2015-10-10 at 21:47 +0200, Tillmann Heidsieck wrote:
> this series fixes two of them. The if(); warning would require
> restructuring the code to a larger extend. Beyond this there remains a
> whooping number of > 2k checkpatch.pl warnings and errors each. Those
> can be grouped into 
...
> Generally I would not mind cleaning all this up for those who have to
> make functional changes to the driver. However, I would like to know
> from the maintainers if such an afford would be welcome or not.

It doesn't bother me if you do this.  I can review it.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] atm: iphase: Fix misleading indention and return -ENOMEM on error

2015-10-12 Thread Charles (Chas) Williams
On Sat, 2015-10-10 at 21:47 +0200, Tillmann Heidsieck wrote:
> this series fixes two of them. The if(); warning would require
> restructuring the code to a larger extend. Beyond this there remains a
> whooping number of > 2k checkpatch.pl warnings and errors each. Those
> can be grouped into 
...
> Generally I would not mind cleaning all this up for those who have to
> make functional changes to the driver. However, I would like to know
> from the maintainers if such an afford would be welcome or not.

It doesn't bother me if you do this.  I can review it.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 07/12] atm: hide 'struct zatm_t_hist'

2015-09-30 Thread Charles (Chas) Williams
On Wed, 2015-09-30 at 13:26 +0200, Arnd Bergmann wrote:
> The zatm_t_hist structure is not used anywhere in the kernel, but is
> exported to user space. As we are trying to eliminate uses of time_t
> in the kernel for y2038 compatibility, the current definition triggers
> checking tools because it contains 'struct timeval'.
> 
> We can work around this by adding '#ifdef __KERNEL__'. I could not find
> out what the structure is actually used for, so this is the safe choice
> in case there is some user space tool that relies on the definition.
> 
> If we are sure that nothing in user space relies on the structure, we
> can instead remove the definition completely.

It was used by the ZATM_GETHIST ioctl which is long since gone in the
kernel driver.  You can just remove this.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 07/12] atm: hide 'struct zatm_t_hist'

2015-09-30 Thread Charles (Chas) Williams
On Wed, 2015-09-30 at 13:26 +0200, Arnd Bergmann wrote:
> The zatm_t_hist structure is not used anywhere in the kernel, but is
> exported to user space. As we are trying to eliminate uses of time_t
> in the kernel for y2038 compatibility, the current definition triggers
> checking tools because it contains 'struct timeval'.
> 
> We can work around this by adding '#ifdef __KERNEL__'. I could not find
> out what the structure is actually used for, so this is the safe choice
> in case there is some user space tool that relies on the definition.
> 
> If we are sure that nothing in user space relies on the structure, we
> can instead remove the definition completely.

It was used by the ZATM_GETHIST ioctl which is long since gone in the
kernel driver.  You can just remove this.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] ozwpan: Use proper check to prevent heap overflow

2015-05-16 Thread Charles (Chas) Williams
On Fri, 2015-05-15 at 15:02 +, David Laight wrote:
> From: Jason A. Donenfeld
> > Sent: 13 May 2015 19:34
> > Since elt->length is a u8, we can make this variable a u8. Then we can
> > do proper bounds checking more easily. Without this, a potentially
> > negative value is passed to the memcpy inside oz_hcd_get_desc_cnf,
> > resulting in a remotely exploitable heap overflow with network
> > supplied data.
> ...
> > diff --git a/drivers/staging/ozwpan/ozusbsvc1.c 
> > b/drivers/staging/ozwpan/ozusbsvc1.c
> > index d434d8c..cd6c63e 100644
> > --- a/drivers/staging/ozwpan/ozusbsvc1.c
> > +++ b/drivers/staging/ozwpan/ozusbsvc1.c
> > @@ -390,8 +390,10 @@ void oz_usb_rx(struct oz_pd *pd, struct oz_elt *elt)
> > case OZ_GET_DESC_RSP: {
> > struct oz_get_desc_rsp *body =
> > (struct oz_get_desc_rsp *)usb_hdr;
> > -   int data_len = elt->length -
> > +   u8 data_len = elt->length -
> > sizeof(struct oz_get_desc_rsp) + 1;
> > +   if (data_len > elt->length)
> > +   break;

This check already seems bogus?  It's really:

if (sizeof(struct oz_get_desc_rsp) - 1 < 0)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/4] ozwpan: Use proper check to prevent heap overflow

2015-05-16 Thread Charles (Chas) Williams
On Fri, 2015-05-15 at 15:02 +, David Laight wrote:
 From: Jason A. Donenfeld
  Sent: 13 May 2015 19:34
  Since elt-length is a u8, we can make this variable a u8. Then we can
  do proper bounds checking more easily. Without this, a potentially
  negative value is passed to the memcpy inside oz_hcd_get_desc_cnf,
  resulting in a remotely exploitable heap overflow with network
  supplied data.
 ...
  diff --git a/drivers/staging/ozwpan/ozusbsvc1.c 
  b/drivers/staging/ozwpan/ozusbsvc1.c
  index d434d8c..cd6c63e 100644
  --- a/drivers/staging/ozwpan/ozusbsvc1.c
  +++ b/drivers/staging/ozwpan/ozusbsvc1.c
  @@ -390,8 +390,10 @@ void oz_usb_rx(struct oz_pd *pd, struct oz_elt *elt)
  case OZ_GET_DESC_RSP: {
  struct oz_get_desc_rsp *body =
  (struct oz_get_desc_rsp *)usb_hdr;
  -   int data_len = elt-length -
  +   u8 data_len = elt-length -
  sizeof(struct oz_get_desc_rsp) + 1;
  +   if (data_len  elt-length)
  +   break;

This check already seems bogus?  It's really:

if (sizeof(struct oz_get_desc_rsp) - 1  0)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/