Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/10/2016 09:02 AM, Boris Ostrovsky wrote: On 11/10/2016 06:13 AM, Thomas Gleixner wrote: On Thu, 10 Nov 2016, M. Vefa Bicakci wrote: I have found that your patch unfortunately does not improve the situation for me. Here is an excerpt obtained from the dmesg of a kernel compiled with this patch *as well as* Sebastian's patch: [0.002561] CPU: Physical Processor ID: 0 [0.002566] CPU: Processor Core ID: 0 [0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: CPUID: 2 So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which translates to a bogus package id. And looking at the XEN code: xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid, and xen_cpu_present_to_apicid does: static int xen_cpu_present_to_apicid(int cpu) { if (cpu_present(cpu)) return xen_get_apic_id(xen_apic_read(APIC_ID)); else return BAD_APICID; } So independent of which present CPU we query we get just some random information, in the above case we get BAD_APICID from xen_apic_read() not from the else path as this CPU _IS_ present. What's so wrong with storing the fricking firmware supplied APICid as everybody else does and report it back when queried? By firmware you mean ACPI? It is most likely not available to PV guests. How about returning cpu_data(cpu).initial_apicid? And what was the original problem? The original issue I found was that VMware was returning a different set of APIC id's in the ACPI tables than what it advertised on the CPU's. http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1266716.html This damned attitude of we just hack the code into submission and let everybody else deal with the outcoming is utterly annoying. Thanks, tglx
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/10/2016 09:02 AM, Boris Ostrovsky wrote: On 11/10/2016 06:13 AM, Thomas Gleixner wrote: On Thu, 10 Nov 2016, M. Vefa Bicakci wrote: I have found that your patch unfortunately does not improve the situation for me. Here is an excerpt obtained from the dmesg of a kernel compiled with this patch *as well as* Sebastian's patch: [0.002561] CPU: Physical Processor ID: 0 [0.002566] CPU: Processor Core ID: 0 [0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: CPUID: 2 So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which translates to a bogus package id. And looking at the XEN code: xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid, and xen_cpu_present_to_apicid does: static int xen_cpu_present_to_apicid(int cpu) { if (cpu_present(cpu)) return xen_get_apic_id(xen_apic_read(APIC_ID)); else return BAD_APICID; } So independent of which present CPU we query we get just some random information, in the above case we get BAD_APICID from xen_apic_read() not from the else path as this CPU _IS_ present. What's so wrong with storing the fricking firmware supplied APICid as everybody else does and report it back when queried? By firmware you mean ACPI? It is most likely not available to PV guests. How about returning cpu_data(cpu).initial_apicid? And what was the original problem? The original issue I found was that VMware was returning a different set of APIC id's in the ACPI tables than what it advertised on the CPU's. http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1266716.html This damned attitude of we just hack the code into submission and let everybody else deal with the outcoming is utterly annoying. Thanks, tglx
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote: [0.002000] mvb: CPU: Physical Processor ID: 0 [0.002000] mvb: CPU: Processor Core ID: 0 [0.002000] mvb: identify_cpu:1112: c: 880013b0a040, c->logical_proc_id: 65535 [0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1 [0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095 [0.002000] smpboot: APIC() Converting physical 4095 to logical package 0 [0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, cpu_data(cpu).logical_proc_id: 0 This seems strange. 0x is BAD_APICID. Why didn't this fail here: for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) << continue; if (!topology_update_package_map(apicid, cpu)) continue; topology_update_package_map() should never have been called?
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote: [0.002000] mvb: CPU: Physical Processor ID: 0 [0.002000] mvb: CPU: Processor Core ID: 0 [0.002000] mvb: identify_cpu:1112: c: 880013b0a040, c->logical_proc_id: 65535 [0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1 [0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095 [0.002000] smpboot: APIC() Converting physical 4095 to logical package 0 [0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, cpu_data(cpu).logical_proc_id: 0 This seems strange. 0x is BAD_APICID. Why didn't this fail here: for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) << continue; if (!topology_update_package_map(apicid, cpu)) continue; topology_update_package_map() should never have been called?
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/09/2016 10:35 AM, Thomas Gleixner wrote: Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. There exist Virtualbox and Xen implementations which violate the spec. As a result the physical to logical package map, which relies on the ACPI/MP tables does not work on those systems, because the CPUID initialized physical package id does not match the firmware id. This causes system crashes and malfunction due to invalid package mappings. The only way to cure this is to sanitize the physical package id after the CPUID enumeration and yell when the APIC ids are different. If the physical package IDs differ use the package information from the ACPI/MP tables so the existing logical package map just works. Reported-by: "Charles (Chas) Williams" <ciwil...@brocade.com>, Reported-by: M. Vefa Bicakci <m@runbox.com> Signed-off-by: Thomas Gleixner <t...@linutronix.de> For 4 virtual sockets, 1 core per socket VM: [0.235459] node #0, CPUs: #1 [0.238579] Disabled fast string operations [0.238620] mce: CPU supports 0 MCE banks [0.238864] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2 [0.238878] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2 [0.239502] #2 [0.241298] Disabled fast string operations [0.241356] mce: CPU supports 0 MCE banks [0.241429] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 CPUID: 4 [0.241431] [Firmware Bug]: CPU2: Using firmware package id 2 instead of 4 [0.241631] #3 [0.244075] Disabled fast string operations [0.244112] mce: CPU supports 0 MCE banks [0.244284] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 CPUID: 6 [0.244293] [Firmware Bug]: CPU3: Using firmware package id 3 instead of 6 For a 2 virtual sockets, 2 cores per socket, VMware seems to get its APIC table correct as this fixup code isn't triggered. The mapping looks like: [0.028911] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.029068] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 [0.029072] topology_update_package_map: apicid 1 pkg 0 cpu 1 [0.029220] topology_update_package_map: apicid 2 pkg 1 cpu 2 [0.029376] smpboot: APIC(2) Converting physical 1 to logical package 1, cpu 2 [0.029381] topology_update_package_map: apicid 3 pkg 1 cpu 3 [0.029525] smpboot: Max logical packages: 2 For a VM with 1 virtual socket and 4 cores, the behavior is again correct. [0.016198] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.016271] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.016273] topology_update_package_map: apicid 1 pkg 0 cpu 1 [0.016336] topology_update_package_map: apicid 2 pkg 0 cpu 2 [0.016397] topology_update_package_map: apicid 3 pkg 0 cpu 3 It looks like VMware might have some assumption about the minimum number of cores on a virtual socket. Regardless, the fix solves my problem! Thanks!
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/09/2016 10:35 AM, Thomas Gleixner wrote: Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. There exist Virtualbox and Xen implementations which violate the spec. As a result the physical to logical package map, which relies on the ACPI/MP tables does not work on those systems, because the CPUID initialized physical package id does not match the firmware id. This causes system crashes and malfunction due to invalid package mappings. The only way to cure this is to sanitize the physical package id after the CPUID enumeration and yell when the APIC ids are different. If the physical package IDs differ use the package information from the ACPI/MP tables so the existing logical package map just works. Reported-by: "Charles (Chas) Williams" , Reported-by: M. Vefa Bicakci Signed-off-by: Thomas Gleixner For 4 virtual sockets, 1 core per socket VM: [0.235459] node #0, CPUs: #1 [0.238579] Disabled fast string operations [0.238620] mce: CPU supports 0 MCE banks [0.238864] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2 [0.238878] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2 [0.239502] #2 [0.241298] Disabled fast string operations [0.241356] mce: CPU supports 0 MCE banks [0.241429] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 CPUID: 4 [0.241431] [Firmware Bug]: CPU2: Using firmware package id 2 instead of 4 [0.241631] #3 [0.244075] Disabled fast string operations [0.244112] mce: CPU supports 0 MCE banks [0.244284] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 CPUID: 6 [0.244293] [Firmware Bug]: CPU3: Using firmware package id 3 instead of 6 For a 2 virtual sockets, 2 cores per socket, VMware seems to get its APIC table correct as this fixup code isn't triggered. The mapping looks like: [0.028911] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.029068] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 [0.029072] topology_update_package_map: apicid 1 pkg 0 cpu 1 [0.029220] topology_update_package_map: apicid 2 pkg 1 cpu 2 [0.029376] smpboot: APIC(2) Converting physical 1 to logical package 1, cpu 2 [0.029381] topology_update_package_map: apicid 3 pkg 1 cpu 3 [0.029525] smpboot: Max logical packages: 2 For a VM with 1 virtual socket and 4 cores, the behavior is again correct. [0.016198] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.016271] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.016273] topology_update_package_map: apicid 1 pkg 0 cpu 1 [0.016336] topology_update_package_map: apicid 2 pkg 0 cpu 2 [0.016397] topology_update_package_map: apicid 3 pkg 0 cpu 3 It looks like VMware might have some assumption about the minimum number of cores on a virtual socket. Regardless, the fix solves my problem! Thanks!
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/09/2016 11:03 AM, Peter Zijlstra wrote: On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote: Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. There exist Virtualbox and Xen implementations which violate the spec. As a ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy virt stuff. Yes, this was VMware in particular. It would be good to get this comment right so as not to mislead anyone. /* + * The physical to logical package id mapping is initialized from the + * acpi/mptables information. Make sure that CPUID actually agrees with + * that. + */ +static void sanitize_package_id(struct cpuinfo_x86 *c) +{ +#ifdef CONFIG_SMP + unsigned int pkg, apicid, cpu = smp_processor_id(); + + apicid = apic->cpu_present_to_apicid(cpu); + pkg = apicid >> boot_cpu_data.x86_coreid_bits; + + if (apicid != c->initial_apicid) { + pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n", + cpu, apicid, c->initial_apicid); Should we not also 'fix' c->initial_apicid ? Since we have c->apicid and c->initial_apicid it seems reasonable to keep one set to the "correct" value. I don't think c->initial_apicid is used past this. I should have some tests on this patch later today.
Re: [PATCH] x86/cpuid: Deal with broken firmware once more
On 11/09/2016 11:03 AM, Peter Zijlstra wrote: On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote: Both ACPI and MP specifications require that the APIC id in the respective tables must be the same as the APIC id in CPUID. The kernel retrieves the physical package id from the APIC id during the ACPI/MP table scan and builds the physical to logical package map. There exist Virtualbox and Xen implementations which violate the spec. As a ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy virt stuff. Yes, this was VMware in particular. It would be good to get this comment right so as not to mislead anyone. /* + * The physical to logical package id mapping is initialized from the + * acpi/mptables information. Make sure that CPUID actually agrees with + * that. + */ +static void sanitize_package_id(struct cpuinfo_x86 *c) +{ +#ifdef CONFIG_SMP + unsigned int pkg, apicid, cpu = smp_processor_id(); + + apicid = apic->cpu_present_to_apicid(cpu); + pkg = apicid >> boot_cpu_data.x86_coreid_bits; + + if (apicid != c->initial_apicid) { + pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n", + cpu, apicid, c->initial_apicid); Should we not also 'fix' c->initial_apicid ? Since we have c->apicid and c->initial_apicid it seems reasonable to keep one set to the "correct" value. I don't think c->initial_apicid is used past this. I should have some tests on this patch later today.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/08/2016 09:31 AM, Thomas Gleixner wrote: On Tue, 8 Nov 2016, Charles (Chas) Williams wrote: [0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1 [0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its apicid but it certainly doesn't seem to the match the apicid in the CPU's registers. For whatever reason, my VMware system is reporting that the second CPU has a local APIC ID of 2: The initial information comes from MP tables or ACPI. [0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0 ... [0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2 And the CPUID emulation tells something different. Sigh! I was thinking it might be better to call topology_update_package_map() at the bottom of identify_cpu() to setup the secondary CPU's. The boot cpu could be setup during smp_init_package_map(). Perhaps, but that does not make the inconsistencies go away By the time I know it's not consistent, there isn't anything I can do about it. I can't update the table to remove the bad information. The other alternative, is to trust the ACPI and just update the cpu_data's apicid in identify_cpu() to the value from the table. The earlier kernels didn't seem to rely as much on this information. But it does appear to be "wrong" in the APIC table. From acpidump: [02Ch 0044 1]Subtable Type : 00 [Processor Local APIC] [02Dh 0045 1] Length : 08 [02Eh 0046 1] Processor ID : 00 [02Fh 0047 1]Local Apic ID : 00 [030h 0048 4]Flags (decoded below) : 0001 Processor Enabled : 1 [034h 0052 1]Subtable Type : 00 [Processor Local APIC] [035h 0053 1] Length : 08 [036h 0054 1] Processor ID : 01 [037h 0055 1]Local Apic ID : 01 [038h 0056 4]Flags (decoded below) : 0001 Processor Enabled : 1
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/08/2016 09:31 AM, Thomas Gleixner wrote: On Tue, 8 Nov 2016, Charles (Chas) Williams wrote: [0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1 [0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its apicid but it certainly doesn't seem to the match the apicid in the CPU's registers. For whatever reason, my VMware system is reporting that the second CPU has a local APIC ID of 2: The initial information comes from MP tables or ACPI. [0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0 ... [0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2 And the CPUID emulation tells something different. Sigh! I was thinking it might be better to call topology_update_package_map() at the bottom of identify_cpu() to setup the secondary CPU's. The boot cpu could be setup during smp_init_package_map(). Perhaps, but that does not make the inconsistencies go away By the time I know it's not consistent, there isn't anything I can do about it. I can't update the table to remove the bad information. The other alternative, is to trust the ACPI and just update the cpu_data's apicid in identify_cpu() to the value from the table. The earlier kernels didn't seem to rely as much on this information. But it does appear to be "wrong" in the APIC table. From acpidump: [02Ch 0044 1]Subtable Type : 00 [Processor Local APIC] [02Dh 0045 1] Length : 08 [02Eh 0046 1] Processor ID : 00 [02Fh 0047 1]Local Apic ID : 00 [030h 0048 4]Flags (decoded below) : 0001 Processor Enabled : 1 [034h 0052 1]Subtable Type : 00 [Processor Local APIC] [035h 0053 1] Length : 08 [036h 0054 1] Processor ID : 01 [037h 0055 1]Local Apic ID : 01 [038h 0056 4]Flags (decoded below) : 0001 Processor Enabled : 1
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/07/2016 03:20 PM, Thomas Gleixner wrote: On Mon, 7 Nov 2016, Charles (Chas) Williams wrote: On 11/07/2016 11:19 AM, Thomas Gleixner wrote: On Wed, 2 Nov 2016, Charles (Chas) Williams wrote: I don't know why the CPU's phys_proc_id is 2. max_physical_pkg_id gets initialized via: cpus = boot_cpu_data.x86_max_cores; max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC? I have discovered that that is not the problem. smp_init_package_map() is calculating the physical core id using the following: for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); ... if (!topology_update_package_map(apicid, cpu)) continue; ... int topology_update_package_map(unsigned int apicid, unsigned int cpu) { unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits; But later when the secondary CPU's are identified they use a different calculation using the local APIC ID from the CPU's registers: static void generic_identify(struct cpuinfo_x86 *c) ... if (c->cpuid_level >= 0x0001) { c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; ... c->phys_proc_id = c->initial_apicid; So at the end of identify_cpu() when the boot/hotplug assignment is put back: c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); topology_phys_to_logical_pkg() is returning an invalid logical processor since one isn't configured. It's not clear to me what the right thing to do is or which is right. Nice detective work! So the issue is that the package mapping code honours boot_cpu_data.x86_coreid_bit, while generic_identify does not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative fix below. I still need to gow through that maze and figure out what could go wrong with that :( I don't think that will fix the issue. My deubugging leads me to believe that boot_cpu_data.x86_coreid_bits is probably 0: [0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1 [0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its apicid but it certainly doesn't seem to the match the apicid in the CPU's registers. For whatever reason, my VMware system is reporting that the second CPU has a local APIC ID of 2: [0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0 ... [0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2 I was thinking it might be better to call topology_update_package_map() at the bottom of identify_cpu() to setup the secondary CPU's. The boot cpu could be setup during smp_init_package_map(). topology_update_package_map() is also poorly named it can create the assignment, but can't update it (since it doesn't know the previous mapping). 8< --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str static void generic_identify(struct cpuinfo_x86 *c) { + unsigned int pkg; + c->extended_cpuid_level = 0; if (!have_cpuid_p()) @@ -929,7 +931,8 @@ static void generic_identify(struct cpui c->apicid = c->initial_apicid; # endif #endif - c->phys_proc_id = c->initial_apicid; + pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits; + c->phys_proc_id = pkg; } get_model_name(c); /* Default name */
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/07/2016 03:20 PM, Thomas Gleixner wrote: On Mon, 7 Nov 2016, Charles (Chas) Williams wrote: On 11/07/2016 11:19 AM, Thomas Gleixner wrote: On Wed, 2 Nov 2016, Charles (Chas) Williams wrote: I don't know why the CPU's phys_proc_id is 2. max_physical_pkg_id gets initialized via: cpus = boot_cpu_data.x86_max_cores; max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC? I have discovered that that is not the problem. smp_init_package_map() is calculating the physical core id using the following: for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); ... if (!topology_update_package_map(apicid, cpu)) continue; ... int topology_update_package_map(unsigned int apicid, unsigned int cpu) { unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits; But later when the secondary CPU's are identified they use a different calculation using the local APIC ID from the CPU's registers: static void generic_identify(struct cpuinfo_x86 *c) ... if (c->cpuid_level >= 0x0001) { c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; ... c->phys_proc_id = c->initial_apicid; So at the end of identify_cpu() when the boot/hotplug assignment is put back: c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); topology_phys_to_logical_pkg() is returning an invalid logical processor since one isn't configured. It's not clear to me what the right thing to do is or which is right. Nice detective work! So the issue is that the package mapping code honours boot_cpu_data.x86_coreid_bit, while generic_identify does not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative fix below. I still need to gow through that maze and figure out what could go wrong with that :( I don't think that will fix the issue. My deubugging leads me to believe that boot_cpu_data.x86_coreid_bits is probably 0: [0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0 [0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1 [0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its apicid but it certainly doesn't seem to the match the apicid in the CPU's registers. For whatever reason, my VMware system is reporting that the second CPU has a local APIC ID of 2: [0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0 ... [0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2 I was thinking it might be better to call topology_update_package_map() at the bottom of identify_cpu() to setup the secondary CPU's. The boot cpu could be setup during smp_init_package_map(). topology_update_package_map() is also poorly named it can create the assignment, but can't update it (since it doesn't know the previous mapping). 8< --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str static void generic_identify(struct cpuinfo_x86 *c) { + unsigned int pkg; + c->extended_cpuid_level = 0; if (!have_cpuid_p()) @@ -929,7 +931,8 @@ static void generic_identify(struct cpui c->apicid = c->initial_apicid; # endif #endif - c->phys_proc_id = c->initial_apicid; + pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits; + c->phys_proc_id = pkg; } get_model_name(c); /* Default name */
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/07/2016 11:19 AM, Thomas Gleixner wrote: On Wed, 2 Nov 2016, Charles (Chas) Williams wrote: On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote: I am not sure if this a race with the new hotplug code or something that was always there. Both (M. Vefa Bicakc and Charles) say that the box boots sometimes fine (without the patch). smp_store_boot_cpu_info() should have run before the notofoert and thus should have set the info properly. However I got the following bootlog from Charles with this patch: I don't this this is a race. Here is some debugging from the two CPU VM (2 sockets, 1 core per socket). In identify_cpu() we have: /* The boot/hotplug time assigment got cleared, restore it */ c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); The values just after this: [0.228306] identify_cpu: c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So what's interesting here, is the phys_proc_id of 2 for CPU1: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { if (phys_pkg >= max_physical_pkg_id) return -1; return physical_to_logical_pkg[phys_pkg]; } And we happen to know the max_physical_pkg_id is 2 in this case. So apparently, topology_phys_to_logical_pkg() returns -1 and it gets assigned to the logical_proc_id. I don't know why the CPU's phys_proc_id is 2. max_physical_pkg_id gets initialized via: cpus = boot_cpu_data.x86_max_cores; max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC? I have discovered that that is not the problem. smp_init_package_map() is calculating the physical core id using the following: for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); ... if (!topology_update_package_map(apicid, cpu)) continue; ... int topology_update_package_map(unsigned int apicid, unsigned int cpu) { unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits; But later when the secondary CPU's are identified they use a different calculation using the local APIC ID from the CPU's registers: static void generic_identify(struct cpuinfo_x86 *c) ... if (c->cpuid_level >= 0x0001) { c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; ... c->phys_proc_id = c->initial_apicid; So at the end of identify_cpu() when the boot/hotplug assignment is put back: c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); topology_phys_to_logical_pkg() is returning an invalid logical processor since one isn't configured. It's not clear to me what the right thing to do is or which is right.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/07/2016 11:19 AM, Thomas Gleixner wrote: On Wed, 2 Nov 2016, Charles (Chas) Williams wrote: On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote: I am not sure if this a race with the new hotplug code or something that was always there. Both (M. Vefa Bicakc and Charles) say that the box boots sometimes fine (without the patch). smp_store_boot_cpu_info() should have run before the notofoert and thus should have set the info properly. However I got the following bootlog from Charles with this patch: I don't this this is a race. Here is some debugging from the two CPU VM (2 sockets, 1 core per socket). In identify_cpu() we have: /* The boot/hotplug time assigment got cleared, restore it */ c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); The values just after this: [0.228306] identify_cpu: c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So what's interesting here, is the phys_proc_id of 2 for CPU1: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { if (phys_pkg >= max_physical_pkg_id) return -1; return physical_to_logical_pkg[phys_pkg]; } And we happen to know the max_physical_pkg_id is 2 in this case. So apparently, topology_phys_to_logical_pkg() returns -1 and it gets assigned to the logical_proc_id. I don't know why the CPU's phys_proc_id is 2. max_physical_pkg_id gets initialized via: cpus = boot_cpu_data.x86_max_cores; max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC? I have discovered that that is not the problem. smp_init_package_map() is calculating the physical core id using the following: for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); ... if (!topology_update_package_map(apicid, cpu)) continue; ... int topology_update_package_map(unsigned int apicid, unsigned int cpu) { unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits; But later when the secondary CPU's are identified they use a different calculation using the local APIC ID from the CPU's registers: static void generic_identify(struct cpuinfo_x86 *c) ... if (c->cpuid_level >= 0x0001) { c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; ... c->phys_proc_id = c->initial_apicid; So at the end of identify_cpu() when the boot/hotplug assignment is put back: c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); topology_phys_to_logical_pkg() is returning an invalid logical processor since one isn't configured. It's not clear to me what the right thing to do is or which is right.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/04/2016 02:03 PM, Sebastian Andrzej Siewior wrote: On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote: The initial CPU boots and is identified: [0.009018] identify_boot_cpu [0.009174] generic_identify: phys_proc_id is now 0 ... [0.009427] identify_cpu: before c 81ae2680 logical_proc_id 0 c->phys_proc_id 0 [0.009506] identify_cpu: after c 81ae2680 logical_proc_id 65535 c->phys_proc_id 0 So, this is fine because the APIC hasn't been scanned yet. APIC now gets scanned: [0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) [0.015797] smpboot: Max logical packages: 2 where is the APICID here is comming from? This comes from here: unsigned int apicid = apic->cpu_present_to_apicid(cpu); if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) continue; if (!topology_update_package_map(apicid, cpu)) And I think this is the part that is "wrong". The apicid appears to be a logical CPU id. I believe that in most cases this mapping comes from x86_bios_cpu_apicid (or x86_cpu_to_apicid) which is generated in generic_processor_info() which maps apicid's to logical cpu indexes. Note that apic->cpu_present_to_apicid() is using just the cpu_index. for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) continue; if (!topology_update_package_map(apicid, cpu)) continue; pr_warn("CPU %u APICId %x disabled\n", cpu, apicid); per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID; set_cpu_possible(cpu, false); set_cpu_present(cpu, false); } So, at this point, I think everything is correct. But now the secondary CPU's "boot": [0.236569] identify_secondary_cpu [0.236620] generic_identify: phys_proc_id is now 2 so here is where fun starts. Xen has also arch/x86/xen/smp.c::cpu_bringup() where the phys_proc_id is changed. But isn't done for vmware but it might a place where they duct tape things. How is this APIC id different from the earlier? I guess based on your output that generic_identify() changes the content of phys_proc_id. [0.236745] identify_cpu: before c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 [0.236747] identify_cpu: after c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get set to -1. Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is set to -1 on init so slot two has the value -1. That is what you see - not the -1 because of ">= max_physical_pkg_id". The comment at the end of identfy_cpu() says: /* The boot/hotplug time assigment got cleared, restore it */ So, logical_proc_id being wrong here before restoration doesn't bother me since I assume something in booting the secondary CPU's clears any existing cpu data. I know detect_extended_topology() is likely being called for both CPU's and getting the right values (checking this now). I don't know why generic_identify() is resetting this value. I don't know either. But it is clearly reading the apic id twice and second approach is different from the first which leads to different results. So if you figure out how the first APICID for the second CPU is retrieved and then you see how it happens for the second time. There must be a difference. The phys core id from generic_identify() comes from the CPU's EBX register so we _know_ this is right. if (c->cpuid_level >= 0x0001) { c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; #ifdef CONFIG_X86_32 # ifdef CONFIG_SMP c->apicid = apic->phys_pkg_id(c->initial_apicid, 0); # else c->apicid = c->initial_apicid; # endif #endif c->phys_proc_id = c->initial_apicid; } The intel docs http://x86.renejeschke.de/html/file_module_x86_id_45.html claims this is the Local APIC ID. So it seems likely this is correct value. It's not clear it matter if this is the right value or not though. Even if this is the correct apicid, nothing knows about it. An argument could be made that instead of checking the cpuid level, we could just use the apicid based on the cpu index just like the other code. It would be consistent at least then.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/04/2016 02:03 PM, Sebastian Andrzej Siewior wrote: On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote: The initial CPU boots and is identified: [0.009018] identify_boot_cpu [0.009174] generic_identify: phys_proc_id is now 0 ... [0.009427] identify_cpu: before c 81ae2680 logical_proc_id 0 c->phys_proc_id 0 [0.009506] identify_cpu: after c 81ae2680 logical_proc_id 65535 c->phys_proc_id 0 So, this is fine because the APIC hasn't been scanned yet. APIC now gets scanned: [0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) [0.015797] smpboot: Max logical packages: 2 where is the APICID here is comming from? This comes from here: unsigned int apicid = apic->cpu_present_to_apicid(cpu); if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) continue; if (!topology_update_package_map(apicid, cpu)) And I think this is the part that is "wrong". The apicid appears to be a logical CPU id. I believe that in most cases this mapping comes from x86_bios_cpu_apicid (or x86_cpu_to_apicid) which is generated in generic_processor_info() which maps apicid's to logical cpu indexes. Note that apic->cpu_present_to_apicid() is using just the cpu_index. for_each_present_cpu(cpu) { unsigned int apicid = apic->cpu_present_to_apicid(cpu); if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) continue; if (!topology_update_package_map(apicid, cpu)) continue; pr_warn("CPU %u APICId %x disabled\n", cpu, apicid); per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID; set_cpu_possible(cpu, false); set_cpu_present(cpu, false); } So, at this point, I think everything is correct. But now the secondary CPU's "boot": [0.236569] identify_secondary_cpu [0.236620] generic_identify: phys_proc_id is now 2 so here is where fun starts. Xen has also arch/x86/xen/smp.c::cpu_bringup() where the phys_proc_id is changed. But isn't done for vmware but it might a place where they duct tape things. How is this APIC id different from the earlier? I guess based on your output that generic_identify() changes the content of phys_proc_id. [0.236745] identify_cpu: before c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 [0.236747] identify_cpu: after c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get set to -1. Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is set to -1 on init so slot two has the value -1. That is what you see - not the -1 because of ">= max_physical_pkg_id". The comment at the end of identfy_cpu() says: /* The boot/hotplug time assigment got cleared, restore it */ So, logical_proc_id being wrong here before restoration doesn't bother me since I assume something in booting the secondary CPU's clears any existing cpu data. I know detect_extended_topology() is likely being called for both CPU's and getting the right values (checking this now). I don't know why generic_identify() is resetting this value. I don't know either. But it is clearly reading the apic id twice and second approach is different from the first which leads to different results. So if you figure out how the first APICID for the second CPU is retrieved and then you see how it happens for the second time. There must be a difference. The phys core id from generic_identify() comes from the CPU's EBX register so we _know_ this is right. if (c->cpuid_level >= 0x0001) { c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF; #ifdef CONFIG_X86_32 # ifdef CONFIG_SMP c->apicid = apic->phys_pkg_id(c->initial_apicid, 0); # else c->apicid = c->initial_apicid; # endif #endif c->phys_proc_id = c->initial_apicid; } The intel docs http://x86.renejeschke.de/html/file_module_x86_id_45.html claims this is the Local APIC ID. So it seems likely this is correct value. It's not clear it matter if this is the right value or not though. Even if this is the correct apicid, nothing knows about it. An argument could be made that instead of checking the cpuid level, we could just use the apicid based on the cpu index just like the other code. It would be consistent at least then.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/03/2016 01:47 PM, Sebastian Andrzej Siewior wrote: On 2016-11-02 18:47:49 [-0400], Charles (Chas) Williams wrote: I don't this this is a race. Here is some debugging from the two CPU VM (2 sockets, 1 core per socket). In identify_cpu() we have: /* The boot/hotplug time assigment got cleared, restore it */ c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); The values just after this: [0.228306] identify_cpu: c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So what's interesting here, is the phys_proc_id of 2 for CPU1: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { if (phys_pkg >= max_physical_pkg_id) return -1; return physical_to_logical_pkg[phys_pkg]; } And we happen to know the max_physical_pkg_id is 2 in this case. So apparently, topology_phys_to_logical_pkg() returns -1 and it gets assigned to the logical_proc_id. I don't know why the CPU's phys_proc_id is 2. This is the physical ID. You have two logical IDs (on your two sockets machine). What is max_physical_pkg_id? In order to get that -1 you would have to max_physical_pkg_id of 1 but code does max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); and I would be a little surprised if this is 1. Sebastian The initial CPU boots and is identified: [0.009018] identify_boot_cpu [0.009174] generic_identify: phys_proc_id is now 0 ... [0.009427] identify_cpu: before c 81ae2680 logical_proc_id 0 c->phys_proc_id 0 [0.009506] identify_cpu: after c 81ae2680 logical_proc_id 65535 c->phys_proc_id 0 So, this is fine because the APIC hasn't been scanned yet. APIC now gets scanned: [0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) [0.015797] smpboot: Max logical packages: 2 So, at this point, I think everything is correct. But now the secondary CPU's "boot": [0.236569] identify_secondary_cpu [0.236620] generic_identify: phys_proc_id is now 2 [0.236745] identify_cpu: before c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 [0.236747] identify_cpu: after c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get set to -1. The comment at the end of identfy_cpu() says: /* The boot/hotplug time assigment got cleared, restore it */ So, logical_proc_id being wrong here before restoration doesn't bother me since I assume something in booting the secondary CPU's clears any existing cpu data. I know detect_extended_topology() is likely being called for both CPU's and getting the right values (checking this now). I don't know why generic_identify() is resetting this value.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/03/2016 01:47 PM, Sebastian Andrzej Siewior wrote: On 2016-11-02 18:47:49 [-0400], Charles (Chas) Williams wrote: I don't this this is a race. Here is some debugging from the two CPU VM (2 sockets, 1 core per socket). In identify_cpu() we have: /* The boot/hotplug time assigment got cleared, restore it */ c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); The values just after this: [0.228306] identify_cpu: c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So what's interesting here, is the phys_proc_id of 2 for CPU1: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { if (phys_pkg >= max_physical_pkg_id) return -1; return physical_to_logical_pkg[phys_pkg]; } And we happen to know the max_physical_pkg_id is 2 in this case. So apparently, topology_phys_to_logical_pkg() returns -1 and it gets assigned to the logical_proc_id. I don't know why the CPU's phys_proc_id is 2. This is the physical ID. You have two logical IDs (on your two sockets machine). What is max_physical_pkg_id? In order to get that -1 you would have to max_physical_pkg_id of 1 but code does max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus); and I would be a little surprised if this is 1. Sebastian The initial CPU boots and is identified: [0.009018] identify_boot_cpu [0.009174] generic_identify: phys_proc_id is now 0 ... [0.009427] identify_cpu: before c 81ae2680 logical_proc_id 0 c->phys_proc_id 0 [0.009506] identify_cpu: after c 81ae2680 logical_proc_id 65535 c->phys_proc_id 0 So, this is fine because the APIC hasn't been scanned yet. APIC now gets scanned: [0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (88023fc0a040) [0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (88023fd0a040) [0.015797] smpboot: Max logical packages: 2 So, at this point, I think everything is correct. But now the secondary CPU's "boot": [0.236569] identify_secondary_cpu [0.236620] generic_identify: phys_proc_id is now 2 [0.236745] identify_cpu: before c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 [0.236747] identify_cpu: after c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get set to -1. The comment at the end of identfy_cpu() says: /* The boot/hotplug time assigment got cleared, restore it */ So, logical_proc_id being wrong here before restoration doesn't bother me since I assume something in booting the secondary CPU's clears any existing cpu data. I know detect_extended_topology() is likely being called for both CPU's and getting the right values (checking this now). I don't know why generic_identify() is resetting this value.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote: I am not sure if this a race with the new hotplug code or something that was always there. Both (M. Vefa Bicakc and Charles) say that the box boots sometimes fine (without the patch). smp_store_boot_cpu_info() should have run before the notofoert and thus should have set the info properly. However I got the following bootlog from Charles with this patch: I don't this this is a race. Here is some debugging from the two CPU VM (2 sockets, 1 core per socket). In identify_cpu() we have: /* The boot/hotplug time assigment got cleared, restore it */ c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); The values just after this: [0.228306] identify_cpu: c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So what's interesting here, is the phys_proc_id of 2 for CPU1: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { if (phys_pkg >= max_physical_pkg_id) return -1; return physical_to_logical_pkg[phys_pkg]; } And we happen to know the max_physical_pkg_id is 2 in this case. So apparently, topology_phys_to_logical_pkg() returns -1 and it gets assigned to the logical_proc_id. I don't know why the CPU's phys_proc_id is 2.
Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory
On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote: I am not sure if this a race with the new hotplug code or something that was always there. Both (M. Vefa Bicakc and Charles) say that the box boots sometimes fine (without the patch). smp_store_boot_cpu_info() should have run before the notofoert and thus should have set the info properly. However I got the following bootlog from Charles with this patch: I don't this this is a race. Here is some debugging from the two CPU VM (2 sockets, 1 core per socket). In identify_cpu() we have: /* The boot/hotplug time assigment got cleared, restore it */ c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id); The values just after this: [0.228306] identify_cpu: c 88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2 So what's interesting here, is the phys_proc_id of 2 for CPU1: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { if (phys_pkg >= max_physical_pkg_id) return -1; return physical_to_logical_pkg[phys_pkg]; } And we happen to know the max_physical_pkg_id is 2 in this case. So apparently, topology_phys_to_logical_pkg() returns -1 and it gets assigned to the logical_proc_id. I don't know why the CPU's phys_proc_id is 2.
Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
On 10/28/2016 04:03 AM, Sebastian Andrzej Siewior wrote: On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote: I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning topology_max_packages(). So it says 4 but then returns 65535 for CPU 2 and 3. That -1 comes probably from topology_update_package_map(). Could you please send a complete boot log and try the following patch? This one should fix your boot problem and disable RAPL if the info is invalid. But sometimes the topology info is correct and if I get lucky, the package id could be valid for all the CPU's. Given the behavior, I have seen so far it makes me thing the RAPL isn't being emulated. So even if I did boot onto a "valid" set of cores, would I always be certain that I will be on those cores? I don't what vmware does here. Nor do they ship source to check. So if you have a big HW box with say two packages, it might make sense to give this information to the guest _if_ the CPUs are pinned and the guest never migrates. Yes, I agree _if_. That's why it simply isn't clear to me that we should attempt do any RAPL at all for VMWare. The current behavior doesn't seem to make sense and I don't expect it to suddenly start acting reasonable. Since I don't understand why some package id's are valid and others are not, I would prefer not to trust any of the information as far as enabling/disabling the RAPL monitoring. Per your request in your next email: One thing I forgot to ask: Could you please check if you get the same pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug rework). Our previous kernel was 4.4, and didn't use the logical package id: I see. Did the patch I sent fixed it for you and were you not able to test? Yes, it does prevent RAPL from starting and loading. From the boot log: [2.711481] RAPL PMU: rapl pmu error: max package: 4 but CPU2 belongs to 65535 [2.711639] rapl pmu error: max package: 4 but CPU2 belongs to 65535 This was consistent across several reboots. I poked around in the VM settings. Apparently this guest is configured for four virtual sockets with one core per socket. Testing with two virtual sockets, one core per socket: [2.163177] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535 [2.163304] rapl pmu error: max package: 2 but CPU1 belongs to 65535 Booting with 1 virtual socket, 1 core per socket: [1.750311] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer [1.750312] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules [1.750313] RAPL PMU: hw unit of domain package 2^-0 Joules [1.750314] RAPL PMU: hw unit of domain dram 2^-0 Joules Booting with 1 virtual socket, 4 cores per socket: [3.527298] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer [3.527302] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules [3.527304] RAPL PMU: hw unit of domain package 2^-0 Joules [3.527307] RAPL PMU: hw unit of domain dram 2^-0 Joules So, it looks like VMWare tends to always get something wrong if you have more than one virtual socket. The above behavior was consistent across several reboots.
Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
On 10/28/2016 04:03 AM, Sebastian Andrzej Siewior wrote: On 2016-10-27 15:00:32 [-0400], Charles (Chas) Williams wrote: I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning topology_max_packages(). So it says 4 but then returns 65535 for CPU 2 and 3. That -1 comes probably from topology_update_package_map(). Could you please send a complete boot log and try the following patch? This one should fix your boot problem and disable RAPL if the info is invalid. But sometimes the topology info is correct and if I get lucky, the package id could be valid for all the CPU's. Given the behavior, I have seen so far it makes me thing the RAPL isn't being emulated. So even if I did boot onto a "valid" set of cores, would I always be certain that I will be on those cores? I don't what vmware does here. Nor do they ship source to check. So if you have a big HW box with say two packages, it might make sense to give this information to the guest _if_ the CPUs are pinned and the guest never migrates. Yes, I agree _if_. That's why it simply isn't clear to me that we should attempt do any RAPL at all for VMWare. The current behavior doesn't seem to make sense and I don't expect it to suddenly start acting reasonable. Since I don't understand why some package id's are valid and others are not, I would prefer not to trust any of the information as far as enabling/disabling the RAPL monitoring. Per your request in your next email: One thing I forgot to ask: Could you please check if you get the same pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug rework). Our previous kernel was 4.4, and didn't use the logical package id: I see. Did the patch I sent fixed it for you and were you not able to test? Yes, it does prevent RAPL from starting and loading. From the boot log: [2.711481] RAPL PMU: rapl pmu error: max package: 4 but CPU2 belongs to 65535 [2.711639] rapl pmu error: max package: 4 but CPU2 belongs to 65535 This was consistent across several reboots. I poked around in the VM settings. Apparently this guest is configured for four virtual sockets with one core per socket. Testing with two virtual sockets, one core per socket: [2.163177] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535 [2.163304] rapl pmu error: max package: 2 but CPU1 belongs to 65535 Booting with 1 virtual socket, 1 core per socket: [1.750311] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer [1.750312] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules [1.750313] RAPL PMU: hw unit of domain package 2^-0 Joules [1.750314] RAPL PMU: hw unit of domain dram 2^-0 Joules Booting with 1 virtual socket, 4 cores per socket: [3.527298] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 10737418240 ms ovfl timer [3.527302] RAPL PMU: hw unit of domain pp0-core 2^-0 Joules [3.527304] RAPL PMU: hw unit of domain package 2^-0 Joules [3.527307] RAPL PMU: hw unit of domain dram 2^-0 Joules So, it looks like VMWare tends to always get something wrong if you have more than one virtual socket. The above behavior was consistent across several reboots.
Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote: On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote: [3.107126] init_rapl_pmus: maxpkg 4 there! vmware bug. It probably worked by chance. Yes, the behavior is a bit random. I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning topology_max_packages(). So it says 4 but then returns 65535 for CPU 2 and 3. That -1 comes probably from topology_update_package_map(). Could you please send a complete boot log and try the following patch? This one should fix your boot problem and disable RAPL if the info is invalid. But sometimes the topology info is correct and if I get lucky, the package id could be valid for all the CPU's. Given the behavior, I have seen so far it makes me thing the RAPL isn't being emulated. So even if I did boot onto a "valid" set of cores, would I always be certain that I will be on those cores? diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 0a535cea8ff3..f5d85f2853d7 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void) { int maxpkg = topology_max_packages(); size_t size; + unsigned int cpu; + + for_each_possible_cpu(cpu) { + if (topology_logical_package_id(cpu) >= maxpkg) { + pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n", + maxpkg, cpu, topology_logical_package_id(cpu)); + return -EINVAL; + } + } size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *); rapl_pmus = kzalloc(size, GFP_KERNEL); Per your request in your next email: One thing I forgot to ask: Could you please check if you get the same pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug rework). Our previous kernel was 4.4, and didn't use the logical package id: /* check if phys_is is already covered */ for_each_cpu(i, _cpu_mask) { if (phys_id == topology_physical_package_id(i)) return;
Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
On 10/25/2016 08:22 AM, Sebastian Andrzej Siewior wrote: On 2016-10-21 17:03:56 [-0400], Charles (Chas) Williams wrote: [3.107126] init_rapl_pmus: maxpkg 4 there! vmware bug. It probably worked by chance. Yes, the behavior is a bit random. I assume "init_rapl_pmus: maxpkg 4" is from init_rapl_pmus() returning topology_max_packages(). So it says 4 but then returns 65535 for CPU 2 and 3. That -1 comes probably from topology_update_package_map(). Could you please send a complete boot log and try the following patch? This one should fix your boot problem and disable RAPL if the info is invalid. But sometimes the topology info is correct and if I get lucky, the package id could be valid for all the CPU's. Given the behavior, I have seen so far it makes me thing the RAPL isn't being emulated. So even if I did boot onto a "valid" set of cores, would I always be certain that I will be on those cores? diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c index 0a535cea8ff3..f5d85f2853d7 100644 --- a/arch/x86/events/intel/rapl.c +++ b/arch/x86/events/intel/rapl.c @@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void) { int maxpkg = topology_max_packages(); size_t size; + unsigned int cpu; + + for_each_possible_cpu(cpu) { + if (topology_logical_package_id(cpu) >= maxpkg) { + pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n", + maxpkg, cpu, topology_logical_package_id(cpu)); + return -EINVAL; + } + } size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *); rapl_pmus = kzalloc(size, GFP_KERNEL); Per your request in your next email: One thing I forgot to ask: Could you please check if you get the same pkgid reported for cpu 0-3 on a pre-v4.8 kernel? (before the hotplug rework). Our previous kernel was 4.4, and didn't use the logical package id: /* check if phys_is is already covered */ for_each_cpu(i, _cpu_mask) { if (phys_id == topology_physical_package_id(i)) return;
Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
On 10/21/2016 06:56 AM, Sebastian Andrzej Siewior wrote: On 2016-10-20 16:27:55 [-0400], Charles (Chas) Williams wrote: Recent 4.8 kernels have been oopsing when running under VMWare: can you reproduce this on bare metal? I can't get dedicated access to the specific bare metal since it is running as a dedicated hypervisor. I haven't seen this issue anywhere else though with the 4.8 kernel. [2.270203] BUG: unable to handle kernel NULL pointer dereference at 0408 [2.270325] IP: [] rapl_cpu_online+0x59/0x70 can you check if pmu is NULL? It's not. The dereference at 0x408 and pmu->cpu being fairly early in the struct seems to indicate that pmu wasn't pointing to 0 at the time (but fairly close). I should have noticed that earlier. Is there a particular order guaranteed by the callbacks? Will rapl_cpu_prepare() always happen before online/offline? Additionally, yes, see include/linux/cpuhotplug.h. On CPU-up the array ids are invoked from CPUHP_OFFLINE till CPUHP_ONLINE. Yes, I see that now. Thanks for the pointer! If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fail then we rollback to the starting point (in case of CPU up it would be CPUHP_OFFLINE. You'll like this, I just did a little printk debugging because it was easier than trying to get a debugger running: [3.107126] init_rapl_pmus: maxpkg 4 [3.107263] rapl_cpu_prepare: pmu 880234faa540 cpu 0 pkgid 0 [3.107400] rapl_cpu_prepare: pmu 880234faa600 cpu 1 pkgid 2 [3.107537] rapl_cpu_prepare: pmu 880234faa6c0 cpu 2 pkgid 65535 [3.107662] rapl_cpu_online: pmu 880234faa540 cpu 0 pkgid 0 [3.107907] rapl_cpu_online: pmu 880234faa600 cpu 1 pkgid 2 [3.108133] rapl_cpu_online: pmu 880234faa6c0 cpu 2 pkgid 65535 [3.108333] rapl_cpu_online: pmu 880234faa6c0 cpu 3 pkgid 65535 where pkgid is topology_logical_package_id(cpu). I can't understand why I don't see a cpu 3 during cpu prepare, when I see one later. The 65535 is a -1 from topology_phys_to_logical_pkg() getting assigned to the logical_proc_id apparently. So this is pretty puzzling. Since this is a guest running under VMWare, I don't know that there is any particular CPU pinning or emulation of RAPL. It looks there was a proposal to not run in guests: https://lkml.org/lkml/2015/12/3/559
Re: [PREEMPT-RT] Oops in rapl_cpu_prepare()
On 10/21/2016 06:56 AM, Sebastian Andrzej Siewior wrote: On 2016-10-20 16:27:55 [-0400], Charles (Chas) Williams wrote: Recent 4.8 kernels have been oopsing when running under VMWare: can you reproduce this on bare metal? I can't get dedicated access to the specific bare metal since it is running as a dedicated hypervisor. I haven't seen this issue anywhere else though with the 4.8 kernel. [2.270203] BUG: unable to handle kernel NULL pointer dereference at 0408 [2.270325] IP: [] rapl_cpu_online+0x59/0x70 can you check if pmu is NULL? It's not. The dereference at 0x408 and pmu->cpu being fairly early in the struct seems to indicate that pmu wasn't pointing to 0 at the time (but fairly close). I should have noticed that earlier. Is there a particular order guaranteed by the callbacks? Will rapl_cpu_prepare() always happen before online/offline? Additionally, yes, see include/linux/cpuhotplug.h. On CPU-up the array ids are invoked from CPUHP_OFFLINE till CPUHP_ONLINE. Yes, I see that now. Thanks for the pointer! If a callback (such as CPUHP_PERF_X86_RAPL_PREP) fail then we rollback to the starting point (in case of CPU up it would be CPUHP_OFFLINE. You'll like this, I just did a little printk debugging because it was easier than trying to get a debugger running: [3.107126] init_rapl_pmus: maxpkg 4 [3.107263] rapl_cpu_prepare: pmu 880234faa540 cpu 0 pkgid 0 [3.107400] rapl_cpu_prepare: pmu 880234faa600 cpu 1 pkgid 2 [3.107537] rapl_cpu_prepare: pmu 880234faa6c0 cpu 2 pkgid 65535 [3.107662] rapl_cpu_online: pmu 880234faa540 cpu 0 pkgid 0 [3.107907] rapl_cpu_online: pmu 880234faa600 cpu 1 pkgid 2 [3.108133] rapl_cpu_online: pmu 880234faa6c0 cpu 2 pkgid 65535 [3.108333] rapl_cpu_online: pmu 880234faa6c0 cpu 3 pkgid 65535 where pkgid is topology_logical_package_id(cpu). I can't understand why I don't see a cpu 3 during cpu prepare, when I see one later. The 65535 is a -1 from topology_phys_to_logical_pkg() getting assigned to the logical_proc_id apparently. So this is pretty puzzling. Since this is a guest running under VMWare, I don't know that there is any particular CPU pinning or emulation of RAPL. It looks there was a proposal to not run in guests: https://lkml.org/lkml/2015/12/3/559
Oops in rapl_cpu_prepare()
Recent 4.8 kernels have been oopsing when running under VMWare: [2.270203] BUG: unable to handle kernel NULL pointer dereference at 0408 [2.270325] IP: [] rapl_cpu_online+0x59/0x70 [2.270448] PGD 0 [2.270570] Oops: 0002 [#1] SMP [2.270693] Modules linked in: [2.270815] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 4.8.2-1-amd64-vyatta #1 [2.270938] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014 [2.271060] task: 8802361fc2c0 task.stack: 880236208000 [2.271183] RIP: 0010:[] [] rapl_cpu_online+0x59/0x70 [2.271306] RSP: :88023620be68 EFLAGS: 00010246 [2.271428] RAX: 0004 RBX: 88023fd0d940 RCX: [2.271551] RDX: 0040 RSI: 0004 RDI: 0004 [2.271673] RBP: 0002 R08: fffc R09: [2.271796] R10: R11: R12: 0400 [2.271918] R13: 8802361fc2c0 R14: 8802361fc2c0 R15: 8802361fc2c0 [2.272041] FS: () GS:88023fd0() knlGS: [2.272163] CS: 0010 DS: ES: CR0: 80050033 [2.272286] CR2: 0408 CR3: 01a06000 CR4: 000406e0 [2.272408] Stack: [2.272531] 88023fd0d940 0002 81a38240 81061231 [2.272654] 8802361fc2c0 880237002180 8107ddcf [2.272776] 8802361a5a80 880237002180 8107dcb0 81a6a380 [2.272899] Call Trace: [2.273021] [] ? cpuhp_thread_fun+0x31/0x100 [2.273144] [] ? smpboot_thread_fn+0x11f/0x180 [2.273266] [] ? sort_range+0x20/0x20 [2.273389] [] ? kthread+0xca/0xe0 [2.273511] [] ? ret_from_fork+0x1f/0x40 [2.273634] [] ? kthread_park+0x50/0x50 [2.273757] Code: 00 00 48 83 c0 22 4c 8b 24 c1 48 c7 c0 30 a1 00 00 48 8b 14 10 e8 a8 61 26 00 3b 05 b6 56 ae 00 7c 0e f0 48 0f a [2.279445] RIP [] rapl_cpu_online+0x59/0x70 [2.279568] RSP [2.279690] CR2: 0408 [2.279813] ---[ end trace c95da920748eb432 ]--- gdb tells me: (gdb) info line *(rapl_cpu_online+0x59) Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0x81012bb9and ends at 0x81012bbe . Which is: target = cpumask_any_and(_cpu_mask, topology_core_cpumask(cpu)); if (target < nr_cpu_ids) return 0; cpumask_set_cpu(cpu, _cpu_mask); pmu->cpu = cpu; << return 0; This code was recently changed by commit 8b5b773d6245138c "perf/x86/intel/rapl: Convert to hotplug state machine" and it appears that the setup is done as a callback: /* * Install callbacks. Core will call them for each online cpu. */ ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP", rapl_cpu_prepare, NULL); if (ret) goto out; ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE, "AP_PERF_X86_RAPL_ONLINE", rapl_cpu_online, rapl_cpu_offline); Is there a particular order guaranteed by the callbacks? Will rapl_cpu_prepare() always happen before online/offline? Additionally, rapl_cpu_prepare() can fail to allocate pmu, static int rapl_cpu_prepare(unsigned int cpu) { struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu); if (pmu) return 0; pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu)); if (!pmu) return -ENOMEM; But rapl_cpu_online() would have no idea about this. What should be done in this case?
Oops in rapl_cpu_prepare()
Recent 4.8 kernels have been oopsing when running under VMWare: [2.270203] BUG: unable to handle kernel NULL pointer dereference at 0408 [2.270325] IP: [] rapl_cpu_online+0x59/0x70 [2.270448] PGD 0 [2.270570] Oops: 0002 [#1] SMP [2.270693] Modules linked in: [2.270815] CPU: 2 PID: 21 Comm: cpuhp/2 Not tainted 4.8.2-1-amd64-vyatta #1 [2.270938] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/14/2014 [2.271060] task: 8802361fc2c0 task.stack: 880236208000 [2.271183] RIP: 0010:[] [] rapl_cpu_online+0x59/0x70 [2.271306] RSP: :88023620be68 EFLAGS: 00010246 [2.271428] RAX: 0004 RBX: 88023fd0d940 RCX: [2.271551] RDX: 0040 RSI: 0004 RDI: 0004 [2.271673] RBP: 0002 R08: fffc R09: [2.271796] R10: R11: R12: 0400 [2.271918] R13: 8802361fc2c0 R14: 8802361fc2c0 R15: 8802361fc2c0 [2.272041] FS: () GS:88023fd0() knlGS: [2.272163] CS: 0010 DS: ES: CR0: 80050033 [2.272286] CR2: 0408 CR3: 01a06000 CR4: 000406e0 [2.272408] Stack: [2.272531] 88023fd0d940 0002 81a38240 81061231 [2.272654] 8802361fc2c0 880237002180 8107ddcf [2.272776] 8802361a5a80 880237002180 8107dcb0 81a6a380 [2.272899] Call Trace: [2.273021] [] ? cpuhp_thread_fun+0x31/0x100 [2.273144] [] ? smpboot_thread_fn+0x11f/0x180 [2.273266] [] ? sort_range+0x20/0x20 [2.273389] [] ? kthread+0xca/0xe0 [2.273511] [] ? ret_from_fork+0x1f/0x40 [2.273634] [] ? kthread_park+0x50/0x50 [2.273757] Code: 00 00 48 83 c0 22 4c 8b 24 c1 48 c7 c0 30 a1 00 00 48 8b 14 10 e8 a8 61 26 00 3b 05 b6 56 ae 00 7c 0e f0 48 0f a [2.279445] RIP [] rapl_cpu_online+0x59/0x70 [2.279568] RSP [2.279690] CR2: 0408 [2.279813] ---[ end trace c95da920748eb432 ]--- gdb tells me: (gdb) info line *(rapl_cpu_online+0x59) Line 595 of "arch/x86/events/intel/rapl.c" starts at address 0x81012bb9 and ends at 0x81012bbe . Which is: target = cpumask_any_and(_cpu_mask, topology_core_cpumask(cpu)); if (target < nr_cpu_ids) return 0; cpumask_set_cpu(cpu, _cpu_mask); pmu->cpu = cpu; << return 0; This code was recently changed by commit 8b5b773d6245138c "perf/x86/intel/rapl: Convert to hotplug state machine" and it appears that the setup is done as a callback: /* * Install callbacks. Core will call them for each online cpu. */ ret = cpuhp_setup_state(CPUHP_PERF_X86_RAPL_PREP, "PERF_X86_RAPL_PREP", rapl_cpu_prepare, NULL); if (ret) goto out; ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_RAPL_ONLINE, "AP_PERF_X86_RAPL_ONLINE", rapl_cpu_online, rapl_cpu_offline); Is there a particular order guaranteed by the callbacks? Will rapl_cpu_prepare() always happen before online/offline? Additionally, rapl_cpu_prepare() can fail to allocate pmu, static int rapl_cpu_prepare(unsigned int cpu) { struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu); if (pmu) return 0; pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu)); if (!pmu) return -ENOMEM; But rapl_cpu_online() would have no idea about this. What should be done in this case?
Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint
On Thu, 2015-12-03 at 13:58 +0100, LABBE Corentin wrote: > On Thu, Dec 03, 2015 at 06:26:31AM -0500, Charles (Chas) Williams wrote: > > On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote: > > > @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, > > > int port, struct sk_buff *skb > > > if (!str) > > > return -EIO; > > > > > > - ver = simple_strtol(str, NULL, 10); > > > - if (ver < 1) { > > > + err = kstrtoint(str, 10, ); > > > + if (err || ver < 1) { > > > dev_warn(>dev->dev, "Unexpected status interrupt version > > > %d\n", > > >ver); > > > - return -EIO; > > > + return err; > > > > > > If ver < 1 then you might return a 0 here. Always returning -EIO is > > probably just fine. > > > > Hello > > I think the best solution is to split the test, since returning error code > from kstrtoint was asked by David Miller. > if (err) > return err; > if (ver < 1) > return -EIO; > Thanks > Regards That's fine. You just shouldn't return 0 if the ver < 1. This isn't timing critical code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint
On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote: > @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, int > port, struct sk_buff *skb > if (!str) > return -EIO; > > - ver = simple_strtol(str, NULL, 10); > - if (ver < 1) { > + err = kstrtoint(str, 10, ); > + if (err || ver < 1) { > dev_warn(>dev->dev, "Unexpected status interrupt version > %d\n", >ver); > - return -EIO; > + return err; If ver < 1 then you might return a 0 here. Always returning -EIO is probably just fine. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint
On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote: > @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, int > port, struct sk_buff *skb > if (!str) > return -EIO; > > - ver = simple_strtol(str, NULL, 10); > - if (ver < 1) { > + err = kstrtoint(str, 10, ); > + if (err || ver < 1) { > dev_warn(>dev->dev, "Unexpected status interrupt version > %d\n", >ver); > - return -EIO; > + return err; If ver < 1 then you might return a 0 here. Always returning -EIO is probably just fine. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 1/1] atm: solos-pci: Replace simple_strtol by kstrtoint
On Thu, 2015-12-03 at 13:58 +0100, LABBE Corentin wrote: > On Thu, Dec 03, 2015 at 06:26:31AM -0500, Charles (Chas) Williams wrote: > > On Thu, 2015-12-03 at 09:06 +0100, LABBE Corentin wrote: > > > @@ -357,11 +357,11 @@ static int process_status(struct solos_card *card, > > > int port, struct sk_buff *skb > > > if (!str) > > > return -EIO; > > > > > > - ver = simple_strtol(str, NULL, 10); > > > - if (ver < 1) { > > > + err = kstrtoint(str, 10, ); > > > + if (err || ver < 1) { > > > dev_warn(>dev->dev, "Unexpected status interrupt version > > > %d\n", > > >ver); > > > - return -EIO; > > > + return err; > > > > > > If ver < 1 then you might return a 0 here. Always returning -EIO is > > probably just fine. > > > > Hello > > I think the best solution is to split the test, since returning error code > from kstrtoint was asked by David Miller. > if (err) > return err; > if (ver < 1) > return -EIO; > Thanks > Regards That's fine. You just shouldn't return 0 if the ver < 1. This isn't timing critical code. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] atm: iphase: Fix misleading indention and return -ENOMEM on error
On Sat, 2015-10-10 at 21:47 +0200, Tillmann Heidsieck wrote: > this series fixes two of them. The if(); warning would require > restructuring the code to a larger extend. Beyond this there remains a > whooping number of > 2k checkpatch.pl warnings and errors each. Those > can be grouped into ... > Generally I would not mind cleaning all this up for those who have to > make functional changes to the driver. However, I would like to know > from the maintainers if such an afford would be welcome or not. It doesn't bother me if you do this. I can review it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/2] atm: iphase: Fix misleading indention and return -ENOMEM on error
On Sat, 2015-10-10 at 21:47 +0200, Tillmann Heidsieck wrote: > this series fixes two of them. The if(); warning would require > restructuring the code to a larger extend. Beyond this there remains a > whooping number of > 2k checkpatch.pl warnings and errors each. Those > can be grouped into ... > Generally I would not mind cleaning all this up for those who have to > make functional changes to the driver. However, I would like to know > from the maintainers if such an afford would be welcome or not. It doesn't bother me if you do this. I can review it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 07/12] atm: hide 'struct zatm_t_hist'
On Wed, 2015-09-30 at 13:26 +0200, Arnd Bergmann wrote: > The zatm_t_hist structure is not used anywhere in the kernel, but is > exported to user space. As we are trying to eliminate uses of time_t > in the kernel for y2038 compatibility, the current definition triggers > checking tools because it contains 'struct timeval'. > > We can work around this by adding '#ifdef __KERNEL__'. I could not find > out what the structure is actually used for, so this is the safe choice > in case there is some user space tool that relies on the definition. > > If we are sure that nothing in user space relies on the structure, we > can instead remove the definition completely. It was used by the ZATM_GETHIST ioctl which is long since gone in the kernel driver. You can just remove this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 07/12] atm: hide 'struct zatm_t_hist'
On Wed, 2015-09-30 at 13:26 +0200, Arnd Bergmann wrote: > The zatm_t_hist structure is not used anywhere in the kernel, but is > exported to user space. As we are trying to eliminate uses of time_t > in the kernel for y2038 compatibility, the current definition triggers > checking tools because it contains 'struct timeval'. > > We can work around this by adding '#ifdef __KERNEL__'. I could not find > out what the structure is actually used for, so this is the safe choice > in case there is some user space tool that relies on the definition. > > If we are sure that nothing in user space relies on the structure, we > can instead remove the definition completely. It was used by the ZATM_GETHIST ioctl which is long since gone in the kernel driver. You can just remove this. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] ozwpan: Use proper check to prevent heap overflow
On Fri, 2015-05-15 at 15:02 +, David Laight wrote: > From: Jason A. Donenfeld > > Sent: 13 May 2015 19:34 > > Since elt->length is a u8, we can make this variable a u8. Then we can > > do proper bounds checking more easily. Without this, a potentially > > negative value is passed to the memcpy inside oz_hcd_get_desc_cnf, > > resulting in a remotely exploitable heap overflow with network > > supplied data. > ... > > diff --git a/drivers/staging/ozwpan/ozusbsvc1.c > > b/drivers/staging/ozwpan/ozusbsvc1.c > > index d434d8c..cd6c63e 100644 > > --- a/drivers/staging/ozwpan/ozusbsvc1.c > > +++ b/drivers/staging/ozwpan/ozusbsvc1.c > > @@ -390,8 +390,10 @@ void oz_usb_rx(struct oz_pd *pd, struct oz_elt *elt) > > case OZ_GET_DESC_RSP: { > > struct oz_get_desc_rsp *body = > > (struct oz_get_desc_rsp *)usb_hdr; > > - int data_len = elt->length - > > + u8 data_len = elt->length - > > sizeof(struct oz_get_desc_rsp) + 1; > > + if (data_len > elt->length) > > + break; This check already seems bogus? It's really: if (sizeof(struct oz_get_desc_rsp) - 1 < 0) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4] ozwpan: Use proper check to prevent heap overflow
On Fri, 2015-05-15 at 15:02 +, David Laight wrote: From: Jason A. Donenfeld Sent: 13 May 2015 19:34 Since elt-length is a u8, we can make this variable a u8. Then we can do proper bounds checking more easily. Without this, a potentially negative value is passed to the memcpy inside oz_hcd_get_desc_cnf, resulting in a remotely exploitable heap overflow with network supplied data. ... diff --git a/drivers/staging/ozwpan/ozusbsvc1.c b/drivers/staging/ozwpan/ozusbsvc1.c index d434d8c..cd6c63e 100644 --- a/drivers/staging/ozwpan/ozusbsvc1.c +++ b/drivers/staging/ozwpan/ozusbsvc1.c @@ -390,8 +390,10 @@ void oz_usb_rx(struct oz_pd *pd, struct oz_elt *elt) case OZ_GET_DESC_RSP: { struct oz_get_desc_rsp *body = (struct oz_get_desc_rsp *)usb_hdr; - int data_len = elt-length - + u8 data_len = elt-length - sizeof(struct oz_get_desc_rsp) + 1; + if (data_len elt-length) + break; This check already seems bogus? It's really: if (sizeof(struct oz_get_desc_rsp) - 1 0) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/