[BUG] ACPI Error: Needed type [Reference], found [Integer]
Latest Linux git has ACPI errors on Macbook IVB: [0.333133] ACPI Error: Needed type [Reference], found [Integer] 880261656f30 (20170119/exresop-103) [0.333141] ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands for [Store] (20170119/dswexec-461) [0.333150] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 8802628c60c8), AE_AML_OPERAND_TYPE (20170119/psparse-543) [0.517073] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170119/psargs-363) [0.517084] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 88026173b618), AE_NOT_FOUND (20170119/psparse-543) [0.517267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170119/psargs-363) [0.517279] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 88026173b640), AE_NOT_FOUND (20170119/psparse-543) [0.517464] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170119/psargs-363) [0.517474] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 88026173b668), AE_NOT_FOUND (20170119/psparse-543) Bisect leads to: ce87e09dd88c61f9088768a7708828423549725c is the first bad commit commit ce87e09dd88c61f9088768a7708828423549725c Author: Bob MooreDate: Wed Dec 28 15:29:43 2016 +0800 ACPICA: Parser: Allow method invocations as target operands ACPICA commit a6cca7a4786cdbfd29cea67e84b5b01a8ae6ff1c Method invocations as target operands are allowed as target operands in the ASL grammar. This change implements support for this. Method must return a reference for this to work properly at runtime, however. Link: https://github.com/acpica/acpica/commit/a6cca7a4 Signed-off-by: Bob Moore Signed-off-by: Lv Zheng Signed-off-by: Rafael J. Wysocki
[BUG] ACPI Error: Needed type [Reference], found [Integer]
Latest Linux git has ACPI errors on Macbook IVB: [0.333133] ACPI Error: Needed type [Reference], found [Integer] 880261656f30 (20170119/exresop-103) [0.333141] ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands for [Store] (20170119/dswexec-461) [0.333150] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 8802628c60c8), AE_AML_OPERAND_TYPE (20170119/psparse-543) [0.517073] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170119/psargs-363) [0.517084] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 88026173b618), AE_NOT_FOUND (20170119/psparse-543) [0.517267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170119/psargs-363) [0.517279] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 88026173b640), AE_NOT_FOUND (20170119/psparse-543) [0.517464] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20170119/psargs-363) [0.517474] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 88026173b668), AE_NOT_FOUND (20170119/psparse-543) Bisect leads to: ce87e09dd88c61f9088768a7708828423549725c is the first bad commit commit ce87e09dd88c61f9088768a7708828423549725c Author: Bob Moore Date: Wed Dec 28 15:29:43 2016 +0800 ACPICA: Parser: Allow method invocations as target operands ACPICA commit a6cca7a4786cdbfd29cea67e84b5b01a8ae6ff1c Method invocations as target operands are allowed as target operands in the ASL grammar. This change implements support for this. Method must return a reference for this to work properly at runtime, however. Link: https://github.com/acpica/acpica/commit/a6cca7a4 Signed-off-by: Bob Moore Signed-off-by: Lv Zheng Signed-off-by: Rafael J. Wysocki
Re: [Intel-gfx] [v4.6-10530-g28165ec7a99b] i915: *ERROR* "CPU pipe/PCH transcoder" A FIFO underrun
On 25 May 2016 at 08:31, Sedat Dilekwrote: > Hi Daniel, > > with latest Linus Git I see this with my Intel SandyBridge GPU... > > [ 17.629014] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] > *ERROR* CPU pipe A FIFO underrun > [ 17.630652] [drm:intel_set_pch_fifo_underrun_reporting [i915]] > *ERROR* uncleared pch fifo underrun on pch transcoder A > [ 17.630685] [drm:intel_pch_fifo_underrun_irq_handler [i915]] > *ERROR* PCH transcoder A FIFO underrun Guessing this is https://bugs.freedesktop.org/show_bug.cgi?id=95736
Re: [Intel-gfx] [v4.6-10530-g28165ec7a99b] i915: *ERROR* "CPU pipe/PCH transcoder" A FIFO underrun
On 25 May 2016 at 08:31, Sedat Dilek wrote: > Hi Daniel, > > with latest Linus Git I see this with my Intel SandyBridge GPU... > > [ 17.629014] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] > *ERROR* CPU pipe A FIFO underrun > [ 17.630652] [drm:intel_set_pch_fifo_underrun_reporting [i915]] > *ERROR* uncleared pch fifo underrun on pch transcoder A > [ 17.630685] [drm:intel_pch_fifo_underrun_irq_handler [i915]] > *ERROR* PCH transcoder A FIFO underrun Guessing this is https://bugs.freedesktop.org/show_bug.cgi?id=95736
[PATCH v3] usb: core: hub: hub_port_init lock controller instead of bus
The XHCI controller presents two USB buses to the system - one for USB2 and one for USB3. The hub init code (hub_port_init) is reentrant but only locks one bus per thread, leading to a race condition failure when two threads attempt to simultaneously initialise a USB2 and USB3 device: [8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. The call traces at the point of failure are: Call Trace: [] schedule+0x37/0x90 [] usb_kill_urb+0x8d/0xd0 [] ? wake_up_atomic_t+0x30/0x30 [] usb_start_wait_urb+0xbe/0x150 [] usb_control_msg+0xbc/0xf0 [] hub_port_init+0x51e/0xb70 [] hub_event+0x817/0x1570 [] process_one_work+0x1ff/0x620 [] ? process_one_work+0x15f/0x620 [] worker_thread+0x64/0x4b0 [] ? rescuer_thread+0x390/0x390 [] kthread+0x105/0x120 [] ? kthread_create_on_node+0x200/0x200 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x200/0x200 Call Trace: [] xhci_setup_device+0x53d/0xa40 [] xhci_address_device+0xe/0x10 [] hub_port_init+0x1bf/0xb70 [] ? trace_hardirqs_on+0xd/0x10 [] hub_event+0x817/0x1570 [] process_one_work+0x1ff/0x620 [] ? process_one_work+0x15f/0x620 [] worker_thread+0x64/0x4b0 [] ? rescuer_thread+0x390/0x390 [] kthread+0x105/0x120 [] ? kthread_create_on_node+0x200/0x200 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device Mathias Nyman explains the current behaviour violates the XHCI spec: hub_port_reset() will end up moving the corresponding xhci device slot to default state. As hub_port_reset() is called several times in hub_port_init() it sounds reasonable that we could end up with two threads having their xhci device slots in default state at the same time, which according to xhci 4.5.3 specs still is a big no no: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So both threads fail at their next task after this. One fails to read the descriptor, and the other fails addressing the device. Fix this in hub_port_init by locking the USB controller (instead of an individual bus) to prevent simultaneous initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Link: https://lkml.org/lkml/2016/2/8/312 Link: https://lkml.org/lkml/2016/2/4/748 Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- This is technically the same as the last patch version, but with a reworked commit message to better explain the cause of the issue. --- drivers/usb/core/hcd.c | 15 +-- drivers/usb/core/hub.c | 8 include/linux/usb.h | 3 +-- include/linux/usb/hcd.h | 1 + 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 2ca2cef7f681..980fc5774151 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -994,7 +994,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(>usb_address0_mutex); + mutex_init(>devnum_next_mutex); } /*-*/ @@ -2521,6 +2521,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, return NULL; } if (primary_hcd == NULL) { + hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex), + GFP_KERNEL); + if (!hcd->address0_mutex) { + kfree(hcd); + dev_dbg(dev, "hcd address0 mutex alloc failed\n"); + return NULL; + } + mutex_init(hcd->address0_mutex); hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex), GFP_KERNEL); if (!hcd->bandwidth_mutex) { @@ -2532,6 +2540,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, dev_set_drvdata(dev, hcd); } else { mutex_lock(_port_peer_mutex); + hcd->address0_mutex = primary_hcd->address0_mutex; hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex; hcd->primary_hcd = primary_hcd; primary_hcd->primary_hcd = primary_hcd; @@ -2598,8 +2607,10 @@ static void hcd_release(struct kref *kref) struct usb_h
[PATCH v3] usb: core: hub: hub_port_init lock controller instead of bus
The XHCI controller presents two USB buses to the system - one for USB2 and one for USB3. The hub init code (hub_port_init) is reentrant but only locks one bus per thread, leading to a race condition failure when two threads attempt to simultaneously initialise a USB2 and USB3 device: [8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. The call traces at the point of failure are: Call Trace: [] schedule+0x37/0x90 [] usb_kill_urb+0x8d/0xd0 [] ? wake_up_atomic_t+0x30/0x30 [] usb_start_wait_urb+0xbe/0x150 [] usb_control_msg+0xbc/0xf0 [] hub_port_init+0x51e/0xb70 [] hub_event+0x817/0x1570 [] process_one_work+0x1ff/0x620 [] ? process_one_work+0x15f/0x620 [] worker_thread+0x64/0x4b0 [] ? rescuer_thread+0x390/0x390 [] kthread+0x105/0x120 [] ? kthread_create_on_node+0x200/0x200 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x200/0x200 Call Trace: [] xhci_setup_device+0x53d/0xa40 [] xhci_address_device+0xe/0x10 [] hub_port_init+0x1bf/0xb70 [] ? trace_hardirqs_on+0xd/0x10 [] hub_event+0x817/0x1570 [] process_one_work+0x1ff/0x620 [] ? process_one_work+0x15f/0x620 [] worker_thread+0x64/0x4b0 [] ? rescuer_thread+0x390/0x390 [] kthread+0x105/0x120 [] ? kthread_create_on_node+0x200/0x200 [] ret_from_fork+0x3f/0x70 [] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device Mathias Nyman explains the current behaviour violates the XHCI spec: hub_port_reset() will end up moving the corresponding xhci device slot to default state. As hub_port_reset() is called several times in hub_port_init() it sounds reasonable that we could end up with two threads having their xhci device slots in default state at the same time, which according to xhci 4.5.3 specs still is a big no no: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So both threads fail at their next task after this. One fails to read the descriptor, and the other fails addressing the device. Fix this in hub_port_init by locking the USB controller (instead of an individual bus) to prevent simultaneous initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Link: https://lkml.org/lkml/2016/2/8/312 Link: https://lkml.org/lkml/2016/2/4/748 Signed-off-by: Chris Bainbridge --- This is technically the same as the last patch version, but with a reworked commit message to better explain the cause of the issue. --- drivers/usb/core/hcd.c | 15 +-- drivers/usb/core/hub.c | 8 include/linux/usb.h | 3 +-- include/linux/usb/hcd.h | 1 + 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 2ca2cef7f681..980fc5774151 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -994,7 +994,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(>usb_address0_mutex); + mutex_init(>devnum_next_mutex); } /*-*/ @@ -2521,6 +2521,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, return NULL; } if (primary_hcd == NULL) { + hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex), + GFP_KERNEL); + if (!hcd->address0_mutex) { + kfree(hcd); + dev_dbg(dev, "hcd address0 mutex alloc failed\n"); + return NULL; + } + mutex_init(hcd->address0_mutex); hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex), GFP_KERNEL); if (!hcd->bandwidth_mutex) { @@ -2532,6 +2540,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, dev_set_drvdata(dev, hcd); } else { mutex_lock(_port_peer_mutex); + hcd->address0_mutex = primary_hcd->address0_mutex; hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex; hcd->primary_hcd = primary_hcd; primary_hcd->primary_hcd = primary_hcd; @@ -2598,8 +2607,10 @@ static void hcd_release(struct kref *kref) struct usb_hcd *hcd = container_of (kref, struct usb_hcd, k
Re: [PATCH] usb: core: hub: hub_port_init lock controller instead of bus
On Wed, Feb 10, 2016 at 07:13:38PM +0200, Mathias Nyman wrote: > > Most likely xhci is messed up after two device slots are in default state at > the same time. > This happens when both threads are in hub_port_init() have called > hub_port_reset() > > The issue becomes visible when the the descriptor read and set address both > fail after > the port resets. > > xhci specs 4.5.3 has one tiny note about this: > "Note: Software shall not transition more than one Device Slot to the Default > State at a time" > > So to me, and from xhci pov this patch looks like the correct solution, > but I might be missing some usb core side details. > > -Mathias > Just following up to see if this patch disappeared into the void?
Re: [PATCH] usb: core: hub: hub_port_init lock controller instead of bus
On Wed, Feb 10, 2016 at 07:13:38PM +0200, Mathias Nyman wrote: > > Most likely xhci is messed up after two device slots are in default state at > the same time. > This happens when both threads are in hub_port_init() have called > hub_port_reset() > > The issue becomes visible when the the descriptor read and set address both > fail after > the port resets. > > xhci specs 4.5.3 has one tiny note about this: > "Note: Software shall not transition more than one Device Slot to the Default > State at a time" > > So to me, and from xhci pov this patch looks like the correct solution, > but I might be missing some usb core side details. > > -Mathias > Just following up to see if this patch disappeared into the void?
[BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix
Hi, I was testing something on an old server (Dell T105 opteron) and noticed packet loss after updating the kernel from 3.10 to 4.5. The test was: On Dell run: iperf -s On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000 This sends a 20mbit UDP stream to the Dell. It works fine normally (0% packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high (35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss) The packet loss bisected back to: commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00 Author: Borislav PetkovDate: Fri Jul 19 12:28:25 2013 +0200 EDAC: Fix lockdep splat I have confirmed that the commit preceding this (v3.11-rc1) is fine and that 88d84a introduced the bug.
[BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix
Hi, I was testing something on an old server (Dell T105 opteron) and noticed packet loss after updating the kernel from 3.10 to 4.5. The test was: On Dell run: iperf -s On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000 This sends a 20mbit UDP stream to the Dell. It works fine normally (0% packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high (35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss) The packet loss bisected back to: commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00 Author: Borislav Petkov Date: Fri Jul 19 12:28:25 2013 +0200 EDAC: Fix lockdep splat I have confirmed that the commit preceding this (v3.11-rc1) is fine and that 88d84a introduced the bug.
[tip:x86/microcode] x86/microcode/intel: Change checksum variables to u32
Commit-ID: bc864af13f34d19c911f5691d87bdacc9ce109f5 Gitweb: http://git.kernel.org/tip/bc864af13f34d19c911f5691d87bdacc9ce109f5 Author: Chris Bainbridge <chris.bainbri...@gmail.com> AuthorDate: Mon, 7 Mar 2016 11:10:00 +0100 Committer: Thomas Gleixner <t...@linutronix.de> CommitDate: Tue, 8 Mar 2016 09:08:44 +0100 x86/microcode/intel: Change checksum variables to u32 Microcode checksum verification should be done using unsigned 32-bit values otherwise the calculation overflow results in undefined behaviour. This is also nicely documented in the SDM, section "Microcode Update Checksum": "To check for a corrupt microcode update, software must perform a unsigned DWORD (32-bit) checksum of the microcode update. Even though some fields are signed, the checksum procedure treats all DWORDs as unsigned. Microcode updates with a header version equal to 0001H must sum all DWORDs that comprise the microcode update. A valid checksum check will yield a value of H." but for some reason the code has been using ints from the very beginning. In practice, this bug possibly manifested itself only when doing the microcode data checksum - apparently, currently shipped Intel microcode doesn't have an extended signature table for which we do checksum verification too. UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 signed integer overflow: -1500151068 + -2125470173 cannot be represented in type 'int' CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 ... Call Trace: dump_stack ? inotify_ioctl ubsan_epilogue handle_overflow __ubsan_handle_add_overflow microcode_sanity_check get_matching_model_microcode.isra.2.constprop.8 ? early_idt_handler_common ? strlcpy ? find_cpio_data load_ucode_intel_bsp load_ucode_bsp ? load_ucode_bsp x86_64_start_kernel [ Expand and massage commit message. ] Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> Signed-off-by: Borislav Petkov <b...@suse.de> Cc: h...@hmh.eng.br Link: http://lkml.kernel.org/r/1456834359-5132-1-git-send-email-chris.bainbri...@gmail.com Signed-off-by: Thomas Gleixner <t...@linutronix.de> --- arch/x86/kernel/cpu/microcode/intel_lib.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c b/arch/x86/kernel/cpu/microcode/intel_lib.c index b96896b..99ca2c9 100644 --- a/arch/x86/kernel/cpu/microcode/intel_lib.c +++ b/arch/x86/kernel/cpu/microcode/intel_lib.c @@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err) unsigned long total_size, data_size, ext_table_size; struct microcode_header_intel *mc_header = mc; struct extended_sigtable *ext_header = NULL; - int sum, orig_sum, ext_sigcount = 0, i; + u32 sum, orig_sum, ext_sigcount = 0, i; struct extended_signature *ext_sig; total_size = get_totalsize(mc_header); @@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err) /* check extended table checksum */ if (ext_table_size) { - int ext_table_sum = 0; - int *ext_tablep = (int *)ext_header; + u32 ext_table_sum = 0; + u32 *ext_tablep = (u32 *)ext_header; i = ext_table_size / DWSIZE; while (i--) @@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err) orig_sum = 0; i = (MC_HEADER_SIZE + data_size) / DWSIZE; while (i--) - orig_sum += ((int *)mc)[i]; + orig_sum += ((u32 *)mc)[i]; if (orig_sum) { if (print_err) pr_err("aborting, bad checksum\n");
[tip:x86/microcode] x86/microcode/intel: Change checksum variables to u32
Commit-ID: bc864af13f34d19c911f5691d87bdacc9ce109f5 Gitweb: http://git.kernel.org/tip/bc864af13f34d19c911f5691d87bdacc9ce109f5 Author: Chris Bainbridge AuthorDate: Mon, 7 Mar 2016 11:10:00 +0100 Committer: Thomas Gleixner CommitDate: Tue, 8 Mar 2016 09:08:44 +0100 x86/microcode/intel: Change checksum variables to u32 Microcode checksum verification should be done using unsigned 32-bit values otherwise the calculation overflow results in undefined behaviour. This is also nicely documented in the SDM, section "Microcode Update Checksum": "To check for a corrupt microcode update, software must perform a unsigned DWORD (32-bit) checksum of the microcode update. Even though some fields are signed, the checksum procedure treats all DWORDs as unsigned. Microcode updates with a header version equal to 0001H must sum all DWORDs that comprise the microcode update. A valid checksum check will yield a value of H." but for some reason the code has been using ints from the very beginning. In practice, this bug possibly manifested itself only when doing the microcode data checksum - apparently, currently shipped Intel microcode doesn't have an extended signature table for which we do checksum verification too. UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 signed integer overflow: -1500151068 + -2125470173 cannot be represented in type 'int' CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 ... Call Trace: dump_stack ? inotify_ioctl ubsan_epilogue handle_overflow __ubsan_handle_add_overflow microcode_sanity_check get_matching_model_microcode.isra.2.constprop.8 ? early_idt_handler_common ? strlcpy ? find_cpio_data load_ucode_intel_bsp load_ucode_bsp ? load_ucode_bsp x86_64_start_kernel [ Expand and massage commit message. ] Signed-off-by: Chris Bainbridge Signed-off-by: Borislav Petkov Cc: h...@hmh.eng.br Link: http://lkml.kernel.org/r/1456834359-5132-1-git-send-email-chris.bainbri...@gmail.com Signed-off-by: Thomas Gleixner --- arch/x86/kernel/cpu/microcode/intel_lib.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c b/arch/x86/kernel/cpu/microcode/intel_lib.c index b96896b..99ca2c9 100644 --- a/arch/x86/kernel/cpu/microcode/intel_lib.c +++ b/arch/x86/kernel/cpu/microcode/intel_lib.c @@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err) unsigned long total_size, data_size, ext_table_size; struct microcode_header_intel *mc_header = mc; struct extended_sigtable *ext_header = NULL; - int sum, orig_sum, ext_sigcount = 0, i; + u32 sum, orig_sum, ext_sigcount = 0, i; struct extended_signature *ext_sig; total_size = get_totalsize(mc_header); @@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err) /* check extended table checksum */ if (ext_table_size) { - int ext_table_sum = 0; - int *ext_tablep = (int *)ext_header; + u32 ext_table_sum = 0; + u32 *ext_tablep = (u32 *)ext_header; i = ext_table_size / DWSIZE; while (i--) @@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err) orig_sum = 0; i = (MC_HEADER_SIZE + data_size) / DWSIZE; while (i--) - orig_sum += ((int *)mc)[i]; + orig_sum += ((u32 *)mc)[i]; if (orig_sum) { if (print_err) pr_err("aborting, bad checksum\n");
Re: [PATCH] ACPICA: Events: Execute some _REG methods in early boot
On Mon, Mar 07, 2016 at 06:36:05AM +, Zheng, Lv wrote: > Hi, > > First of all, why don't you respond on the kernel bugzilla? > Posting a fix here directly without communication doesn't look like a > constructive help but more like a destructive attack to me. Sorry if it appears to be an attack. It wasn't meant that way. I was under the impression that email is the preferred means of communication for kernel development. I'm not sure whether other developers actively monitor bugzilla reports (some do, but bug trackers are often graveyards for bug reports and it's easy for communications to be missed). > As I said in the previous reply, this is the known issue and can be fixed by > applying the whole series. > Especially this commit: > https://patchwork.kernel.org/patch/8241421/ > That's why I asked you to test again by applying the whole series. > And that's why after having not seen your response for so long time, we > prepared a test branch and was waiting for your response. I already replied 9 days ago: https://lkml.org/lkml/2016/2/27/164 The suggested patch did not fix the issue and the patch series did not apply cleanly. (btw I'm not paid for this work so I tend to handle it in batches when I have spare time, which is why you may see replies delayed to weekends etc.) > You need to post acpidump there to have the issue root caused so that more > accurate fix can be generated. I already sent an acpidump for this system to you and Robert (email 3rd Feb). > The above fix looks hackish to me. > IMO, if you want to stop regressions that are triggered by this commit. > A simpler and proper way would be to move acpi_gbl_reg_method_enabled = TRUE > to the end of acpi_load_tables(). > So that when the order of table loading and ECDT probing is corrected, the > correct logic can still apply. > > I don't have ECDT platforms in hand to confirm. > Can you help to just give it a try? Yes the fix may be hackish, but it served the purpose of correctly identifying the bug. Your suggestion works fine, for reference the tested patch was: diff --git a/drivers/acpi/acpica/tbxfload.c b/drivers/acpi/acpica/tbxfload.c index 278666e39563..9136d7250022 100644 --- a/drivers/acpi/acpica/tbxfload.c +++ b/drivers/acpi/acpica/tbxfload.c @@ -83,6 +83,7 @@ acpi_status __init acpi_load_tables(void) "While loading namespace from ACPI tables")); } + acpi_gbl_reg_methods_enabled = TRUE; return_ACPI_STATUS(status); } diff --git a/drivers/acpi/acpica/utxfinit.c b/drivers/acpi/acpica/utxfinit.c index 721b87cce908..2c0491038068 100644 --- a/drivers/acpi/acpica/utxfinit.c +++ b/drivers/acpi/acpica/utxfinit.c @@ -267,7 +267,6 @@ acpi_status __init acpi_initialize_objects(u32 flags) * initialized, even if they contain executable AML (see the call to * acpi_ns_initialize_objects below). */ - acpi_gbl_reg_methods_enabled = TRUE; if (!(flags & ACPI_NO_ADDRESS_SPACE_INIT)) { ACPI_DEBUG_PRINT((ACPI_DB_EXEC, "[Init] Executing _REG OpRegion methods\n")); With this patch applied the ODEBUG errors do not occur. Thanks.
Re: [PATCH] ACPICA: Events: Execute some _REG methods in early boot
On Mon, Mar 07, 2016 at 06:36:05AM +, Zheng, Lv wrote: > Hi, > > First of all, why don't you respond on the kernel bugzilla? > Posting a fix here directly without communication doesn't look like a > constructive help but more like a destructive attack to me. Sorry if it appears to be an attack. It wasn't meant that way. I was under the impression that email is the preferred means of communication for kernel development. I'm not sure whether other developers actively monitor bugzilla reports (some do, but bug trackers are often graveyards for bug reports and it's easy for communications to be missed). > As I said in the previous reply, this is the known issue and can be fixed by > applying the whole series. > Especially this commit: > https://patchwork.kernel.org/patch/8241421/ > That's why I asked you to test again by applying the whole series. > And that's why after having not seen your response for so long time, we > prepared a test branch and was waiting for your response. I already replied 9 days ago: https://lkml.org/lkml/2016/2/27/164 The suggested patch did not fix the issue and the patch series did not apply cleanly. (btw I'm not paid for this work so I tend to handle it in batches when I have spare time, which is why you may see replies delayed to weekends etc.) > You need to post acpidump there to have the issue root caused so that more > accurate fix can be generated. I already sent an acpidump for this system to you and Robert (email 3rd Feb). > The above fix looks hackish to me. > IMO, if you want to stop regressions that are triggered by this commit. > A simpler and proper way would be to move acpi_gbl_reg_method_enabled = TRUE > to the end of acpi_load_tables(). > So that when the order of table loading and ECDT probing is corrected, the > correct logic can still apply. > > I don't have ECDT platforms in hand to confirm. > Can you help to just give it a try? Yes the fix may be hackish, but it served the purpose of correctly identifying the bug. Your suggestion works fine, for reference the tested patch was: diff --git a/drivers/acpi/acpica/tbxfload.c b/drivers/acpi/acpica/tbxfload.c index 278666e39563..9136d7250022 100644 --- a/drivers/acpi/acpica/tbxfload.c +++ b/drivers/acpi/acpica/tbxfload.c @@ -83,6 +83,7 @@ acpi_status __init acpi_load_tables(void) "While loading namespace from ACPI tables")); } + acpi_gbl_reg_methods_enabled = TRUE; return_ACPI_STATUS(status); } diff --git a/drivers/acpi/acpica/utxfinit.c b/drivers/acpi/acpica/utxfinit.c index 721b87cce908..2c0491038068 100644 --- a/drivers/acpi/acpica/utxfinit.c +++ b/drivers/acpi/acpica/utxfinit.c @@ -267,7 +267,6 @@ acpi_status __init acpi_initialize_objects(u32 flags) * initialized, even if they contain executable AML (see the call to * acpi_ns_initialize_objects below). */ - acpi_gbl_reg_methods_enabled = TRUE; if (!(flags & ACPI_NO_ADDRESS_SPACE_INIT)) { ACPI_DEBUG_PRINT((ACPI_DB_EXEC, "[Init] Executing _REG OpRegion methods\n")); With this patch applied the ODEBUG errors do not occur. Thanks.
[PATCH] ACPICA: Events: Execute some _REG methods in early boot
The regression caused _REG methods to not be run in early boot for all space IDs, but a removed comment stated that _REG methods should be executed for IDs like embedded_controller. Before the regression this was the case: [0.230469] Executing Method \_SB.PCI0.LPCB.EC._REG [0.230531] Initializing Region \GNVS [0.230607] Initializing Region \_SB.PCI0.LPCB.EC.ECOR [0.231043] Initializing Region \_SB.PCI0.IGPU.IGDM After the regression the initialisation is not done and ODEBUG warnings are shown at boot and/or shutdown: [6.676570] WARNING: CPU: 0 PID: 3317 at lib/debugobjects.c:263 debug_print_object+0x85/0xa0() [6.676576] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20 [6.676578] Modules linked in: [6.676582] CPU: 0 PID: 3317 Comm: ccpd Not tainted 4.5.0-rc6 #509 [6.676584] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [6.676586] 880256543db0 814802b5 880256543df8 [6.676590] 81f40dbd 880256543de8 810bd0ec 880256543e90 [6.676594] 822532c0 81f40e63 83138c88 01fdf040 [6.676598] Call Trace: [6.676603] [] dump_stack+0x67/0x92 [6.676608] [] warn_slowpath_common+0x7c/0xb0 [6.676611] [] warn_slowpath_fmt+0x47/0x50 [6.676614] [] debug_print_object+0x85/0xa0 [6.676616] [] ? cascade+0x70/0x70 [6.676620] [] debug_object_assert_init+0xf8/0x130 [6.676624] [] ? trace_hardirqs_on+0xd/0x10 [6.676626] [] del_timer+0x1f/0x70 [6.676631] [] laptop_sync_completion+0x61/0x100 [6.676633] [] ? laptop_io_completion+0x30/0x30 [6.676637] [] sys_sync+0x7f/0x90 [6.676641] [] entry_SYSCALL_64_fastpath+0x12/0x6f Fixes: efaed9be998b ("ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages") Link: https://lkml.org/lkml/2016/2/3/273 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112911 Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- drivers/acpi/acpica/evregion.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c index 47092b4d633c..d0b02ef0effa 100644 --- a/drivers/acpi/acpica/evregion.c +++ b/drivers/acpi/acpica/evregion.c @@ -590,6 +590,7 @@ acpi_ev_execute_reg_method(union acpi_operand_object *region_obj, u32 function) union acpi_operand_object *args[3]; union acpi_operand_object *region_obj2; acpi_status status; + bool sp; ACPI_FUNCTION_TRACE(ev_execute_reg_method); @@ -598,9 +599,28 @@ acpi_ev_execute_reg_method(union acpi_operand_object *region_obj, u32 function) return_ACPI_STATUS(AE_NOT_EXIST); } + /* +* For the default space_IDs, (the IDs for which there are default region handlers +* installed) Only execute the _REG methods if the global initialization _REG +* methods have already been run (via acpi_initialize_objects). In other words, +* we will defer the execution of the _REG methods for these space_IDs until +* execution of acpi_initialize_objects. This is done because we need the handlers +* for the default spaces (mem/io/pci/table) to be installed before we can run +* any control methods (or _REG methods). There is known BIOS code that depends +* on this. +* +* For all other space_IDs, we can safely execute the _REG methods immediately. +* This means that for IDs like embedded_controller, this function should be called +* only after acpi_enable_subsystem has been called. +*/ + + sp = (region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY || + region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_IO || + region_obj->region.space_id == ACPI_ADR_SPACE_PCI_CONFIG || + region_obj->region.space_id == ACPI_ADR_SPACE_DATA_TABLE); if (region_obj2->extra.method_REG == NULL || region_obj->region.handler == NULL || - !acpi_gbl_reg_methods_enabled) { + (sp && !acpi_gbl_reg_methods_enabled)) { return_ACPI_STATUS(AE_OK); } -- 2.1.4
[PATCH] ACPICA: Events: Execute some _REG methods in early boot
The regression caused _REG methods to not be run in early boot for all space IDs, but a removed comment stated that _REG methods should be executed for IDs like embedded_controller. Before the regression this was the case: [0.230469] Executing Method \_SB.PCI0.LPCB.EC._REG [0.230531] Initializing Region \GNVS [0.230607] Initializing Region \_SB.PCI0.LPCB.EC.ECOR [0.231043] Initializing Region \_SB.PCI0.IGPU.IGDM After the regression the initialisation is not done and ODEBUG warnings are shown at boot and/or shutdown: [6.676570] WARNING: CPU: 0 PID: 3317 at lib/debugobjects.c:263 debug_print_object+0x85/0xa0() [6.676576] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20 [6.676578] Modules linked in: [6.676582] CPU: 0 PID: 3317 Comm: ccpd Not tainted 4.5.0-rc6 #509 [6.676584] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [6.676586] 880256543db0 814802b5 880256543df8 [6.676590] 81f40dbd 880256543de8 810bd0ec 880256543e90 [6.676594] 822532c0 81f40e63 83138c88 01fdf040 [6.676598] Call Trace: [6.676603] [] dump_stack+0x67/0x92 [6.676608] [] warn_slowpath_common+0x7c/0xb0 [6.676611] [] warn_slowpath_fmt+0x47/0x50 [6.676614] [] debug_print_object+0x85/0xa0 [6.676616] [] ? cascade+0x70/0x70 [6.676620] [] debug_object_assert_init+0xf8/0x130 [6.676624] [] ? trace_hardirqs_on+0xd/0x10 [6.676626] [] del_timer+0x1f/0x70 [6.676631] [] laptop_sync_completion+0x61/0x100 [6.676633] [] ? laptop_io_completion+0x30/0x30 [6.676637] [] sys_sync+0x7f/0x90 [6.676641] [] entry_SYSCALL_64_fastpath+0x12/0x6f Fixes: efaed9be998b ("ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages") Link: https://lkml.org/lkml/2016/2/3/273 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112911 Signed-off-by: Chris Bainbridge --- drivers/acpi/acpica/evregion.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c index 47092b4d633c..d0b02ef0effa 100644 --- a/drivers/acpi/acpica/evregion.c +++ b/drivers/acpi/acpica/evregion.c @@ -590,6 +590,7 @@ acpi_ev_execute_reg_method(union acpi_operand_object *region_obj, u32 function) union acpi_operand_object *args[3]; union acpi_operand_object *region_obj2; acpi_status status; + bool sp; ACPI_FUNCTION_TRACE(ev_execute_reg_method); @@ -598,9 +599,28 @@ acpi_ev_execute_reg_method(union acpi_operand_object *region_obj, u32 function) return_ACPI_STATUS(AE_NOT_EXIST); } + /* +* For the default space_IDs, (the IDs for which there are default region handlers +* installed) Only execute the _REG methods if the global initialization _REG +* methods have already been run (via acpi_initialize_objects). In other words, +* we will defer the execution of the _REG methods for these space_IDs until +* execution of acpi_initialize_objects. This is done because we need the handlers +* for the default spaces (mem/io/pci/table) to be installed before we can run +* any control methods (or _REG methods). There is known BIOS code that depends +* on this. +* +* For all other space_IDs, we can safely execute the _REG methods immediately. +* This means that for IDs like embedded_controller, this function should be called +* only after acpi_enable_subsystem has been called. +*/ + + sp = (region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY || + region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_IO || + region_obj->region.space_id == ACPI_ADR_SPACE_PCI_CONFIG || + region_obj->region.space_id == ACPI_ADR_SPACE_DATA_TABLE); if (region_obj2->extra.method_REG == NULL || region_obj->region.handler == NULL || - !acpi_gbl_reg_methods_enabled) { + (sp && !acpi_gbl_reg_methods_enabled)) { return_ACPI_STATUS(AE_OK); } -- 2.1.4
[PATCH v2] x86/microcode: Change checksum to u32
Checksum should be unsigned 32-bit otherwise the calculation overflow results in undefined behaviour: [0.00] [0.00] UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 [0.00] signed integer overflow: [0.00] -1500151068 + -2125470173 cannot be represented in type 'int' [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 [0.00] 0086 0086 83203968 [0.00] 81b30952 834c43b8 83203998 814fe623 [0.00] 83203980 81bcdf2d 8339a448 83203a08 [0.00] Call Trace: [0.00] [] dump_stack+0x4e/0x6c [0.00] [] ? inotify_ioctl+0x43/0x1c0 [0.00] [] ubsan_epilogue+0xd/0x40 [0.00] [] handle_overflow+0xbd/0xe0 [0.00] [] __ubsan_handle_add_overflow+0xe/0x10 [0.00] [] microcode_sanity_check+0x405/0x590 [0.00] [] get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345 [0.00] [] ? early_idt_handler_common+0x3d/0xae [0.00] [] ? strlcpy+0x52/0xa0 [0.00] [] ? find_cpio_data+0x371/0x510 [0.00] [] load_ucode_intel_bsp+0xa1/0xe5 [0.00] [] load_ucode_bsp+0xdf/0xf3 [0.00] [] ? load_ucode_bsp+0xdf/0xf3 [0.00] [] x86_64_start_kernel+0xd1/0xee [0.00] Link: https://lkml.org/lkml/2016/2/27/79 Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- arch/x86/kernel/cpu/microcode/intel_lib.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c b/arch/x86/kernel/cpu/microcode/intel_lib.c index b96896bcbdaf..99ca2c935777 100644 --- a/arch/x86/kernel/cpu/microcode/intel_lib.c +++ b/arch/x86/kernel/cpu/microcode/intel_lib.c @@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err) unsigned long total_size, data_size, ext_table_size; struct microcode_header_intel *mc_header = mc; struct extended_sigtable *ext_header = NULL; - int sum, orig_sum, ext_sigcount = 0, i; + u32 sum, orig_sum, ext_sigcount = 0, i; struct extended_signature *ext_sig; total_size = get_totalsize(mc_header); @@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err) /* check extended table checksum */ if (ext_table_size) { - int ext_table_sum = 0; - int *ext_tablep = (int *)ext_header; + u32 ext_table_sum = 0; + u32 *ext_tablep = (u32 *)ext_header; i = ext_table_size / DWSIZE; while (i--) @@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err) orig_sum = 0; i = (MC_HEADER_SIZE + data_size) / DWSIZE; while (i--) - orig_sum += ((int *)mc)[i]; + orig_sum += ((u32 *)mc)[i]; if (orig_sum) { if (print_err) pr_err("aborting, bad checksum\n"); -- 2.1.4
[PATCH v2] x86/microcode: Change checksum to u32
Checksum should be unsigned 32-bit otherwise the calculation overflow results in undefined behaviour: [0.00] [0.00] UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 [0.00] signed integer overflow: [0.00] -1500151068 + -2125470173 cannot be represented in type 'int' [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 [0.00] 0086 0086 83203968 [0.00] 81b30952 834c43b8 83203998 814fe623 [0.00] 83203980 81bcdf2d 8339a448 83203a08 [0.00] Call Trace: [0.00] [] dump_stack+0x4e/0x6c [0.00] [] ? inotify_ioctl+0x43/0x1c0 [0.00] [] ubsan_epilogue+0xd/0x40 [0.00] [] handle_overflow+0xbd/0xe0 [0.00] [] __ubsan_handle_add_overflow+0xe/0x10 [0.00] [] microcode_sanity_check+0x405/0x590 [0.00] [] get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345 [0.00] [] ? early_idt_handler_common+0x3d/0xae [0.00] [] ? strlcpy+0x52/0xa0 [0.00] [] ? find_cpio_data+0x371/0x510 [0.00] [] load_ucode_intel_bsp+0xa1/0xe5 [0.00] [] load_ucode_bsp+0xdf/0xf3 [0.00] [] ? load_ucode_bsp+0xdf/0xf3 [0.00] [] x86_64_start_kernel+0xd1/0xee [0.00] Link: https://lkml.org/lkml/2016/2/27/79 Signed-off-by: Chris Bainbridge --- arch/x86/kernel/cpu/microcode/intel_lib.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c b/arch/x86/kernel/cpu/microcode/intel_lib.c index b96896bcbdaf..99ca2c935777 100644 --- a/arch/x86/kernel/cpu/microcode/intel_lib.c +++ b/arch/x86/kernel/cpu/microcode/intel_lib.c @@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err) unsigned long total_size, data_size, ext_table_size; struct microcode_header_intel *mc_header = mc; struct extended_sigtable *ext_header = NULL; - int sum, orig_sum, ext_sigcount = 0, i; + u32 sum, orig_sum, ext_sigcount = 0, i; struct extended_signature *ext_sig; total_size = get_totalsize(mc_header); @@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err) /* check extended table checksum */ if (ext_table_size) { - int ext_table_sum = 0; - int *ext_tablep = (int *)ext_header; + u32 ext_table_sum = 0; + u32 *ext_tablep = (u32 *)ext_header; i = ext_table_size / DWSIZE; while (i--) @@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err) orig_sum = 0; i = (MC_HEADER_SIZE + data_size) / DWSIZE; while (i--) - orig_sum += ((int *)mc)[i]; + orig_sum += ((u32 *)mc)[i]; if (orig_sum) { if (print_err) pr_err("aborting, bad checksum\n"); -- 2.1.4
Re: [BUG] ODEBUG: assert_init not available (active state 0)
On Tue, Feb 23, 2016 at 02:18:22AM +, Zheng, Lv wrote: > > In fact, I don't understand what your report is. > What did you mean by referring the following commit: > > ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG > > evaluations can happen during OS early boot stages > Was this a bisection result for an issue? > And what was the issue? Yes it was a bisection result for the ODEBUG errors. > If the issue is the following warning messages: > > [ 34.512758] [ cut here ] > > [ 34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 > > debug_print_object+0x85/0xa0() > > [ 34.512770] ODEBUG: assert_init not available (active state 0) object > > type: > > timer_list hint: stub_timer+0x0/0x20 > > [ 34.512772] Modules linked in: > > [ 34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353 > > [ 34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac- > > AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 > > [ 34.512779] 81f9b41c 880227a1bdb0 814ec829 > > 880227a1bdf8 > > [ 34.512782] 880227a1bde8 810cd831 880227a1be90 > > 822514c0 > > [ 34.512785] 81f9b4c2 8327e288 7f29190ba700 > > 880227a1be48 > > [ 34.512786] Call Trace: > > [ 34.512790] [] dump_stack+0x4b/0x72 > > [ 34.512793] [] warn_slowpath_common+0x81/0xc0 > > [ 34.512795] [] warn_slowpath_fmt+0x47/0x50 > > [ 34.512798] [] ? do_init_timer+0x52/0x60 > > [ 34.512800] [] debug_print_object+0x85/0xa0 > > [ 34.512802] [] ? > > trace_event_raw_event_tick_stop+0x100/0x100 > > [ 34.512805] [] debug_object_assert_init+0xf8/0x130 > > [ 34.512807] [] ? trace_hardirqs_on+0xd/0x10 > > [ 34.512810] [] del_timer+0x1f/0x70 > > [ 34.512813] [] laptop_sync_completion+0x61/0x100 > > [ 34.512815] [] ? laptop_io_completion+0x30/0x30 > > [ 34.512819] [] sys_sync+0x7f/0x90 > > [ 34.512824] [] entry_SYSCALL_64_fastpath+0x12/0x6f > > [ 34.512825] ---[ end trace 8fe52cbaccc47e66 ]--- > > Our investigation result shows this is caused by a multiple deletion on the > same Linux kernel timer. > And the commit you reported shouldn't be the culprit. > So it looks to me like a wrong bisection result if it was a bisection result. I've checked again and the bisect result is correct. It could be that the issue was always there, but this error is new on my system at least. > While the following commit may trigger such breakage: > > ACPICA: Parser: Fix for SuperName method invocation > So I think you need to test again after reverting this patch. > The patch is actually reverted in ACPICA 20160212 release. > You can use the following branch where ACPICA 20160212 release is applied: > https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/?h=acpica-test > test it again to confirm if such breakage still can be seen. That branch also gives the ODEBUG error. > If the issue is some graphical glitches. > As I said, the commit is just a part of whole series that try to make Linux > ACPICA initialization sequences correctly ordered. > It, along with other patches try to make Linux working as spec compliant > (also proven by Windows behavior). > So you really need to apply more patches on top of acpica-test branch to > confirm again. > For example, patches from this series: > http://www.spinics.net/lists/linux-acpi/msg63550.html Patch series does not apply cleanly (3 4 5 6 12 14 fail) > Especially this one: > http://www.spinics.net/lists/linux-acpi/msg63558.html No change, ODEBUG error still occurs. To sum up test results: v4.4-rc7-48-g849c25719ac6 - ok v4.4-rc7-49-gefaed9be998b - odebug error acpia-test - odebug error http://www.spinics.net/lists/linux-acpi/msg63550.html - patch series does not apply cleanly (3 4 5 6 12 14 fail) http://www.spinics.net/lists/linux-acpi/msg63558.html (patch7) - odebug error
Re: [BUG] ODEBUG: assert_init not available (active state 0)
On Tue, Feb 23, 2016 at 02:18:22AM +, Zheng, Lv wrote: > > In fact, I don't understand what your report is. > What did you mean by referring the following commit: > > ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG > > evaluations can happen during OS early boot stages > Was this a bisection result for an issue? > And what was the issue? Yes it was a bisection result for the ODEBUG errors. > If the issue is the following warning messages: > > [ 34.512758] [ cut here ] > > [ 34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 > > debug_print_object+0x85/0xa0() > > [ 34.512770] ODEBUG: assert_init not available (active state 0) object > > type: > > timer_list hint: stub_timer+0x0/0x20 > > [ 34.512772] Modules linked in: > > [ 34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353 > > [ 34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac- > > AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 > > [ 34.512779] 81f9b41c 880227a1bdb0 814ec829 > > 880227a1bdf8 > > [ 34.512782] 880227a1bde8 810cd831 880227a1be90 > > 822514c0 > > [ 34.512785] 81f9b4c2 8327e288 7f29190ba700 > > 880227a1be48 > > [ 34.512786] Call Trace: > > [ 34.512790] [] dump_stack+0x4b/0x72 > > [ 34.512793] [] warn_slowpath_common+0x81/0xc0 > > [ 34.512795] [] warn_slowpath_fmt+0x47/0x50 > > [ 34.512798] [] ? do_init_timer+0x52/0x60 > > [ 34.512800] [] debug_print_object+0x85/0xa0 > > [ 34.512802] [] ? > > trace_event_raw_event_tick_stop+0x100/0x100 > > [ 34.512805] [] debug_object_assert_init+0xf8/0x130 > > [ 34.512807] [] ? trace_hardirqs_on+0xd/0x10 > > [ 34.512810] [] del_timer+0x1f/0x70 > > [ 34.512813] [] laptop_sync_completion+0x61/0x100 > > [ 34.512815] [] ? laptop_io_completion+0x30/0x30 > > [ 34.512819] [] sys_sync+0x7f/0x90 > > [ 34.512824] [] entry_SYSCALL_64_fastpath+0x12/0x6f > > [ 34.512825] ---[ end trace 8fe52cbaccc47e66 ]--- > > Our investigation result shows this is caused by a multiple deletion on the > same Linux kernel timer. > And the commit you reported shouldn't be the culprit. > So it looks to me like a wrong bisection result if it was a bisection result. I've checked again and the bisect result is correct. It could be that the issue was always there, but this error is new on my system at least. > While the following commit may trigger such breakage: > > ACPICA: Parser: Fix for SuperName method invocation > So I think you need to test again after reverting this patch. > The patch is actually reverted in ACPICA 20160212 release. > You can use the following branch where ACPICA 20160212 release is applied: > https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/?h=acpica-test > test it again to confirm if such breakage still can be seen. That branch also gives the ODEBUG error. > If the issue is some graphical glitches. > As I said, the commit is just a part of whole series that try to make Linux > ACPICA initialization sequences correctly ordered. > It, along with other patches try to make Linux working as spec compliant > (also proven by Windows behavior). > So you really need to apply more patches on top of acpica-test branch to > confirm again. > For example, patches from this series: > http://www.spinics.net/lists/linux-acpi/msg63550.html Patch series does not apply cleanly (3 4 5 6 12 14 fail) > Especially this one: > http://www.spinics.net/lists/linux-acpi/msg63558.html No change, ODEBUG error still occurs. To sum up test results: v4.4-rc7-48-g849c25719ac6 - ok v4.4-rc7-49-gefaed9be998b - odebug error acpia-test - odebug error http://www.spinics.net/lists/linux-acpi/msg63550.html - patch series does not apply cleanly (3 4 5 6 12 14 fail) http://www.spinics.net/lists/linux-acpi/msg63558.html (patch7) - odebug error
Re: [PATCH] x86/microcode: Change checksum to u32
On Sat, Feb 27, 2016 at 06:51:18PM +0100, Borislav Petkov wrote: > On Sat, Feb 27, 2016 at 11:01:47AM +0000, Chris Bainbridge wrote: > > Checksum should be unsigned 32-bit otherwise the calculation overflows > > resulting in undefined behaviour: > > > > [0.00] > > > > [0.00] UBSAN: Undefined behaviour in > > arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 > > [0.00] signed integer overflow: > > [0.00] -1500151068 + -2125470173 cannot be represented in type 'int' > > So I've been staring at this error message for a while now and I'm > having hard time understanding what it is telling me. Let's look at all > three ints: > > * sum is computed here: > > sum = orig_sum > - (mc_header->sig + mc_header->pf + mc_header->cksum) > + (ext_sig->sig + ext_sig->pf + ext_sig->cksum); > > and checks mc_header against each extended signature. If it overflows, we > abort: > > if (sum) { > if (print_err) > pr_err("aborting, bad checksum\n"); > > > * orig_sum in the above sum can only be 0: > > if (orig_sum) { > if (print_err) > pr_err("aborting, bad checksum\n"); > return -EINVAL; > > and it adds a bunch of integers which can overflow, sure, but if it > overflows, we exit early. > > * and > > ext_sigcount = ext_header->count; > > so it is some count of extended signatures. > > So what is ubsan complaining about? /* calculate the checksum */ orig_sum = 0; i = (MC_HEADER_SIZE + data_size) / DWSIZE; while (i--) orig_sum += ((int *)mc)[i]; The checksum is the additive sum of all the 32-bit values, but 32-bit addition overflow is undefined for signed ints. For comparison the checksum function from iucode-tool uses u32: intel_ucode_status_t intel_ucode_check_microcode(const void * const uc, int strict) { ... uint32_t sum, orig_sum; unsigned int i; uint32_t *p; ... /* Calculate the checksum. We exclude the extended table as it * also has to have a zero checksum, in order to get better * coverage */ orig_sum = 0; i = (INTEL_UC_V1_HEADER_SIZE + data_size) / sizeof(uint32_t); p = (uint32_t *)uc; while (i--) { orig_sum += *p; p++; }
Re: [PATCH] x86/microcode: Change checksum to u32
On Sat, Feb 27, 2016 at 06:51:18PM +0100, Borislav Petkov wrote: > On Sat, Feb 27, 2016 at 11:01:47AM +0000, Chris Bainbridge wrote: > > Checksum should be unsigned 32-bit otherwise the calculation overflows > > resulting in undefined behaviour: > > > > [0.00] > > > > [0.00] UBSAN: Undefined behaviour in > > arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 > > [0.00] signed integer overflow: > > [0.00] -1500151068 + -2125470173 cannot be represented in type 'int' > > So I've been staring at this error message for a while now and I'm > having hard time understanding what it is telling me. Let's look at all > three ints: > > * sum is computed here: > > sum = orig_sum > - (mc_header->sig + mc_header->pf + mc_header->cksum) > + (ext_sig->sig + ext_sig->pf + ext_sig->cksum); > > and checks mc_header against each extended signature. If it overflows, we > abort: > > if (sum) { > if (print_err) > pr_err("aborting, bad checksum\n"); > > > * orig_sum in the above sum can only be 0: > > if (orig_sum) { > if (print_err) > pr_err("aborting, bad checksum\n"); > return -EINVAL; > > and it adds a bunch of integers which can overflow, sure, but if it > overflows, we exit early. > > * and > > ext_sigcount = ext_header->count; > > so it is some count of extended signatures. > > So what is ubsan complaining about? /* calculate the checksum */ orig_sum = 0; i = (MC_HEADER_SIZE + data_size) / DWSIZE; while (i--) orig_sum += ((int *)mc)[i]; The checksum is the additive sum of all the 32-bit values, but 32-bit addition overflow is undefined for signed ints. For comparison the checksum function from iucode-tool uses u32: intel_ucode_status_t intel_ucode_check_microcode(const void * const uc, int strict) { ... uint32_t sum, orig_sum; unsigned int i; uint32_t *p; ... /* Calculate the checksum. We exclude the extended table as it * also has to have a zero checksum, in order to get better * coverage */ orig_sum = 0; i = (INTEL_UC_V1_HEADER_SIZE + data_size) / sizeof(uint32_t); p = (uint32_t *)uc; while (i--) { orig_sum += *p; p++; }
[PATCH] x86/microcode: Change checksum to u32
Checksum should be unsigned 32-bit otherwise the calculation overflows resulting in undefined behaviour: [0.00] [0.00] UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 [0.00] signed integer overflow: [0.00] -1500151068 + -2125470173 cannot be represented in type 'int' [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 [0.00] 0086 0086 83203968 [0.00] 81b30952 834c43b8 83203998 814fe623 [0.00] 83203980 81bcdf2d 8339a448 83203a08 [0.00] Call Trace: [0.00] [] dump_stack+0x4e/0x6c [0.00] [] ? inotify_ioctl+0x43/0x1c0 [0.00] [] ubsan_epilogue+0xd/0x40 [0.00] [] handle_overflow+0xbd/0xe0 [0.00] [] __ubsan_handle_add_overflow+0xe/0x10 [0.00] [] microcode_sanity_check+0x405/0x590 [0.00] [] get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345 [0.00] [] ? early_idt_handler_common+0x3d/0xae [0.00] [] ? strlcpy+0x52/0xa0 [0.00] [] ? find_cpio_data+0x371/0x510 [0.00] [] load_ucode_intel_bsp+0xa1/0xe5 [0.00] [] load_ucode_bsp+0xdf/0xf3 [0.00] [] ? load_ucode_bsp+0xdf/0xf3 [0.00] [] x86_64_start_kernel+0xd1/0xee [0.00] Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- arch/x86/kernel/cpu/microcode/intel_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c b/arch/x86/kernel/cpu/microcode/intel_lib.c index b96896bcbdaf..2bef93ff2e3f 100644 --- a/arch/x86/kernel/cpu/microcode/intel_lib.c +++ b/arch/x86/kernel/cpu/microcode/intel_lib.c @@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err) unsigned long total_size, data_size, ext_table_size; struct microcode_header_intel *mc_header = mc; struct extended_sigtable *ext_header = NULL; - int sum, orig_sum, ext_sigcount = 0, i; + u32 sum, orig_sum, ext_sigcount = 0, i; struct extended_signature *ext_sig; total_size = get_totalsize(mc_header); -- 2.1.4
[PATCH] x86/microcode: Change checksum to u32
Checksum should be unsigned 32-bit otherwise the calculation overflows resulting in undefined behaviour: [0.00] [0.00] UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12 [0.00] signed integer overflow: [0.00] -1500151068 + -2125470173 cannot be represented in type 'int' [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495 [0.00] 0086 0086 83203968 [0.00] 81b30952 834c43b8 83203998 814fe623 [0.00] 83203980 81bcdf2d 8339a448 83203a08 [0.00] Call Trace: [0.00] [] dump_stack+0x4e/0x6c [0.00] [] ? inotify_ioctl+0x43/0x1c0 [0.00] [] ubsan_epilogue+0xd/0x40 [0.00] [] handle_overflow+0xbd/0xe0 [0.00] [] __ubsan_handle_add_overflow+0xe/0x10 [0.00] [] microcode_sanity_check+0x405/0x590 [0.00] [] get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345 [0.00] [] ? early_idt_handler_common+0x3d/0xae [0.00] [] ? strlcpy+0x52/0xa0 [0.00] [] ? find_cpio_data+0x371/0x510 [0.00] [] load_ucode_intel_bsp+0xa1/0xe5 [0.00] [] load_ucode_bsp+0xdf/0xf3 [0.00] [] ? load_ucode_bsp+0xdf/0xf3 [0.00] [] x86_64_start_kernel+0xd1/0xee [0.00] Signed-off-by: Chris Bainbridge --- arch/x86/kernel/cpu/microcode/intel_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c b/arch/x86/kernel/cpu/microcode/intel_lib.c index b96896bcbdaf..2bef93ff2e3f 100644 --- a/arch/x86/kernel/cpu/microcode/intel_lib.c +++ b/arch/x86/kernel/cpu/microcode/intel_lib.c @@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err) unsigned long total_size, data_size, ext_table_size; struct microcode_header_intel *mc_header = mc; struct extended_sigtable *ext_header = NULL; - int sum, orig_sum, ext_sigcount = 0, i; + u32 sum, orig_sum, ext_sigcount = 0, i; struct extended_signature *ext_sig; total_size = get_totalsize(mc_header); -- 2.1.4
[BUG] USB ethernet on XHCI dies under load
USB ethernet devices stop responding when plugged in to USB3 XHCI ports and flooded with incoming traffic. The usbnet layer seems to get -EPROTO from the xhci driver. Nothing is usually logged in kernel when this occurs, but with dyndebug on there are errors like: ax88179_178a 4-1.1:1.0 eth0: tx throttle -71 intr status -71 Sometimes: xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called with disabled ep 88024ea1adc8 To reproduce just flood the adaptor with gigabit traffic from another computer for a couple of minutes with any of: ping -f $host iperf3 -c $host -t 1000 -P 128 iperf3 -c $host -u -b $bandwidth (for iperf tests run 'iperf3 -s' on the test platform) iperf3 shows many errors/retries. If slub_debug is on (specifically slub_debug=U) there are more retries and it fails faster, which suggests some kind of memory corruption. Other slub_debug options do not appear to have any effect. Another symptom of memory corruption is that other USB devices plugged in to different ports will disconnect and reconnect. The problem is not these devices though, as the USB ethernet still fails when all other devices are unplugged. I have reproduced this with an AX88179 USB3 gigabit ethernet adaptor and a dm9601 USB1 10/100 adaptor, on both a laptop with Intel XHCI chipset and a desktop with VIA XHCI chipset. With the dm9601 adaptor just running "iperf3 -c z -t 10 -u -b 12m" (12mbit UDP stream) is enough to quickly kill it. I have verified that the bug does not occur with USB2 chipset of the desktop, only USB3. This bug appears to have been around a long time. There are some relevant bug threads, several refer specifically to disconnects on flooding with AX88179: http://www.spinics.net/lists/linux-usb/msg113358.html https://bugzilla.kernel.org/show_bug.cgi?id=75381 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371233 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269883 I bisected this with the AX88179 but it just led back to v3.11-rc1-224-g452c447a497d ("USBNET: increase max rx/tx qlen for improving USB3 thoughtput") which is probably not the actual cause of the bug.
[BUG] USB ethernet on XHCI dies under load
USB ethernet devices stop responding when plugged in to USB3 XHCI ports and flooded with incoming traffic. The usbnet layer seems to get -EPROTO from the xhci driver. Nothing is usually logged in kernel when this occurs, but with dyndebug on there are errors like: ax88179_178a 4-1.1:1.0 eth0: tx throttle -71 intr status -71 Sometimes: xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called with disabled ep 88024ea1adc8 To reproduce just flood the adaptor with gigabit traffic from another computer for a couple of minutes with any of: ping -f $host iperf3 -c $host -t 1000 -P 128 iperf3 -c $host -u -b $bandwidth (for iperf tests run 'iperf3 -s' on the test platform) iperf3 shows many errors/retries. If slub_debug is on (specifically slub_debug=U) there are more retries and it fails faster, which suggests some kind of memory corruption. Other slub_debug options do not appear to have any effect. Another symptom of memory corruption is that other USB devices plugged in to different ports will disconnect and reconnect. The problem is not these devices though, as the USB ethernet still fails when all other devices are unplugged. I have reproduced this with an AX88179 USB3 gigabit ethernet adaptor and a dm9601 USB1 10/100 adaptor, on both a laptop with Intel XHCI chipset and a desktop with VIA XHCI chipset. With the dm9601 adaptor just running "iperf3 -c z -t 10 -u -b 12m" (12mbit UDP stream) is enough to quickly kill it. I have verified that the bug does not occur with USB2 chipset of the desktop, only USB3. This bug appears to have been around a long time. There are some relevant bug threads, several refer specifically to disconnects on flooding with AX88179: http://www.spinics.net/lists/linux-usb/msg113358.html https://bugzilla.kernel.org/show_bug.cgi?id=75381 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371233 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269883 I bisected this with the AX88179 but it just led back to v3.11-rc1-224-g452c447a497d ("USBNET: increase max rx/tx qlen for improving USB3 thoughtput") which is probably not the actual cause of the bug.
Re: [BUG] ODEBUG: assert_init not available (active state 0)
On 5 February 2016 at 02:52, Zheng, Lv wrote: > > So you could wait for just several days to test it again. > And report here if you can still see issues. I didn't notice any revert last week, is there something specific that I should test? I just noticed that there are graphical glitches in Firefox on Youtube as well.
Re: [BUG] ODEBUG: assert_init not available (active state 0)
On 5 February 2016 at 02:52, Zheng, Lvwrote: > > So you could wait for just several days to test it again. > And report here if you can still see issues. I didn't notice any revert last week, is there something specific that I should test? I just noticed that there are graphical glitches in Firefox on Youtube as well.
[PATCH] usb: core: hub: hub_port_init lock controller instead of bus
The XHCI controller presents two USB buses to the system - one for USB 2 and one for USB 3. When only one bus is locked there is a race condition with two threads in hub_port_init: [8.984500] Call Trace: [8.985698] [] schedule+0x37/0x90 [8.986918] [] usb_kill_urb+0x8d/0xd0 [8.988130] [] ? wake_up_atomic_t+0x30/0x30 [8.989343] [] usb_start_wait_urb+0xbe/0x150 [8.990561] [] usb_control_msg+0xbc/0xf0 [8.991766] [] hub_port_init+0x51e/0xb70 [8.992964] [] hub_event+0x817/0x1570 [8.994156] [] process_one_work+0x1ff/0x620 [8.995342] [] ? process_one_work+0x15f/0x620 [8.996528] [] worker_thread+0x64/0x4b0 [8.997707] [] ? rescuer_thread+0x390/0x390 [8.998883] [] kthread+0x105/0x120 [9.56] [] ? kthread_create_on_node+0x200/0x200 [9.001241] [] ret_from_fork+0x3f/0x70 [9.002420] [] ? kthread_create_on_node+0x200/0x200 [9.870837] Call Trace: [9.875664] [] xhci_setup_device+0x53d/0xa40 [9.876871] [] xhci_address_device+0xe/0x10 [9.878068] [] hub_port_init+0x1bf/0xb70 [9.879262] [] ? trace_hardirqs_on+0xd/0x10 [9.880465] [] hub_event+0x817/0x1570 [9.881668] [] process_one_work+0x1ff/0x620 [9.882869] [] ? process_one_work+0x15f/0x620 [9.884074] [] worker_thread+0x64/0x4b0 [9.885268] [] ? rescuer_thread+0x390/0x390 [9.886457] [] kthread+0x105/0x120 [9.887634] [] ? kthread_create_on_node+0x200/0x200 [9.17] [] ret_from_fork+0x3f/0x70 [9.889995] [] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device The logged kernel errors are: [8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. Hypothetically, it should perhaps be possible to set the address of the hub on one bus while the hub on the other bus already has an outstanding descriptor read request. But as this is not working reliably, fix it by locking the controller in hub_port_init to prevent parallel initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Signed-off-by: Chris Bainbridge --- drivers/usb/core/hcd.c | 15 +-- drivers/usb/core/hub.c | 8 include/linux/usb.h | 3 +-- include/linux/usb/hcd.h | 1 + 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index df0e3b92533a..5824e819176a 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(>usb_address0_mutex); + mutex_init(>devnum_next_mutex); INIT_LIST_HEAD (>bus_list); } @@ -2497,6 +2497,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, return NULL; } if (primary_hcd == NULL) { + hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex), + GFP_KERNEL); + if (!hcd->address0_mutex) { + kfree(hcd); + dev_dbg(dev, "hcd address0 mutex alloc failed\n"); + return NULL; + } + mutex_init(hcd->address0_mutex); hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex), GFP_KERNEL); if (!hcd->bandwidth_mutex) { @@ -2508,6 +2516,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, dev_set_drvdata(dev, hcd); } else { mutex_lock(_port_peer_mutex); + hcd->address0_mutex = primary_hcd->address0_mutex; hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex; hcd->primary_hcd = primary_hcd; primary_hcd->primary_hcd = primary_hcd; @@ -2574,8 +2583,10 @@ static void hcd_release(struct kref *kref) struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref); mutex_lock(_port_peer_mutex); - if (usb_hcd_is_primary_hcd(hcd)) + if (usb_hcd_is_primary_hcd(hcd)) { + kfree(hcd->address0_mutex); kfree(hcd->bandwidth_mutex); + } if (hcd->shared_hcd) { struct usb_hcd *peer = hcd->shared_hcd; diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 350dcd9af5d8..72ee2338bde0 100644 --
[PATCH] usb: core: hub: hub_port_init lock controller instead of bus
The XHCI controller presents two USB buses to the system - one for USB 2 and one for USB 3. When only one bus is locked there is a race condition with two threads in hub_port_init: [8.984500] Call Trace: [8.985698] [] schedule+0x37/0x90 [8.986918] [] usb_kill_urb+0x8d/0xd0 [8.988130] [] ? wake_up_atomic_t+0x30/0x30 [8.989343] [] usb_start_wait_urb+0xbe/0x150 [8.990561] [] usb_control_msg+0xbc/0xf0 [8.991766] [] hub_port_init+0x51e/0xb70 [8.992964] [] hub_event+0x817/0x1570 [8.994156] [] process_one_work+0x1ff/0x620 [8.995342] [] ? process_one_work+0x15f/0x620 [8.996528] [] worker_thread+0x64/0x4b0 [8.997707] [] ? rescuer_thread+0x390/0x390 [8.998883] [] kthread+0x105/0x120 [9.56] [] ? kthread_create_on_node+0x200/0x200 [9.001241] [] ret_from_fork+0x3f/0x70 [9.002420] [] ? kthread_create_on_node+0x200/0x200 [9.870837] Call Trace: [9.875664] [] xhci_setup_device+0x53d/0xa40 [9.876871] [] xhci_address_device+0xe/0x10 [9.878068] [] hub_port_init+0x1bf/0xb70 [9.879262] [] ? trace_hardirqs_on+0xd/0x10 [9.880465] [] hub_event+0x817/0x1570 [9.881668] [] process_one_work+0x1ff/0x620 [9.882869] [] ? process_one_work+0x15f/0x620 [9.884074] [] worker_thread+0x64/0x4b0 [9.885268] [] ? rescuer_thread+0x390/0x390 [9.886457] [] kthread+0x105/0x120 [9.887634] [] ? kthread_create_on_node+0x200/0x200 [9.17] [] ret_from_fork+0x3f/0x70 [9.889995] [] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device The logged kernel errors are: [8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. Hypothetically, it should perhaps be possible to set the address of the hub on one bus while the hub on the other bus already has an outstanding descriptor read request. But as this is not working reliably, fix it by locking the controller in hub_port_init to prevent parallel initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- drivers/usb/core/hcd.c | 15 +-- drivers/usb/core/hub.c | 8 include/linux/usb.h | 3 +-- include/linux/usb/hcd.h | 1 + 4 files changed, 19 insertions(+), 8 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index df0e3b92533a..5824e819176a 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(>usb_address0_mutex); + mutex_init(>devnum_next_mutex); INIT_LIST_HEAD (>bus_list); } @@ -2497,6 +2497,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, return NULL; } if (primary_hcd == NULL) { + hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex), + GFP_KERNEL); + if (!hcd->address0_mutex) { + kfree(hcd); + dev_dbg(dev, "hcd address0 mutex alloc failed\n"); + return NULL; + } + mutex_init(hcd->address0_mutex); hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex), GFP_KERNEL); if (!hcd->bandwidth_mutex) { @@ -2508,6 +2516,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct hc_driver *driver, dev_set_drvdata(dev, hcd); } else { mutex_lock(_port_peer_mutex); + hcd->address0_mutex = primary_hcd->address0_mutex; hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex; hcd->primary_hcd = primary_hcd; primary_hcd->primary_hcd = primary_hcd; @@ -2574,8 +2583,10 @@ static void hcd_release(struct kref *kref) struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref); mutex_lock(_port_peer_mutex); - if (usb_hcd_is_primary_hcd(hcd)) + if (usb_hcd_is_primary_hcd(hcd)) { + kfree(hcd->address0_mutex); kfree(hcd->bandwidth_mutex); + } if (hcd->shared_hcd) { struct usb_hcd *peer = hcd->shared_hcd; diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c in
Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
Running task list at fail point: [8.978405] kworker/3:0 R running task1176024 2 0x [8.979624] Workqueue: usb_hub_wq hub_event [8.980831] 880260613af8 880260613ac0 811247ed 8802604597c0 [8.982061] 880260608000 880260614000 0100 880260613b70 [8.983280] 880260613c14 1388 880260613b10 81b9bab7 [8.984500] Call Trace: [8.985698] [] schedule+0x37/0x90 [8.986918] [] usb_kill_urb+0x8d/0xd0 [8.988130] [] ? wake_up_atomic_t+0x30/0x30 [8.989343] [] usb_start_wait_urb+0xbe/0x150 [8.990561] [] usb_control_msg+0xbc/0xf0 [8.991766] [] hub_port_init+0x51e/0xb70 [8.992964] [] hub_event+0x817/0x1570 [8.994156] [] process_one_work+0x1ff/0x620 [8.995342] [] ? process_one_work+0x15f/0x620 [8.996528] [] worker_thread+0x64/0x4b0 [8.997707] [] ? rescuer_thread+0x390/0x390 [8.998883] [] kthread+0x105/0x120 [9.56] [] ? kthread_create_on_node+0x200/0x200 [9.001241] [] ret_from_fork+0x3f/0x70 [9.002420] [] ? kthread_create_on_node+0x200/0x200 [9.094152] kworker/3:1 R running task11496 238 2 0x [9.095361] Workqueue: pm pm_runtime_work [9.096545] 88025fc13ae0 88025fc13aa8 0003811247ed 880260bc [9.097767] 8802603aaf80 88025fc14000 88025cc7a520 88007c4b7978 [9.098986] 88025cc7a520 88007c4b7978 88025fc13af8 81b9bab7 [9.100213] Call Trace: [9.101415] [] schedule+0x37/0x90 [9.102626] [] usb_kill_urb+0x8d/0xd0 [9.103835] [] ? wake_up_atomic_t+0x30/0x30 [9.105035] [] hub_quiesce+0x6b/0xa0 [9.106226] [] hub_suspend+0x12a/0x250 [9.107417] [] usb_suspend_both+0x95/0x1d0 [9.108600] [] usb_runtime_suspend+0x2e/0x70 [9.109783] [] ? usb_probe_interface+0x310/0x310 [9.110958] [] __rpm_callback+0x2d/0x70 [9.112130] [] ? usb_probe_interface+0x310/0x310 [9.113305] [] rpm_callback+0x1d/0x90 [9.114482] [] rpm_suspend+0x14b/0x750 [9.115664] [] __pm_runtime_suspend+0x57/0x90 [9.116852] [] ? usb_runtime_resume+0x20/0x20 [9.118034] [] usb_runtime_idle+0x25/0x30 [9.119210] [] __rpm_callback+0x2d/0x70 [9.120382] [] rpm_idle+0x221/0x410 [9.121541] [] pm_runtime_work+0xa9/0xc0 [9.122693] [] process_one_work+0x1ff/0x620 [9.123850] [] ? process_one_work+0x15f/0x620 [9.124993] [] worker_thread+0x64/0x4b0 [9.126134] [] ? rescuer_thread+0x390/0x390 [9.127272] [] ? rescuer_thread+0x390/0x390 [9.128403] [] kthread+0x105/0x120 [9.129524] [] ? kthread_create_on_node+0x200/0x200 [9.130653] [] ret_from_fork+0x3f/0x70 [9.131781] [] ? kthread_create_on_node+0x200/0x200 [9.864737] kworker/3:2 R running task11888 1348 2 0x0008 [9.865947] Workqueue: usb_hub_wq hub_event [9.867152] 88025dbcdf00 88025dbd3b88 81109a8c 811099a9 [9.868386] 88025dbcdf00 88025dbce270 88025dbd3bb8 [9.869610] 81109c04 88025dac8000 88025dac8368 88025c8a4000 [9.870837] Call Trace: [9.872044] [] sched_show_task+0x15c/0x260 [9.873248] [] ? sched_show_task+0x79/0x260 [9.874456] [] show_state_filter+0x74/0xc0 [9.875664] [] xhci_setup_device+0x53d/0xa40 [9.876871] [] xhci_address_device+0xe/0x10 [9.878068] [] hub_port_init+0x1bf/0xb70 [9.879262] [] ? trace_hardirqs_on+0xd/0x10 [9.880465] [] hub_event+0x817/0x1570 [9.881668] [] process_one_work+0x1ff/0x620 [9.882869] [] ? process_one_work+0x15f/0x620 [9.884074] [] worker_thread+0x64/0x4b0 [9.885268] [] ? rescuer_thread+0x390/0x390 [9.886457] [] kthread+0x105/0x120 [9.887634] [] ? kthread_create_on_node+0x200/0x200 [9.17] [] ret_from_fork+0x3f/0x70 [9.889995] [] ? kthread_create_on_node+0x200/0x200 [9.891174] kworker/3:3 R running task10808 1350 2 0x [9.892367] Workqueue: events_freezable usb_stor_scan_dwork [9.893560] 88025dbdbb78 88026a0ccd80 6a0ccd80 8802604597c0 [9.894773] 88025dbcc740 88025dbdc000 88025dbdbbb0 88026a0ccd80 [9.896004] 88026a0ccd80 0003 88025dbdbb90 81b9bab7 [9.897233] Call Trace: [9.898444] [] schedule+0x37/0x90 [9.899670] [] schedule_timeout+0x17c/0x2f0 [9.900893] [] ? init_timer_on_stack_key+0x40/0x40 [9.902099] [] wait_for_completion_interruptible_timeout+0xcd/0x160 [9.903324] [] ? wake_up_q+0x70/0x70 [9.904536] [] usb_stor_msg_common+0xd8/0x130 [9.905751] [] usb_stor_control_msg+0x96/0xb0 [9.906961] [] usb_stor_Bulk_max_lun+0x51/0xa0 [9.908163] [] usb_stor_scan_dwork+0x7f/0xe0 [9.909361] [] process_one_work+0x1ff/0x620 [9.910563] [] ? process_one_work+0x15f/0x620 [9.911762] [] worker_thread+0x64/0x4b0 [
Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
Running task list at fail point: [8.978405] kworker/3:0 R running task1176024 2 0x [8.979624] Workqueue: usb_hub_wq hub_event [8.980831] 880260613af8 880260613ac0 811247ed 8802604597c0 [8.982061] 880260608000 880260614000 0100 880260613b70 [8.983280] 880260613c14 1388 880260613b10 81b9bab7 [8.984500] Call Trace: [8.985698] [] schedule+0x37/0x90 [8.986918] [] usb_kill_urb+0x8d/0xd0 [8.988130] [] ? wake_up_atomic_t+0x30/0x30 [8.989343] [] usb_start_wait_urb+0xbe/0x150 [8.990561] [] usb_control_msg+0xbc/0xf0 [8.991766] [] hub_port_init+0x51e/0xb70 [8.992964] [] hub_event+0x817/0x1570 [8.994156] [] process_one_work+0x1ff/0x620 [8.995342] [] ? process_one_work+0x15f/0x620 [8.996528] [] worker_thread+0x64/0x4b0 [8.997707] [] ? rescuer_thread+0x390/0x390 [8.998883] [] kthread+0x105/0x120 [9.56] [] ? kthread_create_on_node+0x200/0x200 [9.001241] [] ret_from_fork+0x3f/0x70 [9.002420] [] ? kthread_create_on_node+0x200/0x200 [9.094152] kworker/3:1 R running task11496 238 2 0x [9.095361] Workqueue: pm pm_runtime_work [9.096545] 88025fc13ae0 88025fc13aa8 0003811247ed 880260bc [9.097767] 8802603aaf80 88025fc14000 88025cc7a520 88007c4b7978 [9.098986] 88025cc7a520 88007c4b7978 88025fc13af8 81b9bab7 [9.100213] Call Trace: [9.101415] [] schedule+0x37/0x90 [9.102626] [] usb_kill_urb+0x8d/0xd0 [9.103835] [] ? wake_up_atomic_t+0x30/0x30 [9.105035] [] hub_quiesce+0x6b/0xa0 [9.106226] [] hub_suspend+0x12a/0x250 [9.107417] [] usb_suspend_both+0x95/0x1d0 [9.108600] [] usb_runtime_suspend+0x2e/0x70 [9.109783] [] ? usb_probe_interface+0x310/0x310 [9.110958] [] __rpm_callback+0x2d/0x70 [9.112130] [] ? usb_probe_interface+0x310/0x310 [9.113305] [] rpm_callback+0x1d/0x90 [9.114482] [] rpm_suspend+0x14b/0x750 [9.115664] [] __pm_runtime_suspend+0x57/0x90 [9.116852] [] ? usb_runtime_resume+0x20/0x20 [9.118034] [] usb_runtime_idle+0x25/0x30 [9.119210] [] __rpm_callback+0x2d/0x70 [9.120382] [] rpm_idle+0x221/0x410 [9.121541] [] pm_runtime_work+0xa9/0xc0 [9.122693] [] process_one_work+0x1ff/0x620 [9.123850] [] ? process_one_work+0x15f/0x620 [9.124993] [] worker_thread+0x64/0x4b0 [9.126134] [] ? rescuer_thread+0x390/0x390 [9.127272] [] ? rescuer_thread+0x390/0x390 [9.128403] [] kthread+0x105/0x120 [9.129524] [] ? kthread_create_on_node+0x200/0x200 [9.130653] [] ret_from_fork+0x3f/0x70 [9.131781] [] ? kthread_create_on_node+0x200/0x200 [9.864737] kworker/3:2 R running task11888 1348 2 0x0008 [9.865947] Workqueue: usb_hub_wq hub_event [9.867152] 88025dbcdf00 88025dbd3b88 81109a8c 811099a9 [9.868386] 88025dbcdf00 88025dbce270 88025dbd3bb8 [9.869610] 81109c04 88025dac8000 88025dac8368 88025c8a4000 [9.870837] Call Trace: [9.872044] [] sched_show_task+0x15c/0x260 [9.873248] [] ? sched_show_task+0x79/0x260 [9.874456] [] show_state_filter+0x74/0xc0 [9.875664] [] xhci_setup_device+0x53d/0xa40 [9.876871] [] xhci_address_device+0xe/0x10 [9.878068] [] hub_port_init+0x1bf/0xb70 [9.879262] [] ? trace_hardirqs_on+0xd/0x10 [9.880465] [] hub_event+0x817/0x1570 [9.881668] [] process_one_work+0x1ff/0x620 [9.882869] [] ? process_one_work+0x15f/0x620 [9.884074] [] worker_thread+0x64/0x4b0 [9.885268] [] ? rescuer_thread+0x390/0x390 [9.886457] [] kthread+0x105/0x120 [9.887634] [] ? kthread_create_on_node+0x200/0x200 [9.17] [] ret_from_fork+0x3f/0x70 [9.889995] [] ? kthread_create_on_node+0x200/0x200 [9.891174] kworker/3:3 R running task10808 1350 2 0x [9.892367] Workqueue: events_freezable usb_stor_scan_dwork [9.893560] 88025dbdbb78 88026a0ccd80 6a0ccd80 8802604597c0 [9.894773] 88025dbcc740 88025dbdc000 88025dbdbbb0 88026a0ccd80 [9.896004] 88026a0ccd80 0003 88025dbdbb90 81b9bab7 [9.897233] Call Trace: [9.898444] [] schedule+0x37/0x90 [9.899670] [] schedule_timeout+0x17c/0x2f0 [9.900893] [] ? init_timer_on_stack_key+0x40/0x40 [9.902099] [] wait_for_completion_interruptible_timeout+0xcd/0x160 [9.903324] [] ? wake_up_q+0x70/0x70 [9.904536] [] usb_stor_msg_common+0xd8/0x130 [9.905751] [] usb_stor_control_msg+0x96/0xb0 [9.906961] [] usb_stor_Bulk_max_lun+0x51/0xa0 [9.908163] [] usb_stor_scan_dwork+0x7f/0xe0 [9.909361] [] process_one_work+0x1ff/0x620 [9.910563] [] ? process_one_work+0x15f/0x620 [9.911762] [] worker_thread+0x64/0x4b0 [
Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote: > On Thu, 4 Feb 2016, Chris Bainbridge wrote: > > > The XHCI controller presents two USB buses to the system - one for USB 2 > > and one for USB 3. When only one bus is locked there is a race condition > > during hub init that results in errors like: > > > > [ 13.183701] usb 3-3: device descriptor read/all, error -110 > > What exactly is the race condition? Why does locking both buses fix > it? [2.692571] xhci_hcd :00:14.0: xHCI Host Controller [2.693279] xhci_hcd :00:14.0: new USB bus registered, assigned bus number 3 [2.694867] xhci_hcd :00:14.0: hcc params 0x20007181 hci version 0x100 quirks 0xb930 [2.694880] xhci_hcd :00:14.0: cache line size of 256 is not supported [2.695995] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002 [2.696005] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [2.696016] usb usb3: Product: xHCI Host Controller [2.696024] usb usb3: Manufacturer: Linux 4.4.0 xhci-hcd [2.696031] usb usb3: SerialNumber: :00:14.0 [2.698414] hub 3-0:1.0: USB hub found [2.704723] xhci_hcd :00:14.0: xHCI Host Controller [2.705502] xhci_hcd :00:14.0: new USB bus registered, assigned bus number 4 [2.706136] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003 [2.706138] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [2.706139] usb usb4: Product: xHCI Host Controller [2.706140] usb usb4: Manufacturer: Linux 4.4.0 xhci-hcd [2.706141] usb usb4: SerialNumber: :00:14.0 [2.708467] hub 4-0:1.0: USB hub found [3.021779] usb 3-3: new high-speed USB device number 2 using xhci_hcd [8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device command ...bus4 enumeration... [ 13.297835] usb 3-3: device descriptor read/all, error -110 ...bus3 enumeration... Note there seem to be two timeouts of 5 seconds/10 seconds after the last message at 3.021779 seconds. hub_port_init is called in parallel for both buses. The first thread is in usb_get_device_descriptor when the second one enters the function and calls the code to get an address. I don't know precisely how it fails - it looks like the functions for doing the initialisation are synchronous and sleeping waiting for a response and that gets disrupted when the second thread tries to initialise the hub. What was the basis for using a lock on the bus rather than the controller? Does the spec say that buses of the same controller can be initialised in parallel? Mathias previously said: > Just found an additional note in the xhci specs section 4.5.3 saying that: > "Note: Software shall not transition more than one Device Slot to the Default > State at a time" > which is what xhci_setup_device() does in addition to moving slots to the > addressed state But I don't know if that means you can do the reset/set address/read descriptors in parallel? > > @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device > > *udev, int port1, > > if (oldspeed == USB_SPEED_LOW) > > delay = HUB_LONG_RESET_TIME; > > > > - mutex_lock(>bus->usb_address0_mutex); > > + mutex_lock(>bus->controller->mutex); > > > > /* Reset the device; full speed may morph to high speed */ > > /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ > > @@ -4588,7 +4588,7 @@ fail: > > hub_port_disable(hub, port1, 0); > > update_devnum(udev, devnum);/* for disconnect processing */ > > } > > - mutex_unlock(>bus->usb_address0_mutex); > > + mutex_unlock(>bus->controller->mutex); > > return retval; > > } > > I don't think this is a good idea. The driver core needs to be able to > access the controller while this function is running. You can > introduce a new mutex if you want, perhaps in the primary hcd > structure, but don't use bus->controller->mutex. An explicit lock might be a good idea. I was trying to avoid adding another lock so used the one in struct device as it appeared unused. The XHCI code seems to only use the lock in struct xhci_hcd and ehci uses struct ehci->lock. btw I think this bug may be the same as reported at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492
[PATCH] usb: host: xhci: Replace bus lock with host controller lock
The XHCI controller presents two USB buses to the system - one for USB 2 and one for USB 3. When only one bus is locked there is a race condition during hub init that results in errors like: [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this error occurred on 6% of boots. Fix this by locking the bus controller instead of the bus, thus preventing simultaneous hub init on both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Signed-off-by: Chris Bainbridge --- drivers/usb/core/hcd.c | 2 +- drivers/usb/core/hub.c | 8 include/linux/usb.h| 3 +-- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index df0e3b92533a..c468d047ec5e 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(>usb_address0_mutex); + mutex_init(>devnum_next_mutex); INIT_LIST_HEAD (>bus_list); } diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 350dcd9af5d8..e03c2f05c2f6 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2066,7 +2066,7 @@ static void choose_devnum(struct usb_device *udev) struct usb_bus *bus = udev->bus; /* be safe when more hub events are proceed in parallel */ - mutex_lock(>usb_address0_mutex); + mutex_lock(>devnum_next_mutex); if (udev->wusb) { devnum = udev->portnum + 1; BUG_ON(test_bit(devnum, bus->devmap.devicemap)); @@ -2084,7 +2084,7 @@ static void choose_devnum(struct usb_device *udev) set_bit(devnum, bus->devmap.devicemap); udev->devnum = devnum; } - mutex_unlock(>usb_address0_mutex); + mutex_unlock(>devnum_next_mutex); } static void release_devnum(struct usb_device *udev) @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME; - mutex_lock(>bus->usb_address0_mutex); + mutex_lock(>bus->controller->mutex); /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ @@ -4588,7 +4588,7 @@ fail: hub_port_disable(hub, port1, 0); update_devnum(udev, devnum);/* for disconnect processing */ } - mutex_unlock(>bus->usb_address0_mutex); + mutex_unlock(>bus->controller->mutex); return retval; } diff --git a/include/linux/usb.h b/include/linux/usb.h index 89533ba38691..6b736c82b9d1 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -371,14 +371,13 @@ struct usb_bus { int devnum_next;/* Next open device number in * round-robin allocation */ + struct mutex devnum_next_mutex; /* devnum_next mutex */ struct usb_devmap devmap; /* device address allocation map */ struct usb_device *root_hub;/* Root hub */ struct usb_bus *hs_companion; /* Companion EHCI bus, if any */ struct list_head bus_list; /* list of busses */ - struct mutex usb_address0_mutex; /* unaddressed device mutex */ - int bandwidth_allocated;/* on this bus: how much of the time * reserved for periodic (intr/iso) * requests is used, on average? -- 2.1.4
Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock
On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote: > On Thu, 4 Feb 2016, Chris Bainbridge wrote: > > > The XHCI controller presents two USB buses to the system - one for USB 2 > > and one for USB 3. When only one bus is locked there is a race condition > > during hub init that results in errors like: > > > > [ 13.183701] usb 3-3: device descriptor read/all, error -110 > > What exactly is the race condition? Why does locking both buses fix > it? [2.692571] xhci_hcd :00:14.0: xHCI Host Controller [2.693279] xhci_hcd :00:14.0: new USB bus registered, assigned bus number 3 [2.694867] xhci_hcd :00:14.0: hcc params 0x20007181 hci version 0x100 quirks 0xb930 [2.694880] xhci_hcd :00:14.0: cache line size of 256 is not supported [2.695995] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002 [2.696005] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [2.696016] usb usb3: Product: xHCI Host Controller [2.696024] usb usb3: Manufacturer: Linux 4.4.0 xhci-hcd [2.696031] usb usb3: SerialNumber: :00:14.0 [2.698414] hub 3-0:1.0: USB hub found [2.704723] xhci_hcd :00:14.0: xHCI Host Controller [2.705502] xhci_hcd :00:14.0: new USB bus registered, assigned bus number 4 [2.706136] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003 [2.706138] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [2.706139] usb usb4: Product: xHCI Host Controller [2.706140] usb usb4: Manufacturer: Linux 4.4.0 xhci-hcd [2.706141] usb usb4: SerialNumber: :00:14.0 [2.708467] hub 4-0:1.0: USB hub found [3.021779] usb 3-3: new high-speed USB device number 2 using xhci_hcd [8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device command ...bus4 enumeration... [ 13.297835] usb 3-3: device descriptor read/all, error -110 ...bus3 enumeration... Note there seem to be two timeouts of 5 seconds/10 seconds after the last message at 3.021779 seconds. hub_port_init is called in parallel for both buses. The first thread is in usb_get_device_descriptor when the second one enters the function and calls the code to get an address. I don't know precisely how it fails - it looks like the functions for doing the initialisation are synchronous and sleeping waiting for a response and that gets disrupted when the second thread tries to initialise the hub. What was the basis for using a lock on the bus rather than the controller? Does the spec say that buses of the same controller can be initialised in parallel? Mathias previously said: > Just found an additional note in the xhci specs section 4.5.3 saying that: > "Note: Software shall not transition more than one Device Slot to the Default > State at a time" > which is what xhci_setup_device() does in addition to moving slots to the > addressed state But I don't know if that means you can do the reset/set address/read descriptors in parallel? > > @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device > > *udev, int port1, > > if (oldspeed == USB_SPEED_LOW) > > delay = HUB_LONG_RESET_TIME; > > > > - mutex_lock(>bus->usb_address0_mutex); > > + mutex_lock(>bus->controller->mutex); > > > > /* Reset the device; full speed may morph to high speed */ > > /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ > > @@ -4588,7 +4588,7 @@ fail: > > hub_port_disable(hub, port1, 0); > > update_devnum(udev, devnum);/* for disconnect processing */ > > } > > - mutex_unlock(>bus->usb_address0_mutex); > > + mutex_unlock(>bus->controller->mutex); > > return retval; > > } > > I don't think this is a good idea. The driver core needs to be able to > access the controller while this function is running. You can > introduce a new mutex if you want, perhaps in the primary hcd > structure, but don't use bus->controller->mutex. An explicit lock might be a good idea. I was trying to avoid adding another lock so used the one in struct device as it appeared unused. The XHCI code seems to only use the lock in struct xhci_hcd and ehci uses struct ehci->lock. btw I think this bug may be the same as reported at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492
[PATCH] usb: host: xhci: Replace bus lock with host controller lock
The XHCI controller presents two USB buses to the system - one for USB 2 and one for USB 3. When only one bus is locked there is a race condition during hub init that results in errors like: [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this error occurred on 6% of boots. Fix this by locking the bus controller instead of the bus, thus preventing simultaneous hub init on both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- drivers/usb/core/hcd.c | 2 +- drivers/usb/core/hub.c | 8 include/linux/usb.h| 3 +-- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index df0e3b92533a..c468d047ec5e 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus) bus->bandwidth_allocated = 0; bus->bandwidth_int_reqs = 0; bus->bandwidth_isoc_reqs = 0; - mutex_init(>usb_address0_mutex); + mutex_init(>devnum_next_mutex); INIT_LIST_HEAD (>bus_list); } diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 350dcd9af5d8..e03c2f05c2f6 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -2066,7 +2066,7 @@ static void choose_devnum(struct usb_device *udev) struct usb_bus *bus = udev->bus; /* be safe when more hub events are proceed in parallel */ - mutex_lock(>usb_address0_mutex); + mutex_lock(>devnum_next_mutex); if (udev->wusb) { devnum = udev->portnum + 1; BUG_ON(test_bit(devnum, bus->devmap.devicemap)); @@ -2084,7 +2084,7 @@ static void choose_devnum(struct usb_device *udev) set_bit(devnum, bus->devmap.devicemap); udev->devnum = devnum; } - mutex_unlock(>usb_address0_mutex); + mutex_unlock(>devnum_next_mutex); } static void release_devnum(struct usb_device *udev) @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, if (oldspeed == USB_SPEED_LOW) delay = HUB_LONG_RESET_TIME; - mutex_lock(>bus->usb_address0_mutex); + mutex_lock(>bus->controller->mutex); /* Reset the device; full speed may morph to high speed */ /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */ @@ -4588,7 +4588,7 @@ fail: hub_port_disable(hub, port1, 0); update_devnum(udev, devnum);/* for disconnect processing */ } - mutex_unlock(>bus->usb_address0_mutex); + mutex_unlock(>bus->controller->mutex); return retval; } diff --git a/include/linux/usb.h b/include/linux/usb.h index 89533ba38691..6b736c82b9d1 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -371,14 +371,13 @@ struct usb_bus { int devnum_next;/* Next open device number in * round-robin allocation */ + struct mutex devnum_next_mutex; /* devnum_next mutex */ struct usb_devmap devmap; /* device address allocation map */ struct usb_device *root_hub;/* Root hub */ struct usb_bus *hs_companion; /* Companion EHCI bus, if any */ struct list_head bus_list; /* list of busses */ - struct mutex usb_address0_mutex; /* unaddressed device mutex */ - int bandwidth_allocated;/* on this bus: how much of the time * reserved for periodic (intr/iso) * requests is used, on average? -- 2.1.4
[BUG] ACPI Error: ...
v4.5-rc2, Macbook IVB, ACPI errors on boot: [0.618382] ACPI Error: Needed type [Reference], found [Integer] 880260240fa0 (20150930/exresop-103) [0.618448] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 880261acb9e0), AE_AML_OPERAND_TYPE (20150930/psparse-542) [1.581790] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-360) [1.581885] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 88025f9962e0), AE_NOT_FOUND (20150930/psparse-542) [1.585873] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-360) [1.585968] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 88025f997b50), AE_NOT_FOUND (20150930/psparse-542) [1.590267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-360) [1.590360] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 88025f996450), AE_NOT_FOUND (20150930/psparse-542) ae90fbf562d733a392c7a0ffefe1e09b5a31c99c is the first bad commit commit ae90fbf562d733a392c7a0ffefe1e09b5a31c99c Author: Bob Moore Date: Tue Dec 29 14:00:14 2015 +0800 ACPICA: Parser: Fix for SuperName method invocation ACPICA commit 4b86d1046d06e462dae83ebcd5a66cc132a08f8f SuperName parameters that are in fact control method invocations were not handled correctly by the parser. This change fixes the problem by identifying these properly as method invocations. This affects about 14 different ASL operators that contain SuperName parameters. ACPICA BZ 1002. Link: https://github.com/acpica/acpica/commit/4b86d104 Link: https://bugs.acpica.org/show_bug.cgi?id=1002 Signed-off-by: Bob Moore Signed-off-by: Lv Zheng Signed-off-by: Rafael J. Wysocki :04 04 3c9015e32a1af1bbe211d455138e4940fc313543 441172b0f1209cd86fc68f70a0ce7f0adb1f9e33 M drivers
[BUG] ODEBUG: assert_init not available (active state 0)
v4.5-rc2. Macbook 2012 (IVB). ODEBUG reports an error or two at shutdown. Apart from the ODEBUG error(s), this bug also seems to cause some graphical corruption when typing in Firefox text fields. efaed9be998b5ae0afb7458e057e5f4402b43fa0 is the first bad commit commit efaed9be998b5ae0afb7458e057e5f4402b43fa0 Author: Lv Zheng Date: Tue Dec 29 14:03:08 2015 +0800 ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages ACPICA commit 31178590dde82368fdb0f6b0e466b6c0add96c57 We can ensure no early _REG evaluations by ensuring the following rules in acpi_ev_execute_reg_method(): 1. If an address space handler is installed during early stage, _REG(CONNECT) evaluations are blocked. This is achieved using acpi_gbl_reg_methods_enabled which is renamed from acpi_gbl_reg_methods_executed. 2. If _REG(CONNECT) has never been evalauted for the region object, _REG(DISCONNECT) evaluations are blocked. This is achieved by a new region object flag: AOPOBJ_REG_CONNECTED. Note that, after applying this patch, we can ensure _REG(DISCONNECT) is always paired to _REG(CONNECT). Lv Zheng Link: https://github.com/acpica/acpica/commit/31178590 Signed-off-by: Lv Zheng Signed-off-by: Bob Moore Signed-off-by: Rafael J. Wysocki :04 04 357c50e20ee06ad557c77a2a03d09719dae9dc52 3e0610826dc9cb86454baf06de9392aa31e0aa91 M drivers [ 34.512758] [ cut here ] [ 34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 debug_print_object+0x85/0xa0() [ 34.512770] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20 [ 34.512772] Modules linked in: [ 34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353 [ 34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 34.512779] 81f9b41c 880227a1bdb0 814ec829 880227a1bdf8 [ 34.512782] 880227a1bde8 810cd831 880227a1be90 822514c0 [ 34.512785] 81f9b4c2 8327e288 7f29190ba700 880227a1be48 [ 34.512786] Call Trace: [ 34.512790] [] dump_stack+0x4b/0x72 [ 34.512793] [] warn_slowpath_common+0x81/0xc0 [ 34.512795] [] warn_slowpath_fmt+0x47/0x50 [ 34.512798] [] ? do_init_timer+0x52/0x60 [ 34.512800] [] debug_print_object+0x85/0xa0 [ 34.512802] [] ? trace_event_raw_event_tick_stop+0x100/0x100 [ 34.512805] [] debug_object_assert_init+0xf8/0x130 [ 34.512807] [] ? trace_hardirqs_on+0xd/0x10 [ 34.512810] [] del_timer+0x1f/0x70 [ 34.512813] [] laptop_sync_completion+0x61/0x100 [ 34.512815] [] ? laptop_io_completion+0x30/0x30 [ 34.512819] [] sys_sync+0x7f/0x90 [ 34.512824] [] entry_SYSCALL_64_fastpath+0x12/0x6f [ 34.512825] ---[ end trace 8fe52cbaccc47e66 ]--- [ 34.512830] [ cut here ] [ 34.512833] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 debug_print_object+0x85/0xa0() [ 34.512835] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20 [ 34.512836] Modules linked in: [ 34.512838] CPU: 0 PID: 4975 Comm: systemd Tainted: GW 4.4.0-rc7+ #353 [ 34.512839] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 34.512842] 81f9b41c 880227a1bdb0 814ec829 880227a1bdf8 [ 34.512845] 880227a1bde8 810cd831 880227a1be90 822514c0 [ 34.512847] 81f9b4c2 8331f388 7f29190ba700 880227a1be48 [ 34.512848] Call Trace: [ 34.512850] [] dump_stack+0x4b/0x72 [ 34.512853] [] warn_slowpath_common+0x81/0xc0 [ 34.512855] [] warn_slowpath_fmt+0x47/0x50 [ 34.512857] [] ? do_init_timer+0x52/0x60 [ 34.512859] [] debug_print_object+0x85/0xa0 [ 34.512861] [] ? trace_event_raw_event_tick_stop+0x100/0x100 [ 34.512864] [] debug_object_assert_init+0xf8/0x130 [ 34.512865] [] ? trace_hardirqs_on+0xd/0x10 [ 34.512868] [] del_timer+0x1f/0x70 [ 34.512869] [] laptop_sync_completion+0x61/0x100 [ 34.512871] [] ? laptop_io_completion+0x30/0x30 [ 34.512873] [] sys_sync+0x7f/0x90 [ 34.512876] [] entry_SYSCALL_64_fastpath+0x12/0x6f [ 34.512877] ---[ end trace 8fe52cbaccc47e67 ]---
[BUG] ODEBUG: assert_init not available (active state 0)
v4.5-rc2. Macbook 2012 (IVB). ODEBUG reports an error or two at shutdown. Apart from the ODEBUG error(s), this bug also seems to cause some graphical corruption when typing in Firefox text fields. efaed9be998b5ae0afb7458e057e5f4402b43fa0 is the first bad commit commit efaed9be998b5ae0afb7458e057e5f4402b43fa0 Author: Lv ZhengDate: Tue Dec 29 14:03:08 2015 +0800 ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages ACPICA commit 31178590dde82368fdb0f6b0e466b6c0add96c57 We can ensure no early _REG evaluations by ensuring the following rules in acpi_ev_execute_reg_method(): 1. If an address space handler is installed during early stage, _REG(CONNECT) evaluations are blocked. This is achieved using acpi_gbl_reg_methods_enabled which is renamed from acpi_gbl_reg_methods_executed. 2. If _REG(CONNECT) has never been evalauted for the region object, _REG(DISCONNECT) evaluations are blocked. This is achieved by a new region object flag: AOPOBJ_REG_CONNECTED. Note that, after applying this patch, we can ensure _REG(DISCONNECT) is always paired to _REG(CONNECT). Lv Zheng Link: https://github.com/acpica/acpica/commit/31178590 Signed-off-by: Lv Zheng Signed-off-by: Bob Moore Signed-off-by: Rafael J. Wysocki :04 04 357c50e20ee06ad557c77a2a03d09719dae9dc52 3e0610826dc9cb86454baf06de9392aa31e0aa91 M drivers [ 34.512758] [ cut here ] [ 34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 debug_print_object+0x85/0xa0() [ 34.512770] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20 [ 34.512772] Modules linked in: [ 34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353 [ 34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 34.512779] 81f9b41c 880227a1bdb0 814ec829 880227a1bdf8 [ 34.512782] 880227a1bde8 810cd831 880227a1be90 822514c0 [ 34.512785] 81f9b4c2 8327e288 7f29190ba700 880227a1be48 [ 34.512786] Call Trace: [ 34.512790] [] dump_stack+0x4b/0x72 [ 34.512793] [] warn_slowpath_common+0x81/0xc0 [ 34.512795] [] warn_slowpath_fmt+0x47/0x50 [ 34.512798] [] ? do_init_timer+0x52/0x60 [ 34.512800] [] debug_print_object+0x85/0xa0 [ 34.512802] [] ? trace_event_raw_event_tick_stop+0x100/0x100 [ 34.512805] [] debug_object_assert_init+0xf8/0x130 [ 34.512807] [] ? trace_hardirqs_on+0xd/0x10 [ 34.512810] [] del_timer+0x1f/0x70 [ 34.512813] [] laptop_sync_completion+0x61/0x100 [ 34.512815] [] ? laptop_io_completion+0x30/0x30 [ 34.512819] [] sys_sync+0x7f/0x90 [ 34.512824] [] entry_SYSCALL_64_fastpath+0x12/0x6f [ 34.512825] ---[ end trace 8fe52cbaccc47e66 ]--- [ 34.512830] [ cut here ] [ 34.512833] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 debug_print_object+0x85/0xa0() [ 34.512835] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20 [ 34.512836] Modules linked in: [ 34.512838] CPU: 0 PID: 4975 Comm: systemd Tainted: GW 4.4.0-rc7+ #353 [ 34.512839] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 34.512842] 81f9b41c 880227a1bdb0 814ec829 880227a1bdf8 [ 34.512845] 880227a1bde8 810cd831 880227a1be90 822514c0 [ 34.512847] 81f9b4c2 8331f388 7f29190ba700 880227a1be48 [ 34.512848] Call Trace: [ 34.512850] [] dump_stack+0x4b/0x72 [ 34.512853] [] warn_slowpath_common+0x81/0xc0 [ 34.512855] [] warn_slowpath_fmt+0x47/0x50 [ 34.512857] [] ? do_init_timer+0x52/0x60 [ 34.512859] [] debug_print_object+0x85/0xa0 [ 34.512861] [] ? trace_event_raw_event_tick_stop+0x100/0x100 [ 34.512864] [] debug_object_assert_init+0xf8/0x130 [ 34.512865] [] ? trace_hardirqs_on+0xd/0x10 [ 34.512868] [] del_timer+0x1f/0x70 [ 34.512869] [] laptop_sync_completion+0x61/0x100 [ 34.512871] [] ? laptop_io_completion+0x30/0x30 [ 34.512873] [] sys_sync+0x7f/0x90 [ 34.512876] [] entry_SYSCALL_64_fastpath+0x12/0x6f [ 34.512877] ---[ end trace 8fe52cbaccc47e67 ]---
[BUG] ACPI Error: ...
v4.5-rc2, Macbook IVB, ACPI errors on boot: [0.618382] ACPI Error: Needed type [Reference], found [Integer] 880260240fa0 (20150930/exresop-103) [0.618448] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 880261acb9e0), AE_AML_OPERAND_TYPE (20150930/psparse-542) [1.581790] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-360) [1.581885] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 88025f9962e0), AE_NOT_FOUND (20150930/psparse-542) [1.585873] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-360) [1.585968] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 88025f997b50), AE_NOT_FOUND (20150930/psparse-542) [1.590267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, AE_NOT_FOUND (20150930/psargs-360) [1.590360] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 88025f996450), AE_NOT_FOUND (20150930/psparse-542) ae90fbf562d733a392c7a0ffefe1e09b5a31c99c is the first bad commit commit ae90fbf562d733a392c7a0ffefe1e09b5a31c99c Author: Bob MooreDate: Tue Dec 29 14:00:14 2015 +0800 ACPICA: Parser: Fix for SuperName method invocation ACPICA commit 4b86d1046d06e462dae83ebcd5a66cc132a08f8f SuperName parameters that are in fact control method invocations were not handled correctly by the parser. This change fixes the problem by identifying these properly as method invocations. This affects about 14 different ASL operators that contain SuperName parameters. ACPICA BZ 1002. Link: https://github.com/acpica/acpica/commit/4b86d104 Link: https://bugs.acpica.org/show_bug.cgi?id=1002 Signed-off-by: Bob Moore Signed-off-by: Lv Zheng Signed-off-by: Rafael J. Wysocki :04 04 3c9015e32a1af1bbe211d455138e4940fc313543 441172b0f1209cd86fc68f70a0ce7f0adb1f9e33 M drivers
[PATCH] net/mac80211/agg-rx.c: fix use of uninitialised values
Use kzalloc instead of kmalloc for struct tid_ampdu_rx. Fixes: [7.976605] UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29 [7.976608] load of value 2 is not a valid value for type '_Bool' [7.976611] CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265 [7.976613] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [7.976616] Workqueue: phy0 rt2x00usb_work_rxdone [7.976619] 0004 880254a7ba50 8181d866 0007 [7.976622] 880254a7ba78 880254a7ba68 8188422d 8379b500 [7.976626] 880254a7bab8 81884747 0202 000348620032 [7.976629] Call Trace: [7.976633] [] dump_stack+0x45/0x5f [7.976637] [] ubsan_epilogue+0xd/0x40 [7.976642] [] __ubsan_handle_load_invalid_value+0x67/0x70 [7.976646] [] ieee80211_sta_reorder_release.isra.16+0x5ed/0x730 [7.976650] [] ieee80211_prepare_and_rx_handle+0xd04/0x1c00 [7.976654] [] ? usb_hcd_map_urb_for_dma+0x65e/0x960 [7.976659] [] __ieee80211_rx_handle_packet+0x1f3/0x750 [7.976663] [] ieee80211_rx_napi+0x447/0x990 [7.976667] [] rt2x00lib_rxdone+0x305/0xbd0 [7.976670] [] ? dequeue_task_fair+0x64f/0x1de0 [7.976674] [] ? sched_clock_cpu+0xe6/0x150 [7.976678] [] rt2x00usb_work_rxdone+0x7c/0x140 [7.976682] [] process_one_work+0x226/0x860 [7.976686] [] worker_thread+0x5c/0x680 [7.976690] [] ? process_one_work+0x860/0x860 [7.976693] [] kthread+0xf6/0x150 [7.976697] [] ? kthread_worker_fn+0x310/0x310 [7.976700] [] ret_from_fork+0x3f/0x70 [7.976703] [] ? kthread_worker_fn+0x310/0x310 Link: https://lkml.org/lkml/2016/1/26/230 Signed-off-by: Chris Bainbridge --- net/mac80211/agg-rx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c index 10ad4ac1fa0b..bde3344cbdd0 100644 --- a/net/mac80211/agg-rx.c +++ b/net/mac80211/agg-rx.c @@ -291,7 +291,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, } /* prepare A-MPDU MLME for Rx aggregation */ - tid_agg_rx = kmalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL); + tid_agg_rx = kzalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL); if (!tid_agg_rx) goto end; -- 2.1.4
[PATCH] net/mac80211/agg-rx.c: fix use of uninitialised values
Use kzalloc instead of kmalloc for struct tid_ampdu_rx. Fixes: [7.976605] UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29 [7.976608] load of value 2 is not a valid value for type '_Bool' [7.976611] CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265 [7.976613] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [7.976616] Workqueue: phy0 rt2x00usb_work_rxdone [7.976619] 0004 880254a7ba50 8181d866 0007 [7.976622] 880254a7ba78 880254a7ba68 8188422d 8379b500 [7.976626] 880254a7bab8 81884747 0202 000348620032 [7.976629] Call Trace: [7.976633] [] dump_stack+0x45/0x5f [7.976637] [] ubsan_epilogue+0xd/0x40 [7.976642] [] __ubsan_handle_load_invalid_value+0x67/0x70 [7.976646] [] ieee80211_sta_reorder_release.isra.16+0x5ed/0x730 [7.976650] [] ieee80211_prepare_and_rx_handle+0xd04/0x1c00 [7.976654] [] ? usb_hcd_map_urb_for_dma+0x65e/0x960 [7.976659] [] __ieee80211_rx_handle_packet+0x1f3/0x750 [7.976663] [] ieee80211_rx_napi+0x447/0x990 [7.976667] [] rt2x00lib_rxdone+0x305/0xbd0 [7.976670] [] ? dequeue_task_fair+0x64f/0x1de0 [7.976674] [] ? sched_clock_cpu+0xe6/0x150 [7.976678] [] rt2x00usb_work_rxdone+0x7c/0x140 [7.976682] [] process_one_work+0x226/0x860 [7.976686] [] worker_thread+0x5c/0x680 [7.976690] [] ? process_one_work+0x860/0x860 [7.976693] [] kthread+0xf6/0x150 [7.976697] [] ? kthread_worker_fn+0x310/0x310 [7.976700] [] ret_from_fork+0x3f/0x70 [7.976703] [] ? kthread_worker_fn+0x310/0x310 Link: https://lkml.org/lkml/2016/1/26/230 Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- net/mac80211/agg-rx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c index 10ad4ac1fa0b..bde3344cbdd0 100644 --- a/net/mac80211/agg-rx.c +++ b/net/mac80211/agg-rx.c @@ -291,7 +291,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta, } /* prepare A-MPDU MLME for Rx aggregation */ - tid_agg_rx = kmalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL); + tid_agg_rx = kzalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL); if (!tid_agg_rx) goto end; -- 2.1.4
UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11
Mounting ext4 fs on 4.5.0-rc1 [ 1176.592557] [ 1176.592565] UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11 [ 1176.592567] shift exponent -1 is negative [ 1176.592571] CPU: 3 PID: 4976 Comm: QThread Not tainted 4.5.0-rc1 #252 [ 1176.592573] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 1176.592575] 8379dede 88021f4bf8b8 [ 1176.592578] 81b2e7d9 0007 88021f4bf8e8 88021f4bf8d0 [ 1176.592581] 81bcb87d 88021f4bf968 81bcc1c1 [ 1176.592583] Call Trace: [ 1176.592590] [] dump_stack+0x45/0x6c [ 1176.592595] [] ubsan_epilogue+0xd/0x40 [ 1176.592597] [] __ubsan_handle_shift_out_of_bounds+0xf1/0x140 [ 1176.592601] [] ? ext4_mb_init_cache+0x677/0x1570 [ 1176.592604] [] ? lru_cache_add+0x9/0x10 [ 1176.592607] [] mb_find_order_for_block+0x132/0x1e0 [ 1176.592609] [] mb_find_extent+0xde/0x6d0 [ 1176.592611] [] ext4_mb_complex_scan_group+0x14b/0x8a0 [ 1176.592614] [] ext4_mb_regular_allocator+0x3a7/0xcc0 [ 1176.592616] [] ext4_mb_new_blocks+0x4c5/0x1270 [ 1176.592620] [] ? ext4_ext_put_gap_in_cache+0x116/0x230 [ 1176.592622] [] ext4_ext_map_blocks+0x81c/0x2280 [ 1176.592626] [] ext4_map_blocks+0x1b6/0xe00 [ 1176.592628] [] ext4_getblk+0x49/0x290 [ 1176.592630] [] ext4_bread+0xe/0xb0 [ 1176.592633] [] ext4_append+0xad/0x340 [ 1176.592635] [] ext4_mkdir+0x344/0x970 [ 1176.592639] [] vfs_mkdir+0x103/0x310 [ 1176.592642] [] SyS_mkdir+0x93/0x130 [ 1176.592645] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 1176.592647]
Re: UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
On Tue, Jan 26, 2016 at 12:03:39PM +0100, David Sterba wrote: > On Tue, Jan 26, 2016 at 10:13:33AM +0000, Chris Bainbridge wrote: > > Booting 4.5.0-rc1 with new UBSAN checker enabled: > > > > [3.859690] > > > > [3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10 > > [3.859696] signed integer overflow: > > [3.859697] 9223372036854775807 + 1 cannot be represented in type 'long > > long int' > > [3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252 > > [3.859702] Hardware name: Apple Inc. > > MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 > > 09/13/2015 > > [3.859706] 0001 > > 88024d4bfcf8 > > [3.859709] 81b2e7d9 0001 88024d4bfd28 > > 88024d4bfd10 > > [3.859711] 81bcb87d 83aceb48 88024d4bfd98 > > 81bcbc4d > > [3.859712] Call Trace: > > [3.859719] [] dump_stack+0x45/0x6c > > [3.859723] [] ubsan_epilogue+0xd/0x40 > > [3.859725] [] handle_overflow+0xbd/0xe0 > > [3.859728] [] __ubsan_handle_add_overflow+0xe/0x10 > > [3.859732] [] btrfs_real_readdir+0x881/0xc10 > > [3.859737] [] iterate_dir+0xdd/0x2d0 > > [3.859740] [] SyS_getdents+0x9b/0x110 > > [3.859743] [] ? iterate_dir+0x2d0/0x2d0 > > [3.859747] [] entry_SYSCALL_64_fastpath+0x12/0x6a > > [3.859749] > > > > That seems to be the same overflow as reported in the past, caught by > the PaX SIZE_OVERFLOW plugin. There's a patch but not merged yet. > > https://patchwork.kernel.org/patch/7611351/ Yes, the error does not appear with that patch applied.
UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18
4.5.0-rc1 another UBSAN error: [ 4845.229441] [ 4845.229454] UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18 [ 4845.229458] load of value 2 is not a valid value for type '_Bool' [ 4845.229464] CPU: 1 PID: 6266 Comm: kworker/u16:8 Not tainted 4.5.0-rc1 #252 [ 4845.229468] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 4845.229491] Workqueue: phy2 rt2x00usb_work_rxdone [ 4845.229493] 0002 8801b13c39f8 [ 4845.229496] 81b2e7d9 0007 8801b13c3a20 8801b13c3a10 [ 4845.229498] 81bcb87d 85016890 8801b13c3a60 81bcc279 [ 4845.229501] Call Trace: [ 4845.229506] [] dump_stack+0x45/0x6c [ 4845.229510] [] ubsan_epilogue+0xd/0x40 [ 4845.229513] [] __ubsan_handle_load_invalid_value+0x69/0x80 [ 4845.229517] [] ? xhci_setup_addressable_virt_dev+0xeb2/0x13b0 [ 4845.229520] [] ? pick_next_entity+0xcb/0x280 [ 4845.229524] [] ieee80211_sta_reorder_release.isra.15+0x7e3/0xad0 [ 4845.229527] [] ieee80211_prepare_and_rx_handle+0x11a7/0x2ab0 [ 4845.229530] [] ? xhci_urb_enqueue+0x394/0x1140 [ 4845.229533] [] ? usb_hcd_map_urb_for_dma+0x94f/0x1140 [ 4845.229537] [] ? skb_release_data+0x117/0x2f0 [ 4845.229539] [] __ieee80211_rx_handle_packet+0x26a/0x9a0 [ 4845.229542] [] ? __kmalloc_reserve.isra.11+0x2c/0x80 [ 4845.229545] [] ieee80211_rx_napi+0x651/0x12b0 [ 4845.229549] [] rt2x00lib_rxdone+0x402/0x1120 [ 4845.229552] [] ? dequeue_task_fair+0x97f/0x41d0 [ 4845.229554] [] rt2x00usb_work_rxdone+0xac/0x1f0 [ 4845.229558] [] ? __schedule+0x5cd/0x1770 [ 4845.229561] [] process_one_work+0x266/0xc00 [ 4845.229563] [] worker_thread+0x96/0xd40 [ 4845.229565] [] ? process_scheduled_works+0x70/0x70 [ 4845.229568] [] kthread+0x108/0x180 [ 4845.229571] [] ? kthread_create_on_node+0x210/0x210 [ 4845.229573] [] ret_from_fork+0x3f/0x70 [ 4845.229576] [] ? kthread_create_on_node+0x210/0x210 [ 4845.229577]
UBSAN: Undefined behaviour in net/mac80211/rc80211_minstrel.h:47:34
4.5.0-rc1 with the new UBSAN checker: [ 2624.978741] [ 2624.978749] UBSAN: Undefined behaviour in net/mac80211/rc80211_minstrel.h:47:34 [ 2624.978752] signed integer overflow: [ 2624.978754] -32768 * 100 cannot be represented in type 'int' [ 2624.978759] CPU: 1 PID: 5362 Comm: kworker/u16:2 Not tainted 4.5.0-rc1 #252 [ 2624.978762] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 2624.978770] Workqueue: phy1 rt2800usb_work_txdone [ 2624.978773] 000f4240 8801e5d23990 [ 2624.978778] 81b2e7d9 0007 8801e5d239c0 8801e5d239a8 [ 2624.978782] 81bcb87d 850b928c 8801e5d23a30 81bcbc4d [ 2624.978785] Call Trace: [ 2624.978792] [] dump_stack+0x45/0x6c [ 2624.978797] [] ubsan_epilogue+0xd/0x40 [ 2624.978802] [] handle_overflow+0xbd/0xe0 [ 2624.978807] [] ? number+0x35f/0x600 [ 2624.978812] [] __ubsan_handle_mul_overflow+0xe/0x10 [ 2624.978819] [] minstrel_calc_rate_stats+0x58c/0x5f0 [ 2624.978823] [] minstrel_ht_update_stats.isra.4+0x148/0xdd0 [ 2624.978827] [] minstrel_ht_tx_status+0x683/0x1120 [ 2624.978832] [] ieee80211_tx_status+0xff8/0x2d10 [ 2624.978836] [] rt2x00lib_txdone+0x42c/0x11a0 [ 2624.978841] [] ? update_curr+0x15a/0x400 [ 2624.978846] [] rt2800_txdone_entry+0xc5/0x1b0 [ 2624.978850] [] rt2800usb_work_txdone+0x6ff/0xbb0 [ 2624.978854] [] process_one_work+0x266/0xc00 [ 2624.978857] [] worker_thread+0x96/0xd40 [ 2624.978861] [] ? process_scheduled_works+0x70/0x70 [ 2624.978865] [] kthread+0x108/0x180 [ 2624.978869] [] ? kthread_create_on_node+0x210/0x210 [ 2624.978874] [] ret_from_fork+0x3f/0x70 [ 2624.978878] [] ? kthread_create_on_node+0x210/0x210 [ 2624.978880]
UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
Booting 4.5.0-rc1 with new UBSAN checker enabled: [3.859690] [3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10 [3.859696] signed integer overflow: [3.859697] 9223372036854775807 + 1 cannot be represented in type 'long long int' [3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252 [3.859702] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [3.859706] 0001 88024d4bfcf8 [3.859709] 81b2e7d9 0001 88024d4bfd28 88024d4bfd10 [3.859711] 81bcb87d 83aceb48 88024d4bfd98 81bcbc4d [3.859712] Call Trace: [3.859719] [] dump_stack+0x45/0x6c [3.859723] [] ubsan_epilogue+0xd/0x40 [3.859725] [] handle_overflow+0xbd/0xe0 [3.859728] [] __ubsan_handle_add_overflow+0xe/0x10 [3.859732] [] btrfs_real_readdir+0x881/0xc10 [3.859737] [] iterate_dir+0xdd/0x2d0 [3.859740] [] SyS_getdents+0x9b/0x110 [3.859743] [] ? iterate_dir+0x2d0/0x2d0 [3.859747] [] entry_SYSCALL_64_fastpath+0x12/0x6a [3.859749]
UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25
Booting 4.5.0-rc1 with new UBSAN checker enabled: [4.556968] [4.556972] UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25 [4.556975] shift exponent -1 is negative [4.556979] CPU: 2 PID: 3624 Comm: usb Not tainted 4.5.0-rc1 #252 [4.556981] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [4.556984] 845c6528 8802493b3c68 [4.556988] 81b2e7d9 0007 8802493b3c98 8802493b3c80 [4.556992] 81bcb87d 8802493b3d10 81bcc1c1 [4.556996] Call Trace: [4.557004] [] dump_stack+0x45/0x6c [4.557010] [] ubsan_epilogue+0xd/0x40 [4.557015] [] __ubsan_handle_shift_out_of_bounds+0xf1/0x140 [4.557020] [] ? __kmalloc+0x209/0x5f0 [4.557025] [] ? usb_alloc_urb+0x15/0x40 [4.557030] [] ? proc_do_submiturb+0x9af/0x2c30 [4.557034] [] proc_do_submiturb+0x2994/0x2c30 [4.557039] [] ? blocking_notifier_call_chain+0x11/0x20 [4.557044] [] usbdev_do_ioctl+0x90b/0x2170 [4.557049] [] ? hrtimer_start_range_ns+0x274/0xc60 [4.557053] [] usbdev_ioctl+0x9/0x10 [4.557059] [] do_vfs_ioctl+0xd2/0xcb0 [4.557063] [] SyS_ioctl+0x74/0x80 [4.557067] [] entry_SYSCALL_64_fastpath+0x12/0x6a [4.557070]
UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:2
Booting 4.5.0-rc1 with new UBSAN checker enabled: [3.805449] [3.805453] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:2 [3.805455] signed integer overflow: [3.805456] -1720106381 + -1531247276 cannot be represented in type 'int' [3.805460] CPU: 3 PID: 3235 Comm: cups-browsed Not tainted 4.5.0-rc1 #252 [3.805461] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [3.805465] a4bb0554 88025f2c37c8 [3.805468] 81b2e7d9 0001 88025f2c37f8 88025f2c37e0 [3.805470] 81bcb87d 84b16a74 88025f2c3868 81bcbc4d [3.805471] Call Trace: [3.805478][] dump_stack+0x45/0x6c [3.805483] [] ubsan_epilogue+0xd/0x40 [3.805485] [] handle_overflow+0xbd/0xe0 [3.805490] [] ? csum_partial_copy_nocheck+0xf/0x20 [3.805493] [] ? get_random_bytes+0x4f/0x100 [3.805496] [] __ubsan_handle_add_overflow+0xe/0x10 [3.805500] [] ip_idents_reserve+0x9a/0xd0 [3.805503] [] __ip_select_ident+0xc9/0x160 [3.805506] [] __ip_make_skb+0x83a/0x1e70 [3.805509] [] ? ip_setup_cork+0x4d0/0x4d0 [3.805511] [] ip_push_pending_frames+0x1d/0x50 [3.805514] [] ip_send_unicast_reply+0x3df/0xa80 [3.805517] [] ? ac6_proc_exit+0x40/0x40 [3.805521] [] tcp_v4_send_reset+0x4e4/0xe40 [3.805524] [] tcp_v4_rcv+0x979/0x1db0 [3.805527] [] ip_local_deliver_finish+0x139/0x600 [3.805529] [] ip_local_deliver+0x10a/0x1a0 [3.805532] [] ? ip_rcv_finish+0xb10/0xb10 [3.805534] [] ip_rcv_finish+0x23d/0xb10 [3.805536] [] ip_rcv+0x45d/0xa30 [3.805540] [] ? wake_up_q+0x12/0xd0 [3.805543] [] ? inet_del_offload+0x40/0x40 [3.805545] [] ? ip_local_deliver+0x1a0/0x1a0 [3.805549] [] __netif_receive_skb_core+0xd2c/0x1c30 [3.805552] [] __netif_receive_skb+0x29/0x190 [3.805556] [] ? rcu_gp_kthread_wake+0x6e/0xb0 [3.805559] [] process_backlog+0x116/0x5d0 [3.805562] [] net_rx_action+0x395/0x9c0 [3.805566] [] __do_softirq+0xbc/0x590 [3.805569] [] do_softirq_own_stack+0x1c/0x30 [3.805572][] do_softirq.part.17+0x1d/0x20 [3.805575] [] __local_bh_enable_ip+0x94/0xc0 [3.805577] [] ip_finish_output2+0x219/0x880 [3.805580] [] ip_finish_output+0x280/0x690 [3.805582] [] ip_output+0x100/0x210 [3.805585] [] ? ip_fragment.constprop.24+0x190/0x190 [3.805588] [] ip_local_out+0x3b/0x80 [3.805590] [] ip_queue_xmit+0x274/0x1000 [3.805593] [] ? __skb_clone+0x5b/0x550 [3.805595] [] tcp_transmit_skb+0x7b6/0x2260 [3.805598] [] tcp_connect+0xf2d/0x1fe0 [3.805601] [] ? secure_tcp_sequence_number+0x7c/0xd0 [3.805603] [] tcp_v4_connect+0x4a7/0x15f0 [3.805607] [] __inet_stream_connect+0xf1/0x880 [3.805609] [] ? _raw_spin_unlock_bh+0x26/0x40 [3.805612] [] ? __inet_stream_connect+0x880/0x880 [3.805615] [] inet_stream_connect+0x38/0x80 [3.805618] [] SYSC_connect+0xe7/0x170 [3.805622] [] ? selinux_file_fcntl+0x8a/0x100 [3.805625] [] ? security_file_fcntl+0x48/0x80 [3.805629] [] ? SyS_fcntl+0x542/0x8b0 [3.805632] [] SyS_connect+0x9/0x10 [3.805634] [] entry_SYSCALL_64_fastpath+0x12/0x6a [3.805636]
UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25
Booting 4.5.0-rc1 with new UBSAN checker enabled: [4.556968] [4.556972] UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25 [4.556975] shift exponent -1 is negative [4.556979] CPU: 2 PID: 3624 Comm: usb Not tainted 4.5.0-rc1 #252 [4.556981] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [4.556984] 845c6528 8802493b3c68 [4.556988] 81b2e7d9 0007 8802493b3c98 8802493b3c80 [4.556992] 81bcb87d 8802493b3d10 81bcc1c1 [4.556996] Call Trace: [4.557004] [] dump_stack+0x45/0x6c [4.557010] [] ubsan_epilogue+0xd/0x40 [4.557015] [] __ubsan_handle_shift_out_of_bounds+0xf1/0x140 [4.557020] [] ? __kmalloc+0x209/0x5f0 [4.557025] [] ? usb_alloc_urb+0x15/0x40 [4.557030] [] ? proc_do_submiturb+0x9af/0x2c30 [4.557034] [] proc_do_submiturb+0x2994/0x2c30 [4.557039] [] ? blocking_notifier_call_chain+0x11/0x20 [4.557044] [] usbdev_do_ioctl+0x90b/0x2170 [4.557049] [] ? hrtimer_start_range_ns+0x274/0xc60 [4.557053] [] usbdev_ioctl+0x9/0x10 [4.557059] [] do_vfs_ioctl+0xd2/0xcb0 [4.557063] [] SyS_ioctl+0x74/0x80 [4.557067] [] entry_SYSCALL_64_fastpath+0x12/0x6a [4.557070]
UBSAN: Undefined behaviour in net/mac80211/rc80211_minstrel.h:47:34
4.5.0-rc1 with the new UBSAN checker: [ 2624.978741] [ 2624.978749] UBSAN: Undefined behaviour in net/mac80211/rc80211_minstrel.h:47:34 [ 2624.978752] signed integer overflow: [ 2624.978754] -32768 * 100 cannot be represented in type 'int' [ 2624.978759] CPU: 1 PID: 5362 Comm: kworker/u16:2 Not tainted 4.5.0-rc1 #252 [ 2624.978762] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 2624.978770] Workqueue: phy1 rt2800usb_work_txdone [ 2624.978773] 000f4240 8801e5d23990 [ 2624.978778] 81b2e7d9 0007 8801e5d239c0 8801e5d239a8 [ 2624.978782] 81bcb87d 850b928c 8801e5d23a30 81bcbc4d [ 2624.978785] Call Trace: [ 2624.978792] [] dump_stack+0x45/0x6c [ 2624.978797] [] ubsan_epilogue+0xd/0x40 [ 2624.978802] [] handle_overflow+0xbd/0xe0 [ 2624.978807] [] ? number+0x35f/0x600 [ 2624.978812] [] __ubsan_handle_mul_overflow+0xe/0x10 [ 2624.978819] [] minstrel_calc_rate_stats+0x58c/0x5f0 [ 2624.978823] [] minstrel_ht_update_stats.isra.4+0x148/0xdd0 [ 2624.978827] [] minstrel_ht_tx_status+0x683/0x1120 [ 2624.978832] [] ieee80211_tx_status+0xff8/0x2d10 [ 2624.978836] [] rt2x00lib_txdone+0x42c/0x11a0 [ 2624.978841] [] ? update_curr+0x15a/0x400 [ 2624.978846] [] rt2800_txdone_entry+0xc5/0x1b0 [ 2624.978850] [] rt2800usb_work_txdone+0x6ff/0xbb0 [ 2624.978854] [] process_one_work+0x266/0xc00 [ 2624.978857] [] worker_thread+0x96/0xd40 [ 2624.978861] [] ? process_scheduled_works+0x70/0x70 [ 2624.978865] [] kthread+0x108/0x180 [ 2624.978869] [] ? kthread_create_on_node+0x210/0x210 [ 2624.978874] [] ret_from_fork+0x3f/0x70 [ 2624.978878] [] ? kthread_create_on_node+0x210/0x210 [ 2624.978880]
UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:2
Booting 4.5.0-rc1 with new UBSAN checker enabled: [3.805449] [3.805453] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:2 [3.805455] signed integer overflow: [3.805456] -1720106381 + -1531247276 cannot be represented in type 'int' [3.805460] CPU: 3 PID: 3235 Comm: cups-browsed Not tainted 4.5.0-rc1 #252 [3.805461] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [3.805465] a4bb0554 88025f2c37c8 [3.805468] 81b2e7d9 0001 88025f2c37f8 88025f2c37e0 [3.805470] 81bcb87d 84b16a74 88025f2c3868 81bcbc4d [3.805471] Call Trace: [3.805478][] dump_stack+0x45/0x6c [3.805483] [] ubsan_epilogue+0xd/0x40 [3.805485] [] handle_overflow+0xbd/0xe0 [3.805490] [] ? csum_partial_copy_nocheck+0xf/0x20 [3.805493] [] ? get_random_bytes+0x4f/0x100 [3.805496] [] __ubsan_handle_add_overflow+0xe/0x10 [3.805500] [] ip_idents_reserve+0x9a/0xd0 [3.805503] [] __ip_select_ident+0xc9/0x160 [3.805506] [] __ip_make_skb+0x83a/0x1e70 [3.805509] [] ? ip_setup_cork+0x4d0/0x4d0 [3.805511] [] ip_push_pending_frames+0x1d/0x50 [3.805514] [] ip_send_unicast_reply+0x3df/0xa80 [3.805517] [] ? ac6_proc_exit+0x40/0x40 [3.805521] [] tcp_v4_send_reset+0x4e4/0xe40 [3.805524] [] tcp_v4_rcv+0x979/0x1db0 [3.805527] [] ip_local_deliver_finish+0x139/0x600 [3.805529] [] ip_local_deliver+0x10a/0x1a0 [3.805532] [] ? ip_rcv_finish+0xb10/0xb10 [3.805534] [] ip_rcv_finish+0x23d/0xb10 [3.805536] [] ip_rcv+0x45d/0xa30 [3.805540] [] ? wake_up_q+0x12/0xd0 [3.805543] [] ? inet_del_offload+0x40/0x40 [3.805545] [] ? ip_local_deliver+0x1a0/0x1a0 [3.805549] [] __netif_receive_skb_core+0xd2c/0x1c30 [3.805552] [] __netif_receive_skb+0x29/0x190 [3.805556] [] ? rcu_gp_kthread_wake+0x6e/0xb0 [3.805559] [] process_backlog+0x116/0x5d0 [3.805562] [] net_rx_action+0x395/0x9c0 [3.805566] [] __do_softirq+0xbc/0x590 [3.805569] [] do_softirq_own_stack+0x1c/0x30 [3.805572][] do_softirq.part.17+0x1d/0x20 [3.805575] [] __local_bh_enable_ip+0x94/0xc0 [3.805577] [] ip_finish_output2+0x219/0x880 [3.805580] [] ip_finish_output+0x280/0x690 [3.805582] [] ip_output+0x100/0x210 [3.805585] [] ? ip_fragment.constprop.24+0x190/0x190 [3.805588] [] ip_local_out+0x3b/0x80 [3.805590] [] ip_queue_xmit+0x274/0x1000 [3.805593] [] ? __skb_clone+0x5b/0x550 [3.805595] [] tcp_transmit_skb+0x7b6/0x2260 [3.805598] [] tcp_connect+0xf2d/0x1fe0 [3.805601] [] ? secure_tcp_sequence_number+0x7c/0xd0 [3.805603] [] tcp_v4_connect+0x4a7/0x15f0 [3.805607] [] __inet_stream_connect+0xf1/0x880 [3.805609] [] ? _raw_spin_unlock_bh+0x26/0x40 [3.805612] [] ? __inet_stream_connect+0x880/0x880 [3.805615] [] inet_stream_connect+0x38/0x80 [3.805618] [] SYSC_connect+0xe7/0x170 [3.805622] [] ? selinux_file_fcntl+0x8a/0x100 [3.805625] [] ? security_file_fcntl+0x48/0x80 [3.805629] [] ? SyS_fcntl+0x542/0x8b0 [3.805632] [] SyS_connect+0x9/0x10 [3.805634] [] entry_SYSCALL_64_fastpath+0x12/0x6a [3.805636]
UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11
Mounting ext4 fs on 4.5.0-rc1 [ 1176.592557] [ 1176.592565] UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11 [ 1176.592567] shift exponent -1 is negative [ 1176.592571] CPU: 3 PID: 4976 Comm: QThread Not tainted 4.5.0-rc1 #252 [ 1176.592573] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 1176.592575] 8379dede 88021f4bf8b8 [ 1176.592578] 81b2e7d9 0007 88021f4bf8e8 88021f4bf8d0 [ 1176.592581] 81bcb87d 88021f4bf968 81bcc1c1 [ 1176.592583] Call Trace: [ 1176.592590] [] dump_stack+0x45/0x6c [ 1176.592595] [] ubsan_epilogue+0xd/0x40 [ 1176.592597] [] __ubsan_handle_shift_out_of_bounds+0xf1/0x140 [ 1176.592601] [] ? ext4_mb_init_cache+0x677/0x1570 [ 1176.592604] [] ? lru_cache_add+0x9/0x10 [ 1176.592607] [] mb_find_order_for_block+0x132/0x1e0 [ 1176.592609] [] mb_find_extent+0xde/0x6d0 [ 1176.592611] [] ext4_mb_complex_scan_group+0x14b/0x8a0 [ 1176.592614] [] ext4_mb_regular_allocator+0x3a7/0xcc0 [ 1176.592616] [] ext4_mb_new_blocks+0x4c5/0x1270 [ 1176.592620] [] ? ext4_ext_put_gap_in_cache+0x116/0x230 [ 1176.592622] [] ext4_ext_map_blocks+0x81c/0x2280 [ 1176.592626] [] ext4_map_blocks+0x1b6/0xe00 [ 1176.592628] [] ext4_getblk+0x49/0x290 [ 1176.592630] [] ext4_bread+0xe/0xb0 [ 1176.592633] [] ext4_append+0xad/0x340 [ 1176.592635] [] ext4_mkdir+0x344/0x970 [ 1176.592639] [] vfs_mkdir+0x103/0x310 [ 1176.592642] [] SyS_mkdir+0x93/0x130 [ 1176.592645] [] entry_SYSCALL_64_fastpath+0x12/0x6a [ 1176.592647]
UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
Booting 4.5.0-rc1 with new UBSAN checker enabled: [3.859690] [3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10 [3.859696] signed integer overflow: [3.859697] 9223372036854775807 + 1 cannot be represented in type 'long long int' [3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252 [3.859702] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [3.859706] 0001 88024d4bfcf8 [3.859709] 81b2e7d9 0001 88024d4bfd28 88024d4bfd10 [3.859711] 81bcb87d 83aceb48 88024d4bfd98 81bcbc4d [3.859712] Call Trace: [3.859719] [] dump_stack+0x45/0x6c [3.859723] [] ubsan_epilogue+0xd/0x40 [3.859725] [] handle_overflow+0xbd/0xe0 [3.859728] [] __ubsan_handle_add_overflow+0xe/0x10 [3.859732] [] btrfs_real_readdir+0x881/0xc10 [3.859737] [] iterate_dir+0xdd/0x2d0 [3.859740] [] SyS_getdents+0x9b/0x110 [3.859743] [] ? iterate_dir+0x2d0/0x2d0 [3.859747] [] entry_SYSCALL_64_fastpath+0x12/0x6a [3.859749]
Re: UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
On Tue, Jan 26, 2016 at 12:03:39PM +0100, David Sterba wrote: > On Tue, Jan 26, 2016 at 10:13:33AM +0000, Chris Bainbridge wrote: > > Booting 4.5.0-rc1 with new UBSAN checker enabled: > > > > [3.859690] > > > > [3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10 > > [3.859696] signed integer overflow: > > [3.859697] 9223372036854775807 + 1 cannot be represented in type 'long > > long int' > > [3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252 > > [3.859702] Hardware name: Apple Inc. > > MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 > > 09/13/2015 > > [3.859706] 0001 > > 88024d4bfcf8 > > [3.859709] 81b2e7d9 0001 88024d4bfd28 > > 88024d4bfd10 > > [3.859711] 81bcb87d 83aceb48 88024d4bfd98 > > 81bcbc4d > > [3.859712] Call Trace: > > [3.859719] [] dump_stack+0x45/0x6c > > [3.859723] [] ubsan_epilogue+0xd/0x40 > > [3.859725] [] handle_overflow+0xbd/0xe0 > > [3.859728] [] __ubsan_handle_add_overflow+0xe/0x10 > > [3.859732] [] btrfs_real_readdir+0x881/0xc10 > > [3.859737] [] iterate_dir+0xdd/0x2d0 > > [3.859740] [] SyS_getdents+0x9b/0x110 > > [3.859743] [] ? iterate_dir+0x2d0/0x2d0 > > [3.859747] [] entry_SYSCALL_64_fastpath+0x12/0x6a > > [3.859749] > > > > That seems to be the same overflow as reported in the past, caught by > the PaX SIZE_OVERFLOW plugin. There's a patch but not merged yet. > > https://patchwork.kernel.org/patch/7611351/ Yes, the error does not appear with that patch applied.
UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18
4.5.0-rc1 another UBSAN error: [ 4845.229441] [ 4845.229454] UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18 [ 4845.229458] load of value 2 is not a valid value for type '_Bool' [ 4845.229464] CPU: 1 PID: 6266 Comm: kworker/u16:8 Not tainted 4.5.0-rc1 #252 [ 4845.229468] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015 [ 4845.229491] Workqueue: phy2 rt2x00usb_work_rxdone [ 4845.229493] 0002 8801b13c39f8 [ 4845.229496] 81b2e7d9 0007 8801b13c3a20 8801b13c3a10 [ 4845.229498] 81bcb87d 85016890 8801b13c3a60 81bcc279 [ 4845.229501] Call Trace: [ 4845.229506] [] dump_stack+0x45/0x6c [ 4845.229510] [] ubsan_epilogue+0xd/0x40 [ 4845.229513] [] __ubsan_handle_load_invalid_value+0x69/0x80 [ 4845.229517] [] ? xhci_setup_addressable_virt_dev+0xeb2/0x13b0 [ 4845.229520] [] ? pick_next_entity+0xcb/0x280 [ 4845.229524] [] ieee80211_sta_reorder_release.isra.15+0x7e3/0xad0 [ 4845.229527] [] ieee80211_prepare_and_rx_handle+0x11a7/0x2ab0 [ 4845.229530] [] ? xhci_urb_enqueue+0x394/0x1140 [ 4845.229533] [] ? usb_hcd_map_urb_for_dma+0x94f/0x1140 [ 4845.229537] [] ? skb_release_data+0x117/0x2f0 [ 4845.229539] [] __ieee80211_rx_handle_packet+0x26a/0x9a0 [ 4845.229542] [] ? __kmalloc_reserve.isra.11+0x2c/0x80 [ 4845.229545] [] ieee80211_rx_napi+0x651/0x12b0 [ 4845.229549] [] rt2x00lib_rxdone+0x402/0x1120 [ 4845.229552] [] ? dequeue_task_fair+0x97f/0x41d0 [ 4845.229554] [] rt2x00usb_work_rxdone+0xac/0x1f0 [ 4845.229558] [] ? __schedule+0x5cd/0x1770 [ 4845.229561] [] process_one_work+0x266/0xc00 [ 4845.229563] [] worker_thread+0x96/0xd40 [ 4845.229565] [] ? process_scheduled_works+0x70/0x70 [ 4845.229568] [] kthread+0x108/0x180 [ 4845.229571] [] ? kthread_create_on_node+0x210/0x210 [ 4845.229573] [] ret_from_fork+0x3f/0x70 [ 4845.229576] [] ? kthread_create_on_node+0x210/0x210 [ 4845.229577]
Re: 4.4-rc, intel dri i915, regular hangs and corruptions
On 17 December 2015 at 18:34, Norbert Preining wrote: > * font corruption > sometime sets of glyphs, or practically all glyphs disappear > related probably to bug > https://bugs.freedesktop.org/show_bug.cgi?id=55500 > I have sent some info there already, without response > > Currently my font displays some kind of strange symbols instead of > an m ... looks a bit like a Kanji. I remember a similar bug around 2.99.917 but that tag is over a year old now and there have been many bug fixes since. You'll need to verify you can still reproduce your issue with the latest from git://anongit.freedesktop.org/xorg/driver/xf86-video-intel and if so do a bisect from the previous working kernel or xf86-video-intel to identify the problematic commit. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 4.4-rc, intel dri i915, regular hangs and corruptions
On 17 December 2015 at 18:34, Norbert Preiningwrote: > * font corruption > sometime sets of glyphs, or practically all glyphs disappear > related probably to bug > https://bugs.freedesktop.org/show_bug.cgi?id=55500 > I have sent some info there already, without response > > Currently my font displays some kind of strange symbols instead of > an m ... looks a bit like a Kanji. I remember a similar bug around 2.99.917 but that tag is over a year old now and there have been many bug fixes since. You'll need to verify you can still reproduce your issue with the latest from git://anongit.freedesktop.org/xorg/driver/xf86-video-intel and if so do a bisect from the previous working kernel or xf86-video-intel to identify the problematic commit. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"
This reverts commit 3349fb64b2927407017d970dd5c4daf3c5ad69f8. After commit 8848d20cf09c ("ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code") the delay is no longer necessary. Signed-off-by: Chris Bainbridge --- drivers/acpi/sbshc.c | 22 -- 1 file changed, 22 deletions(-) diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c index e2900518cf7e..2fa8304171e0 100644 --- a/drivers/acpi/sbshc.c +++ b/drivers/acpi/sbshc.c @@ -14,7 +14,6 @@ #include #include #include -#include #include "sbshc.h" #define PREFIX "ACPI: " @@ -89,8 +88,6 @@ enum acpi_smb_offset { ACPI_SMB_ALARM_DATA = 0x26, /* 2 bytes alarm data */ }; -static bool macbook; - static inline int smb_hc_read(struct acpi_smb_hc *hc, u8 address, u8 *data) { return ec_read(hc->offset + address, data); @@ -121,8 +118,6 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol, mutex_lock(>lock); hc->done = false; - if (macbook) - udelay(5); if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, )) goto end; if (temp) { @@ -250,29 +245,12 @@ extern int acpi_ec_add_query_handler(struct acpi_ec *ec, u8 query_bit, acpi_handle handle, acpi_ec_query_func func, void *data); -static int macbook_dmi_match(const struct dmi_system_id *d) -{ - pr_debug("Detected MacBook, enabling workaround\n"); - macbook = true; - return 0; -} - -static struct dmi_system_id acpi_smbus_dmi_table[] = { - { macbook_dmi_match, "Apple MacBook", { - DMI_MATCH(DMI_BOARD_VENDOR, "Apple"), - DMI_MATCH(DMI_PRODUCT_NAME, "MacBook") }, - }, - { }, -}; - static int acpi_smbus_hc_add(struct acpi_device *device) { int status; unsigned long long val; struct acpi_smb_hc *hc; - dmi_check_system(acpi_smbus_dmi_table); - if (!device) return -EINVAL; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code
In the SBS initialisation, a reentrant call to wait_event_timeout() causes an intermittent boot stall of several minutes usually following the "Switching to clocksource tsc" message. Another symptom of this bug is high CPU usage from programs (Firefox, upowerd) querying the battery state. This is caused by: 1. drivers/acpi/sbshc.c wait_transaction_complete() calls wait_event_timeout(): if (wait_event_timeout(hc->wait, smb_check_done(hc), msecs_to_jiffies(timeout))) 2. ___wait_event sets task state to uninterruptible 3. ___wait_event calls the "condition" smb_check_done() 4. smb_check_done (sbshc.c) calls through to ec_read() in drivers/acpi/ec.c 5. ec_guard() is reached which calls wait_event_timeout() if (wait_event_timeout(ec->wait, ec_transaction_completed(ec), guard)) ie. wait_event_timeout() is being called again inside evaluation of the previous wait_event_timeout() condition 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in ec_guard() 6. The task is now in state running even though the wait "condition" is still being evaluated 7. The "condition" check returns false so ___wait_event calls schedule_timeout() 8. Since the task state is running, the scheduler immediately schedules it again 9. This loop usually repeats for around 250 seconds even though the original wait_event_timeout was only 1000ms. The timeout is incorrect because each call to schedule_timeout() usually returns immediately, taking less than 1ms, so the jiffies timeout counter is not decremented. The task is now stuck in a running state, and so is highly likely to be immediately rescheduled, which takes less than a jiffy. The loop will never exit if all schedule_timeout() calls take less than a jiffy. Fix this by replacing SMBus reads in the wait_event_timeout condition with checks of a boolean value that is updated by the EC query handler. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107191 Link: https://lkml.org/lkml/2015/11/6/776 Signed-off-by: Chris Bainbridge --- drivers/acpi/sbshc.c | 26 +++--- 1 file changed, 7 insertions(+), 19 deletions(-) diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c index bf034f8..e290051 100644 --- a/drivers/acpi/sbshc.c +++ b/drivers/acpi/sbshc.c @@ -30,6 +30,7 @@ struct acpi_smb_hc { u8 query_bit; smbus_alarm_callback callback; void *context; + bool done; }; static int acpi_smbus_hc_add(struct acpi_device *device); @@ -100,27 +101,11 @@ static inline int smb_hc_write(struct acpi_smb_hc *hc, u8 address, u8 data) return ec_write(hc->offset + address, data); } -static inline int smb_check_done(struct acpi_smb_hc *hc) -{ - union acpi_smb_status status = {.raw = 0}; - smb_hc_read(hc, ACPI_SMB_STATUS, ); - return status.fields.done && (status.fields.status == SMBUS_OK); -} - static int wait_transaction_complete(struct acpi_smb_hc *hc, int timeout) { - if (wait_event_timeout(hc->wait, smb_check_done(hc), - msecs_to_jiffies(timeout))) + if (wait_event_timeout(hc->wait, hc->done, msecs_to_jiffies(timeout))) return 0; - /* -* After the timeout happens, OS will try to check the status of SMbus. -* If the status is what OS expected, it will be regarded as the bogus -* timeout. -*/ - if (smb_check_done(hc)) - return 0; - else - return -ETIME; + return -ETIME; } static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol, @@ -135,6 +120,7 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol, } mutex_lock(>lock); + hc->done = false; if (macbook) udelay(5); if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, )) @@ -235,8 +221,10 @@ static int smbus_alarm(void *context) if (smb_hc_read(hc, ACPI_SMB_STATUS, )) return 0; /* Check if it is only a completion notify */ - if (status.fields.done) + if (status.fields.done && status.fields.status == SMBUS_OK) { + hc->done = true; wake_up(>wait); + } if (!status.fields.alarm) return 0; mutex_lock(>lock); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"
This reverts commit 3349fb64b2927407017d970dd5c4daf3c5ad69f8. After commit 8848d20cf09c ("ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code") the delay is no longer necessary. Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- drivers/acpi/sbshc.c | 22 -- 1 file changed, 22 deletions(-) diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c index e2900518cf7e..2fa8304171e0 100644 --- a/drivers/acpi/sbshc.c +++ b/drivers/acpi/sbshc.c @@ -14,7 +14,6 @@ #include #include #include -#include #include "sbshc.h" #define PREFIX "ACPI: " @@ -89,8 +88,6 @@ enum acpi_smb_offset { ACPI_SMB_ALARM_DATA = 0x26, /* 2 bytes alarm data */ }; -static bool macbook; - static inline int smb_hc_read(struct acpi_smb_hc *hc, u8 address, u8 *data) { return ec_read(hc->offset + address, data); @@ -121,8 +118,6 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol, mutex_lock(>lock); hc->done = false; - if (macbook) - udelay(5); if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, )) goto end; if (temp) { @@ -250,29 +245,12 @@ extern int acpi_ec_add_query_handler(struct acpi_ec *ec, u8 query_bit, acpi_handle handle, acpi_ec_query_func func, void *data); -static int macbook_dmi_match(const struct dmi_system_id *d) -{ - pr_debug("Detected MacBook, enabling workaround\n"); - macbook = true; - return 0; -} - -static struct dmi_system_id acpi_smbus_dmi_table[] = { - { macbook_dmi_match, "Apple MacBook", { - DMI_MATCH(DMI_BOARD_VENDOR, "Apple"), - DMI_MATCH(DMI_PRODUCT_NAME, "MacBook") }, - }, - { }, -}; - static int acpi_smbus_hc_add(struct acpi_device *device) { int status; unsigned long long val; struct acpi_smb_hc *hc; - dmi_check_system(acpi_smbus_dmi_table); - if (!device) return -EINVAL; -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code
In the SBS initialisation, a reentrant call to wait_event_timeout() causes an intermittent boot stall of several minutes usually following the "Switching to clocksource tsc" message. Another symptom of this bug is high CPU usage from programs (Firefox, upowerd) querying the battery state. This is caused by: 1. drivers/acpi/sbshc.c wait_transaction_complete() calls wait_event_timeout(): if (wait_event_timeout(hc->wait, smb_check_done(hc), msecs_to_jiffies(timeout))) 2. ___wait_event sets task state to uninterruptible 3. ___wait_event calls the "condition" smb_check_done() 4. smb_check_done (sbshc.c) calls through to ec_read() in drivers/acpi/ec.c 5. ec_guard() is reached which calls wait_event_timeout() if (wait_event_timeout(ec->wait, ec_transaction_completed(ec), guard)) ie. wait_event_timeout() is being called again inside evaluation of the previous wait_event_timeout() condition 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in ec_guard() 6. The task is now in state running even though the wait "condition" is still being evaluated 7. The "condition" check returns false so ___wait_event calls schedule_timeout() 8. Since the task state is running, the scheduler immediately schedules it again 9. This loop usually repeats for around 250 seconds even though the original wait_event_timeout was only 1000ms. The timeout is incorrect because each call to schedule_timeout() usually returns immediately, taking less than 1ms, so the jiffies timeout counter is not decremented. The task is now stuck in a running state, and so is highly likely to be immediately rescheduled, which takes less than a jiffy. The loop will never exit if all schedule_timeout() calls take less than a jiffy. Fix this by replacing SMBus reads in the wait_event_timeout condition with checks of a boolean value that is updated by the EC query handler. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107191 Link: https://lkml.org/lkml/2015/11/6/776 Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- drivers/acpi/sbshc.c | 26 +++--- 1 file changed, 7 insertions(+), 19 deletions(-) diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c index bf034f8..e290051 100644 --- a/drivers/acpi/sbshc.c +++ b/drivers/acpi/sbshc.c @@ -30,6 +30,7 @@ struct acpi_smb_hc { u8 query_bit; smbus_alarm_callback callback; void *context; + bool done; }; static int acpi_smbus_hc_add(struct acpi_device *device); @@ -100,27 +101,11 @@ static inline int smb_hc_write(struct acpi_smb_hc *hc, u8 address, u8 data) return ec_write(hc->offset + address, data); } -static inline int smb_check_done(struct acpi_smb_hc *hc) -{ - union acpi_smb_status status = {.raw = 0}; - smb_hc_read(hc, ACPI_SMB_STATUS, ); - return status.fields.done && (status.fields.status == SMBUS_OK); -} - static int wait_transaction_complete(struct acpi_smb_hc *hc, int timeout) { - if (wait_event_timeout(hc->wait, smb_check_done(hc), - msecs_to_jiffies(timeout))) + if (wait_event_timeout(hc->wait, hc->done, msecs_to_jiffies(timeout))) return 0; - /* -* After the timeout happens, OS will try to check the status of SMbus. -* If the status is what OS expected, it will be regarded as the bogus -* timeout. -*/ - if (smb_check_done(hc)) - return 0; - else - return -ETIME; + return -ETIME; } static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol, @@ -135,6 +120,7 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol, } mutex_lock(>lock); + hc->done = false; if (macbook) udelay(5); if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, )) @@ -235,8 +221,10 @@ static int smbus_alarm(void *context) if (smb_hc_read(hc, ACPI_SMB_STATUS, )) return 0; /* Check if it is only a completion notify */ - if (status.fields.done) + if (status.fields.done && status.fields.status == SMBUS_OK) { + hc->done = true; wake_up(>wait); + } if (!status.fields.alarm) return 0; mutex_lock(>lock); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Preserve task state in reentrant calls to ___wait_event
On Fri, Nov 06, 2015 at 08:44:08PM +, Chris Bainbridge wrote: > -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ > +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd)\ > ({ \ > __label__ __out;\ > wait_queue_t __wait;\ > long __ret = ret; /* explicit shadow */ \ > + long ostate = current->state; \ XXX > \ > INIT_LIST_HEAD(&__wait.task_list); \ > if (exclusive) \ > @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int); > __wait.flags = 0; \ > \ > for (;;) { \ > - long __int = prepare_to_wait_event(, &__wait, state);\ > + long __int = prepare_to_wait_event(, &__wait, nstate);\ > \ > if (condition) \ > break; \ > \ > - if (___wait_is_interruptible(state) && __int) { \ > + if (___wait_is_interruptible(nstate) && __int) {\ > __ret = __int; \ > if (exclusive) {\ > abort_exclusive_wait(, &__wait, \ > - state, NULL); \ > + nstate, NULL); \ > goto __out; \ > } \ > break; \ > @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int); > cmd;\ > } \ > finish_wait(, &__wait); \ > + set_current_state(ostate); \ I'm not convinced that this particular code is (or can be) race free in the general reentrant case. The outer call to ___wait_event will miss any wake_up received in the inner call between XXX above (store of current->state) and this point of restoring the previous state. So if the inner condition evaluation or some interrupt handler happens to trigger a wake_up meant for the outer call then it will be lost. > __out: __ret; > \ > }) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Preserve task state in reentrant calls to ___wait_event
On Fri, Nov 06, 2015 at 08:44:08PM +, Chris Bainbridge wrote: > -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ > +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd)\ > ({ \ > __label__ __out;\ > wait_queue_t __wait;\ > long __ret = ret; /* explicit shadow */ \ > + long ostate = current->state; \ XXX > \ > INIT_LIST_HEAD(&__wait.task_list); \ > if (exclusive) \ > @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int); > __wait.flags = 0; \ > \ > for (;;) { \ > - long __int = prepare_to_wait_event(, &__wait, state);\ > + long __int = prepare_to_wait_event(, &__wait, nstate);\ > \ > if (condition) \ > break; \ > \ > - if (___wait_is_interruptible(state) && __int) { \ > + if (___wait_is_interruptible(nstate) && __int) {\ > __ret = __int; \ > if (exclusive) {\ > abort_exclusive_wait(, &__wait, \ > - state, NULL); \ > + nstate, NULL); \ > goto __out; \ > } \ > break; \ > @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int); > cmd;\ > } \ > finish_wait(, &__wait); \ > + set_current_state(ostate); \ I'm not convinced that this particular code is (or can be) race free in the general reentrant case. The outer call to ___wait_event will miss any wake_up received in the inner call between XXX above (store of current->state) and this point of restoring the previous state. So if the inner condition evaluation or some interrupt handler happens to trigger a wake_up meant for the outer call then it will be lost. > __out: __ret; > \ > }) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] Preserve task state in reentrant calls to ___wait_event
In the ACPI SBS initialisation, a reentrant call to wait_event_timeout() causes an intermittent boot stall of several minutes usually following the "Switching to clocksource tsc" message. This stall is caused by: 1. drivers/acpi/sbshc.c wait_transaction_complete() calls wait_event_timeout(): if (wait_event_timeout(hc->wait, smb_check_done(hc), msecs_to_jiffies(timeout))) 2. ___wait_event sets task state to uninterruptible 3. ___wait_event calls the "condition" smb_check_done() 4. smb_check_done (sbshc.c) calls through to ec_read() in drivers/acpi/ec.c 5. ec_guard() is reached which calls wait_event_timeout() if (wait_event_timeout(ec->wait, ec_transaction_completed(ec), guard)) ie. wait_event_timeout() is being called again inside evaluation of the previous wait_event_timeout() condition 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in ec_guard() 6. The task is now in state running even though the wait "condition" is still being evaluated 7. The "condition" check returns false so ___wait_event calls schedule_timeout() 8. Since the task state is running, the scheduler immediately schedules it again 9. This loop repeats for around 250 seconds event though the original wait_event_timeout was only 1000ms. This happens because each the call to schedule_timeout() usually returns immediately, taking less than 1ms, so the jiffies timeout counter is not decremented. The task is now stuck in a running state, and so is highly likely to get rescheduled immediately, which takes less than a jiffy. The root problem is that wait_event_timeout() does not preserve the task state when called by tasks that are not running. We fix this by preserving and restoring the task state in ___wait_event(). Signed-off-by: Chris Bainbridge --- I am assuming here that wait_event_timeout() is supposed to support reentrant calls. If not, perhaps it should BUG_ON when called with a non-running task state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. If reentrant calls are supposed to work, then this patch will preserve the task state (there may be a more appropriate way to support reentrant calls than this exact patch, suggestions/alternatives are welcome, but this does work). --- include/linux/wait.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index 1e1bf9f..a847cf8 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -209,11 +209,12 @@ wait_queue_head_t *bit_waitqueue(void *, int); * otherwise. */ -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd) \ ({ \ __label__ __out;\ wait_queue_t __wait;\ long __ret = ret; /* explicit shadow */ \ + long ostate = current->state; \ \ INIT_LIST_HEAD(&__wait.task_list); \ if (exclusive) \ @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int); __wait.flags = 0; \ \ for (;;) { \ - long __int = prepare_to_wait_event(, &__wait, state);\ + long __int = prepare_to_wait_event(, &__wait, nstate);\ \ if (condition) \ break; \ \ - if (___wait_is_interruptible(state) && __int) { \ + if (___wait_is_interruptible(nstate) && __int) {\ __ret = __int; \ if (exclusive) {\ abort_exclusive_wait(, &__wait, \ -state, NULL); \ +nstate, NULL); \ goto __out; \ } \ break;
[PATCH] Preserve task state in reentrant calls to ___wait_event
In the ACPI SBS initialisation, a reentrant call to wait_event_timeout() causes an intermittent boot stall of several minutes usually following the "Switching to clocksource tsc" message. This stall is caused by: 1. drivers/acpi/sbshc.c wait_transaction_complete() calls wait_event_timeout(): if (wait_event_timeout(hc->wait, smb_check_done(hc), msecs_to_jiffies(timeout))) 2. ___wait_event sets task state to uninterruptible 3. ___wait_event calls the "condition" smb_check_done() 4. smb_check_done (sbshc.c) calls through to ec_read() in drivers/acpi/ec.c 5. ec_guard() is reached which calls wait_event_timeout() if (wait_event_timeout(ec->wait, ec_transaction_completed(ec), guard)) ie. wait_event_timeout() is being called again inside evaluation of the previous wait_event_timeout() condition 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in ec_guard() 6. The task is now in state running even though the wait "condition" is still being evaluated 7. The "condition" check returns false so ___wait_event calls schedule_timeout() 8. Since the task state is running, the scheduler immediately schedules it again 9. This loop repeats for around 250 seconds event though the original wait_event_timeout was only 1000ms. This happens because each the call to schedule_timeout() usually returns immediately, taking less than 1ms, so the jiffies timeout counter is not decremented. The task is now stuck in a running state, and so is highly likely to get rescheduled immediately, which takes less than a jiffy. The root problem is that wait_event_timeout() does not preserve the task state when called by tasks that are not running. We fix this by preserving and restoring the task state in ___wait_event(). Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com> --- I am assuming here that wait_event_timeout() is supposed to support reentrant calls. If not, perhaps it should BUG_ON when called with a non-running task state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. If reentrant calls are supposed to work, then this patch will preserve the task state (there may be a more appropriate way to support reentrant calls than this exact patch, suggestions/alternatives are welcome, but this does work). --- include/linux/wait.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/wait.h b/include/linux/wait.h index 1e1bf9f..a847cf8 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -209,11 +209,12 @@ wait_queue_head_t *bit_waitqueue(void *, int); * otherwise. */ -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \ +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd) \ ({ \ __label__ __out;\ wait_queue_t __wait;\ long __ret = ret; /* explicit shadow */ \ + long ostate = current->state; \ \ INIT_LIST_HEAD(&__wait.task_list); \ if (exclusive) \ @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int); __wait.flags = 0; \ \ for (;;) { \ - long __int = prepare_to_wait_event(, &__wait, state);\ + long __int = prepare_to_wait_event(, &__wait, nstate);\ \ if (condition) \ break; \ \ - if (___wait_is_interruptible(state) && __int) { \ + if (___wait_is_interruptible(nstate) && __int) {\ __ret = __int; \ if (exclusive) {\ abort_exclusive_wait(, &__wait, \ -state, NULL); \ +nstate, NULL); \ goto __out; \ } \
Re: Input: keyboard/Trackpad support for MacBookPro 12,1
On 5 May 2015 at 04:29, Yang Hongyang wrote: > Any ideas? https://bugzilla.kernel.org/show_bug.cgi?id=96771 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Input: keyboard/Trackpad support for MacBookPro 12,1
On 5 May 2015 at 04:29, Yang Hongyang macrosh...@163.com wrote: Any ideas? https://bugzilla.kernel.org/show_bug.cgi?id=96771 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUG] Macbook intermittent boot fail at "Switched to clocksource tsc"
Commit 7bc5a2b -- ACPI: Support _OSI("Darwin") correctly -- has introduced an intermittent boot failure on an IvyBridge Macbook Pro 13 (MacBookPro10,2). The failure rate is around 12%. This was diagnosed by doing a bisect with many boots for each kernel, then verifying failure on 7bc5a2b, and then booting 50 times again with the prior commit (9faf613) to verify that commit was ok (50 boots is necessary to get a high probability >99% of hitting this). Example fail: http://imgur.com//QHWJTYS The failure has been observed up to and including 4.0-rc7. The Apple BIOS is MBP102.88Z.0106.B07.1501071215 01/07/2015 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUG] Macbook intermittent boot fail at Switched to clocksource tsc
Commit 7bc5a2b -- ACPI: Support _OSI(Darwin) correctly -- has introduced an intermittent boot failure on an IvyBridge Macbook Pro 13 (MacBookPro10,2). The failure rate is around 12%. This was diagnosed by doing a bisect with many boots for each kernel, then verifying failure on 7bc5a2b, and then booting 50 times again with the prior commit (9faf613) to verify that commit was ok (50 boots is necessary to get a high probability 99% of hitting this). Example fail: http://imgur.com//QHWJTYS The failure has been observed up to and including 4.0-rc7. The Apple BIOS is MBP102.88Z.0106.B07.1501071215 01/07/2015 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
Commit-ID: 69f2366c9456d0ce784cf5aba87ee77eeadc1d5e Gitweb: http://git.kernel.org/tip/69f2366c9456d0ce784cf5aba87ee77eeadc1d5e Author: Chris Bainbridge AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700 Committer: H. Peter Anvin CommitDate: Thu, 20 Mar 2014 16:31:54 -0700 x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the "forcepae" parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost Acked-by: Borislav Petkov Signed-off-by: H. Peter Anvin --- Documentation/kernel-parameters.txt | 7 +++ arch/x86/boot/cpucheck.c| 20 arch/x86/kernel/cpu/intel.c | 19 +++ 3 files changed, 46 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7116fda..06600cc 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') && + cpu_vendor[1] == A32('i', 'n', 'e', 'I') && + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); err = check_cpuflags(); + } else if (err == 0x01 && + !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) && + is_intel() && cpu.level == 6 && + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts("WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n"); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 34bbb55..897d620 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_F00F_BUG @@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter "forcepae" is present. +*/ + if (forcepae) { + printk(KERN_WARNING "PAE forced!\n"); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
Commit-ID: 3e2ae0fed2a12ef72dea787db4c25466f9d89207 Gitweb: http://git.kernel.org/tip/3e2ae0fed2a12ef72dea787db4c25466f9d89207 Author: Chris Bainbridge AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700 Committer: H. Peter Anvin CommitDate: Thu, 20 Mar 2014 16:28:43 -0700 x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the "forcepae" parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost Signed-off-by: H. Peter Anvin --- Documentation/kernel-parameters.txt | 7 +++ arch/x86/boot/cpucheck.c| 20 arch/x86/kernel/cpu/intel.c | 19 +++ 3 files changed, 46 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7116fda..06600cc 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') && + cpu_vendor[1] == A32('i', 'n', 'e', 'I') && + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); err = check_cpuflags(); + } else if (err == 0x01 && + !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) && + is_intel() && cpu.level == 6 && + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts("WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n"); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 34bbb55..897d620 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_F00F_BUG @@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter "forcepae" is present. +*/ + if (forcepae) { + printk(KERN_WARNING "PAE forced!\n"); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
Commit-ID: 3e2ae0fed2a12ef72dea787db4c25466f9d89207 Gitweb: http://git.kernel.org/tip/3e2ae0fed2a12ef72dea787db4c25466f9d89207 Author: Chris Bainbridge chris.bainbri...@gmail.com AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700 Committer: H. Peter Anvin h...@zytor.com CommitDate: Thu, 20 Mar 2014 16:28:43 -0700 x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the forcepae parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost Signed-off-by: H. Peter Anvin h...@zytor.com --- Documentation/kernel-parameters.txt | 7 +++ arch/x86/boot/cpucheck.c| 20 arch/x86/kernel/cpu/intel.c | 19 +++ 3 files changed, 46 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7116fda..06600cc 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') + cpu_vendor[1] == A32('i', 'n', 'e', 'I') + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm(wrmsr : : a (eax), d (edx), c (ecx)); err = check_cpuflags(); + } else if (err == 0x01 + !(err_flags[0] ~(1 X86_FEATURE_PAE)) + is_intel() cpu.level == 6 + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts(WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 34bbb55..897d620 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_F00F_BUG @@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter forcepae is present. +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
Commit-ID: 69f2366c9456d0ce784cf5aba87ee77eeadc1d5e Gitweb: http://git.kernel.org/tip/69f2366c9456d0ce784cf5aba87ee77eeadc1d5e Author: Chris Bainbridge chris.bainbri...@gmail.com AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700 Committer: H. Peter Anvin h...@zytor.com CommitDate: Thu, 20 Mar 2014 16:31:54 -0700 x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the forcepae parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost Acked-by: Borislav Petkov b...@suse.de Signed-off-by: H. Peter Anvin h...@zytor.com --- Documentation/kernel-parameters.txt | 7 +++ arch/x86/boot/cpucheck.c| 20 arch/x86/kernel/cpu/intel.c | 19 +++ 3 files changed, 46 insertions(+) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 7116fda..06600cc 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') + cpu_vendor[1] == A32('i', 'n', 'e', 'I') + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm(wrmsr : : a (eax), d (edx), c (ecx)); err = check_cpuflags(); + } else if (err == 0x01 + !(err_flags[0] ~(1 X86_FEATURE_PAE)) + is_intel() cpu.level == 6 + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts(WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 34bbb55..897d620 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { #ifdef CONFIG_X86_F00F_BUG @@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter forcepae is present. +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http
[PATCH] x86: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
From: Chris Bainbridge Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the "forcepae" parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge --- This patch depends on Dave Jones's TAINT_CPU_OUT_OF_SPEC patch @ https://lkml.org/lkml/2014/2/26/394 diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 580a60c..67755ea 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') && + cpu_vendor[1] == A32('i', 'n', 'e', 'I') && + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); err = check_cpuflags(); + } else if (err == 0x01 && + !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) && + is_intel() && cpu.level == 6 && + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts("WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n"); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index ea56e7c..053cb59 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter "forcepae" is present. +*/ + if (forcepae) { + printk(KERN_WARNING "PAE forced!\n"); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
From: Chris Bainbridge chris.bainbri...@gmail.com Many Pentium M systems disable PAE but may have a functionally usable PAE implementation. This adds the forcepae parameter which bypasses the boot check for PAE, and sets the CPU as being PAE capable. Using this parameter will taint the kernel with TAINT_CPU_OUT_OF_SPEC. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- This patch depends on Dave Jones's TAINT_CPU_OUT_OF_SPEC patch @ https://lkml.org/lkml/2014/2/26/394 diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 580a60c..67755ea 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') + cpu_vendor[1] == A32('i', 'n', 'e', 'I') + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm(wrmsr : : a (eax), d (edx), c (ecx)); err = check_cpuflags(); + } else if (err == 0x01 + !(err_flags[0] ~(1 X86_FEATURE_PAE)) + is_intel() cpu.level == 6 + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts(WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index ea56e7c..053cb59 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter forcepae is present. +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On 4 March 2014 17:44, Borislav Petkov wrote: > On Tue, Mar 04, 2014 at 01:06:03PM +0700, Chris Bainbridge wrote: >> On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote: >> > forcepae is descriptive. >> >> Back to forcepae. > > Ok, it looks ok to me after a quick look. Now you only have to ask > Dave whether he's fine with you merging his patch into yours, write a > nice commit message explaining why this patch is needed and give it a > final test on your machine. Dave - are you ok with the merge? I take it I am expected to re-post the most recent patch plus changelog entry as a new thread (and with a subject that better reflects the new patch)? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On 4 March 2014 17:44, Borislav Petkov b...@alien8.de wrote: On Tue, Mar 04, 2014 at 01:06:03PM +0700, Chris Bainbridge wrote: On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote: forcepae is descriptive. Back to forcepae. Ok, it looks ok to me after a quick look. Now you only have to ask Dave whether he's fine with you merging his patch into yours, write a nice commit message explaining why this patch is needed and give it a final test on your machine. Dave - are you ok with the merge? I take it I am expected to re-post the most recent patch plus changelog entry as a new thread (and with a subject that better reflects the new patch)? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote: > forcepae is descriptive. Back to forcepae. Signed-off-by: Chris Bainbridge --- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 580a60c..67755ea 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') && + cpu_vendor[1] == A32('i', 'n', 'e', 'I') && + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); err = check_cpuflags(); + } else if (err == 0x01 && + !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) && + is_intel() && cpu.level == 6 && + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts("WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n"); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index c67ffa6..7ec4119 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -218,7 +218,7 @@ static void amd_k7_smp_check(struct cpuinfo_x86 *c) */ WARN_ONCE(1, "WARNING: This combination of AMD" " processors is not suitable for SMP.\n"); - add_taint(TAINT_UNSAFE_SMP, LOCKDEP_NOW_UNRELIABLE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); } static void init_amd_k7(struct cpuinfo_x86 *c) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index ea56e7c..053cb59 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter "forcepae" is present. +*/ + if (forcepae) { + printk(KERN_WARNING "PAE forced!\n"); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 196d1ea..08fb024 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -458,7 +458,7 @@ extern enum system_states { #define TAINT_PROPRIETARY_MODULE 0 #define TAINT_FORCED_MODULE1 -#define TAINT_UNSAFE_SMP 2 +#define TAINT_CPU_OUT_OF_SPEC 2 #define TAINT_FORCED_RMMOD 3 #defin
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On Mon, Mar 03, 2014 at 08:29:39PM +0100, Borislav Petkov wrote: > On Mon, Mar 03, 2014 at 03:04:35PM +0700, Chris Bainbridge wrote: > > On 3 March 2014 02:05, Roland Kletzing wrote: > > > i would recommend adding the newly introduced param to > > > Documentation/kernel- > > > parameters.txt , though. > > > > Done. > > > > Signed-off-by: Chris Bainbridge > > --- > > diff --git a/Documentation/kernel-parameters.txt > > b/Documentation/kernel-parameters.txt > > index b9e9bd8..388b5e9 100644 > > --- a/Documentation/kernel-parameters.txt > > +++ b/Documentation/kernel-parameters.txt > > @@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be > > entirely omitted. > > parameter will force ia64_sal_cache_flush to call > > ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. > > > > + forcepae [X86-32] > > + Forcefully enable Physical Address Extension (PAE). > > + Many Pentium M systems disable PAE but may have a > > + functionally usable PAE implementation. > > + Note: This parameter is unsupported, may cause unknown > > What does "unsupported" mean here exactly? It was supposed to have the dual meaning that neither the kernel developers or Intel are going to help you if it doesn't work. But perhaps it is unnecessary - being tainted implies that the kernel developers won't help, and Intel certainly won't be interested. > > This function is called check_cpuflags() now. You probably want to redo > your patch against tip/master, i.e.: > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git#master > Done, new patch is against tip. > > + } > > + else { > > + puts("ERROR: PAE is disabled on this Pentium M\n" > > + "(PAE can potentially be enabled with " > > + "kernel parameter\n" > > + "\"forcepae\" - this is unsupported, may " > > + "cause unknown\n" > > + "problems, and will taint the kernel)\n"); > > This string could definitely violate the 80 cols rule so that it is much > more readable: > > } > else > puts("WARNING: PAE disabled. Use \"forcepae\" to enable > at your own risk!\n"); > > I've shortened it to the most relevant info only. No need to say we're > tainting the kernel because LOCKDEP_NOW_UNRELIABLE will cause that > anyway below. Ok I changed it to: "WARNING: PAE disabled. Use parameter 'pae' to enable at your own risk!\n"" > > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c > > index bbe1b8b..271686d 100644 > > --- a/arch/x86/kernel/cpu/intel.c > > +++ b/arch/x86/kernel/cpu/intel.c > > @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) > > } > > } > > > > +static int forcepae; > > +static int __init forcepae_setup(char *__unused) > > +{ > > + forcepae = 1; > > + return 1; > > +} > > +__setup("forcepae", forcepae_setup); > > Yeah, why not simply call it "pae"? It is smaller and the letter > combination is not used yet and it means the same. I don't see any reason not to just use "pae" "forcepae" is perhaps a bit more descriptive but the other text in the patch clearly describes the parameter so it doesn't really matter. > > static void intel_workarounds(struct cpuinfo_x86 *c) > > { > > unsigned long lo, hi; > > @@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) > > clear_cpu_cap(c, X86_FEATURE_SEP); > > > > /* > > +* PAE CPUID issue: many Pentium M report no PAE but may have a > > + * functionally usable PAE implementation. > > +* Forcefully enable PAE if kernel parameter "forcepae" is present. > > +*/ > > + if (forcepae) { > > + printk(KERN_WARNING "PAE forced!\n"); > > + set_cpu_cap(c, X86_FEATURE_PAE); > > + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); > > Right, this implies Dave's patch is preceding yours. I guess hpa can > fish it out from the thread when applying. I will include Dave's patch, it is trivial. New patch with all above changes follows. Signed-off-by: Chris Bainbridge --- diff --git a/Documentation/kernel-par
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On Sun, Mar 02, 2014 at 09:56:19PM +0100, Andreas Mohr wrote: > Hi, > > > /* > > +* PAE CPUID bug: Pentium M reports no PAE but has PAE > > +*/ > > Ain't that a tad strongly/incorrectly worded? I've updated the wording. On 3 March 2014 02:05, Roland Kletzing wrote: > i would recommend adding the newly introduced param to > Documentation/kernel- > parameters.txt , though. Done. Signed-off-by: Chris Bainbridge --- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b9e9bd8..388b5e9 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Note: This parameter is unsupported, may cause unknown + problems, and will taint the kernel. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff03..93ba160 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -69,6 +69,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') && + cpu_vendor[1] == A32('i', 'n', 'e', 'I') && + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + static int has_fpu(void) { u16 fcw = -1, fsw = -1; @@ -239,6 +246,24 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); err = check_flags(); + } else if (err == 0x01 && + !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) && + is_intel() && cpu.level == 6 && + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_flags(); + } + else { + puts("ERROR: PAE is disabled on this Pentium M\n" + "(PAE can potentially be enabled with " + "kernel parameter\n" + "\"forcepae\" - this is unsupported, may " + "cause unknown\n" + "problems, and will taint the kernel)\n"); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..271686d 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter "forcepae" is present. +*/ + if (forcepae) { + printk(KERN_WARNING "PAE forced!\n"); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On Sun, Mar 02, 2014 at 09:56:19PM +0100, Andreas Mohr wrote: Hi, /* +* PAE CPUID bug: Pentium M reports no PAE but has PAE +*/ Ain't that a tad strongly/incorrectly worded? I've updated the wording. On 3 March 2014 02:05, Roland Kletzing devz...@web.de wrote: i would recommend adding the newly introduced param to Documentation/kernel- parameters.txt , though. Done. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b9e9bd8..388b5e9 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Note: This parameter is unsupported, may cause unknown + problems, and will taint the kernel. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff03..93ba160 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -69,6 +69,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') + cpu_vendor[1] == A32('i', 'n', 'e', 'I') + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + static int has_fpu(void) { u16 fcw = -1, fsw = -1; @@ -239,6 +246,24 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm(wrmsr : : a (eax), d (edx), c (ecx)); err = check_flags(); + } else if (err == 0x01 + !(err_flags[0] ~(1 X86_FEATURE_PAE)) + is_intel() cpu.level == 6 + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_flags(); + } + else { + puts(ERROR: PAE is disabled on this Pentium M\n + (PAE can potentially be enabled with + kernel parameter\n + \forcepae\ - this is unsupported, may + cause unknown\n + problems, and will taint the kernel)\n); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..271686d 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter forcepae is present. +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On Mon, Mar 03, 2014 at 08:29:39PM +0100, Borislav Petkov wrote: On Mon, Mar 03, 2014 at 03:04:35PM +0700, Chris Bainbridge wrote: On 3 March 2014 02:05, Roland Kletzing devz...@web.de wrote: i would recommend adding the newly introduced param to Documentation/kernel- parameters.txt , though. Done. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index b9e9bd8..388b5e9 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Note: This parameter is unsupported, may cause unknown What does unsupported mean here exactly? It was supposed to have the dual meaning that neither the kernel developers or Intel are going to help you if it doesn't work. But perhaps it is unnecessary - being tainted implies that the kernel developers won't help, and Intel certainly won't be interested. This function is called check_cpuflags() now. You probably want to redo your patch against tip/master, i.e.: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git#master Done, new patch is against tip. + } + else { + puts(ERROR: PAE is disabled on this Pentium M\n + (PAE can potentially be enabled with + kernel parameter\n + \forcepae\ - this is unsupported, may + cause unknown\n + problems, and will taint the kernel)\n); This string could definitely violate the 80 cols rule so that it is much more readable: } else puts(WARNING: PAE disabled. Use \forcepae\ to enable at your own risk!\n); I've shortened it to the most relevant info only. No need to say we're tainting the kernel because LOCKDEP_NOW_UNRELIABLE will cause that anyway below. Ok I changed it to: WARNING: PAE disabled. Use parameter 'pae' to enable at your own risk!\n diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..271686d 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); Yeah, why not simply call it pae? It is smaller and the letter combination is not used yet and it means the same. I don't see any reason not to just use pae forcepae is perhaps a bit more descriptive but the other text in the patch clearly describes the parameter so it doesn't really matter. static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter forcepae is present. +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); Right, this implies Dave's patch is preceding yours. I guess hpa can fish it out from the thread when applying. I will include Dave's patch, it is trivial. New patch with all above changes follows. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 580a60c..7d57a5a 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2307,6 +2307,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. OSS [HW,OSS] See Documentation/sound/oss/oss-parameters.txt + pae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + panic= [KNL] Kernel
Re: Re: [PATCH] x86: set Pentium M as PAE capable
On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote: forcepae is descriptive. Back to forcepae. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 580a60c..67755ea 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted. parameter will force ia64_sal_cache_flush to call ia64_pal_cache_flush instead of SAL_CACHE_FLUSH. + forcepae [X86-32] + Forcefully enable Physical Address Extension (PAE). + Many Pentium M systems disable PAE but may have a + functionally usable PAE implementation. + Warning: use of this parameter will taint the kernel + and may cause unknown problems. + ftrace=[tracer] [FTRACE] will set and start the specified tracer as early as possible in order to facilitate early diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 100a9a1..f0d0b20 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -67,6 +67,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') + cpu_vendor[1] == A32('i', 'n', 'e', 'I') + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + /* Returns a bitmask of which words we have error bits in */ static int check_cpuflags(void) { @@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm(wrmsr : : a (eax), d (edx), c (ecx)); err = check_cpuflags(); + } else if (err == 0x01 + !(err_flags[0] ~(1 X86_FEATURE_PAE)) + is_intel() cpu.level == 6 + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_cpuflags(); + } + else { + puts(WARNING: PAE disabled. Use parameter 'forcepae' to enable at your own risk!\n); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index c67ffa6..7ec4119 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -218,7 +218,7 @@ static void amd_k7_smp_check(struct cpuinfo_x86 *c) */ WARN_ONCE(1, WARNING: This combination of AMD processors is not suitable for SMP.\n); - add_taint(TAINT_UNSAFE_SMP, LOCKDEP_NOW_UNRELIABLE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); } static void init_amd_k7(struct cpuinfo_x86 *c) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index ea56e7c..053cb59 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID issue: many Pentium M report no PAE but may have a +* functionally usable PAE implementation. +* Forcefully enable PAE if kernel parameter forcepae is present. +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 196d1ea..08fb024 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -458,7 +458,7 @@ extern enum system_states { #define TAINT_PROPRIETARY_MODULE 0 #define TAINT_FORCED_MODULE1 -#define TAINT_UNSAFE_SMP 2 +#define TAINT_CPU_OUT_OF_SPEC 2 #define TAINT_FORCED_RMMOD 3 #define TAINT_MACHINE_CHECK4 #define TAINT_BAD_PAGE 5 diff --git a/kernel/module.c b/kernel/module.c index b99e801..8dc7f5e 100644 --- a/kernel/module.c +++ b/kernel/module.c
Re: [PATCH] x86: set Pentium M as PAE capable
On Fri, Feb 28, 2014 at 03:27:50PM +0300, Dennis Mungai wrote: > Hello people, > > Note that revisions of the Dothan core were released in the first quarter > of 2005 with the *Sonoma* chipsets and supported a 533 MT/s FSB and NX-bit > (and PAE support required for it was enabled, unlike earlier Pentium Ms > that had it disabled). These processors include the 730M (1.6 GHz), 740M > (1.73 GHz), 750M (1.86 GHz), 760M (2.0 GHz) and 770M (2.13 GHz). These > models all have a TDP of 27 W and a 2 MB L2 cache. > > These CPUs should have PAE enabled. Only earlier versions of the Pentium M > ( Older Dothans and the Banias core) do not have PAE support, officially. > > -Dennis. Good point, patch updated to not show the warning if PAE is already enabled. Signed-off-by: Chris Bainbridge --- diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff03..3fd12bc 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -69,6 +69,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') && + cpu_vendor[1] == A32('i', 'n', 'e', 'I') && + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + static int has_fpu(void) { u16 fcw = -1, fsw = -1; @@ -239,6 +246,21 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx)); err = check_flags(); + } else if (err == 0x01 && + !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) && + is_intel() && cpu.level == 6 && + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_flags(); + } + else { + puts("Pentium M: PAE is disabled, " +"enable it with kernel argument \"forcepae\"\n" +"(this will taint the kernel)\n"); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..873cf3b 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID bug: Pentium M reports no PAE but has PAE +*/ + if (forcepae) { + printk(KERN_WARNING "PAE forced!\n"); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: set Pentium M as PAE capable
On Fri, Feb 28, 2014 at 03:27:50PM +0300, Dennis Mungai wrote: Hello people, Note that revisions of the Dothan core were released in the first quarter of 2005 with the *Sonoma* chipsets and supported a 533 MT/s FSB and NX-bit (and PAE support required for it was enabled, unlike earlier Pentium Ms that had it disabled). These processors include the 730M (1.6 GHz), 740M (1.73 GHz), 750M (1.86 GHz), 760M (2.0 GHz) and 770M (2.13 GHz). These models all have a TDP of 27 W and a 2 MB L2 cache. These CPUs should have PAE enabled. Only earlier versions of the Pentium M ( Older Dothans and the Banias core) do not have PAE support, officially. -Dennis. Good point, patch updated to not show the warning if PAE is already enabled. Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff03..3fd12bc 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -69,6 +69,13 @@ static int is_transmeta(void) cpu_vendor[2] == A32('M', 'x', '8', '6'); } +static int is_intel(void) +{ + return cpu_vendor[0] == A32('G', 'e', 'n', 'u') + cpu_vendor[1] == A32('i', 'n', 'e', 'I') + cpu_vendor[2] == A32('n', 't', 'e', 'l'); +} + static int has_fpu(void) { u16 fcw = -1, fsw = -1; @@ -239,6 +246,21 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 **err_flags_ptr) asm(wrmsr : : a (eax), d (edx), c (ecx)); err = check_flags(); + } else if (err == 0x01 + !(err_flags[0] ~(1 X86_FEATURE_PAE)) + is_intel() cpu.level == 6 + (cpu.model == 9 || cpu.model == 13)) { + /* PAE is disabled on this Pentium M but can be forced */ + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + err = check_flags(); + } + else { + puts(Pentium M: PAE is disabled, +enable it with kernel argument \forcepae\\n +(this will taint the kernel)\n); + } } if (err_flags_ptr) diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..873cf3b 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID bug: Pentium M reports no PAE but has PAE +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: set Pentium M as PAE capable
On Wed, Feb 26, 2014 at 10:49:49AM -0500, Dave Jones wrote: > On Wed, Feb 26, 2014 at 02:18:52PM +0100, Borislav Petkov wrote: > > On Wed, Feb 26, 2014 at 07:12:59PM +0700, Chris Bainbridge wrote: > > > @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c) > > > clear_cpu_cap(c, X86_FEATURE_SEP); > > > > > > /* > > > + * PAE CPUID bug: Pentium M reports no PAE but has PAE > > > + */ > > > +if (forcepae) { > > > +printk(KERN_WARNING "PAE forced!\n"); > > > +set_cpu_cap(c, X86_FEATURE_PAE); > > > +add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); > > > > This is certainly the wrong taint flag. We'd need a new one or to > > repurpose another one as I suggested in a previous mail. > > I'd suggest repurposing 'S'. Instead of 'unsafe smp', it could mean > "out of Spec". We currently only use that flag on some ancient athlon xp, > so there's not going to be any kind of collision. > > Start with the below maybe ? > > Dave Patch looks fine. Is the patch I previously posted, combined with this patch, sufficient for inclusion in the kernel? Or is there anything else I need to do? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: set Pentium M as PAE capable
On Wed, Feb 26, 2014 at 10:49:49AM -0500, Dave Jones wrote: On Wed, Feb 26, 2014 at 02:18:52PM +0100, Borislav Petkov wrote: On Wed, Feb 26, 2014 at 07:12:59PM +0700, Chris Bainbridge wrote: @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* + * PAE CPUID bug: Pentium M reports no PAE but has PAE + */ +if (forcepae) { +printk(KERN_WARNING PAE forced!\n); +set_cpu_cap(c, X86_FEATURE_PAE); +add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); This is certainly the wrong taint flag. We'd need a new one or to repurpose another one as I suggested in a previous mail. I'd suggest repurposing 'S'. Instead of 'unsafe smp', it could mean out of Spec. We currently only use that flag on some ancient athlon xp, so there's not going to be any kind of collision. Start with the below maybe ? Dave Patch looks fine. Is the patch I previously posted, combined with this patch, sufficient for inclusion in the kernel? Or is there anything else I need to do? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: set Pentium M as PAE capable
On Tue, Feb 25, 2014 at 09:16:02AM -0800, H. Peter Anvin wrote: > On 02/25/2014 08:26 AM, Dave Jones wrote: > > On Tue, Feb 25, 2014 at 02:45:57AM -0800, H. Peter Anvin wrote: > > > On 02/24/2014 10:01 PM, Chris Bainbridge wrote: > > > > Pentium M is PAE capable but does not indicate so in the CPUID > > response. > > > > This is an issue now that some distributions are no longer shipping > > > > non-PAE kernels (those distributions no longer boot on Pentium M). This > > > > small patch fixes the issue by forcing the PAE capability on Pentium M. > > > > > > > > For more discussion see https://bugs.launchpad.net/baltix/+bug/930447 > > > > > > > > > > 1. This patch doesn't match the discussion in the link. > > > 2. You would have to also enable this in the cpu testing code in > > > arch/x86/boot. > > > 3. At the very least we need to print a serious warning that the CPU > > > is being run outside its specifications. I have no personal > > > information about why this CPUID bit was disabled, but it could be > > > that it was discovered in testing that it didn't work correctly in > > > all circumstances (e.g. high temperature.) This is very much "use > > > at your own risk..."; you could get data corruption or even > > > hardware damage. > > > > About six years ago, we almost went down this same path for Fedora, > > and I'm fairly sure the only reason we backed off and decided to not > > pursue it was that we found some Pentium M's where it just didn't work. > > > > OK, that *definitely* means that if we're doing this at all we're doing > it via an explicit opt-in on the command line, and tainting the kernel > in the process. > > I don't know if anyone (Chris?) is interested enough in the problem to > do such a patch, though. I know I'm not too interested in spending a > bunch of time on. > > -hpa > The basic findings of the bug discussion is that people are successfully running PAE kernels on Pentium M (for some unknown reason Grub skips the validate_cpu code in the kernel, so existing PAE kernels will run unmodified, although they do fail when booted with syslinux), and people are using a user-space hack to add "pae" to /proc/cpuinfo. In all of the testing reported on the Launchpad bug and elsewhere, every user who managed to boot a PAE kernel on Pentium M reported success. There was a single report of failure, but the user encountered the "This kernel requires the following features" message, so the failure was caused by some issue of his boot setup not skipping the cpu validation code, rather than a PAE failure in the Pentium M. It is possible that PAE was disabled for technical reasons, or for commercial reasons (e.g. to discourage vendors from building Pentium M servers). We don't know. What we do know is that people are using PAE kernels on Pentium M systems, and that not all are aware of the implications (for a user with an existing install of Debian who apt-gets a PAE kernel, it will install and boot (thanks to Grub) and no errors or warnings will have been shown to indicate that their system is now running "out of spec") I have made the requested changes to the patch: --- diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff03..3220734 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -151,6 +151,16 @@ static void get_flags(void) : : "ebx"); } } + + if (cmdline_find_option_bool("forcepae")) { + puts("WARNING: Forcing PAE in CPU flags\n"); + set_bit(X86_FEATURE_PAE, cpu.flags); + } + else if ((cpu.level == 6) && ((cpu.model == 9) || (cpu.model == 13))) { + puts("Pentium M: PAE is disabled, " +"enable it with kernel argument \"forcepae\"\n" +"(\"forcepae\" is unsupported and will taint the kernel)\n"); + } } /* Returns a bitmask of which words we have error bits in */ diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..1047098 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup("forcepae", forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x
Re: [PATCH] x86: set Pentium M as PAE capable
On Tue, Feb 25, 2014 at 09:16:02AM -0800, H. Peter Anvin wrote: On 02/25/2014 08:26 AM, Dave Jones wrote: On Tue, Feb 25, 2014 at 02:45:57AM -0800, H. Peter Anvin wrote: On 02/24/2014 10:01 PM, Chris Bainbridge wrote: Pentium M is PAE capable but does not indicate so in the CPUID response. This is an issue now that some distributions are no longer shipping non-PAE kernels (those distributions no longer boot on Pentium M). This small patch fixes the issue by forcing the PAE capability on Pentium M. For more discussion see https://bugs.launchpad.net/baltix/+bug/930447 1. This patch doesn't match the discussion in the link. 2. You would have to also enable this in the cpu testing code in arch/x86/boot. 3. At the very least we need to print a serious warning that the CPU is being run outside its specifications. I have no personal information about why this CPUID bit was disabled, but it could be that it was discovered in testing that it didn't work correctly in all circumstances (e.g. high temperature.) This is very much use at your own risk...; you could get data corruption or even hardware damage. About six years ago, we almost went down this same path for Fedora, and I'm fairly sure the only reason we backed off and decided to not pursue it was that we found some Pentium M's where it just didn't work. OK, that *definitely* means that if we're doing this at all we're doing it via an explicit opt-in on the command line, and tainting the kernel in the process. I don't know if anyone (Chris?) is interested enough in the problem to do such a patch, though. I know I'm not too interested in spending a bunch of time on. -hpa The basic findings of the bug discussion is that people are successfully running PAE kernels on Pentium M (for some unknown reason Grub skips the validate_cpu code in the kernel, so existing PAE kernels will run unmodified, although they do fail when booted with syslinux), and people are using a user-space hack to add pae to /proc/cpuinfo. In all of the testing reported on the Launchpad bug and elsewhere, every user who managed to boot a PAE kernel on Pentium M reported success. There was a single report of failure, but the user encountered the This kernel requires the following features message, so the failure was caused by some issue of his boot setup not skipping the cpu validation code, rather than a PAE failure in the Pentium M. It is possible that PAE was disabled for technical reasons, or for commercial reasons (e.g. to discourage vendors from building Pentium M servers). We don't know. What we do know is that people are using PAE kernels on Pentium M systems, and that not all are aware of the implications (for a user with an existing install of Debian who apt-gets a PAE kernel, it will install and boot (thanks to Grub) and no errors or warnings will have been shown to indicate that their system is now running out of spec) I have made the requested changes to the patch: --- diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c index 4d3ff03..3220734 100644 --- a/arch/x86/boot/cpucheck.c +++ b/arch/x86/boot/cpucheck.c @@ -151,6 +151,16 @@ static void get_flags(void) : : ebx); } } + + if (cmdline_find_option_bool(forcepae)) { + puts(WARNING: Forcing PAE in CPU flags\n); + set_bit(X86_FEATURE_PAE, cpu.flags); + } + else if ((cpu.level == 6) ((cpu.model == 9) || (cpu.model == 13))) { + puts(Pentium M: PAE is disabled, +enable it with kernel argument \forcepae\\n +(\forcepae\ is unsupported and will taint the kernel)\n); + } } /* Returns a bitmask of which words we have error bits in */ diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..1047098 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c) } } +static int forcepae; +static int __init forcepae_setup(char *__unused) +{ + forcepae = 1; + return 1; +} +__setup(forcepae, forcepae_setup); + static void intel_workarounds(struct cpuinfo_x86 *c) { unsigned long lo, hi; @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID bug: Pentium M reports no PAE but has PAE +*/ + if (forcepae) { + printk(KERN_WARNING PAE forced!\n); + set_cpu_cap(c, X86_FEATURE_PAE); + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE); + } + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from
[PATCH] x86: set Pentium M as PAE capable
Pentium M is PAE capable but does not indicate so in the CPUID response. This is an issue now that some distributions are no longer shipping non-PAE kernels (those distributions no longer boot on Pentium M). This small patch fixes the issue by forcing the PAE capability on Pentium M. For more discussion see https://bugs.launchpad.net/baltix/+bug/930447 Signed-off-by: Chris Bainbridge --- diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..97996aa 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -226,6 +226,12 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID bug: Pentium M reports no PAE but has PAE +*/ + if ((c->x86 == 6) && ((c->x86_model == 9) || (c->x86_model == 13))) + set_cpu_cap(c, X86_FEATURE_PAE); + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86: set Pentium M as PAE capable
Pentium M is PAE capable but does not indicate so in the CPUID response. This is an issue now that some distributions are no longer shipping non-PAE kernels (those distributions no longer boot on Pentium M). This small patch fixes the issue by forcing the PAE capability on Pentium M. For more discussion see https://bugs.launchpad.net/baltix/+bug/930447 Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com --- diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index bbe1b8b..97996aa 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -226,6 +226,12 @@ static void intel_workarounds(struct cpuinfo_x86 *c) clear_cpu_cap(c, X86_FEATURE_SEP); /* +* PAE CPUID bug: Pentium M reports no PAE but has PAE +*/ + if ((c-x86 == 6) ((c-x86_model == 9) || (c-x86_model == 13))) + set_cpu_cap(c, X86_FEATURE_PAE); + + /* * P4 Xeon errata 037 workaround. * Hardware prefetcher may cause stale data to be loaded into the cache. */ -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/