from:"Chris Bainbridge"

[BUG] ACPI Error: Needed type [Reference], found [Integer]

2017-02-27 Thread Chris Bainbridge

Latest Linux git has ACPI errors on Macbook IVB:

[0.333133] ACPI Error: Needed type [Reference], found [Integer] 
880261656f30 (20170119/exresop-103)
[0.333141] ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands 
for [Store] (20170119/dswexec-461)
[0.333150] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 
8802628c60c8), AE_AML_OPERAND_TYPE (20170119/psparse-543)

[0.517073] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20170119/psargs-363)
[0.517084] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 
88026173b618), AE_NOT_FOUND (20170119/psparse-543)
[0.517267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20170119/psargs-363)
[0.517279] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 
88026173b640), AE_NOT_FOUND (20170119/psparse-543)
[0.517464] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20170119/psargs-363)
[0.517474] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 
88026173b668), AE_NOT_FOUND (20170119/psparse-543)

Bisect leads to:

ce87e09dd88c61f9088768a7708828423549725c is the first bad commit
commit ce87e09dd88c61f9088768a7708828423549725c
Author: Bob Moore 
Date:   Wed Dec 28 15:29:43 2016 +0800

ACPICA: Parser: Allow method invocations as target operands

ACPICA commit a6cca7a4786cdbfd29cea67e84b5b01a8ae6ff1c

Method invocations as target operands are allowed as target
operands in the ASL grammar. This change implements support
for this. Method must return a reference for this to work
properly at runtime, however.

Link: https://github.com/acpica/acpica/commit/a6cca7a4
Signed-off-by: Bob Moore 
Signed-off-by: Lv Zheng 
Signed-off-by: Rafael J. Wysocki

[BUG] ACPI Error: Needed type [Reference], found [Integer]

2017-02-27 Thread Chris Bainbridge

Latest Linux git has ACPI errors on Macbook IVB:

[0.333133] ACPI Error: Needed type [Reference], found [Integer] 
880261656f30 (20170119/exresop-103)
[0.333141] ACPI Exception: AE_AML_OPERAND_TYPE, While resolving operands 
for [Store] (20170119/dswexec-461)
[0.333150] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 
8802628c60c8), AE_AML_OPERAND_TYPE (20170119/psparse-543)

[0.517073] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20170119/psargs-363)
[0.517084] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 
88026173b618), AE_NOT_FOUND (20170119/psparse-543)
[0.517267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20170119/psargs-363)
[0.517279] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 
88026173b640), AE_NOT_FOUND (20170119/psparse-543)
[0.517464] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20170119/psargs-363)
[0.517474] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 
88026173b668), AE_NOT_FOUND (20170119/psparse-543)

Bisect leads to:

ce87e09dd88c61f9088768a7708828423549725c is the first bad commit
commit ce87e09dd88c61f9088768a7708828423549725c
Author: Bob Moore 
Date:   Wed Dec 28 15:29:43 2016 +0800

ACPICA: Parser: Allow method invocations as target operands

ACPICA commit a6cca7a4786cdbfd29cea67e84b5b01a8ae6ff1c

Method invocations as target operands are allowed as target
operands in the ASL grammar. This change implements support
for this. Method must return a reference for this to work
properly at runtime, however.

Link: https://github.com/acpica/acpica/commit/a6cca7a4
Signed-off-by: Bob Moore 
Signed-off-by: Lv Zheng 
Signed-off-by: Rafael J. Wysocki

Re: [Intel-gfx] [v4.6-10530-g28165ec7a99b] i915: ERROR "CPU pipe/PCH transcoder" A FIFO underrun

2016-05-27 Thread Chris Bainbridge

On 25 May 2016 at 08:31, Sedat Dilek  wrote:
> Hi Daniel,
>
> with latest Linus Git I see this with my Intel SandyBridge GPU...
>
> [   17.629014] [drm:intel_cpu_fifo_underrun_irq_handler [i915]]
> *ERROR* CPU pipe A FIFO underrun
> [   17.630652] [drm:intel_set_pch_fifo_underrun_reporting [i915]]
> *ERROR* uncleared pch fifo underrun on pch transcoder A
> [   17.630685] [drm:intel_pch_fifo_underrun_irq_handler [i915]]
> *ERROR* PCH transcoder A FIFO underrun

Guessing this is https://bugs.freedesktop.org/show_bug.cgi?id=95736

Re: [Intel-gfx] [v4.6-10530-g28165ec7a99b] i915: ERROR "CPU pipe/PCH transcoder" A FIFO underrun

2016-05-27 Thread Chris Bainbridge

On 25 May 2016 at 08:31, Sedat Dilek  wrote:
> Hi Daniel,
>
> with latest Linus Git I see this with my Intel SandyBridge GPU...
>
> [   17.629014] [drm:intel_cpu_fifo_underrun_irq_handler [i915]]
> *ERROR* CPU pipe A FIFO underrun
> [   17.630652] [drm:intel_set_pch_fifo_underrun_reporting [i915]]
> *ERROR* uncleared pch fifo underrun on pch transcoder A
> [   17.630685] [drm:intel_pch_fifo_underrun_irq_handler [i915]]
> *ERROR* PCH transcoder A FIFO underrun

Guessing this is https://bugs.freedesktop.org/show_bug.cgi?id=95736

[PATCH v3] usb: core: hub: hub_port_init lock controller instead of bus

2016-04-25 Thread Chris Bainbridge

The XHCI controller presents two USB buses to the system - one for USB2
and one for USB3. The hub init code (hub_port_init) is reentrant but
only locks one bus per thread, leading to a race condition failure when
two threads attempt to simultaneously initialise a USB2 and USB3 device:

[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
[   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this failure occurred on 6% of all boots.

The call traces at the point of failure are:

Call Trace:
 [] schedule+0x37/0x90
 [] usb_kill_urb+0x8d/0xd0
 [] ? wake_up_atomic_t+0x30/0x30
 [] usb_start_wait_urb+0xbe/0x150
 [] usb_control_msg+0xbc/0xf0
 [] hub_port_init+0x51e/0xb70
 [] hub_event+0x817/0x1570
 [] process_one_work+0x1ff/0x620
 [] ? process_one_work+0x15f/0x620
 [] worker_thread+0x64/0x4b0
 [] ? rescuer_thread+0x390/0x390
 [] kthread+0x105/0x120
 [] ? kthread_create_on_node+0x200/0x200
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x200/0x200

Call Trace:
 [] xhci_setup_device+0x53d/0xa40
 [] xhci_address_device+0xe/0x10
 [] hub_port_init+0x1bf/0xb70
 [] ? trace_hardirqs_on+0xd/0x10
 [] hub_event+0x817/0x1570
 [] process_one_work+0x1ff/0x620
 [] ? process_one_work+0x15f/0x620
 [] worker_thread+0x64/0x4b0
 [] ? rescuer_thread+0x390/0x390
 [] kthread+0x105/0x120
 [] ? kthread_create_on_node+0x200/0x200
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x200/0x200

Which results from the two call chains:

hub_port_init
 usb_get_device_descriptor
  usb_get_descriptor
   usb_control_msg
usb_internal_control_msg
 usb_start_wait_urb
  usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
 hub_set_address
  xhci_address_device
   xhci_setup_device

Mathias Nyman explains the current behaviour violates the XHCI spec:

 hub_port_reset() will end up moving the corresponding xhci device slot
 to default state.

 As hub_port_reset() is called several times in hub_port_init() it
 sounds reasonable that we could end up with two threads having their
 xhci device slots in default state at the same time, which according to
 xhci 4.5.3 specs still is a big no no:

 "Note: Software shall not transition more than one Device Slot to the
  Default State at a time"

 So both threads fail at their next task after this.
 One fails to read the descriptor, and the other fails addressing the
 device.

Fix this in hub_port_init by locking the USB controller (instead of an
individual bus) to prevent simultaneous initialisation of both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Link: https://lkml.org/lkml/2016/2/8/312
Link: https://lkml.org/lkml/2016/2/4/748
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
This is technically the same as the last patch version, but with a reworked
commit message to better explain the cause of the issue.
---
 drivers/usb/core/hcd.c  | 15 +--
 drivers/usb/core/hub.c  |  8 
 include/linux/usb.h |  3 +--
 include/linux/usb/hcd.h |  1 +
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 2ca2cef7f681..980fc5774151 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -994,7 +994,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(>usb_address0_mutex);
+   mutex_init(>devnum_next_mutex);
 }
 
 /*-*/
@@ -2521,6 +2521,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
return NULL;
}
if (primary_hcd == NULL) {
+   hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex),
+   GFP_KERNEL);
+   if (!hcd->address0_mutex) {
+   kfree(hcd);
+   dev_dbg(dev, "hcd address0 mutex alloc failed\n");
+   return NULL;
+   }
+   mutex_init(hcd->address0_mutex);
hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex),
GFP_KERNEL);
if (!hcd->bandwidth_mutex) {
@@ -2532,6 +2540,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
dev_set_drvdata(dev, hcd);
} else {
mutex_lock(_port_peer_mutex);
+   hcd->address0_mutex = primary_hcd->address0_mutex;
hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex;
hcd->primary_hcd = primary_hcd;
primary_hcd->primary_hcd = primary_hcd;
@@ -2598,8 +2607,10 @@ static void hcd_release(struct kref *kref)
struct usb_h

[PATCH v3] usb: core: hub: hub_port_init lock controller instead of bus

2016-04-25 Thread Chris Bainbridge

The XHCI controller presents two USB buses to the system - one for USB2
and one for USB3. The hub init code (hub_port_init) is reentrant but
only locks one bus per thread, leading to a race condition failure when
two threads attempt to simultaneously initialise a USB2 and USB3 device:

[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
[   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this failure occurred on 6% of all boots.

The call traces at the point of failure are:

Call Trace:
 [] schedule+0x37/0x90
 [] usb_kill_urb+0x8d/0xd0
 [] ? wake_up_atomic_t+0x30/0x30
 [] usb_start_wait_urb+0xbe/0x150
 [] usb_control_msg+0xbc/0xf0
 [] hub_port_init+0x51e/0xb70
 [] hub_event+0x817/0x1570
 [] process_one_work+0x1ff/0x620
 [] ? process_one_work+0x15f/0x620
 [] worker_thread+0x64/0x4b0
 [] ? rescuer_thread+0x390/0x390
 [] kthread+0x105/0x120
 [] ? kthread_create_on_node+0x200/0x200
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x200/0x200

Call Trace:
 [] xhci_setup_device+0x53d/0xa40
 [] xhci_address_device+0xe/0x10
 [] hub_port_init+0x1bf/0xb70
 [] ? trace_hardirqs_on+0xd/0x10
 [] hub_event+0x817/0x1570
 [] process_one_work+0x1ff/0x620
 [] ? process_one_work+0x15f/0x620
 [] worker_thread+0x64/0x4b0
 [] ? rescuer_thread+0x390/0x390
 [] kthread+0x105/0x120
 [] ? kthread_create_on_node+0x200/0x200
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x200/0x200

Which results from the two call chains:

hub_port_init
 usb_get_device_descriptor
  usb_get_descriptor
   usb_control_msg
usb_internal_control_msg
 usb_start_wait_urb
  usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
 hub_set_address
  xhci_address_device
   xhci_setup_device

Mathias Nyman explains the current behaviour violates the XHCI spec:

 hub_port_reset() will end up moving the corresponding xhci device slot
 to default state.

 As hub_port_reset() is called several times in hub_port_init() it
 sounds reasonable that we could end up with two threads having their
 xhci device slots in default state at the same time, which according to
 xhci 4.5.3 specs still is a big no no:

 "Note: Software shall not transition more than one Device Slot to the
  Default State at a time"

 So both threads fail at their next task after this.
 One fails to read the descriptor, and the other fails addressing the
 device.

Fix this in hub_port_init by locking the USB controller (instead of an
individual bus) to prevent simultaneous initialisation of both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Link: https://lkml.org/lkml/2016/2/8/312
Link: https://lkml.org/lkml/2016/2/4/748
Signed-off-by: Chris Bainbridge 
---
This is technically the same as the last patch version, but with a reworked
commit message to better explain the cause of the issue.
---
 drivers/usb/core/hcd.c  | 15 +--
 drivers/usb/core/hub.c  |  8 
 include/linux/usb.h |  3 +--
 include/linux/usb/hcd.h |  1 +
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 2ca2cef7f681..980fc5774151 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -994,7 +994,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(>usb_address0_mutex);
+   mutex_init(>devnum_next_mutex);
 }
 
 /*-*/
@@ -2521,6 +2521,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
return NULL;
}
if (primary_hcd == NULL) {
+   hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex),
+   GFP_KERNEL);
+   if (!hcd->address0_mutex) {
+   kfree(hcd);
+   dev_dbg(dev, "hcd address0 mutex alloc failed\n");
+   return NULL;
+   }
+   mutex_init(hcd->address0_mutex);
hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex),
GFP_KERNEL);
if (!hcd->bandwidth_mutex) {
@@ -2532,6 +2540,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
dev_set_drvdata(dev, hcd);
} else {
mutex_lock(_port_peer_mutex);
+   hcd->address0_mutex = primary_hcd->address0_mutex;
hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex;
hcd->primary_hcd = primary_hcd;
primary_hcd->primary_hcd = primary_hcd;
@@ -2598,8 +2607,10 @@ static void hcd_release(struct kref *kref)
struct usb_hcd *hcd = container_of (kref, struct usb_hcd, k

Re: [PATCH] usb: core: hub: hub_port_init lock controller instead of bus

2016-04-07 Thread Chris Bainbridge

On Wed, Feb 10, 2016 at 07:13:38PM +0200, Mathias Nyman wrote:
> 
> Most likely xhci is messed up after two device slots are in default state at 
> the same time.
> This happens when both threads are in hub_port_init() have called 
> hub_port_reset()
> 
> The issue becomes visible when the the descriptor read and set address both 
> fail after
> the port resets.
> 
> xhci specs 4.5.3 has one tiny note about this:
> "Note: Software shall not transition more than one Device Slot to the Default 
> State at a time"
> 
> So to me, and from xhci pov this patch looks like the correct solution,
> but I might be missing some usb core side details.
> 
> -Mathias
> 

Just following up to see if this patch disappeared into the void?

Re: [PATCH] usb: core: hub: hub_port_init lock controller instead of bus

2016-04-07 Thread Chris Bainbridge

On Wed, Feb 10, 2016 at 07:13:38PM +0200, Mathias Nyman wrote:
> 
> Most likely xhci is messed up after two device slots are in default state at 
> the same time.
> This happens when both threads are in hub_port_init() have called 
> hub_port_reset()
> 
> The issue becomes visible when the the descriptor read and set address both 
> fail after
> the port resets.
> 
> xhci specs 4.5.3 has one tiny note about this:
> "Note: Software shall not transition more than one Device Slot to the Default 
> State at a time"
> 
> So to me, and from xhci pov this patch looks like the correct solution,
> but I might be missing some usb core side details.
> 
> -Mathias
> 

Just following up to see if this patch disappeared into the void?

[BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

2016-03-21 Thread Chris Bainbridge

Hi,

I was testing something on an old server (Dell T105 opteron) and noticed
packet loss after updating the kernel from 3.10 to 4.5. The test was:

On Dell run: iperf -s
On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000

This sends a 20mbit UDP stream to the Dell. It works fine normally (0%
packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
(35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss)

The packet loss bisected back to:

commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
Author: Borislav Petkov 
Date:   Fri Jul 19 12:28:25 2013 +0200

EDAC: Fix lockdep splat

I have confirmed that the commit preceding this (v3.11-rc1) is fine and
that 88d84a introduced the bug.

[BUG] packet loss with PROVE_LOCKING, bisected to EDAC fix

2016-03-21 Thread Chris Bainbridge

Hi,

I was testing something on an old server (Dell T105 opteron) and noticed
packet loss after updating the kernel from 3.10 to 4.5. The test was:

On Dell run: iperf -s
On another system: iperf3 -c dell -u -b 20M -l 1k -t 1000

This sends a 20mbit UDP stream to the Dell. It works fine normally (0%
packet loss), but when CONFIG_PROVE_LOCKING is enabled there is high
(35%) packet loss. (DEBUG_LOCKDEP also seems to cause packet loss)

The packet loss bisected back to:

commit 88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00
Author: Borislav Petkov 
Date:   Fri Jul 19 12:28:25 2013 +0200

EDAC: Fix lockdep splat

I have confirmed that the commit preceding this (v3.11-rc1) is fine and
that 88d84a introduced the bug.

[tip:x86/microcode] x86/microcode/intel: Change checksum variables to u32

2016-03-08 Thread tip-bot for Chris Bainbridge

Commit-ID:  bc864af13f34d19c911f5691d87bdacc9ce109f5
Gitweb: http://git.kernel.org/tip/bc864af13f34d19c911f5691d87bdacc9ce109f5
Author: Chris Bainbridge <chris.bainbri...@gmail.com>
AuthorDate: Mon, 7 Mar 2016 11:10:00 +0100
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 8 Mar 2016 09:08:44 +0100

x86/microcode/intel: Change checksum variables to u32

Microcode checksum verification should be done using unsigned 32-bit
values otherwise the calculation overflow results in undefined
behaviour.

This is also nicely documented in the SDM, section "Microcode Update
Checksum":

  "To check for a corrupt microcode update, software must perform a
  unsigned DWORD (32-bit) checksum of the microcode update. Even though
  some fields are signed, the checksum procedure treats all DWORDs as
  unsigned. Microcode updates with a header version equal to 0001H
  must sum all DWORDs that comprise the microcode update. A valid
  checksum check will yield a value of H."

but for some reason the code has been using ints from the very
beginning.

In practice, this bug possibly manifested itself only when doing the
microcode data checksum - apparently, currently shipped Intel microcode
doesn't have an extended signature table for which we do checksum
verification too.

  UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
  signed integer overflow:
  -1500151068 + -2125470173 cannot be represented in type 'int'
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495
  ...
  Call Trace:
   dump_stack
   ? inotify_ioctl
   ubsan_epilogue
   handle_overflow
   __ubsan_handle_add_overflow
   microcode_sanity_check
   get_matching_model_microcode.isra.2.constprop.8
   ? early_idt_handler_common
   ? strlcpy
   ? find_cpio_data
   load_ucode_intel_bsp
   load_ucode_bsp
   ? load_ucode_bsp
   x86_64_start_kernel

[ Expand and massage commit message. ]
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: h...@hmh.eng.br
Link: 
http://lkml.kernel.org/r/1456834359-5132-1-git-send-email-chris.bainbri...@gmail.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/kernel/cpu/microcode/intel_lib.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c 
b/arch/x86/kernel/cpu/microcode/intel_lib.c
index b96896b..99ca2c9 100644
--- a/arch/x86/kernel/cpu/microcode/intel_lib.c
+++ b/arch/x86/kernel/cpu/microcode/intel_lib.c
@@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err)
unsigned long total_size, data_size, ext_table_size;
struct microcode_header_intel *mc_header = mc;
struct extended_sigtable *ext_header = NULL;
-   int sum, orig_sum, ext_sigcount = 0, i;
+   u32 sum, orig_sum, ext_sigcount = 0, i;
struct extended_signature *ext_sig;
 
total_size = get_totalsize(mc_header);
@@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err)
 
/* check extended table checksum */
if (ext_table_size) {
-   int ext_table_sum = 0;
-   int *ext_tablep = (int *)ext_header;
+   u32 ext_table_sum = 0;
+   u32 *ext_tablep = (u32 *)ext_header;
 
i = ext_table_size / DWSIZE;
while (i--)
@@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err)
orig_sum = 0;
i = (MC_HEADER_SIZE + data_size) / DWSIZE;
while (i--)
-   orig_sum += ((int *)mc)[i];
+   orig_sum += ((u32 *)mc)[i];
if (orig_sum) {
if (print_err)
pr_err("aborting, bad checksum\n");

[tip:x86/microcode] x86/microcode/intel: Change checksum variables to u32

2016-03-08 Thread tip-bot for Chris Bainbridge

Commit-ID:  bc864af13f34d19c911f5691d87bdacc9ce109f5
Gitweb: http://git.kernel.org/tip/bc864af13f34d19c911f5691d87bdacc9ce109f5
Author: Chris Bainbridge 
AuthorDate: Mon, 7 Mar 2016 11:10:00 +0100
Committer:  Thomas Gleixner 
CommitDate: Tue, 8 Mar 2016 09:08:44 +0100

x86/microcode/intel: Change checksum variables to u32

Microcode checksum verification should be done using unsigned 32-bit
values otherwise the calculation overflow results in undefined
behaviour.

This is also nicely documented in the SDM, section "Microcode Update
Checksum":

  "To check for a corrupt microcode update, software must perform a
  unsigned DWORD (32-bit) checksum of the microcode update. Even though
  some fields are signed, the checksum procedure treats all DWORDs as
  unsigned. Microcode updates with a header version equal to 0001H
  must sum all DWORDs that comprise the microcode update. A valid
  checksum check will yield a value of H."

but for some reason the code has been using ints from the very
beginning.

In practice, this bug possibly manifested itself only when doing the
microcode data checksum - apparently, currently shipped Intel microcode
doesn't have an extended signature table for which we do checksum
verification too.

  UBSAN: Undefined behaviour in arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
  signed integer overflow:
  -1500151068 + -2125470173 cannot be represented in type 'int'
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495
  ...
  Call Trace:
   dump_stack
   ? inotify_ioctl
   ubsan_epilogue
   handle_overflow
   __ubsan_handle_add_overflow
   microcode_sanity_check
   get_matching_model_microcode.isra.2.constprop.8
   ? early_idt_handler_common
   ? strlcpy
   ? find_cpio_data
   load_ucode_intel_bsp
   load_ucode_bsp
   ? load_ucode_bsp
   x86_64_start_kernel

[ Expand and massage commit message. ]
Signed-off-by: Chris Bainbridge 
Signed-off-by: Borislav Petkov 
Cc: h...@hmh.eng.br
Link: 
http://lkml.kernel.org/r/1456834359-5132-1-git-send-email-chris.bainbri...@gmail.com
Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/cpu/microcode/intel_lib.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c 
b/arch/x86/kernel/cpu/microcode/intel_lib.c
index b96896b..99ca2c9 100644
--- a/arch/x86/kernel/cpu/microcode/intel_lib.c
+++ b/arch/x86/kernel/cpu/microcode/intel_lib.c
@@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err)
unsigned long total_size, data_size, ext_table_size;
struct microcode_header_intel *mc_header = mc;
struct extended_sigtable *ext_header = NULL;
-   int sum, orig_sum, ext_sigcount = 0, i;
+   u32 sum, orig_sum, ext_sigcount = 0, i;
struct extended_signature *ext_sig;
 
total_size = get_totalsize(mc_header);
@@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err)
 
/* check extended table checksum */
if (ext_table_size) {
-   int ext_table_sum = 0;
-   int *ext_tablep = (int *)ext_header;
+   u32 ext_table_sum = 0;
+   u32 *ext_tablep = (u32 *)ext_header;
 
i = ext_table_size / DWSIZE;
while (i--)
@@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err)
orig_sum = 0;
i = (MC_HEADER_SIZE + data_size) / DWSIZE;
while (i--)
-   orig_sum += ((int *)mc)[i];
+   orig_sum += ((u32 *)mc)[i];
if (orig_sum) {
if (print_err)
pr_err("aborting, bad checksum\n");

Re: [PATCH] ACPICA: Events: Execute some _REG methods in early boot

2016-03-07 Thread Chris Bainbridge

On Mon, Mar 07, 2016 at 06:36:05AM +, Zheng, Lv wrote:
> Hi,
> 
> First of all, why don't you respond on the kernel bugzilla?
> Posting a fix here directly without communication doesn't look like a 
> constructive help but more like a destructive attack to me.

Sorry if it appears to be an attack. It wasn't meant that way. I was
under the impression that email is the preferred means of communication
for kernel development. I'm not sure whether other developers actively
monitor bugzilla reports (some do, but bug trackers are often graveyards
for bug reports and it's easy for communications to be missed).

> As I said in the previous reply, this is the known issue and can be fixed by 
> applying the whole series.
> Especially this commit:
> https://patchwork.kernel.org/patch/8241421/
> That's why I asked you to test again by applying the whole series.
> And that's why after having not seen your response for so long time, we 
> prepared a test branch and was waiting for your response.

I already replied 9 days ago: https://lkml.org/lkml/2016/2/27/164
The suggested patch did not fix the issue and the patch series did not
apply cleanly.

(btw I'm not paid for this work so I tend to handle it in batches when I
have spare time, which is why you may see replies delayed to weekends etc.)

> You need to post acpidump there to have the issue root caused so that more 
> accurate fix can be generated.

I already sent an acpidump for this system to you and Robert (email 3rd Feb).

> The above fix looks hackish to me.
> IMO, if you want to stop regressions that are triggered by this commit.
> A simpler and proper way would be to move acpi_gbl_reg_method_enabled = TRUE 
> to the end of acpi_load_tables().
> So that when the order of table loading and ECDT probing is corrected, the 
> correct logic can still apply.
> 
> I don't have ECDT platforms in hand to confirm.
> Can you help to just give it a try?

Yes the fix may be hackish, but it served the purpose of correctly
identifying the bug. Your suggestion works fine, for reference the
tested patch was:


diff --git a/drivers/acpi/acpica/tbxfload.c b/drivers/acpi/acpica/tbxfload.c
index 278666e39563..9136d7250022 100644
--- a/drivers/acpi/acpica/tbxfload.c
+++ b/drivers/acpi/acpica/tbxfload.c
@@ -83,6 +83,7 @@ acpi_status __init acpi_load_tables(void)
"While loading namespace from ACPI tables"));
}
 
+   acpi_gbl_reg_methods_enabled = TRUE;
return_ACPI_STATUS(status);
 }
 
diff --git a/drivers/acpi/acpica/utxfinit.c b/drivers/acpi/acpica/utxfinit.c
index 721b87cce908..2c0491038068 100644
--- a/drivers/acpi/acpica/utxfinit.c
+++ b/drivers/acpi/acpica/utxfinit.c
@@ -267,7 +267,6 @@ acpi_status __init acpi_initialize_objects(u32 flags)
 * initialized, even if they contain executable AML (see the call to
 * acpi_ns_initialize_objects below).
 */
-   acpi_gbl_reg_methods_enabled = TRUE;
if (!(flags & ACPI_NO_ADDRESS_SPACE_INIT)) {
ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
  "[Init] Executing _REG OpRegion methods\n"));


With this patch applied the ODEBUG errors do not occur.

Thanks.

Re: [PATCH] ACPICA: Events: Execute some _REG methods in early boot

2016-03-07 Thread Chris Bainbridge

On Mon, Mar 07, 2016 at 06:36:05AM +, Zheng, Lv wrote:
> Hi,
> 
> First of all, why don't you respond on the kernel bugzilla?
> Posting a fix here directly without communication doesn't look like a 
> constructive help but more like a destructive attack to me.

Sorry if it appears to be an attack. It wasn't meant that way. I was
under the impression that email is the preferred means of communication
for kernel development. I'm not sure whether other developers actively
monitor bugzilla reports (some do, but bug trackers are often graveyards
for bug reports and it's easy for communications to be missed).

> As I said in the previous reply, this is the known issue and can be fixed by 
> applying the whole series.
> Especially this commit:
> https://patchwork.kernel.org/patch/8241421/
> That's why I asked you to test again by applying the whole series.
> And that's why after having not seen your response for so long time, we 
> prepared a test branch and was waiting for your response.

I already replied 9 days ago: https://lkml.org/lkml/2016/2/27/164
The suggested patch did not fix the issue and the patch series did not
apply cleanly.

(btw I'm not paid for this work so I tend to handle it in batches when I
have spare time, which is why you may see replies delayed to weekends etc.)

> You need to post acpidump there to have the issue root caused so that more 
> accurate fix can be generated.

I already sent an acpidump for this system to you and Robert (email 3rd Feb).

> The above fix looks hackish to me.
> IMO, if you want to stop regressions that are triggered by this commit.
> A simpler and proper way would be to move acpi_gbl_reg_method_enabled = TRUE 
> to the end of acpi_load_tables().
> So that when the order of table loading and ECDT probing is corrected, the 
> correct logic can still apply.
> 
> I don't have ECDT platforms in hand to confirm.
> Can you help to just give it a try?

Yes the fix may be hackish, but it served the purpose of correctly
identifying the bug. Your suggestion works fine, for reference the
tested patch was:


diff --git a/drivers/acpi/acpica/tbxfload.c b/drivers/acpi/acpica/tbxfload.c
index 278666e39563..9136d7250022 100644
--- a/drivers/acpi/acpica/tbxfload.c
+++ b/drivers/acpi/acpica/tbxfload.c
@@ -83,6 +83,7 @@ acpi_status __init acpi_load_tables(void)
"While loading namespace from ACPI tables"));
}
 
+   acpi_gbl_reg_methods_enabled = TRUE;
return_ACPI_STATUS(status);
 }
 
diff --git a/drivers/acpi/acpica/utxfinit.c b/drivers/acpi/acpica/utxfinit.c
index 721b87cce908..2c0491038068 100644
--- a/drivers/acpi/acpica/utxfinit.c
+++ b/drivers/acpi/acpica/utxfinit.c
@@ -267,7 +267,6 @@ acpi_status __init acpi_initialize_objects(u32 flags)
 * initialized, even if they contain executable AML (see the call to
 * acpi_ns_initialize_objects below).
 */
-   acpi_gbl_reg_methods_enabled = TRUE;
if (!(flags & ACPI_NO_ADDRESS_SPACE_INIT)) {
ACPI_DEBUG_PRINT((ACPI_DB_EXEC,
  "[Init] Executing _REG OpRegion methods\n"));


With this patch applied the ODEBUG errors do not occur.

Thanks.

[PATCH] ACPICA: Events: Execute some _REG methods in early boot

2016-03-05 Thread Chris Bainbridge

The regression caused _REG methods to not be run in early boot for all
space IDs, but a removed comment stated that _REG methods should be
executed for IDs like embedded_controller. Before the regression this
was the case:

[0.230469] Executing  Method   \_SB.PCI0.LPCB.EC._REG
[0.230531] Initializing Region   \GNVS
[0.230607] Initializing Region   \_SB.PCI0.LPCB.EC.ECOR
[0.231043] Initializing Region   \_SB.PCI0.IGPU.IGDM

After the regression the initialisation is not done and ODEBUG warnings
are shown at boot and/or shutdown:

[6.676570] WARNING: CPU: 0 PID: 3317 at lib/debugobjects.c:263 
debug_print_object+0x85/0xa0()
[6.676576] ODEBUG: assert_init not available (active state 0) object type: 
timer_list hint: stub_timer+0x0/0x20
[6.676578] Modules linked in:
[6.676582] CPU: 0 PID: 3317 Comm: ccpd Not tainted 4.5.0-rc6 #509
[6.676584] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[6.676586]   880256543db0 814802b5 
880256543df8
[6.676590]  81f40dbd 880256543de8 810bd0ec 
880256543e90
[6.676594]  822532c0 81f40e63 83138c88 
01fdf040
[6.676598] Call Trace:
[6.676603]  [] dump_stack+0x67/0x92
[6.676608]  [] warn_slowpath_common+0x7c/0xb0
[6.676611]  [] warn_slowpath_fmt+0x47/0x50
[6.676614]  [] debug_print_object+0x85/0xa0
[6.676616]  [] ? cascade+0x70/0x70
[6.676620]  [] debug_object_assert_init+0xf8/0x130
[6.676624]  [] ? trace_hardirqs_on+0xd/0x10
[6.676626]  [] del_timer+0x1f/0x70
[6.676631]  [] laptop_sync_completion+0x61/0x100
[6.676633]  [] ? laptop_io_completion+0x30/0x30
[6.676637]  [] sys_sync+0x7f/0x90
[6.676641]  [] entry_SYSCALL_64_fastpath+0x12/0x6f

Fixes: efaed9be998b ("ACPICA: Events: Enhance acpi_ev_execute_reg_method() to 
ensure no _REG evaluations can happen during OS early boot stages")
Link: https://lkml.org/lkml/2016/2/3/273
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 drivers/acpi/acpica/evregion.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c
index 47092b4d633c..d0b02ef0effa 100644
--- a/drivers/acpi/acpica/evregion.c
+++ b/drivers/acpi/acpica/evregion.c
@@ -590,6 +590,7 @@ acpi_ev_execute_reg_method(union acpi_operand_object 
*region_obj, u32 function)
union acpi_operand_object *args[3];
union acpi_operand_object *region_obj2;
acpi_status status;
+   bool sp;
 
ACPI_FUNCTION_TRACE(ev_execute_reg_method);
 
@@ -598,9 +599,28 @@ acpi_ev_execute_reg_method(union acpi_operand_object 
*region_obj, u32 function)
return_ACPI_STATUS(AE_NOT_EXIST);
}
 
+   /*
+* For the default space_IDs, (the IDs for which there are default 
region handlers
+* installed) Only execute the _REG methods if the global 
initialization _REG
+* methods have already been run (via acpi_initialize_objects). In 
other words,
+* we will defer the execution of the _REG methods for these space_IDs 
until
+* execution of acpi_initialize_objects. This is done because we need 
the handlers
+* for the default spaces (mem/io/pci/table) to be installed before we 
can run
+* any control methods (or _REG methods). There is known BIOS code that 
depends
+* on this.
+*
+* For all other space_IDs, we can safely execute the _REG methods 
immediately.
+* This means that for IDs like embedded_controller, this function 
should be called
+* only after acpi_enable_subsystem has been called.
+*/
+
+   sp = (region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY ||
+ region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_IO ||
+ region_obj->region.space_id == ACPI_ADR_SPACE_PCI_CONFIG ||
+ region_obj->region.space_id == ACPI_ADR_SPACE_DATA_TABLE);
if (region_obj2->extra.method_REG == NULL ||
region_obj->region.handler == NULL ||
-   !acpi_gbl_reg_methods_enabled) {
+   (sp && !acpi_gbl_reg_methods_enabled)) {
return_ACPI_STATUS(AE_OK);
}
 
-- 
2.1.4

[PATCH] ACPICA: Events: Execute some _REG methods in early boot

2016-03-05 Thread Chris Bainbridge

The regression caused _REG methods to not be run in early boot for all
space IDs, but a removed comment stated that _REG methods should be
executed for IDs like embedded_controller. Before the regression this
was the case:

[0.230469] Executing  Method   \_SB.PCI0.LPCB.EC._REG
[0.230531] Initializing Region   \GNVS
[0.230607] Initializing Region   \_SB.PCI0.LPCB.EC.ECOR
[0.231043] Initializing Region   \_SB.PCI0.IGPU.IGDM

After the regression the initialisation is not done and ODEBUG warnings
are shown at boot and/or shutdown:

[6.676570] WARNING: CPU: 0 PID: 3317 at lib/debugobjects.c:263 
debug_print_object+0x85/0xa0()
[6.676576] ODEBUG: assert_init not available (active state 0) object type: 
timer_list hint: stub_timer+0x0/0x20
[6.676578] Modules linked in:
[6.676582] CPU: 0 PID: 3317 Comm: ccpd Not tainted 4.5.0-rc6 #509
[6.676584] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[6.676586]   880256543db0 814802b5 
880256543df8
[6.676590]  81f40dbd 880256543de8 810bd0ec 
880256543e90
[6.676594]  822532c0 81f40e63 83138c88 
01fdf040
[6.676598] Call Trace:
[6.676603]  [] dump_stack+0x67/0x92
[6.676608]  [] warn_slowpath_common+0x7c/0xb0
[6.676611]  [] warn_slowpath_fmt+0x47/0x50
[6.676614]  [] debug_print_object+0x85/0xa0
[6.676616]  [] ? cascade+0x70/0x70
[6.676620]  [] debug_object_assert_init+0xf8/0x130
[6.676624]  [] ? trace_hardirqs_on+0xd/0x10
[6.676626]  [] del_timer+0x1f/0x70
[6.676631]  [] laptop_sync_completion+0x61/0x100
[6.676633]  [] ? laptop_io_completion+0x30/0x30
[6.676637]  [] sys_sync+0x7f/0x90
[6.676641]  [] entry_SYSCALL_64_fastpath+0x12/0x6f

Fixes: efaed9be998b ("ACPICA: Events: Enhance acpi_ev_execute_reg_method() to 
ensure no _REG evaluations can happen during OS early boot stages")
Link: https://lkml.org/lkml/2016/2/3/273
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Chris Bainbridge 
---
 drivers/acpi/acpica/evregion.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c
index 47092b4d633c..d0b02ef0effa 100644
--- a/drivers/acpi/acpica/evregion.c
+++ b/drivers/acpi/acpica/evregion.c
@@ -590,6 +590,7 @@ acpi_ev_execute_reg_method(union acpi_operand_object 
*region_obj, u32 function)
union acpi_operand_object *args[3];
union acpi_operand_object *region_obj2;
acpi_status status;
+   bool sp;
 
ACPI_FUNCTION_TRACE(ev_execute_reg_method);
 
@@ -598,9 +599,28 @@ acpi_ev_execute_reg_method(union acpi_operand_object 
*region_obj, u32 function)
return_ACPI_STATUS(AE_NOT_EXIST);
}
 
+   /*
+* For the default space_IDs, (the IDs for which there are default 
region handlers
+* installed) Only execute the _REG methods if the global 
initialization _REG
+* methods have already been run (via acpi_initialize_objects). In 
other words,
+* we will defer the execution of the _REG methods for these space_IDs 
until
+* execution of acpi_initialize_objects. This is done because we need 
the handlers
+* for the default spaces (mem/io/pci/table) to be installed before we 
can run
+* any control methods (or _REG methods). There is known BIOS code that 
depends
+* on this.
+*
+* For all other space_IDs, we can safely execute the _REG methods 
immediately.
+* This means that for IDs like embedded_controller, this function 
should be called
+* only after acpi_enable_subsystem has been called.
+*/
+
+   sp = (region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY ||
+ region_obj->region.space_id == ACPI_ADR_SPACE_SYSTEM_IO ||
+ region_obj->region.space_id == ACPI_ADR_SPACE_PCI_CONFIG ||
+ region_obj->region.space_id == ACPI_ADR_SPACE_DATA_TABLE);
if (region_obj2->extra.method_REG == NULL ||
region_obj->region.handler == NULL ||
-   !acpi_gbl_reg_methods_enabled) {
+   (sp && !acpi_gbl_reg_methods_enabled)) {
return_ACPI_STATUS(AE_OK);
}
 
-- 
2.1.4

[PATCH v2] x86/microcode: Change checksum to u32

2016-03-01 Thread Chris Bainbridge

Checksum should be unsigned 32-bit otherwise the calculation overflow
results in undefined behaviour:

[0.00] 

[0.00] UBSAN: Undefined behaviour in 
arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
[0.00] signed integer overflow:
[0.00] -1500151068 + -2125470173 cannot be represented in type 'int'
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495
[0.00]  0086 0086  
83203968
[0.00]  81b30952 834c43b8 83203998 
814fe623
[0.00]  83203980 81bcdf2d 8339a448 
83203a08
[0.00] Call Trace:
[0.00]  [] dump_stack+0x4e/0x6c
[0.00]  [] ? inotify_ioctl+0x43/0x1c0
[0.00]  [] ubsan_epilogue+0xd/0x40
[0.00]  [] handle_overflow+0xbd/0xe0
[0.00]  [] __ubsan_handle_add_overflow+0xe/0x10
[0.00]  [] microcode_sanity_check+0x405/0x590
[0.00]  [] 
get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345
[0.00]  [] ? early_idt_handler_common+0x3d/0xae
[0.00]  [] ? strlcpy+0x52/0xa0
[0.00]  [] ? find_cpio_data+0x371/0x510
[0.00]  [] load_ucode_intel_bsp+0xa1/0xe5
[0.00]  [] load_ucode_bsp+0xdf/0xf3
[0.00]  [] ? load_ucode_bsp+0xdf/0xf3
[0.00]  [] x86_64_start_kernel+0xd1/0xee
[0.00] 


Link: https://lkml.org/lkml/2016/2/27/79
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 arch/x86/kernel/cpu/microcode/intel_lib.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c 
b/arch/x86/kernel/cpu/microcode/intel_lib.c
index b96896bcbdaf..99ca2c935777 100644
--- a/arch/x86/kernel/cpu/microcode/intel_lib.c
+++ b/arch/x86/kernel/cpu/microcode/intel_lib.c
@@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err)
unsigned long total_size, data_size, ext_table_size;
struct microcode_header_intel *mc_header = mc;
struct extended_sigtable *ext_header = NULL;
-   int sum, orig_sum, ext_sigcount = 0, i;
+   u32 sum, orig_sum, ext_sigcount = 0, i;
struct extended_signature *ext_sig;
 
total_size = get_totalsize(mc_header);
@@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err)
 
/* check extended table checksum */
if (ext_table_size) {
-   int ext_table_sum = 0;
-   int *ext_tablep = (int *)ext_header;
+   u32 ext_table_sum = 0;
+   u32 *ext_tablep = (u32 *)ext_header;
 
i = ext_table_size / DWSIZE;
while (i--)
@@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err)
orig_sum = 0;
i = (MC_HEADER_SIZE + data_size) / DWSIZE;
while (i--)
-   orig_sum += ((int *)mc)[i];
+   orig_sum += ((u32 *)mc)[i];
if (orig_sum) {
if (print_err)
pr_err("aborting, bad checksum\n");
-- 
2.1.4

[PATCH v2] x86/microcode: Change checksum to u32

2016-03-01 Thread Chris Bainbridge

Checksum should be unsigned 32-bit otherwise the calculation overflow
results in undefined behaviour:

[0.00] 

[0.00] UBSAN: Undefined behaviour in 
arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
[0.00] signed integer overflow:
[0.00] -1500151068 + -2125470173 cannot be represented in type 'int'
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495
[0.00]  0086 0086  
83203968
[0.00]  81b30952 834c43b8 83203998 
814fe623
[0.00]  83203980 81bcdf2d 8339a448 
83203a08
[0.00] Call Trace:
[0.00]  [] dump_stack+0x4e/0x6c
[0.00]  [] ? inotify_ioctl+0x43/0x1c0
[0.00]  [] ubsan_epilogue+0xd/0x40
[0.00]  [] handle_overflow+0xbd/0xe0
[0.00]  [] __ubsan_handle_add_overflow+0xe/0x10
[0.00]  [] microcode_sanity_check+0x405/0x590
[0.00]  [] 
get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345
[0.00]  [] ? early_idt_handler_common+0x3d/0xae
[0.00]  [] ? strlcpy+0x52/0xa0
[0.00]  [] ? find_cpio_data+0x371/0x510
[0.00]  [] load_ucode_intel_bsp+0xa1/0xe5
[0.00]  [] load_ucode_bsp+0xdf/0xf3
[0.00]  [] ? load_ucode_bsp+0xdf/0xf3
[0.00]  [] x86_64_start_kernel+0xd1/0xee
[0.00] 


Link: https://lkml.org/lkml/2016/2/27/79
Signed-off-by: Chris Bainbridge 
---
 arch/x86/kernel/cpu/microcode/intel_lib.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c 
b/arch/x86/kernel/cpu/microcode/intel_lib.c
index b96896bcbdaf..99ca2c935777 100644
--- a/arch/x86/kernel/cpu/microcode/intel_lib.c
+++ b/arch/x86/kernel/cpu/microcode/intel_lib.c
@@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err)
unsigned long total_size, data_size, ext_table_size;
struct microcode_header_intel *mc_header = mc;
struct extended_sigtable *ext_header = NULL;
-   int sum, orig_sum, ext_sigcount = 0, i;
+   u32 sum, orig_sum, ext_sigcount = 0, i;
struct extended_signature *ext_sig;
 
total_size = get_totalsize(mc_header);
@@ -85,8 +85,8 @@ int microcode_sanity_check(void *mc, int print_err)
 
/* check extended table checksum */
if (ext_table_size) {
-   int ext_table_sum = 0;
-   int *ext_tablep = (int *)ext_header;
+   u32 ext_table_sum = 0;
+   u32 *ext_tablep = (u32 *)ext_header;
 
i = ext_table_size / DWSIZE;
while (i--)
@@ -102,7 +102,7 @@ int microcode_sanity_check(void *mc, int print_err)
orig_sum = 0;
i = (MC_HEADER_SIZE + data_size) / DWSIZE;
while (i--)
-   orig_sum += ((int *)mc)[i];
+   orig_sum += ((u32 *)mc)[i];
if (orig_sum) {
if (print_err)
pr_err("aborting, bad checksum\n");
-- 
2.1.4

Re: [BUG] ODEBUG: assert_init not available (active state 0)

2016-02-27 Thread Chris Bainbridge

On Tue, Feb 23, 2016 at 02:18:22AM +, Zheng, Lv wrote:
> 
> In fact, I don't understand what your report is.
> What did you mean by referring the following commit:
> > ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG
> > evaluations can happen during OS early boot stages
> Was this a bisection result for an issue?
> And what was the issue?

Yes it was a bisection result for the ODEBUG errors.

> If the issue is the following warning messages:
> > [   34.512758] [ cut here ]
> > [   34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263
> > debug_print_object+0x85/0xa0()
> > [   34.512770] ODEBUG: assert_init not available (active state 0) object 
> > type:
> > timer_list hint: stub_timer+0x0/0x20
> > [   34.512772] Modules linked in:
> > [   34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353
> > [   34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac-
> > AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
> > [   34.512779]  81f9b41c 880227a1bdb0 814ec829
> > 880227a1bdf8
> > [   34.512782]  880227a1bde8 810cd831 880227a1be90
> > 822514c0
> > [   34.512785]  81f9b4c2 8327e288 7f29190ba700
> > 880227a1be48
> > [   34.512786] Call Trace:
> > [   34.512790]  [] dump_stack+0x4b/0x72
> > [   34.512793]  [] warn_slowpath_common+0x81/0xc0
> > [   34.512795]  [] warn_slowpath_fmt+0x47/0x50
> > [   34.512798]  [] ? do_init_timer+0x52/0x60
> > [   34.512800]  [] debug_print_object+0x85/0xa0
> > [   34.512802]  [] ?
> > trace_event_raw_event_tick_stop+0x100/0x100
> > [   34.512805]  [] debug_object_assert_init+0xf8/0x130
> > [   34.512807]  [] ? trace_hardirqs_on+0xd/0x10
> > [   34.512810]  [] del_timer+0x1f/0x70
> > [   34.512813]  [] laptop_sync_completion+0x61/0x100
> > [   34.512815]  [] ? laptop_io_completion+0x30/0x30
> > [   34.512819]  [] sys_sync+0x7f/0x90
> > [   34.512824]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
> > [   34.512825] ---[ end trace 8fe52cbaccc47e66 ]---
> 
> Our investigation result shows this is caused by a multiple deletion on the 
> same Linux kernel timer.
> And the commit you reported shouldn't be the culprit.
> So it looks to me like a wrong bisection result if it was a bisection result.

I've checked again and the bisect result is correct. It could be that
the issue was always there, but this error is new on my system at least.

> While the following commit may trigger such breakage:
> > ACPICA: Parser: Fix for SuperName method invocation
> So I think you need to test again after reverting this patch.
> The patch is actually reverted in ACPICA 20160212 release.
> You can use the following branch where ACPICA 20160212 release is applied:
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/?h=acpica-test
> test it again to confirm if such breakage still can be seen.

That branch also gives the ODEBUG error.

> If the issue is some graphical glitches.
> As I said, the commit is just a part of whole series that try to make Linux 
> ACPICA initialization sequences correctly ordered.
> It, along with other patches try to make Linux working as spec compliant 
> (also proven by Windows behavior).
> So you really need to apply more patches on top of acpica-test branch to 
> confirm again.
> For example, patches from this series:
> http://www.spinics.net/lists/linux-acpi/msg63550.html

Patch series does not apply cleanly (3 4 5 6 12 14 fail)

> Especially this one:
> http://www.spinics.net/lists/linux-acpi/msg63558.html

No change, ODEBUG error still occurs.


To sum up test results:

v4.4-rc7-48-g849c25719ac6
- ok

v4.4-rc7-49-gefaed9be998b
- odebug error

acpia-test
- odebug error

http://www.spinics.net/lists/linux-acpi/msg63550.html
- patch series does not apply cleanly (3 4 5 6 12 14 fail)

http://www.spinics.net/lists/linux-acpi/msg63558.html (patch7)
- odebug error

Re: [BUG] ODEBUG: assert_init not available (active state 0)

2016-02-27 Thread Chris Bainbridge

On Tue, Feb 23, 2016 at 02:18:22AM +, Zheng, Lv wrote:
> 
> In fact, I don't understand what your report is.
> What did you mean by referring the following commit:
> > ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG
> > evaluations can happen during OS early boot stages
> Was this a bisection result for an issue?
> And what was the issue?

Yes it was a bisection result for the ODEBUG errors.

> If the issue is the following warning messages:
> > [   34.512758] [ cut here ]
> > [   34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263
> > debug_print_object+0x85/0xa0()
> > [   34.512770] ODEBUG: assert_init not available (active state 0) object 
> > type:
> > timer_list hint: stub_timer+0x0/0x20
> > [   34.512772] Modules linked in:
> > [   34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353
> > [   34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac-
> > AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
> > [   34.512779]  81f9b41c 880227a1bdb0 814ec829
> > 880227a1bdf8
> > [   34.512782]  880227a1bde8 810cd831 880227a1be90
> > 822514c0
> > [   34.512785]  81f9b4c2 8327e288 7f29190ba700
> > 880227a1be48
> > [   34.512786] Call Trace:
> > [   34.512790]  [] dump_stack+0x4b/0x72
> > [   34.512793]  [] warn_slowpath_common+0x81/0xc0
> > [   34.512795]  [] warn_slowpath_fmt+0x47/0x50
> > [   34.512798]  [] ? do_init_timer+0x52/0x60
> > [   34.512800]  [] debug_print_object+0x85/0xa0
> > [   34.512802]  [] ?
> > trace_event_raw_event_tick_stop+0x100/0x100
> > [   34.512805]  [] debug_object_assert_init+0xf8/0x130
> > [   34.512807]  [] ? trace_hardirqs_on+0xd/0x10
> > [   34.512810]  [] del_timer+0x1f/0x70
> > [   34.512813]  [] laptop_sync_completion+0x61/0x100
> > [   34.512815]  [] ? laptop_io_completion+0x30/0x30
> > [   34.512819]  [] sys_sync+0x7f/0x90
> > [   34.512824]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
> > [   34.512825] ---[ end trace 8fe52cbaccc47e66 ]---
> 
> Our investigation result shows this is caused by a multiple deletion on the 
> same Linux kernel timer.
> And the commit you reported shouldn't be the culprit.
> So it looks to me like a wrong bisection result if it was a bisection result.

I've checked again and the bisect result is correct. It could be that
the issue was always there, but this error is new on my system at least.

> While the following commit may trigger such breakage:
> > ACPICA: Parser: Fix for SuperName method invocation
> So I think you need to test again after reverting this patch.
> The patch is actually reverted in ACPICA 20160212 release.
> You can use the following branch where ACPICA 20160212 release is applied:
> https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/?h=acpica-test
> test it again to confirm if such breakage still can be seen.

That branch also gives the ODEBUG error.

> If the issue is some graphical glitches.
> As I said, the commit is just a part of whole series that try to make Linux 
> ACPICA initialization sequences correctly ordered.
> It, along with other patches try to make Linux working as spec compliant 
> (also proven by Windows behavior).
> So you really need to apply more patches on top of acpica-test branch to 
> confirm again.
> For example, patches from this series:
> http://www.spinics.net/lists/linux-acpi/msg63550.html

Patch series does not apply cleanly (3 4 5 6 12 14 fail)

> Especially this one:
> http://www.spinics.net/lists/linux-acpi/msg63558.html

No change, ODEBUG error still occurs.


To sum up test results:

v4.4-rc7-48-g849c25719ac6
- ok

v4.4-rc7-49-gefaed9be998b
- odebug error

acpia-test
- odebug error

http://www.spinics.net/lists/linux-acpi/msg63550.html
- patch series does not apply cleanly (3 4 5 6 12 14 fail)

http://www.spinics.net/lists/linux-acpi/msg63558.html (patch7)
- odebug error

Re: [PATCH] x86/microcode: Change checksum to u32

2016-02-27 Thread Chris Bainbridge

On Sat, Feb 27, 2016 at 06:51:18PM +0100, Borislav Petkov wrote:
> On Sat, Feb 27, 2016 at 11:01:47AM +0000, Chris Bainbridge wrote:
> > Checksum should be unsigned 32-bit otherwise the calculation overflows
> > resulting in undefined behaviour:
> > 
> > [0.00] 
> > 
> > [0.00] UBSAN: Undefined behaviour in 
> > arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
> > [0.00] signed integer overflow:
> > [0.00] -1500151068 + -2125470173 cannot be represented in type 'int'
> 
> So I've been staring at this error message for a while now and I'm
> having hard time understanding what it is telling me. Let's look at all
> three ints:
> 
> * sum is computed here:
> 
> sum = orig_sum
> - (mc_header->sig + mc_header->pf + mc_header->cksum)
> + (ext_sig->sig + ext_sig->pf + ext_sig->cksum);
> 
> and checks mc_header against each extended signature. If it overflows, we 
> abort:
> 
> if (sum) {
> if (print_err)
> pr_err("aborting, bad checksum\n");
> 
> 
> * orig_sum in the above sum can only be 0:
> 
> if (orig_sum) {
> if (print_err)
> pr_err("aborting, bad checksum\n");
> return -EINVAL;
> 
> and it adds a bunch of integers which can overflow, sure, but if it
> overflows, we exit early.
> 
> * and
> 
>   ext_sigcount = ext_header->count;
> 
> so it is some count of extended signatures.
> 
> So what is ubsan complaining about?

/* calculate the checksum */
orig_sum = 0;
i = (MC_HEADER_SIZE + data_size) / DWSIZE;
while (i--)
orig_sum += ((int *)mc)[i];

The checksum is the additive sum of all the 32-bit values, but 32-bit
addition overflow is undefined for signed ints. For comparison the
checksum function from iucode-tool uses u32:

intel_ucode_status_t intel_ucode_check_microcode(const void * const uc,
int strict)
{
...

uint32_t sum, orig_sum;
unsigned int i;
uint32_t *p;

...

/* Calculate the checksum.  We exclude the extended table as it
 * also has to have a zero checksum, in order to get better
 * coverage */
orig_sum = 0;
i = (INTEL_UC_V1_HEADER_SIZE + data_size) / sizeof(uint32_t);
p = (uint32_t *)uc;
while (i--) {
orig_sum  += *p;
p++;
}

Re: [PATCH] x86/microcode: Change checksum to u32

2016-02-27 Thread Chris Bainbridge

On Sat, Feb 27, 2016 at 06:51:18PM +0100, Borislav Petkov wrote:
> On Sat, Feb 27, 2016 at 11:01:47AM +0000, Chris Bainbridge wrote:
> > Checksum should be unsigned 32-bit otherwise the calculation overflows
> > resulting in undefined behaviour:
> > 
> > [0.00] 
> > 
> > [0.00] UBSAN: Undefined behaviour in 
> > arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
> > [0.00] signed integer overflow:
> > [0.00] -1500151068 + -2125470173 cannot be represented in type 'int'
> 
> So I've been staring at this error message for a while now and I'm
> having hard time understanding what it is telling me. Let's look at all
> three ints:
> 
> * sum is computed here:
> 
> sum = orig_sum
> - (mc_header->sig + mc_header->pf + mc_header->cksum)
> + (ext_sig->sig + ext_sig->pf + ext_sig->cksum);
> 
> and checks mc_header against each extended signature. If it overflows, we 
> abort:
> 
> if (sum) {
> if (print_err)
> pr_err("aborting, bad checksum\n");
> 
> 
> * orig_sum in the above sum can only be 0:
> 
> if (orig_sum) {
> if (print_err)
> pr_err("aborting, bad checksum\n");
> return -EINVAL;
> 
> and it adds a bunch of integers which can overflow, sure, but if it
> overflows, we exit early.
> 
> * and
> 
>   ext_sigcount = ext_header->count;
> 
> so it is some count of extended signatures.
> 
> So what is ubsan complaining about?

/* calculate the checksum */
orig_sum = 0;
i = (MC_HEADER_SIZE + data_size) / DWSIZE;
while (i--)
orig_sum += ((int *)mc)[i];

The checksum is the additive sum of all the 32-bit values, but 32-bit
addition overflow is undefined for signed ints. For comparison the
checksum function from iucode-tool uses u32:

intel_ucode_status_t intel_ucode_check_microcode(const void * const uc,
int strict)
{
...

uint32_t sum, orig_sum;
unsigned int i;
uint32_t *p;

...

/* Calculate the checksum.  We exclude the extended table as it
 * also has to have a zero checksum, in order to get better
 * coverage */
orig_sum = 0;
i = (INTEL_UC_V1_HEADER_SIZE + data_size) / sizeof(uint32_t);
p = (uint32_t *)uc;
while (i--) {
orig_sum  += *p;
p++;
}

[PATCH] x86/microcode: Change checksum to u32

2016-02-27 Thread Chris Bainbridge

Checksum should be unsigned 32-bit otherwise the calculation overflows
resulting in undefined behaviour:

[0.00] 

[0.00] UBSAN: Undefined behaviour in 
arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
[0.00] signed integer overflow:
[0.00] -1500151068 + -2125470173 cannot be represented in type 'int'
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495
[0.00]  0086 0086  
83203968
[0.00]  81b30952 834c43b8 83203998 
814fe623
[0.00]  83203980 81bcdf2d 8339a448 
83203a08
[0.00] Call Trace:
[0.00]  [] dump_stack+0x4e/0x6c
[0.00]  [] ? inotify_ioctl+0x43/0x1c0
[0.00]  [] ubsan_epilogue+0xd/0x40
[0.00]  [] handle_overflow+0xbd/0xe0
[0.00]  [] __ubsan_handle_add_overflow+0xe/0x10
[0.00]  [] microcode_sanity_check+0x405/0x590
[0.00]  [] 
get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345
[0.00]  [] ? early_idt_handler_common+0x3d/0xae
[0.00]  [] ? strlcpy+0x52/0xa0
[0.00]  [] ? find_cpio_data+0x371/0x510
[0.00]  [] load_ucode_intel_bsp+0xa1/0xe5
[0.00]  [] load_ucode_bsp+0xdf/0xf3
[0.00]  [] ? load_ucode_bsp+0xdf/0xf3
[0.00]  [] x86_64_start_kernel+0xd1/0xee
[0.00] 


Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 arch/x86/kernel/cpu/microcode/intel_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c 
b/arch/x86/kernel/cpu/microcode/intel_lib.c
index b96896bcbdaf..2bef93ff2e3f 100644
--- a/arch/x86/kernel/cpu/microcode/intel_lib.c
+++ b/arch/x86/kernel/cpu/microcode/intel_lib.c
@@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err)
unsigned long total_size, data_size, ext_table_size;
struct microcode_header_intel *mc_header = mc;
struct extended_sigtable *ext_header = NULL;
-   int sum, orig_sum, ext_sigcount = 0, i;
+   u32 sum, orig_sum, ext_sigcount = 0, i;
struct extended_signature *ext_sig;
 
total_size = get_totalsize(mc_header);
-- 
2.1.4

[PATCH] x86/microcode: Change checksum to u32

2016-02-27 Thread Chris Bainbridge

Checksum should be unsigned 32-bit otherwise the calculation overflows
resulting in undefined behaviour:

[0.00] 

[0.00] UBSAN: Undefined behaviour in 
arch/x86/kernel/cpu/microcode/intel_lib.c:105:12
[0.00] signed integer overflow:
[0.00] -1500151068 + -2125470173 cannot be represented in type 'int'
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.5.0-rc5+ #495
[0.00]  0086 0086  
83203968
[0.00]  81b30952 834c43b8 83203998 
814fe623
[0.00]  83203980 81bcdf2d 8339a448 
83203a08
[0.00] Call Trace:
[0.00]  [] dump_stack+0x4e/0x6c
[0.00]  [] ? inotify_ioctl+0x43/0x1c0
[0.00]  [] ubsan_epilogue+0xd/0x40
[0.00]  [] handle_overflow+0xbd/0xe0
[0.00]  [] __ubsan_handle_add_overflow+0xe/0x10
[0.00]  [] microcode_sanity_check+0x405/0x590
[0.00]  [] 
get_matching_model_microcode.isra.2.constprop.8+0xa5/0x345
[0.00]  [] ? early_idt_handler_common+0x3d/0xae
[0.00]  [] ? strlcpy+0x52/0xa0
[0.00]  [] ? find_cpio_data+0x371/0x510
[0.00]  [] load_ucode_intel_bsp+0xa1/0xe5
[0.00]  [] load_ucode_bsp+0xdf/0xf3
[0.00]  [] ? load_ucode_bsp+0xdf/0xf3
[0.00]  [] x86_64_start_kernel+0xd1/0xee
[0.00] 


Signed-off-by: Chris Bainbridge 
---
 arch/x86/kernel/cpu/microcode/intel_lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel_lib.c 
b/arch/x86/kernel/cpu/microcode/intel_lib.c
index b96896bcbdaf..2bef93ff2e3f 100644
--- a/arch/x86/kernel/cpu/microcode/intel_lib.c
+++ b/arch/x86/kernel/cpu/microcode/intel_lib.c
@@ -49,7 +49,7 @@ int microcode_sanity_check(void *mc, int print_err)
unsigned long total_size, data_size, ext_table_size;
struct microcode_header_intel *mc_header = mc;
struct extended_sigtable *ext_header = NULL;
-   int sum, orig_sum, ext_sigcount = 0, i;
+   u32 sum, orig_sum, ext_sigcount = 0, i;
struct extended_signature *ext_sig;
 
total_size = get_totalsize(mc_header);
-- 
2.1.4

[BUG] USB ethernet on XHCI dies under load

2016-02-25 Thread Chris Bainbridge

USB ethernet devices stop responding when plugged in to USB3 XHCI ports
and flooded with incoming traffic. The usbnet layer seems to get -EPROTO
from the xhci driver. Nothing is usually logged in kernel when this
occurs, but with dyndebug on there are errors like:

ax88179_178a 4-1.1:1.0 eth0: tx throttle -71
intr status -71

Sometimes:

xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called with disabled ep 
88024ea1adc8

To reproduce just flood the adaptor with gigabit traffic from another
computer for a couple of minutes with any of:

ping -f $host
iperf3 -c $host -t 1000 -P 128
iperf3 -c $host -u -b $bandwidth
(for iperf tests run 'iperf3 -s' on the test platform)

iperf3 shows many errors/retries. If slub_debug is on (specifically
slub_debug=U) there are more retries and it fails faster,  which
suggests some kind of memory corruption. Other slub_debug options do not
appear to have any effect. Another symptom of memory corruption is that
other USB devices plugged in to different ports will disconnect and
reconnect. The problem is not these devices though, as the USB ethernet
still fails when all other devices are unplugged.

I have reproduced this with an AX88179 USB3 gigabit ethernet adaptor and
a dm9601 USB1 10/100 adaptor, on both a laptop with Intel XHCI chipset
and a desktop with VIA XHCI chipset. With the dm9601 adaptor just
running "iperf3 -c z  -t 10 -u -b 12m" (12mbit UDP stream) is enough to
quickly kill it.

I have verified that the bug does not occur with USB2 chipset of the
desktop, only USB3.

This bug appears to have been around a long time. There are some
relevant bug threads, several refer specifically to disconnects on
flooding with AX88179:

http://www.spinics.net/lists/linux-usb/msg113358.html
https://bugzilla.kernel.org/show_bug.cgi?id=75381
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371233
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269883

I bisected this with the AX88179 but it just led back to
v3.11-rc1-224-g452c447a497d  ("USBNET: increase max rx/tx qlen for
improving USB3 thoughtput") which is probably not the actual cause of
the bug.

[BUG] USB ethernet on XHCI dies under load

2016-02-25 Thread Chris Bainbridge

USB ethernet devices stop responding when plugged in to USB3 XHCI ports
and flooded with incoming traffic. The usbnet layer seems to get -EPROTO
from the xhci driver. Nothing is usually logged in kernel when this
occurs, but with dyndebug on there are errors like:

ax88179_178a 4-1.1:1.0 eth0: tx throttle -71
intr status -71

Sometimes:

xhci_hcd :00:14.0: xHCI xhci_drop_endpoint called with disabled ep 
88024ea1adc8

To reproduce just flood the adaptor with gigabit traffic from another
computer for a couple of minutes with any of:

ping -f $host
iperf3 -c $host -t 1000 -P 128
iperf3 -c $host -u -b $bandwidth
(for iperf tests run 'iperf3 -s' on the test platform)

iperf3 shows many errors/retries. If slub_debug is on (specifically
slub_debug=U) there are more retries and it fails faster,  which
suggests some kind of memory corruption. Other slub_debug options do not
appear to have any effect. Another symptom of memory corruption is that
other USB devices plugged in to different ports will disconnect and
reconnect. The problem is not these devices though, as the USB ethernet
still fails when all other devices are unplugged.

I have reproduced this with an AX88179 USB3 gigabit ethernet adaptor and
a dm9601 USB1 10/100 adaptor, on both a laptop with Intel XHCI chipset
and a desktop with VIA XHCI chipset. With the dm9601 adaptor just
running "iperf3 -c z  -t 10 -u -b 12m" (12mbit UDP stream) is enough to
quickly kill it.

I have verified that the bug does not occur with USB2 chipset of the
desktop, only USB3.

This bug appears to have been around a long time. There are some
relevant bug threads, several refer specifically to disconnects on
flooding with AX88179:

http://www.spinics.net/lists/linux-usb/msg113358.html
https://bugzilla.kernel.org/show_bug.cgi?id=75381
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371233
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269883

I bisected this with the AX88179 but it just led back to
v3.11-rc1-224-g452c447a497d  ("USBNET: increase max rx/tx qlen for
improving USB3 thoughtput") which is probably not the actual cause of
the bug.

Re: [BUG] ODEBUG: assert_init not available (active state 0)

2016-02-14 Thread Chris Bainbridge

On 5 February 2016 at 02:52, Zheng, Lv  wrote:
>
> So you could wait for just several days to test it again.
> And report here if you can still see issues.

I didn't notice any revert last week, is there something specific that
I should test? I just noticed that there are graphical glitches in
Firefox on Youtube as well.

Re: [BUG] ODEBUG: assert_init not available (active state 0)

2016-02-14 Thread Chris Bainbridge

On 5 February 2016 at 02:52, Zheng, Lv  wrote:
>
> So you could wait for just several days to test it again.
> And report here if you can still see issues.

I didn't notice any revert last week, is there something specific that
I should test? I just noticed that there are graphical glitches in
Firefox on Youtube as well.

[PATCH] usb: core: hub: hub_port_init lock controller instead of bus

2016-02-08 Thread Chris Bainbridge

The XHCI controller presents two USB buses to the system - one for USB 2
and one for USB 3. When only one bus is locked there is a race condition
with two threads in hub_port_init:

[8.984500] Call Trace:
[8.985698]  [] schedule+0x37/0x90
[8.986918]  [] usb_kill_urb+0x8d/0xd0
[8.988130]  [] ? wake_up_atomic_t+0x30/0x30
[8.989343]  [] usb_start_wait_urb+0xbe/0x150
[8.990561]  [] usb_control_msg+0xbc/0xf0
[8.991766]  [] hub_port_init+0x51e/0xb70
[8.992964]  [] hub_event+0x817/0x1570
[8.994156]  [] process_one_work+0x1ff/0x620
[8.995342]  [] ? process_one_work+0x15f/0x620
[8.996528]  [] worker_thread+0x64/0x4b0
[8.997707]  [] ? rescuer_thread+0x390/0x390
[8.998883]  [] kthread+0x105/0x120
[9.56]  [] ? kthread_create_on_node+0x200/0x200
[9.001241]  [] ret_from_fork+0x3f/0x70
[9.002420]  [] ? kthread_create_on_node+0x200/0x200

[9.870837] Call Trace:
[9.875664]  [] xhci_setup_device+0x53d/0xa40
[9.876871]  [] xhci_address_device+0xe/0x10
[9.878068]  [] hub_port_init+0x1bf/0xb70
[9.879262]  [] ? trace_hardirqs_on+0xd/0x10
[9.880465]  [] hub_event+0x817/0x1570
[9.881668]  [] process_one_work+0x1ff/0x620
[9.882869]  [] ? process_one_work+0x15f/0x620
[9.884074]  [] worker_thread+0x64/0x4b0
[9.885268]  [] ? rescuer_thread+0x390/0x390
[9.886457]  [] kthread+0x105/0x120
[9.887634]  [] ? kthread_create_on_node+0x200/0x200
[9.17]  [] ret_from_fork+0x3f/0x70
[9.889995]  [] ? kthread_create_on_node+0x200/0x200

Which results from the two call chains:

hub_port_init
 usb_get_device_descriptor
  usb_get_descriptor
   usb_control_msg
usb_internal_control_msg
 usb_start_wait_urb
  usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
 hub_set_address
  xhci_address_device
   xhci_setup_device

The logged kernel errors are:

[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
[   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this failure occurred on 6% of all boots.

Hypothetically, it should perhaps be possible to set the address of the
hub on one bus while the hub on the other bus already has an outstanding
descriptor read request. But as this is not working reliably, fix it by
locking the controller in hub_port_init to prevent parallel
initialisation of both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Signed-off-by: Chris Bainbridge 
---
 drivers/usb/core/hcd.c  | 15 +--
 drivers/usb/core/hub.c  |  8 
 include/linux/usb.h |  3 +--
 include/linux/usb/hcd.h |  1 +
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index df0e3b92533a..5824e819176a 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(>usb_address0_mutex);
+   mutex_init(>devnum_next_mutex);
 
INIT_LIST_HEAD (>bus_list);
 }
@@ -2497,6 +2497,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
return NULL;
}
if (primary_hcd == NULL) {
+   hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex),
+   GFP_KERNEL);
+   if (!hcd->address0_mutex) {
+   kfree(hcd);
+   dev_dbg(dev, "hcd address0 mutex alloc failed\n");
+   return NULL;
+   }
+   mutex_init(hcd->address0_mutex);
hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex),
GFP_KERNEL);
if (!hcd->bandwidth_mutex) {
@@ -2508,6 +2516,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
dev_set_drvdata(dev, hcd);
} else {
mutex_lock(_port_peer_mutex);
+   hcd->address0_mutex = primary_hcd->address0_mutex;
hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex;
hcd->primary_hcd = primary_hcd;
primary_hcd->primary_hcd = primary_hcd;
@@ -2574,8 +2583,10 @@ static void hcd_release(struct kref *kref)
struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref);
 
mutex_lock(_port_peer_mutex);
-   if (usb_hcd_is_primary_hcd(hcd))
+   if (usb_hcd_is_primary_hcd(hcd)) {
+   kfree(hcd->address0_mutex);
kfree(hcd->bandwidth_mutex);
+   }
if (hcd->shared_hcd) {
struct usb_hcd *peer = hcd->shared_hcd;
 
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 350dcd9af5d8..72ee2338bde0 100644
--

[PATCH] usb: core: hub: hub_port_init lock controller instead of bus

2016-02-08 Thread Chris Bainbridge

The XHCI controller presents two USB buses to the system - one for USB 2
and one for USB 3. When only one bus is locked there is a race condition
with two threads in hub_port_init:

[8.984500] Call Trace:
[8.985698]  [] schedule+0x37/0x90
[8.986918]  [] usb_kill_urb+0x8d/0xd0
[8.988130]  [] ? wake_up_atomic_t+0x30/0x30
[8.989343]  [] usb_start_wait_urb+0xbe/0x150
[8.990561]  [] usb_control_msg+0xbc/0xf0
[8.991766]  [] hub_port_init+0x51e/0xb70
[8.992964]  [] hub_event+0x817/0x1570
[8.994156]  [] process_one_work+0x1ff/0x620
[8.995342]  [] ? process_one_work+0x15f/0x620
[8.996528]  [] worker_thread+0x64/0x4b0
[8.997707]  [] ? rescuer_thread+0x390/0x390
[8.998883]  [] kthread+0x105/0x120
[9.56]  [] ? kthread_create_on_node+0x200/0x200
[9.001241]  [] ret_from_fork+0x3f/0x70
[9.002420]  [] ? kthread_create_on_node+0x200/0x200

[9.870837] Call Trace:
[9.875664]  [] xhci_setup_device+0x53d/0xa40
[9.876871]  [] xhci_address_device+0xe/0x10
[9.878068]  [] hub_port_init+0x1bf/0xb70
[9.879262]  [] ? trace_hardirqs_on+0xd/0x10
[9.880465]  [] hub_event+0x817/0x1570
[9.881668]  [] process_one_work+0x1ff/0x620
[9.882869]  [] ? process_one_work+0x15f/0x620
[9.884074]  [] worker_thread+0x64/0x4b0
[9.885268]  [] ? rescuer_thread+0x390/0x390
[9.886457]  [] kthread+0x105/0x120
[9.887634]  [] ? kthread_create_on_node+0x200/0x200
[9.17]  [] ret_from_fork+0x3f/0x70
[9.889995]  [] ? kthread_create_on_node+0x200/0x200

Which results from the two call chains:

hub_port_init
 usb_get_device_descriptor
  usb_get_descriptor
   usb_control_msg
usb_internal_control_msg
 usb_start_wait_urb
  usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
 hub_set_address
  xhci_address_device
   xhci_setup_device

The logged kernel errors are:

[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
[   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this failure occurred on 6% of all boots.

Hypothetically, it should perhaps be possible to set the address of the
hub on one bus while the hub on the other bus already has an outstanding
descriptor read request. But as this is not working reliably, fix it by
locking the controller in hub_port_init to prevent parallel
initialisation of both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 drivers/usb/core/hcd.c  | 15 +--
 drivers/usb/core/hub.c  |  8 
 include/linux/usb.h |  3 +--
 include/linux/usb/hcd.h |  1 +
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index df0e3b92533a..5824e819176a 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(>usb_address0_mutex);
+   mutex_init(>devnum_next_mutex);
 
INIT_LIST_HEAD (>bus_list);
 }
@@ -2497,6 +2497,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
return NULL;
}
if (primary_hcd == NULL) {
+   hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex),
+   GFP_KERNEL);
+   if (!hcd->address0_mutex) {
+   kfree(hcd);
+   dev_dbg(dev, "hcd address0 mutex alloc failed\n");
+   return NULL;
+   }
+   mutex_init(hcd->address0_mutex);
hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex),
GFP_KERNEL);
if (!hcd->bandwidth_mutex) {
@@ -2508,6 +2516,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
dev_set_drvdata(dev, hcd);
} else {
mutex_lock(_port_peer_mutex);
+   hcd->address0_mutex = primary_hcd->address0_mutex;
hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex;
hcd->primary_hcd = primary_hcd;
primary_hcd->primary_hcd = primary_hcd;
@@ -2574,8 +2583,10 @@ static void hcd_release(struct kref *kref)
struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref);
 
mutex_lock(_port_peer_mutex);
-   if (usb_hcd_is_primary_hcd(hcd))
+   if (usb_hcd_is_primary_hcd(hcd)) {
+   kfree(hcd->address0_mutex);
kfree(hcd->bandwidth_mutex);
+   }
if (hcd->shared_hcd) {
struct usb_hcd *peer = hcd->shared_hcd;
 
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
in

Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock

2016-02-05 Thread Chris Bainbridge

Running task list at fail point:

[8.978405] kworker/3:0 R  running task1176024  2 0x
[8.979624] Workqueue: usb_hub_wq hub_event
[8.980831]  880260613af8 880260613ac0 811247ed 
8802604597c0
[8.982061]  880260608000 880260614000 0100 
880260613b70
[8.983280]  880260613c14 1388 880260613b10 
81b9bab7
[8.984500] Call Trace:
[8.985698]  [] schedule+0x37/0x90
[8.986918]  [] usb_kill_urb+0x8d/0xd0
[8.988130]  [] ? wake_up_atomic_t+0x30/0x30
[8.989343]  [] usb_start_wait_urb+0xbe/0x150
[8.990561]  [] usb_control_msg+0xbc/0xf0
[8.991766]  [] hub_port_init+0x51e/0xb70
[8.992964]  [] hub_event+0x817/0x1570
[8.994156]  [] process_one_work+0x1ff/0x620
[8.995342]  [] ? process_one_work+0x15f/0x620
[8.996528]  [] worker_thread+0x64/0x4b0
[8.997707]  [] ? rescuer_thread+0x390/0x390  
[8.998883]  [] kthread+0x105/0x120
[9.56]  [] ? kthread_create_on_node+0x200/0x200
[9.001241]  [] ret_from_fork+0x3f/0x70
[9.002420]  [] ? kthread_create_on_node+0x200/0x200

[9.094152] kworker/3:1 R  running task11496   238  2 0x
[9.095361] Workqueue: pm pm_runtime_work
[9.096545]  88025fc13ae0 88025fc13aa8 0003811247ed 
880260bc
[9.097767]  8802603aaf80 88025fc14000 88025cc7a520 
88007c4b7978
[9.098986]  88025cc7a520 88007c4b7978 88025fc13af8 
81b9bab7
[9.100213] Call Trace:
[9.101415]  [] schedule+0x37/0x90
[9.102626]  [] usb_kill_urb+0x8d/0xd0
[9.103835]  [] ? wake_up_atomic_t+0x30/0x30
[9.105035]  [] hub_quiesce+0x6b/0xa0
[9.106226]  [] hub_suspend+0x12a/0x250
[9.107417]  [] usb_suspend_both+0x95/0x1d0
[9.108600]  [] usb_runtime_suspend+0x2e/0x70
[9.109783]  [] ? usb_probe_interface+0x310/0x310
[9.110958]  [] __rpm_callback+0x2d/0x70
[9.112130]  [] ? usb_probe_interface+0x310/0x310
[9.113305]  [] rpm_callback+0x1d/0x90
[9.114482]  [] rpm_suspend+0x14b/0x750
[9.115664]  [] __pm_runtime_suspend+0x57/0x90
[9.116852]  [] ? usb_runtime_resume+0x20/0x20
[9.118034]  [] usb_runtime_idle+0x25/0x30
[9.119210]  [] __rpm_callback+0x2d/0x70
[9.120382]  [] rpm_idle+0x221/0x410
[9.121541]  [] pm_runtime_work+0xa9/0xc0
[9.122693]  [] process_one_work+0x1ff/0x620
[9.123850]  [] ? process_one_work+0x15f/0x620
[9.124993]  [] worker_thread+0x64/0x4b0
[9.126134]  [] ? rescuer_thread+0x390/0x390
[9.127272]  [] ? rescuer_thread+0x390/0x390  
[9.128403]  [] kthread+0x105/0x120
[9.129524]  [] ? kthread_create_on_node+0x200/0x200
[9.130653]  [] ret_from_fork+0x3f/0x70
[9.131781]  [] ? kthread_create_on_node+0x200/0x200

[9.864737] kworker/3:2 R  running task11888  1348  2 0x0008
[9.865947] Workqueue: usb_hub_wq hub_event
[9.867152]  88025dbcdf00 88025dbd3b88 81109a8c 
811099a9
[9.868386]  88025dbcdf00  88025dbce270 
88025dbd3bb8
[9.869610]  81109c04 88025dac8000 88025dac8368 
88025c8a4000
[9.870837] Call Trace:
[9.872044]  [] sched_show_task+0x15c/0x260
[9.873248]  [] ? sched_show_task+0x79/0x260
[9.874456]  [] show_state_filter+0x74/0xc0   
[9.875664]  [] xhci_setup_device+0x53d/0xa40 
[9.876871]  [] xhci_address_device+0xe/0x10
[9.878068]  [] hub_port_init+0x1bf/0xb70
[9.879262]  [] ? trace_hardirqs_on+0xd/0x10
[9.880465]  [] hub_event+0x817/0x1570
[9.881668]  [] process_one_work+0x1ff/0x620
[9.882869]  [] ? process_one_work+0x15f/0x620
[9.884074]  [] worker_thread+0x64/0x4b0
[9.885268]  [] ? rescuer_thread+0x390/0x390
[9.886457]  [] kthread+0x105/0x120
[9.887634]  [] ? kthread_create_on_node+0x200/0x200
[9.17]  [] ret_from_fork+0x3f/0x70
[9.889995]  [] ? kthread_create_on_node+0x200/0x200

[9.891174] kworker/3:3 R  running task10808  1350  2 0x
[9.892367] Workqueue: events_freezable usb_stor_scan_dwork
[9.893560]  88025dbdbb78 88026a0ccd80 6a0ccd80 
8802604597c0
[9.894773]  88025dbcc740 88025dbdc000 88025dbdbbb0 
88026a0ccd80
[9.896004]  88026a0ccd80 0003 88025dbdbb90 
81b9bab7
[9.897233] Call Trace:
[9.898444]  [] schedule+0x37/0x90
[9.899670]  [] schedule_timeout+0x17c/0x2f0
[9.900893]  [] ? init_timer_on_stack_key+0x40/0x40
[9.902099]  [] 
wait_for_completion_interruptible_timeout+0xcd/0x160
[9.903324]  [] ? wake_up_q+0x70/0x70
[9.904536]  [] usb_stor_msg_common+0xd8/0x130
[9.905751]  [] usb_stor_control_msg+0x96/0xb0
[9.906961]  [] usb_stor_Bulk_max_lun+0x51/0xa0
[9.908163]  [] usb_stor_scan_dwork+0x7f/0xe0 
[9.909361]  [] process_one_work+0x1ff/0x620
[9.910563]  [] ? process_one_work+0x15f/0x620
[9.911762]  [] worker_thread+0x64/0x4b0
[

Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock

2016-02-05 Thread Chris Bainbridge

Running task list at fail point:

[8.978405] kworker/3:0 R  running task1176024  2 0x
[8.979624] Workqueue: usb_hub_wq hub_event
[8.980831]  880260613af8 880260613ac0 811247ed 
8802604597c0
[8.982061]  880260608000 880260614000 0100 
880260613b70
[8.983280]  880260613c14 1388 880260613b10 
81b9bab7
[8.984500] Call Trace:
[8.985698]  [] schedule+0x37/0x90
[8.986918]  [] usb_kill_urb+0x8d/0xd0
[8.988130]  [] ? wake_up_atomic_t+0x30/0x30
[8.989343]  [] usb_start_wait_urb+0xbe/0x150
[8.990561]  [] usb_control_msg+0xbc/0xf0
[8.991766]  [] hub_port_init+0x51e/0xb70
[8.992964]  [] hub_event+0x817/0x1570
[8.994156]  [] process_one_work+0x1ff/0x620
[8.995342]  [] ? process_one_work+0x15f/0x620
[8.996528]  [] worker_thread+0x64/0x4b0
[8.997707]  [] ? rescuer_thread+0x390/0x390  
[8.998883]  [] kthread+0x105/0x120
[9.56]  [] ? kthread_create_on_node+0x200/0x200
[9.001241]  [] ret_from_fork+0x3f/0x70
[9.002420]  [] ? kthread_create_on_node+0x200/0x200

[9.094152] kworker/3:1 R  running task11496   238  2 0x
[9.095361] Workqueue: pm pm_runtime_work
[9.096545]  88025fc13ae0 88025fc13aa8 0003811247ed 
880260bc
[9.097767]  8802603aaf80 88025fc14000 88025cc7a520 
88007c4b7978
[9.098986]  88025cc7a520 88007c4b7978 88025fc13af8 
81b9bab7
[9.100213] Call Trace:
[9.101415]  [] schedule+0x37/0x90
[9.102626]  [] usb_kill_urb+0x8d/0xd0
[9.103835]  [] ? wake_up_atomic_t+0x30/0x30
[9.105035]  [] hub_quiesce+0x6b/0xa0
[9.106226]  [] hub_suspend+0x12a/0x250
[9.107417]  [] usb_suspend_both+0x95/0x1d0
[9.108600]  [] usb_runtime_suspend+0x2e/0x70
[9.109783]  [] ? usb_probe_interface+0x310/0x310
[9.110958]  [] __rpm_callback+0x2d/0x70
[9.112130]  [] ? usb_probe_interface+0x310/0x310
[9.113305]  [] rpm_callback+0x1d/0x90
[9.114482]  [] rpm_suspend+0x14b/0x750
[9.115664]  [] __pm_runtime_suspend+0x57/0x90
[9.116852]  [] ? usb_runtime_resume+0x20/0x20
[9.118034]  [] usb_runtime_idle+0x25/0x30
[9.119210]  [] __rpm_callback+0x2d/0x70
[9.120382]  [] rpm_idle+0x221/0x410
[9.121541]  [] pm_runtime_work+0xa9/0xc0
[9.122693]  [] process_one_work+0x1ff/0x620
[9.123850]  [] ? process_one_work+0x15f/0x620
[9.124993]  [] worker_thread+0x64/0x4b0
[9.126134]  [] ? rescuer_thread+0x390/0x390
[9.127272]  [] ? rescuer_thread+0x390/0x390  
[9.128403]  [] kthread+0x105/0x120
[9.129524]  [] ? kthread_create_on_node+0x200/0x200
[9.130653]  [] ret_from_fork+0x3f/0x70
[9.131781]  [] ? kthread_create_on_node+0x200/0x200

[9.864737] kworker/3:2 R  running task11888  1348  2 0x0008
[9.865947] Workqueue: usb_hub_wq hub_event
[9.867152]  88025dbcdf00 88025dbd3b88 81109a8c 
811099a9
[9.868386]  88025dbcdf00  88025dbce270 
88025dbd3bb8
[9.869610]  81109c04 88025dac8000 88025dac8368 
88025c8a4000
[9.870837] Call Trace:
[9.872044]  [] sched_show_task+0x15c/0x260
[9.873248]  [] ? sched_show_task+0x79/0x260
[9.874456]  [] show_state_filter+0x74/0xc0   
[9.875664]  [] xhci_setup_device+0x53d/0xa40 
[9.876871]  [] xhci_address_device+0xe/0x10
[9.878068]  [] hub_port_init+0x1bf/0xb70
[9.879262]  [] ? trace_hardirqs_on+0xd/0x10
[9.880465]  [] hub_event+0x817/0x1570
[9.881668]  [] process_one_work+0x1ff/0x620
[9.882869]  [] ? process_one_work+0x15f/0x620
[9.884074]  [] worker_thread+0x64/0x4b0
[9.885268]  [] ? rescuer_thread+0x390/0x390
[9.886457]  [] kthread+0x105/0x120
[9.887634]  [] ? kthread_create_on_node+0x200/0x200
[9.17]  [] ret_from_fork+0x3f/0x70
[9.889995]  [] ? kthread_create_on_node+0x200/0x200

[9.891174] kworker/3:3 R  running task10808  1350  2 0x
[9.892367] Workqueue: events_freezable usb_stor_scan_dwork
[9.893560]  88025dbdbb78 88026a0ccd80 6a0ccd80 
8802604597c0
[9.894773]  88025dbcc740 88025dbdc000 88025dbdbbb0 
88026a0ccd80
[9.896004]  88026a0ccd80 0003 88025dbdbb90 
81b9bab7
[9.897233] Call Trace:
[9.898444]  [] schedule+0x37/0x90
[9.899670]  [] schedule_timeout+0x17c/0x2f0
[9.900893]  [] ? init_timer_on_stack_key+0x40/0x40
[9.902099]  [] 
wait_for_completion_interruptible_timeout+0xcd/0x160
[9.903324]  [] ? wake_up_q+0x70/0x70
[9.904536]  [] usb_stor_msg_common+0xd8/0x130
[9.905751]  [] usb_stor_control_msg+0x96/0xb0
[9.906961]  [] usb_stor_Bulk_max_lun+0x51/0xa0
[9.908163]  [] usb_stor_scan_dwork+0x7f/0xe0 
[9.909361]  [] process_one_work+0x1ff/0x620
[9.910563]  [] ? process_one_work+0x15f/0x620
[9.911762]  [] worker_thread+0x64/0x4b0
[

Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock

2016-02-04 Thread Chris Bainbridge

On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote:
> On Thu, 4 Feb 2016, Chris Bainbridge wrote:
> 
> > The XHCI controller presents two USB buses to the system - one for USB 2
> > and one for USB 3. When only one bus is locked there is a race condition
> > during hub init that results in errors like:
> > 
> >  [   13.183701] usb 3-3: device descriptor read/all, error -110
> 
> What exactly is the race condition?  Why does locking both buses fix 
> it?

[2.692571] xhci_hcd :00:14.0: xHCI Host Controller
[2.693279] xhci_hcd :00:14.0: new USB bus registered, assigned bus 
number 3
[2.694867] xhci_hcd :00:14.0: hcc params 0x20007181 hci version 0x100 
quirks 0xb930
[2.694880] xhci_hcd :00:14.0: cache line size of 256 is not supported
[2.695995] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[2.696005] usb usb3: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[2.696016] usb usb3: Product: xHCI Host Controller
[2.696024] usb usb3: Manufacturer: Linux 4.4.0 xhci-hcd
[2.696031] usb usb3: SerialNumber: :00:14.0
[2.698414] hub 3-0:1.0: USB hub found
[2.704723] xhci_hcd :00:14.0: xHCI Host Controller
[2.705502] xhci_hcd :00:14.0: new USB bus registered, assigned bus 
number 4
[2.706136] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[2.706138] usb usb4: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[2.706139] usb usb4: Product: xHCI Host Controller
[2.706140] usb usb4: Manufacturer: Linux 4.4.0 xhci-hcd
[2.706141] usb usb4: SerialNumber: :00:14.0
[2.708467] hub 4-0:1.0: USB hub found
[3.021779] usb 3-3: new high-speed USB device number 2 using xhci_hcd
[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
...bus4 enumeration...
[   13.297835] usb 3-3: device descriptor read/all, error -110
...bus3 enumeration...

Note there seem to be two timeouts of 5 seconds/10 seconds after the
last message at 3.021779 seconds.

hub_port_init is called in parallel for both buses.
The first thread is in usb_get_device_descriptor when the second one
enters the function and calls the code to get an address. I don't know
precisely how it fails - it looks like the functions for doing the
initialisation are synchronous and sleeping waiting for a response and
that gets disrupted when the second thread tries to initialise the hub.
What was the basis for using a lock on the bus rather than the
controller?  Does the spec say that buses of the same controller can be
initialised in parallel? Mathias previously said:

> Just found an additional note in the xhci specs section 4.5.3 saying that:
> "Note: Software shall not transition more than one Device Slot to the Default 
> State at a time"
> which is what xhci_setup_device() does in addition to moving slots to the 
> addressed state

But I don't know if that means you can do the reset/set address/read
descriptors in parallel?

> > @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device 
> > *udev, int port1,
> > if (oldspeed == USB_SPEED_LOW)
> > delay = HUB_LONG_RESET_TIME;
> >  
> > -   mutex_lock(>bus->usb_address0_mutex);
> > +   mutex_lock(>bus->controller->mutex);
> >  
> > /* Reset the device; full speed may morph to high speed */
> > /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
> > @@ -4588,7 +4588,7 @@ fail:
> > hub_port_disable(hub, port1, 0);
> > update_devnum(udev, devnum);/* for disconnect processing */
> > }
> > -   mutex_unlock(>bus->usb_address0_mutex);
> > +   mutex_unlock(>bus->controller->mutex);
> > return retval;
> >  }
> 
> I don't think this is a good idea.  The driver core needs to be able to
> access the controller while this function is running.  You can
> introduce a new mutex if you want, perhaps in the primary hcd
> structure, but don't use bus->controller->mutex.

An explicit lock might be a good idea. I was trying to avoid adding
another lock so used the one in struct device as it appeared unused. The
XHCI code seems to only use the lock in struct xhci_hcd and ehci uses
struct ehci->lock.

btw I think this bug may be the same as reported at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492

[PATCH] usb: host: xhci: Replace bus lock with host controller lock

2016-02-04 Thread Chris Bainbridge

The XHCI controller presents two USB buses to the system - one for USB 2
and one for USB 3. When only one bus is locked there is a race condition
during hub init that results in errors like:

 [   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this error occurred on 6% of boots. Fix this by locking
the bus controller instead of the bus, thus preventing simultaneous hub
init on both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Signed-off-by: Chris Bainbridge 
---
 drivers/usb/core/hcd.c | 2 +-
 drivers/usb/core/hub.c | 8 
 include/linux/usb.h| 3 +--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index df0e3b92533a..c468d047ec5e 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(>usb_address0_mutex);
+   mutex_init(>devnum_next_mutex);
 
INIT_LIST_HEAD (>bus_list);
 }
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 350dcd9af5d8..e03c2f05c2f6 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2066,7 +2066,7 @@ static void choose_devnum(struct usb_device *udev)
struct usb_bus  *bus = udev->bus;
 
/* be safe when more hub events are proceed in parallel */
-   mutex_lock(>usb_address0_mutex);
+   mutex_lock(>devnum_next_mutex);
if (udev->wusb) {
devnum = udev->portnum + 1;
BUG_ON(test_bit(devnum, bus->devmap.devicemap));
@@ -2084,7 +2084,7 @@ static void choose_devnum(struct usb_device *udev)
set_bit(devnum, bus->devmap.devicemap);
udev->devnum = devnum;
}
-   mutex_unlock(>usb_address0_mutex);
+   mutex_unlock(>devnum_next_mutex);
 }
 
 static void release_devnum(struct usb_device *udev)
@@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device 
*udev, int port1,
if (oldspeed == USB_SPEED_LOW)
delay = HUB_LONG_RESET_TIME;
 
-   mutex_lock(>bus->usb_address0_mutex);
+   mutex_lock(>bus->controller->mutex);
 
/* Reset the device; full speed may morph to high speed */
/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
@@ -4588,7 +4588,7 @@ fail:
hub_port_disable(hub, port1, 0);
update_devnum(udev, devnum);/* for disconnect processing */
}
-   mutex_unlock(>bus->usb_address0_mutex);
+   mutex_unlock(>bus->controller->mutex);
return retval;
 }
 
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 89533ba38691..6b736c82b9d1 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -371,14 +371,13 @@ struct usb_bus {
 
int devnum_next;/* Next open device number in
 * round-robin allocation */
+   struct mutex devnum_next_mutex; /* devnum_next mutex */
 
struct usb_devmap devmap;   /* device address allocation map */
struct usb_device *root_hub;/* Root hub */
struct usb_bus *hs_companion;   /* Companion EHCI bus, if any */
struct list_head bus_list;  /* list of busses */
 
-   struct mutex usb_address0_mutex; /* unaddressed device mutex */
-
int bandwidth_allocated;/* on this bus: how much of the time
 * reserved for periodic (intr/iso)
 * requests is used, on average?
-- 
2.1.4

Re: [PATCH] usb: host: xhci: Replace bus lock with host controller lock

2016-02-04 Thread Chris Bainbridge

On Thu, Feb 04, 2016 at 04:00:51PM -0500, Alan Stern wrote:
> On Thu, 4 Feb 2016, Chris Bainbridge wrote:
> 
> > The XHCI controller presents two USB buses to the system - one for USB 2
> > and one for USB 3. When only one bus is locked there is a race condition
> > during hub init that results in errors like:
> > 
> >  [   13.183701] usb 3-3: device descriptor read/all, error -110
> 
> What exactly is the race condition?  Why does locking both buses fix 
> it?

[2.692571] xhci_hcd :00:14.0: xHCI Host Controller
[2.693279] xhci_hcd :00:14.0: new USB bus registered, assigned bus 
number 3
[2.694867] xhci_hcd :00:14.0: hcc params 0x20007181 hci version 0x100 
quirks 0xb930
[2.694880] xhci_hcd :00:14.0: cache line size of 256 is not supported
[2.695995] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[2.696005] usb usb3: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[2.696016] usb usb3: Product: xHCI Host Controller
[2.696024] usb usb3: Manufacturer: Linux 4.4.0 xhci-hcd
[2.696031] usb usb3: SerialNumber: :00:14.0
[2.698414] hub 3-0:1.0: USB hub found
[2.704723] xhci_hcd :00:14.0: xHCI Host Controller
[2.705502] xhci_hcd :00:14.0: new USB bus registered, assigned bus 
number 4
[2.706136] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[2.706138] usb usb4: New USB device strings: Mfr=3, Product=2, 
SerialNumber=1
[2.706139] usb usb4: Product: xHCI Host Controller
[2.706140] usb usb4: Manufacturer: Linux 4.4.0 xhci-hcd
[2.706141] usb usb4: SerialNumber: :00:14.0
[2.708467] hub 4-0:1.0: USB hub found
[3.021779] usb 3-3: new high-speed USB device number 2 using xhci_hcd
[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
...bus4 enumeration...
[   13.297835] usb 3-3: device descriptor read/all, error -110
...bus3 enumeration...

Note there seem to be two timeouts of 5 seconds/10 seconds after the
last message at 3.021779 seconds.

hub_port_init is called in parallel for both buses.
The first thread is in usb_get_device_descriptor when the second one
enters the function and calls the code to get an address. I don't know
precisely how it fails - it looks like the functions for doing the
initialisation are synchronous and sleeping waiting for a response and
that gets disrupted when the second thread tries to initialise the hub.
What was the basis for using a lock on the bus rather than the
controller?  Does the spec say that buses of the same controller can be
initialised in parallel? Mathias previously said:

> Just found an additional note in the xhci specs section 4.5.3 saying that:
> "Note: Software shall not transition more than one Device Slot to the Default 
> State at a time"
> which is what xhci_setup_device() does in addition to moving slots to the 
> addressed state

But I don't know if that means you can do the reset/set address/read
descriptors in parallel?

> > @@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device 
> > *udev, int port1,
> > if (oldspeed == USB_SPEED_LOW)
> > delay = HUB_LONG_RESET_TIME;
> >  
> > -   mutex_lock(>bus->usb_address0_mutex);
> > +   mutex_lock(>bus->controller->mutex);
> >  
> > /* Reset the device; full speed may morph to high speed */
> > /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
> > @@ -4588,7 +4588,7 @@ fail:
> > hub_port_disable(hub, port1, 0);
> > update_devnum(udev, devnum);/* for disconnect processing */
> > }
> > -   mutex_unlock(>bus->usb_address0_mutex);
> > +   mutex_unlock(>bus->controller->mutex);
> > return retval;
> >  }
> 
> I don't think this is a good idea.  The driver core needs to be able to
> access the controller while this function is running.  You can
> introduce a new mutex if you want, perhaps in the primary hcd
> structure, but don't use bus->controller->mutex.

An explicit lock might be a good idea. I was trying to avoid adding
another lock so used the one in struct device as it appeared unused. The
XHCI code seems to only use the lock in struct xhci_hcd and ehci uses
struct ehci->lock.

btw I think this bug may be the same as reported at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1437492

[PATCH] usb: host: xhci: Replace bus lock with host controller lock

2016-02-04 Thread Chris Bainbridge

The XHCI controller presents two USB buses to the system - one for USB 2
and one for USB 3. When only one bus is locked there is a race condition
during hub init that results in errors like:

 [   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this error occurred on 6% of boots. Fix this by locking
the bus controller instead of the bus, thus preventing simultaneous hub
init on both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 drivers/usb/core/hcd.c | 2 +-
 drivers/usb/core/hub.c | 8 
 include/linux/usb.h| 3 +--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index df0e3b92533a..c468d047ec5e 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -966,7 +966,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(>usb_address0_mutex);
+   mutex_init(>devnum_next_mutex);
 
INIT_LIST_HEAD (>bus_list);
 }
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 350dcd9af5d8..e03c2f05c2f6 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2066,7 +2066,7 @@ static void choose_devnum(struct usb_device *udev)
struct usb_bus  *bus = udev->bus;
 
/* be safe when more hub events are proceed in parallel */
-   mutex_lock(>usb_address0_mutex);
+   mutex_lock(>devnum_next_mutex);
if (udev->wusb) {
devnum = udev->portnum + 1;
BUG_ON(test_bit(devnum, bus->devmap.devicemap));
@@ -2084,7 +2084,7 @@ static void choose_devnum(struct usb_device *udev)
set_bit(devnum, bus->devmap.devicemap);
udev->devnum = devnum;
}
-   mutex_unlock(>usb_address0_mutex);
+   mutex_unlock(>devnum_next_mutex);
 }
 
 static void release_devnum(struct usb_device *udev)
@@ -4312,7 +4312,7 @@ hub_port_init(struct usb_hub *hub, struct usb_device 
*udev, int port1,
if (oldspeed == USB_SPEED_LOW)
delay = HUB_LONG_RESET_TIME;
 
-   mutex_lock(>bus->usb_address0_mutex);
+   mutex_lock(>bus->controller->mutex);
 
/* Reset the device; full speed may morph to high speed */
/* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
@@ -4588,7 +4588,7 @@ fail:
hub_port_disable(hub, port1, 0);
update_devnum(udev, devnum);/* for disconnect processing */
}
-   mutex_unlock(>bus->usb_address0_mutex);
+   mutex_unlock(>bus->controller->mutex);
return retval;
 }
 
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 89533ba38691..6b736c82b9d1 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -371,14 +371,13 @@ struct usb_bus {
 
int devnum_next;/* Next open device number in
 * round-robin allocation */
+   struct mutex devnum_next_mutex; /* devnum_next mutex */
 
struct usb_devmap devmap;   /* device address allocation map */
struct usb_device *root_hub;/* Root hub */
struct usb_bus *hs_companion;   /* Companion EHCI bus, if any */
struct list_head bus_list;  /* list of busses */
 
-   struct mutex usb_address0_mutex; /* unaddressed device mutex */
-
int bandwidth_allocated;/* on this bus: how much of the time
 * reserved for periodic (intr/iso)
 * requests is used, on average?
-- 
2.1.4

[BUG] ACPI Error: ...

2016-02-03 Thread Chris Bainbridge

v4.5-rc2, Macbook IVB, ACPI errors on boot:

[0.618382] ACPI Error: Needed type [Reference], found [Integer] 
880260240fa0 (20150930/exresop-103)
[0.618448] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 
880261acb9e0), AE_AML_OPERAND_TYPE (20150930/psparse-542)
[1.581790] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20150930/psargs-360)
[1.581885] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 
88025f9962e0), AE_NOT_FOUND (20150930/psparse-542)
[1.585873] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20150930/psargs-360)
[1.585968] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 
88025f997b50), AE_NOT_FOUND (20150930/psparse-542)
[1.590267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20150930/psargs-360)
[1.590360] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 
88025f996450), AE_NOT_FOUND (20150930/psparse-542)

ae90fbf562d733a392c7a0ffefe1e09b5a31c99c is the first bad commit
commit ae90fbf562d733a392c7a0ffefe1e09b5a31c99c
Author: Bob Moore 
Date:   Tue Dec 29 14:00:14 2015 +0800

ACPICA: Parser: Fix for SuperName method invocation

ACPICA commit 4b86d1046d06e462dae83ebcd5a66cc132a08f8f

SuperName parameters that are in fact control method invocations were
not handled correctly by the parser. This change fixes the problem
by identifying these properly as method invocations. This affects
about 14 different ASL operators that contain SuperName parameters.
ACPICA BZ 1002.

Link: https://github.com/acpica/acpica/commit/4b86d104
Link: https://bugs.acpica.org/show_bug.cgi?id=1002
Signed-off-by: Bob Moore 
Signed-off-by: Lv Zheng 
Signed-off-by: Rafael J. Wysocki 

:04 04 3c9015e32a1af1bbe211d455138e4940fc313543 
441172b0f1209cd86fc68f70a0ce7f0adb1f9e33 M  drivers

[BUG] ODEBUG: assert_init not available (active state 0)

2016-02-03 Thread Chris Bainbridge

v4.5-rc2.
Macbook 2012 (IVB).

ODEBUG reports an error or two at shutdown. Apart from the ODEBUG
error(s), this bug also seems to cause some graphical corruption when
typing in Firefox text fields.


efaed9be998b5ae0afb7458e057e5f4402b43fa0 is the first bad commit
commit efaed9be998b5ae0afb7458e057e5f4402b43fa0
Author: Lv Zheng 
Date:   Tue Dec 29 14:03:08 2015 +0800

ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG 
evaluations can happen during OS early boot stages

ACPICA commit 31178590dde82368fdb0f6b0e466b6c0add96c57

We can ensure no early _REG evaluations by ensuring the following rules in
acpi_ev_execute_reg_method():
1. If an address space handler is installed during early stage,
   _REG(CONNECT) evaluations are blocked. This is achieved using
   acpi_gbl_reg_methods_enabled which is renamed from
   acpi_gbl_reg_methods_executed.
2. If _REG(CONNECT) has never been evalauted for the region object,
   _REG(DISCONNECT) evaluations are blocked. This is achieved by a new
   region object flag: AOPOBJ_REG_CONNECTED.
Note that, after applying this patch, we can ensure _REG(DISCONNECT) is
always paired to _REG(CONNECT). Lv Zheng

Link: https://github.com/acpica/acpica/commit/31178590
Signed-off-by: Lv Zheng 
Signed-off-by: Bob Moore 
Signed-off-by: Rafael J. Wysocki 

:04 04 357c50e20ee06ad557c77a2a03d09719dae9dc52 
3e0610826dc9cb86454baf06de9392aa31e0aa91 M  drivers


[   34.512758] [ cut here ]
[   34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 
debug_print_object+0x85/0xa0()
[   34.512770] ODEBUG: assert_init not available (active state 0) object type: 
timer_list hint: stub_timer+0x0/0x20
[   34.512772] Modules linked in:
[   34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353
[   34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[   34.512779]  81f9b41c 880227a1bdb0 814ec829 
880227a1bdf8
[   34.512782]  880227a1bde8 810cd831 880227a1be90 
822514c0
[   34.512785]  81f9b4c2 8327e288 7f29190ba700 
880227a1be48
[   34.512786] Call Trace:
[   34.512790]  [] dump_stack+0x4b/0x72
[   34.512793]  [] warn_slowpath_common+0x81/0xc0
[   34.512795]  [] warn_slowpath_fmt+0x47/0x50
[   34.512798]  [] ? do_init_timer+0x52/0x60
[   34.512800]  [] debug_print_object+0x85/0xa0
[   34.512802]  [] ? 
trace_event_raw_event_tick_stop+0x100/0x100
[   34.512805]  [] debug_object_assert_init+0xf8/0x130
[   34.512807]  [] ? trace_hardirqs_on+0xd/0x10
[   34.512810]  [] del_timer+0x1f/0x70
[   34.512813]  [] laptop_sync_completion+0x61/0x100
[   34.512815]  [] ? laptop_io_completion+0x30/0x30
[   34.512819]  [] sys_sync+0x7f/0x90
[   34.512824]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[   34.512825] ---[ end trace 8fe52cbaccc47e66 ]---
[   34.512830] [ cut here ]
[   34.512833] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 
debug_print_object+0x85/0xa0()
[   34.512835] ODEBUG: assert_init not available (active state 0) object type: 
timer_list hint: stub_timer+0x0/0x20
[   34.512836] Modules linked in:
[   34.512838] CPU: 0 PID: 4975 Comm: systemd Tainted: GW   
4.4.0-rc7+ #353
[   34.512839] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[   34.512842]  81f9b41c 880227a1bdb0 814ec829 
880227a1bdf8
[   34.512845]  880227a1bde8 810cd831 880227a1be90 
822514c0
[   34.512847]  81f9b4c2 8331f388 7f29190ba700 
880227a1be48
[   34.512848] Call Trace:
[   34.512850]  [] dump_stack+0x4b/0x72
[   34.512853]  [] warn_slowpath_common+0x81/0xc0
[   34.512855]  [] warn_slowpath_fmt+0x47/0x50
[   34.512857]  [] ? do_init_timer+0x52/0x60
[   34.512859]  [] debug_print_object+0x85/0xa0
[   34.512861]  [] ? 
trace_event_raw_event_tick_stop+0x100/0x100
[   34.512864]  [] debug_object_assert_init+0xf8/0x130
[   34.512865]  [] ? trace_hardirqs_on+0xd/0x10
[   34.512868]  [] del_timer+0x1f/0x70
[   34.512869]  [] laptop_sync_completion+0x61/0x100
[   34.512871]  [] ? laptop_io_completion+0x30/0x30
[   34.512873]  [] sys_sync+0x7f/0x90
[   34.512876]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[   34.512877] ---[ end trace 8fe52cbaccc47e67 ]---

[BUG] ODEBUG: assert_init not available (active state 0)

2016-02-03 Thread Chris Bainbridge

v4.5-rc2.
Macbook 2012 (IVB).

ODEBUG reports an error or two at shutdown. Apart from the ODEBUG
error(s), this bug also seems to cause some graphical corruption when
typing in Firefox text fields.


efaed9be998b5ae0afb7458e057e5f4402b43fa0 is the first bad commit
commit efaed9be998b5ae0afb7458e057e5f4402b43fa0
Author: Lv Zheng 
Date:   Tue Dec 29 14:03:08 2015 +0800

ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG 
evaluations can happen during OS early boot stages

ACPICA commit 31178590dde82368fdb0f6b0e466b6c0add96c57

We can ensure no early _REG evaluations by ensuring the following rules in
acpi_ev_execute_reg_method():
1. If an address space handler is installed during early stage,
   _REG(CONNECT) evaluations are blocked. This is achieved using
   acpi_gbl_reg_methods_enabled which is renamed from
   acpi_gbl_reg_methods_executed.
2. If _REG(CONNECT) has never been evalauted for the region object,
   _REG(DISCONNECT) evaluations are blocked. This is achieved by a new
   region object flag: AOPOBJ_REG_CONNECTED.
Note that, after applying this patch, we can ensure _REG(DISCONNECT) is
always paired to _REG(CONNECT). Lv Zheng

Link: https://github.com/acpica/acpica/commit/31178590
Signed-off-by: Lv Zheng 
Signed-off-by: Bob Moore 
Signed-off-by: Rafael J. Wysocki 

:04 04 357c50e20ee06ad557c77a2a03d09719dae9dc52 
3e0610826dc9cb86454baf06de9392aa31e0aa91 M  drivers


[   34.512758] [ cut here ]
[   34.512765] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 
debug_print_object+0x85/0xa0()
[   34.512770] ODEBUG: assert_init not available (active state 0) object type: 
timer_list hint: stub_timer+0x0/0x20
[   34.512772] Modules linked in:
[   34.512774] CPU: 0 PID: 4975 Comm: systemd Not tainted 4.4.0-rc7+ #353
[   34.512776] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[   34.512779]  81f9b41c 880227a1bdb0 814ec829 
880227a1bdf8
[   34.512782]  880227a1bde8 810cd831 880227a1be90 
822514c0
[   34.512785]  81f9b4c2 8327e288 7f29190ba700 
880227a1be48
[   34.512786] Call Trace:
[   34.512790]  [] dump_stack+0x4b/0x72
[   34.512793]  [] warn_slowpath_common+0x81/0xc0
[   34.512795]  [] warn_slowpath_fmt+0x47/0x50
[   34.512798]  [] ? do_init_timer+0x52/0x60
[   34.512800]  [] debug_print_object+0x85/0xa0
[   34.512802]  [] ? 
trace_event_raw_event_tick_stop+0x100/0x100
[   34.512805]  [] debug_object_assert_init+0xf8/0x130
[   34.512807]  [] ? trace_hardirqs_on+0xd/0x10
[   34.512810]  [] del_timer+0x1f/0x70
[   34.512813]  [] laptop_sync_completion+0x61/0x100
[   34.512815]  [] ? laptop_io_completion+0x30/0x30
[   34.512819]  [] sys_sync+0x7f/0x90
[   34.512824]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[   34.512825] ---[ end trace 8fe52cbaccc47e66 ]---
[   34.512830] [ cut here ]
[   34.512833] WARNING: CPU: 0 PID: 4975 at lib/debugobjects.c:263 
debug_print_object+0x85/0xa0()
[   34.512835] ODEBUG: assert_init not available (active state 0) object type: 
timer_list hint: stub_timer+0x0/0x20
[   34.512836] Modules linked in:
[   34.512838] CPU: 0 PID: 4975 Comm: systemd Tainted: GW   
4.4.0-rc7+ #353
[   34.512839] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[   34.512842]  81f9b41c 880227a1bdb0 814ec829 
880227a1bdf8
[   34.512845]  880227a1bde8 810cd831 880227a1be90 
822514c0
[   34.512847]  81f9b4c2 8331f388 7f29190ba700 
880227a1be48
[   34.512848] Call Trace:
[   34.512850]  [] dump_stack+0x4b/0x72
[   34.512853]  [] warn_slowpath_common+0x81/0xc0
[   34.512855]  [] warn_slowpath_fmt+0x47/0x50
[   34.512857]  [] ? do_init_timer+0x52/0x60
[   34.512859]  [] debug_print_object+0x85/0xa0
[   34.512861]  [] ? 
trace_event_raw_event_tick_stop+0x100/0x100
[   34.512864]  [] debug_object_assert_init+0xf8/0x130
[   34.512865]  [] ? trace_hardirqs_on+0xd/0x10
[   34.512868]  [] del_timer+0x1f/0x70
[   34.512869]  [] laptop_sync_completion+0x61/0x100
[   34.512871]  [] ? laptop_io_completion+0x30/0x30
[   34.512873]  [] sys_sync+0x7f/0x90
[   34.512876]  [] entry_SYSCALL_64_fastpath+0x12/0x6f
[   34.512877] ---[ end trace 8fe52cbaccc47e67 ]---

[BUG] ACPI Error: ...

2016-02-03 Thread Chris Bainbridge

v4.5-rc2, Macbook IVB, ACPI errors on boot:

[0.618382] ACPI Error: Needed type [Reference], found [Integer] 
880260240fa0 (20150930/exresop-103)
[0.618448] ACPI Error: Method parse/execution failed [\_PR.CPU0._PDC] (Node 
880261acb9e0), AE_AML_OPERAND_TYPE (20150930/psparse-542)
[1.581790] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20150930/psargs-360)
[1.581885] ACPI Error: Method parse/execution failed [\_PR.CPU1._CST] (Node 
88025f9962e0), AE_NOT_FOUND (20150930/psparse-542)
[1.585873] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20150930/psargs-360)
[1.585968] ACPI Error: Method parse/execution failed [\_PR.CPU2._CST] (Node 
88025f997b50), AE_NOT_FOUND (20150930/psparse-542)
[1.590267] ACPI Error: [\_PR_.CPU0._CST] Namespace lookup failure, 
AE_NOT_FOUND (20150930/psargs-360)
[1.590360] ACPI Error: Method parse/execution failed [\_PR.CPU3._CST] (Node 
88025f996450), AE_NOT_FOUND (20150930/psparse-542)

ae90fbf562d733a392c7a0ffefe1e09b5a31c99c is the first bad commit
commit ae90fbf562d733a392c7a0ffefe1e09b5a31c99c
Author: Bob Moore 
Date:   Tue Dec 29 14:00:14 2015 +0800

ACPICA: Parser: Fix for SuperName method invocation

ACPICA commit 4b86d1046d06e462dae83ebcd5a66cc132a08f8f

SuperName parameters that are in fact control method invocations were
not handled correctly by the parser. This change fixes the problem
by identifying these properly as method invocations. This affects
about 14 different ASL operators that contain SuperName parameters.
ACPICA BZ 1002.

Link: https://github.com/acpica/acpica/commit/4b86d104
Link: https://bugs.acpica.org/show_bug.cgi?id=1002
Signed-off-by: Bob Moore 
Signed-off-by: Lv Zheng 
Signed-off-by: Rafael J. Wysocki 

:04 04 3c9015e32a1af1bbe211d455138e4940fc313543 
441172b0f1209cd86fc68f70a0ce7f0adb1f9e33 M  drivers

[PATCH] net/mac80211/agg-rx.c: fix use of uninitialised values

2016-01-27 Thread Chris Bainbridge

Use kzalloc instead of kmalloc for struct tid_ampdu_rx. Fixes:

[7.976605] UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29
[7.976608] load of value 2 is not a valid value for type '_Bool'
[7.976611] CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265
[7.976613] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[7.976616] Workqueue: phy0 rt2x00usb_work_rxdone
[7.976619]  0004 880254a7ba50 8181d866 
0007
[7.976622]  880254a7ba78 880254a7ba68 8188422d 
8379b500
[7.976626]  880254a7bab8 81884747 0202 
000348620032
[7.976629] Call Trace:
[7.976633]  [] dump_stack+0x45/0x5f
[7.976637]  [] ubsan_epilogue+0xd/0x40
[7.976642]  [] __ubsan_handle_load_invalid_value+0x67/0x70
[7.976646]  [] 
ieee80211_sta_reorder_release.isra.16+0x5ed/0x730
[7.976650]  [] 
ieee80211_prepare_and_rx_handle+0xd04/0x1c00
[7.976654]  [] ? usb_hcd_map_urb_for_dma+0x65e/0x960
[7.976659]  [] __ieee80211_rx_handle_packet+0x1f3/0x750
[7.976663]  [] ieee80211_rx_napi+0x447/0x990
[7.976667]  [] rt2x00lib_rxdone+0x305/0xbd0
[7.976670]  [] ? dequeue_task_fair+0x64f/0x1de0
[7.976674]  [] ? sched_clock_cpu+0xe6/0x150
[7.976678]  [] rt2x00usb_work_rxdone+0x7c/0x140
[7.976682]  [] process_one_work+0x226/0x860
[7.976686]  [] worker_thread+0x5c/0x680
[7.976690]  [] ? process_one_work+0x860/0x860
[7.976693]  [] kthread+0xf6/0x150
[7.976697]  [] ? kthread_worker_fn+0x310/0x310
[7.976700]  [] ret_from_fork+0x3f/0x70
[7.976703]  [] ? kthread_worker_fn+0x310/0x310

Link: https://lkml.org/lkml/2016/1/26/230
Signed-off-by: Chris Bainbridge 
---
 net/mac80211/agg-rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 10ad4ac1fa0b..bde3344cbdd0 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -291,7 +291,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
}
 
/* prepare A-MPDU MLME for Rx aggregation */
-   tid_agg_rx = kmalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL);
+   tid_agg_rx = kzalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL);
if (!tid_agg_rx)
goto end;
 
-- 
2.1.4

[PATCH] net/mac80211/agg-rx.c: fix use of uninitialised values

2016-01-27 Thread Chris Bainbridge

Use kzalloc instead of kmalloc for struct tid_ampdu_rx. Fixes:

[7.976605] UBSAN: Undefined behaviour in net/mac80211/rx.c:932:29
[7.976608] load of value 2 is not a valid value for type '_Bool'
[7.976611] CPU: 3 PID: 1134 Comm: kworker/u16:7 Not tainted 4.5.0-rc1+ #265
[7.976613] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[7.976616] Workqueue: phy0 rt2x00usb_work_rxdone
[7.976619]  0004 880254a7ba50 8181d866 
0007
[7.976622]  880254a7ba78 880254a7ba68 8188422d 
8379b500
[7.976626]  880254a7bab8 81884747 0202 
000348620032
[7.976629] Call Trace:
[7.976633]  [] dump_stack+0x45/0x5f
[7.976637]  [] ubsan_epilogue+0xd/0x40
[7.976642]  [] __ubsan_handle_load_invalid_value+0x67/0x70
[7.976646]  [] 
ieee80211_sta_reorder_release.isra.16+0x5ed/0x730
[7.976650]  [] 
ieee80211_prepare_and_rx_handle+0xd04/0x1c00
[7.976654]  [] ? usb_hcd_map_urb_for_dma+0x65e/0x960
[7.976659]  [] __ieee80211_rx_handle_packet+0x1f3/0x750
[7.976663]  [] ieee80211_rx_napi+0x447/0x990
[7.976667]  [] rt2x00lib_rxdone+0x305/0xbd0
[7.976670]  [] ? dequeue_task_fair+0x64f/0x1de0
[7.976674]  [] ? sched_clock_cpu+0xe6/0x150
[7.976678]  [] rt2x00usb_work_rxdone+0x7c/0x140
[7.976682]  [] process_one_work+0x226/0x860
[7.976686]  [] worker_thread+0x5c/0x680
[7.976690]  [] ? process_one_work+0x860/0x860
[7.976693]  [] kthread+0xf6/0x150
[7.976697]  [] ? kthread_worker_fn+0x310/0x310
[7.976700]  [] ret_from_fork+0x3f/0x70
[7.976703]  [] ? kthread_worker_fn+0x310/0x310

Link: https://lkml.org/lkml/2016/1/26/230
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 net/mac80211/agg-rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 10ad4ac1fa0b..bde3344cbdd0 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -291,7 +291,7 @@ void __ieee80211_start_rx_ba_session(struct sta_info *sta,
}
 
/* prepare A-MPDU MLME for Rx aggregation */
-   tid_agg_rx = kmalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL);
+   tid_agg_rx = kzalloc(sizeof(struct tid_ampdu_rx), GFP_KERNEL);
if (!tid_agg_rx)
goto end;
 
-- 
2.1.4

UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11

2016-01-26 Thread Chris Bainbridge

Mounting ext4 fs on 4.5.0-rc1

[ 1176.592557] 

[ 1176.592565] UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11
[ 1176.592567] shift exponent -1 is negative
[ 1176.592571] CPU: 3 PID: 4976 Comm: QThread Not tainted 4.5.0-rc1 #252
[ 1176.592573] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[ 1176.592575]    8379dede 
88021f4bf8b8
[ 1176.592578]  81b2e7d9 0007 88021f4bf8e8 
88021f4bf8d0
[ 1176.592581]  81bcb87d  88021f4bf968 
81bcc1c1
[ 1176.592583] Call Trace:
[ 1176.592590]  [] dump_stack+0x45/0x6c
[ 1176.592595]  [] ubsan_epilogue+0xd/0x40
[ 1176.592597]  [] 
__ubsan_handle_shift_out_of_bounds+0xf1/0x140
[ 1176.592601]  [] ? ext4_mb_init_cache+0x677/0x1570
[ 1176.592604]  [] ? lru_cache_add+0x9/0x10
[ 1176.592607]  [] mb_find_order_for_block+0x132/0x1e0
[ 1176.592609]  [] mb_find_extent+0xde/0x6d0
[ 1176.592611]  [] ext4_mb_complex_scan_group+0x14b/0x8a0
[ 1176.592614]  [] ext4_mb_regular_allocator+0x3a7/0xcc0
[ 1176.592616]  [] ext4_mb_new_blocks+0x4c5/0x1270
[ 1176.592620]  [] ? ext4_ext_put_gap_in_cache+0x116/0x230
[ 1176.592622]  [] ext4_ext_map_blocks+0x81c/0x2280
[ 1176.592626]  [] ext4_map_blocks+0x1b6/0xe00
[ 1176.592628]  [] ext4_getblk+0x49/0x290
[ 1176.592630]  [] ext4_bread+0xe/0xb0
[ 1176.592633]  [] ext4_append+0xad/0x340
[ 1176.592635]  [] ext4_mkdir+0x344/0x970
[ 1176.592639]  [] vfs_mkdir+0x103/0x310
[ 1176.592642]  [] SyS_mkdir+0x93/0x130
[ 1176.592645]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 1176.592647]

Re: UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10

2016-01-26 Thread Chris Bainbridge

On Tue, Jan 26, 2016 at 12:03:39PM +0100, David Sterba wrote:
> On Tue, Jan 26, 2016 at 10:13:33AM +0000, Chris Bainbridge wrote:
> > Booting 4.5.0-rc1 with new UBSAN checker enabled:
> > 
> > [3.859690] 
> > 
> > [3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
> > [3.859696] signed integer overflow:
> > [3.859697] 9223372036854775807 + 1 cannot be represented in type 'long 
> > long int'
> > [3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252
> > [3.859702] Hardware name: Apple Inc. 
> > MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 
> > 09/13/2015
> > [3.859706]    0001 
> > 88024d4bfcf8
> > [3.859709]  81b2e7d9 0001 88024d4bfd28 
> > 88024d4bfd10
> > [3.859711]  81bcb87d 83aceb48 88024d4bfd98 
> > 81bcbc4d
> > [3.859712] Call Trace:
> > [3.859719]  [] dump_stack+0x45/0x6c
> > [3.859723]  [] ubsan_epilogue+0xd/0x40
> > [3.859725]  [] handle_overflow+0xbd/0xe0
> > [3.859728]  [] __ubsan_handle_add_overflow+0xe/0x10
> > [3.859732]  [] btrfs_real_readdir+0x881/0xc10
> > [3.859737]  [] iterate_dir+0xdd/0x2d0
> > [3.859740]  [] SyS_getdents+0x9b/0x110
> > [3.859743]  [] ? iterate_dir+0x2d0/0x2d0
> > [3.859747]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
> > [3.859749] 
> > 
> 
> That seems to be the same overflow as reported in the past, caught by
> the PaX SIZE_OVERFLOW plugin. There's a patch but not merged yet.
> 
> https://patchwork.kernel.org/patch/7611351/

Yes, the error does not appear with that patch applied.

UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18

2016-01-26 Thread Chris Bainbridge

4.5.0-rc1 another UBSAN error:

[ 4845.229441] 

[ 4845.229454] UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18
[ 4845.229458] load of value 2 is not a valid value for type '_Bool'
[ 4845.229464] CPU: 1 PID: 6266 Comm: kworker/u16:8 Not tainted 4.5.0-rc1 #252
[ 4845.229468] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[ 4845.229491] Workqueue: phy2 rt2x00usb_work_rxdone
[ 4845.229493]    0002 
8801b13c39f8
[ 4845.229496]  81b2e7d9 0007 8801b13c3a20 
8801b13c3a10
[ 4845.229498]  81bcb87d 85016890 8801b13c3a60 
81bcc279
[ 4845.229501] Call Trace:
[ 4845.229506]  [] dump_stack+0x45/0x6c
[ 4845.229510]  [] ubsan_epilogue+0xd/0x40
[ 4845.229513]  [] __ubsan_handle_load_invalid_value+0x69/0x80
[ 4845.229517]  [] ? 
xhci_setup_addressable_virt_dev+0xeb2/0x13b0
[ 4845.229520]  [] ? pick_next_entity+0xcb/0x280
[ 4845.229524]  [] 
ieee80211_sta_reorder_release.isra.15+0x7e3/0xad0
[ 4845.229527]  [] 
ieee80211_prepare_and_rx_handle+0x11a7/0x2ab0
[ 4845.229530]  [] ? xhci_urb_enqueue+0x394/0x1140
[ 4845.229533]  [] ? usb_hcd_map_urb_for_dma+0x94f/0x1140
[ 4845.229537]  [] ? skb_release_data+0x117/0x2f0
[ 4845.229539]  [] __ieee80211_rx_handle_packet+0x26a/0x9a0
[ 4845.229542]  [] ? __kmalloc_reserve.isra.11+0x2c/0x80
[ 4845.229545]  [] ieee80211_rx_napi+0x651/0x12b0
[ 4845.229549]  [] rt2x00lib_rxdone+0x402/0x1120
[ 4845.229552]  [] ? dequeue_task_fair+0x97f/0x41d0
[ 4845.229554]  [] rt2x00usb_work_rxdone+0xac/0x1f0
[ 4845.229558]  [] ? __schedule+0x5cd/0x1770
[ 4845.229561]  [] process_one_work+0x266/0xc00
[ 4845.229563]  [] worker_thread+0x96/0xd40
[ 4845.229565]  [] ? process_scheduled_works+0x70/0x70
[ 4845.229568]  [] kthread+0x108/0x180
[ 4845.229571]  [] ? kthread_create_on_node+0x210/0x210
[ 4845.229573]  [] ret_from_fork+0x3f/0x70
[ 4845.229576]  [] ? kthread_create_on_node+0x210/0x210
[ 4845.229577]

UBSAN: Undefined behaviour in net/mac80211/rc80211_minstrel.h:47:34

2016-01-26 Thread Chris Bainbridge

4.5.0-rc1 with the new UBSAN checker:

[ 2624.978741] 

[ 2624.978749] UBSAN: Undefined behaviour in 
net/mac80211/rc80211_minstrel.h:47:34
[ 2624.978752] signed integer overflow:
[ 2624.978754] -32768 * 100 cannot be represented in type 'int'
[ 2624.978759] CPU: 1 PID: 5362 Comm: kworker/u16:2 Not tainted 4.5.0-rc1 #252
[ 2624.978762] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[ 2624.978770] Workqueue: phy1 rt2800usb_work_txdone
[ 2624.978773]    000f4240 
8801e5d23990
[ 2624.978778]  81b2e7d9 0007 8801e5d239c0 
8801e5d239a8
[ 2624.978782]  81bcb87d 850b928c 8801e5d23a30 
81bcbc4d
[ 2624.978785] Call Trace:
[ 2624.978792]  [] dump_stack+0x45/0x6c
[ 2624.978797]  [] ubsan_epilogue+0xd/0x40
[ 2624.978802]  [] handle_overflow+0xbd/0xe0
[ 2624.978807]  [] ? number+0x35f/0x600
[ 2624.978812]  [] __ubsan_handle_mul_overflow+0xe/0x10
[ 2624.978819]  [] minstrel_calc_rate_stats+0x58c/0x5f0
[ 2624.978823]  [] minstrel_ht_update_stats.isra.4+0x148/0xdd0
[ 2624.978827]  [] minstrel_ht_tx_status+0x683/0x1120
[ 2624.978832]  [] ieee80211_tx_status+0xff8/0x2d10
[ 2624.978836]  [] rt2x00lib_txdone+0x42c/0x11a0
[ 2624.978841]  [] ? update_curr+0x15a/0x400
[ 2624.978846]  [] rt2800_txdone_entry+0xc5/0x1b0
[ 2624.978850]  [] rt2800usb_work_txdone+0x6ff/0xbb0
[ 2624.978854]  [] process_one_work+0x266/0xc00
[ 2624.978857]  [] worker_thread+0x96/0xd40
[ 2624.978861]  [] ? process_scheduled_works+0x70/0x70
[ 2624.978865]  [] kthread+0x108/0x180
[ 2624.978869]  [] ? kthread_create_on_node+0x210/0x210
[ 2624.978874]  [] ret_from_fork+0x3f/0x70
[ 2624.978878]  [] ? kthread_create_on_node+0x210/0x210
[ 2624.978880]

UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10

2016-01-26 Thread Chris Bainbridge

Booting 4.5.0-rc1 with new UBSAN checker enabled:

[3.859690] 

[3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
[3.859696] signed integer overflow:
[3.859697] 9223372036854775807 + 1 cannot be represented in type 'long long 
int'
[3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252
[3.859702] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[3.859706]    0001 
88024d4bfcf8
[3.859709]  81b2e7d9 0001 88024d4bfd28 
88024d4bfd10
[3.859711]  81bcb87d 83aceb48 88024d4bfd98 
81bcbc4d
[3.859712] Call Trace:
[3.859719]  [] dump_stack+0x45/0x6c
[3.859723]  [] ubsan_epilogue+0xd/0x40
[3.859725]  [] handle_overflow+0xbd/0xe0
[3.859728]  [] __ubsan_handle_add_overflow+0xe/0x10
[3.859732]  [] btrfs_real_readdir+0x881/0xc10
[3.859737]  [] iterate_dir+0xdd/0x2d0
[3.859740]  [] SyS_getdents+0x9b/0x110
[3.859743]  [] ? iterate_dir+0x2d0/0x2d0
[3.859747]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[3.859749]

UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25

2016-01-26 Thread Chris Bainbridge

Booting 4.5.0-rc1 with new UBSAN checker enabled:

[4.556968] 

[4.556972] UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25
[4.556975] shift exponent -1 is negative
[4.556979] CPU: 2 PID: 3624 Comm: usb Not tainted 4.5.0-rc1 #252
[4.556981] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[4.556984]    845c6528 
8802493b3c68
[4.556988]  81b2e7d9 0007 8802493b3c98 
8802493b3c80
[4.556992]  81bcb87d  8802493b3d10 
81bcc1c1
[4.556996] Call Trace:
[4.557004]  [] dump_stack+0x45/0x6c
[4.557010]  [] ubsan_epilogue+0xd/0x40
[4.557015]  [] 
__ubsan_handle_shift_out_of_bounds+0xf1/0x140
[4.557020]  [] ? __kmalloc+0x209/0x5f0
[4.557025]  [] ? usb_alloc_urb+0x15/0x40
[4.557030]  [] ? proc_do_submiturb+0x9af/0x2c30
[4.557034]  [] proc_do_submiturb+0x2994/0x2c30
[4.557039]  [] ? blocking_notifier_call_chain+0x11/0x20
[4.557044]  [] usbdev_do_ioctl+0x90b/0x2170
[4.557049]  [] ? hrtimer_start_range_ns+0x274/0xc60
[4.557053]  [] usbdev_ioctl+0x9/0x10
[4.557059]  [] do_vfs_ioctl+0xd2/0xcb0
[4.557063]  [] SyS_ioctl+0x74/0x80
[4.557067]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[4.557070]

UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:2

2016-01-26 Thread Chris Bainbridge

Booting 4.5.0-rc1 with new UBSAN checker enabled:

[3.805449] 

[3.805453] UBSAN: Undefined behaviour in 
./arch/x86/include/asm/atomic.h:156:2
[3.805455] signed integer overflow:
[3.805456] -1720106381 + -1531247276 cannot be represented in type 'int'
[3.805460] CPU: 3 PID: 3235 Comm: cups-browsed Not tainted 4.5.0-rc1 #252
[3.805461] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[3.805465]    a4bb0554 
88025f2c37c8
[3.805468]  81b2e7d9 0001 88025f2c37f8 
88025f2c37e0
[3.805470]  81bcb87d 84b16a74 88025f2c3868 
81bcbc4d
[3.805471] Call Trace:
[3.805478][] dump_stack+0x45/0x6c
[3.805483]  [] ubsan_epilogue+0xd/0x40
[3.805485]  [] handle_overflow+0xbd/0xe0
[3.805490]  [] ? csum_partial_copy_nocheck+0xf/0x20
[3.805493]  [] ? get_random_bytes+0x4f/0x100
[3.805496]  [] __ubsan_handle_add_overflow+0xe/0x10
[3.805500]  [] ip_idents_reserve+0x9a/0xd0
[3.805503]  [] __ip_select_ident+0xc9/0x160
[3.805506]  [] __ip_make_skb+0x83a/0x1e70
[3.805509]  [] ? ip_setup_cork+0x4d0/0x4d0
[3.805511]  [] ip_push_pending_frames+0x1d/0x50
[3.805514]  [] ip_send_unicast_reply+0x3df/0xa80
[3.805517]  [] ? ac6_proc_exit+0x40/0x40
[3.805521]  [] tcp_v4_send_reset+0x4e4/0xe40
[3.805524]  [] tcp_v4_rcv+0x979/0x1db0
[3.805527]  [] ip_local_deliver_finish+0x139/0x600
[3.805529]  [] ip_local_deliver+0x10a/0x1a0
[3.805532]  [] ? ip_rcv_finish+0xb10/0xb10
[3.805534]  [] ip_rcv_finish+0x23d/0xb10
[3.805536]  [] ip_rcv+0x45d/0xa30
[3.805540]  [] ? wake_up_q+0x12/0xd0
[3.805543]  [] ? inet_del_offload+0x40/0x40
[3.805545]  [] ? ip_local_deliver+0x1a0/0x1a0
[3.805549]  [] __netif_receive_skb_core+0xd2c/0x1c30
[3.805552]  [] __netif_receive_skb+0x29/0x190
[3.805556]  [] ? rcu_gp_kthread_wake+0x6e/0xb0
[3.805559]  [] process_backlog+0x116/0x5d0
[3.805562]  [] net_rx_action+0x395/0x9c0
[3.805566]  [] __do_softirq+0xbc/0x590
[3.805569]  [] do_softirq_own_stack+0x1c/0x30
[3.805572][] do_softirq.part.17+0x1d/0x20
[3.805575]  [] __local_bh_enable_ip+0x94/0xc0
[3.805577]  [] ip_finish_output2+0x219/0x880
[3.805580]  [] ip_finish_output+0x280/0x690
[3.805582]  [] ip_output+0x100/0x210
[3.805585]  [] ? ip_fragment.constprop.24+0x190/0x190
[3.805588]  [] ip_local_out+0x3b/0x80
[3.805590]  [] ip_queue_xmit+0x274/0x1000
[3.805593]  [] ? __skb_clone+0x5b/0x550
[3.805595]  [] tcp_transmit_skb+0x7b6/0x2260
[3.805598]  [] tcp_connect+0xf2d/0x1fe0
[3.805601]  [] ? secure_tcp_sequence_number+0x7c/0xd0
[3.805603]  [] tcp_v4_connect+0x4a7/0x15f0
[3.805607]  [] __inet_stream_connect+0xf1/0x880
[3.805609]  [] ? _raw_spin_unlock_bh+0x26/0x40
[3.805612]  [] ? __inet_stream_connect+0x880/0x880
[3.805615]  [] inet_stream_connect+0x38/0x80
[3.805618]  [] SYSC_connect+0xe7/0x170
[3.805622]  [] ? selinux_file_fcntl+0x8a/0x100
[3.805625]  [] ? security_file_fcntl+0x48/0x80
[3.805629]  [] ? SyS_fcntl+0x542/0x8b0
[3.805632]  [] SyS_connect+0x9/0x10
[3.805634]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[3.805636]

UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25

2016-01-26 Thread Chris Bainbridge

Booting 4.5.0-rc1 with new UBSAN checker enabled:

[4.556968] 

[4.556972] UBSAN: Undefined behaviour in drivers/usb/core/devio.c:1517:25
[4.556975] shift exponent -1 is negative
[4.556979] CPU: 2 PID: 3624 Comm: usb Not tainted 4.5.0-rc1 #252
[4.556981] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[4.556984]    845c6528 
8802493b3c68
[4.556988]  81b2e7d9 0007 8802493b3c98 
8802493b3c80
[4.556992]  81bcb87d  8802493b3d10 
81bcc1c1
[4.556996] Call Trace:
[4.557004]  [] dump_stack+0x45/0x6c
[4.557010]  [] ubsan_epilogue+0xd/0x40
[4.557015]  [] 
__ubsan_handle_shift_out_of_bounds+0xf1/0x140
[4.557020]  [] ? __kmalloc+0x209/0x5f0
[4.557025]  [] ? usb_alloc_urb+0x15/0x40
[4.557030]  [] ? proc_do_submiturb+0x9af/0x2c30
[4.557034]  [] proc_do_submiturb+0x2994/0x2c30
[4.557039]  [] ? blocking_notifier_call_chain+0x11/0x20
[4.557044]  [] usbdev_do_ioctl+0x90b/0x2170
[4.557049]  [] ? hrtimer_start_range_ns+0x274/0xc60
[4.557053]  [] usbdev_ioctl+0x9/0x10
[4.557059]  [] do_vfs_ioctl+0xd2/0xcb0
[4.557063]  [] SyS_ioctl+0x74/0x80
[4.557067]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[4.557070]

UBSAN: Undefined behaviour in net/mac80211/rc80211_minstrel.h:47:34

2016-01-26 Thread Chris Bainbridge

4.5.0-rc1 with the new UBSAN checker:

[ 2624.978741] 

[ 2624.978749] UBSAN: Undefined behaviour in 
net/mac80211/rc80211_minstrel.h:47:34
[ 2624.978752] signed integer overflow:
[ 2624.978754] -32768 * 100 cannot be represented in type 'int'
[ 2624.978759] CPU: 1 PID: 5362 Comm: kworker/u16:2 Not tainted 4.5.0-rc1 #252
[ 2624.978762] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[ 2624.978770] Workqueue: phy1 rt2800usb_work_txdone
[ 2624.978773]    000f4240 
8801e5d23990
[ 2624.978778]  81b2e7d9 0007 8801e5d239c0 
8801e5d239a8
[ 2624.978782]  81bcb87d 850b928c 8801e5d23a30 
81bcbc4d
[ 2624.978785] Call Trace:
[ 2624.978792]  [] dump_stack+0x45/0x6c
[ 2624.978797]  [] ubsan_epilogue+0xd/0x40
[ 2624.978802]  [] handle_overflow+0xbd/0xe0
[ 2624.978807]  [] ? number+0x35f/0x600
[ 2624.978812]  [] __ubsan_handle_mul_overflow+0xe/0x10
[ 2624.978819]  [] minstrel_calc_rate_stats+0x58c/0x5f0
[ 2624.978823]  [] minstrel_ht_update_stats.isra.4+0x148/0xdd0
[ 2624.978827]  [] minstrel_ht_tx_status+0x683/0x1120
[ 2624.978832]  [] ieee80211_tx_status+0xff8/0x2d10
[ 2624.978836]  [] rt2x00lib_txdone+0x42c/0x11a0
[ 2624.978841]  [] ? update_curr+0x15a/0x400
[ 2624.978846]  [] rt2800_txdone_entry+0xc5/0x1b0
[ 2624.978850]  [] rt2800usb_work_txdone+0x6ff/0xbb0
[ 2624.978854]  [] process_one_work+0x266/0xc00
[ 2624.978857]  [] worker_thread+0x96/0xd40
[ 2624.978861]  [] ? process_scheduled_works+0x70/0x70
[ 2624.978865]  [] kthread+0x108/0x180
[ 2624.978869]  [] ? kthread_create_on_node+0x210/0x210
[ 2624.978874]  [] ret_from_fork+0x3f/0x70
[ 2624.978878]  [] ? kthread_create_on_node+0x210/0x210
[ 2624.978880]

UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:2

2016-01-26 Thread Chris Bainbridge

Booting 4.5.0-rc1 with new UBSAN checker enabled:

[3.805449] 

[3.805453] UBSAN: Undefined behaviour in 
./arch/x86/include/asm/atomic.h:156:2
[3.805455] signed integer overflow:
[3.805456] -1720106381 + -1531247276 cannot be represented in type 'int'
[3.805460] CPU: 3 PID: 3235 Comm: cups-browsed Not tainted 4.5.0-rc1 #252
[3.805461] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[3.805465]    a4bb0554 
88025f2c37c8
[3.805468]  81b2e7d9 0001 88025f2c37f8 
88025f2c37e0
[3.805470]  81bcb87d 84b16a74 88025f2c3868 
81bcbc4d
[3.805471] Call Trace:
[3.805478][] dump_stack+0x45/0x6c
[3.805483]  [] ubsan_epilogue+0xd/0x40
[3.805485]  [] handle_overflow+0xbd/0xe0
[3.805490]  [] ? csum_partial_copy_nocheck+0xf/0x20
[3.805493]  [] ? get_random_bytes+0x4f/0x100
[3.805496]  [] __ubsan_handle_add_overflow+0xe/0x10
[3.805500]  [] ip_idents_reserve+0x9a/0xd0
[3.805503]  [] __ip_select_ident+0xc9/0x160
[3.805506]  [] __ip_make_skb+0x83a/0x1e70
[3.805509]  [] ? ip_setup_cork+0x4d0/0x4d0
[3.805511]  [] ip_push_pending_frames+0x1d/0x50
[3.805514]  [] ip_send_unicast_reply+0x3df/0xa80
[3.805517]  [] ? ac6_proc_exit+0x40/0x40
[3.805521]  [] tcp_v4_send_reset+0x4e4/0xe40
[3.805524]  [] tcp_v4_rcv+0x979/0x1db0
[3.805527]  [] ip_local_deliver_finish+0x139/0x600
[3.805529]  [] ip_local_deliver+0x10a/0x1a0
[3.805532]  [] ? ip_rcv_finish+0xb10/0xb10
[3.805534]  [] ip_rcv_finish+0x23d/0xb10
[3.805536]  [] ip_rcv+0x45d/0xa30
[3.805540]  [] ? wake_up_q+0x12/0xd0
[3.805543]  [] ? inet_del_offload+0x40/0x40
[3.805545]  [] ? ip_local_deliver+0x1a0/0x1a0
[3.805549]  [] __netif_receive_skb_core+0xd2c/0x1c30
[3.805552]  [] __netif_receive_skb+0x29/0x190
[3.805556]  [] ? rcu_gp_kthread_wake+0x6e/0xb0
[3.805559]  [] process_backlog+0x116/0x5d0
[3.805562]  [] net_rx_action+0x395/0x9c0
[3.805566]  [] __do_softirq+0xbc/0x590
[3.805569]  [] do_softirq_own_stack+0x1c/0x30
[3.805572][] do_softirq.part.17+0x1d/0x20
[3.805575]  [] __local_bh_enable_ip+0x94/0xc0
[3.805577]  [] ip_finish_output2+0x219/0x880
[3.805580]  [] ip_finish_output+0x280/0x690
[3.805582]  [] ip_output+0x100/0x210
[3.805585]  [] ? ip_fragment.constprop.24+0x190/0x190
[3.805588]  [] ip_local_out+0x3b/0x80
[3.805590]  [] ip_queue_xmit+0x274/0x1000
[3.805593]  [] ? __skb_clone+0x5b/0x550
[3.805595]  [] tcp_transmit_skb+0x7b6/0x2260
[3.805598]  [] tcp_connect+0xf2d/0x1fe0
[3.805601]  [] ? secure_tcp_sequence_number+0x7c/0xd0
[3.805603]  [] tcp_v4_connect+0x4a7/0x15f0
[3.805607]  [] __inet_stream_connect+0xf1/0x880
[3.805609]  [] ? _raw_spin_unlock_bh+0x26/0x40
[3.805612]  [] ? __inet_stream_connect+0x880/0x880
[3.805615]  [] inet_stream_connect+0x38/0x80
[3.805618]  [] SYSC_connect+0xe7/0x170
[3.805622]  [] ? selinux_file_fcntl+0x8a/0x100
[3.805625]  [] ? security_file_fcntl+0x48/0x80
[3.805629]  [] ? SyS_fcntl+0x542/0x8b0
[3.805632]  [] SyS_connect+0x9/0x10
[3.805634]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[3.805636]

UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11

2016-01-26 Thread Chris Bainbridge

Mounting ext4 fs on 4.5.0-rc1

[ 1176.592557] 

[ 1176.592565] UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1274:11
[ 1176.592567] shift exponent -1 is negative
[ 1176.592571] CPU: 3 PID: 4976 Comm: QThread Not tainted 4.5.0-rc1 #252
[ 1176.592573] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[ 1176.592575]    8379dede 
88021f4bf8b8
[ 1176.592578]  81b2e7d9 0007 88021f4bf8e8 
88021f4bf8d0
[ 1176.592581]  81bcb87d  88021f4bf968 
81bcc1c1
[ 1176.592583] Call Trace:
[ 1176.592590]  [] dump_stack+0x45/0x6c
[ 1176.592595]  [] ubsan_epilogue+0xd/0x40
[ 1176.592597]  [] 
__ubsan_handle_shift_out_of_bounds+0xf1/0x140
[ 1176.592601]  [] ? ext4_mb_init_cache+0x677/0x1570
[ 1176.592604]  [] ? lru_cache_add+0x9/0x10
[ 1176.592607]  [] mb_find_order_for_block+0x132/0x1e0
[ 1176.592609]  [] mb_find_extent+0xde/0x6d0
[ 1176.592611]  [] ext4_mb_complex_scan_group+0x14b/0x8a0
[ 1176.592614]  [] ext4_mb_regular_allocator+0x3a7/0xcc0
[ 1176.592616]  [] ext4_mb_new_blocks+0x4c5/0x1270
[ 1176.592620]  [] ? ext4_ext_put_gap_in_cache+0x116/0x230
[ 1176.592622]  [] ext4_ext_map_blocks+0x81c/0x2280
[ 1176.592626]  [] ext4_map_blocks+0x1b6/0xe00
[ 1176.592628]  [] ext4_getblk+0x49/0x290
[ 1176.592630]  [] ext4_bread+0xe/0xb0
[ 1176.592633]  [] ext4_append+0xad/0x340
[ 1176.592635]  [] ext4_mkdir+0x344/0x970
[ 1176.592639]  [] vfs_mkdir+0x103/0x310
[ 1176.592642]  [] SyS_mkdir+0x93/0x130
[ 1176.592645]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[ 1176.592647]

UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10

2016-01-26 Thread Chris Bainbridge

Booting 4.5.0-rc1 with new UBSAN checker enabled:

[3.859690] 

[3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
[3.859696] signed integer overflow:
[3.859697] 9223372036854775807 + 1 cannot be represented in type 'long long 
int'
[3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252
[3.859702] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[3.859706]    0001 
88024d4bfcf8
[3.859709]  81b2e7d9 0001 88024d4bfd28 
88024d4bfd10
[3.859711]  81bcb87d 83aceb48 88024d4bfd98 
81bcbc4d
[3.859712] Call Trace:
[3.859719]  [] dump_stack+0x45/0x6c
[3.859723]  [] ubsan_epilogue+0xd/0x40
[3.859725]  [] handle_overflow+0xbd/0xe0
[3.859728]  [] __ubsan_handle_add_overflow+0xe/0x10
[3.859732]  [] btrfs_real_readdir+0x881/0xc10
[3.859737]  [] iterate_dir+0xdd/0x2d0
[3.859740]  [] SyS_getdents+0x9b/0x110
[3.859743]  [] ? iterate_dir+0x2d0/0x2d0
[3.859747]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
[3.859749]

Re: UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10

2016-01-26 Thread Chris Bainbridge

On Tue, Jan 26, 2016 at 12:03:39PM +0100, David Sterba wrote:
> On Tue, Jan 26, 2016 at 10:13:33AM +0000, Chris Bainbridge wrote:
> > Booting 4.5.0-rc1 with new UBSAN checker enabled:
> > 
> > [3.859690] 
> > 
> > [3.859694] UBSAN: Undefined behaviour in fs/btrfs/inode.c:5845:10
> > [3.859696] signed integer overflow:
> > [3.859697] 9223372036854775807 + 1 cannot be represented in type 'long 
> > long int'
> > [3.859701] CPU: 3 PID: 3237 Comm: polkitd Not tainted 4.5.0-rc1 #252
> > [3.859702] Hardware name: Apple Inc. 
> > MacBookPro10,2/Mac-AFD8A9D944EA4843, BIOS MBP102.88Z.0106.B0A.1509130955 
> > 09/13/2015
> > [3.859706]    0001 
> > 88024d4bfcf8
> > [3.859709]  81b2e7d9 0001 88024d4bfd28 
> > 88024d4bfd10
> > [3.859711]  81bcb87d 83aceb48 88024d4bfd98 
> > 81bcbc4d
> > [3.859712] Call Trace:
> > [3.859719]  [] dump_stack+0x45/0x6c
> > [3.859723]  [] ubsan_epilogue+0xd/0x40
> > [3.859725]  [] handle_overflow+0xbd/0xe0
> > [3.859728]  [] __ubsan_handle_add_overflow+0xe/0x10
> > [3.859732]  [] btrfs_real_readdir+0x881/0xc10
> > [3.859737]  [] iterate_dir+0xdd/0x2d0
> > [3.859740]  [] SyS_getdents+0x9b/0x110
> > [3.859743]  [] ? iterate_dir+0x2d0/0x2d0
> > [3.859747]  [] entry_SYSCALL_64_fastpath+0x12/0x6a
> > [3.859749] 
> > 
> 
> That seems to be the same overflow as reported in the past, caught by
> the PaX SIZE_OVERFLOW plugin. There's a patch but not merged yet.
> 
> https://patchwork.kernel.org/patch/7611351/

Yes, the error does not appear with that patch applied.

UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18

2016-01-26 Thread Chris Bainbridge

4.5.0-rc1 another UBSAN error:

[ 4845.229441] 

[ 4845.229454] UBSAN: Undefined behaviour in net/mac80211/rx.c:924:18
[ 4845.229458] load of value 2 is not a valid value for type '_Bool'
[ 4845.229464] CPU: 1 PID: 6266 Comm: kworker/u16:8 Not tainted 4.5.0-rc1 #252
[ 4845.229468] Hardware name: Apple Inc. MacBookPro10,2/Mac-AFD8A9D944EA4843, 
BIOS MBP102.88Z.0106.B0A.1509130955 09/13/2015
[ 4845.229491] Workqueue: phy2 rt2x00usb_work_rxdone
[ 4845.229493]    0002 
8801b13c39f8
[ 4845.229496]  81b2e7d9 0007 8801b13c3a20 
8801b13c3a10
[ 4845.229498]  81bcb87d 85016890 8801b13c3a60 
81bcc279
[ 4845.229501] Call Trace:
[ 4845.229506]  [] dump_stack+0x45/0x6c
[ 4845.229510]  [] ubsan_epilogue+0xd/0x40
[ 4845.229513]  [] __ubsan_handle_load_invalid_value+0x69/0x80
[ 4845.229517]  [] ? 
xhci_setup_addressable_virt_dev+0xeb2/0x13b0
[ 4845.229520]  [] ? pick_next_entity+0xcb/0x280
[ 4845.229524]  [] 
ieee80211_sta_reorder_release.isra.15+0x7e3/0xad0
[ 4845.229527]  [] 
ieee80211_prepare_and_rx_handle+0x11a7/0x2ab0
[ 4845.229530]  [] ? xhci_urb_enqueue+0x394/0x1140
[ 4845.229533]  [] ? usb_hcd_map_urb_for_dma+0x94f/0x1140
[ 4845.229537]  [] ? skb_release_data+0x117/0x2f0
[ 4845.229539]  [] __ieee80211_rx_handle_packet+0x26a/0x9a0
[ 4845.229542]  [] ? __kmalloc_reserve.isra.11+0x2c/0x80
[ 4845.229545]  [] ieee80211_rx_napi+0x651/0x12b0
[ 4845.229549]  [] rt2x00lib_rxdone+0x402/0x1120
[ 4845.229552]  [] ? dequeue_task_fair+0x97f/0x41d0
[ 4845.229554]  [] rt2x00usb_work_rxdone+0xac/0x1f0
[ 4845.229558]  [] ? __schedule+0x5cd/0x1770
[ 4845.229561]  [] process_one_work+0x266/0xc00
[ 4845.229563]  [] worker_thread+0x96/0xd40
[ 4845.229565]  [] ? process_scheduled_works+0x70/0x70
[ 4845.229568]  [] kthread+0x108/0x180
[ 4845.229571]  [] ? kthread_create_on_node+0x210/0x210
[ 4845.229573]  [] ret_from_fork+0x3f/0x70
[ 4845.229576]  [] ? kthread_create_on_node+0x210/0x210
[ 4845.229577]

Re: 4.4-rc, intel dri i915, regular hangs and corruptions

2015-12-28 Thread Chris Bainbridge

On 17 December 2015 at 18:34, Norbert Preining  wrote:
> * font corruption
>   sometime sets of glyphs, or practically all glyphs disappear
>   related probably to bug
>   https://bugs.freedesktop.org/show_bug.cgi?id=55500
>   I have sent some info there already, without response
>
>   Currently my font displays some kind of strange symbols instead of
>   an m ... looks a bit like a Kanji.

I remember a similar bug around 2.99.917 but that tag is over a year
old now and there have been many bug fixes since. You'll need to
verify you can still reproduce your issue with the latest from
git://anongit.freedesktop.org/xorg/driver/xf86-video-intel and if so
do a bisect from the previous working kernel or xf86-video-intel to
identify the problematic commit.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 4.4-rc, intel dri i915, regular hangs and corruptions

2015-12-28 Thread Chris Bainbridge

On 17 December 2015 at 18:34, Norbert Preining  wrote:
> * font corruption
>   sometime sets of glyphs, or practically all glyphs disappear
>   related probably to bug
>   https://bugs.freedesktop.org/show_bug.cgi?id=55500
>   I have sent some info there already, without response
>
>   Currently my font displays some kind of strange symbols instead of
>   an m ... looks a bit like a Kanji.

I remember a similar bug around 2.99.917 but that tag is over a year
old now and there have been many bug fixes since. You'll need to
verify you can still reproduce your issue with the latest from
git://anongit.freedesktop.org/xorg/driver/xf86-video-intel and if so
do a bisect from the previous working kernel or xf86-video-intel to
identify the problematic commit.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"

2015-11-12 Thread Chris Bainbridge

This reverts commit 3349fb64b2927407017d970dd5c4daf3c5ad69f8.

After commit 8848d20cf09c ("ACPI / SMBus: Fix boot stalls / high CPU
caused by reentrant code") the delay is no longer necessary.

Signed-off-by: Chris Bainbridge 
---
 drivers/acpi/sbshc.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index e2900518cf7e..2fa8304171e0 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "sbshc.h"
 
 #define PREFIX "ACPI: "
@@ -89,8 +88,6 @@ enum acpi_smb_offset {
ACPI_SMB_ALARM_DATA = 0x26, /* 2 bytes alarm data */
 };
 
-static bool macbook;
-
 static inline int smb_hc_read(struct acpi_smb_hc *hc, u8 address, u8 *data)
 {
return ec_read(hc->offset + address, data);
@@ -121,8 +118,6 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, 
u8 protocol,
 
mutex_lock(>lock);
hc->done = false;
-   if (macbook)
-   udelay(5);
if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, ))
goto end;
if (temp) {
@@ -250,29 +245,12 @@ extern int acpi_ec_add_query_handler(struct acpi_ec *ec, 
u8 query_bit,
  acpi_handle handle, acpi_ec_query_func func,
  void *data);
 
-static int macbook_dmi_match(const struct dmi_system_id *d)
-{
-   pr_debug("Detected MacBook, enabling workaround\n");
-   macbook = true;
-   return 0;
-}
-
-static struct dmi_system_id acpi_smbus_dmi_table[] = {
-   { macbook_dmi_match, "Apple MacBook", {
- DMI_MATCH(DMI_BOARD_VENDOR, "Apple"),
- DMI_MATCH(DMI_PRODUCT_NAME, "MacBook") },
-   },
-   { },
-};
-
 static int acpi_smbus_hc_add(struct acpi_device *device)
 {
int status;
unsigned long long val;
struct acpi_smb_hc *hc;
 
-   dmi_check_system(acpi_smbus_dmi_table);
-
if (!device)
return -EINVAL;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code

2015-11-12 Thread Chris Bainbridge

In the SBS initialisation, a reentrant call to wait_event_timeout()
causes an intermittent boot stall of several minutes usually following
the "Switching to clocksource tsc" message. Another symptom of this bug
is high CPU usage from programs (Firefox, upowerd) querying the battery
state. This is caused by:

 1. drivers/acpi/sbshc.c wait_transaction_complete() calls
wait_event_timeout():

if (wait_event_timeout(hc->wait, smb_check_done(hc),
   msecs_to_jiffies(timeout)))

 2. ___wait_event sets task state to uninterruptible

 3. ___wait_event calls the "condition" smb_check_done()

 4. smb_check_done (sbshc.c) calls through to ec_read() in
drivers/acpi/ec.c

 5. ec_guard() is reached which calls wait_event_timeout()

if (wait_event_timeout(ec->wait,
   ec_transaction_completed(ec),
   guard))

ie. wait_event_timeout() is being called again inside evaluation of
the previous wait_event_timeout() condition

 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
ec_guard()

 6. The task is now in state running even though the wait "condition" is
still being evaluated

 7. The "condition" check returns false so ___wait_event calls
schedule_timeout()

 8. Since the task state is running, the scheduler immediately schedules
it again

 9. This loop usually repeats for around 250 seconds even though the
original wait_event_timeout was only 1000ms.

The timeout is incorrect because each call to schedule_timeout()
usually returns immediately, taking less than 1ms, so the jiffies
timeout counter is not decremented. The task is now stuck in a
running state, and so is highly likely to be immediately
rescheduled, which takes less than a jiffy. The loop will never exit
if all schedule_timeout() calls take less than a jiffy.

Fix this by replacing SMBus reads in the wait_event_timeout condition
with checks of a boolean value that is updated by the EC query handler.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107191
Link: https://lkml.org/lkml/2015/11/6/776
Signed-off-by: Chris Bainbridge 
---
 drivers/acpi/sbshc.c | 26 +++---
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index bf034f8..e290051 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -30,6 +30,7 @@ struct acpi_smb_hc {
u8 query_bit;
smbus_alarm_callback callback;
void *context;
+   bool done;
 };
 
 static int acpi_smbus_hc_add(struct acpi_device *device);
@@ -100,27 +101,11 @@ static inline int smb_hc_write(struct acpi_smb_hc *hc, u8 
address, u8 data)
return ec_write(hc->offset + address, data);
 }
 
-static inline int smb_check_done(struct acpi_smb_hc *hc)
-{
-   union acpi_smb_status status = {.raw = 0};
-   smb_hc_read(hc, ACPI_SMB_STATUS, );
-   return status.fields.done && (status.fields.status == SMBUS_OK);
-}
-
 static int wait_transaction_complete(struct acpi_smb_hc *hc, int timeout)
 {
-   if (wait_event_timeout(hc->wait, smb_check_done(hc),
-  msecs_to_jiffies(timeout)))
+   if (wait_event_timeout(hc->wait, hc->done, msecs_to_jiffies(timeout)))
return 0;
-   /*
-* After the timeout happens, OS will try to check the status of SMbus.
-* If the status is what OS expected, it will be regarded as the bogus
-* timeout.
-*/
-   if (smb_check_done(hc))
-   return 0;
-   else
-   return -ETIME;
+   return -ETIME;
 }
 
 static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol,
@@ -135,6 +120,7 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, 
u8 protocol,
}
 
mutex_lock(>lock);
+   hc->done = false;
if (macbook)
udelay(5);
if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, ))
@@ -235,8 +221,10 @@ static int smbus_alarm(void *context)
if (smb_hc_read(hc, ACPI_SMB_STATUS, ))
return 0;
/* Check if it is only a completion notify */
-   if (status.fields.done)
+   if (status.fields.done && status.fields.status == SMBUS_OK) {
+   hc->done = true;
wake_up(>wait);
+   }
if (!status.fields.alarm)
return 0;
mutex_lock(>lock);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Revert "ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook"

2015-11-12 Thread Chris Bainbridge

This reverts commit 3349fb64b2927407017d970dd5c4daf3c5ad69f8.

After commit 8848d20cf09c ("ACPI / SMBus: Fix boot stalls / high CPU
caused by reentrant code") the delay is no longer necessary.

Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 drivers/acpi/sbshc.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index e2900518cf7e..2fa8304171e0 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include "sbshc.h"
 
 #define PREFIX "ACPI: "
@@ -89,8 +88,6 @@ enum acpi_smb_offset {
ACPI_SMB_ALARM_DATA = 0x26, /* 2 bytes alarm data */
 };
 
-static bool macbook;
-
 static inline int smb_hc_read(struct acpi_smb_hc *hc, u8 address, u8 *data)
 {
return ec_read(hc->offset + address, data);
@@ -121,8 +118,6 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, 
u8 protocol,
 
mutex_lock(>lock);
hc->done = false;
-   if (macbook)
-   udelay(5);
if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, ))
goto end;
if (temp) {
@@ -250,29 +245,12 @@ extern int acpi_ec_add_query_handler(struct acpi_ec *ec, 
u8 query_bit,
  acpi_handle handle, acpi_ec_query_func func,
  void *data);
 
-static int macbook_dmi_match(const struct dmi_system_id *d)
-{
-   pr_debug("Detected MacBook, enabling workaround\n");
-   macbook = true;
-   return 0;
-}
-
-static struct dmi_system_id acpi_smbus_dmi_table[] = {
-   { macbook_dmi_match, "Apple MacBook", {
- DMI_MATCH(DMI_BOARD_VENDOR, "Apple"),
- DMI_MATCH(DMI_PRODUCT_NAME, "MacBook") },
-   },
-   { },
-};
-
 static int acpi_smbus_hc_add(struct acpi_device *device)
 {
int status;
unsigned long long val;
struct acpi_smb_hc *hc;
 
-   dmi_check_system(acpi_smbus_dmi_table);
-
if (!device)
return -EINVAL;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] ACPI / SMBus: Fix boot stalls / high CPU caused by reentrant code

2015-11-12 Thread Chris Bainbridge

In the SBS initialisation, a reentrant call to wait_event_timeout()
causes an intermittent boot stall of several minutes usually following
the "Switching to clocksource tsc" message. Another symptom of this bug
is high CPU usage from programs (Firefox, upowerd) querying the battery
state. This is caused by:

 1. drivers/acpi/sbshc.c wait_transaction_complete() calls
wait_event_timeout():

if (wait_event_timeout(hc->wait, smb_check_done(hc),
   msecs_to_jiffies(timeout)))

 2. ___wait_event sets task state to uninterruptible

 3. ___wait_event calls the "condition" smb_check_done()

 4. smb_check_done (sbshc.c) calls through to ec_read() in
drivers/acpi/ec.c

 5. ec_guard() is reached which calls wait_event_timeout()

if (wait_event_timeout(ec->wait,
   ec_transaction_completed(ec),
   guard))

ie. wait_event_timeout() is being called again inside evaluation of
the previous wait_event_timeout() condition

 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
ec_guard()

 6. The task is now in state running even though the wait "condition" is
still being evaluated

 7. The "condition" check returns false so ___wait_event calls
schedule_timeout()

 8. Since the task state is running, the scheduler immediately schedules
it again

 9. This loop usually repeats for around 250 seconds even though the
original wait_event_timeout was only 1000ms.

The timeout is incorrect because each call to schedule_timeout()
usually returns immediately, taking less than 1ms, so the jiffies
timeout counter is not decremented. The task is now stuck in a
running state, and so is highly likely to be immediately
rescheduled, which takes less than a jiffy. The loop will never exit
if all schedule_timeout() calls take less than a jiffy.

Fix this by replacing SMBus reads in the wait_event_timeout condition
with checks of a boolean value that is updated by the EC query handler.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=107191
Link: https://lkml.org/lkml/2015/11/6/776
Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
 drivers/acpi/sbshc.c | 26 +++---
 1 file changed, 7 insertions(+), 19 deletions(-)

diff --git a/drivers/acpi/sbshc.c b/drivers/acpi/sbshc.c
index bf034f8..e290051 100644
--- a/drivers/acpi/sbshc.c
+++ b/drivers/acpi/sbshc.c
@@ -30,6 +30,7 @@ struct acpi_smb_hc {
u8 query_bit;
smbus_alarm_callback callback;
void *context;
+   bool done;
 };
 
 static int acpi_smbus_hc_add(struct acpi_device *device);
@@ -100,27 +101,11 @@ static inline int smb_hc_write(struct acpi_smb_hc *hc, u8 
address, u8 data)
return ec_write(hc->offset + address, data);
 }
 
-static inline int smb_check_done(struct acpi_smb_hc *hc)
-{
-   union acpi_smb_status status = {.raw = 0};
-   smb_hc_read(hc, ACPI_SMB_STATUS, );
-   return status.fields.done && (status.fields.status == SMBUS_OK);
-}
-
 static int wait_transaction_complete(struct acpi_smb_hc *hc, int timeout)
 {
-   if (wait_event_timeout(hc->wait, smb_check_done(hc),
-  msecs_to_jiffies(timeout)))
+   if (wait_event_timeout(hc->wait, hc->done, msecs_to_jiffies(timeout)))
return 0;
-   /*
-* After the timeout happens, OS will try to check the status of SMbus.
-* If the status is what OS expected, it will be regarded as the bogus
-* timeout.
-*/
-   if (smb_check_done(hc))
-   return 0;
-   else
-   return -ETIME;
+   return -ETIME;
 }
 
 static int acpi_smbus_transaction(struct acpi_smb_hc *hc, u8 protocol,
@@ -135,6 +120,7 @@ static int acpi_smbus_transaction(struct acpi_smb_hc *hc, 
u8 protocol,
}
 
mutex_lock(>lock);
+   hc->done = false;
if (macbook)
udelay(5);
if (smb_hc_read(hc, ACPI_SMB_PROTOCOL, ))
@@ -235,8 +221,10 @@ static int smbus_alarm(void *context)
if (smb_hc_read(hc, ACPI_SMB_STATUS, ))
return 0;
/* Check if it is only a completion notify */
-   if (status.fields.done)
+   if (status.fields.done && status.fields.status == SMBUS_OK) {
+   hc->done = true;
wake_up(>wait);
+   }
if (!status.fields.alarm)
return 0;
mutex_lock(>lock);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Preserve task state in reentrant calls to ___wait_event

2015-11-07 Thread Chris Bainbridge

On Fri, Nov 06, 2015 at 08:44:08PM +, Chris Bainbridge wrote:
> -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \
> +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd)\
>  ({   \
>   __label__ __out;\
>   wait_queue_t __wait;\
>   long __ret = ret;   /* explicit shadow */   \
> + long ostate = current->state;   \

XXX

>   \
>   INIT_LIST_HEAD(&__wait.task_list);  \
>   if (exclusive)  \
> @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int);
>   __wait.flags = 0;   \
>   \
>   for (;;) {  \
> - long __int = prepare_to_wait_event(, &__wait, state);\
> + long __int = prepare_to_wait_event(, &__wait, nstate);\
>   \
>   if (condition)  \
>   break;  \
>   \
> - if (___wait_is_interruptible(state) && __int) { \
> + if (___wait_is_interruptible(nstate) && __int) {\
>   __ret = __int;  \
>   if (exclusive) {\
>   abort_exclusive_wait(, &__wait,  \
> -  state, NULL);  \
> +  nstate, NULL); \
>   goto __out; \
>   }   \
>   break;  \
> @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
>   cmd;\
>   }   \
>   finish_wait(, &__wait);  \
> + set_current_state(ostate);  \

I'm not convinced that this particular code is (or can be) race free in
the general reentrant case. The outer call to ___wait_event will miss 
any wake_up received in the inner call between XXX above (store of
current->state) and this point of restoring the previous state. So if
the inner condition evaluation or some interrupt handler happens to
trigger a wake_up meant for the outer call then it will be lost.

>  __out:   __ret;  
> \
>  })
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Preserve task state in reentrant calls to ___wait_event

2015-11-07 Thread Chris Bainbridge

On Fri, Nov 06, 2015 at 08:44:08PM +, Chris Bainbridge wrote:
> -#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \
> +#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd)\
>  ({   \
>   __label__ __out;\
>   wait_queue_t __wait;\
>   long __ret = ret;   /* explicit shadow */   \
> + long ostate = current->state;   \

XXX

>   \
>   INIT_LIST_HEAD(&__wait.task_list);  \
>   if (exclusive)  \
> @@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int);
>   __wait.flags = 0;   \
>   \
>   for (;;) {  \
> - long __int = prepare_to_wait_event(, &__wait, state);\
> + long __int = prepare_to_wait_event(, &__wait, nstate);\
>   \
>   if (condition)  \
>   break;  \
>   \
> - if (___wait_is_interruptible(state) && __int) { \
> + if (___wait_is_interruptible(nstate) && __int) {\
>   __ret = __int;  \
>   if (exclusive) {\
>   abort_exclusive_wait(, &__wait,  \
> -  state, NULL);  \
> +  nstate, NULL); \
>   goto __out; \
>   }   \
>   break;  \
> @@ -240,6 +241,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
>   cmd;\
>   }   \
>   finish_wait(, &__wait);  \
> + set_current_state(ostate);  \

I'm not convinced that this particular code is (or can be) race free in
the general reentrant case. The outer call to ___wait_event will miss 
any wake_up received in the inner call between XXX above (store of
current->state) and this point of restoring the previous state. So if
the inner condition evaluation or some interrupt handler happens to
trigger a wake_up meant for the outer call then it will be lost.

>  __out:   __ret;  
> \
>  })
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] Preserve task state in reentrant calls to ___wait_event

2015-11-06 Thread Chris Bainbridge

In the ACPI SBS initialisation, a reentrant call to wait_event_timeout()
causes an intermittent boot stall of several minutes usually following
the "Switching to clocksource tsc" message. This stall is caused by:

 1. drivers/acpi/sbshc.c wait_transaction_complete() calls
wait_event_timeout():

if (wait_event_timeout(hc->wait, smb_check_done(hc),
   msecs_to_jiffies(timeout)))

 2. ___wait_event sets task state to uninterruptible

 3. ___wait_event calls the "condition" smb_check_done()

 4. smb_check_done (sbshc.c) calls through to ec_read() in
drivers/acpi/ec.c

 5. ec_guard() is reached which calls wait_event_timeout()

if (wait_event_timeout(ec->wait,
   ec_transaction_completed(ec),
   guard))

ie. wait_event_timeout() is being called again inside evaluation of
the previous wait_event_timeout() condition

 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
ec_guard()

 6. The task is now in state running even though the wait "condition" is
still being evaluated

 7. The "condition" check returns false so ___wait_event calls
schedule_timeout()

 8. Since the task state is running, the scheduler immediately schedules
it again

 9. This loop repeats for around 250 seconds event though the original
wait_event_timeout was only 1000ms.

This happens because each the call to schedule_timeout() usually
returns immediately, taking less than 1ms, so the jiffies timeout
counter is not decremented. The task is now stuck in a running
state, and so is highly likely to get rescheduled immediately, which
takes less than a jiffy.

The root problem is that wait_event_timeout() does not preserve the task
state when called by tasks that are not running. We fix this by
preserving and restoring the task state in ___wait_event().

Signed-off-by: Chris Bainbridge 
---
I am assuming here that wait_event_timeout() is supposed to support reentrant
calls. If not, perhaps it should BUG_ON when called with a non-running task
state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. If
reentrant calls are supposed to work, then this patch will preserve the task
state (there may be a more appropriate way to support reentrant calls than this
exact patch, suggestions/alternatives are welcome, but this does work).
---
 include/linux/wait.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1e1bf9f..a847cf8 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -209,11 +209,12 @@ wait_queue_head_t *bit_waitqueue(void *, int);
  * otherwise.
  */
 
-#define ___wait_event(wq, condition, state, exclusive, ret, cmd)   \
+#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd)  \
 ({ \
__label__ __out;\
wait_queue_t __wait;\
long __ret = ret;   /* explicit shadow */   \
+   long ostate = current->state;   \
\
INIT_LIST_HEAD(&__wait.task_list);  \
if (exclusive)  \
@@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int);
__wait.flags = 0;   \
\
for (;;) {  \
-   long __int = prepare_to_wait_event(, &__wait, state);\
+   long __int = prepare_to_wait_event(, &__wait, nstate);\
\
if (condition)  \
break;  \
\
-   if (___wait_is_interruptible(state) && __int) { \
+   if (___wait_is_interruptible(nstate) && __int) {\
__ret = __int;  \
if (exclusive) {\
abort_exclusive_wait(, &__wait,  \
-state, NULL);  \
+nstate, NULL); \
goto __out; \
}   \
break;

[PATCH] Preserve task state in reentrant calls to ___wait_event

2015-11-06 Thread Chris Bainbridge

In the ACPI SBS initialisation, a reentrant call to wait_event_timeout()
causes an intermittent boot stall of several minutes usually following
the "Switching to clocksource tsc" message. This stall is caused by:

 1. drivers/acpi/sbshc.c wait_transaction_complete() calls
wait_event_timeout():

if (wait_event_timeout(hc->wait, smb_check_done(hc),
   msecs_to_jiffies(timeout)))

 2. ___wait_event sets task state to uninterruptible

 3. ___wait_event calls the "condition" smb_check_done()

 4. smb_check_done (sbshc.c) calls through to ec_read() in
drivers/acpi/ec.c

 5. ec_guard() is reached which calls wait_event_timeout()

if (wait_event_timeout(ec->wait,
   ec_transaction_completed(ec),
   guard))

ie. wait_event_timeout() is being called again inside evaluation of
the previous wait_event_timeout() condition

 5. The EC IRQ handler calls wake_up() and wakes up the sleeping task in
ec_guard()

 6. The task is now in state running even though the wait "condition" is
still being evaluated

 7. The "condition" check returns false so ___wait_event calls
schedule_timeout()

 8. Since the task state is running, the scheduler immediately schedules
it again

 9. This loop repeats for around 250 seconds event though the original
wait_event_timeout was only 1000ms.

This happens because each the call to schedule_timeout() usually
returns immediately, taking less than 1ms, so the jiffies timeout
counter is not decremented. The task is now stuck in a running
state, and so is highly likely to get rescheduled immediately, which
takes less than a jiffy.

The root problem is that wait_event_timeout() does not preserve the task
state when called by tasks that are not running. We fix this by
preserving and restoring the task state in ___wait_event().

Signed-off-by: Chris Bainbridge <chris.bainbri...@gmail.com>
---
I am assuming here that wait_event_timeout() is supposed to support reentrant
calls. If not, perhaps it should BUG_ON when called with a non-running task
state, and the SBS HC / ACPI EC code needs to be fixed to stop doing this. If
reentrant calls are supposed to work, then this patch will preserve the task
state (there may be a more appropriate way to support reentrant calls than this
exact patch, suggestions/alternatives are welcome, but this does work).
---
 include/linux/wait.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 1e1bf9f..a847cf8 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -209,11 +209,12 @@ wait_queue_head_t *bit_waitqueue(void *, int);
  * otherwise.
  */
 
-#define ___wait_event(wq, condition, state, exclusive, ret, cmd)   \
+#define ___wait_event(wq, condition, nstate, exclusive, ret, cmd)  \
 ({ \
__label__ __out;\
wait_queue_t __wait;\
long __ret = ret;   /* explicit shadow */   \
+   long ostate = current->state;   \
\
INIT_LIST_HEAD(&__wait.task_list);  \
if (exclusive)  \
@@ -222,16 +223,16 @@ wait_queue_head_t *bit_waitqueue(void *, int);
__wait.flags = 0;   \
\
for (;;) {  \
-   long __int = prepare_to_wait_event(, &__wait, state);\
+   long __int = prepare_to_wait_event(, &__wait, nstate);\
\
if (condition)  \
break;  \
\
-   if (___wait_is_interruptible(state) && __int) { \
+   if (___wait_is_interruptible(nstate) && __int) {\
__ret = __int;  \
if (exclusive) {\
abort_exclusive_wait(, &__wait,  \
-state, NULL);  \
+nstate, NULL); \
goto __out; \
}   \

Re: Input: keyboard/Trackpad support for MacBookPro 12,1

2015-05-05 Thread Chris Bainbridge

On 5 May 2015 at 04:29, Yang Hongyang  wrote:
> Any ideas?

https://bugzilla.kernel.org/show_bug.cgi?id=96771
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Input: keyboard/Trackpad support for MacBookPro 12,1

2015-05-05 Thread Chris Bainbridge

On 5 May 2015 at 04:29, Yang Hongyang macrosh...@163.com wrote:
 Any ideas?

https://bugzilla.kernel.org/show_bug.cgi?id=96771
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[BUG] Macbook intermittent boot fail at "Switched to clocksource tsc"

2015-04-09 Thread Chris Bainbridge

Commit 7bc5a2b -- ACPI: Support _OSI("Darwin") correctly -- has
introduced an intermittent boot failure on an IvyBridge Macbook Pro 13
(MacBookPro10,2). The failure rate is around 12%. This was diagnosed
by doing a bisect with many boots for each kernel, then verifying
failure on 7bc5a2b, and then booting 50 times again with the prior
commit (9faf613) to verify that commit was ok (50 boots is necessary
to get a high probability >99% of hitting this).

Example fail: http://imgur.com//QHWJTYS

The failure has been observed up to and including 4.0-rc7.

The Apple BIOS is MBP102.88Z.0106.B07.1501071215 01/07/2015
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[BUG] Macbook intermittent boot fail at Switched to clocksource tsc

2015-04-09 Thread Chris Bainbridge

Commit 7bc5a2b -- ACPI: Support _OSI(Darwin) correctly -- has
introduced an intermittent boot failure on an IvyBridge Macbook Pro 13
(MacBookPro10,2). The failure rate is around 12%. This was diagnosed
by doing a bisect with many boots for each kernel, then verifying
failure on 7bc5a2b, and then booting 50 times again with the prior
commit (9faf613) to verify that commit was ok (50 boots is necessary
to get a high probability 99% of hitting this).

Example fail: http://imgur.com//QHWJTYS

The failure has been observed up to and including 4.0-rc7.

The Apple BIOS is MBP102.88Z.0106.B07.1501071215 01/07/2015
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M

2014-03-20 Thread tip-bot for Chris Bainbridge

Commit-ID:  69f2366c9456d0ce784cf5aba87ee77eeadc1d5e
Gitweb: http://git.kernel.org/tip/69f2366c9456d0ce784cf5aba87ee77eeadc1d5e
Author: Chris Bainbridge 
AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 20 Mar 2014 16:31:54 -0700

x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled 
Pentium M

Many Pentium M systems disable PAE but may have a functionally usable PAE
implementation. This adds the "forcepae" parameter which bypasses the boot
check for PAE, and sets the CPU as being PAE capable. Using this parameter
will taint the kernel with TAINT_CPU_OUT_OF_SPEC.

Signed-off-by: Chris Bainbridge 
Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost
Acked-by: Borislav Petkov 
Signed-off-by: H. Peter Anvin 
---
 Documentation/kernel-parameters.txt |  7 +++
 arch/x86/boot/cpucheck.c| 20 
 arch/x86/kernel/cpu/intel.c | 19 +++
 3 files changed, 46 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 7116fda..06600cc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') &&
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') &&
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 &&
+  !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) &&
+  is_intel() && cpu.level == 6 &&
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts("WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n");
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 34bbb55..897d620 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_F00F_BUG
@@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter "forcepae" is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING "PAE forced!\n");
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M

2014-03-20 Thread tip-bot for Chris Bainbridge

Commit-ID:  3e2ae0fed2a12ef72dea787db4c25466f9d89207
Gitweb: http://git.kernel.org/tip/3e2ae0fed2a12ef72dea787db4c25466f9d89207
Author: Chris Bainbridge 
AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700
Committer:  H. Peter Anvin 
CommitDate: Thu, 20 Mar 2014 16:28:43 -0700

x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled 
Pentium M

Many Pentium M systems disable PAE but may have a functionally usable PAE
implementation. This adds the "forcepae" parameter which bypasses the boot
check for PAE, and sets the CPU as being PAE capable. Using this parameter
will taint the kernel with TAINT_CPU_OUT_OF_SPEC.

Signed-off-by: Chris Bainbridge 
Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost
Signed-off-by: H. Peter Anvin 
---
 Documentation/kernel-parameters.txt |  7 +++
 arch/x86/boot/cpucheck.c| 20 
 arch/x86/kernel/cpu/intel.c | 19 +++
 3 files changed, 46 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 7116fda..06600cc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') &&
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') &&
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 &&
+  !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) &&
+  is_intel() && cpu.level == 6 &&
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts("WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n");
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 34bbb55..897d620 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_F00F_BUG
@@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter "forcepae" is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING "PAE forced!\n");
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M

2014-03-20 Thread tip-bot for Chris Bainbridge

Commit-ID:  3e2ae0fed2a12ef72dea787db4c25466f9d89207
Gitweb: http://git.kernel.org/tip/3e2ae0fed2a12ef72dea787db4c25466f9d89207
Author: Chris Bainbridge chris.bainbri...@gmail.com
AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700
Committer:  H. Peter Anvin h...@zytor.com
CommitDate: Thu, 20 Mar 2014 16:28:43 -0700

x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled 
Pentium M

Many Pentium M systems disable PAE but may have a functionally usable PAE
implementation. This adds the forcepae parameter which bypasses the boot
check for PAE, and sets the CPU as being PAE capable. Using this parameter
will taint the kernel with TAINT_CPU_OUT_OF_SPEC.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost
Signed-off-by: H. Peter Anvin h...@zytor.com
---
 Documentation/kernel-parameters.txt |  7 +++
 arch/x86/boot/cpucheck.c| 20 
 arch/x86/kernel/cpu/intel.c | 19 +++
 3 files changed, 46 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 7116fda..06600cc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') 
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') 
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm(wrmsr : : a (eax), d (edx), c (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 
+  !(err_flags[0]  ~(1  X86_FEATURE_PAE)) 
+  is_intel()  cpu.level == 6 
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts(WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n);
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 34bbb55..897d620 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_F00F_BUG
@@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter forcepae is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[tip:x86/cpu] x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M

2014-03-20 Thread tip-bot for Chris Bainbridge

Commit-ID:  69f2366c9456d0ce784cf5aba87ee77eeadc1d5e
Gitweb: http://git.kernel.org/tip/69f2366c9456d0ce784cf5aba87ee77eeadc1d5e
Author: Chris Bainbridge chris.bainbri...@gmail.com
AuthorDate: Fri, 7 Mar 2014 18:40:42 +0700
Committer:  H. Peter Anvin h...@zytor.com
CommitDate: Thu, 20 Mar 2014 16:31:54 -0700

x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled 
Pentium M

Many Pentium M systems disable PAE but may have a functionally usable PAE
implementation. This adds the forcepae parameter which bypasses the boot
check for PAE, and sets the CPU as being PAE capable. Using this parameter
will taint the kernel with TAINT_CPU_OUT_OF_SPEC.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
Link: http://lkml.kernel.org/r/20140307114040.GA4997@localhost
Acked-by: Borislav Petkov b...@suse.de
Signed-off-by: H. Peter Anvin h...@zytor.com
---
 Documentation/kernel-parameters.txt |  7 +++
 arch/x86/boot/cpucheck.c| 20 
 arch/x86/kernel/cpu/intel.c | 19 +++
 3 files changed, 46 insertions(+)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 7116fda..06600cc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') 
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') 
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm(wrmsr : : a (eax), d (edx), c (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 
+  !(err_flags[0]  ~(1  X86_FEATURE_PAE)) 
+  is_intel()  cpu.level == 6 
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts(WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n);
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 34bbb55..897d620 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -186,6 +186,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_X86_F00F_BUG
@@ -214,6 +222,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter forcepae is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http

[PATCH] x86: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M

2014-03-07 Thread Chris Bainbridge

From: Chris Bainbridge 

Many Pentium M systems disable PAE but may have a functionally usable PAE 
implementation. This adds the "forcepae" parameter which bypasses the boot 
check for PAE, and sets the CPU as being PAE capable. Using this parameter 
will taint the kernel with TAINT_CPU_OUT_OF_SPEC.

Signed-off-by: Chris Bainbridge 
---
This patch depends on Dave Jones's TAINT_CPU_OUT_OF_SPEC patch @ 
https://lkml.org/lkml/2014/2/26/394

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 580a60c..67755ea 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') &&
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') &&
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 &&
+  !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) &&
+  is_intel() && cpu.level == 6 &&
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts("WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n");
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ea56e7c..053cb59 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter "forcepae" is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING "PAE forced!\n");
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M

2014-03-07 Thread Chris Bainbridge

From: Chris Bainbridge chris.bainbri...@gmail.com

Many Pentium M systems disable PAE but may have a functionally usable PAE 
implementation. This adds the forcepae parameter which bypasses the boot 
check for PAE, and sets the CPU as being PAE capable. Using this parameter 
will taint the kernel with TAINT_CPU_OUT_OF_SPEC.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
---
This patch depends on Dave Jones's TAINT_CPU_OUT_OF_SPEC patch @ 
https://lkml.org/lkml/2014/2/26/394

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 580a60c..67755ea 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') 
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') 
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm(wrmsr : : a (eax), d (edx), c (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 
+  !(err_flags[0]  ~(1  X86_FEATURE_PAE)) 
+  is_intel()  cpu.level == 6 
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts(WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n);
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ea56e7c..053cb59 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter forcepae is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-04 Thread Chris Bainbridge

On 4 March 2014 17:44, Borislav Petkov  wrote:
> On Tue, Mar 04, 2014 at 01:06:03PM +0700, Chris Bainbridge wrote:
>> On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote:
>> > forcepae is descriptive.
>>
>> Back to forcepae.
>
> Ok, it looks ok to me after a quick look. Now you only have to ask
> Dave whether he's fine with you merging his patch into yours, write a
> nice commit message explaining why this patch is needed and give it a
> final test on your machine.

Dave - are you ok with the merge?

I take it I am expected to re-post the most recent patch plus
changelog entry as a new thread (and with a subject that better
reflects the new patch)?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-04 Thread Chris Bainbridge

On 4 March 2014 17:44, Borislav Petkov b...@alien8.de wrote:
 On Tue, Mar 04, 2014 at 01:06:03PM +0700, Chris Bainbridge wrote:
 On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote:
  forcepae is descriptive.

 Back to forcepae.

 Ok, it looks ok to me after a quick look. Now you only have to ask
 Dave whether he's fine with you merging his patch into yours, write a
 nice commit message explaining why this patch is needed and give it a
 final test on your machine.

Dave - are you ok with the merge?

I take it I am expected to re-post the most recent patch plus
changelog entry as a new thread (and with a subject that better
reflects the new patch)?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-03 Thread Chris Bainbridge

On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote:
> forcepae is descriptive.

Back to forcepae.

Signed-off-by: Chris Bainbridge 
---
diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 580a60c..67755ea 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') &&
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') &&
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 &&
+  !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) &&
+  is_intel() && cpu.level == 6 &&
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts("WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n");
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c67ffa6..7ec4119 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -218,7 +218,7 @@ static void amd_k7_smp_check(struct cpuinfo_x86 *c)
 */
WARN_ONCE(1, "WARNING: This combination of AMD"
" processors is not suitable for SMP.\n");
-   add_taint(TAINT_UNSAFE_SMP, LOCKDEP_NOW_UNRELIABLE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
 }
 
 static void init_amd_k7(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ea56e7c..053cb59 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter "forcepae" is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING "PAE forced!\n");
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 196d1ea..08fb024 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -458,7 +458,7 @@ extern enum system_states {
 
 #define TAINT_PROPRIETARY_MODULE   0
 #define TAINT_FORCED_MODULE1
-#define TAINT_UNSAFE_SMP   2
+#define TAINT_CPU_OUT_OF_SPEC  2
 #define TAINT_FORCED_RMMOD 3
 #defin

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-03 Thread Chris Bainbridge

On Mon, Mar 03, 2014 at 08:29:39PM +0100, Borislav Petkov wrote:
> On Mon, Mar 03, 2014 at 03:04:35PM +0700, Chris Bainbridge wrote:
> > On 3 March 2014 02:05, Roland Kletzing  wrote:
> > > i would recommend adding the newly introduced param to
> > > Documentation/kernel-
> > > parameters.txt , though.
> > 
> > Done.
> > 
> > Signed-off-by: Chris Bainbridge 
> > ---
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index b9e9bd8..388b5e9 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > parameter will force ia64_sal_cache_flush to call
> > ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
> >  
> > +   forcepae [X86-32]
> > +   Forcefully enable Physical Address Extension (PAE).
> > +   Many Pentium M systems disable PAE but may have a
> > +   functionally usable PAE implementation.
> > +   Note: This parameter is unsupported, may cause unknown
> 
> What does "unsupported" mean here exactly?

It was supposed to have the dual meaning that neither the kernel
developers or Intel are going to help you if it doesn't work. But
perhaps it is unnecessary - being tainted implies that the kernel
developers won't help, and Intel certainly won't be interested.

> 
> This function is called check_cpuflags() now. You probably want to redo
> your patch against tip/master, i.e.:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git#master
> 

Done, new patch is against tip.

> > +   }
> > +   else {
> > +   puts("ERROR: PAE is disabled on this Pentium M\n"
> > +   "(PAE can potentially be enabled with "
> > +   "kernel parameter\n"
> > +   "\"forcepae\" - this is unsupported, may "
> > +   "cause unknown\n"
> > +   "problems, and will taint the kernel)\n");
> 
> This string could definitely violate the 80 cols rule so that it is much
> more readable:
> 
>   }
>   else
>   puts("WARNING: PAE disabled. Use \"forcepae\" to enable 
> at your own risk!\n");
> 
> I've shortened it to the most relevant info only. No need to say we're
> tainting the kernel because LOCKDEP_NOW_UNRELIABLE will cause that
> anyway below.

Ok I changed it to: "WARNING: PAE disabled. Use parameter 'pae' to
enable at your own risk!\n""

> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> > index bbe1b8b..271686d 100644
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
> > }
> >  }
> >  
> > +static int forcepae;
> > +static int __init forcepae_setup(char *__unused)
> > +{
> > +   forcepae = 1;
> > +   return 1;
> > +}
> > +__setup("forcepae", forcepae_setup);
> 
> Yeah, why not simply call it "pae"? It is smaller and the letter
> combination is not used yet and it means the same.

I don't see any reason not to just use "pae" "forcepae" is perhaps a bit
more descriptive but the other text in the patch clearly describes the
parameter so it doesn't really matter.

> >  static void intel_workarounds(struct cpuinfo_x86 *c)
> >  {
> > unsigned long lo, hi;
> > @@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
> > clear_cpu_cap(c, X86_FEATURE_SEP);
> >  
> > /*
> > +* PAE CPUID issue: many Pentium M report no PAE but may have a
> > +    * functionally usable PAE implementation.
> > +* Forcefully enable PAE if kernel parameter "forcepae" is present.
> > +*/
> > +   if (forcepae) {
> > +   printk(KERN_WARNING "PAE forced!\n");
> > +   set_cpu_cap(c, X86_FEATURE_PAE);
> > +   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
> 
> Right, this implies Dave's patch is preceding yours. I guess hpa can
> fish it out from the thread when applying.

I will include Dave's patch, it is trivial. 

New patch with all above changes follows.

Signed-off-by: Chris Bainbridge 
---

diff --git a/Documentation/kernel-par

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-03 Thread Chris Bainbridge

On Sun, Mar 02, 2014 at 09:56:19PM +0100, Andreas Mohr wrote:
> Hi,
> 
> > /*
> > +* PAE CPUID bug: Pentium M reports no PAE but has PAE
> > +*/
> 
> Ain't that a tad strongly/incorrectly worded?

I've updated the wording.

On 3 March 2014 02:05, Roland Kletzing  wrote:
> i would recommend adding the newly introduced param to
> Documentation/kernel-
> parameters.txt , though.

Done.

Signed-off-by: Chris Bainbridge 
---
diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index b9e9bd8..388b5e9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Note: This parameter is unsupported, may cause unknown
+   problems, and will taint the kernel.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..93ba160 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -69,6 +69,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') &&
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') &&
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 static int has_fpu(void)
 {
u16 fcw = -1, fsw = -1;
@@ -239,6 +246,24 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
err = check_flags();
+   } else if (err == 0x01 &&
+  !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) &&
+  is_intel() && cpu.level == 6 &&
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_flags();
+   }
+   else {
+   puts("ERROR: PAE is disabled on this Pentium M\n"
+   "(PAE can potentially be enabled with "
+   "kernel parameter\n"
+   "\"forcepae\" - this is unsupported, may "
+   "cause unknown\n"
+   "problems, and will taint the kernel)\n");
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..271686d 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter "forcepae" is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING "PAE forced!\n");
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-03 Thread Chris Bainbridge

On Sun, Mar 02, 2014 at 09:56:19PM +0100, Andreas Mohr wrote:
 Hi,
 
  /*
  +* PAE CPUID bug: Pentium M reports no PAE but has PAE
  +*/
 
 Ain't that a tad strongly/incorrectly worded?

I've updated the wording.

On 3 March 2014 02:05, Roland Kletzing devz...@web.de wrote:
 i would recommend adding the newly introduced param to
 Documentation/kernel-
 parameters.txt , though.

Done.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
---
diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index b9e9bd8..388b5e9 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Note: This parameter is unsupported, may cause unknown
+   problems, and will taint the kernel.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..93ba160 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -69,6 +69,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') 
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') 
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 static int has_fpu(void)
 {
u16 fcw = -1, fsw = -1;
@@ -239,6 +246,24 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm(wrmsr : : a (eax), d (edx), c (ecx));
 
err = check_flags();
+   } else if (err == 0x01 
+  !(err_flags[0]  ~(1  X86_FEATURE_PAE)) 
+  is_intel()  cpu.level == 6 
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_flags();
+   }
+   else {
+   puts(ERROR: PAE is disabled on this Pentium M\n
+   (PAE can potentially be enabled with 
+   kernel parameter\n
+   \forcepae\ - this is unsupported, may 
+   cause unknown\n
+   problems, and will taint the kernel)\n);
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..271686d 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter forcepae is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-03 Thread Chris Bainbridge

On Mon, Mar 03, 2014 at 08:29:39PM +0100, Borislav Petkov wrote:
 On Mon, Mar 03, 2014 at 03:04:35PM +0700, Chris Bainbridge wrote:
  On 3 March 2014 02:05, Roland Kletzing devz...@web.de wrote:
   i would recommend adding the newly introduced param to
   Documentation/kernel-
   parameters.txt , though.
  
  Done.
  
  Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
  ---
  diff --git a/Documentation/kernel-parameters.txt 
  b/Documentation/kernel-parameters.txt
  index b9e9bd8..388b5e9 100644
  --- a/Documentation/kernel-parameters.txt
  +++ b/Documentation/kernel-parameters.txt
  @@ -962,6 +962,13 @@ bytes respectively. Such letter suffixes can also be 
  entirely omitted.
  parameter will force ia64_sal_cache_flush to call
  ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
   
  +   forcepae [X86-32]
  +   Forcefully enable Physical Address Extension (PAE).
  +   Many Pentium M systems disable PAE but may have a
  +   functionally usable PAE implementation.
  +   Note: This parameter is unsupported, may cause unknown
 
 What does unsupported mean here exactly?

It was supposed to have the dual meaning that neither the kernel
developers or Intel are going to help you if it doesn't work. But
perhaps it is unnecessary - being tainted implies that the kernel
developers won't help, and Intel certainly won't be interested.

 
 This function is called check_cpuflags() now. You probably want to redo
 your patch against tip/master, i.e.:
 
 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git#master
 

Done, new patch is against tip.

  +   }
  +   else {
  +   puts(ERROR: PAE is disabled on this Pentium M\n
  +   (PAE can potentially be enabled with 
  +   kernel parameter\n
  +   \forcepae\ - this is unsupported, may 
  +   cause unknown\n
  +   problems, and will taint the kernel)\n);
 
 This string could definitely violate the 80 cols rule so that it is much
 more readable:
 
   }
   else
   puts(WARNING: PAE disabled. Use \forcepae\ to enable 
 at your own risk!\n);
 
 I've shortened it to the most relevant info only. No need to say we're
 tainting the kernel because LOCKDEP_NOW_UNRELIABLE will cause that
 anyway below.

Ok I changed it to: WARNING: PAE disabled. Use parameter 'pae' to
enable at your own risk!\n

  diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
  index bbe1b8b..271686d 100644
  --- a/arch/x86/kernel/cpu/intel.c
  +++ b/arch/x86/kernel/cpu/intel.c
  @@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
  }
   }
   
  +static int forcepae;
  +static int __init forcepae_setup(char *__unused)
  +{
  +   forcepae = 1;
  +   return 1;
  +}
  +__setup(forcepae, forcepae_setup);
 
 Yeah, why not simply call it pae? It is smaller and the letter
 combination is not used yet and it means the same.

I don't see any reason not to just use pae forcepae is perhaps a bit
more descriptive but the other text in the patch clearly describes the
parameter so it doesn't really matter.

   static void intel_workarounds(struct cpuinfo_x86 *c)
   {
  unsigned long lo, hi;
  @@ -226,6 +234,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
  clear_cpu_cap(c, X86_FEATURE_SEP);
   
  /*
  +* PAE CPUID issue: many Pentium M report no PAE but may have a
  +* functionally usable PAE implementation.
  +* Forcefully enable PAE if kernel parameter forcepae is present.
  +*/
  +   if (forcepae) {
  +   printk(KERN_WARNING PAE forced!\n);
  +   set_cpu_cap(c, X86_FEATURE_PAE);
  +   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
 
 Right, this implies Dave's patch is preceding yours. I guess hpa can
 fish it out from the thread when applying.

I will include Dave's patch, it is trivial. 

New patch with all above changes follows.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
---

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 580a60c..7d57a5a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2307,6 +2307,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
OSS [HW,OSS]
See Documentation/sound/oss/oss-parameters.txt
 
+   pae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
panic=  [KNL] Kernel

Re: Re: [PATCH] x86: set Pentium M as PAE capable

2014-03-03 Thread Chris Bainbridge

On Mon, Mar 03, 2014 at 09:04:19PM -0800, H. Peter Anvin wrote:
 forcepae is descriptive.

Back to forcepae.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
---
diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 580a60c..67755ea 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1011,6 +1011,13 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
parameter will force ia64_sal_cache_flush to call
ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
 
+   forcepae [X86-32]
+   Forcefully enable Physical Address Extension (PAE).
+   Many Pentium M systems disable PAE but may have a
+   functionally usable PAE implementation.
+   Warning: use of this parameter will taint the kernel
+   and may cause unknown problems.
+
ftrace=[tracer]
[FTRACE] will set and start the specified tracer
as early as possible in order to facilitate early
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 100a9a1..f0d0b20 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -67,6 +67,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') 
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') 
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 /* Returns a bitmask of which words we have error bits in */
 static int check_cpuflags(void)
 {
@@ -153,6 +160,19 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm(wrmsr : : a (eax), d (edx), c (ecx));
 
err = check_cpuflags();
+   } else if (err == 0x01 
+  !(err_flags[0]  ~(1  X86_FEATURE_PAE)) 
+  is_intel()  cpu.level == 6 
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_cpuflags();
+   }
+   else {
+   puts(WARNING: PAE disabled. Use parameter 'forcepae' 
to enable at your own risk!\n);
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index c67ffa6..7ec4119 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -218,7 +218,7 @@ static void amd_k7_smp_check(struct cpuinfo_x86 *c)
 */
WARN_ONCE(1, WARNING: This combination of AMD
 processors is not suitable for SMP.\n);
-   add_taint(TAINT_UNSAFE_SMP, LOCKDEP_NOW_UNRELIABLE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
 }
 
 static void init_amd_k7(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ea56e7c..053cb59 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -195,6 +195,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -225,6 +233,17 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID issue: many Pentium M report no PAE but may have a
+* functionally usable PAE implementation.
+* Forcefully enable PAE if kernel parameter forcepae is present.
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 196d1ea..08fb024 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -458,7 +458,7 @@ extern enum system_states {
 
 #define TAINT_PROPRIETARY_MODULE   0
 #define TAINT_FORCED_MODULE1
-#define TAINT_UNSAFE_SMP   2
+#define TAINT_CPU_OUT_OF_SPEC  2
 #define TAINT_FORCED_RMMOD 3
 #define TAINT_MACHINE_CHECK4
 #define TAINT_BAD_PAGE 5
diff --git a/kernel/module.c b/kernel/module.c
index b99e801..8dc7f5e 100644
--- a/kernel/module.c
+++ b/kernel/module.c

Re: [PATCH] x86: set Pentium M as PAE capable

2014-02-28 Thread Chris Bainbridge

On Fri, Feb 28, 2014 at 03:27:50PM +0300, Dennis Mungai wrote:
> Hello people,
> 
> Note that revisions of the Dothan core were released in the first quarter
> of 2005 with the *Sonoma* chipsets and supported a 533 MT/s FSB and NX-bit
> (and PAE support required for it was enabled, unlike earlier Pentium Ms
> that had it disabled). These processors include the 730M (1.6 GHz), 740M
> (1.73 GHz), 750M (1.86 GHz), 760M (2.0 GHz) and 770M (2.13 GHz). These
> models all have a TDP of 27 W and a 2 MB L2 cache.
> 
> These CPUs should have PAE enabled. Only earlier versions of the Pentium M
> ( Older Dothans and the Banias core) do not have PAE support, officially.
> 
> -Dennis.

Good point, patch updated to not show the warning if PAE is already
enabled.

Signed-off-by: Chris Bainbridge 
---
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..3fd12bc 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -69,6 +69,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') &&
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') &&
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 static int has_fpu(void)
 {
u16 fcw = -1, fsw = -1;
@@ -239,6 +246,21 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm("wrmsr" : : "a" (eax), "d" (edx), "c" (ecx));
 
err = check_flags();
+   } else if (err == 0x01 &&
+  !(err_flags[0] & ~(1 << X86_FEATURE_PAE)) &&
+  is_intel() && cpu.level == 6 &&
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_flags();
+   }
+   else {
+   puts("Pentium M: PAE is disabled, "
+"enable it with kernel argument \"forcepae\"\n"
+"(this will taint the kernel)\n");
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..873cf3b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID bug: Pentium M reports no PAE but has PAE
+*/
+   if (forcepae) {
+   printk(KERN_WARNING "PAE forced!\n");
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86: set Pentium M as PAE capable

2014-02-28 Thread Chris Bainbridge

On Fri, Feb 28, 2014 at 03:27:50PM +0300, Dennis Mungai wrote:
 Hello people,
 
 Note that revisions of the Dothan core were released in the first quarter
 of 2005 with the *Sonoma* chipsets and supported a 533 MT/s FSB and NX-bit
 (and PAE support required for it was enabled, unlike earlier Pentium Ms
 that had it disabled). These processors include the 730M (1.6 GHz), 740M
 (1.73 GHz), 750M (1.86 GHz), 760M (2.0 GHz) and 770M (2.13 GHz). These
 models all have a TDP of 27 W and a 2 MB L2 cache.
 
 These CPUs should have PAE enabled. Only earlier versions of the Pentium M
 ( Older Dothans and the Banias core) do not have PAE support, officially.
 
 -Dennis.

Good point, patch updated to not show the warning if PAE is already
enabled.

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com
---
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..3fd12bc 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -69,6 +69,13 @@ static int is_transmeta(void)
   cpu_vendor[2] == A32('M', 'x', '8', '6');
 }
 
+static int is_intel(void)
+{
+   return cpu_vendor[0] == A32('G', 'e', 'n', 'u') 
+  cpu_vendor[1] == A32('i', 'n', 'e', 'I') 
+  cpu_vendor[2] == A32('n', 't', 'e', 'l');
+}
+
 static int has_fpu(void)
 {
u16 fcw = -1, fsw = -1;
@@ -239,6 +246,21 @@ int check_cpu(int *cpu_level_ptr, int *req_level_ptr, u32 
**err_flags_ptr)
asm(wrmsr : : a (eax), d (edx), c (ecx));
 
err = check_flags();
+   } else if (err == 0x01 
+  !(err_flags[0]  ~(1  X86_FEATURE_PAE)) 
+  is_intel()  cpu.level == 6 
+  (cpu.model == 9 || cpu.model == 13)) {
+   /* PAE is disabled on this Pentium M but can be forced */
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   err = check_flags();
+   }
+   else {
+   puts(Pentium M: PAE is disabled, 
+enable it with kernel argument \forcepae\\n
+(this will taint the kernel)\n);
+   }
}
 
if (err_flags_ptr)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..873cf3b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID bug: Pentium M reports no PAE but has PAE
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86: set Pentium M as PAE capable

2014-02-27 Thread Chris Bainbridge

On Wed, Feb 26, 2014 at 10:49:49AM -0500, Dave Jones wrote:
> On Wed, Feb 26, 2014 at 02:18:52PM +0100, Borislav Petkov wrote:
>  > On Wed, Feb 26, 2014 at 07:12:59PM +0700, Chris Bainbridge wrote:
>  > > @@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
>  > >  clear_cpu_cap(c, X86_FEATURE_SEP);
>  > >  
>  > >  /*
>  > > + * PAE CPUID bug: Pentium M reports no PAE but has PAE
>  > > + */
>  > > +if (forcepae) {
>  > > +printk(KERN_WARNING "PAE forced!\n");
>  > > +set_cpu_cap(c, X86_FEATURE_PAE);
>  > > +add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>  > 
>  > This is certainly the wrong taint flag. We'd need a new one or to
>  > repurpose another one as I suggested in a previous mail.
> 
> I'd suggest repurposing 'S'. Instead of 'unsafe smp', it could mean
> "out of Spec". We currently only use that flag on some ancient athlon xp,
> so there's not going to be any kind of collision.
> 
> Start with the below maybe ?
> 
>   Dave

Patch looks fine. Is the patch I previously posted, combined with this
patch, sufficient for inclusion in the kernel? Or is there anything else
I need to do?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86: set Pentium M as PAE capable

2014-02-27 Thread Chris Bainbridge

On Wed, Feb 26, 2014 at 10:49:49AM -0500, Dave Jones wrote:
 On Wed, Feb 26, 2014 at 02:18:52PM +0100, Borislav Petkov wrote:
   On Wed, Feb 26, 2014 at 07:12:59PM +0700, Chris Bainbridge wrote:
@@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
 clear_cpu_cap(c, X86_FEATURE_SEP);
 
 /*
+ * PAE CPUID bug: Pentium M reports no PAE but has PAE
+ */
+if (forcepae) {
+printk(KERN_WARNING PAE forced!\n);
+set_cpu_cap(c, X86_FEATURE_PAE);
+add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
   
   This is certainly the wrong taint flag. We'd need a new one or to
   repurpose another one as I suggested in a previous mail.
 
 I'd suggest repurposing 'S'. Instead of 'unsafe smp', it could mean
 out of Spec. We currently only use that flag on some ancient athlon xp,
 so there's not going to be any kind of collision.
 
 Start with the below maybe ?
 
   Dave

Patch looks fine. Is the patch I previously posted, combined with this
patch, sufficient for inclusion in the kernel? Or is there anything else
I need to do?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] x86: set Pentium M as PAE capable

2014-02-26 Thread Chris Bainbridge

On Tue, Feb 25, 2014 at 09:16:02AM -0800, H. Peter Anvin wrote:
> On 02/25/2014 08:26 AM, Dave Jones wrote:
> > On Tue, Feb 25, 2014 at 02:45:57AM -0800, H. Peter Anvin wrote:
> >  > On 02/24/2014 10:01 PM, Chris Bainbridge wrote:
> >  > > Pentium M is PAE capable but does not indicate so in the CPUID 
> > response.
> >  > > This is an issue now that some distributions are no longer shipping
> >  > > non-PAE kernels (those distributions no longer boot on Pentium M). This
> >  > > small patch fixes the issue by forcing the PAE capability on Pentium M.
> >  > >
> >  > > For more discussion see https://bugs.launchpad.net/baltix/+bug/930447
> >  > >
> >  > 
> >  > 1. This patch doesn't match the discussion in the link.
> >  > 2. You would have to also enable this in the cpu testing code in
> >  > arch/x86/boot.
> >  > 3. At the very least we need to print a serious warning that the CPU
> >  > is being run outside its specifications.  I have no personal
> >  > information about why this CPUID bit was disabled, but it could be
> >  > that it was discovered in testing that it didn't work correctly in
> >  > all circumstances (e.g. high temperature.)  This is very much "use
> >  > at your own risk..."; you could get data corruption or even
> >  > hardware damage.
> > 
> > About six years ago, we almost went down this same path for Fedora,
> > and I'm fairly sure the only reason we backed off and decided to not
> > pursue it was that we found some Pentium M's where it just didn't work.
> > 
> 
> OK, that *definitely* means that if we're doing this at all we're doing
> it via an explicit opt-in on the command line, and tainting the kernel
> in the process.
> 
> I don't know if anyone (Chris?) is interested enough in the problem to
> do such a patch, though.  I know I'm not too interested in spending a
> bunch of time on.
> 
>   -hpa
> 

The basic findings of the bug discussion is that people are successfully
running PAE kernels on Pentium M (for some unknown reason Grub skips the
validate_cpu code in the kernel, so existing PAE kernels will run
unmodified, although they do fail when booted with syslinux), and people
are using a user-space hack to add "pae" to /proc/cpuinfo.

In all of the testing reported on the Launchpad bug and elsewhere, every
user who managed to boot a PAE kernel on Pentium M reported success.
There was a single report of failure, but the user encountered the "This
kernel requires the following features" message, so the failure was
caused by some issue of his boot setup not skipping the cpu validation
code, rather than a PAE failure in the Pentium M.

It is possible that PAE was disabled for technical reasons, or for
commercial reasons (e.g. to discourage vendors from building Pentium M
servers). We don't know. What we do know is that people are using PAE
kernels on Pentium M systems, and that not all are aware of the
implications (for a user with an existing install of Debian who apt-gets a
PAE kernel, it will install and boot (thanks to Grub) and no errors or
warnings will have been shown to indicate that their system is now
running "out of spec")

I have made the requested changes to the patch:
---
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..3220734 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -151,6 +151,16 @@ static void get_flags(void)
: : "ebx");
}
}
+
+   if (cmdline_find_option_bool("forcepae")) {
+   puts("WARNING: Forcing PAE in CPU flags\n");
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   }
+   else if ((cpu.level == 6) && ((cpu.model == 9) || (cpu.model == 13))) {
+   puts("Pentium M: PAE is disabled, "
+"enable it with kernel argument \"forcepae\"\n"
+"(\"forcepae\" is unsupported and will taint the 
kernel)\n");
+   }
 }
 
 /* Returns a bitmask of which words we have error bits in */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..1047098 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup("forcepae", forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x

Re: [PATCH] x86: set Pentium M as PAE capable

2014-02-26 Thread Chris Bainbridge

On Tue, Feb 25, 2014 at 09:16:02AM -0800, H. Peter Anvin wrote:
 On 02/25/2014 08:26 AM, Dave Jones wrote:
  On Tue, Feb 25, 2014 at 02:45:57AM -0800, H. Peter Anvin wrote:
On 02/24/2014 10:01 PM, Chris Bainbridge wrote:
 Pentium M is PAE capable but does not indicate so in the CPUID 
  response.
 This is an issue now that some distributions are no longer shipping
 non-PAE kernels (those distributions no longer boot on Pentium M). This
 small patch fixes the issue by forcing the PAE capability on Pentium M.

 For more discussion see https://bugs.launchpad.net/baltix/+bug/930447


1. This patch doesn't match the discussion in the link.
2. You would have to also enable this in the cpu testing code in
arch/x86/boot.
3. At the very least we need to print a serious warning that the CPU
is being run outside its specifications.  I have no personal
information about why this CPUID bit was disabled, but it could be
that it was discovered in testing that it didn't work correctly in
all circumstances (e.g. high temperature.)  This is very much use
at your own risk...; you could get data corruption or even
hardware damage.
  
  About six years ago, we almost went down this same path for Fedora,
  and I'm fairly sure the only reason we backed off and decided to not
  pursue it was that we found some Pentium M's where it just didn't work.
  
 
 OK, that *definitely* means that if we're doing this at all we're doing
 it via an explicit opt-in on the command line, and tainting the kernel
 in the process.
 
 I don't know if anyone (Chris?) is interested enough in the problem to
 do such a patch, though.  I know I'm not too interested in spending a
 bunch of time on.
 
   -hpa
 

The basic findings of the bug discussion is that people are successfully
running PAE kernels on Pentium M (for some unknown reason Grub skips the
validate_cpu code in the kernel, so existing PAE kernels will run
unmodified, although they do fail when booted with syslinux), and people
are using a user-space hack to add pae to /proc/cpuinfo.

In all of the testing reported on the Launchpad bug and elsewhere, every
user who managed to boot a PAE kernel on Pentium M reported success.
There was a single report of failure, but the user encountered the This
kernel requires the following features message, so the failure was
caused by some issue of his boot setup not skipping the cpu validation
code, rather than a PAE failure in the Pentium M.

It is possible that PAE was disabled for technical reasons, or for
commercial reasons (e.g. to discourage vendors from building Pentium M
servers). We don't know. What we do know is that people are using PAE
kernels on Pentium M systems, and that not all are aware of the
implications (for a user with an existing install of Debian who apt-gets a
PAE kernel, it will install and boot (thanks to Grub) and no errors or
warnings will have been shown to indicate that their system is now
running out of spec)

I have made the requested changes to the patch:
---
diff --git a/arch/x86/boot/cpucheck.c b/arch/x86/boot/cpucheck.c
index 4d3ff03..3220734 100644
--- a/arch/x86/boot/cpucheck.c
+++ b/arch/x86/boot/cpucheck.c
@@ -151,6 +151,16 @@ static void get_flags(void)
: : ebx);
}
}
+
+   if (cmdline_find_option_bool(forcepae)) {
+   puts(WARNING: Forcing PAE in CPU flags\n);
+   set_bit(X86_FEATURE_PAE, cpu.flags);
+   }
+   else if ((cpu.level == 6)  ((cpu.model == 9) || (cpu.model == 13))) {
+   puts(Pentium M: PAE is disabled, 
+enable it with kernel argument \forcepae\\n
+(\forcepae\ is unsupported and will taint the 
kernel)\n);
+   }
 }
 
 /* Returns a bitmask of which words we have error bits in */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..1047098 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -196,6 +196,14 @@ static void intel_smp_check(struct cpuinfo_x86 *c)
}
 }
 
+static int forcepae;
+static int __init forcepae_setup(char *__unused)
+{
+   forcepae = 1;
+   return 1;
+}
+__setup(forcepae, forcepae_setup);
+
 static void intel_workarounds(struct cpuinfo_x86 *c)
 {
unsigned long lo, hi;
@@ -226,6 +234,15 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID bug: Pentium M reports no PAE but has PAE
+*/
+   if (forcepae) {
+   printk(KERN_WARNING PAE forced!\n);
+   set_cpu_cap(c, X86_FEATURE_PAE);
+   add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+   }
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from

[PATCH] x86: set Pentium M as PAE capable

2014-02-24 Thread Chris Bainbridge

Pentium M is PAE capable but does not indicate so in the CPUID response.
This is an issue now that some distributions are no longer shipping
non-PAE kernels (those distributions no longer boot on Pentium M). This
small patch fixes the issue by forcing the PAE capability on Pentium M.

For more discussion see https://bugs.launchpad.net/baltix/+bug/930447

Signed-off-by: Chris Bainbridge 

---

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..97996aa 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -226,6 +226,12 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID bug: Pentium M reports no PAE but has PAE
+*/
+   if ((c->x86 == 6) && ((c->x86_model == 9) || (c->x86_model == 13)))
+   set_cpu_cap(c, X86_FEATURE_PAE);
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] x86: set Pentium M as PAE capable

2014-02-24 Thread Chris Bainbridge

Pentium M is PAE capable but does not indicate so in the CPUID response.
This is an issue now that some distributions are no longer shipping
non-PAE kernels (those distributions no longer boot on Pentium M). This
small patch fixes the issue by forcing the PAE capability on Pentium M.

For more discussion see https://bugs.launchpad.net/baltix/+bug/930447

Signed-off-by: Chris Bainbridge chris.bainbri...@gmail.com

---

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bbe1b8b..97996aa 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -226,6 +226,12 @@ static void intel_workarounds(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SEP);
 
/*
+* PAE CPUID bug: Pentium M reports no PAE but has PAE
+*/
+   if ((c-x86 == 6)  ((c-x86_model == 9) || (c-x86_model == 13)))
+   set_cpu_cap(c, X86_FEATURE_PAE);
+
+   /*
 * P4 Xeon errata 037 workaround.
 * Hardware prefetcher may cause stale data to be loaded into the cache.
 */
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

92 matches

Mail list logo