Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Rogério Brito
Hi, Clemens and others.

On Nov 25 2016, Clemens Ladisch wrote:
> Rogério Brito wrote:
> > [  130.007219] evbug: Event. Dev: input6, Type: 0, Code: 0, Value: 0
> 
> The evbug module is intended for debugging; it dumps all input events
> into syslog.  If you do not want these messages, do not load this module.
> (If it is loaded automatically, you have an actual bug.)

It *was* loaded automatically, and I didn't specifically ask for it to be
loaded, but I'm not sure if other parts of userspace forced it to be
loaded. I will disable it, then.

Here is the relevant part of the config file:

,[ grep -i evbug /boot/config-4.9.0-040900rc6-generic ]
| CONFIG_INPUT_EVBUG=m
`


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFC
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Borislav Petkov
On Fri, Nov 25, 2016 at 02:53:00PM -0200, Rogério Brito wrote:
> Here is part from /proc/interrupts that contains interrupt 18 *without* 
> irqpoll:
> 
> ---
>CPU0   CPU1   CPU2   CPU3
>   0: 47  0  0  0   IO-APIC   2-edge  timer
>   1:  0  0  0  2   IO-APIC   1-edge  i8042
>   7:  0  0  0  0   IO-APIC   7-edge  parport0
>   8:  0  0  0  1   IO-APIC   8-edge  rtc0
>   9:  0  0  0  0   IO-APIC   9-fasteoi   acpi
>  10:  0  0  0  0   IO-APIC  10-edge  radeon
>  12:  0  0  0  4   IO-APIC  12-edge  i8042
>  16:  0 96  4990   IO-APIC  16-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, snd_hda_intel:card0
>  17:  0   2457  1140   IO-APIC  17-fasteoi   ehci_hcd:usb1
>  18:  1 11 43  99947   IO-APIC  18-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7

Can you connect the printer to a different port, so that it doesn't use
OHCI, to see if it makes any difference?

>  19:  0  0  0  0   IO-APIC  19-fasteoi   ehci_hcd:usb2
>  22:  0  22169139   8731   IO-APIC  22-fasteoi   ahci[:00:11.0]
>  25:  0  0 11753   PCI-MSI 1048576-edge  eth0
> (...)
> ---
-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Rogério Brito
Hi, Clemens and Borislav.

On Nov 25 2016, Clemens Ladisch wrote:
> Rogério Brito wrote:
> > * I have never been able to boot this computer of mine without the option
> >   irqpoll---otherwise, I get the nobody cared message.
> 
> The "nobody cared" message indicates that there were too many interrupts
> that no driver felt responsible for, so the kernel has disabled that
> interrupt vector.  The irqpoll option is a workaround to get the devices
> on that interrupt vector to work, but it's not perfect.

Ah, great to know. I don't know if this is related or not, but I read
somewhere (don't remember where) that the machine may have performance
slightly reduced when irqpoll is used.

> It's possible that most of your problems are caused by the irqpoll option.

Excellent to know.

> What IRQ is the problematic one (see the "nobody cared" message)?  What
> devices are connected to it (see /proc/interrupts)?

From the dmesg log, the interrupt is 18.

Here is part from /proc/interrupts that contains interrupt 18 *without* irqpoll:

---
   CPU0   CPU1   CPU2   CPU3
  0: 47  0  0  0   IO-APIC   2-edge  timer
  1:  0  0  0  2   IO-APIC   1-edge  i8042
  7:  0  0  0  0   IO-APIC   7-edge  parport0
  8:  0  0  0  1   IO-APIC   8-edge  rtc0
  9:  0  0  0  0   IO-APIC   9-fasteoi   acpi
 10:  0  0  0  0   IO-APIC  10-edge  radeon
 12:  0  0  0  4   IO-APIC  12-edge  i8042
 16:  0 96  4990   IO-APIC  16-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, snd_hda_intel:card0
 17:  0   2457  1140   IO-APIC  17-fasteoi   ehci_hcd:usb1
 18:  1 11 43  99947   IO-APIC  18-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:  0  0  0  0   IO-APIC  19-fasteoi   ehci_hcd:usb2
 22:  0  22169139   8731   IO-APIC  22-fasteoi   ahci[:00:11.0]
 25:  0  0 11753   PCI-MSI 1048576-edge  eth0
(...)
---

Here is part from /proc/interrupts that contains interrupt 18 *with* irqpoll:

---
   CPU0   CPU1   CPU2   CPU3
  0: 46  0  0  0   IO-APIC   2-edge  timer
  1:  0  0  0  2   IO-APIC   1-edge  i8042
  7:  0  0  0  0   IO-APIC   7-edge  parport0
  8:  0  0  0  1   IO-APIC   8-edge  rtc0
  9:  0  0  0  0   IO-APIC   9-fasteoi   acpi
 10:  0  0  0  0   IO-APIC  10-edge  radeon
 12:  0  0  0  4   IO-APIC  12-edge  i8042
 16:  0103  6983   IO-APIC  16-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, snd_hda_intel:card0
 17:  0588  0144   IO-APIC  17-fasteoi   ehci_hcd:usb1
 18:  0  0  0705   IO-APIC  18-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:  0  0  0  0   IO-APIC  19-fasteoi   ehci_hcd:usb2
 22:  0  18049  4   8540   IO-APIC  22-fasteoi   ahci[:00:11.0]
 25:  0  0  0327   PCI-MSI 1048576-edge  eth0
(...)
---

I'm attaching both files to this message.
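
For anyone who wants to compare such listings mechanically, here is a
minimal Python sketch that sums the per-CPU counts for each IRQ and lists
the handlers sharing it. It assumes the usual layout (a CPU header row,
then "IRQ: counts, chip, pin/trigger, handlers"); the function name
parse_interrupts and the two-line sample are only illustrative, and
summary rows such as NMI or ERR are not treated specially.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count over all CPUs, list of handler names)."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpus]]
        # Assumption: the two fields after the counts are the interrupt
        # chip and the pin/trigger; everything after that is the
        # comma-separated handler list.
        handlers = " ".join(fields[ncpus + 2:])
        names = [h.strip() for h in handlers.split(",") if h.strip()]
        table[label.strip()] = (sum(counts), names)
    return table


# Two rows taken from the listing without irqpoll, re-spaced by hand.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 18:          1         11         43      99947   IO-APIC  18-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:          0          0          0          0   IO-APIC  19-fasteoi   ehci_hcd:usb2
"""

irqs = parse_interrupts(SAMPLE)
shared = {irq: names for irq, (_, names) in irqs.items() if len(names) > 1}
print(irqs["18"])
print(shared)
```

On a live system the same function could be fed the contents of
/proc/interrupts; IRQs with more than one handler end up in "shared".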

> Does the problem go away when you prevent the corresponding driver(s) from
> loading?

Since the OHCI_HCD driver is built-in (as opposed to a module), I don't know
how to disable it. I can try to recompile the kernel with it as a module and
rename it as some garbage, so that it doesn't get loaded...


Thanks a lot,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFC
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Borislav Petkov
On Fri, Nov 25, 2016 at 02:05:48PM -0200, Rogério Brito wrote:
> In fact, I have quite a few computers that are not running Linux that well
> at this moment and I guess that lack of report from final users (or,
> perhaps, reports being lost in the way) prevents those problems from getting
> fixed.

CC me on those, I'd take a look.

> I hope that my efforts will help other users to have fewer problems with
> Linux on older machines, at least.

> To speed things up a bit, I grabbed Ubuntu's precompiled 4.8 and 4.9-rc6
> (without any patches on top of Linus's tree) and booted on this machine.
> 
> The scanner problem is still there with vanilla 4.8 (with the irqpoll
> option), but is gone with vanilla 4.9-rc6 (with the irqpoll option).

Does -rc6 work *without* irqpoll?

Also, you can diff dmesg from both kernels and see whether you can spot
something relevant.
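
A plain diff of two dmesg logs is mostly noise, because every line starts
with a different timestamp. A minimal Python sketch of one way around
that, assuming the usual "[ seconds.micros] " prefix: strip the prefix,
then run a unified diff. The helper names and the two tiny sample logs
are hypothetical placeholders, not lines from the attached logs.

```python
import difflib
import re

# Leading "[   34.157510] " style timestamp at the start of a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(dmesg_text):
    """Return the log lines with their timestamps removed."""
    return [TIMESTAMP.sub("", line) for line in dmesg_text.splitlines()]


def diff_dmesg(old_text, new_text, old_name="old", new_name="new"):
    """Unified diff of two dmesg logs, ignoring timestamps."""
    return list(difflib.unified_diff(
        normalize(old_text), normalize(new_text),
        fromfile=old_name, tofile=new_name, lineterm=""))


# Hypothetical sample logs, for illustration only.
WITH_IRQPOLL = (
    "[    0.100000] example: line common to both boots\n"
    "[    5.200000] example: line only seen with irqpoll\n"
)
WITHOUT_IRQPOLL = "[    0.130000] example: line common to both boots\n"

diff = diff_dmesg(WITH_IRQPOLL, WITHOUT_IRQPOLL,
                  "with-irqpoll", "without-irqpoll")
print("\n".join(diff))
```

Lines that differ only in their timestamp drop out of the diff, so what
remains is what actually changed between the two boots.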

> I guess that backports of fixes to this (once detected) are needed for
> -stable kernels that distributions are shipping with?

Yes, once we know what fixes the issues.

> The other problems ("nobody cared" and the flood of evbug/lost xx rtc
> interrupts messages) remain with 4.9-rc6.
> 
> Interestingly, for a layman like me:
> 
> * if I remove the irqpoll option, the "hpet1: lost xx rtc interrupts" messages

Aha, so irqpoll is crap. Just remove it.

>   are gone, but I still get messages like
> 
> [  130.007219] evbug: Event. Dev: input6, Type: 0, Code: 0, Value: 0
> [  130.167191] evbug: Event. Dev: input6, Type: 4, Code: 4, Value: 458767
> [  130.167195] evbug: Event. Dev: input6, Type: 1, Code: 38, Value: 1
> [  130.167197] evbug: Event. Dev: input6, Type: 0, Code: 0, Value: 0
> [  130.247174] evbug: Event. Dev: input6, Type: 4, Code: 4, Value: 458767
> 
> * if I keep the irqpoll option, I get both "hpet1: lost xx rtc interrupts"
>   AND the evbug messages remain.

Just blacklist that module, it is for debugging input events.

> I'm attaching the dmesg of 4.9-rc6 both with and without irqpoll to this
> message.

Thanks.

[0.00] DMI: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 0500 05/11/2010

Has your BIOS *ever* been updated? If not, why not?

Yap, that BIOS is "fun":

[0.00] Aperture pointing to e820 RAM. Ignoring.
[0.00] AGP: Your BIOS doesn't leave an aperture memory hole
[0.00] AGP: Please enable the IOMMU option in the BIOS setup
[0.00] AGP: This costs you 64MB of RAM

Do you have an IOMMU option in your BIOS?

[   30.434052] usblp 5-2:1.1: usblp1: USB Bidirectional printer dev 2 if 1 alt 0 proto 2 vid 0x03F0 pid 0x4811
[   34.157510] irq 18: nobody cared (try booting with the "irqpoll" option)
[   34.157516] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.9.0-040900rc6-generic #201611201731
[   34.157518] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 0500 05/11/2010
[   34.157520]  8a4cdfd83eb8 8f217542 8a4cd6fbb200 8a4cd6fbb2b4
[   34.157524]  8a4cdfd83ee8 8eee5005 8a4cd6fbb200
[   34.157527]  8fd5d560 0022 8a4cdfd83f20 8eee5393
[   34.157529] Call Trace:
[   34.157531]   
[   34.157537]  [] dump_stack+0x63/0x81
[   34.157540]  [] __report_bad_irq+0x35/0xc0
[   34.157542]  [] note_interrupt+0x243/0x290
[   34.157544]  [] handle_irq_event_percpu+0x54/0x80
[   34.157546]  [] handle_irq_event+0x3e/0x60
[   34.157548]  [] handle_fasteoi_irq+0x9f/0x150
[   34.157551]  [] handle_irq+0x1a/0x30
[   34.157554]  [] do_IRQ+0x4b/0xd0
[   34.157556]  [] common_interrupt+0x82/0x82
[   34.157557]   
[   34.157560]  [] ? native_safe_halt+0x6/0x10
[   34.157562]  [] default_idle+0x20/0xd0
[   34.157565]  [] arch_cpu_idle+0xf/0x20
[   34.157568]  [] default_idle_call+0x23/0x30
[   34.157570]  [] cpu_startup_entry+0x1d0/0x240
[   34.157573]  [] start_secondary+0x151/0x190
[   34.157575] handlers:
[   34.157577] [] usb_hcd_irq
[   34.157578] [] usb_hcd_irq
[   34.157580] [] usb_hcd_irq
[   34.157581] Disabling IRQ #18

Looks to me like that USB host controller driver doesn't want to handle
its interrupt.

Lemme add USB people as I have no clue here why...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Clemens Ladisch
Rogério Brito wrote:
> [  130.007219] evbug: Event. Dev: input6, Type: 0, Code: 0, Value: 0

The evbug module is intended for debugging; it dumps all input events
into syslog.  If you do not want these messages, do not load this module.
(If it is loaded automatically, you have an actual bug.)
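
To see what those events actually are, a small Python sketch can pull the
fields out of an evbug line and name the event type. The type numbers
(0 = EV_SYN, 1 = EV_KEY, 4 = EV_MSC) are the ones defined in the kernel's
input-event-codes.h; the regular expression and the decode() helper are
only an illustration based on the line format quoted above.

```python
import re

# Fields of an evbug log line as quoted in this thread.
EVBUG = re.compile(
    r"evbug: Event\. Dev: ([^,]+), Type: (\d+), Code: (\d+), Value: (-?\d+)")

# Only the event types that show up in the quoted log.
EV_TYPES = {0: "EV_SYN", 1: "EV_KEY", 4: "EV_MSC"}


def decode(line):
    """Return (device, type name, code, value), or None for other lines."""
    m = EVBUG.search(line)
    if m is None:
        return None
    dev = m.group(1)
    ev_type, code, value = (int(g) for g in m.groups()[1:])
    return dev, EV_TYPES.get(ev_type, f"type {ev_type}"), code, value


key = decode("[  130.167195] evbug: Event. Dev: input6, Type: 1, Code: 38, Value: 1")
scan = decode("[  130.167191] evbug: Event. Dev: input6, Type: 4, Code: 4, Value: 458767")
print(key)
print(scan, hex(scan[3]))
```

Read this way, the quoted lines look like ordinary keyboard traffic: an
EV_MSC scancode report (458767 is 0x7000f), an EV_KEY press with code 38
(KEY_L in input-event-codes.h), and an EV_SYN marker closing the batch.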


Regards,
Clemens


Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Rogério Brito
Dear Boris and Clemens,

First of all, thank you very much for your replies. They are very much
appreciated.

On Nov 25 2016, Borislav Petkov wrote:
> On Thu, Nov 24, 2016 at 09:39:57PM -0200, Rogério Brito wrote:
> > Before I go on describing the problems that I have, I want to say that I can
> > bisect the kernel, apply patches and give feedback for the problems that I
> > am seeing.
> 
> Good. We're going to need them.

Great. I'm willing to do that.

In fact, I have quite a few computers that are not running Linux that well
at this moment and I guess that lack of report from final users (or,
perhaps, reports being lost in the way) prevents those problems from getting
fixed.

I hope that my efforts will help other users to have fewer problems with
Linux on older machines, at least.

> Please check out the latest Linus kernel:
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
> 
> build it, boot it on your machine, catch dmesg and send it to me.

To speed things up a bit, I grabbed Ubuntu's precompiled 4.8 and 4.9-rc6
(without any patches on top of Linus's tree) and booted on this machine.

The scanner problem is still there with vanilla 4.8 (with the irqpoll
option), but is gone with vanilla 4.9-rc6 (with the irqpoll option).

I guess that backports of fixes to this (once detected) are needed for
-stable kernels that distributions are shipping with?

The other problems ("nobody cared" and the flood of evbug/lost xx rtc
interrupts messages) remain with 4.9-rc6.

Interestingly, for a layman like me:

* if I remove the irqpoll option, the "hpet1: lost xx rtc interrupts" messages
  are gone, but I still get messages like

[  130.007219] evbug: Event. Dev: input6, Type: 0, Code: 0, Value: 0
[  130.167191] evbug: Event. Dev: input6, Type: 4, Code: 4, Value: 458767
[  130.167195] evbug: Event. Dev: input6, Type: 1, Code: 38, Value: 1
[  130.167197] evbug: Event. Dev: input6, Type: 0, Code: 0, Value: 0
[  130.247174] evbug: Event. Dev: input6, Type: 4, Code: 4, Value: 458767

* if I keep the irqpoll option, I get both "hpet1: lost xx rtc interrupts"
  AND the evbug messages remain.

I'm attaching the dmesg of 4.9-rc6 both with and without irqpoll to this
message.

I'm now going to chase the information regarding /proc/interrupts that
Clemens asked about.


Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFC
http://rb.doesntexist.org/blog : Projects : https://github.com/rbrito/
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br


dmesg-4.9.0-040900rc6-generic-with-irqpoll-1480088522.log.gz
Description: application/gzip


dmesg-4.9.0-040900rc6-generic-without-irqpoll-1480087431.log.gz
Description: application/gzip


Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-25 Thread Clemens Ladisch
Rogério Brito wrote:
> * I have never been able to boot this computer of mine without the option
>   irqpoll---otherwise, I get the nobody cared message.

The "nobody cared" message indicates that there were too many interrupts
that no driver felt responsible for, so the kernel has disabled that
interrupt vector.  The irqpoll option is a workaround to get the devices
on that interrupt vector to work, but it's not perfect.

It's possible that most of your problems are caused by the irqpoll option.

What IRQ is the problematic one (see the "nobody cared" message)?  What
devices are connected to it (see /proc/interrupts)?  Does the problem go
away when you prevent the corresponding driver(s) from loading?
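
Finding the IRQ number can be scripted. A minimal Python sketch, assuming
the kernel reports the problem as 'irq N: nobody cared' and later as
'Disabling IRQ #N'; the helper name problem_irqs and the shortened sample
log are only illustrative.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def problem_irqs(dmesg_text):
    """Return (IRQs reported as 'nobody cared', IRQs the kernel disabled)."""
    reported = {int(n) for n in NOBODY_CARED.findall(dmesg_text)}
    disabled = {int(n) for n in DISABLED.findall(dmesg_text)}
    return reported, disabled


# Shortened sample in the assumed message format.
LOG = """\
[   34.157510] irq 18: nobody cared (try booting with the "irqpoll" option)
[   34.157575] handlers:
[   34.157581] Disabling IRQ #18
"""

reported, disabled = problem_irqs(LOG)
print(sorted(reported), sorted(disabled))
```

The numbers it returns are the rows to look up in /proc/interrupts to see
which drivers share the line.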


Regards,
Clemens


Re: Multiple problems with the Linux kernel on an AMD desktop

2016-11-24 Thread Borislav Petkov
On Thu, Nov 24, 2016 at 09:39:57PM -0200, Rogério Brito wrote:
> Before I go on describing the problems that I have, I want to say that I can
> bisect the kernel, apply patches and give feedback for the problems that I
> am seeing.

Good. We're going to need them.

Please check out the latest Linus kernel:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git

build it, boot it on your machine, catch dmesg and send it to me.

Thanks!

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

