Re: disabling secondary CPU hangs / system fails to suspend with kernel 4.19+

2019-04-11 Thread Thomas Müller
Hi,

good news: starting with 5.0.6 suspend is working again.

Best regards
Thomas

On 29.03.19 at 10:22, Thomas Müller wrote:
> Hi,
> 
> On 18.03.19 at 12:57, Peter Zijlstra wrote:
>> On Fri, Mar 15, 2019 at 09:21:02PM +0100, Thomas Müller wrote:
>>> I've just re-tested with runlevel 3.
>>> Not a real VGA console, but at least no Wayland or Gnome to interfere...
>>>
>>> `echo 0 > /sys/...` just blocks and no message whatsoever is visible in dmesg.
>>>
>>> I've executed `echo 0 > ...` in the background to keep my console functional,
>>> and I can e.g. echo something to /dev/kmsg and it shows up, so reading/updating
>>> the log buffer appears to be working just fine.
>>
>> Damn.. Thanks for trying. I'll see if I can come up with something, but
>> I'm out of ideas for now :/
>>
> Any new ideas so far?
> 
> For reference:
> I've just tested a vanilla 5.0.5 with localmodconfig (attached)... same behavior :(
> 
> 
> Best regards
> Thomas
> 


Re: disabling secondary CPU hangs / system fails to suspend with kernel 4.19+

2019-03-15 Thread Thomas Müller
Hi,

On 15.03.19 at 13:15, Peter Zijlstra wrote:
> On Fri, Mar 15, 2019 at 12:41:00PM +0100, Thomas Müller wrote:
> 
>>> What .config do you have?
>> The one packaged by Fedora. I've attached the one for 4.20.15 as reference.
> 
> Thanks, I'll have a poke, see what, if anything, is different from the
> kernels I ran.
> 
>>> And what, if anything do you see on the
>>> console when it goes funny?
>> Nothing unfortunately.
>> When trying to suspend, the display immediately goes blank, the system becomes
>> unresponsive and the status LED within the power button starts flashing rapidly
>> (just like it does when the power cord is attached).
>>
>>
>>> I think you wrote that hot-un-plug never completes? Is there anything in
>>> dmesg when it's stuck in:
>>>
>>>   echo 0 > /sys/devices/system/cpu/cpu1/online
>>>
>>> ?
>> I've just tried that again and the system immediately froze.
> 
> Hmm, I thought you said the system remained semi usable, just that reboot
> stopped working thereafter and it needed a power cycle.
> 
>> `journalctl -f` was running in a second window but it had no chance to output anything... :/
> 
> Ah, you're using a GUI!
> 
> Stop doing that ;-)
Easier said than done ;)

> See if you can use the VGA console; not a FB console or a DRM console,
> but the real ancient, proper text mode, VGA console.
I've just re-tested with runlevel 3.
Not a real VGA console, but at least no Wayland or Gnome to interfere...

`echo 0 > /sys/...` just blocks and no message whatsoever is visible in dmesg.

I've executed `echo 0 > ...` in the background to keep my console functional,
and I can e.g. echo something to /dev/kmsg and it shows up, so reading/updating
the log buffer appears to be working just fine.
A power cycle is still necessary to recover the system.
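
For anyone who wants to repeat this test, here is a minimal sketch of the procedure described above. It assumes a POSIX shell and root privileges. The function name `offline_cpu`, the `hotplug-test:` log prefix, and the overridable `SYSFS_CPU_ROOT` and `KMSG` variables are illustrative only and not something from the original report. The sysfs write runs in a background subshell and is polled, because a write stuck inside the kernel may not react to signals.

```shell
#!/bin/sh
# Sketch: try to offline a CPU without blocking the interactive shell.
# SYSFS_CPU_ROOT and KMSG are hypothetical overrides so the function can be
# dry-run against ordinary files; the defaults are the real kernel interfaces.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"
KMSG="${KMSG:-/dev/kmsg}"

# offline_cpu N [TIMEOUT_SECONDS]
# Returns the status of the write if it finished in time,
# or 1 if the write is still blocked after the timeout.
# Note: plain POSIX sh has no "local", so the variables below are global;
# $pid is left set so the caller can inspect the stuck writer.
offline_cpu() {
    cpu="$1"
    limit="${2:-10}"
    target="$SYSFS_CPU_ROOT/cpu$cpu/online"

    # Leave a marker in the kernel log so later messages can be correlated.
    echo "hotplug-test: offlining cpu$cpu" > "$KMSG"

    # Do the write in a background subshell to keep the console usable.
    ( echo 0 > "$target" ) &
    pid=$!

    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "hotplug-test: cpu$cpu offline still blocked after ${limit}s" > "$KMSG"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}
```

Something like `offline_cpu 1 10 || dmesg | tail` would then show whether the kernel logged anything between the two markers.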


> Now, don't ask me how to do that, because I don't know, I've been
> running on pure serial console output for the past 10 years or so, heck
> I don't even have systemd.
> 
> And you might have to do something like: dmesg -n8, to get the console
> to print the kernel messages or something.
> 


disabling secondary CPU hangs / system fails to suspend with kernel 4.19+

2019-03-14 Thread Thomas Müller
Hi,

starting with kernel 4.19 my Lenovo ThinkPad X1 Carbon (5th Gen) no longer suspends properly.

This is 100% reproducible and git bisect points to the following commit:
> [be45bf5395e0886a93fc816bbe41a008ec2e42e2] watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
> be45bf5395e0886a93fc816bbe41a008ec2e42e2 is the first bad commit
> commit be45bf5395e0886a93fc816bbe41a008ec2e42e2
> Author: Peter Zijlstra 
> Date:   Fri Jul 13 12:42:08 2018 +0200
> 
> watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
> 
> When scheduling is delayed for longer than the softlockup interrupt
> period it is possible to double-queue the cpu_stop_work, causing list
> corruption.
> 
> Cure this by adding a completion to track the cpu_stop_work's
> progress.
> 
> Reported-by: kernel test robot 
> Tested-by: Rong Chen 
> Signed-off-by: Peter Zijlstra (Intel) 
> Cc: Linus Torvalds 
> Cc: Peter Zijlstra 
> Cc: Thomas Gleixner 
> Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
> Link: http://lkml.kernel.org/r/20180713104208.gw2...@hirez.programming.kicks-ass.net
> Signed-off-by: Ingo Molnar 
> 
> :040000 040000 6aca2dbb84bc33fe442b18b3d0a135c27adff7b9 2710af12d32e4b98df07768716689b213bce45fc M  kernel

The bugzilla reports have some additional details:
* https://bugzilla.redhat.com/show_bug.cgi?id=1671504
* https://bugzilla.kernel.org/show_bug.cgi?id=202679
* https://bugzilla.kernel.org/show_bug.cgi?id=202137

I'm happy to provide additional information or test a patch or two (as long as
it doesn't eat up my notebook ;))


Best regards
Thomas


[BUG] Lockup on boot when trying to bring up r8169 NIC

2007-07-19 Thread Thomas Müller
Hi,

I already sent this two days ago, but I have the feeling it was
overlooked or filtered because of a large attachment.


If I try to boot 2.6.21.6, 2.6.22.1 or 2.6.22-git8 the system completely
hangs when init tries to bring up my r8169-based NIC. Not even the
keyboard lights are working anymore.

If I unplug the network cable, boot continues just fine and everything
works as it should.
If I boot with the cable unplugged, the system also hangs and continues
after I plug in the cable.


Everything works fine with 2.6.20.15.


Configuration:
  http://www.mathtm.de/config_2.6.20.15_fc6based
  http://www.mathtm.de/config_2.6.21.6_f7based


Using a Fedora kernel (based on 2.6.21.5) I get the following kernel
message:
r8169: eth0: link down
BUG: soft lockup detected on CPU#0!
 [c0451ea2] softlockup_tick+0xa5/0xb4
 [c042e930] update_process_times+0x3b/0x5e
 [c043d298] tick_sched_timer+0x57/0x9a
 [c0439df5] hrtimer_interrupt+0x12b/0x1b6
 [c043d241] tick_sched_timer+0x0/0x9a
 [c0408534] timer_interrupt+0x2c/0x32
 [c045210e] handle_IRQ_event+0x1a/0x3f
 [c045354e] handle_level_irq+0x81/0xc7
 [c04072c7] do_IRQ+0xb8/0xd1
 [c04058ff] common_interrupt+0x23/0x28
 [c0452105] handle_IRQ_event+0x11/0x3f
 [c045354e] handle_level_irq+0x81/0xc7
 [c04534cd] handle_level_irq+0x0/0xc7
 [c04072bb] do_IRQ+0xac/0xd1
 [c04058ff] common_interrupt+0x23/0x28
 [c042b2dc] __do_softirq+0x54/0xba
 [c04071b7] do_softirq+0x59/0xb1
 [c04534cd] handle_level_irq+0x0/0xc7
 [c042b194] irq_exit+0x38/0x6b
 [c04072cc] do_IRQ+0xbd/0xd1
 [c04058ff] common_interrupt+0x23/0x28
 [c04200d8] find_busiest_group+0x264/0x4c5
 [c0601895] _spin_unlock_irqrestore+0x8/0x9
 [c042e863] __mod_timer+0xa1/0xab
 [f8a4e1ec] rtl8169_open+0x12e/0x194 [r8169]
 [c05a3054] dev_open+0x2b/0x62
 [c05a1aa1] dev_change_flags+0x47/0xe4
 [c05de45c] devinet_ioctl+0x250/0x56a
 [c04e72c0] copy_to_user+0x3c/0x50
 [c0598b47] sock_ioctl+0x19f/0x1be
 [c05989a8] sock_ioctl+0x0/0x1be
 [c047f713] do_ioctl+0x1f/0x62
 [c047f99a] vfs_ioctl+0x244/0x256
 [c047f9f8] sys_ioctl+0x4c/0x64
 [c0404f70] syscall_call+0x7/0xb
 ===
r8169: eth0: link up



There already is a bugzilla entry at
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242572
I know, not everyone is a fan of bugzilla, but maybe someone wants to
take a look at what was discussed there.


Please CC me as I'm not subscribed to the list and don't hesitate to
tell me that I forgot to include some crucial information ;)


Regards,
Thomas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/