Ever since upgrading to a Ryzen (1700X), I have experienced frequent
system freezes, which may be related to the problems discussed here.
The freeze mostly happens during a certain heavily threaded task with
disk io.
Symptoms:
* Screen completely freezes, including mouse pointer,
* Existing SSH connections die, no new connection can be established,
* System can no longer switch to text console,
* LEDs indicate **unceasing disk activity**,
* System still responds to pings,
* Alt-SysRq keys remain active, but cannot output to screen even if already in
text console.
I've succeeded in capturing kernel logging after a freeze using
netconsole:
This timeout message appears:
[35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds.
[35042.581259] Not tainted 4.15.0-62-generic #69-Ubuntu
[35042.581262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[35042.581273] jbd2/dm-2-8 D 0 610 2 0x80000000
[35042.581278] Call Trace:
[35042.581290] __schedule+0x24e/0x880
[35042.581295] ? bit_wait+0x60/0x60
[35042.581300] schedule+0x2c/0x80
[35042.581304] io_schedule+0x16/0x40
[35042.581308] bit_wait_io+0x11/0x60
[35042.581313] __wait_on_bit+0x4c/0x90
[35042.581317] out_of_line_wait_on_bit+0x90/0xb0
[35042.581323] ? bit_waitqueue+0x40/0x40
[35042.581328] __wait_on_buffer+0x32/0x40
[35042.581333] jbd2_journal_commit_transaction+0xdac/0x1730
[35042.581337] ? __switch_to_asm+0x41/0x70
[35042.581343] kjournald2+0xc8/0x270
[35042.581347] ? kjournald2+0xc8/0x270
[35042.581351] ? wait_woken+0x80/0x80
[35042.581355] kthread+0x121/0x140
[35042.581359] ? commit_timeout+0x20/0x20
[35042.581363] ? kthread_create_worker_on_cpu+0x70/0x70
[35042.581366] ret_from_fork+0x22/0x40
[35042.581242] INFO: task jbd2/dm-2-8:610 blocked for more than 120 seconds.
[35042.581259] Not tainted 4.15.0-62-generic #69-Ubuntu
[35042.581262] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[35042.581273] jbd2/dm-2-8 D 0 610 2 0x80000000
[35042.581278] Call Trace:
[35042.581290] __schedule+0x24e/0x880
[35042.581295] ? bit_wait+0x60/0x60
[35042.581300] schedule+0x2c/0x80
[35042.581304] io_schedule+0x16/0x40
[35042.581308] bit_wait_io+0x11/0x60
[35042.581313] __wait_on_bit+0x4c/0x90
[35042.581317] out_of_line_wait_on_bit+0x90/0xb0
[35042.581323] ? bit_waitqueue+0x40/0x40
[35042.581328] __wait_on_buffer+0x32/0x40
[35042.581333] jbd2_journal_commit_transaction+0xdac/0x1730
[35042.581337] ? __switch_to_asm+0x41/0x70
[35042.581343] kjournald2+0xc8/0x270
[35042.581347] ? kjournald2+0xc8/0x270
[35042.581351] ? wait_woken+0x80/0x80
[35042.581355] kthread+0x121/0x140
[35042.581359] ? commit_timeout+0x20/0x20
[35042.581363] ? kthread_create_worker_on_cpu+0x70/0x70
[35042.581366] ret_from_fork+0x22/0x40
Also, I have thousands of lines of output for blocked tasks. Most traces
look more or less like this:
[34274.346748] sysrq: SysRq : Show Blocked State
[34274.346766] task PC stack pid father
[34274.346771] systemd D 0 1 0 0x00000000
[34274.346776] Call Trace:
[34274.346786] __schedule+0x24e/0x880
[34274.346792] ? mempool_alloc_slab+0x15/0x20
[34274.346795] schedule+0x2c/0x80
[34274.346798] schedule_timeout+0x15d/0x350
[34274.346804] ? __next_timer_interrupt+0xe0/0xe0
[34274.346808] ? wait_woken+0x80/0x80
[34274.346812] io_schedule_timeout+0x1e/0x50
[34274.346815] mempool_alloc+0x15d/0x190
[34274.346820] ? wait_woken+0x80/0x80
[34274.346825] bio_alloc_bioset+0xa9/0x1e0
[34274.346830] __split_and_process_non_flush+0x147/0x2c0
[34274.346834] __split_and_process_bio+0x139/0x2a0
[34274.346838] dm_make_request+0x7a/0xd0
[34274.346843] ? SyS_madvise+0x990/0x990
[34274.346847] generic_make_request+0x124/0x300
[34274.346850] submit_bio+0x73/0x140
[34274.346853] ? submit_bio+0x73/0x140
[34274.346856] ? get_swap_bio+0xcd/0x100
[34274.346861] __swap_writepage+0x323/0x3b0
[34274.346865] ? __frontswap_store+0x73/0x100
[34274.346869] swap_writepage+0x34/0x90
[34274.346872] pageout.isra.54+0x11b/0x350
[34274.346878] shrink_page_list+0x99a/0xbc0
[34274.346883] shrink_inactive_list+0x242/0x590
[34274.346887] shrink_node_memcg+0x364/0x770
[34274.346892] shrink_node+0xf7/0x300
[34274.346896] ? shrink_node+0xf7/0x300
[34274.346900] do_try_to_free_pages+0xc9/0x330
[34274.346904] try_to_free_pages+0xee/0x1b0
[34274.346910] __alloc_pages_slowpath+0x3fc/0xe00
[34274.346914] ? __switch_to_asm+0x35/0x70
[34274.346917] ? __switch_to_asm+0x35/0x70
[34274.346920] ? __switch_to_asm+0x35/0x70
[34274.346924] ? __switch_to_asm+0x35/0x70
[34274.346929] ? __switch_to_asm+0x35/0x70
[34274.346932] ? __switch_to_asm+0x41/0x70
[34274.346936] __alloc_pages_nodemask+0x29a/0x2c0
[34274.346940] alloc_pages_current+0x6a/0xe0
[34274.346944] __page_cache_alloc+0x81/0xa0
[34274.346948] __do_page_cache_readahead+0x113/0x2c0
[34274.346952] ? radix_tree_lookup_slot+0x22/0x50
[34274.346956] ? find_get_entry+0x1e/0x110
[34274.346959] filemap_fault+0x2ad/0x6f0
[34274.346968] ? filemap_fault+0x2ad/0x6f0
[34274.346971] ? page_add_file_rmap+0x134/0x180
[34274.346975] ? filemap_map_pages+0x181/0x390
[34274.346980] ext4_filemap_fault+0x31/0x44
[34274.346748] sysrq: SysRq : Show Blocked State
[34274.346984] __do_fault+0x5b/0x115
[34274.346988] __handle_mm_fault+0xdef/0x1290
[34274.346992] handle_mm_fault+0xb1/0x210
[34274.346997] __do_page_fault+0x281/0x4b0
[34274.347001] do_page_fault+0x2e/0xe0
[34274.347004] ? page_fault+0x2f/0x50
[34274.347008] page_fault+0x45/0x50
[34274.347011] RIP: 0033:0x7fa9446ee83a
[34274.347015] RSP: 002b:00007ffcccb01470 EFLAGS: 00010206
[34274.347019] RAX: 0000000000000001 RBX: 00005615eff63650 RCX:
00007fa944bcebb7
[34274.347021] RDX: 0000000000000093 RSI: 00007ffcccb01470 RDI:
0000000000000000
[34274.347025] RBP: 00007ffcccb01c60 R08: 0000000000000000 R09:
0000000000000008
[34274.347027] R10: 00000000ffffffff R11: 0000000000000000 R12:
0000000000000001
[34274.347032] R13: ffffffffffffffff R14: 00007ffcccb01470 R15:
0000000000000001
Another detail that may be relevant it that show-task-states outputs
about 5000 lines of this kind:
[34830.962684] in-flight: 3235:kcryptd_crypt [dm_crypt],
3237:kcryptd_crypt [dm_crypt], 6056:kcryptd_crypt [dm_crypt],
6058:kcryptd_crypt [dm_crypt], 6055:kcryptd_crypt [dm_crypt],
6057:kcryptd_crypt [dm_crypt], 3992:kcryptd_crypt [dm_crypt],
4861:kcryptd_crypt [dm_crypt], 32431:kcryptd_crypt [dm_crypt],
2682:kcryptd_crypt [dm_crypt], 4850:kcryptd_crypt [dm_crypt],
1429:kcryptd_crypt [dm_crypt], 6054:kcryptd_crypt [dm_crypt],
6060:kcryptd_crypt [dm_crypt], 4862:kcryptd_crypt [dm_crypt],
1862:kcryptd_crypt [dm_crypt]
[34830.874519] DefaultDispatch 22853 343906.389378 12131 120
[34830.962714] delayed: kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt
[dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
[34830.962761] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
[34830.881008] .nr_spread_over : 0
[34830.962862] , kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt],
kcryptd_crypt [dm_crypt], kcryptd_crypt [dm_crypt]
Is there someone who can interpret all this? If it is helpful I can
attach the full blocked-tasks output.
(kernel version is 4.15.0-62-generic)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1690085
Title:
Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks
To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1690085/+subscriptions
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs