Public bug reported: Hibernation on AWS instances with jammy/5.19.0-1019-aws sometimes fails due to the following failure to freeze:
Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.247854] PM: hibernation: hibernation entry Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.347353] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.347355] sched_clock: Marking unstable (442909362062, 1007864825)<-(443748056670, -400707172) Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.940489] Filesystems sync: 0.022 seconds Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.940492] Freezing user space processes ... (elapsed 0.001 seconds) done. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.941611] OOM killer disabled. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943036] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943039] PM: hibernation: Marking nosave pages: [mem 0x0009f000-0x000fffff] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943041] PM: hibernation: Marking nosave pages: [mem 0xbffe8000-0xffffffff] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943950] PM: hibernation: Basic memory bitmaps created Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943961] PM: hibernation: Preallocating image memory Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782421] PM: hibernation: Allocated 9655951 pages for snapshot Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782424] PM: hibernation: Allocated 38623804 kbytes in 186.83 seconds (206.73 MB/s) Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782426] Freezing remaining freezable tasks ... Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.789826] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0): Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792830] task:kswapd0 state:D stack: 0 pid: 328 ppid: 2 flags:0x00004000 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792833] Call Trace: Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792835] <TASK> Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792837] __schedule+0x248/0x5d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792842] schedule+0x58/0x100 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792844] io_schedule+0x46/0x80 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792846] blk_mq_get_tag+0x117/0x2e0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792852] ? destroy_sched_domains_rcu+0x40/0x40 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792857] __blk_mq_alloc_requests+0xc4/0x1e0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792859] blk_mq_get_new_requests+0xce/0x190 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792861] blk_mq_submit_bio+0x1e6/0x430 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792864] __submit_bio+0xf6/0x190 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792866] submit_bio_noacct_nocheck+0xc2/0x120 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792869] submit_bio_noacct+0x1c5/0x540 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792871] ? sio_write_complete+0x1f0/0x1f0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792875] submit_bio+0x47/0xf0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792877] __swap_writepage+0x157/0x570 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792879] swap_writepage+0x2f/0x80 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792880] pageout+0xe2/0x2f0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792883] shrink_page_list+0x60b/0xc80 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792885] shrink_inactive_list+0x1bc/0x4d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792886] shrink_lruvec+0x2f5/0x450 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792888] shrink_node_memcgs+0x166/0x1d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792890] shrink_node+0x156/0x5a0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792891] ? __schedule+0x250/0x5d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792893] balance_pgdat+0x37b/0x880 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792894] ? zone_watermark_ok_safe+0x4f/0x100 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792899] ? balance_pgdat+0x880/0x880 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792900] kswapd+0x10c/0x1c0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792901] ? balance_pgdat+0x880/0x880 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792903] kthread+0xd1/0xf0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792906] ? kthread_complete_and_exit+0x20/0x20 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792909] ret_from_fork+0x22/0x30 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792913] </TASK> Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792921] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792922] Restarting kernel threads ... done. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516499] PM: hibernation: Basic memory bitmaps freed Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516502] OOM killer enabled. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516502] Restarting tasks ... done. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516740] sched_clock: Marking stable (650508475881, 1007864825)->(651346297015, 170043691) Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.626777] PM: hibernation: hibernation exit Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.670368] systemd[1]: snapd.service: Watchdog timeout (limit 5min)! Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.672610] systemd[1]: snapd.service: Killing process 986 (snapd) with signal SIGABRT. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.719887] systemd[1]: snapd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.719895] systemd[1]: snapd.service: Failed with result 'watchdog'. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.720923] systemd[1]: snapd.service: Consumed 1.714s CPU time. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.796678] systemd[1]: Starting /usr/lib/ec2-hibinit-agent/hibinit-resume... Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.797487] systemd[1]: systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.797650] systemd[1]: systemd-hibernate.service: Failed with result 'exit-code'. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.798075] systemd[1]: Failed to start Hibernate. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800047] systemd[1]: Dependency failed for System Hibernation. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800082] systemd[1]: hibernate.target: Job hibernate.target/start failed with result 'dependency'. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800130] systemd[1]: systemd-hibernate.service: Consumed 36.142s CPU time. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.806905] systemd[1]: Stopped target Sleep. Hibernation testing was performed across 93 instance types with 10 runs each. Each run consists of two hibernation and resume cycles while running a memory allocator. This issue was seen in about 25 of those 930 runs. It was observed on c5.12xlarge, c5d.12xlarge, m5a.large, m5a.xlarge, m5a.2xlarge, m5ad.xlarge, m5ad.2xlarge, r5a.xlarge, r5a.2xlarge, t3a.medium, t3a.xlarge, and t3a.2xlarge. ** Affects: linux-aws (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux-aws in Ubuntu. https://bugs.launchpad.net/bugs/2006620 Title: linux-aws-5.19 hibernation tasks sometimes fail to freeze Status in linux-aws package in Ubuntu: New Bug description: Hibernation on AWS instances with jammy/5.19.0-1019-aws sometimes fails due to the following failure to freeze: Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.247854] PM: hibernation: hibernation entry Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.347353] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.347355] sched_clock: Marking unstable (442909362062, 1007864825)<-(443748056670, -400707172) Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.940489] Filesystems sync: 0.022 seconds Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.940492] Freezing user space processes ... (elapsed 0.001 seconds) done. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.941611] OOM killer disabled. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943036] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943039] PM: hibernation: Marking nosave pages: [mem 0x0009f000-0x000fffff] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943041] PM: hibernation: Marking nosave pages: [mem 0xbffe8000-0xffffffff] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943950] PM: hibernation: Basic memory bitmaps created Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943961] PM: hibernation: Preallocating image memory Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782421] PM: hibernation: Allocated 9655951 pages for snapshot Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782424] PM: hibernation: Allocated 38623804 kbytes in 186.83 seconds (206.73 MB/s) Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782426] Freezing remaining freezable tasks ... Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.789826] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0): Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792830] task:kswapd0 state:D stack: 0 pid: 328 ppid: 2 flags:0x00004000 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792833] Call Trace: Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792835] <TASK> Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792837] __schedule+0x248/0x5d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792842] schedule+0x58/0x100 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792844] io_schedule+0x46/0x80 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792846] blk_mq_get_tag+0x117/0x2e0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792852] ? destroy_sched_domains_rcu+0x40/0x40 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792857] __blk_mq_alloc_requests+0xc4/0x1e0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792859] blk_mq_get_new_requests+0xce/0x190 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792861] blk_mq_submit_bio+0x1e6/0x430 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792864] __submit_bio+0xf6/0x190 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792866] submit_bio_noacct_nocheck+0xc2/0x120 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792869] submit_bio_noacct+0x1c5/0x540 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792871] ? sio_write_complete+0x1f0/0x1f0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792875] submit_bio+0x47/0xf0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792877] __swap_writepage+0x157/0x570 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792879] swap_writepage+0x2f/0x80 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792880] pageout+0xe2/0x2f0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792883] shrink_page_list+0x60b/0xc80 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792885] shrink_inactive_list+0x1bc/0x4d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792886] shrink_lruvec+0x2f5/0x450 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792888] shrink_node_memcgs+0x166/0x1d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792890] shrink_node+0x156/0x5a0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792891] ? __schedule+0x250/0x5d0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792893] balance_pgdat+0x37b/0x880 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792894] ? zone_watermark_ok_safe+0x4f/0x100 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792899] ? balance_pgdat+0x880/0x880 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792900] kswapd+0x10c/0x1c0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792901] ? balance_pgdat+0x880/0x880 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792903] kthread+0xd1/0xf0 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792906] ? kthread_complete_and_exit+0x20/0x20 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792909] ret_from_fork+0x22/0x30 Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792913] </TASK> Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792921] Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792922] Restarting kernel threads ... done. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516499] PM: hibernation: Basic memory bitmaps freed Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516502] OOM killer enabled. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516502] Restarting tasks ... done. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516740] sched_clock: Marking stable (650508475881, 1007864825)->(651346297015, 170043691) Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.626777] PM: hibernation: hibernation exit Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.670368] systemd[1]: snapd.service: Watchdog timeout (limit 5min)! Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.672610] systemd[1]: snapd.service: Killing process 986 (snapd) with signal SIGABRT. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.719887] systemd[1]: snapd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.719895] systemd[1]: snapd.service: Failed with result 'watchdog'. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.720923] systemd[1]: snapd.service: Consumed 1.714s CPU time. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.796678] systemd[1]: Starting /usr/lib/ec2-hibinit-agent/hibinit-resume... Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.797487] systemd[1]: systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.797650] systemd[1]: systemd-hibernate.service: Failed with result 'exit-code'. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.798075] systemd[1]: Failed to start Hibernate. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800047] systemd[1]: Dependency failed for System Hibernation. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800082] systemd[1]: hibernate.target: Job hibernate.target/start failed with result 'dependency'. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800130] systemd[1]: systemd-hibernate.service: Consumed 36.142s CPU time. Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.806905] systemd[1]: Stopped target Sleep. Hibernation testing was performed across 93 instance types with 10 runs each. Each run consists of two hibernation and resume cycles while running a memory allocator. This issue was seen in about 25 of those 930 runs. It was observed on c5.12xlarge, c5d.12xlarge, m5a.large, m5a.xlarge, m5a.2xlarge, m5ad.xlarge, m5ad.2xlarge, r5a.xlarge, r5a.2xlarge, t3a.medium, t3a.xlarge, and t3a.2xlarge. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/2006620/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp