Public bug reported:

Hibernation on AWS instances running jammy/5.19.0-1019-aws sometimes fails
because a task (kswapd0 in the log below) refuses to freeze:

Feb  1 01:09:05 ip-172-31-54-178 kernel: [  443.247854] PM: hibernation: hibernation entry
Feb  1 01:09:05 ip-172-31-54-178 kernel: [  443.347353] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Feb  1 01:09:05 ip-172-31-54-178 kernel: [  443.347355] sched_clock: Marking unstable (442909362062, 1007864825)<-(443748056670, -400707172)
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.940489] Filesystems sync: 0.022 seconds
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.940492] Freezing user space processes ... (elapsed 0.001 seconds) done.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.941611] OOM killer disabled.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.943036] PM: hibernation: Marking nosave pages: [mem 0x00000000-0x00000fff]
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.943039] PM: hibernation: Marking nosave pages: [mem 0x0009f000-0x000fffff]
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.943041] PM: hibernation: Marking nosave pages: [mem 0xbffe8000-0xffffffff]
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.943950] PM: hibernation: Basic memory bitmaps created
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  443.943961] PM: hibernation: Preallocating image memory
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  630.782421] PM: hibernation: Allocated 9655951 pages for snapshot
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  630.782424] PM: hibernation: Allocated 38623804 kbytes in 186.83 seconds (206.73 MB/s)
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  630.782426] Freezing remaining freezable tasks ...
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.789826] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792830] task:kswapd0         state:D stack:    0 pid:  328 ppid:     2 flags:0x00004000
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792833] Call Trace:
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792835]  <TASK>
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792837]  __schedule+0x248/0x5d0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792842]  schedule+0x58/0x100
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792844]  io_schedule+0x46/0x80
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792846]  blk_mq_get_tag+0x117/0x2e0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792852]  ? destroy_sched_domains_rcu+0x40/0x40
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792857]  __blk_mq_alloc_requests+0xc4/0x1e0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792859]  blk_mq_get_new_requests+0xce/0x190
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792861]  blk_mq_submit_bio+0x1e6/0x430
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792864]  __submit_bio+0xf6/0x190
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792866]  submit_bio_noacct_nocheck+0xc2/0x120
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792869]  submit_bio_noacct+0x1c5/0x540
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792871]  ? sio_write_complete+0x1f0/0x1f0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792875]  submit_bio+0x47/0xf0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792877]  __swap_writepage+0x157/0x570
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792879]  swap_writepage+0x2f/0x80
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792880]  pageout+0xe2/0x2f0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792883]  shrink_page_list+0x60b/0xc80
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792885]  shrink_inactive_list+0x1bc/0x4d0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792886]  shrink_lruvec+0x2f5/0x450
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792888]  shrink_node_memcgs+0x166/0x1d0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792890]  shrink_node+0x156/0x5a0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792891]  ? __schedule+0x250/0x5d0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792893]  balance_pgdat+0x37b/0x880
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792894]  ? zone_watermark_ok_safe+0x4f/0x100
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792899]  ? balance_pgdat+0x880/0x880
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792900]  kswapd+0x10c/0x1c0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792901]  ? balance_pgdat+0x880/0x880
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792903]  kthread+0xd1/0xf0
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792906]  ? kthread_complete_and_exit+0x20/0x20
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792909]  ret_from_fork+0x22/0x30
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792913]  </TASK>
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792921]
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  650.792922] Restarting kernel threads ... done.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.516499] PM: hibernation: Basic memory bitmaps freed
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.516502] OOM killer enabled.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.516502] Restarting tasks ... done.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.516740] sched_clock: Marking stable (650508475881, 1007864825)->(651346297015, 170043691)
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.626777] PM: hibernation: hibernation exit
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.670368] systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.672610] systemd[1]: snapd.service: Killing process 986 (snapd) with signal SIGABRT.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.719887] systemd[1]: snapd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.719895] systemd[1]: snapd.service: Failed with result 'watchdog'.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.720923] systemd[1]: snapd.service: Consumed 1.714s CPU time.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.796678] systemd[1]: Starting /usr/lib/ec2-hibinit-agent/hibinit-resume...
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.797487] systemd[1]: systemd-hibernate.service: Main process exited, code=exited, status=1/FAILURE
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.797650] systemd[1]: systemd-hibernate.service: Failed with result 'exit-code'.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.798075] systemd[1]: Failed to start Hibernate.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.800047] systemd[1]: Dependency failed for System Hibernation.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.800082] systemd[1]: hibernate.target: Job hibernate.target/start failed with result 'dependency'.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.800130] systemd[1]: systemd-hibernate.service: Consumed 36.142s CPU time.
Feb  1 01:12:33 ip-172-31-54-178 kernel: [  651.806905] systemd[1]: Stopped target Sleep.

Hibernation testing was performed across 93 instance types with 10 runs
each. Each run consists of two hibernate and resume cycles while a memory
allocator is running. The failure was seen in about 25 of those 930 runs
(roughly 2.7%). It was observed on c5.12xlarge, c5d.12xlarge, m5a.large,
m5a.xlarge, m5a.2xlarge, m5ad.xlarge, m5ad.2xlarge, r5a.xlarge,
r5a.2xlarge, t3a.medium, t3a.xlarge, and t3a.2xlarge.
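
For anyone triaging logs from a larger batch of runs like this, the
following is a rough sketch of how the freeze failure and the offending
task could be pulled out of a captured kernel log. It is not the tooling
used for the testing described above; the function name
find_freeze_failures and the returned dict layout are made up for
illustration. It assumes the log may have been line-wrapped in transit,
so whitespace is collapsed before matching.

```python
import re

# Summary line the freezer prints when tasks refuse to freeze.
FAIL_RE = re.compile(
    r"Freezing of tasks failed after (?P<secs>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks refusing to freeze, wq_busy=(?P<wq>\d+)\)"
)
# Header line printed for each task that did not freeze.
TASK_RE = re.compile(r"task:(?P<name>\S+)\s+state:(?P<state>\S+)")


def find_freeze_failures(log_text):
    """Return one dict per freeze failure found in log_text (hypothetical helper)."""
    # Collapse all whitespace so wrapped log lines are matched as one line.
    flat = " ".join(log_text.split())
    matches = list(FAIL_RE.finditer(flat))
    failures = []
    for i, m in enumerate(matches):
        # Tasks belonging to this failure sit between it and the next failure.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        tasks = [(t["name"], t["state"])
                 for t in TASK_RE.finditer(flat, m.end(), end)]
        failures.append({
            "seconds": float(m["secs"]),
            "count": int(m["count"]),
            "wq_busy": int(m["wq"]),
            "tasks": tasks,
        })
    return failures


if __name__ == "__main__":
    sample = (
        "kernel: [  650.789826] Freezing of tasks \n"
        "failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):\n"
        "kernel: [  650.792830] task:kswapd0         \n"
        "state:D stack:    0 pid:  328 ppid:     2 flags:0x00004000\n"
    )
    for failure in find_freeze_failures(sample):
        print(failure)
```

Run against the log in this report, it should flag a single failure with
kswapd0 in state D as the only task that refused to freeze.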

** Affects: linux-aws (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/2006620

Title:
  linux-aws-5.19 hibernation tasks sometimes fail to freeze

Status in linux-aws package in Ubuntu:
  New

