TL;DR @seth-arnold, as a test can you try to set the following options?

  $ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_bytes
  $ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_background_bytes

Repeat the test and see if the system is still unresponsive.

Details below.

This is what I think is happening in this last scenario, where
interactive performance is killed while a large I/O writer is running.

The large I/O writer generates a lot of dirty pages, and nothing forces
those pages to be synced to the backing store until the dirty_ratio
(=20%) / dirty_background_ratio (=10%) thresholds are hit. With the
default settings these thresholds can be quite high on systems with a
lot of RAM.

For example, in a system with 16GB of free/reclaimable memory, the
amount of dirty memory allowed before a writer is actively forced to
flush those pages to the backing store is 16GB * 20 / 100 = 3.2GB.
Flusher threads are started earlier, at 16GB * 10 / 100 = 1.6GB of
dirty memory.
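
To make the arithmetic above easy to repeat on another machine, here is
a minimal sketch. The helper name threshold is made up for illustration,
and using MemAvailable as the "free/reclaimable" figure is only an
approximation of what the kernel actually uses for this calculation.

```shell
# threshold AMOUNT PERCENT: print AMOUNT * PERCENT / 100 (integer math).
# AMOUNT can be in any unit; the result is in the same unit.
# (hypothetical helper, not part of any existing tool)
threshold() {
    echo $(( $1 * $2 / 100 ))
}

# Worked example from the text: 16GB expressed as 16000 MB.
echo "dirty_ratio=20 on 16000 MB:            $(threshold 16000 20) MB"
echo "dirty_background_ratio=10 on 16000 MB: $(threshold 16000 10) MB"

# Same estimate with live values, if the files are readable.
# Assumption: MemAvailable approximates the free/reclaimable memory.
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/dirty_ratio ]; then
    avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/dirty_ratio)
    bg_ratio=$(cat /proc/sys/vm/dirty_background_ratio)
    echo "estimated dirty threshold:      $(threshold "${avail_kb:-0}" "${ratio:-0}") kB"
    echo "estimated background threshold: $(threshold "${avail_kb:-0}" "${bg_ratio:-0}") kB"
fi
```

Note that once vm.dirty_bytes or vm.dirty_background_bytes has been
written, the corresponding *_ratio file reads back as 0, so the live
estimate is only meaningful while the ratio-based settings are in use.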

So, if the writer doesn't stop, it will consume all the free pages in
the system and at that point we are going to have a lot of dirty pages.
Then the kernel needs to decide what to do to free up some pages.

Reclaimable memory is the first choice: cached clean pages that already
have a copy on the corresponding backing store are easy to reclaim,
because they just need to be dropped from the page cache (no I/O
involved). Dirty pages are more expensive to reclaim, because they need
to be flushed to the backing store before the page can be freed. The
same applies to anonymous memory, which needs to be written to the swap
device before the page can be re-used.

So when the system starts to reclaim pages, we see some swap activity
and we also see some I/O due to the flushing of the dirty pages. I
think the system becomes sluggish because there are too many dirty
pages: the kernel spends too much time selecting the right pages to
reclaim, and interactive performance is killed.

This looks like a bug/regression in the kernel and I think we should
definitely investigate more and track down the cause of the problem. In
the meantime, as a test to prove this theory, I think we could try to
reduce the amount of dirty pages allowed in the system by tuning the
dirty thresholds vm.dirty_bytes and vm.dirty_background_bytes (the
*_bytes tunables give more fine-grained control over those thresholds
than the ratios) and see if there are some benefits in the specific
scenario reported by Seth.
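
To check whether the lower thresholds are actually taking effect during
the test, something like the following could be used to watch the Dirty
and Writeback counters in /proc/meminfo. This is only a sketch: the
helper name meminfo_kb is made up, and its optional second argument
(a saved copy of /proc/meminfo) is there just for convenience.

```shell
# meminfo_kb FIELD [FILE]: print the value (in kB) of FIELD from FILE,
# defaulting to /proc/meminfo. The match is exact, so "Writeback" does
# not also pick up "WritebackTmp".
# (hypothetical helper, not part of any existing tool)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# One-shot sample, if /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    echo "Dirty: $(meminfo_kb Dirty) kB  Writeback: $(meminfo_kb Writeback) kB"
fi

# To sample once per second while the large writer runs (Ctrl-C to stop):
#   while sleep 1; do
#       echo "Dirty: $(meminfo_kb Dirty) kB  Writeback: $(meminfo_kb Writeback) kB"
#   done
```

If the theory holds, I would expect Dirty to stay in the neighborhood of
the 32MB limit instead of growing into the gigabyte range, but that is
an expectation to verify, not a measured result. Also keep in mind that
values written under /proc/sys/vm are not persistent and are lost at the
next reboot.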

https://bugs.launchpad.net/bugs/1861359

Title:
  swap storms kills interactive use
