We performed additional testing on the 6.17 kernel and tried the following:

1. MGLRU tuning (enabled=0x0001, min_ttl_ms=1000) - regression persists
2. MGLRU fully disabled (enabled=0x0000) - regression persists
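
For reference, the two `enabled` values above can be decoded with a small Python sketch. The bit meanings follow the kernel's multi-gen LRU admin guide as we understand it. The function name and the labels are our own and purely illustrative.

```python
# Bit meanings of /sys/kernel/mm/lru_gen/enabled, paraphrased from the
# kernel's multi-gen LRU admin guide (labels are illustrative, not official).
LRU_GEN_BITS = {
    0x0001: "main switch (MGLRU on)",
    0x0002: "batched accessed-bit clearing in leaf page table entries",
    0x0004: "accessed-bit clearing in non-leaf page table entries",
}


def decode_lru_gen_enabled(value: int) -> list[str]:
    """Return the labels of the feature bits set in `value`."""
    return [label for bit, label in LRU_GEN_BITS.items() if value & bit]


# The two settings tried in the tests above.
for raw in ("0x0001", "0x0000"):
    features = decode_lru_gen_enabled(int(raw, 16))
    print(raw, "->", features or ["MGLRU disabled"])
```

So test 1 kept only the main switch on, and test 2 turned MGLRU off entirely.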


Full diagnostic outputs from two nodes:

 - Node A: 6.14.0-1017-azure (healthy)
 - Node B: 6.17.0-1008-azure (impacted)

are available here:

https://gist.github.com/arekpalinski/ae46b228a31f9fa18f442b77582e786f


An AI-assisted analysis of those outputs spotted the following differences:

== Key Differences (from /proc/vmstat comparison) ==


  Dirty page management:
    6.14: nr_dirty = 42,330 (165 MiB)     nr_written = 8,570,282
    6.17: nr_dirty = 302,099 (1,180 MiB)   nr_written = 3,799,142

    6.14 has written 2.26x more pages yet has 7.1x fewer dirty pages.
    Writeback on 6.17 is not keeping up.
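
The MiB figures and ratios above can be re-derived with a few lines of Python. This is only a sketch: it assumes 4 KiB pages (the usual x86-64 page size), and the helper name is ours.

```python
PAGE_SIZE = 4096  # bytes; assumption: 4 KiB pages


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)


# Counters quoted above from /proc/vmstat on each node.
dirty = {"6.14": 42_330, "6.17": 302_099}
written = {"6.14": 8_570_282, "6.17": 3_799_142}

for kernel, pages in dirty.items():
    print(f"{kernel}: nr_dirty = {pages:,} pages = {pages_to_mib(pages):,.0f} MiB")

print(f"pages written, 6.14 vs 6.17: {written['6.14'] / written['6.17']:.2f}x")
print(f"dirty pages, 6.17 vs 6.14:   {dirty['6.17'] / dirty['6.14']:.1f}x")
```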

  Reclaim hitting dirty pages:
    6.14: nr_vmscan_immediate_reclaim = 0
    6.17: nr_vmscan_immediate_reclaim = 2,168,908

    On 6.14 kswapd never encounters pages under writeback that it
    needs to immediately reclaim. On 6.17 this happens about 2.17M times.

  File page thrashing:
    6.14: workingset_refault_file = 340,050
    6.17: workingset_refault_file = 25,391,003  (74.7x worse)

  LRU churn:
    6.14: pgdeactivate = 334
    6.17: pgdeactivate = 19,627,719  (about 58,766x worse)

  kswapd activity:
    6.14: pageoutrun = 33       kswapd_low_wmark_hit_quickly = 0
    6.17: pageoutrun = 4,792    kswapd_low_wmark_hit_quickly = 1,559

  Direct reclaim:
    6.14: pgscan_direct = 256        allocstall_normal = 1
    6.17: pgscan_direct = 3,290,591  allocstall_normal = 1,003
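
To repeat this kind of counter comparison on other node pairs, something like the following Python sketch may help. It parses /proc/vmstat-style text and reports the growth per counter, treating a zero baseline separately because no ratio exists there. The function names are ours, and the two excerpts contain only figures quoted above, not full dumps.

```python
def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def growth(old: int, new: int) -> str:
    """Describe how `new` compares with `old`, avoiding division by zero."""
    if old == 0:
        return f"+{new:,} (baseline is 0)"
    return f"{new / old:,.1f}x"


# Excerpts built from the figures quoted above.
NODE_A = """workingset_refault_file 340050
pgdeactivate 334
pgscan_direct 256
nr_vmscan_immediate_reclaim 0"""
NODE_B = """workingset_refault_file 25391003
pgdeactivate 19627719
pgscan_direct 3290591
nr_vmscan_immediate_reclaim 2168908"""

a, b = parse_vmstat(NODE_A), parse_vmstat(NODE_B)
for name in a:
    print(f"{name}: {a[name]:,} -> {b[name]:,}  ({growth(a[name], b[name])})")
```

On a live node the same functions can be fed the contents of /proc/vmstat instead of the excerpts.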

  Memory pressure (PSI):
    6.14: some = 0.00%   full = 0.00%
    6.17: some = 91.80%  full = 77.49%
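
For anyone collecting the same PSI numbers, /proc/pressure/memory can be parsed with a short Python sketch like the one below. The sample text is illustrative only: its avg10 values echo the 6.17 figures above, but we are assuming the averaging window, and the remaining fields are placeholders.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse /proc/pressure/<resource> text into {kind: {field: value}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


# Illustrative sample; only the avg10 values echo the 6.17 figures above,
# and the avg60/avg300/total fields are placeholders, not measured data.
SAMPLE = """some avg10=91.80 avg60=0.00 avg300=0.00 total=0
full avg10=77.49 avg60=0.00 avg300=0.00 total=0"""

psi = parse_psi(SAMPLE)
print(f"some = {psi['some']['avg10']:.2f}%  full = {psi['full']['avg10']:.2f}%")
```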

  Normal zone free pages:
    6.14: 34,159 (well above high watermark = 23,822)
    6.17: 20,355 (barely above low watermark = 19,852)
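
To put numbers on "well above" and "barely above", the headroom over each watermark can be computed as in this small Python sketch (the helper name is ours, and the inputs are the page counts quoted above):

```python
def headroom_pct(free_pages: int, watermark_pages: int) -> float:
    """Free pages above a watermark, as a percentage of that watermark."""
    return (free_pages - watermark_pages) / watermark_pages * 100


print(f"6.14: {headroom_pct(34_159, 23_822):.1f}% above the high watermark")
print(f"6.17: {headroom_pct(20_355, 19_852):.1f}% above the low watermark")
```

That works out to roughly 43% of headroom over the high watermark on 6.14, against about 2.5% over the low watermark on 6.17.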

== Sysctl Configuration (identical on both nodes) ==

  vm.swappiness = 10
  vm.vfs_cache_pressure = 100
  vm.dirty_ratio = 20
  vm.dirty_background_ratio = 10
  vm.watermark_scale_factor = 10
  vm.watermark_boost_factor = 15000
  vm.zone_reclaim_mode = 0

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2143713

Title:
  Performance regression between 6.14.0-1017 and 6.17.0-1008.8

