Public bug reported:

This is a public version of https://bugs.launchpad.net/bugs/2034984

[Description]
  When running the UnixBench/Execl throughput case, false sharing is observed
  between frequent reads of base_addr and frequent writes to free_bytes and
  chunk_md.

  UnixBench/Execl represents a class of workloads in which bash scripts are
  spawned frequently to run short jobs. Such a workload issues the execl system
  call frequently, and execl calls mm_init to initialize the mm_struct of the
  process. mm_init calls __percpu_counter_init to initialize the percpu
  counters. pcpu_alloc is then called and reads the base_addr of the
  pcpu_chunk for memory allocation. Inside pcpu_alloc, pcpu_alloc_area is
  called to allocate memory from a specified chunk. That function updates
  "free_bytes" and "chunk_md" to record the remaining free bytes and other
  metadata for the chunk. Correspondingly, pcpu_free_area updates the same
  two members when memory is freed.

  In the current pcpu_chunk layout, `base_addr' is in the same cache line
  as `free_bytes' and `chunk_md', and `base_addr' sits in the last 8
  bytes. This patch moves `bound_map' up next to `base_addr', so that
  `base_addr' is placed in a new cache line.
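
  The layout change described above can be illustrated with a rough userspace
  sketch. This is not the kernel code: struct block_md_model, the 64-byte
  CACHE_LINE value, and the helper names are assumptions made for
  illustration, the member list is trimmed to the fields named in this
  report, and C11 _Alignas stands in for whatever alignment annotation the
  kernel patch uses. Offsets are measured from the start of the structure, so
  the check also assumes the structure itself begins on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Hypothetical stand-in for struct pcpu_block_md; the size is illustrative. */
struct block_md_model {
    int hints[8];
};

/* Old ordering: read-mostly base_addr directly follows the write-hot fields. */
struct chunk_before {
    int free_bytes;                 /* written on every alloc/free */
    struct block_md_model chunk_md; /* written on every alloc/free */
    void *base_addr;                /* read on every allocation */
    unsigned long *alloc_map;
    unsigned long *bound_map;
};

/* New ordering: bound_map moves up and base_addr starts a new cache line. */
struct chunk_after {
    int free_bytes;
    struct block_md_model chunk_md;
    unsigned long *bound_map;
    _Alignas(CACHE_LINE) void *base_addr;
    unsigned long *alloc_map;
};

/* Index of the cache line that holds a given byte offset. */
size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Returns 1 if two member offsets fall on the same cache line, else 0. */
int shares_line(size_t a, size_t b)
{
    return cache_line_of(a) == cache_line_of(b);
}

/* Prints the offsets of the contended members in both layouts. */
void report_layouts(void)
{
    printf("before: free_bytes=%zu chunk_md=%zu base_addr=%zu\n",
           offsetof(struct chunk_before, free_bytes),
           offsetof(struct chunk_before, chunk_md),
           offsetof(struct chunk_before, base_addr));
    printf("after:  free_bytes=%zu chunk_md=%zu base_addr=%zu\n",
           offsetof(struct chunk_after, free_bytes),
           offsetof(struct chunk_after, chunk_md),
           offsetof(struct chunk_after, base_addr));
}

/* Asserts that base_addr shares a line with the hot fields only before. */
void verify_layouts(void)
{
    size_t last_md_byte = offsetof(struct chunk_after, chunk_md)
                        + sizeof(struct block_md_model) - 1;

    assert(shares_line(offsetof(struct chunk_before, free_bytes),
                       offsetof(struct chunk_before, base_addr)));
    assert(!shares_line(offsetof(struct chunk_after, free_bytes),
                        offsetof(struct chunk_after, base_addr)));
    assert(!shares_line(last_md_byte,
                        offsetof(struct chunk_after, base_addr)));
}
```

  Calling report_layouts() and verify_layouts() from a main() prints the
  member offsets and checks that, in this model, base_addr no longer shares
  a line with free_bytes or chunk_md. Running pahole against a real kernel
  build is the reliable way to confirm the actual pcpu_chunk layout.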

[Hardware Information]
  Architecture:
    Intel / AMD (x86_64)
  Platform(s):
    Platform-Independent
  Date HW is expected at Canonical:

  Component(s):
    Performance and Scalability

[Software Information]
  Target Version:
    23.10
  Target Kernel:
    6.5
  Commit IDs:
    3a6358c0dbe6 percpu-internal/pcpu_chunk: re-layout pcpu_chunk structure to
    reduce false sharing
  External Links:

[Business Justification]

[Testing guidance]

[External ID]
  OSVE-5160


The following requested patch has been applied upstream for v6.5-rc1:

 - 3a6358c0dbe6 percpu-internal/pcpu_chunk: re-layout pcpu_chunk
   structure to reduce false sharing

https://github.com/torvalds/linux/commit/3a6358c0dbe6a286a4f4504ba392a6039a9fbd12

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Philip Cox (philcox)
         Status: In Progress

** Affects: linux (Ubuntu Jammy)
     Importance: Undecided
     Assignee: Philip Cox (philcox)
         Status: In Progress

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Philip Cox (philcox)

** Changed in: linux (Ubuntu)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2053152

Title:
  performance: mm/percpu-internal.h: Re-layout pcpu_chunk to mitigate
  false sharing

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Jammy:
  In Progress
