Re: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting cgroup

2025-01-14 Thread Maxime Ripard
Hi Geert,

On Tue, Jan 14, 2025 at 11:16:43AM +0100, Geert Uytterhoeven wrote:
> Hi Maarten,
> 
> On Wed, Dec 4, 2024 at 3:32 PM Maarten Lankhorst  wrote:
> > This code is based on the RDMA and misc cgroup initially, but now
> > uses page_counter. It uses the same min/low/max semantics as the memory
> > cgroup as a result.
> >
> > There's a small mismatch as TTM uses u64, and page_counter long pages.
> > In practice it's not a problem. 32-bits systems don't really come with
> > >=4GB cards and as long as we're consistently wrong with units, it's
> > fine. The device page size may not be in the same units as kernel page
> > size, and each region might also have a different page size (VRAM vs GART
> > for example).
> >
> > The interface is simple:
> > - Call dmem_cgroup_register_region()
> > - Use dmem_cgroup_try_charge to check if you can allocate a chunk of memory,
> >   use dmem_cgroup_uncharge when freeing it. This may return an error code,
> >   or -EAGAIN when the cgroup limit is reached. In that case a reference
> >   to the limiting pool is returned.
> > - The limiting cs can be used as compare function for
> >   dmem_cgroup_state_evict_valuable.
> > - After having evicted enough, drop reference to limiting cs with
> >   dmem_cgroup_pool_state_put.
> >
> > This API allows you to limit device resources with cgroups.
> > You can see the supported cards in /sys/fs/cgroup/dmem.capacity
> > You need to echo +dmem to cgroup.subtree_control, and then you can
> > partition device memory.
> >
> > Co-developed-by: Friedrich Vock 
> > Signed-off-by: Friedrich Vock 
> > Co-developed-by: Maxime Ripard 
> > Signed-off-by: Maxime Ripard 
> > Signed-off-by: Maarten Lankhorst 
> 
> Thanks for your patch, which is now commit b168ed458ddecc17
> ("kernel/cgroup: Add "dmem" memory accounting cgroup") in drm/drm-next.
> 
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1128,6 +1128,7 @@ config CGROUP_PIDS
> >
> >  config CGROUP_RDMA
> > bool "RDMA controller"
> > +   select PAGE_COUNTER
> 
> This change looks unrelated?
> 
> Oh, reading your response to the build error, this should have been below?

Indeed, good catch.

> > help
> >   Provides enforcement of RDMA resources defined by IB stack.
> >   It is fairly easy for consumers to exhaust RDMA resources, which
> > @@ -1136,6 +1137,15 @@ config CGROUP_RDMA
> >   Attaching processes with active RDMA resources to the cgroup
> >   hierarchy is allowed even if can cross the hierarchy's limit.
> >
> > +config CGROUP_DMEM
> > +   bool "Device memory controller (DMEM)"
> > +   help
> > + The DMEM controller allows compatible devices to restrict device
> > + memory usage based on the cgroup hierarchy.
> > +
> > + As an example, it allows you to restrict VRAM usage for applications
> > + in the DRM subsystem.
> > +
> 
> Do you envision other users than DRM?
> Perhaps this should depend on DRM for now?

dma-buf heaps and v4l2 support are in progress right now.

Maxime



Re: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting cgroup

2025-01-14 Thread Geert Uytterhoeven
Hi Maarten,

On Wed, Dec 4, 2024 at 3:32 PM Maarten Lankhorst  wrote:
> This code is based on the RDMA and misc cgroup initially, but now
> uses page_counter. It uses the same min/low/max semantics as the memory
> cgroup as a result.
>
> There's a small mismatch as TTM uses u64, and page_counter long pages.
> In practice it's not a problem. 32-bits systems don't really come with
> >=4GB cards and as long as we're consistently wrong with units, it's
> fine. The device page size may not be in the same units as kernel page
> size, and each region might also have a different page size (VRAM vs GART
> for example).
>
> The interface is simple:
> - Call dmem_cgroup_register_region()
> - Use dmem_cgroup_try_charge to check if you can allocate a chunk of memory,
>   use dmem_cgroup_uncharge when freeing it. This may return an error code,
>   or -EAGAIN when the cgroup limit is reached. In that case a reference
>   to the limiting pool is returned.
> - The limiting cs can be used as compare function for
>   dmem_cgroup_state_evict_valuable.
> - After having evicted enough, drop reference to limiting cs with
>   dmem_cgroup_pool_state_put.
>
> This API allows you to limit device resources with cgroups.
> You can see the supported cards in /sys/fs/cgroup/dmem.capacity
> You need to echo +dmem to cgroup.subtree_control, and then you can
> partition device memory.
>
> Co-developed-by: Friedrich Vock 
> Signed-off-by: Friedrich Vock 
> Co-developed-by: Maxime Ripard 
> Signed-off-by: Maxime Ripard 
> Signed-off-by: Maarten Lankhorst 

Thanks for your patch, which is now commit b168ed458ddecc17
("kernel/cgroup: Add "dmem" memory accounting cgroup") in drm/drm-next.

> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1128,6 +1128,7 @@ config CGROUP_PIDS
>
>  config CGROUP_RDMA
> bool "RDMA controller"
> +   select PAGE_COUNTER

This change looks unrelated?

Oh, reading your response to the build error, this should have been below?

> help
>   Provides enforcement of RDMA resources defined by IB stack.
>   It is fairly easy for consumers to exhaust RDMA resources, which
> @@ -1136,6 +1137,15 @@ config CGROUP_RDMA
>   Attaching processes with active RDMA resources to the cgroup
>   hierarchy is allowed even if can cross the hierarchy's limit.
>
> +config CGROUP_DMEM
> +   bool "Device memory controller (DMEM)"
> +   help
> + The DMEM controller allows compatible devices to restrict device
> + memory usage based on the cgroup hierarchy.
> +
> + As an example, it allows you to restrict VRAM usage for applications
> + in the DRM subsystem.
> +

Do you envision other users than DRM?
Perhaps this should depend on DRM for now?

Gr{oetje,eeting}s,

Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting cgroup

2024-12-05 Thread Maarten Lankhorst

Hey,

Missing a select PAGE_COUNTER in init/Kconfig. I thought I had fixed it, 
but I must have forgotten to commit those changes when developing 
between 2 machines.


Cheers,
~Maarten

On 2024-12-05 at 03:27, kernel test robot wrote:

Hi Maarten,

kernel test robot noticed the following build errors:

[auto build test ERROR on tj-cgroup/for-next]
[also build test ERROR on akpm-mm/mm-everything linus/master v6.13-rc1 
next-20241204]
[cannot apply to drm-misc/drm-misc-next drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Maarten-Lankhorst/kernel-cgroup-Add-dmem-memory-accounting-cgroup/20241204-233207
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
patch link:
https://lore.kernel.org/r/20241204143112.1250983-1-dev%40lankhorst.se
patch subject: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting 
cgroup
config: um-randconfig-r061-20241205 
(https://download.01.org/0day-ci/archive/20241205/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20241205/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

/usr/bin/ld: kernel/cgroup/dmem.o: in function `set_resource_min':
kernel/cgroup/dmem.c:115: undefined reference to `page_counter_set_min'
/usr/bin/ld: kernel/cgroup/dmem.o: in function `set_resource_low':
kernel/cgroup/dmem.c:121: undefined reference to `page_counter_set_low'
/usr/bin/ld: kernel/cgroup/dmem.o: in function `set_resource_max':
kernel/cgroup/dmem.c:127: undefined reference to `page_counter_set_max'
/usr/bin/ld: kernel/cgroup/dmem.o: in function `reset_all_resource_limits':
kernel/cgroup/dmem.c:115: undefined reference to `page_counter_set_min'
/usr/bin/ld: kernel/cgroup/dmem.c:121: undefined reference to `page_counter_set_low'
/usr/bin/ld: kernel/cgroup/dmem.c:127: undefined reference to `page_counter_set_max'
/usr/bin/ld: kernel/cgroup/dmem.o: in function `dmem_cgroup_uncharge':
kernel/cgroup/dmem.c:607: undefined reference to `page_counter_uncharge'
/usr/bin/ld: kernel/cgroup/dmem.o: in function `dmem_cgroup_calculate_protection':
kernel/cgroup/dmem.c:275: undefined reference to `page_counter_calculate_protection'
/usr/bin/ld: kernel/cgroup/dmem.o: in function `dmem_cgroup_try_charge':
kernel/cgroup/dmem.c:657: undefined reference to `page_counter_try_charge'
collect2: error: ld returned 1 exit status

Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for GET_FREE_REGION
Depends on [n]: SPARSEMEM [=n]
Selected by [y]:
- RESOURCE_KUNIT_TEST [=y] && RUNTIME_TESTING_MENU [=y] && KUNIT [=y]


vim +115 kernel/cgroup/dmem.c

111 
112 static void
113 set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val)
114 {
  > 115  page_counter_set_min(&pool->cnt, val);
116 }
117 
118 static void
119 set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val)
120 {
  > 121  page_counter_set_low(&pool->cnt, val);
122 }
123 
124 static void
125 set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
126 {
  > 127  page_counter_set_max(&pool->cnt, val);
128 }
129 
130 static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
131 {
132 return pool ? READ_ONCE(pool->cnt.low) : 0;
133 }
134 
135 static u64 get_resource_min(struct dmem_cgroup_pool_state *pool)
136 {
137 return pool ? READ_ONCE(pool->cnt.min) : 0;
138 }
139 
140 static u64 get_resource_max(struct dmem_cgroup_pool_state *pool)
141 {
142 return pool ? READ_ONCE(pool->cnt.max) : PAGE_COUNTER_MAX;
143 }
144 
145 static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
146 {
147 return pool ? page_counter_read(&pool->cnt) : 0;
148 }
149 
150 static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
151 {
152 set_resource_min(rpool, 0);
153 set_resource_low(rpool, 0);
154 set_resource_max(rpool, PAGE_COUNTER_MAX);
155 }
156 
157 static void dmemcs_offline(struct cgroup_subsys_state *css)
158 {

Re: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting cgroup

2024-12-04 Thread kernel test robot
Hi Maarten,

kernel test robot noticed the following build errors:

[auto build test ERROR on tj-cgroup/for-next]
[also build test ERROR on akpm-mm/mm-everything linus/master v6.13-rc1 
next-20241204]
[cannot apply to drm-misc/drm-misc-next drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Maarten-Lankhorst/kernel-cgroup-Add-dmem-memory-accounting-cgroup/20241204-233207
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
patch link:
https://lore.kernel.org/r/20241204143112.1250983-1-dev%40lankhorst.se
patch subject: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting 
cgroup
config: um-randconfig-r061-20241205 
(https://download.01.org/0day-ci/archive/20241205/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20241205/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

   /usr/bin/ld: kernel/cgroup/dmem.o: in function `set_resource_min':
>> kernel/cgroup/dmem.c:115: undefined reference to `page_counter_set_min'
   /usr/bin/ld: kernel/cgroup/dmem.o: in function `set_resource_low':
>> kernel/cgroup/dmem.c:121: undefined reference to `page_counter_set_low'
   /usr/bin/ld: kernel/cgroup/dmem.o: in function `set_resource_max':
>> kernel/cgroup/dmem.c:127: undefined reference to `page_counter_set_max'
   /usr/bin/ld: kernel/cgroup/dmem.o: in function `reset_all_resource_limits':
>> kernel/cgroup/dmem.c:115: undefined reference to `page_counter_set_min'
>> /usr/bin/ld: kernel/cgroup/dmem.c:121: undefined reference to `page_counter_set_low'
>> /usr/bin/ld: kernel/cgroup/dmem.c:127: undefined reference to `page_counter_set_max'
   /usr/bin/ld: kernel/cgroup/dmem.o: in function `dmem_cgroup_uncharge':
>> kernel/cgroup/dmem.c:607: undefined reference to `page_counter_uncharge'
   /usr/bin/ld: kernel/cgroup/dmem.o: in function `dmem_cgroup_calculate_protection':
>> kernel/cgroup/dmem.c:275: undefined reference to `page_counter_calculate_protection'
   /usr/bin/ld: kernel/cgroup/dmem.o: in function `dmem_cgroup_try_charge':
>> kernel/cgroup/dmem.c:657: undefined reference to `page_counter_try_charge'
   collect2: error: ld returned 1 exit status

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for GET_FREE_REGION
   Depends on [n]: SPARSEMEM [=n]
   Selected by [y]:
   - RESOURCE_KUNIT_TEST [=y] && RUNTIME_TESTING_MENU [=y] && KUNIT [=y]


vim +115 kernel/cgroup/dmem.c

   111  
   112  static void
   113  set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val)
   114  {
 > 115  page_counter_set_min(&pool->cnt, val);
   116  }
   117  
   118  static void
   119  set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val)
   120  {
 > 121  page_counter_set_low(&pool->cnt, val);
   122  }
   123  
   124  static void
   125  set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
   126  {
 > 127  page_counter_set_max(&pool->cnt, val);
   128  }
   129  
   130  static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
   131  {
   132  return pool ? READ_ONCE(pool->cnt.low) : 0;
   133  }
   134  
   135  static u64 get_resource_min(struct dmem_cgroup_pool_state *pool)
   136  {
   137  return pool ? READ_ONCE(pool->cnt.min) : 0;
   138  }
   139  
   140  static u64 get_resource_max(struct dmem_cgroup_pool_state *pool)
   141  {
   142  return pool ? READ_ONCE(pool->cnt.max) : PAGE_COUNTER_MAX;
   143  }
   144  
   145  static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
   146  {
   147  return pool ? page_counter_read(&pool->cnt) : 0;
   148  }
   149  
   150  static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
   151  {
   152  set_resource_min(rpool, 0);
   153  set_resource_low(rpool, 0);
   154  set_resource_max(rpool, PAGE_COUNTER_MAX);
   155  }
   156  
   157  static void dmemcs_offline(struct cgroup_subsys_state *css)
   158  {
   159  struct dmemcg_state *dmemcs = css_to_dmemcs(css);
   160  struct dmem_cgroup_pool_state *pool;
   161  
   162  rcu_read_lock();
   163  list_for

Re: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting cgroup

2024-12-04 Thread kernel test robot
Hi Maarten,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tj-cgroup/for-next]
[also build test WARNING on akpm-mm/mm-everything linus/master v6.13-rc1 
next-20241204]
[cannot apply to drm-misc/drm-misc-next drm-tip/drm-tip]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Maarten-Lankhorst/kernel-cgroup-Add-dmem-memory-accounting-cgroup/20241204-233207
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
patch link:
https://lore.kernel.org/r/20241204143112.1250983-1-dev%40lankhorst.se
patch subject: [PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting 
cgroup
config: sh-allmodconfig 
(https://download.01.org/0day-ci/archive/20241205/[email protected]/config)
compiler: sh4-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20241205/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

   kernel/cgroup/dmem.c: In function 'dmem_cgroup_state_evict_valuable':
>> kernel/cgroup/dmem.c:302:30: warning: variable 'climit' set but not used [-Wunused-but-set-variable]
 302 | struct page_counter *climit, *ctest;
 |  ^~
--
>> kernel/cgroup/dmem.c:300: warning: Excess function parameter 'dev' description in 'dmem_cgroup_state_evict_valuable'
>> kernel/cgroup/dmem.c:300: warning: Excess function parameter 'index' description in 'dmem_cgroup_state_evict_valuable'
>> kernel/cgroup/dmem.c:635: warning: Function parameter or struct member 'region' not described in 'dmem_cgroup_try_charge'
>> kernel/cgroup/dmem.c:635: warning: Excess function parameter 'dev' description in 'dmem_cgroup_try_charge'
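
The kernel-doc warnings above come from a comment that documents parameters the function does not take. As a rough userspace sketch (not the actual fix that was applied), the comment below lists exactly the four parameters of the signature and the body has no set-but-unused locals. `struct pool_state` and `evict_valuable_sketch()` are hypothetical stand-ins, and only the ancestry walk from the listing is modelled; the watermark checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel pool state; only the parent link is modelled. */
struct pool_state {
	struct pool_state *parent;
};

/**
 * evict_valuable_sketch() - Check if we should evict from test_pool
 * @limit_pool: The pool for which we hit limits
 * @test_pool: The pool for which to test
 * @ignore_low: Whether we have to respect low watermarks.
 * @ret_hit_low: Pointer to whether it makes sense to consider low watermark.
 *
 * Every parameter of the signature has exactly one description, and no
 * description refers to a parameter that does not exist.
 *
 * Return: true if @test_pool is @limit_pool or one of its descendants.
 */
static bool evict_valuable_sketch(struct pool_state *limit_pool,
				  struct pool_state *test_pool,
				  bool ignore_low, bool *ret_hit_low)
{
	struct pool_state *pool;

	assert(limit_pool && test_pool);
	(void)ignore_low;	/* watermark handling is not modelled here */
	if (ret_hit_low)
		*ret_hit_low = false;

	/* Walk up from test_pool until limit_pool or the root is passed. */
	for (pool = test_pool; pool && pool != limit_pool; pool = pool->parent)
		;

	return pool != NULL;
}
```

With a root pool, two children and one grandchild, the sketch reports a grandchild as evictable for its parent's limit, and a sibling as not evictable.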


vim +/climit +302 kernel/cgroup/dmem.c

   280  
   281  /**
   282   * dmem_cgroup_state_evict_valuable() - Check if we should evict from test_pool
   283   * @dev: &dmem_cgroup_region
   284   * @index: The index number of the region being tested.
   285   * @limit_pool: The pool for which we hit limits
   286   * @test_pool: The pool for which to test
   287   * @ignore_low: Whether we have to respect low watermarks.
   288   * @ret_hit_low: Pointer to whether it makes sense to consider low watermark.
   289   *
   290   * This function returns true if we can evict from @test_pool, false if not.
   291   * When returning false and @ignore_low is false, @ret_hit_low may
   292   * be set to true to indicate this function can be retried with @ignore_low
   293   * set to true.
   294   *
   295   * Return: bool
   296   */
   297  bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
   298                                        struct dmem_cgroup_pool_state *test_pool,
   299                                        bool ignore_low, bool *ret_hit_low)
 > 300  {
   301  struct dmem_cgroup_pool_state *pool = test_pool;
 > 302  struct page_counter *climit, *ctest;
   303  u64 used, min, low;
   304  
   305  /* Can always evict from current pool, despite limits */
   306  if (limit_pool == test_pool)
   307  return true;
   308  
   309  if (limit_pool) {
   310  if (!parent_dmemcs(limit_pool->cs))
   311  return true;
   312  
   313  for (pool = test_pool; pool && limit_pool != pool; pool = pool_parent(pool))
   314  {}
   315  
   316  if (!pool)
   317  return false;
   318  } else {
   319  /*
   320   * If there is no cgroup limiting memory usage, use the root
   321   * cgroup instead for limit calculations.
   322   */
   323  for (limit_pool = test_pool; pool_parent(limit_pool); limit_pool = pool_parent(limit_pool))
   324  {}
   325  }
   326  
   327  climit = &limit_pool->cnt;
   328  ctest = &test_pool->cnt;
   329  
   330  dmem_cgroup_calculate_protection(limit_pool, test_pool);
   331  
   332  used = page_counter_read(ctest);
   333  min = READ_ONCE(ctest->emin);
   334  
   335  if (used <= min)
   336   

[PATCH v2.1 1/1] kernel/cgroup: Add "dmem" memory accounting cgroup

2024-12-04 Thread Maarten Lankhorst
This code is based on the RDMA and misc cgroup initially, but now
uses page_counter. It uses the same min/low/max semantics as the memory
cgroup as a result.

There's a small mismatch: TTM tracks sizes as u64, while page_counter
counts pages in a long. In practice it's not a problem. 32-bit systems
don't really come with >=4GB cards, and as long as we're consistently
wrong with units, it's fine. The device page size may not be in the same
units as the kernel page size, and each region might also have a
different page size (VRAM vs GART for example).
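
To make the width mismatch concrete, here is a small userspace sketch. It assumes the counter is 32 bits wide, as an unsigned long would be on a 32-bit kernel; `counter32_t` and both helpers are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stand-in for the counter width on a 32-bit kernel, where
 * "unsigned long" is 32 bits wide. Hypothetical typedef.
 */
typedef uint32_t counter32_t;

/* True if a u64 amount survives the narrowing to the 32-bit counter. */
static bool fits_in_counter32(uint64_t amount)
{
	return amount <= UINT32_MAX;
}

/* Narrowing as a plain C conversion does it: the high bits are dropped. */
static counter32_t narrow_to_counter32(uint64_t amount)
{
	return (counter32_t)amount;
}

/* Usage example: values around the 4 GiB boundary. */
static void counter32_examples(void)
{
	assert(fits_in_counter32(3ULL << 30));			/* 3 GiB fits */
	assert(!fits_in_counter32(4ULL << 30));			/* 4 GiB does not */
	assert(narrow_to_counter32(4ULL << 30) == 0);		/* wraps to zero */
	assert(narrow_to_counter32((4ULL << 30) + 4096) == 4096);
}
```

Anything below 4 GiB passes through unchanged, which is why the mismatch only matters for cards of 4 GiB or more on 32-bit systems.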

The interface is simple:
- Call dmem_cgroup_register_region()
- Use dmem_cgroup_try_charge() to check if you can allocate a chunk of memory,
  and dmem_cgroup_uncharge() when freeing it. dmem_cgroup_try_charge() may
  return an error code, or -EAGAIN when the cgroup limit is reached. In that
  case a reference to the limiting pool is returned.
- The limiting pool can be passed to dmem_cgroup_state_evict_valuable() to
  decide whether a given pool is worth evicting from.
- After having evicted enough, drop the reference to the limiting pool with
  dmem_cgroup_pool_state_put().
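
The flow in the list above can be sketched in userspace roughly as follows. This is a mock under stated assumptions, not the kernel implementation: `struct mock_pool` and every `mock_*` function are hypothetical stand-ins that only mirror the described contract (0 on success, -EAGAIN plus a referenced limiting pool when the limit is hit, and a put once eviction is done).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a per-cgroup pool; not the kernel struct. */
struct mock_pool {
	uint64_t used;	/* amount currently charged */
	uint64_t max;	/* configured limit */
	int refs;	/* references handed out to callers */
};

/*
 * Mirrors the described contract of dmem_cgroup_try_charge(): 0 on success,
 * -EAGAIN when the limit would be exceeded, in which case a referenced
 * pointer to the limiting pool is handed back through ret_limit_pool.
 */
static int mock_try_charge(struct mock_pool *pool, uint64_t size,
			   struct mock_pool **ret_limit_pool)
{
	assert(pool->used <= pool->max);

	if (ret_limit_pool)
		*ret_limit_pool = NULL;

	if (size > pool->max - pool->used) {
		if (ret_limit_pool) {
			pool->refs++;
			*ret_limit_pool = pool;
		}
		return -EAGAIN;
	}

	pool->used += size;
	return 0;
}

static void mock_uncharge(struct mock_pool *pool, uint64_t size)
{
	pool->used -= size;
}

static void mock_pool_put(struct mock_pool *pool)
{
	pool->refs--;
}

/*
 * Driver-side allocation path: try to charge; on -EAGAIN "evict" up to
 * evict_size from the limiting pool, drop the reference, and retry once.
 */
static int mock_alloc(struct mock_pool *pool, uint64_t size, uint64_t evict_size)
{
	struct mock_pool *limit = NULL;
	int ret = mock_try_charge(pool, size, &limit);

	if (ret != -EAGAIN)
		return ret;

	if (evict_size > limit->used)
		evict_size = limit->used;
	mock_uncharge(limit, evict_size);
	mock_pool_put(limit);

	return mock_try_charge(pool, size, NULL);
}
```

In a real driver the eviction step would pick victims with dmem_cgroup_state_evict_valuable() instead of uncharging a fixed amount; the mock only shows where the reference is taken and where it is dropped.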

This API allows you to limit device resources with cgroups.
You can see the supported cards in /sys/fs/cgroup/dmem.capacity.
You need to echo +dmem to cgroup.subtree_control, and then you can
partition device memory.
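
For a program that wants to do the same as the echo above, a minimal sketch could look like this. `enable_dmem_controller()` is a hypothetical helper, and the path passed in is assumed to be the cgroup.subtree_control file of a cgroup v2 hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "+dmem" to the given cgroup.subtree_control file, the C equivalent
 * of `echo +dmem > cgroup.subtree_control`.
 * Returns 0 on success, -1 on error.
 */
static int enable_dmem_controller(const char *subtree_control_path)
{
	FILE *f;
	int ok;

	assert(subtree_control_path != NULL);

	f = fopen(subtree_control_path, "w");
	if (!f)
		return -1;

	ok = fputs("+dmem\n", f) >= 0;

	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Assuming the hierarchy is mounted at /sys/fs/cgroup, the call would be enable_dmem_controller("/sys/fs/cgroup/cgroup.subtree_control"), run with sufficient privileges.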

Co-developed-by: Friedrich Vock 
Signed-off-by: Friedrich Vock 
Co-developed-by: Maxime Ripard 
Signed-off-by: Maxime Ripard 
Signed-off-by: Maarten Lankhorst 
---
I completely messed up the !CONFIG_CGROUP_DMEM path. Resending just this patch 
to compile cleanly without CONFIG_CGROUP_DMEM enabled.

 Documentation/admin-guide/cgroup-v2.rst |  58 +-
 Documentation/core-api/cgroup.rst   |   9 +
 Documentation/core-api/index.rst|   1 +
 Documentation/gpu/drm-compute.rst   |  54 ++
 include/linux/cgroup_dmem.h |  66 ++
 include/linux/cgroup_subsys.h   |   4 +
 include/linux/page_counter.h|   2 +-
 init/Kconfig|  10 +
 kernel/cgroup/Makefile  |   1 +
 kernel/cgroup/dmem.c| 861 
 mm/page_counter.c   |   4 +-
 11 files changed, 1060 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/core-api/cgroup.rst
 create mode 100644 Documentation/gpu/drm-compute.rst
 create mode 100644 include/linux/cgroup_dmem.h
 create mode 100644 kernel/cgroup/dmem.c

diff --git a/Documentation/admin-guide/cgroup-v2.rst 
b/Documentation/admin-guide/cgroup-v2.rst
index 315ede811c9d0..cb1b4e759b7e2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -64,13 +64,14 @@ v1 is available under 
:ref:`Documentation/admin-guide/cgroup-v1/index.rst 
+#include 
+
+struct dmem_cgroup_pool_state;
+
+/* Opaque definition of a cgroup region, used internally */
+struct dmem_cgroup_region;
+
+#if IS_ENABLED(CONFIG_CGROUP_DMEM)
+struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const char *name_fmt, ...) __printf(2,3);
+void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region);
+int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
+			   struct dmem_cgroup_pool_state **ret_pool,
+			   struct dmem_cgroup_pool_state **ret_limit_pool);
+void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
+bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
+				      struct dmem_cgroup_pool_state *test_pool,
+				      bool ignore_low, bool *ret_hit_low);
+
+void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
+#else
+static inline __printf(2,3) struct dmem_cgroup_region *
+dmem_cgroup_register_region(u64 size, const char *name_fmt, ...)
+{
+	return NULL;
+}
+
+static inline void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region)
+{ }
+
+static inline int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
+					 struct dmem_cgroup_pool_state **ret_pool,
+					 struct dmem_cgroup_pool_state **ret_limit_pool)
+{
+	*ret_pool = NULL;
+
+	if (ret_limit_pool)
+		*ret_limit_pool = NULL;
+
+	return 0;
+}
+
+static inline void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size)
+{ }
+
+static inline
+bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
+				      struct dmem_cgroup_pool_state *test_pool,
+				      bool ignore_low, bool *ret_hit_low)
+{
+	return true;
+}
+
+static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
+{ }
+
+#endif
+#endif /* _CGROUP_DMEM_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index 4452354872307..3fd0bcbf30803 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -