Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-07-02 Thread Daniel Vetter
On Sun, Jun 30, 2019 at 01:10:28AM -0400, Kenny Ho wrote:
> On Thu, Jun 27, 2019 at 3:24 AM Daniel Vetter  wrote:
> > Another question I have: What about HMM? With the device memory zone
> > the core mm will be a lot more involved in managing that, but I also
> > expect that we'll have classic buffer-based management for a long time
> > still. So these need to work together, and I fear slightly that we'll
> > have memcg and drmcg fighting over the same pieces a bit perhaps?
> >
> > Adding Jerome, maybe he has some thoughts on this.
> 
> I just did a bit of digging and this looks like the current behaviour:
> https://www.kernel.org/doc/html/v5.1/vm/hmm.html#memory-cgroup-memcg-and-rss-accounting
> 
> "For now device memory is accounted as any regular page in rss
> counters (either anonymous if device page is used for anonymous, file
> if device page is used for file backed page or shmem if device page is
> used for shared memory). This is a deliberate choice to keep existing
> applications, that might start using device memory without knowing
> about it, running unimpacted.
> 
> A drawback is that the OOM killer might kill an application using a
> lot of device memory and not a lot of regular system memory and thus
> not freeing much system memory. We want to gather more real world
> experience on how applications and system react under memory pressure
> in the presence of device memory before deciding to account device
> memory differently."

Hm ... I also just learned that the device memory stuff, at least the hmm
part, is probably getting removed again, and only the hmm_mirror part of
hmm will be kept. So maybe this doesn't matter to us. But really no idea.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-06-29 Thread Kenny Ho
On Thu, Jun 27, 2019 at 3:24 AM Daniel Vetter  wrote:
> Another question I have: What about HMM? With the device memory zone
> the core mm will be a lot more involved in managing that, but I also
> expect that we'll have classic buffer-based management for a long time
> still. So these need to work together, and I fear slightly that we'll
> have memcg and drmcg fighting over the same pieces a bit perhaps?
>
> Adding Jerome, maybe he has some thoughts on this.

I just did a bit of digging and this looks like the current behaviour:
https://www.kernel.org/doc/html/v5.1/vm/hmm.html#memory-cgroup-memcg-and-rss-accounting

"For now device memory is accounted as any regular page in rss
counters (either anonymous if device page is used for anonymous, file
if device page is used for file backed page or shmem if device page is
used for shared memory). This is a deliberate choice to keep existing
applications, that might start using device memory without knowing
about it, running unimpacted.

A drawback is that the OOM killer might kill an application using a
lot of device memory and not a lot of regular system memory and thus
not freeing much system memory. We want to gather more real world
experience on how applications and system react under memory pressure
in the presence of device memory before deciding to account device
memory differently."

Regards,
Kenny

Re: [RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-06-27 Thread Daniel Vetter
On Wed, Jun 26, 2019 at 5:05 PM Kenny Ho  wrote:
> This is a follow up to the RFC I made previously to introduce a cgroup
> controller for the GPU/DRM subsystem [v1,v2].  The goal is to be able to
> provide resource management for GPU resources using things like containers.
> The cover letter from v1 is copied below for reference.
>
> [v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
> [v2]: https://www.spinics.net/lists/cgroups/msg22074.html
>
> v3:
> Based on feedback on v2:
> * removed .help type file from v2
> * conform to cgroup convention for default and max handling
> * conform to cgroup convention for addressing device specific limits (with 
> major:minor)
> New function:
> * adopted memparse for memory size related attributes
> * added macro to marshall drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
> * added ttm buffer usage stats (per cgroup, for system, tt, vram.)
> * added ttm buffer usage limit (per cgroup, for vram.)
> * added per cgroup bandwidth stats and limiting (burst and average bandwidth)
>
> v2:
> * Removed the vendoring concepts
> * Add limit to total buffer allocation
> * Add limit to the maximum size of a buffer allocation
>
> v1: cover letter
>
> The purpose of this patch series is to start a discussion for a generic cgroup
> controller for the drm subsystem.  The design proposed here is a very early
> one.  We are hoping to engage the community as we develop the idea.
>
>
> Backgrounds
> ===========
> Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
> tasks, and all their future children, into hierarchical groups with
> specialized behaviour, such as accounting/limiting the resources which
> processes in a cgroup can access[1].  Weights, limits, protections and
> allocations are the main resource distribution models.  Existing cgroup
> controllers include cpu, memory, io, rdma, and more.  cgroup is one of the
> foundational technologies that enable the popular container application
> deployment and management method.
>
> Direct Rendering Manager/drm contains code intended to support the needs of
> complex graphics devices. Graphics drivers in the kernel may make use of DRM
> functions to make tasks like memory management, interrupt handling and DMA
> easier, and provide a uniform interface to applications.  The DRM has also
> developed beyond traditional graphics applications to support compute/GPGPU
> applications.
>
>
> Motivations
> ===========
> As GPUs grow beyond the realm of desktop/workstation graphics into areas like
> data center clusters and IoT, there is an increasing need to monitor and
> regulate the GPU as a resource like cpu, memory and io.
>
> Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
> purpose of managing GPU priority using the cgroup hierarchy.  While that
> particular use case may not warrant a standalone drm cgroup controller, there
> are other use cases where having one can be useful [3].  Monitoring GPU
> resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
> (execution unit [Intel's nomenclature]), and GPU job scheduling [4] can help
> sysadmins get a better understanding of the applications' usage profiles.
> Further usage regulation of the aforementioned resources can also help
> sysadmins optimize workload deployment on limited GPU resources.
>
> With the increased importance of machine learning, data science and other
> cloud-based applications, GPUs are already in production use in data centers
> today [5,6,7].  Existing GPU resource management is very coarse-grained,
> however, as sysadmins are only able to distribute workloads on a per-GPU basis
> [8].  An alternative is to use GPU virtualization (with or without SRIOV) but
> it generally acts on the entire GPU instead of the specific resources in a GPU.
> With a drm cgroup controller, we can enable alternate, fine-grained, sub-GPU
> resource management (in addition to what may be available via GPU
> virtualization.)
>
> In addition to production use, the DRM cgroup can also help with testing
> graphics application robustness by providing a means to artificially limit DRM
> resources available to the applications.
>
>
> Challenges
> ==========
> While there is common infrastructure in DRM that is shared across many vendors
> (the scheduler [4] for example), there are also aspects of DRM that are vendor
> specific.  To accommodate this, we borrowed the mechanism used by cgroup to
> handle different kinds of cgroup controller.
>
> Resources for DRM are also often device (GPU) specific instead of system
> specific and a system may contain more than one GPU.  For this, we borrowed
> some of the ideas from the RDMA cgroup controller.

Another question I have: What about HMM? With the device memory zone
the core mm will be a lot more involved in managing that, but I also
expect that we'll have classic buffer-based management for a long time
still. So these need to work 

[RFC PATCH v3 00/11] new cgroup controller for gpu/drm subsystem

2019-06-26 Thread Kenny Ho
This is a follow up to the RFC I made previously to introduce a cgroup
controller for the GPU/DRM subsystem [v1,v2].  The goal is to be able to
provide resource management for GPU resources using things like containers.
The cover letter from v1 is copied below for reference.

[v1]: https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
[v2]: https://www.spinics.net/lists/cgroups/msg22074.html

v3:
Based on feedback on v2:
* removed .help type file from v2
* conform to cgroup convention for default and max handling
* conform to cgroup convention for addressing device specific limits (with 
major:minor)
New function:
* adopted memparse for memory size related attributes
* added macro to marshall drmcgrp cftype private  (DRMCG_CTF_PRIV, etc.)
* added ttm buffer usage stats (per cgroup, for system, tt, vram.)
* added ttm buffer usage limit (per cgroup, for vram.)
* added per cgroup bandwidth stats and limiting (burst and average bandwidth)
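
To make the major:minor addressing and the memparse-style sizes above
concrete, here is a minimal userspace sketch in Python.  It is not code from
the patch series.  The line format ("226:0 256M"), the names parse_size and
parse_limit_line, and the use of 226 as an example major number are all
assumptions for illustration.  The sketch only handles decimal numbers, whereas
the kernel's memparse() also accepts other number bases.

```python
import re

# Binary suffixes understood by the kernel's memparse() helper: K, M, G, T, P, E
# (case-insensitive), each one a further factor of 1024.
_SUFFIX_SHIFT = {"k": 10, "m": 20, "g": 30, "t": 40, "p": 50, "e": 60}


def parse_size(text):
    """Userspace approximation of memparse(): '256M' -> 268435456."""
    match = re.fullmatch(r"(\d+)([kmgtpe]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    # An empty suffix means plain bytes, i.e. a shift of zero.
    return int(number) << _SUFFIX_SHIFT.get(suffix.lower(), 0)


def parse_limit_line(line):
    """Parse a hypothetical 'MAJOR:MINOR VALUE' line.

    Returns ((major, minor), limit_in_bytes); the limit is None when VALUE is
    the cgroup keyword 'max', meaning "no limit".
    """
    dev, value = line.split()
    major, minor = (int(part) for part in dev.split(":"))
    limit = None if value == "max" else parse_size(value)
    return (major, minor), limit
```

With this sketch, parse_limit_line("226:0 256M") yields the device key (226, 0)
and a limit of 256 * 1024 * 1024 bytes, and "226:0 max" yields no limit.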

v2:
* Removed the vendoring concepts
* Add limit to total buffer allocation
* Add limit to the maximum size of a buffer allocation

v1: cover letter

The purpose of this patch series is to start a discussion for a generic cgroup
controller for the drm subsystem.  The design proposed here is a very early one.
We are hoping to engage the community as we develop the idea.


Backgrounds
===========
Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with specialized
behaviour, such as accounting/limiting the resources which processes in a cgroup
can access[1].  Weights, limits, protections and allocations are the main
resource distribution models.  Existing cgroup controllers include cpu, memory,
io, rdma, and more.  cgroup is one of the foundational technologies that enable
the popular container application deployment and management method.

Direct Rendering Manager/drm contains code intended to support the needs of
complex graphics devices. Graphics drivers in the kernel may make use of DRM
functions to make tasks like memory management, interrupt handling and DMA
easier, and provide a uniform interface to applications.  The DRM has also
developed beyond traditional graphics applications to support compute/GPGPU
applications.


Motivations
===========
As GPUs grow beyond the realm of desktop/workstation graphics into areas like
data center clusters and IoT, there is an increasing need to monitor and
regulate the GPU as a resource like cpu, memory and io.

Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
purpose of managing GPU priority using the cgroup hierarchy.  While that
particular use case may not warrant a standalone drm cgroup controller, there
are other use cases where having one can be useful [3].  Monitoring GPU
resources such as VRAM and buffers, CU (compute unit [AMD's nomenclature])/EU
(execution unit [Intel's nomenclature]), and GPU job scheduling [4] can help
sysadmins get a better understanding of the applications' usage profiles.
Further usage regulation of the aforementioned resources can also help
sysadmins optimize workload deployment on limited GPU resources.

With the increased importance of machine learning, data science and other
cloud-based applications, GPUs are already in production use in data centers
today [5,6,7].  Existing GPU resource management is very coarse-grained,
however, as sysadmins are only able to distribute workloads on a per-GPU basis
[8].  An alternative is to use GPU virtualization (with or without SRIOV) but
it generally acts on the entire GPU instead of the specific resources in a GPU.
With a drm cgroup controller, we can enable alternate, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization.)

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a means to artificially limit DRM
resources available to the applications.


Challenges
==========
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4] for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism used by cgroup to
handle different kinds of cgroup controller.

Resources for DRM are also often device (GPU) specific instead of system
specific and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from the RDMA cgroup controller.

Approach
========
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.
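
As a rough illustration of "accounting first, regulation later" for
per-device resources, the following Python sketch models hierarchical
charging in userspace.  It is not how the patches are implemented.  The class
DrmCgroup, its methods try_charge and uncharge, and the (major, minor) device
key are hypothetical names chosen for this example.  The only ideas taken from
the text above are that usage is tracked per device and that limits apply
through the cgroup hierarchy.

```python
class DrmCgroup:
    """Toy model of a cgroup node with per-device usage and limits."""

    def __init__(self, parent=None):
        self.parent = parent
        self.usage = {}  # (major, minor) -> bytes currently charged
        self.limit = {}  # (major, minor) -> byte limit; absent means "max"

    def _self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def try_charge(self, dev, size):
        """Charge size bytes on dev here and in every ancestor.

        Returns False without charging anything if any level would go over
        its limit for that device.
        """
        for node in self._self_and_ancestors():
            limit = node.limit.get(dev)
            if limit is not None and node.usage.get(dev, 0) + size > limit:
                return False
        for node in self._self_and_ancestors():
            node.usage[dev] = node.usage.get(dev, 0) + size
        return True

    def uncharge(self, dev, size):
        """Undo an earlier successful try_charge of the same size."""
        for node in self._self_and_ancestors():
            node.usage[dev] -= size
```

For example, with a parent limited to 100 bytes on device (226, 0), a child
can charge 60 bytes, but a further 50-byte charge is refused because the
parent would exceed its limit.  With no limits set, the same structure
provides plain accounting and statistics.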

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5]