Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-20 Thread Tejun Heo
Hello, On Tue, Nov 20, 2018 at 10:21:14PM +0000, Ho, Kenny wrote: > By this reply, are you suggesting that vendor specific resources > will never be acceptable to be managed under cgroup? Let's say a user I wouldn't say never but whatever which gets included as a cgroup controller should have

Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-20 Thread Tejun Heo
Hello, On Tue, Nov 20, 2018 at 01:58:11PM -0500, Kenny Ho wrote: > Since many parts of the DRM subsystem have vendor-specific > implementations, we introduce mechanisms for vendors to register their > specific resources and control files to the DRM cgroup subsystem. A > vendor will register itself

Re: [PATCH v2 0/4] AMDKFD (AMD GPU compute) support for device cgroup.

2019-05-17 Thread Tejun Heo
On Fri, May 17, 2019 at 04:14:52PM +0000, Kasiviswanathan, Harish wrote: > amdkfd (part of amdgpu) driver supports the AMD GPU compute stack. > amdkfd exposes only a single device /dev/kfd even if multiple AMD GPU > (compute) devices exist in a system. However, amdgpu driver exposes a > separate
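For readers following along, the device cgroup check this series adds boils down to asking the (now exported) devcgroup_check_permission() helper whether the caller may access the render node of the GPU actually being used, even though userspace only ever opens the shared /dev/kfd node. A minimal sketch under that assumption; the function name and the render_major/render_minor parameters are hypothetical placeholders for however the driver looks up a GPU's render node:

#include <linux/device_cgroup.h>

/*
 * Hedged sketch: a driver exposing one shared character device can
 * still honor per-GPU device cgroup rules by checking access to the
 * DRM render node that backs the GPU being requested.
 */
static int kfd_devcgroup_check_permission(u32 render_major, u32 render_minor)
{
        /* Allow only if the caller's device cgroup grants read/write
         * access to the corresponding character device. */
        return devcgroup_check_permission(DEVCG_DEV_CHAR,
                                          render_major, render_minor,
                                          DEVCG_ACC_READ | DEVCG_ACC_WRITE);
}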

Re: [PATCH v2 4/4] drm/amdkfd: Check against device cgroup

2019-05-28 Thread Tejun Heo
Hello, On Fri, May 17, 2019 at 08:12:17PM +0000, Kuehling, Felix wrote: > Patches 1,2,4 will be submitted through amd-staging-drm-next. Patch 3 > goes through the cgroup tree. Patch 4 depends on patch 3. So submitting > patch 4 will need to wait until we rebase amd-staging-drm-next on a new >

Re: [PATCH v2 0/4] AMDKFD (AMD GPU compute) support for device cgroup.

2019-05-28 Thread Tejun Heo
Hello, On Fri, May 17, 2019 at 08:04:42PM +0000, Kasiviswanathan, Harish wrote: > 1). Documentation for user on how to use device cgroup for amdkfd device. I > have some more information on this in patch 4. I see. Yeah, I just missed that. Thanks. -- tejun

Re: [PATCH v2 4/4] drm/amdkfd: Check against device cgroup

2019-05-29 Thread Tejun Heo
On Wed, May 29, 2019 at 08:45:44PM +0000, Kuehling, Felix wrote: > Just to clarify, are you saying that we should upstream this change > through Alex Deucher's amd-staging-drm-next and Dave Airlie's drm-next > trees? Yeah, sure, whichever tree is the most convenient. Thanks. -- tejun

Re: [RFC PATCH v2 4/5] drm, cgroup: Add total GEM buffer allocation limit

2019-05-16 Thread Tejun Heo
Hello, I haven't gone through the patchset yet but some quick comments. On Wed, May 15, 2019 at 10:29:21PM -0400, Kenny Ho wrote: > Given this controller is specific to the drm kernel subsystem which > uses minor to identify drm device, I don't see a need to complicate > the interfaces more by

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-08-30 Thread Tejun Heo
Hello, I just glanced through the interface and don't have enough context to give any kind of detailed review yet. I'll try to read up and understand more and would greatly appreciate if you can give me some pointers to read up on the resources being controlled and how the actual use cases would

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-03 Thread Tejun Heo
Hello, Daniel. On Tue, Sep 03, 2019 at 09:55:50AM +0200, Daniel Vetter wrote: > > * While breaking up and applying control to different types of > > internal objects may seem attractive to folks who work day in and > > day out with the subsystem, they aren't all that useful to users and > >

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-10 Thread Tejun Heo
Hello, Michal. On Tue, Sep 10, 2019 at 01:54:48PM +0200, Michal Hocko wrote: > > So, while it'd great to have shrinkers in the longer term, it's not a > > strict requirement to be accounted in memcg. It already accounts a > > lot of memory which isn't reclaimable (a lot of slabs and socket > >

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-06 Thread Tejun Heo
Hello, On Wed, Sep 04, 2019 at 10:54:34AM +0200, Daniel Vetter wrote: > Anyway, I don't think reusing the drm_minor registration makes sense, > since we want to be on the drm_device, not on the minor. Which is a bit > awkward for cgroups, which wants to identify devices using major.minor > pairs.

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-06 Thread Tejun Heo
Hello, Daniel. On Fri, Sep 06, 2019 at 05:34:16PM +0200, Daniel Vetter wrote: > > Hmm... what'd be the fundamental difference from slab or socket memory > > which are handled through memcg? Does system memory used by GPUs have > > further global restrictions in addition to the amount of physical >

Re: [PATCH RFC v4 00/16] new cgroup controller for gpu/drm subsystem

2019-09-06 Thread Tejun Heo
Hello, Daniel. On Tue, Sep 03, 2019 at 09:48:22PM +0200, Daniel Vetter wrote: > I think system memory separate from vram makes sense. For one, vram is > like 10x+ faster than system memory, so we definitely want to have > good control on that. But maybe we only want one vram bucket overall > for

Re: [PATCH RFC v4 01/16] drm: Add drm_minor_for_each

2019-09-06 Thread Tejun Heo
Hello, Daniel. On Fri, Sep 06, 2019 at 05:36:02PM +0200, Daniel Vetter wrote: > Block devices are a great example I think. How do you handle the > partitions on that? For drm we also have a main minor interface, and cgroup IO controllers only distribute hardware IO capacity and are blind to

Re: [PATCH RFC v4 02/16] cgroup: Introduce cgroup for drm subsystem

2019-12-02 Thread Tejun Heo
On Fri, Nov 29, 2019 at 01:00:36AM -0500, Kenny Ho wrote: > On Tue, Oct 1, 2019 at 10:31 AM Michal Koutný wrote: > > On Thu, Aug 29, 2019 at 02:05:19AM -0400, Kenny Ho wrote: > > > +struct cgroup_subsys drm_cgrp_subsys = { > > > + .css_alloc = drmcg_css_alloc, > > > + .css_free
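Since the quoted hunk is cut off mid-struct, here is a hedged sketch of what a minimal controller registration of this shape typically looks like. drm_cgrp_subsys, drmcg_css_alloc and drmcg_css_free come from the quoted patch; struct drmcg, css_to_drmcg() and the empty drmcg_files cftype array are illustrative scaffolding, and a real controller would also need a matching SUBSYS() entry in include/linux/cgroup_subsys.h:

#include <linux/cgroup.h>
#include <linux/err.h>
#include <linux/slab.h>

/* Hypothetical per-cgroup state; the real patch carries more fields. */
struct drmcg {
        struct cgroup_subsys_state css;
};

static inline struct drmcg *css_to_drmcg(struct cgroup_subsys_state *css)
{
        return container_of(css, struct drmcg, css);
}

/* Interface files would go here; the empty entry terminates the array. */
static struct cftype drmcg_files[] = {
        { }
};

static struct cgroup_subsys_state *
drmcg_css_alloc(struct cgroup_subsys_state *parent_css)
{
        struct drmcg *drmcg = kzalloc(sizeof(*drmcg), GFP_KERNEL);

        return drmcg ? &drmcg->css : ERR_PTR(-ENOMEM);
}

static void drmcg_css_free(struct cgroup_subsys_state *css)
{
        kfree(css_to_drmcg(css));
}

struct cgroup_subsys drm_cgrp_subsys = {
        .css_alloc      = drmcg_css_alloc,
        .css_free       = drmcg_css_free,
        .dfl_cftypes    = drmcg_files,
};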

Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Tejun Heo
Hello, Kenny, Daniel. (cc'ing Johannes) On Fri, Feb 14, 2020 at 01:51:32PM -0500, Kenny Ho wrote: > On Fri, Feb 14, 2020 at 1:34 PM Daniel Vetter wrote: > > > > I think guidance from Tejun in previous discussions was pretty clear that > > he expects cgroups to be both a) standardized and b)

Re: [PATCH 09/11] drm, cgroup: Introduce lgpu as DRM cgroup resource

2020-02-14 Thread Tejun Heo
On Fri, Feb 14, 2020 at 03:28:40PM -0500, Kenny Ho wrote: > Can you elaborate, per your understanding, how the lgpu weight > attribute differ from the io.weight you suggested? Is it merely a Oh, it's the non-weight part which is problematic. > formatting/naming issue or is it the implementation

Re: [PATCH] device_cgroup: Cleanup cgroup eBPF device filter code

2020-04-13 Thread Tejun Heo
On Fri, Apr 03, 2020 at 07:55:28PM +0200, Odin Ugedal wrote: > Original cgroup v2 eBPF code for filtering device access made it > possible to compile with CONFIG_CGROUP_DEVICE=n and still use the eBPF > filtering. Change > commit 4b7d4d453fc4 ("device_cgroup: Export devcgroup_check_permission") >
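For context, the eBPF path being cleaned up here is BPF_PROG_TYPE_CGROUP_DEVICE: a program attached to a cgroup with BPF_CGROUP_DEVICE that gets to allow (return 1) or deny (return 0) each device access. A hedged sketch of such a filter; the deny-writes-to-DRM-char-devices policy and the program name are purely illustrative (226 is the DRM character device major):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("cgroup/dev")
int drm_dev_filter(struct bpf_cgroup_dev_ctx *ctx)
{
        /* access_type packs the device type in the low 16 bits and the
         * requested access (mknod/read/write) in the high 16 bits. */
        __u32 type = ctx->access_type & 0xFFFF;
        __u32 access = ctx->access_type >> 16;

        if (type == BPF_DEVCG_DEV_CHAR && ctx->major == 226 &&
            (access & BPF_DEVCG_ACC_WRITE))
                return 0;       /* deny */

        return 1;               /* allow */
}

char _license[] SEC("license") = "GPL";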

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Tejun Heo
Hello, On Mon, Apr 13, 2020 at 05:40:32PM -0400, Kenny Ho wrote: > By lack of consensus, do you mean Intel's assertion that a standard is > not a standard until Intel implements it? (That was in the context of > OpenCL language standard with the concept of SubDevice.) I thought > the discussion

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Tejun Heo
Hello, Kenny. On Tue, Mar 24, 2020 at 02:49:27PM -0400, Kenny Ho wrote: > Can you elaborate more on what are the missing pieces? Sorry about the long delay, but I think we've been going in circles for quite a while now. Let's try to make it really simple as the first step. How about something

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-04-13 Thread Tejun Heo
Hello, On Mon, Apr 13, 2020 at 04:17:14PM -0400, Kenny Ho wrote: > Perhaps we can even narrow things down to just > gpu.weight/gpu.compute.weight as a start? In this aspect, is the key That sounds great to me. > objection to the current implementation of gpu.compute.weight the >
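To make the gpu.weight idea concrete: it would follow the existing cgroup v2 weight convention used by cpu.weight and io.weight (an integer in [1, 10000], default 100, distributed proportionally among siblings). A hedged, hypothetical userspace sketch; gpu.weight is only the name proposed in this thread, not an interface that exists upstream:

#include <stdio.h>

/* Write a weight-style cgroup knob, in the cpu.weight/io.weight style.
 * The "gpu.weight" file is hypothetical. */
static int set_gpu_weight(const char *cgroup_path, unsigned int weight)
{
        char path[512];
        FILE *f;

        snprintf(path, sizeof(path), "%s/gpu.weight", cgroup_path);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fprintf(f, "%u\n", weight);
        return fclose(f);
}

int main(void)
{
        /* Give this (hypothetical) cgroup twice the default share. */
        return set_gpu_weight("/sys/fs/cgroup/mycontainer", 200) ? 1 : 0;
}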

Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem

2020-03-24 Thread Tejun Heo
On Tue, Mar 17, 2020 at 12:03:20PM -0400, Kenny Ho wrote: > What are your thoughts on this latest series? My overall impression is that the feedbacks aren't being incorporated thoroughly / sufficiently. Thanks. -- tejun

Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Tejun Heo
Hello, On Fri, May 07, 2021 at 06:30:56PM -0400, Alex Deucher wrote: > Maybe we are speaking past each other. I'm not following. We got > here because a device specific cgroup didn't make sense. With my > Linux user hat on, that makes sense. I don't want to write code to a > bunch of device

Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Tejun Heo
Hello, On Fri, May 07, 2021 at 06:54:13PM +0200, Daniel Vetter wrote: > All I meant is that for the container/cgroups world starting out with > time-sharing feels like the best fit, not least because your SRIOV designers > also seem to think that's the best first cut for cloud-y computing. > Whether

Re: [RFC] Add BPF_PROG_TYPE_CGROUP_IOCTL

2021-05-07 Thread Tejun Heo
Hello, On Fri, May 07, 2021 at 03:55:39PM -0400, Alex Deucher wrote: > The problem is temporal partitioning on GPUs is much harder to enforce > unless you have a special case like SR-IOV. Spatial partitioning, on > AMD GPUs at least, is widely available and easily enforced. What is > the point

Re: [PATCH] Revert "workqueue: remove unused cancel_work()"

2022-06-07 Thread Tejun Heo
On Sat, May 21, 2022 at 12:04:00AM -0400, Andrey Grodzovsky wrote: > From 78df30cc97f10c885f5159a293e6afe2348aa60c Mon Sep 17 00:00:00 2001 > From: Andrey Grodzovsky > Date: Thu, 19 May 2022 09:47:28 -0400 > Subject: Revert "workqueue: remove unused cancel_work()" > > This reverts commit
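The distinction behind re-adding cancel_work() is that it only takes a pending work item off its queue and does not wait for an already-running instance, unlike cancel_work_sync(), which sleeps and can deadlock if called from a context the work item itself may be blocked on. A hedged sketch; the my_dev structure and function names are hypothetical:

#include <linux/workqueue.h>

struct my_dev {
        struct work_struct recovery_work;       /* INIT_WORK()ed elsewhere */
};

static void my_dev_abort_recovery(struct my_dev *dev)
{
        /* Drop a pending recovery request without waiting for a
         * running instance to finish. */
        cancel_work(&dev->recovery_work);
}

static void my_dev_teardown(struct my_dev *dev)
{
        /* Full teardown: we may sleep here, so also wait for a
         * running instance to complete. */
        cancel_work_sync(&dev->recovery_work);
}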

Re: [PATCH] Revert "workqueue: remove unused cancel_work()"

2022-06-07 Thread Tejun Heo
On Tue, Jun 07, 2022 at 01:39:01PM -0400, Alex Deucher wrote: > On Tue, Jun 7, 2022 at 1:14 PM Tejun Heo wrote: > > > > On Sat, May 21, 2022 at 12:04:00AM -0400, Andrey Grodzovsky wrote: > > > From 78df30cc97f10c885f5159a293e6afe2348aa60c Mon Sep 17 00:00:00 2001 >

Re: [PATCH] Revert "workqueue: remove unused cancel_work()"

2022-05-20 Thread Tejun Heo
On Fri, May 20, 2022 at 08:22:39AM +0200, Christian König wrote: > On 20.05.22 at 02:47, Lai Jiangshan wrote: > > On Thu, May 19, 2022 at 11:04 PM Andrey Grodzovsky > > wrote: > > > See this patch-set https://www.spinics.net/lists/amd-gfx/msg78514.html, > > > specifically patch > > >

Re: Selecting CPUs for queuing work on

2022-08-12 Thread Tejun Heo
On Fri, Aug 12, 2022 at 04:26:47PM -0400, Felix Kuehling wrote: > Hi workqueue maintainers, > > In the KFD (amdgpu) driver we found a need to schedule bottom half interrupt > handlers on CPU cores different from the one where the top-half interrupt > handler runs to avoid the interrupt handler
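The pattern being asked about can be expressed with queue_work_on(), which queues a work item for execution on a specific CPU. A hedged sketch in which the top half simply pushes its bottom half to the next online CPU; the CPU-selection policy and all my_* names are illustrative only (and, as the next message suggests, the policy may better live in userland):

#include <linux/cpumask.h>
#include <linux/interrupt.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

struct my_irq_ctx {
        struct work_struct bh_work;     /* INIT_WORK()ed at init time */
};

static irqreturn_t my_top_half(int irq, void *data)
{
        struct my_irq_ctx *ctx = data;
        unsigned int cpu = cpumask_next(smp_processor_id(), cpu_online_mask);

        if (cpu >= nr_cpu_ids)
                cpu = cpumask_first(cpu_online_mask);

        /* Run the bottom half on a CPU other than the one that took
         * the interrupt (falls back to the same CPU on a 1-CPU box). */
        queue_work_on(cpu, system_highpri_wq, &ctx->bh_work);
        return IRQ_HANDLED;
}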

Re: Selecting CPUs for queuing work on

2022-08-12 Thread Tejun Heo
Hello, On Fri, Aug 12, 2022 at 04:54:04PM -0400, Felix Kuehling wrote: > In principle, I think IRQ routing to CPUs can change dynamically with > irqbalance. I wonder whether this is something which should be exposed to userland rather than trying to do dynamically in the kernel and let

Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-10 Thread Tejun Heo
Hello, On Wed, May 10, 2023 at 04:59:01PM +0200, Maarten Lankhorst wrote: > The misc controller is not granular enough. A single computer may have any > number of > graphics cards, some of them with multiple regions of vram inside a single > card. Extending the misc controller to support
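For context on the granularity objection: every misc-controller resource is one entry in a single global enum (misc_res_type), whose capacity a provider registers with misc_cg_set_capacity() and which is then charged with misc_cg_try_charge(), so "one enum entry per VRAM region per card" cannot scale. A hedged sketch; MISC_CG_RES_VRAM0 and the my_vram_* helpers are hypothetical:

#include <linux/misc_cgroup.h>

/* Charge the current task's misc cgroup for a (hypothetical) VRAM
 * region; the provider would have called
 * misc_cg_set_capacity(MISC_CG_RES_VRAM0, total_bytes) at probe time. */
static int my_vram_charge(struct misc_cg **cgp, u64 bytes)
{
        struct misc_cg *cg = get_current_misc_cg();
        int ret = misc_cg_try_charge(MISC_CG_RES_VRAM0, cg, bytes);

        if (ret) {
                put_misc_cg(cg);        /* over this cgroup's misc.max */
                return ret;
        }
        *cgp = cg;      /* caller keeps the reference until uncharge */
        return 0;
}

static void my_vram_uncharge(struct misc_cg *cg, u64 bytes)
{
        misc_cg_uncharge(MISC_CG_RES_VRAM0, cg, bytes);
        put_misc_cg(cg);
}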

Re: [RFC PATCH 0/4] Add support for DRM cgroup memory accounting.

2023-05-05 Thread Tejun Heo
Hello, On Wed, May 03, 2023 at 10:34:56AM +0200, Maarten Lankhorst wrote: > RFC as I'm looking for comments. > > For long running compute, it can be beneficial to partition the GPU memory > between cgroups, so each cgroup can use its maximum amount of memory without > interfering with other