Hi,

> > > > > +       select TRACE_GPU_MEM

> > > > > +#ifdef CONFIG_TRACE_GPU_MEM

That doesn't make sense btw.

> > > > > +#ifdef CONFIG_TRACE_GPU_MEM
> > > > > +static inline void virtio_gpu_trace_total_mem(struct 
> > > > > virtio_gpu_device *vgdev,
> > > > > +                                             s64 delta)
> > > > > +{
> > > > > +       u64 total_mem = atomic64_add_return(delta, &vgdev->total_mem);
> > > > > +
> > > > > +       trace_gpu_mem_total(0, 0, total_mem);

Hmm, so no per process tracking (pid arg hard-coded to zero)?
Any plans for that?
The cgroups patches mentioned by Daniel should address that btw.

The gpu_id is hardcoded to zero too.  Shouldn't that be something like
the minor number of the drm device?  Or maybe something else in case you
need drm and non-drm gpu devices work side-by-side?

> > > Thanks for your reply! Android Cuttlefish virtual platform is using
> > > the virtio-gpu driver, and we currently are carrying this small patch
> > > at the downstream side. This is essential for us because:
> > > (1) Android has deprecated debugfs on production devices already

IIRC there have been discussions about a statfs, so you can export stats
with a sane interface without also enabling all the power provided by
debugfs, exactly because of the concerns to do that on production
systems.

Not sure what the state is, seems to not be upstream yet.  That would be
(beside cgroups) another thing to look at.

> > > Android relies on this tracepoint + eBPF to make the GPU memory totals
> > > available at runtime on production devices, which has been enforced
> > > already. Not only game developers can have a reliable kernel total GPU
> > > memory to look at, but also Android leverages this to take GPU memory
> > > usage out from the system lost ram.

Sounds like you define "gpu memory" as "system memory used to store gpu
data".  Is that correct?  What about device memory?

> > > I'm not sure whether the other DRM drivers would like to integrate
> > > this tracepoint(maybe upstream drivers will move away from debugfs
> > > later as well?), but at least we hope virtio-gpu can take this.

Well, it is basically the same for all drivers using the gem shmem
helpers.  So I see little reason why we should do that at virtio-gpu
level.

> Android GPU vendors have integrated this tracepoint to track gpu
> memory usage total(mapped into the gpu address space), which consists
> of below:
> (1) directly allocated via physical page allocator
> (2) imported external memory backed by dma-bufs
> (3) allocated exportable memory backed by dma-bufs

Hmm, the tracepoint doesn't track which of the three groups the memory
belongs to.  Which I think is important, specifically group (2) because
that might already be accounted for by the exporting driver ...

> Our Android kernel team is leading the other side of effort to help
> remove the dma-bufs overlap(those mapped into a gpu device) as a joint
> effort, so that we can accurately explain the memory usage of the
> entire Android system.

I suspect once you figured that you'll notice that this little hack is
rather incomplete.

> For virtio-gpu, since that's used by our reference platform
> Cuttlefish(Cloud Android), we have to integrate the same tracepoint as
> well to enforce the use of this tracepoint and the eBPF stuff built on
> top to support runtime query of gpu memory on production devices. For
> virtio-gpu at this moment, we only want to track GEM allocations since
> PRIME import is currently not supported/used in Cuttlefish. That's all
> we are doing in this small patch.

take care,
  Gerd

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Reply via email to