Re: [PATCH] drm: support up to 128 drm devices

2023-07-17 Thread Simon Ser
On Monday, July 17th, 2023 at 15:24, Emil Velikov  
wrote:

> > > For going forward, here is one way we can shave this yak:
> > >  - update libdrm to max 64 nodes
> > >  - roll libdrm release, nag distributions to update to it // could be
> > > folded with the next release below
> > >  - update libdrm to use name driven type detection
> > >  - roll libdrm release, nag distributions to update to it
> > >  - once ^^ release hits most distributions, merge the above proposed
> > > kernel patch
> > >- the commit message should explain the caveats and fixed libdrm 
> > > version
> > >- we should be prepared to revert the commit, if it causes user
> > > space regression - fix (see below) and re-introduce the kernel patch
> > > 1-2 releases later
> >
> > That sounds really scary to me. I'd really prefer to try not to break the
> > kernel uAPI here.
> >
> 
> With part in particular? Mind you I'm not particularly happy either,
> since in essence it's like a controlled explosion.

I believe there are ways to extend the uAPI to support more devices without
breaking the uAPI. Michał Winiarski's patch for instance tried something to
this effect.

> > The kernel rule is "do not break user-space".
> 
> Yes, in a perfect world. In practice, there have been multiple kernel
> changes breaking user-space. Some got reverted, some remained.
> AFAICT the above will get us out of the sticky situation we're in with
> the least amount of explosion.
> 
> If there is a concrete proposal, please go ahead and sorry if I've
> missed it. I'm supposed to be off, having fun with family when I saw
> this whole thing explode.
> 
> Small note: literally all the users I've seen will stop on a missing
> node (card or render) aka if the kernel creates card0...63 and then
> card200... then (hard wavy estimate) 80% of the apps will be broken.

That's fine, because that's not a kernel regression. Supporting more than 64
devices is a new kernel feature, and if some user-space ignores the new nodes,
that's not a kernel regression. A regression only happens when a use-case which
works with an older kernel is broken by a newer kernel.


Re: [PATCH] drm: support up to 128 drm devices

2023-07-17 Thread Emil Velikov
On Mon, 17 Jul 2023 at 10:45, Simon Ser  wrote:
>
> On Monday, July 17th, 2023 at 09:30, Emil Velikov  
> wrote:
>
> > > I'm worried what might happen with old user-space, especially old libdrm.
> >
> > I also share the same concern. Although the bigger issue is not libdrm
> > - since we can update it and prod distributions to update it.
> > The biggest concern is software that cannot be rebuild/updated -
> > closed source and various open-source that has been abandoned.
>
> Well. Now that we have Flatpak and AppImage and friends, we're really likely
> to have old libdrm copies vendored all over the place, and these will stick
> around essentially forever.
>

The flatpak devs have been very helpful. So I'm pretty sure we can get
those updated - even for older flatpaks.
For AppImage, I have no experience.

> > For going forward, here is one way we can shave this yak:
> >  - update libdrm to max 64 nodes
> >  - roll libdrm release, nag distributions to update to it // could be
> > folded with the next release below
> >  - update libdrm to use name driven type detection
> >  - roll libdrm release, nag distributions to update to it
> >  - once ^^ release hits most distributions, merge the above proposed
> > kernel patch
> >- the commit message should explain the caveats and fixed libdrm version
> >- we should be prepared to revert the commit, if it causes user
> > space regression - fix (see below) and re-introduce the kernel patch
> > 1-2 releases later
>
> That sounds really scary to me. I'd really prefer to try not to break the
> kernel uAPI here.
>

With part in particular? Mind you I'm not particularly happy either,
since in essence it's like a controlled explosion.

> The kernel rule is "do not break user-space".

Yes, in a perfect world. In practice, there have been multiple kernel
changes breaking user-space. Some got reverted, some remained.
AFAICT the above will get us out of the sticky situation we're in with
the least amount of explosion.

If there is a concrete proposal, please go ahead and sorry if I've
missed it. I'm supposed to be off, having fun with family when I saw
this whole thing explode.

Small note: literally all the users I've seen will stop on a missing
node (card or render) aka if the kernel creates card0...63 and then
card200... then (hard wavy estimate) 80% of the apps will be broken.

HTH
Emil


Re: [PATCH] drm: support up to 128 drm devices

2023-07-17 Thread Simon Ser
On Monday, July 17th, 2023 at 09:30, Emil Velikov  
wrote:

> > I'm worried what might happen with old user-space, especially old libdrm.
> 
> I also share the same concern. Although the bigger issue is not libdrm
> - since we can update it and prod distributions to update it.
> The biggest concern is software that cannot be rebuild/updated -
> closed source and various open-source that has been abandoned.

Well. Now that we have Flatpak and AppImage and friends, we're really likely
to have old libdrm copies vendored all over the place, and these will stick
around essentially forever.

> For going forward, here is one way we can shave this yak:
>  - update libdrm to max 64 nodes
>  - roll libdrm release, nag distributions to update to it // could be
> folded with the next release below
>  - update libdrm to use name driven type detection
>  - roll libdrm release, nag distributions to update to it
>  - once ^^ release hits most distributions, merge the above proposed
> kernel patch
>- the commit message should explain the caveats and fixed libdrm version
>- we should be prepared to revert the commit, if it causes user
> space regression - fix (see below) and re-introduce the kernel patch
> 1-2 releases later

That sounds really scary to me. I'd really prefer to try not to break the
kernel uAPI here.

The kernel rule is "do not break user-space".


Re: [PATCH] drm: support up to 128 drm devices

2023-07-17 Thread Emil Velikov
On Fri, 14 Jul 2023 at 11:32, Simon Ser  wrote:
>
> (cc Daniel Vetter and Pekka because this change has uAPI repercussions)
>
> On Friday, June 30th, 2023 at 13:56, James Zhu  wrote:
>
> > From: Christian König 
> >
> > This makes room for up to 128 DRM devices.
> >
> > Signed-off-by: Christian König 
> > Signed-off-by: James Zhu 
> > ---
> >  drivers/gpu/drm/drm_drv.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 73b845a75d52..0d55c6f5 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -137,7 +137,7 @@ static int drm_minor_alloc(struct drm_device *dev, 
> > unsigned int type)
> >   r = idr_alloc(_minors_idr,
> >   NULL,
> >   64 * type,
> > - 64 * (type + 1),
> > + 64 * (type + 2),
>
> The type comes from this enum:
>
> enum drm_minor_type {
> DRM_MINOR_PRIMARY,
> DRM_MINOR_CONTROL,
> DRM_MINOR_RENDER,
> DRM_MINOR_ACCEL = 32,
> };
>
> Before this patch, 0..63 are for primary, 64..127 are for control (never
> exposed by the kernel), 128..191 are for render, 2048..2112 are for accel.
> After this patch, 0..127 are for primary, 64..191 are for control (never
> exposed by the kernel), 128..255 are for render, 2048..2176 are for accel.
>
> I'm worried what might happen with old user-space, especially old libdrm.

I also share the same concern. Although the bigger issue is not libdrm
- since we can update it and prod distributions to update it.
The biggest concern is software that cannot be rebuild/updated -
closed source and various open-source that has been abandoned.

As mentioned in the gitlab ticket - the current style of embedding the
uABI/uAPI in each and every application is not great IMHO. That is why
I've introduced the `drmGetDevices2` API, to consolidate most of the
chaos in a single place.

For going forward, here is one way we can shave this yak:
 - update libdrm to max 64 nodes
 - roll libdrm release, nag distributions to update to it // could be
folded with the next release below
 - update libdrm to use name driven type detection
 - roll libdrm release, nag distributions to update to it
 - once ^^ release hits most distributions, merge the above proposed
kernel patch
   - the commit message should explain the caveats and fixed libdrm version
   - we should be prepared to revert the commit, if it causes user
space regression - fix (see below) and re-introduce the kernel patch
1-2 releases later

In parallel to all the above, as a community we should collectively
audit and update open-source applications to the `drmDevices2` API.

As with other legacy (DRI1), this one will take some time but we can get there.

HTH
Emil