Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-24 Thread Daniel Vetter
On Mon, Jun 24, 2019 at 11:37 AM Michel Dänzer  wrote:
> On 2019-06-21 5:52 p.m., Michel Dänzer wrote:
> >> That was the entire point of kms, and it succeed really well. That's
> >> why I don't think amdgpu doing their own flavour pick is going to help
> >> anyone here,
> > Other drivers are welcome to emulate amdgpu's design making the above
> > possible. :)
>
> Seriously though, I don't think any changes outside of libdrm/Mesa
> should be needed with other drivers either. Fundamentally there's
> nothing magic about it, just sharing BOs via PRIME between the render
> node file description and that passed in via the EGL/GBM/... API.

The libdrm/mesa design isn't the hard part, we have that even as a
helper for etnaviv and all those drivers. Being inconsistent here is a
think more a nuisance for integrators, who might want to sandbox gpu
clients real hard and really want to know what access rights they need
to give out.

But then we have much, much bigger fish to fry from a cross-driver
consistency pov, so *shrug*. Just feels somewhat silly we can't even
get agreement or some kind of consistent plan on something fairly
simple like this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-24 Thread Michel Dänzer
On 2019-06-21 5:52 p.m., Michel Dänzer wrote:
> On 2019-06-21 5:44 p.m., Daniel Vetter wrote:
>> On Fri, Jun 21, 2019 at 05:15:22PM +0200, Michel Dänzer wrote:
>>> On 2019-06-21 1:50 p.m., Daniel Vetter wrote:
 On Fri, Jun 21, 2019 at 1:37 PM Christian König
  wrote:
> Am 21.06.19 um 13:03 schrieb Daniel Vetter:
>> So if you want to depracate render functionality on primary nodes, you
>> _have_ to do that hiding in userspace. Or you'll break a lot of
>> compositors everywhere. Just testing -amdgpu doesn't really prove
>> anything here. So you won't move the larger ecosystem with this at
>> all, that ship sailed.
>
> So what else can we do? That sounds like you suggest we should
> completely drop render node functionality.
>
> I mean it's not my preferred option, but certainly something that
> everybody could do.
>
> My primary concern is that we somehow need to get rid of thinks like GEM
> flink and all that other crufty stuff we still have around on the
> primary node (we should probably make a list of that).
>
> Switching everything over to render nodes just sounded like the best
> alternative so far to archive that.

 tbh I do like that plan too, but it's a lot more work. And I think to
 have any push whatsoever we probably need to roll it out in gbm as a
 hack to keep things going. But probably not enough.

 I also think that at least some compositors will bother to do the
 right thing, and actually bother with render nodes and all that
 correctly. It's just that there's way more which dont.
>>>
>>> With amdgpu, we can make userspace always use render nodes for rendering
>>> with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some
>>> amdgpu-pro components) only. No GBM/compositor changes needed.
>>
>> This is a very non-exhaustive list of userspace that runs on your driver
>> ... This wayland compositor thing, actually shipping now, and there's many 
>> :-)
> 
> You don't seem to understand what I wrote. Everything will work at least
> as well as now, without any other changes.
> 
> [...]
> 
>> That was the entire point of kms, and it succeed really well. That's
>> why I don't think amdgpu doing their own flavour pick is going to help
>> anyone here,
> Other drivers are welcome to emulate amdgpu's design making the above
> possible. :)

Seriously though, I don't think any changes outside of libdrm/Mesa
should be needed with other drivers either. Fundamentally there's
nothing magic about it, just sharing BOs via PRIME between the render
node file description and that passed in via the EGL/GBM/... API.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Michel Dänzer
On 2019-06-21 5:44 p.m., Daniel Vetter wrote:
> On Fri, Jun 21, 2019 at 05:15:22PM +0200, Michel Dänzer wrote:
>> On 2019-06-21 1:50 p.m., Daniel Vetter wrote:
>>> On Fri, Jun 21, 2019 at 1:37 PM Christian König
>>>  wrote:
 Am 21.06.19 um 13:03 schrieb Daniel Vetter:
> So if you want to depracate render functionality on primary nodes, you
> _have_ to do that hiding in userspace. Or you'll break a lot of
> compositors everywhere. Just testing -amdgpu doesn't really prove
> anything here. So you won't move the larger ecosystem with this at
> all, that ship sailed.

 So what else can we do? That sounds like you suggest we should
 completely drop render node functionality.

 I mean it's not my preferred option, but certainly something that
 everybody could do.

 My primary concern is that we somehow need to get rid of thinks like GEM
 flink and all that other crufty stuff we still have around on the
 primary node (we should probably make a list of that).

 Switching everything over to render nodes just sounded like the best
 alternative so far to archive that.
>>>
>>> tbh I do like that plan too, but it's a lot more work. And I think to
>>> have any push whatsoever we probably need to roll it out in gbm as a
>>> hack to keep things going. But probably not enough.
>>>
>>> I also think that at least some compositors will bother to do the
>>> right thing, and actually bother with render nodes and all that
>>> correctly. It's just that there's way more which dont.
>>
>> With amdgpu, we can make userspace always use render nodes for rendering
>> with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some
>> amdgpu-pro components) only. No GBM/compositor changes needed.
> 
> This is a very non-exhaustive list of userspace that runs on your driver
> ... This wayland compositor thing, actually shipping now, and there's many :-)

You don't seem to understand what I wrote. Everything will work at least
as well as now, without any other changes.


> Once we picked a color it's a simple technical matter of how to roll
> it out, using Kconfig options, or only enabling on new hw, or "merge
> and fix the regressions as they come" or any of the other well proven
> paths forward.

 Yeah, the problem is I don't see an option which currently works for
 everyone.

 I absolutely need a grace time of multiple years until we can apply this
 to amdgpu/radeon.
>>>
>>> Yeah that's what I meant with "pick a color, pick a way to roll it
>>> out". "enable for new hw, roll out years and years later" is one of
>>> the options for roll out.
>>
>> One gotcha being that generic userspace such as the Xorg modesetting
>> driver may still try to use phased out functionality such as DRI2 with
>> new hardware.
> 
> There's a lot more generic userspace than -modesetting.

That is affected by phasing out DRI2 related functionality? (I thought
that was the context of this sub-sub-thread)


> That was the entire point of kms, and it succeed really well. That's
> why I don't think amdgpu doing their own flavour pick is going to help
> anyone here,
Other drivers are welcome to emulate amdgpu's design making the above
possible. :)


> except maybe if all you care about is the amd stand-alone> stack only.
> But then why do you bother with this upstream stuff at all> ...

I'm afraid you've lost me here.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 05:15:22PM +0200, Michel Dänzer wrote:
> On 2019-06-21 1:50 p.m., Daniel Vetter wrote:
> > On Fri, Jun 21, 2019 at 1:37 PM Christian König
> >  wrote:
> >> Am 21.06.19 um 13:03 schrieb Daniel Vetter:
> >>> So if you want to depracate render functionality on primary nodes, you
> >>> _have_ to do that hiding in userspace. Or you'll break a lot of
> >>> compositors everywhere. Just testing -amdgpu doesn't really prove
> >>> anything here. So you won't move the larger ecosystem with this at
> >>> all, that ship sailed.
> >>
> >> So what else can we do? That sounds like you suggest we should
> >> completely drop render node functionality.
> >>
> >> I mean it's not my preferred option, but certainly something that
> >> everybody could do.
> >>
> >> My primary concern is that we somehow need to get rid of thinks like GEM
> >> flink and all that other crufty stuff we still have around on the
> >> primary node (we should probably make a list of that).
> >>
> >> Switching everything over to render nodes just sounded like the best
> >> alternative so far to archive that.
> > 
> > tbh I do like that plan too, but it's a lot more work. And I think to
> > have any push whatsoever we probably need to roll it out in gbm as a
> > hack to keep things going. But probably not enough.
> > 
> > I also think that at least some compositors will bother to do the
> > right thing, and actually bother with render nodes and all that
> > correctly. It's just that there's way more which dont.
> 
> With amdgpu, we can make userspace always use render nodes for rendering
> with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some
> amdgpu-pro components) only. No GBM/compositor changes needed.

This is a very non-exhaustive list of userspace that runs on your driver
... This wayland compositor thing, actually shipping now, and there's many :-)

> >>> At that point this all feels like a bikeshed, and for a bikeshed I
> >>> don't care what the color is we pick, as long as they're all painted
> >>> the same.
> 
> Then some sheds shouldn't have been re-painted without DRM_AUTH already...

Christian proposed that and then didn't feel like reverting it, plus vc4,
and tegra never bothered with it. There's quite a bit more than the tail
out of this particular bag already.

> >>> Once we picked a color it's a simple technical matter of how to roll
> >>> it out, using Kconfig options, or only enabling on new hw, or "merge
> >>> and fix the regressions as they come" or any of the other well proven
> >>> paths forward.
> >>
> >> Yeah, the problem is I don't see an option which currently works for
> >> everyone.
> >>
> >> I absolutely need a grace time of multiple years until we can apply this
> >> to amdgpu/radeon.
> > 
> > Yeah that's what I meant with "pick a color, pick a way to roll it
> > out". "enable for new hw, roll out years and years later" is one of
> > the options for roll out.
> 
> One gotcha being that generic userspace such as the Xorg modesetting
> driver may still try to use phased out functionality such as DRI2 with
> new hardware.

There's a lot more generic userspace than -modesetting. That was the
entire point of kms, and it succeed really well. That's why I don't think
amdgpu doing their own flavour pick is going to help anyone here, except
maybe if all you care about is the amd stand-alone stack only. But then
why do you bother with this upstream stuff at all ...

> >> How about this:
> >> 1. We keep DRM_AUTH around for amdgpu/radeon for now.
> >> 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N.
> >> This also works as a workaround for old RADV.
> >> 3. We start to work on further removing old cruft from the primary node.
> >> Possible the Kconfig option I suggested to disable GEM flink.
> > 
> > Hm I'm not worried about flink really. It's gem_open which is the
> > security gap, and that one has to stay master-only forever.
> 
> GEM_OPEN is used by DRI2 clients, it can't be master-only. It's probably
> one of the main reasons for the existence of DRM_AUTH.

Oh sweet I forgot we're allowing this both ways :-/ Well doesn't change that
much really, once we break one of these the other isn't useful anymore
either.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 01:00:17PM +, Koenig, Christian wrote:
> Am 21.06.19 um 14:47 schrieb Emil Velikov:
> > On 2019/06/21, Koenig, Christian wrote:
> >> Am 21.06.19 um 13:58 schrieb Emil Velikov:
> >>> On 2019/06/21, Koenig, Christian wrote:
>  Am 21.06.19 um 12:53 schrieb Emil Velikov:
> > On 2019/06/21, Koenig, Christian wrote:
> >> [SNIP]
> >> Well partially. That RADV broke is unfortunate, but we have done so 
> >> many
> >> customized userspace stuff both based on Mesa/DDX as well as closed
> >> source components that it is just highly likely that we would break
> >> something else as well.
> >>
> > As an engineer I like to work with tangibles. The highly likely part is
> > probable, but as mentioned multiple times the series allows for a _dead_
> > trivial way to address any such problems.
>  Well to clarify my concern is that this won't be so trivial.
> 
>  You implemented on workaround for one specific case and it is perfectly
>  possible that for other cases we would have to completely revert the
>  removal of DRM_AUTH.
> 
> >>> I would encourage you to take a closer look at my patch and point out
> >>> how parcicular cases cannot be handled.
> >> Well the last time I looked it only blocked the first call to the IOCTL.
> >>
> > Hmm interesting, you're replied to my patch without even reading it :'-(
> 
> Well I've NAKed the series because of the exposure of the render node 
> functionality
> 
> > Can you please give it a close look and reply inline?
> >
> > [1] https://lists.freedesktop.org/archives/dri-devel/2019-May/219238.html
> 
> So the workaround no longer just blocks the first IOCTL.
> 
> But then the question is why don't you just keep the DRM_AUTH flag 
> instead of adding the same functionality with a new one?
> 
> >>>   From your experiense, do user-space developers care about doing (or
> >>> generally do) the right thing?
> >> No, not at all. Except for Marek and Michel they just take what works
> >> and even Marek is often short-cutting on this.
> >>
> > Interesting, guess I should have asked this question from the start.
> 
> Well sounds like you don't have to work with a closed source driver 
> team. But as I said I honestly would do the same as user space developer.

From an upstream kernel pov I've never cared about the blobs. I don't
upstream should terribly start caring about blobs - if they ship hacks
that don't reflect what the open stack does, their problem.
 
Speaking as someone who's pissed off closed source driver teams to no end.
I'm happy to be of service here too :-)

> I mean it's really a bunch of more code to maintain, and getting rid of 
> code is always less work in the long term then keeping it.
> 
> That kernel developers scream: No no, please don't do that we want to 
> keep using it is completely irrelevant for this.

Jokes aside, I think we should look at what's the right thing to do
looking at open source only, and if that gets the blobby folks shafted,
well so be it. Really not my problem, and I'm pretty sure Dave doesn't
care one hair of an inch more.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Michel Dänzer
On 2019-06-21 1:58 p.m., Emil Velikov wrote:
> On 2019/06/21, Koenig, Christian wrote:
>> Am 21.06.19 um 12:53 schrieb Emil Velikov:
>>> On 2019/06/21, Koenig, Christian wrote:
> 
> In particular, that user-space will "remove" render nodes.
 Yes, that is my main concern here. You basically make render nodes
 superfluously. That gives userspace all legitimization to also remove
 support for them. That is not stupid or evil or whatever, userspace
 would just follow the kernel design.

>>> Do you have an example of userspace doing such an extremely silly thing?
>>> It does seem like suspect you're a couple of steps beyond overcautious,
>>> perhaps rightfully so. Maybe you've seen some closed-source user-space
>>> going crazy? Or any other projects?
>>
>> The key point is that I don't think this is silly or strange or crazy at 
>> all. See the kernel defines the interface userspace can and should use.
>>
>> When the kernel defines that everything will work with the primary node 
>> it is perfectly valid for userspace to drop support for the render node.
>>
>> I mean why should they keep this? Just because we tell them not to do this?
>>
> From your experiense, do user-space developers care about doing (or
> generally do) the right thing?
> 
> In either case, as pointed already the cat is out of the bag - has been
> for years, and if developers did behave as you describe them they would
> have "removed" render nodes already.

That may be the case with userspace specific to DRM_AUTH-less kernel
drivers, but such userspace couldn't work with all the other drivers. So
I'd argue that while the bag may be open and the cat's tail may even be
sticking out already, the cat is still firmly in the bag. :)


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Michel Dänzer
On 2019-06-21 1:50 p.m., Daniel Vetter wrote:
> On Fri, Jun 21, 2019 at 1:37 PM Christian König
>  wrote:
>> Am 21.06.19 um 13:03 schrieb Daniel Vetter:
>>> So if you want to depracate render functionality on primary nodes, you
>>> _have_ to do that hiding in userspace. Or you'll break a lot of
>>> compositors everywhere. Just testing -amdgpu doesn't really prove
>>> anything here. So you won't move the larger ecosystem with this at
>>> all, that ship sailed.
>>
>> So what else can we do? That sounds like you suggest we should
>> completely drop render node functionality.
>>
>> I mean it's not my preferred option, but certainly something that
>> everybody could do.
>>
>> My primary concern is that we somehow need to get rid of thinks like GEM
>> flink and all that other crufty stuff we still have around on the
>> primary node (we should probably make a list of that).
>>
>> Switching everything over to render nodes just sounded like the best
>> alternative so far to archive that.
> 
> tbh I do like that plan too, but it's a lot more work. And I think to
> have any push whatsoever we probably need to roll it out in gbm as a
> hack to keep things going. But probably not enough.
> 
> I also think that at least some compositors will bother to do the
> right thing, and actually bother with render nodes and all that
> correctly. It's just that there's way more which dont.

With amdgpu, we can make userspace always use render nodes for rendering
with changes to libdrm_amdgpu/radeonsi/xf86-video-amdgpu (and maybe some
amdgpu-pro components) only. No GBM/compositor changes needed.


>>> At that point this all feels like a bikeshed, and for a bikeshed I
>>> don't care what the color is we pick, as long as they're all painted
>>> the same.

Then some sheds shouldn't have been re-painted without DRM_AUTH already...


>>> Once we picked a color it's a simple technical matter of how to roll
>>> it out, using Kconfig options, or only enabling on new hw, or "merge
>>> and fix the regressions as they come" or any of the other well proven
>>> paths forward.
>>
>> Yeah, the problem is I don't see an option which currently works for
>> everyone.
>>
>> I absolutely need a grace time of multiple years until we can apply this
>> to amdgpu/radeon.
> 
> Yeah that's what I meant with "pick a color, pick a way to roll it
> out". "enable for new hw, roll out years and years later" is one of
> the options for roll out.

One gotcha being that generic userspace such as the Xorg modesetting
driver may still try to use phased out functionality such as DRI2 with
new hardware.


>> How about this:
>> 1. We keep DRM_AUTH around for amdgpu/radeon for now.
>> 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N.
>> This also works as a workaround for old RADV.
>> 3. We start to work on further removing old cruft from the primary node.
>> Possible the Kconfig option I suggested to disable GEM flink.
> 
> Hm I'm not worried about flink really. It's gem_open which is the
> security gap, and that one has to stay master-only forever.

GEM_OPEN is used by DRI2 clients, it can't be master-only. It's probably
one of the main reasons for the existence of DRM_AUTH.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 21.06.19 um 14:47 schrieb Emil Velikov:
> On 2019/06/21, Koenig, Christian wrote:
>> Am 21.06.19 um 13:58 schrieb Emil Velikov:
>>> On 2019/06/21, Koenig, Christian wrote:
 Am 21.06.19 um 12:53 schrieb Emil Velikov:
> On 2019/06/21, Koenig, Christian wrote:
>> [SNIP]
>> Well partially. That RADV broke is unfortunate, but we have done so many
>> customized userspace stuff both based on Mesa/DDX as well as closed
>> source components that it is just highly likely that we would break
>> something else as well.
>>
> As an engineer I like to work with tangibles. The highly likely part is
> probable, but as mentioned multiple times the series allows for a _dead_
> trivial way to address any such problems.
 Well to clarify my concern is that this won't be so trivial.

 You implemented on workaround for one specific case and it is perfectly
 possible that for other cases we would have to completely revert the
 removal of DRM_AUTH.

>>> I would encourage you to take a closer look at my patch and point out
>>> how parcicular cases cannot be handled.
>> Well the last time I looked it only blocked the first call to the IOCTL.
>>
> Hmm interesting, you're replied to my patch without even reading it :'-(

Well I've NAKed the series because of the exposure of the render node 
functionality

> Can you please give it a close look and reply inline?
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2019-May/219238.html

So the workaround no longer just blocks the first IOCTL.

But then the question is why don't you just keep the DRM_AUTH flag 
instead of adding the same functionality with a new one?

>>>   From your experiense, do user-space developers care about doing (or
>>> generally do) the right thing?
>> No, not at all. Except for Marek and Michel they just take what works
>> and even Marek is often short-cutting on this.
>>
> Interesting, guess I should have asked this question from the start.

Well sounds like you don't have to work with a closed source driver 
team. But as I said I honestly would do the same as user space developer.

I mean it's really a bunch of more code to maintain, and getting rid of 
code is always less work in the long term then keeping it.

That kernel developers scream: No no, please don't do that we want to 
keep using it is completely irrelevant for this.

Christian.

>
> Thanks
> Emil

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Emil Velikov
On 2019/06/21, Koenig, Christian wrote:
> Am 21.06.19 um 13:58 schrieb Emil Velikov:
> > On 2019/06/21, Koenig, Christian wrote:
> >> Am 21.06.19 um 12:53 schrieb Emil Velikov:
> >>> On 2019/06/21, Koenig, Christian wrote:
>  [SNIP]
>  Well partially. That RADV broke is unfortunate, but we have done so many
>  customized userspace stuff both based on Mesa/DDX as well as closed
>  source components that it is just highly likely that we would break
>  something else as well.
> 
> >>> As an engineer I like to work with tangibles. The highly likely part is
> >>> probable, but as mentioned multiple times the series allows for a _dead_
> >>> trivial way to address any such problems.
> >> Well to clarify my concern is that this won't be so trivial.
> >>
> >> You implemented on workaround for one specific case and it is perfectly
> >> possible that for other cases we would have to completely revert the
> >> removal of DRM_AUTH.
> >>
> > I would encourage you to take a closer look at my patch and point out
> > how parcicular cases cannot be handled.
> 
> Well the last time I looked it only blocked the first call to the IOCTL.
> 
Hmm interesting, you're replied to my patch without even reading it :'-(
Can you please give it a close look and reply inline?

[1] https://lists.freedesktop.org/archives/dri-devel/2019-May/219238.html

> >  From your experiense, do user-space developers care about doing (or
> > generally do) the right thing?
> 
> No, not at all. Except for Marek and Michel they just take what works 
> and even Marek is often short-cutting on this.
> 
Interesting, guess I should have asked this question from the start.

Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 21.06.19 um 13:58 schrieb Emil Velikov:
> On 2019/06/21, Koenig, Christian wrote:
>> Am 21.06.19 um 12:53 schrieb Emil Velikov:
>>> On 2019/06/21, Koenig, Christian wrote:
 [SNIP]
 Well partially. That RADV broke is unfortunate, but we have done so many
 customized userspace stuff both based on Mesa/DDX as well as closed
 source components that it is just highly likely that we would break
 something else as well.

>>> As an engineer I like to work with tangibles. The highly likely part is
>>> probable, but as mentioned multiple times the series allows for a _dead_
>>> trivial way to address any such problems.
>> Well to clarify my concern is that this won't be so trivial.
>>
>> You implemented on workaround for one specific case and it is perfectly
>> possible that for other cases we would have to completely revert the
>> removal of DRM_AUTH.
>>
> I would encourage you to take a closer look at my patch and point out
> how parcicular cases cannot be handled.

Well the last time I looked it only blocked the first call to the IOCTL.

If that is no longer the case then what is the actual difference to 
DRM_AUTH+DRM_ALLOW_RENDER?

> In particular, that user-space will "remove" render nodes.
 Yes, that is my main concern here. You basically make render nodes
 superfluously. That gives userspace all legitimization to also remove
 support for them. That is not stupid or evil or whatever, userspace
 would just follow the kernel design.

>>> Do you have an example of userspace doing such an extremely silly thing?
>>> It does seem like suspect you're a couple of steps beyond overcautious,
>>> perhaps rightfully so. Maybe you've seen some closed-source user-space
>>> going crazy? Or any other projects?
>> The key point is that I don't think this is silly or strange or crazy at
>> all. See the kernel defines the interface userspace can and should use.
>>
>> When the kernel defines that everything will work with the primary node
>> it is perfectly valid for userspace to drop support for the render node.
>>
>> I mean why should they keep this? Just because we tell them not to do this?
>>
>  From your experiense, do user-space developers care about doing (or
> generally do) the right thing?

No, not at all. Except for Marek and Michel they just take what works 
and even Marek is often short-cutting on this.

> In either case, as pointed already the cat is out of the bag - has been
> for years, and if developers did behave as you describe them they would
> have "removed" render nodes already.

Currently render nodes are mandatory because they are needed for 
headless operation.

E.g. we have a whole bunch of customers which do transcoding with that 
on GPUs which don't even have a CRTC or even X running.

Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Emil Velikov
On 2019/06/21, Daniel Vetter wrote:
> On Fri, Jun 21, 2019 at 1:50 PM Daniel Vetter  wrote:
> >
> > On Fri, Jun 21, 2019 at 1:37 PM Christian König
> >  wrote:
> > >
> > > Am 21.06.19 um 13:03 schrieb Daniel Vetter:
> > > > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian
> > > >  wrote:
> > > >> Am 21.06.19 um 12:20 schrieb Emil Velikov:
> > > >>> In particular, that user-space will "remove" render nodes.
> > > >> Yes, that is my main concern here. You basically make render nodes
> > > >> superfluously. That gives userspace all legitimization to also remove
> > > >> support for them. That is not stupid or evil or whatever, userspace
> > > >> would just follow the kernel design.
> > > > This already happened. At least for kms-only display socs we had to
> > > > hide the separate render node (and there you really have to deal with
> > > > the separate render node, because it's a distinct driver) entirely
> > > > within gbm/mesa.
> > >
> > > Ok, that is something I didn't knew before and is rather interesting.
> > >
> > > > So if you want to depracate render functionality on primary nodes, you
> > > > _have_ to do that hiding in userspace. Or you'll break a lot of
> > > > compositors everywhere. Just testing -amdgpu doesn't really prove
> > > > anything here. So you won't move the larger ecosystem with this at
> > > > all, that ship sailed.
> > >
> > > So what else can we do? That sounds like you suggest we should
> > > completely drop render node functionality.
> > >
> > > I mean it's not my preferred option, but certainly something that
> > > everybody could do.
> > >
> > > My primary concern is that we somehow need to get rid of thinks like GEM
> > > flink and all that other crufty stuff we still have around on the
> > > primary node (we should probably make a list of that).
> > >
> > > Switching everything over to render nodes just sounded like the best
> > > alternative so far to archive that.
> >
> > tbh I do like that plan too, but it's a lot more work. And I think to
> > have any push whatsoever we probably need to roll it out in gbm as a
> > hack to keep things going. But probably not enough.
> >
> > I also think that at least some compositors will bother to do the
> > right thing, and actually bother with render nodes and all that
> > correctly. It's just that there's way more which dont.
> >
> > Also for server rendering it'll be render nodes all the way down I
> > hope (or we need to seriously educate cloud people about the
> > permissions they set on their default images, and distros on how this
> > cloud stuff should work.
> >
> > > > At that point this all feels like a bikeshed, and for a bikeshed I
> > > > don't care what the color is we pick, as long as they're all painted
> > > > the same.
> > > >
> > > > Once we picked a color it's a simple technical matter of how to roll
> > > > it out, using Kconfig options, or only enabling on new hw, or "merge
> > > > and fix the regressions as they come" or any of the other well proven
> > > > paths forward.
> > >
> > > Yeah, the problem is I don't see an option which currently works for
> > > everyone.
> > >
> > > I absolutely need a grace time of multiple years until we can apply this
> > > to amdgpu/radeon.
> >
> > Yeah that's what I meant with "pick a color, pick a way to roll it
> > out". "enable for new hw, roll out years and years later" is one of
> > the options for roll out.
> >
> > > And that under the prerequisite to have a plan to somehow enable that
> > > functionality now to make it at least painful for userspace to rely on
> > > hack around that.
> > >
> > > > So if you want to do this, please start with the mesa side work (as
> > > > the biggest userspace, not all of it) with patches to redirect all
> > > > primary node rendering to render nodes. The code is there already for
> > > > socs, just needs to be rolled out more. Or we go with the DRM_AUTH
> > > > horrors. Or a 3rd option, I really don't care which it is, as long as
> > > > its consistent.
> > >
> > > How about this:
> > > 1. We keep DRM_AUTH around for amdgpu/radeon for now.
> > > 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N.
> > > This also works as a workaround for old RADV.
> > > 3. We start to work on further removing old cruft from the primary node.
> > > Possible the Kconfig option I suggested to disable GEM flink.
> >
> > Hm I'm not worried about flink really. It's gem_open which is the
> > security gap, and that one has to stay master-only forever. I guess we
> > could try to disable that so people have to deal with dma-buf (and
> > once you have that render nodes become a lot easier to use). But then
> > I still think we have drivers which don't do dma-buf self-import, so
> > again we're stuck. So maybe first step is to just roll out a default
> > self-import dma-buf support for every gem driver. Then start ditching
> > flink/gem_open. Then start ditching render support on primary nodes.
> > Every step in the way taking a few years of 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Emil Velikov
On 2019/06/21, Koenig, Christian wrote:
> Am 21.06.19 um 12:53 schrieb Emil Velikov:
> > On 2019/06/21, Koenig, Christian wrote:
> >> [SNIP]
> >> Well partially. That RADV broke is unfortunate, but we have done so many
> >> customized userspace stuff both based on Mesa/DDX as well as closed
> >> source components that it is just highly likely that we would break
> >> something else as well.
> >>
> > As an engineer I like to work with tangibles. The highly likely part is
> > probable, but as mentioned multiple times the series allows for a _dead_
> > trivial way to address any such problems.
> 
> Well to clarify my concern is that this won't be so trivial.
> 
> You implemented on workaround for one specific case and it is perfectly 
> possible that for other cases we would have to completely revert the 
> removal of DRM_AUTH.
> 
I would encourage you to take a closer look at my patch and point out
how parcicular cases cannot be handled.

> >>> In particular, that user-space will "remove" render nodes.
> >> Yes, that is my main concern here. You basically make render nodes
> >> superfluously. That gives userspace all legitimization to also remove
> >> support for them. That is not stupid or evil or whatever, userspace
> >> would just follow the kernel design.
> >>
> > Do you have an example of userspace doing such an extremely silly thing?
> > It does seem like suspect you're a couple of steps beyond overcautious,
> > perhaps rightfully so. Maybe you've seen some closed-source user-space
> > going crazy? Or any other projects?
> 
> The key point is that I don't think this is silly or strange or crazy at 
> all. See the kernel defines the interface userspace can and should use.
> 
> When the kernel defines that everything will work with the primary node 
> it is perfectly valid for userspace to drop support for the render node.
> 
> I mean why should they keep this? Just because we tell them not to do this?
> 
From your experiense, do user-space developers care about doing (or
generally do) the right thing?

In either case, as pointed already the cat is out of the bag - has been
for years, and if developers did behave as you describe them they would
have "removed" render nodes already.

> > In other words, being cautious is great, but without references of
> > misuse it's very hard for others to draw the full picture.
> >
> >>> I'm really sad that amdgpu insists on standing out, hope one day it will
> >>> converge. Yet since all other maintainers are ok with the series, I'll
> >>> start merging patches in a few hours. I'm really sad that amdgpu wants
> >>> to stand out, hope it will converge sooner rather than later.
> >>>
> >>> Christian, how would you like amdgpu handled - with a separate flag in
> >>> the driver or shall we special case amdgpu within core drm?
> >> No, I ask you to please stick around DRM_AUTH for at least amdgpu and
> >> radeon. And NOT enable any render node functionality for them on the
> >> primary node.
> >>
> > My question is how do you want this handled:
> >   - DRM_PLEASE_FORCE_AUTH - added to AMDGPU/RADEON, or
> >   - driver_name == amdgpu, in core DRM.
> 
> I want to keep DRM_AUTH in amdgpu and radeon for at least the next 5-10 
> years.
> 
Believe we're all fully aware of that fact. I'm asking which _approach_
you prefer. That said, I'll send a new patch/series and we'll nitpick it
there.

Thanks
-Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 1:50 PM Daniel Vetter  wrote:
>
> On Fri, Jun 21, 2019 at 1:37 PM Christian König
>  wrote:
> >
> > Am 21.06.19 um 13:03 schrieb Daniel Vetter:
> > > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian
> > >  wrote:
> > >> Am 21.06.19 um 12:20 schrieb Emil Velikov:
> > >>> In particular, that user-space will "remove" render nodes.
> > >> Yes, that is my main concern here. You basically make render nodes
> > >> superfluously. That gives userspace all legitimization to also remove
> > >> support for them. That is not stupid or evil or whatever, userspace
> > >> would just follow the kernel design.
> > > This already happened. At least for kms-only display socs we had to
> > > hide the separate render node (and there you really have to deal with
> > > the separate render node, because it's a distinct driver) entirely
> > > within gbm/mesa.
> >
> > Ok, that is something I didn't knew before and is rather interesting.
> >
> > > So if you want to depracate render functionality on primary nodes, you
> > > _have_ to do that hiding in userspace. Or you'll break a lot of
> > > compositors everywhere. Just testing -amdgpu doesn't really prove
> > > anything here. So you won't move the larger ecosystem with this at
> > > all, that ship sailed.
> >
> > So what else can we do? That sounds like you suggest we should
> > completely drop render node functionality.
> >
> > I mean it's not my preferred option, but certainly something that
> > everybody could do.
> >
> > My primary concern is that we somehow need to get rid of thinks like GEM
> > flink and all that other crufty stuff we still have around on the
> > primary node (we should probably make a list of that).
> >
> > Switching everything over to render nodes just sounded like the best
> > alternative so far to archive that.
>
> tbh I do like that plan too, but it's a lot more work. And I think to
> have any push whatsoever we probably need to roll it out in gbm as a
> hack to keep things going. But probably not enough.
>
> I also think that at least some compositors will bother to do the
> right thing, and actually bother with render nodes and all that
> correctly. It's just that there's way more which dont.
>
> Also for server rendering it'll be render nodes all the way down I
> hope (or we need to seriously educate cloud people about the
> permissions they set on their default images, and distros on how this
> cloud stuff should work.
>
> > > At that point this all feels like a bikeshed, and for a bikeshed I
> > > don't care what the color is we pick, as long as they're all painted
> > > the same.
> > >
> > > Once we picked a color it's a simple technical matter of how to roll
> > > it out, using Kconfig options, or only enabling on new hw, or "merge
> > > and fix the regressions as they come" or any of the other well proven
> > > paths forward.
> >
> > Yeah, the problem is I don't see an option which currently works for
> > everyone.
> >
> > I absolutely need a grace time of multiple years until we can apply this
> > to amdgpu/radeon.
>
> Yeah that's what I meant with "pick a color, pick a way to roll it
> out". "enable for new hw, roll out years and years later" is one of
> the options for roll out.
>
> > And that under the prerequisite to have a plan to somehow enable that
> > functionality now to make it at least painful for userspace to rely on
> > hack around that.
> >
> > > So if you want to do this, please start with the mesa side work (as
> > > the biggest userspace, not all of it) with patches to redirect all
> > > primary node rendering to render nodes. The code is there already for
> > > socs, just needs to be rolled out more. Or we go with the DRM_AUTH
> > > horrors. Or a 3rd option, I really don't care which it is, as long as
> > > its consistent.
> >
> > How about this:
> > 1. We keep DRM_AUTH around for amdgpu/radeon for now.
> > 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N.
> > This also works as a workaround for old RADV.
> > 3. We start to work on further removing old cruft from the primary node.
> > Possible the Kconfig option I suggested to disable GEM flink.
>
> Hm I'm not worried about flink really. It's gem_open which is the
> security gap, and that one has to stay master-only forever. I guess we
> could try to disable that so people have to deal with dma-buf (and
> once you have that render nodes become a lot easier to use). But then
> I still think we have drivers which don't do dma-buf self-import, so
> again we're stuck. So maybe first step is to just roll out a default
> self-import dma-buf support for every gem driver. Then start ditching
> flink/gem_open. Then start ditching render support on primary nodes.
> Every step in the way taking a few years of prodding userspace plus
> even more years to wait until it's all gone.

I forgot one step here: I think we even still have drivers without
render node support. As long as those exists (and are still relevant)
then userspace needs primary 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 1:37 PM Christian König
 wrote:
>
> Am 21.06.19 um 13:03 schrieb Daniel Vetter:
> > On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian
> >  wrote:
> >> Am 21.06.19 um 12:20 schrieb Emil Velikov:
> >>> In particular, that user-space will "remove" render nodes.
> >> Yes, that is my main concern here. You basically make render nodes
> >> superfluously. That gives userspace all legitimization to also remove
> >> support for them. That is not stupid or evil or whatever, userspace
> >> would just follow the kernel design.
> > This already happened. At least for kms-only display socs we had to
> > hide the separate render node (and there you really have to deal with
> > the separate render node, because it's a distinct driver) entirely
> > within gbm/mesa.
>
> Ok, that is something I didn't knew before and is rather interesting.
>
> > So if you want to depracate render functionality on primary nodes, you
> > _have_ to do that hiding in userspace. Or you'll break a lot of
> > compositors everywhere. Just testing -amdgpu doesn't really prove
> > anything here. So you won't move the larger ecosystem with this at
> > all, that ship sailed.
>
> So what else can we do? That sounds like you suggest we should
> completely drop render node functionality.
>
> I mean it's not my preferred option, but certainly something that
> everybody could do.
>
> My primary concern is that we somehow need to get rid of thinks like GEM
> flink and all that other crufty stuff we still have around on the
> primary node (we should probably make a list of that).
>
> Switching everything over to render nodes just sounded like the best
> alternative so far to archive that.

tbh I do like that plan too, but it's a lot more work. And I think to
have any push whatsoever we probably need to roll it out in gbm as a
hack to keep things going. But probably not enough.

I also think that at least some compositors will bother to do the
right thing, and actually bother with render nodes and all that
correctly. It's just that there's way more which dont.

Also for server rendering it'll be render nodes all the way down I
hope (or we need to seriously educate cloud people about the
permissions they set on their default images, and distros on how this
cloud stuff should work.

> > At that point this all feels like a bikeshed, and for a bikeshed I
> > don't care what the color is we pick, as long as they're all painted
> > the same.
> >
> > Once we picked a color it's a simple technical matter of how to roll
> > it out, using Kconfig options, or only enabling on new hw, or "merge
> > and fix the regressions as they come" or any of the other well proven
> > paths forward.
>
> Yeah, the problem is I don't see an option which currently works for
> everyone.
>
> I absolutely need a grace time of multiple years until we can apply this
> to amdgpu/radeon.

Yeah that's what I meant with "pick a color, pick a way to roll it
out". "enable for new hw, roll out years and years later" is one of
the options for roll out.

> And that under the prerequisite to have a plan to somehow enable that
> functionality now to make it at least painful for userspace to rely on
> hack around that.
>
> > So if you want to do this, please start with the mesa side work (as
> > the biggest userspace, not all of it) with patches to redirect all
> > primary node rendering to render nodes. The code is there already for
> > socs, just needs to be rolled out more. Or we go with the DRM_AUTH
> > horrors. Or a 3rd option, I really don't care which it is, as long as
> > its consistent.
>
> How about this:
> 1. We keep DRM_AUTH around for amdgpu/radeon for now.
> 2. We add a Kconfig option to ignore DRM_AUTH, currently default to N.
> This also works as a workaround for old RADV.
> 3. We start to work on further removing old cruft from the primary node.
> Possible the Kconfig option I suggested to disable GEM flink.

Hm I'm not worried about flink really. It's gem_open which is the
security gap, and that one has to stay master-only forever. I guess we
could try to disable that so people have to deal with dma-buf (and
once you have that render nodes become a lot easier to use). But then
I still think we have drivers which don't do dma-buf self-import, so
again we're stuck. So maybe first step is to just roll out a default
self-import dma-buf support for every gem driver. Then start ditching
flink/gem_open. Then start ditching render support on primary nodes.
Every step in the way taking a few years of prodding userspace plus
even more years to wait until it's all gone.

Another option would be to extract the kms stuff from primary nodes,
but we've tried that with control nodes. Until I realized a few years
back that with control nodes it's impossible to get any rendered
buffer in there at all, so useless, and I removed it. No one ever
complained.

So yeah would be really nice if we could fix this, but the universe
conspires against us too much it seems. Hence the fallback 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Christian König

Am 21.06.19 um 13:03 schrieb Daniel Vetter:

On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian
 wrote:

Am 21.06.19 um 12:20 schrieb Emil Velikov:

In particular, that user-space will "remove" render nodes.

Yes, that is my main concern here. You basically make render nodes
superfluously. That gives userspace all legitimization to also remove
support for them. That is not stupid or evil or whatever, userspace
would just follow the kernel design.

This already happened. At least for kms-only display socs we had to
hide the separate render node (and there you really have to deal with
the separate render node, because it's a distinct driver) entirely
within gbm/mesa.


Ok, that is something I didn't knew before and is rather interesting.


So if you want to depracate render functionality on primary nodes, you
_have_ to do that hiding in userspace. Or you'll break a lot of
compositors everywhere. Just testing -amdgpu doesn't really prove
anything here. So you won't move the larger ecosystem with this at
all, that ship sailed.


So what else can we do? That sounds like you suggest we should 
completely drop render node functionality.


I mean it's not my preferred option, but certainly something that 
everybody could do.


My primary concern is that we somehow need to get rid of thinks like GEM 
flink and all that other crufty stuff we still have around on the 
primary node (we should probably make a list of that).


Switching everything over to render nodes just sounded like the best 
alternative so far to archive that.



At that point this all feels like a bikeshed, and for a bikeshed I
don't care what the color is we pick, as long as they're all painted
the same.

Once we picked a color it's a simple technical matter of how to roll
it out, using Kconfig options, or only enabling on new hw, or "merge
and fix the regressions as they come" or any of the other well proven
paths forward.


Yeah, the problem is I don't see an option which currently works for 
everyone.


I absolutely need a grace time of multiple years until we can apply this 
to amdgpu/radeon.


And that under the prerequisite to have a plan to somehow enable that 
functionality now to make it at least painful for userspace to rely on 
hack around that.



So if you want to do this, please start with the mesa side work (as
the biggest userspace, not all of it) with patches to redirect all
primary node rendering to render nodes. The code is there already for
socs, just needs to be rolled out more. Or we go with the DRM_AUTH
horrors. Or a 3rd option, I really don't care which it is, as long as
its consistent.


How about this:
1. We keep DRM_AUTH around for amdgpu/radeon for now.
2. We add a Kconfig option to ignore DRM_AUTH, currently default to N. 
This also works as a workaround for old RADV.
3. We start to work on further removing old cruft from the primary node. 
Possible the Kconfig option I suggested to disable GEM flink.


Regards,
Christian.



tldr; consistent color choice on this bikeshed please.

Thanks, Daniel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 21.06.19 um 12:53 schrieb Emil Velikov:
> On 2019/06/21, Koenig, Christian wrote:
>> [SNIP]
>> Well partially. That RADV broke is unfortunate, but we have done so many
>> customized userspace stuff both based on Mesa/DDX as well as closed
>> source components that it is just highly likely that we would break
>> something else as well.
>>
> As an engineer I like to work with tangibles. The highly likely part is
> probable, but as mentioned multiple times the series allows for a _dead_
> trivial way to address any such problems.

Well to clarify my concern is that this won't be so trivial.

You implemented on workaround for one specific case and it is perfectly 
possible that for other cases we would have to completely revert the 
removal of DRM_AUTH.

>>> In particular, that user-space will "remove" render nodes.
>> Yes, that is my main concern here. You basically make render nodes
>> superfluously. That gives userspace all legitimization to also remove
>> support for them. That is not stupid or evil or whatever, userspace
>> would just follow the kernel design.
>>
> Do you have an example of userspace doing such an extremely silly thing?
> It does seem like suspect you're a couple of steps beyond overcautious,
> perhaps rightfully so. Maybe you've seen some closed-source user-space
> going crazy? Or any other projects?

The key point is that I don't think this is silly or strange or crazy at 
all. See the kernel defines the interface userspace can and should use.

When the kernel defines that everything will work with the primary node 
it is perfectly valid for userspace to drop support for the render node.

I mean why should they keep this? Just because we tell them not to do this?

> In other words, being cautious is great, but without references of
> misuse it's very hard for others to draw the full picture.
>
>>> I'm really sad that amdgpu insists on standing out, hope one day it will
>>> converge. Yet since all other maintainers are ok with the series, I'll
>>> start merging patches in a few hours. I'm really sad that amdgpu wants
>>> to stand out, hope it will converge sooner rather than later.
>>>
>>> Christian, how would you like amdgpu handled - with a separate flag in
>>> the driver or shall we special case amdgpu within core drm?
>> No, I ask you to please stick around DRM_AUTH for at least amdgpu and
>> radeon. And NOT enable any render node functionality for them on the
>> primary node.
>>
> My question is how do you want this handled:
>   - DRM_PLEASE_FORCE_AUTH - added to AMDGPU/RADEON, or
>   - driver_name == amdgpu, in core DRM.

I want to keep DRM_AUTH in amdgpu and radeon for at least the next 5-10 
years.

At least until we have established that nobody is using the primary node 
for command submission any more. Plus some grace time to make sure that 
this has been spread enough.

Regards,
Christian.

>
>
> Thanks
> Emil

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 12:31 PM Koenig, Christian
 wrote:
> Am 21.06.19 um 12:20 schrieb Emil Velikov:
> > In particular, that user-space will "remove" render nodes.
>
> Yes, that is my main concern here. You basically make render nodes
> superfluously. That gives userspace all legitimization to also remove
> support for them. That is not stupid or evil or whatever, userspace
> would just follow the kernel design.

This already happened. At least for kms-only display socs we had to
hide the separate render node (and there you really have to deal with
the separate render node, because it's a distinct driver) entirely
within gbm/mesa.

So if you want to depracate render functionality on primary nodes, you
_have_ to do that hiding in userspace. Or you'll break a lot of
compositors everywhere. Just testing -amdgpu doesn't really prove
anything here. So you won't move the larger ecosystem with this at
all, that ship sailed.

At that point this all feels like a bikeshed, and for a bikeshed I
don't care what the color is we pick, as long as they're all painted
the same.

Once we picked a color it's a simple technical matter of how to roll
it out, using Kconfig options, or only enabling on new hw, or "merge
and fix the regressions as they come" or any of the other well proven
paths forward.

So if you want to do this, please start with the mesa side work (as
the biggest userspace, not all of it) with patches to redirect all
primary node rendering to render nodes. The code is there already for
socs, just needs to be rolled out more. Or we go with the DRM_AUTH
horrors. Or a 3rd option, I really don't care which it is, as long as
its consistent.

tldr; consistent color choice on this bikeshed please.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Emil Velikov
On 2019/06/21, Koenig, Christian wrote:
> No this should apply to all drivers, not just amdgpu.
> 
> > While I suggested:
> >   - keeping amdgpu consistent with the rest
> >   - exposing the KConfig option for the whole of the kernel, and
> >   - merging it alongside this work
> >
> > So I effectively waited for weeks, instead of simply going ahead and
> > writing/submitting patches. That's a bit unfortunate...
> >
> >>> Because we simply made sure that userspace is using the render node for
> >>> command submission and not the primary node.
> >>>
> >>> So nobody can go ahead and remove render node support any more :)
> >> How does this work? I thought the entire problem of DRM_AUTH removal
> >> for you was that it broke stuff, and you didn't want to deal with
> >> that. We still have that exact problem afaics ... old userspace
> >> doesn't simply vanish, even if you entirely nuke it going forward.
> >>
> >> Also, testing on the amdgpu stack isn't really testing a hole lot
> >> here, it's all the various silly compositors out there I'd expect to
> >> fall over. Will gbm/radeonsi/whatever just internally do all the
> >> necessary dma-buf import/export between the primary node and display
> >> node to keep this all working?
> > If I understood Christian, feel free to correct me, the fact that my
> > earlier patch broke RADV was not the argument. The problem was was the
> > patch does.
> 
> Well partially. That RADV broke is unfortunate, but we have done so many 
> customized userspace stuff both based on Mesa/DDX as well as closed 
> source components that it is just highly likely that we would break 
> something else as well.
> 
As an engineer I like to work with tangibles. The highly likely part is
probable, but as mentioned multiple times the series allows for a _dead_
trivial way to address any such problems.

> > In particular, that user-space will "remove" render nodes.
> 
> Yes, that is my main concern here. You basically make render nodes 
> superfluously. That gives userspace all legitimization to also remove 
> support for them. That is not stupid or evil or whatever, userspace 
> would just follow the kernel design.
> 
Do you have an example of userspace doing such an extremely silly thing?
It does seem like suspect you're a couple of steps beyond overcautious,
perhaps rightfully so. Maybe you've seen some closed-source user-space
going crazy? Or any other projects?

In other words, being cautious is great, but without references of
misuse it's very hard for others to draw the full picture.

> > I'm really sad that amdgpu insists on standing out, hope one day it will
> > converge. Yet since all other maintainers are ok with the series, I'll
> > start merging patches in a few hours. I'm really sad that amdgpu wants
> > to stand out, hope it will converge sooner rather than later.
> >
> > Christian, how would you like amdgpu handled - with a separate flag in
> > the driver or shall we special case amdgpu within core drm?
> 
> No, I ask you to please stick around DRM_AUTH for at least amdgpu and 
> radeon. And NOT enable any render node functionality for them on the 
> primary node.
> 
My question is how do you want this handled:
 - DRM_PLEASE_FORCE_AUTH - added to AMDGPU/RADEON, or
 - driver_name == amdgpu, in core DRM.


Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 21.06.19 um 12:20 schrieb Emil Velikov:
> On 2019/06/21, Daniel Vetter wrote:
>> On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian
>>  wrote:
>>> Am 21.06.19 um 11:09 schrieb Daniel Vetter:
 On Fri, Jun 21, 2019 at 07:12:14AM +, Koenig, Christian wrote:
> Am 20.06.19 um 18:30 schrieb Emil Velikov:
>> On 2019/06/14, Koenig, Christian wrote:
>>> Am 14.06.19 um 17:53 schrieb Emil Velikov:
 On 2019/06/14, Koenig, Christian wrote:
> Am 14.06.19 um 14:09 schrieb Emil Velikov:
>> On 2019/05/27, Emil Velikov wrote:
>> [SNIP]
>> Hi Christian,
>>
>>
>> In the following, I would like to summarise and emphasize the need 
>> for
>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
>> extra reading it.
>>
>>
>> Today DRM drivers* do not make any distinction between primary and
>> render node clients.
> That is actually not 100% correct. We have a special case where a DRM
> master is allowed to change the priority of render node clients.
>
 Can you provide a link? I cannot find that code.
>>> See amdgpu_sched_ioctl().
>>>
>> Thus for a render capable driver, any premise of
>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
> Yeah, that's what I agree on. I just don't think that removing 
> DRM_AUTH
> now is the right direction to take.
>
 Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
 ioctls.

 That aside, can you propose an alternative solution that addresses this
 and the second point just below?
>>> Give me a few days to work on this, it's already Friday 6pm here.
>>>
>> Hi Christian,
>>
>> Any progress? As mentioned earlier, I'm OK with writing the patches 
>> although
>> I would love to hear your plan.
> First of all I tried to disable DRM authentication completely with a
> kernel config option. Surprisingly that actually works out of the box at
> least on the AMDGPU stack.
>
> This effectively allows us to get rid of DRI2 and the related security
> problems. Only thing left for that is that I'm just not sure how to
> signal this to userspace so that the DDX wouldn't advertise DRI2 at all
> any more.
>
>
> As a next step I looked into if we can disable the command submission
> for DRM master. Turned out that this is relatively easy as well.
>
> All we have to do is to fix the bug Michel pointed out about KMS handles
> for display and let the DDX use a render node instead of the DRM master
> for Glamor. Still need to sync up with Michel and/or Marek whats the
> best way of doing this.
>
>
> The last step (and that's what I haven't tried yet) would be to disable
> DRM authentication for Navi even when it is still compiled into the
> kernel. But that is more or less just a typing exercise.
 So just to get this straight, we're now full on embracing a subsystem
 split here:
 - amdgpu plans to ditch dri2/rendering on primary nodes
 - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH
 removal
 - others are just hanging in there somehow

 "best of both^W worlds" ftw?
>>> Yes, exactly!
>>>
>>> Think a step further, when this is upstream we can apply the DRM_AUTH
>>> removal to amdgpu as well.
>>>
> So this is effectively what I suggested/planned with a couple of caveats:
>   - making amdgpu behave differently from the rest of drm ;-(
>   - having the KConfig amdgpu only and merged before this DRM_AUTH

No this should apply to all drivers, not just amdgpu.

> While I suggested:
>   - keeping amdgpu consistent with the rest
>   - exposing the KConfig option for the whole of the kernel, and
>   - merging it alongside this work
>
> So I effectively waited for weeks, instead of simply going ahead and
> writing/submitting patches. That's a bit unfortunate...
>
>>> Because we simply made sure that userspace is using the render node for
>>> command submission and not the primary node.
>>>
>>> So nobody can go ahead and remove render node support any more :)
>> How does this work? I thought the entire problem of DRM_AUTH removal
>> for you was that it broke stuff, and you didn't want to deal with
>> that. We still have that exact problem afaics ... old userspace
>> doesn't simply vanish, even if you entirely nuke it going forward.
>>
>> Also, testing on the amdgpu stack isn't really testing a hole lot
>> here, it's all the various silly compositors out there I'd expect to
>> fall over. Will gbm/radeonsi/whatever just internally do all the
>> necessary dma-buf import/export between the primary node and display
>> node to keep this all working?
> If I understood Christian, feel free to correct me, the fact that my
> earlier patch 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Emil Velikov
On 2019/06/21, Daniel Vetter wrote:
> On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian
>  wrote:
> >
> > Am 21.06.19 um 11:09 schrieb Daniel Vetter:
> > > On Fri, Jun 21, 2019 at 07:12:14AM +, Koenig, Christian wrote:
> > >> Am 20.06.19 um 18:30 schrieb Emil Velikov:
> > >>> On 2019/06/14, Koenig, Christian wrote:
> >  Am 14.06.19 um 17:53 schrieb Emil Velikov:
> > > On 2019/06/14, Koenig, Christian wrote:
> > >> Am 14.06.19 um 14:09 schrieb Emil Velikov:
> > >>> On 2019/05/27, Emil Velikov wrote:
> > >>> [SNIP]
> > >>> Hi Christian,
> > >>>
> > >>>
> > >>> In the following, I would like to summarise and emphasize the need 
> > >>> for
> > >>> DRM_AUTH removal. I would kindly ask you to spend a couple of 
> > >>> minutes
> > >>> extra reading it.
> > >>>
> > >>>
> > >>> Today DRM drivers* do not make any distinction between primary and
> > >>> render node clients.
> > >> That is actually not 100% correct. We have a special case where a DRM
> > >> master is allowed to change the priority of render node clients.
> > >>
> > > Can you provide a link? I cannot find that code.
> >  See amdgpu_sched_ioctl().
> > 
> > >>> Thus for a render capable driver, any premise of
> > >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
> > >> Yeah, that's what I agree on. I just don't think that removing 
> > >> DRM_AUTH
> > >> now is the right direction to take.
> > >>
> > > Could have been clearer - I'm talking about DRM_AUTH | 
> > > DRM_RENDER_ALLOW
> > > ioctls.
> > >
> > > That aside, can you propose an alternative solution that addresses 
> > > this
> > > and the second point just below?
> >  Give me a few days to work on this, it's already Friday 6pm here.
> > 
> > >>> Hi Christian,
> > >>>
> > >>> Any progress? As mentioned earlier, I'm OK with writing the patches 
> > >>> although
> > >>> I would love to hear your plan.
> > >> First of all I tried to disable DRM authentication completely with a
> > >> kernel config option. Surprisingly that actually works out of the box at
> > >> least on the AMDGPU stack.
> > >>
> > >> This effectively allows us to get rid of DRI2 and the related security
> > >> problems. Only thing left for that is that I'm just not sure how to
> > >> signal this to userspace so that the DDX wouldn't advertise DRI2 at all
> > >> any more.
> > >>
> > >>
> > >> As a next step I looked into if we can disable the command submission
> > >> for DRM master. Turned out that this is relatively easy as well.
> > >>
> > >> All we have to do is to fix the bug Michel pointed out about KMS handles
> > >> for display and let the DDX use a render node instead of the DRM master
> > >> for Glamor. Still need to sync up with Michel and/or Marek whats the
> > >> best way of doing this.
> > >>
> > >>
> > >> The last step (and that's what I haven't tried yet) would be to disable
> > >> DRM authentication for Navi even when it is still compiled into the
> > >> kernel. But that is more or less just a typing exercise.
> > > So just to get this straight, we're now full on embracing a subsystem
> > > split here:
> > > - amdgpu plans to ditch dri2/rendering on primary nodes
> > > - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH
> > >removal
> > > - others are just hanging in there somehow
> > >
> > > "best of both^W worlds" ftw?
> >
> > Yes, exactly!
> >
> > Think a step further, when this is upstream we can apply the DRM_AUTH
> > removal to amdgpu as well.
> >
So this is effectively what I suggested/planned with a couple of caveats:
 - making amdgpu behave differently from the rest of drm ;-(
 - having the KConfig amdgpu only and merged before this DRM_AUTH

While I suggested:
 - keeping amdgpu consistent with the rest
 - exposing the KConfig option for the whole of the kernel, and
 - merging it alongside this work

So I effectively waited for weeks, instead of simply going ahead and
writing/submitting patches. That's a bit unfortunate...

> > Because we simply made sure that userspace is using the render node for
> > command submission and not the primary node.
> >
> > So nobody can go ahead and remove render node support any more :)
> 
> How does this work? I thought the entire problem of DRM_AUTH removal
> for you was that it broke stuff, and you didn't want to deal with
> that. We still have that exact problem afaics ... old userspace
> doesn't simply vanish, even if you entirely nuke it going forward.
> 
> Also, testing on the amdgpu stack isn't really testing a hole lot
> here, it's all the various silly compositors out there I'd expect to
> fall over. Will gbm/radeonsi/whatever just internally do all the
> necessary dma-buf import/export between the primary node and display
> node to keep this all working?

If I understood Christian, feel free to correct me, the fact that my
earlier patch broke RADV was not the 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Christian König

Am 21.06.19 um 11:35 schrieb Daniel Vetter:

On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian
 wrote:

Am 21.06.19 um 11:09 schrieb Daniel Vetter:

On Fri, Jun 21, 2019 at 07:12:14AM +, Koenig, Christian wrote:

Am 20.06.19 um 18:30 schrieb Emil Velikov:

On 2019/06/14, Koenig, Christian wrote:

Am 14.06.19 um 17:53 schrieb Emil Velikov:

On 2019/06/14, Koenig, Christian wrote:

Am 14.06.19 um 14:09 schrieb Emil Velikov:

On 2019/05/27, Emil Velikov wrote:
[SNIP]
Hi Christian,


In the following, I would like to summarise and emphasize the need for
DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
extra reading it.


Today DRM drivers* do not make any distinction between primary and
render node clients.

That is actually not 100% correct. We have a special case where a DRM
master is allowed to change the priority of render node clients.


Can you provide a link? I cannot find that code.

See amdgpu_sched_ioctl().


Thus for a render capable driver, any premise of
separation, security or otherwise imposed via DRM_AUTH is a fallacy.

Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
now is the right direction to take.


Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
ioctls.

That aside, can you propose an alternative solution that addresses this
and the second point just below?

Give me a few days to work on this, it's already Friday 6pm here.


Hi Christian,

Any progress? As mentioned earlier, I'm OK with writing the patches although
I would love to hear your plan.

First of all I tried to disable DRM authentication completely with a
kernel config option. Surprisingly that actually works out of the box at
least on the AMDGPU stack.

This effectively allows us to get rid of DRI2 and the related security
problems. Only thing left for that is that I'm just not sure how to
signal this to userspace so that the DDX wouldn't advertise DRI2 at all
any more.


As a next step I looked into if we can disable the command submission
for DRM master. Turned out that this is relatively easy as well.

All we have to do is to fix the bug Michel pointed out about KMS handles
for display and let the DDX use a render node instead of the DRM master
for Glamor. Still need to sync up with Michel and/or Marek whats the
best way of doing this.


The last step (and that's what I haven't tried yet) would be to disable
DRM authentication for Navi even when it is still compiled into the
kernel. But that is more or less just a typing exercise.

So just to get this straight, we're now full on embracing a subsystem
split here:
- amdgpu plans to ditch dri2/rendering on primary nodes
- bunch of drivers (I think more than i915 by now) merged the DRM_AUTH
removal
- others are just hanging in there somehow

"best of both^W worlds" ftw?

Yes, exactly!

Think a step further, when this is upstream we can apply the DRM_AUTH
removal to amdgpu as well.

Because we simply made sure that userspace is using the render node for
command submission and not the primary node.

So nobody can go ahead and remove render node support any more :)

How does this work? I thought the entire problem of DRM_AUTH removal
for you was that it broke stuff, and you didn't want to deal with
that.


Yeah, that is indeed still true.

It's just that we have done way to many projects with radeon/amdgpu and 
customized userspace stuff.



We still have that exact problem afaics ... old userspace
doesn't simply vanish, even if you entirely nuke it going forward.


Well old userspace doesn't work with new hardware either.

So the idea is to keep old userspace for old hardware working, but only 
disable old stuff for new hardware.



Also, testing on the amdgpu stack isn't really testing a hole lot
here, it's all the various silly compositors out there I'd expect to
fall over. Will gbm/radeonsi/whatever just internally do all the
necessary dma-buf import/export between the primary node and display
node to keep this all working?


Yes, at least that's how I understand Michel's idea.

We keep both file descriptors for primary and render node around at the 
same time anyway. So the change is actually not that much.


This also solves the problem that people are running things as root, 
since we now always use the render node and never the primary node for 
everything except KMS.


Christian.


-Daniel


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 11:25 AM Koenig, Christian
 wrote:
>
> Am 21.06.19 um 11:09 schrieb Daniel Vetter:
> > On Fri, Jun 21, 2019 at 07:12:14AM +, Koenig, Christian wrote:
> >> Am 20.06.19 um 18:30 schrieb Emil Velikov:
> >>> On 2019/06/14, Koenig, Christian wrote:
>  Am 14.06.19 um 17:53 schrieb Emil Velikov:
> > On 2019/06/14, Koenig, Christian wrote:
> >> Am 14.06.19 um 14:09 schrieb Emil Velikov:
> >>> On 2019/05/27, Emil Velikov wrote:
> >>> [SNIP]
> >>> Hi Christian,
> >>>
> >>>
> >>> In the following, I would like to summarise and emphasize the need for
> >>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> >>> extra reading it.
> >>>
> >>>
> >>> Today DRM drivers* do not make any distinction between primary and
> >>> render node clients.
> >> That is actually not 100% correct. We have a special case where a DRM
> >> master is allowed to change the priority of render node clients.
> >>
> > Can you provide a link? I cannot find that code.
>  See amdgpu_sched_ioctl().
> 
> >>> Thus for a render capable driver, any premise of
> >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
> >> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
> >> now is the right direction to take.
> >>
> > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
> > ioctls.
> >
> > That aside, can you propose an alternative solution that addresses this
> > and the second point just below?
>  Give me a few days to work on this, it's already Friday 6pm here.
> 
> >>> Hi Christian,
> >>>
> >>> Any progress? As mentioned earlier, I'm OK with writing the patches 
> >>> although
> >>> I would love to hear your plan.
> >> First of all I tried to disable DRM authentication completely with a
> >> kernel config option. Surprisingly that actually works out of the box at
> >> least on the AMDGPU stack.
> >>
> >> This effectively allows us to get rid of DRI2 and the related security
> >> problems. Only thing left for that is that I'm just not sure how to
> >> signal this to userspace so that the DDX wouldn't advertise DRI2 at all
> >> any more.
> >>
> >>
> >> As a next step I looked into if we can disable the command submission
> >> for DRM master. Turned out that this is relatively easy as well.
> >>
> >> All we have to do is to fix the bug Michel pointed out about KMS handles
> >> for display and let the DDX use a render node instead of the DRM master
> >> for Glamor. Still need to sync up with Michel and/or Marek whats the
> >> best way of doing this.
> >>
> >>
> >> The last step (and that's what I haven't tried yet) would be to disable
> >> DRM authentication for Navi even when it is still compiled into the
> >> kernel. But that is more or less just a typing exercise.
> > So just to get this straight, we're now full on embracing a subsystem
> > split here:
> > - amdgpu plans to ditch dri2/rendering on primary nodes
> > - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH
> >removal
> > - others are just hanging in there somehow
> >
> > "best of both^W worlds" ftw?
>
> Yes, exactly!
>
> Think a step further, when this is upstream we can apply the DRM_AUTH
> removal to amdgpu as well.
>
> Because we simply made sure that userspace is using the render node for
> command submission and not the primary node.
>
> So nobody can go ahead and remove render node support any more :)

How does this work? I thought the entire problem of DRM_AUTH removal
for you was that it broke stuff, and you didn't want to deal with
that. We still have that exact problem afaics ... old userspace
doesn't simply vanish, even if you entirely nuke it going forward.

Also, testing on the amdgpu stack isn't really testing a hole lot
here, it's all the various silly compositors out there I'd expect to
fall over. Will gbm/radeonsi/whatever just internally do all the
necessary dma-buf import/export between the primary node and display
node to keep this all working?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 21.06.19 um 11:09 schrieb Daniel Vetter:
> On Fri, Jun 21, 2019 at 07:12:14AM +, Koenig, Christian wrote:
>> Am 20.06.19 um 18:30 schrieb Emil Velikov:
>>> On 2019/06/14, Koenig, Christian wrote:
 Am 14.06.19 um 17:53 schrieb Emil Velikov:
> On 2019/06/14, Koenig, Christian wrote:
>> Am 14.06.19 um 14:09 schrieb Emil Velikov:
>>> On 2019/05/27, Emil Velikov wrote:
>>> [SNIP]
>>> Hi Christian,
>>>
>>>
>>> In the following, I would like to summarise and emphasize the need for
>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
>>> extra reading it.
>>>
>>>
>>> Today DRM drivers* do not make any distinction between primary and
>>> render node clients.
>> That is actually not 100% correct. We have a special case where a DRM
>> master is allowed to change the priority of render node clients.
>>
> Can you provide a link? I cannot find that code.
 See amdgpu_sched_ioctl().

>>> Thus for a render capable driver, any premise of
>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
>> now is the right direction to take.
>>
> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
> ioctls.
>
> That aside, can you propose an alternative solution that addresses this
> and the second point just below?
 Give me a few days to work on this, it's already Friday 6pm here.

>>> Hi Christian,
>>>
>>> Any progress? As mentioned earlier, I'm OK with writing the patches although
>>> I would love to hear your plan.
>> First of all I tried to disable DRM authentication completely with a
>> kernel config option. Surprisingly that actually works out of the box at
>> least on the AMDGPU stack.
>>
>> This effectively allows us to get rid of DRI2 and the related security
>> problems. Only thing left for that is that I'm just not sure how to
>> signal this to userspace so that the DDX wouldn't advertise DRI2 at all
>> any more.
>>
>>
>> As a next step I looked into if we can disable the command submission
>> for DRM master. Turned out that this is relatively easy as well.
>>
>> All we have to do is to fix the bug Michel pointed out about KMS handles
>> for display and let the DDX use a render node instead of the DRM master
>> for Glamor. Still need to sync up with Michel and/or Marek whats the
>> best way of doing this.
>>
>>
>> The last step (and that's what I haven't tried yet) would be to disable
>> DRM authentication for Navi even when it is still compiled into the
>> kernel. But that is more or less just a typing exercise.
> So just to get this straight, we're now full on embracing a subsystem
> split here:
> - amdgpu plans to ditch dri2/rendering on primary nodes
> - bunch of drivers (I think more than i915 by now) merged the DRM_AUTH
>removal
> - others are just hanging in there somehow
>
> "best of both^W worlds" ftw?

Yes, exactly!

Think a step further, when this is upstream we can apply the DRM_AUTH 
removal to amdgpu as well.

Because we simply made sure that userspace is using the render node for 
command submission and not the primary node.

So nobody can go ahead and remove render node support any more :)

Regards,
Christian.

> -Daniel

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Daniel Vetter
On Fri, Jun 21, 2019 at 07:12:14AM +, Koenig, Christian wrote:
> Am 20.06.19 um 18:30 schrieb Emil Velikov:
> > On 2019/06/14, Koenig, Christian wrote:
> >> Am 14.06.19 um 17:53 schrieb Emil Velikov:
> >>> On 2019/06/14, Koenig, Christian wrote:
>  Am 14.06.19 um 14:09 schrieb Emil Velikov:
> > On 2019/05/27, Emil Velikov wrote:
> > [SNIP]
> > Hi Christian,
> >
> >
> > In the following, I would like to summarise and emphasize the need for
> > DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> > extra reading it.
> >
> >
> > Today DRM drivers* do not make any distinction between primary and
> > render node clients.
>  That is actually not 100% correct. We have a special case where a DRM
>  master is allowed to change the priority of render node clients.
> 
> >>> Can you provide a link? I cannot find that code.
> >> See amdgpu_sched_ioctl().
> >>
> > Thus for a render capable driver, any premise of
> > separation, security or otherwise imposed via DRM_AUTH is a fallacy.
>  Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
>  now is the right direction to take.
> 
> >>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
> >>> ioctls.
> >>>
> >>> That aside, can you propose an alternative solution that addresses this
> >>> and the second point just below?
> >> Give me a few days to work on this, it's already Friday 6pm here.
> >>
> > Hi Christian,
> >
> > Any progress? As mentioned earlier, I'm OK with writing the patches although
> > I would love to hear your plan.
> 
> First of all I tried to disable DRM authentication completely with a 
> kernel config option. Surprisingly that actually works out of the box at 
> least on the AMDGPU stack.
> 
> This effectively allows us to get rid of DRI2 and the related security 
> problems. Only thing left for that is that I'm just not sure how to 
> signal this to userspace so that the DDX wouldn't advertise DRI2 at all 
> any more.
> 
> 
> As a next step I looked into if we can disable the command submission 
> for DRM master. Turned out that this is relatively easy as well.
> 
> All we have to do is to fix the bug Michel pointed out about KMS handles 
> for display and let the DDX use a render node instead of the DRM master 
> for Glamor. Still need to sync up with Michel and/or Marek whats the 
> best way of doing this.
> 
> 
> The last step (and that's what I haven't tried yet) would be to disable 
> DRM authentication for Navi even when it is still compiled into the 
> kernel. But that is more or less just a typing exercise.

So just to get this straight, we're now full on embracing a subsystem
split here:
- amdgpu plans to ditch dri2/rendering on primary nodes
- bunch of drivers (I think more than i915 by now) merged the DRM_AUTH
  removal
- others are just hanging in there somehow

"best of both^W worlds" ftw?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 21.06.19 um 09:41 schrieb Michel Dänzer:
> On 2019-06-21 9:12 a.m., Koenig, Christian wrote:
>> First of all I tried to disable DRM authentication completely with a
>> kernel config option. Surprisingly that actually works out of the box at
>> least on the AMDGPU stack.
>>
>> This effectively allows us to get rid of DRI2 and the related security
>> problems. Only thing left for that is that I'm just not sure how to
>> signal this to userspace so that the DDX wouldn't advertise DRI2 at all
>> any more.
> FWIW, getting rid of DRI2 also needs to be discussed with amdgpu-pro
> OpenGL driver folks.

Good point, but I don't expect much problems from that direction.

IIRC they where quite happy to not have to support DRI2 except for a 
very very old X server version.

>> As a next step I looked into if we can disable the command submission
>> for DRM master. Turned out that this is relatively easy as well.
>>
>> All we have to do is to fix the bug Michel pointed out about KMS handles
>> for display
> I'm working on that, consider it fixed.
>
>
>> and let the DDX use a render node instead of the DRM master for Glamor.
>> Still need to sync up with Michel and/or Marek whats the best way of
>> doing this.
> My suggestion was to add a new variant of amdgpu_device_initialize. When
> the new variant is called, libdrm_amdgpu internally uses a render node
> for command submission etc. whenever possible.

Yeah, sounds like a plan to me.

Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Michel Dänzer
On 2019-06-21 9:12 a.m., Koenig, Christian wrote:
> 
> First of all I tried to disable DRM authentication completely with a 
> kernel config option. Surprisingly that actually works out of the box at 
> least on the AMDGPU stack.
> 
> This effectively allows us to get rid of DRI2 and the related security 
> problems. Only thing left for that is that I'm just not sure how to 
> signal this to userspace so that the DDX wouldn't advertise DRI2 at all 
> any more.

FWIW, getting rid of DRI2 also needs to be discussed with amdgpu-pro
OpenGL driver folks.


> As a next step I looked into if we can disable the command submission 
> for DRM master. Turned out that this is relatively easy as well.
> 
> All we have to do is to fix the bug Michel pointed out about KMS handles 
> for display

I'm working on that, consider it fixed.


> and let the DDX use a render node instead of the DRM master for Glamor.
> Still need to sync up with Michel and/or Marek whats the best way of
> doing this.

My suggestion was to add a new variant of amdgpu_device_initialize. When
the new variant is called, libdrm_amdgpu internally uses a render node
for command submission etc. whenever possible.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-21 Thread Koenig, Christian
Am 20.06.19 um 18:30 schrieb Emil Velikov:
> On 2019/06/14, Koenig, Christian wrote:
>> Am 14.06.19 um 17:53 schrieb Emil Velikov:
>>> On 2019/06/14, Koenig, Christian wrote:
 Am 14.06.19 um 14:09 schrieb Emil Velikov:
> On 2019/05/27, Emil Velikov wrote:
> [SNIP]
> Hi Christian,
>
>
> In the following, I would like to summarise and emphasize the need for
> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> extra reading it.
>
>
> Today DRM drivers* do not make any distinction between primary and
> render node clients.
 That is actually not 100% correct. We have a special case where a DRM
 master is allowed to change the priority of render node clients.

>>> Can you provide a link? I cannot find that code.
>> See amdgpu_sched_ioctl().
>>
> Thus for a render capable driver, any premise of
> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
 Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
 now is the right direction to take.

>>> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
>>> ioctls.
>>>
>>> That aside, can you propose an alternative solution that addresses this
>>> and the second point just below?
>> Give me a few days to work on this, it's already Friday 6pm here.
>>
> Hi Christian,
>
> Any progress? As mentioned earlier, I'm OK with writing the patches although
> I would love to hear your plan.

First of all I tried to disable DRM authentication completely with a 
kernel config option. Surprisingly that actually works out of the box at 
least on the AMDGPU stack.

This effectively allows us to get rid of DRI2 and the related security 
problems. Only thing left for that is that I'm just not sure how to 
signal this to userspace so that the DDX wouldn't advertise DRI2 at all 
any more.


As a next step I looked into if we can disable the command submission 
for DRM master. Turned out that this is relatively easy as well.

All we have to do is to fix the bug Michel pointed out about KMS handles 
for display and let the DDX use a render node instead of the DRM master 
for Glamor. Still need to sync up with Michel and/or Marek whats the 
best way of doing this.


The last step (and that's what I haven't tried yet) would be to disable 
DRM authentication for Navi even when it is still compiled into the 
kernel. But that is more or less just a typing exercise.

Regards,
Christian.

>
> Thanks
> Emil

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-20 Thread Emil Velikov
On 2019/06/14, Koenig, Christian wrote:
> Am 14.06.19 um 17:53 schrieb Emil Velikov:
> > On 2019/06/14, Koenig, Christian wrote:
> >> Am 14.06.19 um 14:09 schrieb Emil Velikov:
> >>> On 2019/05/27, Emil Velikov wrote:
> >>> [SNIP]
> >>> Hi Christian,
> >>>
> >>>
> >>> In the following, I would like to summarise and emphasize the need for
> >>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> >>> extra reading it.
> >>>
> >>>
> >>> Today DRM drivers* do not make any distinction between primary and
> >>> render node clients.
> >> That is actually not 100% correct. We have a special case where a DRM
> >> master is allowed to change the priority of render node clients.
> >>
> > Can you provide a link? I cannot find that code.
> 
> See amdgpu_sched_ioctl().
> 
> >>> Thus for a render capable driver, any premise of
> >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
> >> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
> >> now is the right direction to take.
> >>
> > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
> > ioctls.
> >
> > That aside, can you propose an alternative solution that addresses this
> > and the second point just below?
> 
> Give me a few days to work on this, it's already Friday 6pm here.
> 
Hi Christian,

Any progress? As mentioned earlier, I'm OK with writing the patches although
I would love to hear your plan.

Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Emil Velikov
On 2019/06/14, Koenig, Christian wrote:
> Am 14.06.19 um 17:53 schrieb Emil Velikov:
> > On 2019/06/14, Koenig, Christian wrote:
> >> Am 14.06.19 um 14:09 schrieb Emil Velikov:
> >>> On 2019/05/27, Emil Velikov wrote:
> >>> [SNIP]
> >>> Hi Christian,
> >>>
> >>>
> >>> In the following, I would like to summarise and emphasize the need for
> >>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> >>> extra reading it.
> >>>
> >>>
> >>> Today DRM drivers* do not make any distinction between primary and
> >>> render node clients.
> >> That is actually not 100% correct. We have a special case where a DRM
> >> master is allowed to change the priority of render node clients.
> >>
> > Can you provide a link? I cannot find that code.
> 
> See amdgpu_sched_ioctl().
> 
> >>> Thus for a render capable driver, any premise of
> >>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
> >> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
> >> now is the right direction to take.
> >>
> > Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
> > ioctls.
> >
> > That aside, can you propose an alternative solution that addresses this
> > and the second point just below?
> 
> Give me a few days to work on this, it's already Friday 6pm here.
> 
Great thanks. Fwiw I was asking for a ideas/proposal, was not expecting you
to write the patches ;-)

Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Koenig, Christian
Am 14.06.19 um 17:53 schrieb Emil Velikov:
> On 2019/06/14, Koenig, Christian wrote:
>> Am 14.06.19 um 14:09 schrieb Emil Velikov:
>>> On 2019/05/27, Emil Velikov wrote:
>>> [SNIP]
>>> Hi Christian,
>>>
>>>
>>> In the following, I would like to summarise and emphasize the need for
>>> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
>>> extra reading it.
>>>
>>>
>>> Today DRM drivers* do not make any distinction between primary and
>>> render node clients.
>> That is actually not 100% correct. We have a special case where a DRM
>> master is allowed to change the priority of render node clients.
>>
> Can you provide a link? I cannot find that code.

See amdgpu_sched_ioctl().

>>> Thus for a render capable driver, any premise of
>>> separation, security or otherwise imposed via DRM_AUTH is a fallacy.
>> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH
>> now is the right direction to take.
>>
> Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
> ioctls.
>
> That aside, can you propose an alternative solution that addresses this
> and the second point just below?

Give me a few days to work on this, it's already Friday 6pm here.

Christian.

>
>>> Considering the examples of flaky or outright missing drmAuth in prime
>>> open-source projects (mesa, kmscube, libva) we can reasonably assume
>>> other projects exbibit the same problem.
>>>
>>> For these and/or other reasons, two DRM drivers have dropped DRM_AUTH
>>> since day one.
>>>
>>> Since we are interested in providing consistent and predictable
>>> behaviour to user-space, dropping DRM_AUTH seems to be the most
>>> effective way forward.
>> Well and what do you do with drivers which doesn't implement render nodes?
>>
> AFAICT there is a single non-DC driver which does not expose - QXL.
> I would expect it runs on a rather customised stack.
>
> Would be great to fix that, but ETIME and it's the only exception to the
> rule.
>
>
>>> Of course, I'd be more than happy to hear for any other way to achieve
>>> the goal outlined.
>>>
>>> Based on the series, other maintainers agree with the idea/goal here.
>>> Amdgpu not following the same pattern would compromise predictability
>>> across drivers and complicate userspace, so I would kindly ask you to
>>> reconsider.
>>>
>>> Alternatively, I see two ways to special case:
>>>- New flag annotating amdgpu/radeon - akin to the one proposed earlier
>>>- Check for amdgpu/radeon in core DRM
>> I perfectly agree that I don't want any special handling for amdgpu/radeon.
>>
>> My key point is that with DRM_AUTH we created an authorization and
>> authentication mess in DRM which is functionality which doesn't belong
>> into the DRM subsystem in the first place.
>>
> Precisely and I've outlined below how to resolve this in the long run.
>
>>> Now on your pain point - not allowing render iocts via the primary node,
>>> and how this patch could make it harder to achieve that goal.
>>>
>>> While I love the idea, there are number of obstacles that prevent us
>>> from doing so at this time:
>>>- Ensuring both old and new userspace does not regress
>> Yeah, agree totally. That's why I said we should probably start doing
>> this for only for upcoming hardware generations.
>>
> That will side-step the regression issue, but will enforce driver
> specific behaviour outlined before.
>
>>>- Drivers (QXL, others?) do not expose a render node
>> Well isn't that is a rather big problem for the removal of DRM_AUTH in
>> general?
>>
>> E.g. at least QXL would then expose functionality on the primary node
>> without authentication.
>>
> With this series QXL remains functionally unchanged. I would love to fix
> that as well yet ETIME as mentioned above.
>
>>>- We want to avoid driver specific behaviour
>>>
>>> The only way forward that I can see is:
>>>- Address QXL/others to expose render nodes
>>>- Add a Kconfig toggle to disable !KMS ioctls via the primary node
>>>- Jump-start ^^ with distribution X
>>>- Fix user-space fallouts
>>>- X months down the line, flip the Kconfig
>>>- In case of regressions - revert the KConfig, goto Fix user-space...
>> Well that at least sounds like a plan :) Let's to this!
>>
> We're talking about months if not years until it comes to fruition. We
> need something quicker.
>
> That said, I'm quite happy to take the first stab, yet this is not a
> replacement for this series.
>
>>> That said, the proposal will not conflict with the DRM_AUTH removal. If
>>> anything it is step 0.5 of the grand master plan.
>> That's the point I strongly disagree on.
>>
>> By lowering the DRM_AUTH restriction you are encouraging userspace to
>> use the shortcut and use the primary node for rendering instead of
>> fixing the code and using the render node.
>>
> Have to disagree here. After working on the user-space side and fixing
> issues in various projects I can honestly say that most user-space is
> sane and 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Emil Velikov
On 2019/06/14, Koenig, Christian wrote:
> Am 14.06.19 um 14:09 schrieb Emil Velikov:
> > On 2019/05/27, Emil Velikov wrote:
> > [SNIP]
> > Hi Christian,
> >
> >
> > In the following, I would like to summarise and emphasize the need for
> > DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> > extra reading it.
> >
> >
> > Today DRM drivers* do not make any distinction between primary and
> > render node clients.
> 
> That is actually not 100% correct. We have a special case where a DRM 
> master is allowed to change the priority of render node clients.
> 
Can you provide a link? I cannot find that code.

> > Thus for a render capable driver, any premise of
> > separation, security or otherwise imposed via DRM_AUTH is a fallacy.
> 
> Yeah, that's what I agree on. I just don't think that removing DRM_AUTH 
> now is the right direction to take.
> 
Could have been clearer - I'm talking about DRM_AUTH | DRM_RENDER_ALLOW
ioctls.

That aside, can you propose an alternative solution that addresses this
and the second point just below?

> > Considering the examples of flaky or outright missing drmAuth in prime
> > open-source projects (mesa, kmscube, libva) we can reasonably assume
> > other projects exbibit the same problem.
> >
> > For these and/or other reasons, two DRM drivers have dropped DRM_AUTH
> > since day one.
> >
> > Since we are interested in providing consistent and predictable
> > behaviour to user-space, dropping DRM_AUTH seems to be the most
> > effective way forward.
> 
> Well and what do you do with drivers which doesn't implement render nodes?
> 
AFAICT there is a single non-DC driver which does not expose - QXL.
I would expect it runs on a rather customised stack.

Would be great to fix that, but ETIME and it's the only exception to the
rule.


> > Of course, I'd be more than happy to hear for any other way to achieve
> > the goal outlined.
> >
> > Based on the series, other maintainers agree with the idea/goal here.
> > Amdgpu not following the same pattern would compromise predictability
> > across drivers and complicate userspace, so I would kindly ask you to
> > reconsider.
> >
> > Alternatively, I see two ways to special case:
> >   - New flag annotating amdgpu/radeon - akin to the one proposed earlier
> >   - Check for amdgpu/radeon in core DRM
> 
> I perfectly agree that I don't want any special handling for amdgpu/radeon.
> 
> My key point is that with DRM_AUTH we created an authorization and 
> authentication mess in DRM which is functionality which doesn't belong 
> into the DRM subsystem in the first place.
> 
Precisely and I've outlined below how to resolve this in the long run.

> > Now on your pain point - not allowing render iocts via the primary node,
> > and how this patch could make it harder to achieve that goal.
> >
> > While I love the idea, there are number of obstacles that prevent us
> > from doing so at this time:
> >   - Ensuring both old and new userspace does not regress
> 
> Yeah, agree totally. That's why I said we should probably start doing 
> this for only for upcoming hardware generations.
> 
That will side-step the regression issue, but will enforce driver
specific behaviour outlined before.

> >   - Drivers (QXL, others?) do not expose a render node
> 
> Well isn't that is a rather big problem for the removal of DRM_AUTH in 
> general?
> 
> E.g. at least QXL would then expose functionality on the primary node 
> without authentication.
> 
With this series QXL remains functionally unchanged. I would love to fix
that as well yet ETIME as mentioned above.

> >   - We want to avoid driver specific behaviour
> >
> > The only way forward that I can see is:
> >   - Address QXL/others to expose render nodes
> >   - Add a Kconfig toggle to disable !KMS ioctls via the primary node
> >   - Jump-start ^^ with distribution X
> >   - Fix user-space fallouts
> >   - X months down the line, flip the Kconfig
> >   - In case of regressions - revert the KConfig, goto Fix user-space...
> 
> Well that at least sounds like a plan :) Let's to this!
> 
We're talking about months if not years until it comes to fruition. We
need something quicker.

That said, I'm quite happy to take the first stab, yet this is not a
replacement for this series.

> > That said, the proposal will not conflict with the DRM_AUTH removal. If
> > anything it is step 0.5 of the grand master plan.
> 
> That's the point I strongly disagree on.
> 
> By lowering the DRM_AUTH restriction you are encouraging userspace to 
> use the shortcut and use the primary node for rendering instead of 
> fixing the code and using the render node.
> 
Have to disagree here. After working on the user-space side and fixing
issues in various projects I can honestly say that most user-space is
sane and developers _care_ about doing things correctly.

> So at the end of the day userspace will most likely completely drop 
> support for the render node, simply for the reason that it became 
> superfluous. You can 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Michel Dänzer
On 2019-06-14 2:55 p.m., Koenig, Christian wrote:
> Am 14.06.19 um 14:09 schrieb Emil Velikov:
> 
>> That said, the proposal will not conflict with the DRM_AUTH removal. If
>> anything it is step 0.5 of the grand master plan.
> 
> That's the point I strongly disagree on.
> 
> By lowering the DRM_AUTH restriction you are encouraging userspace to 
> use the shortcut and use the primary node for rendering instead of 
> fixing the code and using the render node.
> 
> So at the end of the day userspace will most likely completely drop 
> support for the render node, simply for the reason that it became 
> superfluous. You can just open up the primary node and get the same 
> functionality.
> 
> I absolutely can't understand how you can argue that this won't make it 
> harder to cleanly separate rendering and primary node functionality.

FWIW, I agree with Christian on this.


Also FWIW, the solution I'm working on for
https://bugs.freedesktop.org/110903 should allow making libdrm_amdgpu
always use a render node for rendering. For backwards compatibility
it'll probably require adding a new variant of amdgpu_device_initialize
though, since existing users may rely on the first amdgpu_device for a
GPU using the DRM file description passed to amdgpu_device_initialize
for rendering.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Koenig, Christian
Am 14.06.19 um 14:09 schrieb Emil Velikov:
> On 2019/05/27, Emil Velikov wrote:
> [SNIP]
> Hi Christian,
>
>
> In the following, I would like to summarise and emphasize the need for
> DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
> extra reading it.
>
>
> Today DRM drivers* do not make any distinction between primary and
> render node clients.

That is actually not 100% correct. We have a special case where a DRM 
master is allowed to change the priority of render node clients.

> Thus for a render capable driver, any premise of
> separation, security or otherwise imposed via DRM_AUTH is a fallacy.

Yeah, that's what I agree on. I just don't think that removing DRM_AUTH 
now is the right direction to take.

> Considering the examples of flaky or outright missing drmAuth in prime
> open-source projects (mesa, kmscube, libva) we can reasonably assume
> other projects exbibit the same problem.
>
> For these and/or other reasons, two DRM drivers have dropped DRM_AUTH
> since day one.
>
> Since we are interested in providing consistent and predictable
> behaviour to user-space, dropping DRM_AUTH seems to be the most
> effective way forward.

Well and what do you do with drivers which doesn't implement render nodes?

> Of course, I'd be more than happy to hear for any other way to achieve
> the goal outlined.
>
> Based on the series, other maintainers agree with the idea/goal here.
> Amdgpu not following the same pattern would compromise predictability
> across drivers and complicate userspace, so I would kindly ask you to
> reconsider.
>
> Alternatively, I see two ways to special case:
>   - New flag annotating amdgpu/radeon - akin to the one proposed earlier
>   - Check for amdgpu/radeon in core DRM

I perfectly agree that I don't want any special handling for amdgpu/radeon.

My key point is that with DRM_AUTH we created an authorization and 
authentication mess in DRM which is functionality which doesn't belong 
into the DRM subsystem in the first place.

> Now on your pain point - not allowing render iocts via the primary node,
> and how this patch could make it harder to achieve that goal.
>
> While I love the idea, there are number of obstacles that prevent us
> from doing so at this time:
>   - Ensuring both old and new userspace does not regress

Yeah, agree totally. That's why I said we should probably start doing 
this for only for upcoming hardware generations.

>   - Drivers (QXL, others?) do not expose a render node

Well isn't that is a rather big problem for the removal of DRM_AUTH in 
general?

E.g. at least QXL would then expose functionality on the primary node 
without authentication.

>   - We want to avoid driver specific behaviour
>
> The only way forward that I can see is:
>   - Address QXL/others to expose render nodes
>   - Add a Kconfig toggle to disable !KMS ioctls via the primary node
>   - Jump-start ^^ with distribution X
>   - Fix user-space fallouts
>   - X months down the line, flip the Kconfig
>   - In case of regressions - revert the KConfig, goto Fix user-space...

Well that at least sounds like a plan :) Let's to this!

> That said, the proposal will not conflict with the DRM_AUTH removal. If
> anything it is step 0.5 of the grand master plan.

That's the point I strongly disagree on.

By lowering the DRM_AUTH restriction you are encouraging userspace to 
use the shortcut and use the primary node for rendering instead of 
fixing the code and using the render node.

So at the end of the day userspace will most likely completely drop 
support for the render node, simply for the reason that it became 
superfluous. You can just open up the primary node and get the same 
functionality.

I absolutely can't understand how you can argue that this won't make it 
harder to cleanly separate rendering and primary node functionality.

Regards,
Christian.

>
>
> Tl;Dr: Removing DRM_AUTH is orthogonal to not allowing render iocts via
> the primary node. Here's an idea how to achieve the latter.
>
>
> Thanks
> Emil

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-14 Thread Emil Velikov
On 2019/05/27, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> render node. A seemingly deliberate design decision.
> 
> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> yet not all userspace checks if it's authenticated, but instead uses
> uncommon assumptions.
> 
> After days of digging through git log and testing, only a single (ab)use
> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> assuming that failure implies lack of authentication.
> 
> Affected versions are:
>  - the whole 18.2.x series, which is EOL
>  - the whole 18.3.x series, which is EOL
>  - the 19.0.x series, prior to 19.0.4
> 
> Add a special quirk for that case, thus we can drop DRM_AUTH bits as
> mentioned earlier.
> 
> Since all the affected userspace is EOL, we also add a kconfig option
> to disable this quirk.
> 
> The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT
> 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: amd-...@lists.freedesktop.org
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Signed-off-by: Emil Velikov 
> ---
>  drivers/gpu/drm/amd/amdgpu/Kconfig  | 16 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++-
>  drivers/gpu/drm/drm_ioctl.c |  5 +
>  include/drm/drm_ioctl.h | 17 +
>  4 files changed, 49 insertions(+), 1 deletion(-)
> 

Hi Christian,


In the following, I would like to summarise and emphasize the need for
DRM_AUTH removal. I would kindly ask you to spend a couple of minutes
extra reading it.


Today DRM drivers* do not make any distinction between primary and
render node clients. Thus for a render capable driver, any premise of
separation, security or otherwise imposed via DRM_AUTH is a fallacy.

Considering the examples of flaky or outright missing drmAuth in prime
open-source projects (mesa, kmscube, libva) we can reasonably assume
other projects exbibit the same problem.

For these and/or other reasons, two DRM drivers have dropped DRM_AUTH
since day one.

Since we are interested in providing consistent and predictable
behaviour to user-space, dropping DRM_AUTH seems to be the most
effective way forward.

Of course, I'd be more than happy to hear for any other way to achieve
the goal outlined.

Based on the series, other maintainers agree with the idea/goal here.
Amdgpu not following the same pattern would compromise predictability
across drivers and complicate userspace, so I would kindly ask you to
reconsider.

Alternatively, I see two ways to special case:
 - New flag annotating amdgpu/radeon - akin to the one proposed earlier
 - Check for amdgpu/radeon in core DRM



Now on your pain point - not allowing render iocts via the primary node,
and how this patch could make it harder to achieve that goal.

While I love the idea, there are number of obstacles that prevent us
from doing so at this time:
 - Ensuring both old and new userspace does not regress
 - Drivers (QXL, others?) do not expose a render node
 - We want to avoid driver specific behaviour

The only way forward that I can see is:
 - Address QXL/others to expose render nodes
 - Add a Kconfig toggle to disable !KMS ioctls via the primary node
 - Jump-start ^^ with distribution X
 - Fix user-space fallouts
 - X months down the line, flip the Kconfig
 - In case of regressions - revert the KConfig, goto Fix user-space...


That said, the proposal will not conflict with the DRM_AUTH removal. If
anything it is step 0.5 of the grand master plan.


Tl;Dr: Removing DRM_AUTH is orthogonal to not allowing render iocts via
the primary node. Here's an idea how to achieve the latter.


Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-04 Thread Emil Velikov
On 2019/05/31, Koenig, Christian wrote:
> Am 29.05.19 um 18:29 schrieb Emil Velikov:
> > On 2019/05/29, Koenig, Christian wrote:
> >> Am 29.05.19 um 15:03 schrieb Emil Velikov:
> >>> On 2019/05/29, Dave Airlie wrote:
>  On Wed, 29 May 2019 at 02:47, Emil Velikov  
>  wrote:
> > On 2019/05/28, Koenig, Christian wrote:
> >> Am 28.05.19 um 18:10 schrieb Emil Velikov:
> >>> On 2019/05/28, Daniel Vetter wrote:
>  On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
>   wrote:
> > Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> >> [SNIP]
> >>> Might be a good idea looking into reverting it partially, so that 
> >>> at
> >>> least command submission and buffer allocation is still blocked.
> >> I thought the issue is a lot more than vainfo, it's pretty much 
> >> every
> >> hacked up compositor under the sun getting this wrong one way or
> >> another. Thinking about this some more, I also have no idea how 
> >> you'd
> >> want to deprecate rendering on primary nodes in general. Apparently
> >> that breaks -modesetting already, and probably lots more 
> >> compositors.
> >> And it looks like we're finally achieve the goal kms set out to 10
> >> years ago, and new compositors are sprouting up all the time. I 
> >> guess
> >> we could just break them all (on new hardware) and tell them to all
> >> suck it up. But I don't think that's a great option. And just
> >> deprecating this on amdgpu is going to be even harder, since then
> >> everywhere else it'll keep working, and it's just amdgpu.ko that 
> >> looks
> >> broken.
> >>
> >> Aside: I'm not supporting Emil's idea here because it fixes any 
> >> issues
> >> Intel has - Intel doesn't care. I support it because reality sucks,
> >> people get this render vs. primary vs. multi-gpu prime wrong all 
> >> the
> >> time (that's also why we have hardcoded display+gpu pairs in mesa 
> >> for
> >> the various soc combinations out there), and this looks like a
> >> pragmatic solution. It'd be nice if every compositor and everything
> >> else would perfectly support multi gpu and only use render nodes 
> >> for
> >> rendering, and only primary nodes for display. But reality is that
> >> people hack on stuff until gears on screen and then move on to more
> >> interesting things (to them). So I don't think we'll ever win this 
> >> :-/
> > Yeah, but this is a classic case of working around user space 
> > issues by
> > making kernel changes instead of fixing user space.
> >
> > Having privileged (output control) and unprivileged (rendering 
> > control)
> > functionality behind the same node is a mistake we have made a long 
> > time
> > ago and render nodes finally seemed to be a way to fix that.
> >
> > I mean why are compositors using the primary node in the first 
> > place?
> > Because they want to have access to privileged resources I think 
> > and in
> > this case it is perfectly ok to do so.
> >
> > Now extending unprivileged access to the primary node actually 
> > sounds
> > like a step into the wrong direction to me.
> >
> > I rather think that we should go down the route of completely 
> > dropping
> > command submission and buffer allocation through the primary node 
> > for
> > non master clients. And then as next step at some point drop 
> > support for
> > authentication/flink.
> >
> > I mean we have done this with UMS as well and I don't see much 
> > other way
> > to move forward and get rid of those ancient interface in the long 
> > term.
>  Well kms had some really good benefits that drove quick adoption, 
>  like
>  "suspend/resume actually has a chance of working" or "comes with
>  buffer management so you can run multiple gears".
> 
>  The render node thing is a lot more niche use case (prime, better 
>  priv
>  separation), plus "it's cleaner design". And the "cleaner design" 
>  part
>  is something that empirically doesn't seem to matter :-/ Just two
>  examples:
>  - KHR_display/leases just iterated display resources on the fd needed
>  for rendering (and iirc there was even a patch to expose that for
>  render nodes too so it works with DRI3), because implementing
>  protocols is too hard. Barely managed to stop that one before it
>  happened.
>  - Various video players use the vblank ioctl on directly to schedule
>  frames, 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-04 Thread Daniel Vetter
On Tue, Jun 4, 2019 at 1:24 PM Koenig, Christian
 wrote:
>
> Am 04.06.19 um 12:50 schrieb Michel Dänzer:
> > On 2019-05-28 10:03 a.m., Koenig, Christian wrote:
> >> I rather think that we should go down the route of completely dropping
> >> command submission and buffer allocation through the primary node for
> >> non master clients. And then as next step at some point drop support for
> >> authentication/flink.
> > Keep in mind that even display servers aren't DRM master while their VT
> > isn't active, so this might be problematic if a display server needs to
> > do some command submission / buffer allocation during that time.
>
> If I understand it correctly the DRM fd stays master even when the VT is
> switched away, it's just not the current master any more.
>
> So in this case fpriv->is_master stays true, but
> drm_is_current_master(fpriv) returns false.
>
> And yes we mixed that up in amdgpu, i915 and vmwgfx. Somebody should
> probably write patches to fix this.

master should always be authenticated, so should be able to continue
rendering. Well ... except on drivers who do take isolation somewhat
serious, but don't have full per-client isolation. Those actually
refuse rendering/gpu access for any authenticated client if their
corresponding master isn't the current master. But I think only vmwgfx
does that.

Per-client isolation has taken over anyway, so all a bit moot, at
least on modern hw.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-04 Thread Koenig, Christian
Am 04.06.19 um 12:50 schrieb Michel Dänzer:
> On 2019-05-28 10:03 a.m., Koenig, Christian wrote:
>> I rather think that we should go down the route of completely dropping
>> command submission and buffer allocation through the primary node for
>> non master clients. And then as next step at some point drop support for
>> authentication/flink.
> Keep in mind that even display servers aren't DRM master while their VT
> isn't active, so this might be problematic if a display server needs to
> do some command submission / buffer allocation during that time.

If I understand it correctly the DRM fd stays master even when the VT is 
switched away, it's just not the current master any more.

So in this case fpriv->is_master stays true, but 
drm_is_current_master(fpriv) returns false.

And yes we mixed that up in amdgpu, i915 and vmwgfx. Somebody should 
probably write patches to fix this.

Regards,
Christian.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-06-04 Thread Michel Dänzer
On 2019-05-28 10:03 a.m., Koenig, Christian wrote:
> 
> I rather think that we should go down the route of completely dropping 
> command submission and buffer allocation through the primary node for 
> non master clients. And then as next step at some point drop support for 
> authentication/flink.

Keep in mind that even display servers aren't DRM master while their VT
isn't active, so this might be problematic if a display server needs to
do some command submission / buffer allocation during that time.


-- 
Earthling Michel Dänzer   |  https://www.amd.com
Libre software enthusiast | Mesa and X developer
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-31 Thread Koenig, Christian
Am 29.05.19 um 18:29 schrieb Emil Velikov:
> On 2019/05/29, Koenig, Christian wrote:
>> Am 29.05.19 um 15:03 schrieb Emil Velikov:
>>> On 2019/05/29, Dave Airlie wrote:
 On Wed, 29 May 2019 at 02:47, Emil Velikov  
 wrote:
> On 2019/05/28, Koenig, Christian wrote:
>> Am 28.05.19 um 18:10 schrieb Emil Velikov:
>>> On 2019/05/28, Daniel Vetter wrote:
 On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
  wrote:
> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
>> [SNIP]
>>> Might be a good idea looking into reverting it partially, so that at
>>> least command submission and buffer allocation is still blocked.
>> I thought the issue is a lot more than vainfo, it's pretty much every
>> hacked up compositor under the sun getting this wrong one way or
>> another. Thinking about this some more, I also have no idea how you'd
>> want to deprecate rendering on primary nodes in general. Apparently
>> that breaks -modesetting already, and probably lots more compositors.
>> And it looks like we're finally achieve the goal kms set out to 10
>> years ago, and new compositors are sprouting up all the time. I guess
>> we could just break them all (on new hardware) and tell them to all
>> suck it up. But I don't think that's a great option. And just
>> deprecating this on amdgpu is going to be even harder, since then
>> everywhere else it'll keep working, and it's just amdgpu.ko that 
>> looks
>> broken.
>>
>> Aside: I'm not supporting Emil's idea here because it fixes any 
>> issues
>> Intel has - Intel doesn't care. I support it because reality sucks,
>> people get this render vs. primary vs. multi-gpu prime wrong all the
>> time (that's also why we have hardcoded display+gpu pairs in mesa for
>> the various soc combinations out there), and this looks like a
>> pragmatic solution. It'd be nice if every compositor and everything
>> else would perfectly support multi gpu and only use render nodes for
>> rendering, and only primary nodes for display. But reality is that
>> people hack on stuff until gears on screen and then move on to more
>> interesting things (to them). So I don't think we'll ever win this 
>> :-/
> Yeah, but this is a classic case of working around user space issues 
> by
> making kernel changes instead of fixing user space.
>
> Having privileged (output control) and unprivileged (rendering 
> control)
> functionality behind the same node is a mistake we have made a long 
> time
> ago and render nodes finally seemed to be a way to fix that.
>
> I mean why are compositors using the primary node in the first place?
> Because they want to have access to privileged resources I think and 
> in
> this case it is perfectly ok to do so.
>
> Now extending unprivileged access to the primary node actually sounds
> like a step into the wrong direction to me.
>
> I rather think that we should go down the route of completely dropping
> command submission and buffer allocation through the primary node for
> non master clients. And then as next step at some point drop support 
> for
> authentication/flink.
>
> I mean we have done this with UMS as well and I don't see much other 
> way
> to move forward and get rid of those ancient interface in the long 
> term.
 Well kms had some really good benefits that drove quick adoption, like
 "suspend/resume actually has a chance of working" or "comes with
 buffer management so you can run multiple gears".

 The render node thing is a lot more niche use case (prime, better priv
 separation), plus "it's cleaner design". And the "cleaner design" part
 is something that empirically doesn't seem to matter :-/ Just two
 examples:
 - KHR_display/leases just iterated display resources on the fd needed
 for rendering (and iirc there was even a patch to expose that for
 render nodes too so it works with DRI3), because implementing
 protocols is too hard. Barely managed to stop that one before it
 happened.
 - Various video players use the vblank ioctl on directly to schedule
 frames, without telling the compositor. I discovered that when I
 wanted to limite the vblank ioctl to master clients only. Again,
 apparently too hard to use the existing extensions, or fix the bugs in
 there, or whatever. One userspace got fixed last year, but it'll
 probably get copypasted around forever :-/

 So I don't think we'll ever manage to roll a clean 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-29 Thread Emil Velikov
On 2019/05/29, Koenig, Christian wrote:
> Am 29.05.19 um 15:03 schrieb Emil Velikov:
> > On 2019/05/29, Dave Airlie wrote:
> >> On Wed, 29 May 2019 at 02:47, Emil Velikov  
> >> wrote:
> >>> On 2019/05/28, Koenig, Christian wrote:
>  Am 28.05.19 um 18:10 schrieb Emil Velikov:
> > On 2019/05/28, Daniel Vetter wrote:
> >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
> >>  wrote:
> >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
>  [SNIP]
> > Might be a good idea looking into reverting it partially, so that at
> > least command submission and buffer allocation is still blocked.
>  I thought the issue is a lot more than vainfo, it's pretty much every
>  hacked up compositor under the sun getting this wrong one way or
>  another. Thinking about this some more, I also have no idea how you'd
>  want to deprecate rendering on primary nodes in general. Apparently
>  that breaks -modesetting already, and probably lots more compositors.
>  And it looks like we're finally achieve the goal kms set out to 10
>  years ago, and new compositors are sprouting up all the time. I guess
>  we could just break them all (on new hardware) and tell them to all
>  suck it up. But I don't think that's a great option. And just
>  deprecating this on amdgpu is going to be even harder, since then
>  everywhere else it'll keep working, and it's just amdgpu.ko that 
>  looks
>  broken.
> 
>  Aside: I'm not supporting Emil's idea here because it fixes any 
>  issues
>  Intel has - Intel doesn't care. I support it because reality sucks,
>  people get this render vs. primary vs. multi-gpu prime wrong all the
>  time (that's also why we have hardcoded display+gpu pairs in mesa for
>  the various soc combinations out there), and this looks like a
>  pragmatic solution. It'd be nice if every compositor and everything
>  else would perfectly support multi gpu and only use render nodes for
>  rendering, and only primary nodes for display. But reality is that
>  people hack on stuff until gears on screen and then move on to more
>  interesting things (to them). So I don't think we'll ever win this 
>  :-/
> >>> Yeah, but this is a classic case of working around user space issues 
> >>> by
> >>> making kernel changes instead of fixing user space.
> >>>
> >>> Having privileged (output control) and unprivileged (rendering 
> >>> control)
> >>> functionality behind the same node is a mistake we have made a long 
> >>> time
> >>> ago and render nodes finally seemed to be a way to fix that.
> >>>
> >>> I mean why are compositors using the primary node in the first place?
> >>> Because they want to have access to privileged resources I think and 
> >>> in
> >>> this case it is perfectly ok to do so.
> >>>
> >>> Now extending unprivileged access to the primary node actually sounds
> >>> like a step into the wrong direction to me.
> >>>
> >>> I rather think that we should go down the route of completely dropping
> >>> command submission and buffer allocation through the primary node for
> >>> non master clients. And then as next step at some point drop support 
> >>> for
> >>> authentication/flink.
> >>>
> >>> I mean we have done this with UMS as well and I don't see much other 
> >>> way
> >>> to move forward and get rid of those ancient interface in the long 
> >>> term.
> >> Well kms had some really good benefits that drove quick adoption, like
> >> "suspend/resume actually has a chance of working" or "comes with
> >> buffer management so you can run multiple gears".
> >>
> >> The render node thing is a lot more niche use case (prime, better priv
> >> separation), plus "it's cleaner design". And the "cleaner design" part
> >> is something that empirically doesn't seem to matter :-/ Just two
> >> examples:
> >> - KHR_display/leases just iterated display resources on the fd needed
> >> for rendering (and iirc there was even a patch to expose that for
> >> render nodes too so it works with DRI3), because implementing
> >> protocols is too hard. Barely managed to stop that one before it
> >> happened.
> >> - Various video players use the vblank ioctl on directly to schedule
> >> frames, without telling the compositor. I discovered that when I
> >> wanted to limite the vblank ioctl to master clients only. Again,
> >> apparently too hard to use the existing extensions, or fix the bugs in
> >> there, or whatever. One userspace got fixed last year, but it'll
> >> probably get copypasted around forever :-/
> >>
> >> So I don't think we'll ever manage to roll a clean split out, and best
> >> we can do is give in 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-29 Thread Koenig, Christian
Am 29.05.19 um 15:03 schrieb Emil Velikov:
> On 2019/05/29, Dave Airlie wrote:
>> On Wed, 29 May 2019 at 02:47, Emil Velikov  wrote:
>>> On 2019/05/28, Koenig, Christian wrote:
 Am 28.05.19 um 18:10 schrieb Emil Velikov:
> On 2019/05/28, Daniel Vetter wrote:
>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
>>  wrote:
>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
 [SNIP]
> Might be a good idea looking into reverting it partially, so that at
> least command submission and buffer allocation is still blocked.
 I thought the issue is a lot more than vainfo, it's pretty much every
 hacked up compositor under the sun getting this wrong one way or
 another. Thinking about this some more, I also have no idea how you'd
 want to deprecate rendering on primary nodes in general. Apparently
 that breaks -modesetting already, and probably lots more compositors.
 And it looks like we're finally achieve the goal kms set out to 10
 years ago, and new compositors are sprouting up all the time. I guess
 we could just break them all (on new hardware) and tell them to all
 suck it up. But I don't think that's a great option. And just
 deprecating this on amdgpu is going to be even harder, since then
 everywhere else it'll keep working, and it's just amdgpu.ko that looks
 broken.

 Aside: I'm not supporting Emil's idea here because it fixes any issues
 Intel has - Intel doesn't care. I support it because reality sucks,
 people get this render vs. primary vs. multi-gpu prime wrong all the
 time (that's also why we have hardcoded display+gpu pairs in mesa for
 the various soc combinations out there), and this looks like a
 pragmatic solution. It'd be nice if every compositor and everything
 else would perfectly support multi gpu and only use render nodes for
 rendering, and only primary nodes for display. But reality is that
 people hack on stuff until gears on screen and then move on to more
 interesting things (to them). So I don't think we'll ever win this :-/
>>> Yeah, but this is a classic case of working around user space issues by
>>> making kernel changes instead of fixing user space.
>>>
>>> Having privileged (output control) and unprivileged (rendering control)
>>> functionality behind the same node is a mistake we have made a long time
>>> ago and render nodes finally seemed to be a way to fix that.
>>>
>>> I mean why are compositors using the primary node in the first place?
>>> Because they want to have access to privileged resources I think and in
>>> this case it is perfectly ok to do so.
>>>
>>> Now extending unprivileged access to the primary node actually sounds
>>> like a step into the wrong direction to me.
>>>
>>> I rather think that we should go down the route of completely dropping
>>> command submission and buffer allocation through the primary node for
>>> non master clients. And then as next step at some point drop support for
>>> authentication/flink.
>>>
>>> I mean we have done this with UMS as well and I don't see much other way
>>> to move forward and get rid of those ancient interface in the long term.
>> Well kms had some really good benefits that drove quick adoption, like
>> "suspend/resume actually has a chance of working" or "comes with
>> buffer management so you can run multiple gears".
>>
>> The render node thing is a lot more niche use case (prime, better priv
>> separation), plus "it's cleaner design". And the "cleaner design" part
>> is something that empirically doesn't seem to matter :-/ Just two
>> examples:
>> - KHR_display/leases just iterated display resources on the fd needed
>> for rendering (and iirc there was even a patch to expose that for
>> render nodes too so it works with DRI3), because implementing
>> protocols is too hard. Barely managed to stop that one before it
>> happened.
>> - Various video players use the vblank ioctl on directly to schedule
>> frames, without telling the compositor. I discovered that when I
>> wanted to limite the vblank ioctl to master clients only. Again,
>> apparently too hard to use the existing extensions, or fix the bugs in
>> there, or whatever. One userspace got fixed last year, but it'll
>> probably get copypasted around forever :-/
>>
>> So I don't think we'll ever manage to roll a clean split out, and best
>> we can do is give in and just hand userspace what it wants. As much as
>> that's misguided and unclean and all that. Maybe it'll result in a
>> least fewer stuff getting run as root to hack around this, because
>> fixing properly seems not to be on the table.
>>
>> The beauty of kms is that we've achieved the 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-29 Thread Emil Velikov
On 2019/05/29, Dave Airlie wrote:
> On Wed, 29 May 2019 at 02:47, Emil Velikov  wrote:
> >
> > On 2019/05/28, Koenig, Christian wrote:
> > > Am 28.05.19 um 18:10 schrieb Emil Velikov:
> > > > On 2019/05/28, Daniel Vetter wrote:
> > > >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
> > > >>  wrote:
> > > >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> > >  [SNIP]
> > > > Might be a good idea looking into reverting it partially, so that at
> > > > least command submission and buffer allocation is still blocked.
> > >  I thought the issue is a lot more than vainfo, it's pretty much every
> > >  hacked up compositor under the sun getting this wrong one way or
> > >  another. Thinking about this some more, I also have no idea how you'd
> > >  want to deprecate rendering on primary nodes in general. Apparently
> > >  that breaks -modesetting already, and probably lots more compositors.
> > >  And it looks like we're finally achieve the goal kms set out to 10
> > >  years ago, and new compositors are sprouting up all the time. I guess
> > >  we could just break them all (on new hardware) and tell them to all
> > >  suck it up. But I don't think that's a great option. And just
> > >  deprecating this on amdgpu is going to be even harder, since then
> > >  everywhere else it'll keep working, and it's just amdgpu.ko that 
> > >  looks
> > >  broken.
> > > 
> > >  Aside: I'm not supporting Emil's idea here because it fixes any 
> > >  issues
> > >  Intel has - Intel doesn't care. I support it because reality sucks,
> > >  people get this render vs. primary vs. multi-gpu prime wrong all the
> > >  time (that's also why we have hardcoded display+gpu pairs in mesa for
> > >  the various soc combinations out there), and this looks like a
> > >  pragmatic solution. It'd be nice if every compositor and everything
> > >  else would perfectly support multi gpu and only use render nodes for
> > >  rendering, and only primary nodes for display. But reality is that
> > >  people hack on stuff until gears on screen and then move on to more
> > >  interesting things (to them). So I don't think we'll ever win this 
> > >  :-/
> > > >>> Yeah, but this is a classic case of working around user space issues 
> > > >>> by
> > > >>> making kernel changes instead of fixing user space.
> > > >>>
> > > >>> Having privileged (output control) and unprivileged (rendering 
> > > >>> control)
> > > >>> functionality behind the same node is a mistake we have made a long 
> > > >>> time
> > > >>> ago and render nodes finally seemed to be a way to fix that.
> > > >>>
> > > >>> I mean why are compositors using the primary node in the first place?
> > > >>> Because they want to have access to privileged resources I think and 
> > > >>> in
> > > >>> this case it is perfectly ok to do so.
> > > >>>
> > > >>> Now extending unprivileged access to the primary node actually sounds
> > > >>> like a step into the wrong direction to me.
> > > >>>
> > > >>> I rather think that we should go down the route of completely dropping
> > > >>> command submission and buffer allocation through the primary node for
> > > >>> non master clients. And then as next step at some point drop support 
> > > >>> for
> > > >>> authentication/flink.
> > > >>>
> > > >>> I mean we have done this with UMS as well and I don't see much other 
> > > >>> way
> > > >>> to move forward and get rid of those ancient interface in the long 
> > > >>> term.
> > > >> Well kms had some really good benefits that drove quick adoption, like
> > > >> "suspend/resume actually has a chance of working" or "comes with
> > > >> buffer management so you can run multiple gears".
> > > >>
> > > >> The render node thing is a lot more niche use case (prime, better priv
> > > >> separation), plus "it's cleaner design". And the "cleaner design" part
> > > >> is something that empirically doesn't seem to matter :-/ Just two
> > > >> examples:
> > > >> - KHR_display/leases just iterated display resources on the fd needed
> > > >> for rendering (and iirc there was even a patch to expose that for
> > > >> render nodes too so it works with DRI3), because implementing
> > > >> protocols is too hard. Barely managed to stop that one before it
> > > >> happened.
> > > >> - Various video players use the vblank ioctl on directly to schedule
> > > >> frames, without telling the compositor. I discovered that when I
> > > >> wanted to limite the vblank ioctl to master clients only. Again,
> > > >> apparently too hard to use the existing extensions, or fix the bugs in
> > > >> there, or whatever. One userspace got fixed last year, but it'll
> > > >> probably get copypasted around forever :-/
> > > >>
> > > >> So I don't think we'll ever manage to roll a clean split out, and best
> > > >> we can do is give in and just hand userspace what it wants. As much as
> > > >> that's misguided and unclean and all 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Dave Airlie
On Wed, 29 May 2019 at 02:47, Emil Velikov  wrote:
>
> On 2019/05/28, Koenig, Christian wrote:
> > Am 28.05.19 um 18:10 schrieb Emil Velikov:
> > > On 2019/05/28, Daniel Vetter wrote:
> > >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
> > >>  wrote:
> > >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> >  [SNIP]
> > > Might be a good idea looking into reverting it partially, so that at
> > > least command submission and buffer allocation is still blocked.
> >  I thought the issue is a lot more than vainfo, it's pretty much every
> >  hacked up compositor under the sun getting this wrong one way or
> >  another. Thinking about this some more, I also have no idea how you'd
> >  want to deprecate rendering on primary nodes in general. Apparently
> >  that breaks -modesetting already, and probably lots more compositors.
> >  And it looks like we're finally achieve the goal kms set out to 10
> >  years ago, and new compositors are sprouting up all the time. I guess
> >  we could just break them all (on new hardware) and tell them to all
> >  suck it up. But I don't think that's a great option. And just
> >  deprecating this on amdgpu is going to be even harder, since then
> >  everywhere else it'll keep working, and it's just amdgpu.ko that looks
> >  broken.
> > 
> >  Aside: I'm not supporting Emil's idea here because it fixes any issues
> >  Intel has - Intel doesn't care. I support it because reality sucks,
> >  people get this render vs. primary vs. multi-gpu prime wrong all the
> >  time (that's also why we have hardcoded display+gpu pairs in mesa for
> >  the various soc combinations out there), and this looks like a
> >  pragmatic solution. It'd be nice if every compositor and everything
> >  else would perfectly support multi gpu and only use render nodes for
> >  rendering, and only primary nodes for display. But reality is that
> >  people hack on stuff until gears on screen and then move on to more
> >  interesting things (to them). So I don't think we'll ever win this :-/
> > >>> Yeah, but this is a classic case of working around user space issues by
> > >>> making kernel changes instead of fixing user space.
> > >>>
> > >>> Having privileged (output control) and unprivileged (rendering control)
> > >>> functionality behind the same node is a mistake we have made a long time
> > >>> ago and render nodes finally seemed to be a way to fix that.
> > >>>
> > >>> I mean why are compositors using the primary node in the first place?
> > >>> Because they want to have access to privileged resources I think and in
> > >>> this case it is perfectly ok to do so.
> > >>>
> > >>> Now extending unprivileged access to the primary node actually sounds
> > >>> like a step into the wrong direction to me.
> > >>>
> > >>> I rather think that we should go down the route of completely dropping
> > >>> command submission and buffer allocation through the primary node for
> > >>> non master clients. And then as next step at some point drop support for
> > >>> authentication/flink.
> > >>>
> > >>> I mean we have done this with UMS as well and I don't see much other way
> > >>> to move forward and get rid of those ancient interface in the long term.
> > >> Well kms had some really good benefits that drove quick adoption, like
> > >> "suspend/resume actually has a chance of working" or "comes with
> > >> buffer management so you can run multiple gears".
> > >>
> > >> The render node thing is a lot more niche use case (prime, better priv
> > >> separation), plus "it's cleaner design". And the "cleaner design" part
> > >> is something that empirically doesn't seem to matter :-/ Just two
> > >> examples:
> > >> - KHR_display/leases just iterated display resources on the fd needed
> > >> for rendering (and iirc there was even a patch to expose that for
> > >> render nodes too so it works with DRI3), because implementing
> > >> protocols is too hard. Barely managed to stop that one before it
> > >> happened.
> > >> - Various video players use the vblank ioctl on directly to schedule
> > >> frames, without telling the compositor. I discovered that when I
> > >> wanted to limite the vblank ioctl to master clients only. Again,
> > >> apparently too hard to use the existing extensions, or fix the bugs in
> > >> there, or whatever. One userspace got fixed last year, but it'll
> > >> probably get copypasted around forever :-/
> > >>
> > >> So I don't think we'll ever manage to roll a clean split out, and best
> > >> we can do is give in and just hand userspace what it wants. As much as
> > >> that's misguided and unclean and all that. Maybe it'll result in a
> > >> least fewer stuff getting run as root to hack around this, because
> > >> fixing properly seems not to be on the table.
> > >>
> > >> The beauty of kms is that we've achieved the mission, everyone's
> > >> writing their own thing. Which is also terrible, and I don't 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Emil Velikov
On 2019/05/28, Koenig, Christian wrote:
> Am 28.05.19 um 18:10 schrieb Emil Velikov:
> > On 2019/05/28, Daniel Vetter wrote:
> >> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
> >>  wrote:
> >>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
>  [SNIP]
> > Might be a good idea looking into reverting it partially, so that at
> > least command submission and buffer allocation is still blocked.
>  I thought the issue is a lot more than vainfo, it's pretty much every
>  hacked up compositor under the sun getting this wrong one way or
>  another. Thinking about this some more, I also have no idea how you'd
>  want to deprecate rendering on primary nodes in general. Apparently
>  that breaks -modesetting already, and probably lots more compositors.
>  And it looks like we're finally achieve the goal kms set out to 10
>  years ago, and new compositors are sprouting up all the time. I guess
>  we could just break them all (on new hardware) and tell them to all
>  suck it up. But I don't think that's a great option. And just
>  deprecating this on amdgpu is going to be even harder, since then
>  everywhere else it'll keep working, and it's just amdgpu.ko that looks
>  broken.
> 
>  Aside: I'm not supporting Emil's idea here because it fixes any issues
>  Intel has - Intel doesn't care. I support it because reality sucks,
>  people get this render vs. primary vs. multi-gpu prime wrong all the
>  time (that's also why we have hardcoded display+gpu pairs in mesa for
>  the various soc combinations out there), and this looks like a
>  pragmatic solution. It'd be nice if every compositor and everything
>  else would perfectly support multi gpu and only use render nodes for
>  rendering, and only primary nodes for display. But reality is that
>  people hack on stuff until gears on screen and then move on to more
>  interesting things (to them). So I don't think we'll ever win this :-/
> >>> Yeah, but this is a classic case of working around user space issues by
> >>> making kernel changes instead of fixing user space.
> >>>
> >>> Having privileged (output control) and unprivileged (rendering control)
> >>> functionality behind the same node is a mistake we have made a long time
> >>> ago and render nodes finally seemed to be a way to fix that.
> >>>
> >>> I mean why are compositors using the primary node in the first place?
> >>> Because they want to have access to privileged resources I think and in
> >>> this case it is perfectly ok to do so.
> >>>
> >>> Now extending unprivileged access to the primary node actually sounds
> >>> like a step into the wrong direction to me.
> >>>
> >>> I rather think that we should go down the route of completely dropping
> >>> command submission and buffer allocation through the primary node for
> >>> non master clients. And then as next step at some point drop support for
> >>> authentication/flink.
> >>>
> >>> I mean we have done this with UMS as well and I don't see much other way
> >>> to move forward and get rid of those ancient interface in the long term.
> >> Well kms had some really good benefits that drove quick adoption, like
> >> "suspend/resume actually has a chance of working" or "comes with
> >> buffer management so you can run multiple gears".
> >>
> >> The render node thing is a lot more niche use case (prime, better priv
> >> separation), plus "it's cleaner design". And the "cleaner design" part
> >> is something that empirically doesn't seem to matter :-/ Just two
> >> examples:
> >> - KHR_display/leases just iterated display resources on the fd needed
> >> for rendering (and iirc there was even a patch to expose that for
> >> render nodes too so it works with DRI3), because implementing
> >> protocols is too hard. Barely managed to stop that one before it
> >> happened.
> >> - Various video players use the vblank ioctl on directly to schedule
> >> frames, without telling the compositor. I discovered that when I
> >> wanted to limite the vblank ioctl to master clients only. Again,
> >> apparently too hard to use the existing extensions, or fix the bugs in
> >> there, or whatever. One userspace got fixed last year, but it'll
> >> probably get copypasted around forever :-/
> >>
> >> So I don't think we'll ever manage to roll a clean split out, and best
> >> we can do is give in and just hand userspace what it wants. As much as
> >> that's misguided and unclean and all that. Maybe it'll result in a
> >> least fewer stuff getting run as root to hack around this, because
> >> fixing properly seems not to be on the table.
> >>
> >> The beauty of kms is that we've achieved the mission, everyone's
> >> writing their own thing. Which is also terrible, and I don't think
> >> it'll get better.
> >
> > With the risk of coming rude I will repeat my earlier comment:
> >
> > The problem is _neither_ Intel nor libva specific.
> >
> >
> >
> > That said, let's step back for a moment 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Koenig, Christian
Am 28.05.19 um 18:10 schrieb Emil Velikov:
> On 2019/05/28, Daniel Vetter wrote:
>> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
>>  wrote:
>>> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
 [SNIP]
> Might be a good idea looking into reverting it partially, so that at
> least command submission and buffer allocation is still blocked.
 I thought the issue is a lot more than vainfo, it's pretty much every
 hacked up compositor under the sun getting this wrong one way or
 another. Thinking about this some more, I also have no idea how you'd
 want to deprecate rendering on primary nodes in general. Apparently
 that breaks -modesetting already, and probably lots more compositors.
 And it looks like we're finally achieve the goal kms set out to 10
 years ago, and new compositors are sprouting up all the time. I guess
 we could just break them all (on new hardware) and tell them to all
 suck it up. But I don't think that's a great option. And just
 deprecating this on amdgpu is going to be even harder, since then
 everywhere else it'll keep working, and it's just amdgpu.ko that looks
 broken.

 Aside: I'm not supporting Emil's idea here because it fixes any issues
 Intel has - Intel doesn't care. I support it because reality sucks,
 people get this render vs. primary vs. multi-gpu prime wrong all the
 time (that's also why we have hardcoded display+gpu pairs in mesa for
 the various soc combinations out there), and this looks like a
 pragmatic solution. It'd be nice if every compositor and everything
 else would perfectly support multi gpu and only use render nodes for
 rendering, and only primary nodes for display. But reality is that
 people hack on stuff until gears on screen and then move on to more
 interesting things (to them). So I don't think we'll ever win this :-/
>>> Yeah, but this is a classic case of working around user space issues by
>>> making kernel changes instead of fixing user space.
>>>
>>> Having privileged (output control) and unprivileged (rendering control)
>>> functionality behind the same node is a mistake we have made a long time
>>> ago and render nodes finally seemed to be a way to fix that.
>>>
>>> I mean why are compositors using the primary node in the first place?
>>> Because they want to have access to privileged resources I think and in
>>> this case it is perfectly ok to do so.
>>>
>>> Now extending unprivileged access to the primary node actually sounds
>>> like a step into the wrong direction to me.
>>>
>>> I rather think that we should go down the route of completely dropping
>>> command submission and buffer allocation through the primary node for
>>> non master clients. And then as next step at some point drop support for
>>> authentication/flink.
>>>
>>> I mean we have done this with UMS as well and I don't see much other way
>>> to move forward and get rid of those ancient interface in the long term.
>> Well kms had some really good benefits that drove quick adoption, like
>> "suspend/resume actually has a chance of working" or "comes with
>> buffer management so you can run multiple gears".
>>
>> The render node thing is a lot more niche use case (prime, better priv
>> separation), plus "it's cleaner design". And the "cleaner design" part
>> is something that empirically doesn't seem to matter :-/ Just two
>> examples:
>> - KHR_display/leases just iterated display resources on the fd needed
>> for rendering (and iirc there was even a patch to expose that for
>> render nodes too so it works with DRI3), because implementing
>> protocols is too hard. Barely managed to stop that one before it
>> happened.
>> - Various video players use the vblank ioctl on directly to schedule
>> frames, without telling the compositor. I discovered that when I
>> wanted to limite the vblank ioctl to master clients only. Again,
>> apparently too hard to use the existing extensions, or fix the bugs in
>> there, or whatever. One userspace got fixed last year, but it'll
>> probably get copypasted around forever :-/
>>
>> So I don't think we'll ever manage to roll a clean split out, and best
>> we can do is give in and just hand userspace what it wants. As much as
>> that's misguided and unclean and all that. Maybe it'll result in a
>> least fewer stuff getting run as root to hack around this, because
>> fixing properly seems not to be on the table.
>>
>> The beauty of kms is that we've achieved the mission, everyone's
>> writing their own thing. Which is also terrible, and I don't think
>> it'll get better.
>
> With the risk of coming rude I will repeat my earlier comment:
>
> The problem is _neither_ Intel nor libva specific.
>
>
>
> That said, let's step back for a moment and consider:
>
>   - the "block everything but KMS via the primary node" idea is great but
> orthogonal
>
>   - the series does address issues that are vendor-agnostic
>
>   - by default this series does _not_ cause 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Emil Velikov
On 2019/05/28, Daniel Vetter wrote:
> On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
>  wrote:
> >
> > Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> > > [SNIP]
> > >> Might be a good idea looking into reverting it partially, so that at
> > >> least command submission and buffer allocation is still blocked.
> > > I thought the issue is a lot more than vainfo, it's pretty much every
> > > hacked up compositor under the sun getting this wrong one way or
> > > another. Thinking about this some more, I also have no idea how you'd
> > > want to deprecate rendering on primary nodes in general. Apparently
> > > that breaks -modesetting already, and probably lots more compositors.
> > > And it looks like we're finally achieve the goal kms set out to 10
> > > years ago, and new compositors are sprouting up all the time. I guess
> > > we could just break them all (on new hardware) and tell them to all
> > > suck it up. But I don't think that's a great option. And just
> > > deprecating this on amdgpu is going to be even harder, since then
> > > everywhere else it'll keep working, and it's just amdgpu.ko that looks
> > > broken.
> > >
> > > Aside: I'm not supporting Emil's idea here because it fixes any issues
> > > Intel has - Intel doesn't care. I support it because reality sucks,
> > > people get this render vs. primary vs. multi-gpu prime wrong all the
> > > time (that's also why we have hardcoded display+gpu pairs in mesa for
> > > the various soc combinations out there), and this looks like a
> > > pragmatic solution. It'd be nice if every compositor and everything
> > > else would perfectly support multi gpu and only use render nodes for
> > > rendering, and only primary nodes for display. But reality is that
> > > people hack on stuff until gears on screen and then move on to more
> > > interesting things (to them). So I don't think we'll ever win this :-/
> >
> > Yeah, but this is a classic case of working around user space issues by
> > making kernel changes instead of fixing user space.
> >
> > Having privileged (output control) and unprivileged (rendering control)
> > functionality behind the same node is a mistake we have made a long time
> > ago and render nodes finally seemed to be a way to fix that.
> >
> > I mean why are compositors using the primary node in the first place?
> > Because they want to have access to privileged resources I think and in
> > this case it is perfectly ok to do so.
> >
> > Now extending unprivileged access to the primary node actually sounds
> > like a step into the wrong direction to me.
> >
> > I rather think that we should go down the route of completely dropping
> > command submission and buffer allocation through the primary node for
> > non master clients. And then as next step at some point drop support for
> > authentication/flink.
> >
> > I mean we have done this with UMS as well and I don't see much other way
> > to move forward and get rid of those ancient interface in the long term.
> 
> Well kms had some really good benefits that drove quick adoption, like
> "suspend/resume actually has a chance of working" or "comes with
> buffer management so you can run multiple gears".
> 
> The render node thing is a lot more niche use case (prime, better priv
> separation), plus "it's cleaner design". And the "cleaner design" part
> is something that empirically doesn't seem to matter :-/ Just two
> examples:
> - KHR_display/leases just iterated display resources on the fd needed
> for rendering (and iirc there was even a patch to expose that for
> render nodes too so it works with DRI3), because implementing
> protocols is too hard. Barely managed to stop that one before it
> happened.
> - Various video players use the vblank ioctl on directly to schedule
> frames, without telling the compositor. I discovered that when I
> wanted to limite the vblank ioctl to master clients only. Again,
> apparently too hard to use the existing extensions, or fix the bugs in
> there, or whatever. One userspace got fixed last year, but it'll
> probably get copypasted around forever :-/
> 
> So I don't think we'll ever manage to roll a clean split out, and best
> we can do is give in and just hand userspace what it wants. As much as
> that's misguided and unclean and all that. Maybe it'll result in a
> least fewer stuff getting run as root to hack around this, because
> fixing properly seems not to be on the table.
> 
> The beauty of kms is that we've achieved the mission, everyone's
> writing their own thing. Which is also terrible, and I don't think
> it'll get better.


With the risk of coming rude I will repeat my earlier comment:

The problem is _neither_ Intel nor libva specific.



That said, let's step back for a moment and consider:

 - the "block everything but KMS via the primary node" idea is great but
orthogonal

 - the series does address issues that are vendor-agnostic

 - by default this series does _not_ cause any regression be that for
new or old userspace

 - there 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Daniel Vetter
On Tue, May 28, 2019 at 10:03 AM Koenig, Christian
 wrote:
>
> Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> > [SNIP]
> >> Might be a good idea looking into reverting it partially, so that at
> >> least command submission and buffer allocation is still blocked.
> > I thought the issue is a lot more than vainfo, it's pretty much every
> > hacked up compositor under the sun getting this wrong one way or
> > another. Thinking about this some more, I also have no idea how you'd
> > want to deprecate rendering on primary nodes in general. Apparently
> > that breaks -modesetting already, and probably lots more compositors.
> > And it looks like we're finally achieve the goal kms set out to 10
> > years ago, and new compositors are sprouting up all the time. I guess
> > we could just break them all (on new hardware) and tell them to all
> > suck it up. But I don't think that's a great option. And just
> > deprecating this on amdgpu is going to be even harder, since then
> > everywhere else it'll keep working, and it's just amdgpu.ko that looks
> > broken.
> >
> > Aside: I'm not supporting Emil's idea here because it fixes any issues
> > Intel has - Intel doesn't care. I support it because reality sucks,
> > people get this render vs. primary vs. multi-gpu prime wrong all the
> > time (that's also why we have hardcoded display+gpu pairs in mesa for
> > the various soc combinations out there), and this looks like a
> > pragmatic solution. It'd be nice if every compositor and everything
> > else would perfectly support multi gpu and only use render nodes for
> > rendering, and only primary nodes for display. But reality is that
> > people hack on stuff until gears on screen and then move on to more
> > interesting things (to them). So I don't think we'll ever win this :-/
>
> Yeah, but this is a classic case of working around user space issues by
> making kernel changes instead of fixing user space.
>
> Having privileged (output control) and unprivileged (rendering control)
> functionality behind the same node is a mistake we have made a long time
> ago and render nodes finally seemed to be a way to fix that.
>
> I mean why are compositors using the primary node in the first place?
> Because they want to have access to privileged resources I think and in
> this case it is perfectly ok to do so.
>
> Now extending unprivileged access to the primary node actually sounds
> like a step into the wrong direction to me.
>
> I rather think that we should go down the route of completely dropping
> command submission and buffer allocation through the primary node for
> non master clients. And then as next step at some point drop support for
> authentication/flink.
>
> I mean we have done this with UMS as well and I don't see much other way
> to move forward and get rid of those ancient interface in the long term.

Well kms had some really good benefits that drove quick adoption, like
"suspend/resume actually has a chance of working" or "comes with
buffer management so you can run multiple gears".

The render node thing is a lot more niche use case (prime, better priv
separation), plus "it's cleaner design". And the "cleaner design" part
is something that empirically doesn't seem to matter :-/ Just two
examples:
- KHR_display/leases just iterated display resources on the fd needed
for rendering (and iirc there was even a patch to expose that for
render nodes too so it works with DRI3), because implementing
protocols is too hard. Barely managed to stop that one before it
happened.
- Various video players use the vblank ioctl on directly to schedule
frames, without telling the compositor. I discovered that when I
wanted to limite the vblank ioctl to master clients only. Again,
apparently too hard to use the existing extensions, or fix the bugs in
there, or whatever. One userspace got fixed last year, but it'll
probably get copypasted around forever :-/

So I don't think we'll ever manage to roll a clean split out, and best
we can do is give in and just hand userspace what it wants. As much as
that's misguided and unclean and all that. Maybe it'll result in a
least fewer stuff getting run as root to hack around this, because
fixing properly seems not to be on the table.

The beauty of kms is that we've achieved the mission, everyone's
writing their own thing. Which is also terrible, and I don't think
it'll get better.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Koenig, Christian
Am 28.05.19 um 09:38 schrieb Daniel Vetter:
> [SNIP]
>> Might be a good idea looking into reverting it partially, so that at
>> least command submission and buffer allocation is still blocked.
> I thought the issue is a lot more than vainfo, it's pretty much every
> hacked up compositor under the sun getting this wrong one way or
> another. Thinking about this some more, I also have no idea how you'd
> want to deprecate rendering on primary nodes in general. Apparently
> that breaks -modesetting already, and probably lots more compositors.
> And it looks like we're finally achieve the goal kms set out to 10
> years ago, and new compositors are sprouting up all the time. I guess
> we could just break them all (on new hardware) and tell them to all
> suck it up. But I don't think that's a great option. And just
> deprecating this on amdgpu is going to be even harder, since then
> everywhere else it'll keep working, and it's just amdgpu.ko that looks
> broken.
>
> Aside: I'm not supporting Emil's idea here because it fixes any issues
> Intel has - Intel doesn't care. I support it because reality sucks,
> people get this render vs. primary vs. multi-gpu prime wrong all the
> time (that's also why we have hardcoded display+gpu pairs in mesa for
> the various soc combinations out there), and this looks like a
> pragmatic solution. It'd be nice if every compositor and everything
> else would perfectly support multi gpu and only use render nodes for
> rendering, and only primary nodes for display. But reality is that
> people hack on stuff until gears on screen and then move on to more
> interesting things (to them). So I don't think we'll ever win this :-/

Yeah, but this is a classic case of working around user space issues by 
making kernel changes instead of fixing user space.

Having privileged (output control) and unprivileged (rendering control) 
functionality behind the same node is a mistake we have made a long time 
ago and render nodes finally seemed to be a way to fix that.

I mean why are compositors using the primary node in the first place? 
Because they want to have access to privileged resources I think and in 
this case it is perfectly ok to do so.

Now extending unprivileged access to the primary node actually sounds 
like a step into the wrong direction to me.

I rather think that we should go down the route of completely dropping 
command submission and buffer allocation through the primary node for 
non master clients. And then as next step at some point drop support for 
authentication/flink.

I mean we have done this with UMS as well and I don't see much other way 
to move forward and get rid of those ancient interface in the long term.

Regards,
Christian.

> -Daniel

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Daniel Vetter
On Tue, May 28, 2019 at 8:58 AM Koenig, Christian
 wrote:
>
> Am 27.05.19 um 17:26 schrieb Daniel Vetter:
> > On Mon, May 27, 2019 at 3:42 PM Koenig, Christian
> >  wrote:
> >> Am 27.05.19 um 15:26 schrieb Emil Velikov:
> >>> On 2019/05/27, Daniel Vetter wrote:
>  On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
> > Am 27.05.19 um 10:17 schrieb Emil Velikov:
> >> From: Emil Velikov 
> >>
> >> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via 
> >> the
> >> render node. A seemingly deliberate design decision.
> >>
> >> Hence we can drop the DRM_AUTH all together (details in follow-up 
> >> patch)
> >> yet not all userspace checks if it's authenticated, but instead uses
> >> uncommon assumptions.
> >>
> >> After days of digging through git log and testing, only a single 
> >> (ab)use
> >> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> >> assuming that failure implies lack of authentication.
> >>
> >> Affected versions are:
> >> - the whole 18.2.x series, which is EOL
> >> - the whole 18.3.x series, which is EOL
> >> - the 19.0.x series, prior to 19.0.4
> > Well you could have saved your time, cause this is still a NAK.
> >
> > For the record: I strongly think that we don't want to expose any render
> > functionality on the primary node.
> >
> > To even go a step further I would say that at least on AMD hardware we
> > should completely disable DRI2 for one of the next generations.
> >
> > As a follow up I would then completely disallow the DRM authentication
> > for amdgpu, so that the command submission interface on the primary node
> > can only be used by the display server.
>  So amdgpu is running in one direction, while everyone else is running in
>  the other direction? Please note that your patch to change i915 landed
>  already, so that ship is sailing (but we could ofc revert that back
>  again).
> 
>  Imo really not a great idea if we do a amdgpu vs. everyone else split
>  here. If we want to deprecate dri2/flink/rendering on primary nodes 
>  across
>  the stack, then that should be a stack wide decision, not an amdgpu one.
> 
> >>> Cannot agree more - I would love to see drivers stay consistent.
> >> Yeah, completely agree to that. That's why I think we should not do this
> >> at all and just let Intel fix it's userspace bugs :P
> > So you're planning to submit that revert? You did jump the gun with
> > sending out that patch after all too ... (aside from it got merged
> > because of some other mixup with r-b tags and what not).
>
> Well already regretting submitting that. On the other hand what is the
> minimum IOCTLs we need to get working to fix the vainfo issue?

We have a bit more time, it's only going to be in 5.3. But yeah if we
don't bottom out on any of the options here I think revert it has to
be.

> Might be a good idea looking into reverting it partially, so that at
> least command submission and buffer allocation is still blocked.

I thought the issue is a lot more than vainfo, it's pretty much every
hacked up compositor under the sun getting this wrong one way or
another. Thinking about this some more, I also have no idea how you'd
want to deprecate rendering on primary nodes in general. Apparently
that breaks -modesetting already, and probably lots more compositors.
And it looks like we're finally achieve the goal kms set out to 10
years ago, and new compositors are sprouting up all the time. I guess
we could just break them all (on new hardware) and tell them to all
suck it up. But I don't think that's a great option. And just
deprecating this on amdgpu is going to be even harder, since then
everywhere else it'll keep working, and it's just amdgpu.ko that looks
broken.

Aside: I'm not supporting Emil's idea here because it fixes any issues
Intel has - Intel doesn't care. I support it because reality sucks,
people get this render vs. primary vs. multi-gpu prime wrong all the
time (that's also why we have hardcoded display+gpu pairs in mesa for
the various soc combinations out there), and this looks like a
pragmatic solution. It'd be nice if every compositor and everything
else would perfectly support multi gpu and only use render nodes for
rendering, and only primary nodes for display. But reality is that
people hack on stuff until gears on screen and then move on to more
interesting things (to them). So I don't think we'll ever win this :-/
-Daniel

> Christian.
>
> > -Daniel
> >
> >> Anyway my concern is really that we should stop extending functionality
> >> on the primary node.
> >>
> >> E.g. the render node is for use by the clients and the primary node for
> >> mode setting and use by the display server only.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-)
> >>>
> >>> 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-28 Thread Koenig, Christian
Am 27.05.19 um 17:26 schrieb Daniel Vetter:
> On Mon, May 27, 2019 at 3:42 PM Koenig, Christian
>  wrote:
>> Am 27.05.19 um 15:26 schrieb Emil Velikov:
>>> On 2019/05/27, Daniel Vetter wrote:
 On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
> Am 27.05.19 um 10:17 schrieb Emil Velikov:
>> From: Emil Velikov 
>>
>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
>> render node. A seemingly deliberate design decision.
>>
>> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
>> yet not all userspace checks if it's authenticated, but instead uses
>> uncommon assumptions.
>>
>> After days of digging through git log and testing, only a single (ab)use
>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
>> assuming that failure implies lack of authentication.
>>
>> Affected versions are:
>> - the whole 18.2.x series, which is EOL
>> - the whole 18.3.x series, which is EOL
>> - the 19.0.x series, prior to 19.0.4
> Well you could have saved your time, cause this is still a NAK.
>
> For the record: I strongly think that we don't want to expose any render
> functionality on the primary node.
>
> To even go a step further I would say that at least on AMD hardware we
> should completely disable DRI2 for one of the next generations.
>
> As a follow up I would then completely disallow the DRM authentication
> for amdgpu, so that the command submission interface on the primary node
> can only be used by the display server.
 So amdgpu is running in one direction, while everyone else is running in
 the other direction? Please note that your patch to change i915 landed
 already, so that ship is sailing (but we could ofc revert that back
 again).

 Imo really not a great idea if we do a amdgpu vs. everyone else split
 here. If we want to deprecate dri2/flink/rendering on primary nodes across
 the stack, then that should be a stack wide decision, not an amdgpu one.

>>> Cannot agree more - I would love to see drivers stay consistent.
>> Yeah, completely agree to that. That's why I think we should not do this
>> at all and just let Intel fix it's userspace bugs :P
> So you're planning to submit that revert? You did jump the gun with
> sending out that patch after all too ... (aside from it got merged
> because of some other mixup with r-b tags and what not).

Well already regretting submitting that. On the other hand what is the 
minimum IOCTLs we need to get working to fix the vainfo issue?

Might be a good idea looking into reverting it partially, so that at 
least command submission and buffer allocation is still blocked.

Christian.

> -Daniel
>
>> Anyway my concern is really that we should stop extending functionality
>> on the primary node.
>>
>> E.g. the render node is for use by the clients and the primary node for
>> mode setting and use by the display server only.
>>
>> Regards,
>> Christian.
>>
>>> Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-)
>>>
>>> Thanks
>>> Emil
>

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Emil Velikov
On 2019/05/27, Koenig, Christian wrote:
> Am 27.05.19 um 15:26 schrieb Emil Velikov:
> > On 2019/05/27, Daniel Vetter wrote:
> >> On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
> >>> Am 27.05.19 um 10:17 schrieb Emil Velikov:
>  From: Emil Velikov 
> 
>  Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
>  render node. A seemingly deliberate design decision.
> 
>  Hence we can drop the DRM_AUTH all together (details in follow-up patch)
>  yet not all userspace checks if it's authenticated, but instead uses
>  uncommon assumptions.
> 
>  After days of digging through git log and testing, only a single (ab)use
>  was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
>  assuming that failure implies lack of authentication.
> 
>  Affected versions are:
> - the whole 18.2.x series, which is EOL
> - the whole 18.3.x series, which is EOL
> - the 19.0.x series, prior to 19.0.4
> >>> Well you could have saved your time, cause this is still a NAK.
> >>>
> >>> For the record: I strongly think that we don't want to expose any render
> >>> functionality on the primary node.
> >>>
> >>> To even go a step further I would say that at least on AMD hardware we
> >>> should completely disable DRI2 for one of the next generations.
> >>>
> >>> As a follow up I would then completely disallow the DRM authentication
> >>> for amdgpu, so that the command submission interface on the primary node
> >>> can only be used by the display server.
> >> So amdgpu is running in one direction, while everyone else is running in
> >> the other direction? Please note that your patch to change i915 landed
> >> already, so that ship is sailing (but we could ofc revert that back
> >> again).
> >>
> >> Imo really not a great idea if we do a amdgpu vs. everyone else split
> >> here. If we want to deprecate dri2/flink/rendering on primary nodes across
> >> the stack, then that should be a stack wide decision, not an amdgpu one.
> >>
> > Cannot agree more - I would love to see drivers stay consistent.
> 
> Yeah, completely agree to that. That's why I think we should not do this 
> at all and just let Intel fix it's userspace bugs :P
> 
Pretty sure I mentioned it before - might have been too subtle:

The problem is _neither_ Intel nor libva specific.


> Anyway my concern is really that we should stop extending functionality 
> on the primary node.
> 
> E.g. the render node is for use by the clients and the primary node for 
> mode setting and use by the display server only.
> 
Fully agreed. I'm not extending anything really. If anything - code is
removed from drivers and core :-)

Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Daniel Vetter
On Mon, May 27, 2019 at 3:42 PM Koenig, Christian
 wrote:
>
> Am 27.05.19 um 15:26 schrieb Emil Velikov:
> > On 2019/05/27, Daniel Vetter wrote:
> >> On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
> >>> Am 27.05.19 um 10:17 schrieb Emil Velikov:
>  From: Emil Velikov 
> 
>  Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
>  render node. A seemingly deliberate design decision.
> 
>  Hence we can drop the DRM_AUTH all together (details in follow-up patch)
>  yet not all userspace checks if it's authenticated, but instead uses
>  uncommon assumptions.
> 
>  After days of digging through git log and testing, only a single (ab)use
>  was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
>  assuming that failure implies lack of authentication.
> 
>  Affected versions are:
> - the whole 18.2.x series, which is EOL
> - the whole 18.3.x series, which is EOL
> - the 19.0.x series, prior to 19.0.4
> >>> Well you could have saved your time, cause this is still a NAK.
> >>>
> >>> For the record: I strongly think that we don't want to expose any render
> >>> functionality on the primary node.
> >>>
> >>> To even go a step further I would say that at least on AMD hardware we
> >>> should completely disable DRI2 for one of the next generations.
> >>>
> >>> As a follow up I would then completely disallow the DRM authentication
> >>> for amdgpu, so that the command submission interface on the primary node
> >>> can only be used by the display server.
> >> So amdgpu is running in one direction, while everyone else is running in
> >> the other direction? Please note that your patch to change i915 landed
> >> already, so that ship is sailing (but we could ofc revert that back
> >> again).
> >>
> >> Imo really not a great idea if we do a amdgpu vs. everyone else split
> >> here. If we want to deprecate dri2/flink/rendering on primary nodes across
> >> the stack, then that should be a stack wide decision, not an amdgpu one.
> >>
> > Cannot agree more - I would love to see drivers stay consistent.
>
> Yeah, completely agree to that. That's why I think we should not do this
> at all and just let Intel fix it's userspace bugs :P

So you're planning to submit that revert? You did jump the gun with
sending out that patch after all too ... (aside from it got merged
because of some other mixup with r-b tags and what not).
-Daniel

> Anyway my concern is really that we should stop extending functionality
> on the primary node.
>
> E.g. the render node is for use by the clients and the primary node for
> mode setting and use by the display server only.
>
> Regards,
> Christian.
>
> > Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-)
> >
> > Thanks
> > Emil
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Emil Velikov
On 2019/05/27, Daniel Vetter wrote:
> On Mon, May 27, 2019 at 09:17:29AM +0100, Emil Velikov wrote:
> > From: Emil Velikov 
> > 
> > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> > render node. A seemingly deliberate design decision.
> > 
> > Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> > yet not all userspace checks if it's authenticated, but instead uses
> > uncommon assumptions.
> > 
> > After days of digging through git log and testing, only a single (ab)use
> > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> > assuming that failure implies lack of authentication.
> 
> Maybe correction here: Id does not care about authentication at all. It
> wants to figure out whether it has access to modeset resources, which is
> something entirely different, but happened to match in amdgpu's case.
> 
It feels like we have conflated the two (perhaps due to historical
reasons) and I'm not 100% sure which one the AMDGPU code inspects.

How about:

Hence we can drop the DRM_AUTH all together (details in follow-up patch)
yet that cause a regression in some userspace, when it conflates the
authentication status with access to modeset resources.

After days of digging through git log and testing, only a single (ab)use
was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl.

> > Affected versions are:
> >  - the whole 18.2.x series, which is EOL
> >  - the whole 18.3.x series, which is EOL
> >  - the 19.0.x series, prior to 19.0.4
> 
> Hm I think I'm blind, I'm not seeing the fix merged to mesa master. Still
> there on gitlab:
> 
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/vulkan/radv_device.c#L291
> 
> What am I missing?
> 
Opted for a simple, more generic solution see commit
c962a78f18284e2971301bf68c9c60500ca398e4

This way, the same check is:
 - enforced where it's used
 - present for all Mesa Vulkan drivers


Will include the sha + commit title for v2.

> > Add a special quirk for that case, thus we can drop DRM_AUTH bits as
> > mentioned earlier.
> > 
> > Since all the affected userspace is EOL, we also add a kconfig option
> > to disable this quirk.
> > 
> > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT
> > 
> > Cc: Alex Deucher 
> > Cc: Christian König 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Signed-off-by: Emil Velikov 
> 
> Aside from the nits I think reasonable approach.

Great, glad to hear. Now all we need is to bribe^Wconvince Christian.

Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Koenig, Christian
Am 27.05.19 um 15:26 schrieb Emil Velikov:
> On 2019/05/27, Daniel Vetter wrote:
>> On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
>>> Am 27.05.19 um 10:17 schrieb Emil Velikov:
 From: Emil Velikov 

 Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
 render node. A seemingly deliberate design decision.

 Hence we can drop the DRM_AUTH all together (details in follow-up patch)
 yet not all userspace checks if it's authenticated, but instead uses
 uncommon assumptions.

 After days of digging through git log and testing, only a single (ab)use
 was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
 assuming that failure implies lack of authentication.

 Affected versions are:
- the whole 18.2.x series, which is EOL
- the whole 18.3.x series, which is EOL
- the 19.0.x series, prior to 19.0.4
>>> Well you could have saved your time, cause this is still a NAK.
>>>
>>> For the record: I strongly think that we don't want to expose any render
>>> functionality on the primary node.
>>>
>>> To even go a step further I would say that at least on AMD hardware we
>>> should completely disable DRI2 for one of the next generations.
>>>
>>> As a follow up I would then completely disallow the DRM authentication
>>> for amdgpu, so that the command submission interface on the primary node
>>> can only be used by the display server.
>> So amdgpu is running in one direction, while everyone else is running in
>> the other direction? Please note that your patch to change i915 landed
>> already, so that ship is sailing (but we could ofc revert that back
>> again).
>>
>> Imo really not a great idea if we do a amdgpu vs. everyone else split
>> here. If we want to deprecate dri2/flink/rendering on primary nodes across
>> the stack, then that should be a stack wide decision, not an amdgpu one.
>>
> Cannot agree more - I would love to see drivers stay consistent.

Yeah, completely agree to that. That's why I think we should not do this 
at all and just let Intel fix it's userspace bugs :P

Anyway my concern is really that we should stop extending functionality 
on the primary node.

E.g. the render node is for use by the clients and the primary node for 
mode setting and use by the display server only.

Regards,
Christian.

> Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-)
>
> Thanks
> Emil

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Daniel Vetter
On Mon, May 27, 2019 at 3:26 PM Daniel Vetter  wrote:
>
> On Mon, May 27, 2019 at 01:52:05PM +0100, Emil Velikov wrote:
> > On 2019/05/27, Koenig, Christian wrote:
> > > Am 27.05.19 um 14:05 schrieb Emil Velikov:
> > > > On 2019/05/27, Koenig, Christian wrote:
> > > >> Am 27.05.19 um 10:17 schrieb Emil Velikov:
> > > >>> From: Emil Velikov 
> > > >>>
> > > >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via 
> > > >>> the
> > > >>> render node. A seemingly deliberate design decision.
> > > >>>
> > > >>> Hence we can drop the DRM_AUTH all together (details in follow-up 
> > > >>> patch)
> > > >>> yet not all userspace checks if it's authenticated, but instead uses
> > > >>> uncommon assumptions.
> > > >>>
> > > >>> After days of digging through git log and testing, only a eingle 
> > > >>> (ab)use
> > > >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> > > >>> assuming that failure implies lack of authentication.
> > > >>>
> > > >>> Affected versions are:
> > > >>>- the whole 18.2.x series, which is EOL
> > > >>>- the whole 18.3.x series, which is EOL
> > > >>>- the 19.0.x series, prior to 19.0.4
> > > >> Well you could have saved your time, cause this is still a NAK.
> > > >>
> > > > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed
> > > > a bug while I was there :-)
> > >
> > > Yeah, thanks for doing this.
> > >
> > > But we have done so much stuff with customers which can't be audited
> > > this way, that I still have a really bad feeling about this :/
> > >
> > Fair point, I've already thought about those.
> >
> > Example A:  customer X, using closed source/private software Y.
> > Ioctls A, B and C require the old behaviour - simply add FORCE_AUTH to
> > the A B C and carry on happily.
>
> Hm, if the entire concern here is all the bazillion different versions of
> blobs shipped in the past few years, why can't we just have a revert of
> this in the amdgpu DKMS? Not like one more patch among the hundres will
> matter. I'd suspect that the overlap of people wanting to run the blobs
> and people who don't run the DKMS or similar is roughly 0. Always been the
> case here at Intel at least.
>
> If there's other stuff that we need to audit (like rocm or whatever), then
> we should look into those ofc.

Also note that Emil's patch actually doesn't break anything, since
default y. So you don't even need the revert (except maybe in 10 years
or so when we throw that option out).
-Daniel


> > Example B: the team, as a whole thinks that this will be problematic for
> > customer X - add FORCE_AUTH to all ioctls and carry on.
> >
> > I do see and understand why anyone can be hesitant about the series.
> >
> > IMHO the above examples, illustrate quite reasonable compromise between
> > open-source and closed-source/private solutions.
> >
> > Don't you agree?
> >
> > > >> For the record: I strongly think that we don't want to expose any 
> > > >> render
> > > >> functionality on the primary node.
> > > >>
> > > >> To even go a step further I would say that at least on AMD hardware we
> > > >> should completely disable DRI2 for one of the next generations.
> > > >>
> > > > It's doable and overall pretty neat idea.
> > > >
> > > > There is a consern that this approach may cause far more regressions
> > > > that this series. We are talking about years worth of Mesa drivers (et
> > > > al) that depend on render functionality exposed via the primary node.
> > >
> > > Yeah, that's I'm perfectly aware of. It's the reason why I said we
> > > should only do it for new hardware generations.
> > >
> > > But at some point I think we should do this and get rid of
> > > authentication/flink/DRI2/
> > >
> > Fwiw I share a similar sentiment. If you've looked through the series,
> > this is effectively step 1, towards nuking DRM_AUTH :-)
> >
> > > > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to
> > > > follow-up with any regressions. Are you ok with that?
> > >
> > > As long as we have a check like adev->family_id >
> > > WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that.
> > >
> > > If I understood Michel correctly xf86-video-amdgpu should work, but
> > > modeset might break and needs a patch.
> > >
> > Unless I have concrete WHATEVER_IS_THE... I cannot do much here :-(
> >
> >
> > Thanks
> > Emil
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Emil Velikov
On 2019/05/27, Daniel Vetter wrote:
> On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
> > Am 27.05.19 um 10:17 schrieb Emil Velikov:
> > > From: Emil Velikov 
> > >
> > > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> > > render node. A seemingly deliberate design decision.
> > >
> > > Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> > > yet not all userspace checks if it's authenticated, but instead uses
> > > uncommon assumptions.
> > >
> > > After days of digging through git log and testing, only a single (ab)use
> > > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> > > assuming that failure implies lack of authentication.
> > >
> > > Affected versions are:
> > >   - the whole 18.2.x series, which is EOL
> > >   - the whole 18.3.x series, which is EOL
> > >   - the 19.0.x series, prior to 19.0.4
> > 
> > Well you could have saved your time, cause this is still a NAK.
> > 
> > For the record: I strongly think that we don't want to expose any render 
> > functionality on the primary node.
> > 
> > To even go a step further I would say that at least on AMD hardware we 
> > should completely disable DRI2 for one of the next generations.
> > 
> > As a follow up I would then completely disallow the DRM authentication 
> > for amdgpu, so that the command submission interface on the primary node 
> > can only be used by the display server.
> 
> So amdgpu is running in one direction, while everyone else is running in
> the other direction? Please note that your patch to change i915 landed
> already, so that ship is sailing (but we could ofc revert that back
> again).
> 
> Imo really not a great idea if we do a amdgpu vs. everyone else split
> here. If we want to deprecate dri2/flink/rendering on primary nodes across
> the stack, then that should be a stack wide decision, not an amdgpu one.
> 
Cannot agree more - I would love to see drivers stay consistent.

Fwiw, this series consistently paves the way toward nuking DRM_AUTH ;-)

Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Daniel Vetter
On Mon, May 27, 2019 at 01:52:05PM +0100, Emil Velikov wrote:
> On 2019/05/27, Koenig, Christian wrote:
> > Am 27.05.19 um 14:05 schrieb Emil Velikov:
> > > On 2019/05/27, Koenig, Christian wrote:
> > >> Am 27.05.19 um 10:17 schrieb Emil Velikov:
> > >>> From: Emil Velikov 
> > >>>
> > >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> > >>> render node. A seemingly deliberate design decision.
> > >>>
> > >>> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> > >>> yet not all userspace checks if it's authenticated, but instead uses
> > >>> uncommon assumptions.
> > >>>
> > >>> After days of digging through git log and testing, only a eingle (ab)use
> > >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> > >>> assuming that failure implies lack of authentication.
> > >>>
> > >>> Affected versions are:
> > >>>- the whole 18.2.x series, which is EOL
> > >>>- the whole 18.3.x series, which is EOL
> > >>>- the 19.0.x series, prior to 19.0.4
> > >> Well you could have saved your time, cause this is still a NAK.
> > >>
> > > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed
> > > a bug while I was there :-)
> > 
> > Yeah, thanks for doing this.
> > 
> > But we have done so much stuff with customers which can't be audited 
> > this way, that I still have a really bad feeling about this :/
> > 
> Fair point, I've already thought about those.
> 
> Example A:  customer X, using closed source/private software Y.
> Ioctls A, B and C require the old behaviour - simply add FORCE_AUTH to
> the A B C and carry on happily.

Hm, if the entire concern here is all the bazillion different versions of
blobs shipped in the past few years, why can't we just have a revert of
this in the amdgpu DKMS? Not like one more patch among the hundres will
matter. I'd suspect that the overlap of people wanting to run the blobs
and people who don't run the DKMS or similar is roughly 0. Always been the
case here at Intel at least.

If there's other stuff that we need to audit (like rocm or whatever), then
we should look into those ofc.
-Daniel


> 
> Example B: the team, as a whole thinks that this will be problematic for
> customer X - add FORCE_AUTH to all ioctls and carry on.
> 
> I do see and understand why anyone can be hesitant about the series.
> 
> IMHO the above examples, illustrate quite reasonable compromise between
> open-source and closed-source/private solutions.
> 
> Don't you agree?
> 
> > >> For the record: I strongly think that we don't want to expose any render
> > >> functionality on the primary node.
> > >>
> > >> To even go a step further I would say that at least on AMD hardware we
> > >> should completely disable DRI2 for one of the next generations.
> > >>
> > > It's doable and overall pretty neat idea.
> > >
> > > There is a consern that this approach may cause far more regressions
> > > that this series. We are talking about years worth of Mesa drivers (et
> > > al) that depend on render functionality exposed via the primary node.
> > 
> > Yeah, that's I'm perfectly aware of. It's the reason why I said we 
> > should only do it for new hardware generations.
> > 
> > But at some point I think we should do this and get rid of 
> > authentication/flink/DRI2/
> > 
> Fwiw I share a similar sentiment. If you've looked through the series,
> this is effectively step 1, towards nuking DRM_AUTH :-)
> 
> > > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to
> > > follow-up with any regressions. Are you ok with that?
> > 
> > As long as we have a check like adev->family_id > 
> > WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that.
> > 
> > If I understood Michel correctly xf86-video-amdgpu should work, but 
> > modeset might break and needs a patch.
> > 
> Unless I have concrete WHATEVER_IS_THE... I cannot do much here :-(
> 
> 
> Thanks
> Emil

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Daniel Vetter
On Mon, May 27, 2019 at 10:47:39AM +, Koenig, Christian wrote:
> Am 27.05.19 um 10:17 schrieb Emil Velikov:
> > From: Emil Velikov 
> >
> > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> > render node. A seemingly deliberate design decision.
> >
> > Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> > yet not all userspace checks if it's authenticated, but instead uses
> > uncommon assumptions.
> >
> > After days of digging through git log and testing, only a single (ab)use
> > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> > assuming that failure implies lack of authentication.
> >
> > Affected versions are:
> >   - the whole 18.2.x series, which is EOL
> >   - the whole 18.3.x series, which is EOL
> >   - the 19.0.x series, prior to 19.0.4
> 
> Well you could have saved your time, cause this is still a NAK.
> 
> For the record: I strongly think that we don't want to expose any render 
> functionality on the primary node.
> 
> To even go a step further I would say that at least on AMD hardware we 
> should completely disable DRI2 for one of the next generations.
> 
> As a follow up I would then completely disallow the DRM authentication 
> for amdgpu, so that the command submission interface on the primary node 
> can only be used by the display server.

So amdgpu is running in one direction, while everyone else is running in
the other direction? Please note that your patch to change i915 landed
already, so that ship is sailing (but we could ofc revert that back
again).

Imo really not a great idea if we do a amdgpu vs. everyone else split
here. If we want to deprecate dri2/flink/rendering on primary nodes across
the stack, then that should be a stack wide decision, not an amdgpu one.

Same for the other one, i.e. this stuff here.
-Daniel

> 
> Regards,
> Christian.
> 
> >
> > Add a special quirk for that case, thus we can drop DRM_AUTH bits as
> > mentioned earlier.
> >
> > Since all the affected userspace is EOL, we also add a kconfig option
> > to disable this quirk.
> >
> > The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT
> >
> > Cc: Alex Deucher 
> > Cc: Christian König 
> > Cc: amd-...@lists.freedesktop.org
> > Cc: David Airlie 
> > Cc: Daniel Vetter 
> > Signed-off-by: Emil Velikov 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/Kconfig  | 16 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++-
> >   drivers/gpu/drm/drm_ioctl.c |  5 +
> >   include/drm/drm_ioctl.h | 17 +
> >   4 files changed, 49 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
> > b/drivers/gpu/drm/amd/amdgpu/Kconfig
> > index 9221e5489069..da415f445187 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> > +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> > @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS
> >   Selecting this option creates a debugfs file to inspect the mapped
> >   pages. Uses more memory for housekeeping, enable only for debugging.
> >   
> > +config DRM_AMDGPU_FORCE_AUTH
> > +   bool "Force authentication check on AMDGPU_INFO ioctl"
> > +   default y
> > +   help
> > + There were some version of the Mesa RADV drivers, which relied on
> > + the ioctl failing, if the client is not authenticated.
> > +
> > + Namely, the following versions are affected:
> > +   - the whole 18.2.x series, which is EOL
> > +   - the whole 18.3.x series, which is EOL
> > +   - the 19.0.x series, prior to 19.0.4
> > +
> > + Modern distributions, should disable this. That will allow various
> > + other clients to work, that would otherwise require root privileges.
> > +
> > +
> >   source "drivers/gpu/drm/amd/acp/Kconfig"
> >   source "drivers/gpu/drm/amd/display/Kconfig"
> >   source "drivers/gpu/drm/amd/amdkfd/Kconfig"
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > index b17d0545728e..b8076929440b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
> > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, 
> > DRM_AUTH|DRM_RENDER_ALLOW),
> > DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, 
> > DRM_AUTH|DRM_RENDER_ALLOW),
> > DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, 
> > DRM_AUTH|DRM_RENDER_ALLOW),
> > -   DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, 
> > DRM_AUTH|DRM_RENDER_ALLOW),
> > +   /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa 
> > driver.
> > +* This is required for Mesa:
> > +*  - the whole 18.2.x series, which is EOL
> > +*  - the whole 18.3.x series, which is EOL
> > +*  - the 19.0.x series, prior to 19.0.4
> > +*/
> > +   DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl,
> > +#if defined(DRM_AMDGPU_FORCE_AUTH)

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Daniel Vetter
On Mon, May 27, 2019 at 09:17:29AM +0100, Emil Velikov wrote:
> From: Emil Velikov 
> 
> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> render node. A seemingly deliberate design decision.
> 
> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> yet not all userspace checks if it's authenticated, but instead uses
> uncommon assumptions.
> 
> After days of digging through git log and testing, only a single (ab)use
> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> assuming that failure implies lack of authentication.

Maybe correction here: Id does not care about authentication at all. It
wants to figure out whether it has access to modeset resources, which is
something entirely different, but happened to match in amdgpu's case.

> Affected versions are:
>  - the whole 18.2.x series, which is EOL
>  - the whole 18.3.x series, which is EOL
>  - the 19.0.x series, prior to 19.0.4

Hm I think I'm blind, I'm not seeing the fix merged to mesa master. Still
there on gitlab:

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/vulkan/radv_device.c#L291

What am I missing?

> Add a special quirk for that case, thus we can drop DRM_AUTH bits as
> mentioned earlier.
> 
> Since all the affected userspace is EOL, we also add a kconfig option
> to disable this quirk.
> 
> The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT
> 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: amd-...@lists.freedesktop.org
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Signed-off-by: Emil Velikov 

Aside from the nits I think reasonable approach.
-Daniel

> ---
>  drivers/gpu/drm/amd/amdgpu/Kconfig  | 16 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++-
>  drivers/gpu/drm/drm_ioctl.c |  5 +
>  include/drm/drm_ioctl.h | 17 +
>  4 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
> b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 9221e5489069..da415f445187 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS
> Selecting this option creates a debugfs file to inspect the mapped
> pages. Uses more memory for housekeeping, enable only for debugging.
>  
> +config DRM_AMDGPU_FORCE_AUTH
> + bool "Force authentication check on AMDGPU_INFO ioctl"
> + default y
> + help
> +   There were some version of the Mesa RADV drivers, which relied on
> +   the ioctl failing, if the client is not authenticated.
> +
> +   Namely, the following versions are affected:
> + - the whole 18.2.x series, which is EOL
> + - the whole 18.3.x series, which is EOL
> + - the 19.0.x series, prior to 19.0.4
> +
> +   Modern distributions, should disable this. That will allow various
> +   other clients to work, that would otherwise require root privileges.
> +
> +
>  source "drivers/gpu/drm/amd/acp/Kconfig"
>  source "drivers/gpu/drm/amd/display/Kconfig"
>  source "drivers/gpu/drm/amd/amdkfd/Kconfig"
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index b17d0545728e..b8076929440b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>   DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> - DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> + /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa 
> driver.
> +  * This is required for Mesa:
> +  *  - the whole 18.2.x series, which is EOL
> +  *  - the whole 18.3.x series, which is EOL
> +  *  - the 19.0.x series, prior to 19.0.4
> +  */
> + DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl,
> +#if defined(DRM_AMDGPU_FORCE_AUTH)
> + DRM_FORCE_AUTH|
> +#endif
> + DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_CS, amdgpu_cs_wait_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_FENCES, amdgpu_cs_wait_fences_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_GEM_METADATA, amdgpu_gem_metadata_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 2263e3ddd822..9841c0076f02 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -544,6 +544,11 @@ int drm_ioctl_permit(u32 flags, struct drm_file 
> *file_priv)
>drm_is_render_client(file_priv)))
>   return -EACCES;
>  
> + 

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Emil Velikov
On 2019/05/27, Koenig, Christian wrote:
> Am 27.05.19 um 14:05 schrieb Emil Velikov:
> > On 2019/05/27, Koenig, Christian wrote:
> >> Am 27.05.19 um 10:17 schrieb Emil Velikov:
> >>> From: Emil Velikov 
> >>>
> >>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> >>> render node. A seemingly deliberate design decision.
> >>>
> >>> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> >>> yet not all userspace checks if it's authenticated, but instead uses
> >>> uncommon assumptions.
> >>>
> >>> After days of digging through git log and testing, only a eingle (ab)use
> >>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> >>> assuming that failure implies lack of authentication.
> >>>
> >>> Affected versions are:
> >>>- the whole 18.2.x series, which is EOL
> >>>- the whole 18.3.x series, which is EOL
> >>>- the 19.0.x series, prior to 19.0.4
> >> Well you could have saved your time, cause this is still a NAK.
> >>
> > Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed
> > a bug while I was there :-)
> 
> Yeah, thanks for doing this.
> 
> But we have done so much stuff with customers which can't be audited 
> this way, that I still have a really bad feeling about this :/
> 
Fair point, I've already thought about those.

Example A:  customer X, using closed source/private software Y.
Ioctls A, B and C require the old behaviour - simply add FORCE_AUTH to
the A B C and carry on happily.

Example B: the team, as a whole thinks that this will be problematic for
customer X - add FORCE_AUTH to all ioctls and carry on.

I do see and understand why anyone can be hesitant about the series.

IMHO the above examples, illustrate quite reasonable compromise between
open-source and closed-source/private solutions.

Don't you agree?

> >> For the record: I strongly think that we don't want to expose any render
> >> functionality on the primary node.
> >>
> >> To even go a step further I would say that at least on AMD hardware we
> >> should completely disable DRI2 for one of the next generations.
> >>
> > It's doable and overall pretty neat idea.
> >
> > There is a consern that this approach may cause far more regressions
> > that this series. We are talking about years worth of Mesa drivers (et
> > al) that depend on render functionality exposed via the primary node.
> 
> Yeah, that's I'm perfectly aware of. It's the reason why I said we 
> should only do it for new hardware generations.
> 
> But at some point I think we should do this and get rid of 
> authentication/flink/DRI2/
> 
Fwiw I share a similar sentiment. If you've looked through the series,
this is effectively step 1, towards nuking DRM_AUTH :-)

> > I'm OK with writing the patches, but it'll be up-to the AMDGPU team to
> > follow-up with any regressions. Are you ok with that?
> 
> As long as we have a check like adev->family_id > 
> WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that.
> 
> If I understood Michel correctly xf86-video-amdgpu should work, but 
> modeset might break and needs a patch.
> 
Unless I have concrete WHATEVER_IS_THE... I cannot do much here :-(


Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Koenig, Christian
Am 27.05.19 um 14:05 schrieb Emil Velikov:
> On 2019/05/27, Koenig, Christian wrote:
>> Am 27.05.19 um 10:17 schrieb Emil Velikov:
>>> From: Emil Velikov 
>>>
>>> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
>>> render node. A seemingly deliberate design decision.
>>>
>>> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
>>> yet not all userspace checks if it's authenticated, but instead uses
>>> uncommon assumptions.
>>>
>>> After days of digging through git log and testing, only a single (ab)use
>>> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
>>> assuming that failure implies lack of authentication.
>>>
>>> Affected versions are:
>>>- the whole 18.2.x series, which is EOL
>>>- the whole 18.3.x series, which is EOL
>>>- the 19.0.x series, prior to 19.0.4
>> Well you could have saved your time, cause this is still a NAK.
>>
> Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed
> a bug while I was there :-)

Yeah, thanks for doing this.

But we have done so much stuff with customers which can't be audited 
this way, that I still have a really bad feeling about this :/

>> For the record: I strongly think that we don't want to expose any render
>> functionality on the primary node.
>>
>> To even go a step further I would say that at least on AMD hardware we
>> should completely disable DRI2 for one of the next generations.
>>
> It's doable and overall pretty neat idea.
>
> There is a consern that this approach may cause far more regressions
> that this series. We are talking about years worth of Mesa drivers (et
> al) that depend on render functionality exposed via the primary node.

Yeah, that's I'm perfectly aware of. It's the reason why I said we 
should only do it for new hardware generations.

But at some point I think we should do this and get rid of 
authentication/flink/DRI2/

> I'm OK with writing the patches, but it'll be up-to the AMDGPU team to
> follow-up with any regressions. Are you ok with that?

As long as we have a check like adev->family_id > 
WHATEVER_IS_THE_CURRENT_LATEST_UPSTREAM_HW, I think we are fine with that.

If I understood Michel correctly xf86-video-amdgpu should work, but 
modeset might break and needs a patch.

> Fwiw I could also move the FORCE_AUTH hack to core drm, if you prefer.

Well, the hack is the least of my concerns.

Thanks,
Christian.

>
> Thanks
> Emil

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Emil Velikov
On 2019/05/27, Koenig, Christian wrote:
> Am 27.05.19 um 10:17 schrieb Emil Velikov:
> > From: Emil Velikov 
> >
> > Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> > render node. A seemingly deliberate design decision.
> >
> > Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> > yet not all userspace checks if it's authenticated, but instead uses
> > uncommon assumptions.
> >
> > After days of digging through git log and testing, only a single (ab)use
> > was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> > assuming that failure implies lack of authentication.
> >
> > Affected versions are:
> >   - the whole 18.2.x series, which is EOL
> >   - the whole 18.3.x series, which is EOL
> >   - the 19.0.x series, prior to 19.0.4
> 
> Well you could have saved your time, cause this is still a NAK.
> 
Did I mention that I've spent quite a bit of time in AMDVLK? Even fixed 
a bug while I was there :-)

> For the record: I strongly think that we don't want to expose any render 
> functionality on the primary node.
> 
> To even go a step further I would say that at least on AMD hardware we 
> should completely disable DRI2 for one of the next generations.
>
It's doable and overall pretty neat idea.

There is a consern that this approach may cause far more regressions
that this series. We are talking about years worth of Mesa drivers (et
al) that depend on render functionality exposed via the primary node.

I'm OK with writing the patches, but it'll be up-to the AMDGPU team to
follow-up with any regressions. Are you ok with that?

Fwiw I could also move the FORCE_AUTH hack to core drm, if you prefer.

Thanks
Emil
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 01/13] drm/amdgpu: introduce and honour DRM_FORCE_AUTH workaround

2019-05-27 Thread Koenig, Christian
Am 27.05.19 um 10:17 schrieb Emil Velikov:
> From: Emil Velikov 
>
> Currently one can circumvent DRM_AUTH, when the ioctl is exposed via the
> render node. A seemingly deliberate design decision.
>
> Hence we can drop the DRM_AUTH all together (details in follow-up patch)
> yet not all userspace checks if it's authenticated, but instead uses
> uncommon assumptions.
>
> After days of digging through git log and testing, only a single (ab)use
> was spotted - the Mesa RADV driver, using the AMDGPU_INFO ioctl and
> assuming that failure implies lack of authentication.
>
> Affected versions are:
>   - the whole 18.2.x series, which is EOL
>   - the whole 18.3.x series, which is EOL
>   - the 19.0.x series, prior to 19.0.4

Well you could have saved your time, cause this is still a NAK.

For the record: I strongly think that we don't want to expose any render 
functionality on the primary node.

To even go a step further I would say that at least on AMD hardware we 
should completely disable DRI2 for one of the next generations.

As a follow up I would then completely disallow the DRM authentication 
for amdgpu, so that the command submission interface on the primary node 
can only be used by the display server.

Regards,
Christian.

>
> Add a special quirk for that case, thus we can drop DRM_AUTH bits as
> mentioned earlier.
>
> Since all the affected userspace is EOL, we also add a kconfig option
> to disable this quirk.
>
> The whole approach is inspired by DRIVER_KMS_LEGACY_CONTEXT
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: amd-...@lists.freedesktop.org
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Signed-off-by: Emil Velikov 
> ---
>   drivers/gpu/drm/amd/amdgpu/Kconfig  | 16 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 12 +++-
>   drivers/gpu/drm/drm_ioctl.c |  5 +
>   include/drm/drm_ioctl.h | 17 +
>   4 files changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
> b/drivers/gpu/drm/amd/amdgpu/Kconfig
> index 9221e5489069..da415f445187 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
> +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
> @@ -40,6 +40,22 @@ config DRM_AMDGPU_GART_DEBUGFS
> Selecting this option creates a debugfs file to inspect the mapped
> pages. Uses more memory for housekeeping, enable only for debugging.
>   
> +config DRM_AMDGPU_FORCE_AUTH
> + bool "Force authentication check on AMDGPU_INFO ioctl"
> + default y
> + help
> +   There were some version of the Mesa RADV drivers, which relied on
> +   the ioctl failing, if the client is not authenticated.
> +
> +   Namely, the following versions are affected:
> + - the whole 18.2.x series, which is EOL
> + - the whole 18.3.x series, which is EOL
> + - the 19.0.x series, prior to 19.0.4
> +
> +   Modern distributions, should disable this. That will allow various
> +   other clients to work, that would otherwise require root privileges.
> +
> +
>   source "drivers/gpu/drm/amd/acp/Kconfig"
>   source "drivers/gpu/drm/amd/display/Kconfig"
>   source "drivers/gpu/drm/amd/amdkfd/Kconfig"
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index b17d0545728e..b8076929440b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1214,7 +1214,17 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
>   DRM_IOCTL_DEF_DRV(AMDGPU_GEM_MMAP, amdgpu_gem_mmap_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_GEM_WAIT_IDLE, amdgpu_gem_wait_idle_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_CS, amdgpu_cs_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> - DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> + /* The DRM_FORCE_AUTH is effectively a workaround for the RADV Mesa 
> driver.
> +  * This is required for Mesa:
> +  *  - the whole 18.2.x series, which is EOL
> +  *  - the whole 18.3.x series, which is EOL
> +  *  - the 19.0.x series, prior to 19.0.4
> +  */
> + DRM_IOCTL_DEF_DRV(AMDGPU_INFO, amdgpu_info_ioctl,
> +#if defined(DRM_AMDGPU_FORCE_AUTH)
> + DRM_FORCE_AUTH|
> +#endif
> + DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_CS, amdgpu_cs_wait_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_WAIT_FENCES, amdgpu_cs_wait_fences_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
>   DRM_IOCTL_DEF_DRV(AMDGPU_GEM_METADATA, amdgpu_gem_metadata_ioctl, 
> DRM_AUTH|DRM_RENDER_ALLOW),
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 2263e3ddd822..9841c0076f02 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -544,6 +544,11 @@ int drm_ioctl_permit(u32 flags, struct drm_file 
> *file_priv)
>drm_is_render_client(file_priv)))
>