[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-05-05 Thread Daniel Vetter
On Tue, May 05, 2015 at 06:37:51AM +, Antoine, Peter wrote:
> On Mon, 2015-05-04 at 15:52 +0200, Daniel Vetter wrote:
> > On Tue, Apr 28, 2015 at 10:52:32AM +0100, chris at chris-wilson.co.uk wrote:
> > > On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote:
> > > > On 24/04/15 06:52, Antoine, Peter wrote:
> > > > > I picked up this work due to the following Jira ticket created by the
> > > > > security team (on Android) and was asked to give it a second look and
> > > > > found a few more issues with the hw lock code.
> > > > > 
> > > > > https://jira01.devtools.intel.com/browse/GMINL-5388
> > > > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
> > > > > 
> > > > > It also stops Linux as it kills the driver, I guess it might be 
> > > > > possible
> > > > > to reload the gfx driver. On a unpatched system the test that is
> > > > > included in the issue or the igt test that has been posted for the 
> > > > > issue
> > > > > will show the problem.
> > > > > 
> > > > > I ran the test on an unpatched system here and the gui stopped and the
> > > > > keyboard stopped responding, so I rebooted. With the patched system I
> > > > > did not need to reboot.
> > > > > 
> > > > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but
> > > > > tooling is better at handling a segfault than a SIGTERM and the
> > > > > application that calls this IOCTL is using an uninitialised hw lock so
> > > > > it is kind of the same as differencing an uninitialised pointer (kind
> > > > > of). Or, I could just remove it, but the bug has been in the code for 
> > > > > at
> > > > > least two years (and known about), and I would guess that any code 
> > > > > that
> > > > > is calling this is fuzzing the IOCTLs (as this is how the security 
> > > > > team
> > > > > found it) and we should reward them with a application exit.
> > > > > 
> > > > > Peter. 
> > > > 
> > > > SIGSEGV would be a better choice.
> > > > 
> > > > SIGTERM is normally sent by a user -- it's the default signal sent by
> > > > kill(1). It's also commonly used to tell a long-running daemon process
> > > > to tidy up and exit cleanly.
> > > > 
> > > > SIGSEGV commonly means "you accessed something that doesn't exist/isn't
> > > > mapped/you don't have permissions for". There are specific subcases that
> > > > can be indicated via the siginfo data; this is from the sigaction(1)
> > > > manpage:
> > > > 
> > > > The following values can be placed in si_code for a SIGSEGV signal:
> > > > 
> > > > SEGV_MAPERRaddress not mapped to object
> > > > 
> > > > SEGV_ACCERRinvalid permissions for mapped object
> > > > 
> > > > SIGBUS would also be a possibility but that's generally taken to mean
> > > > that an access got all the way to some physical bus and then faulted,
> > > > whereas SIGSEGV suggests the access was rejected during the
> > > > virtual-to-physical mapping process.
> > > 
> > > None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate.
> > 
> > Seconded, we really don't want to be in the business of fixing up the drm
> > design mistakes of the past 15 years. As long as we can fully lock out
> > this particular dragon when running i915 we're imo good enough. The dri1
> > design of a kernel shim driver cooperating with the ums driver for hw
> > ownership is fundamentally unfixable.
> > 
> > Also we can't change any of it for drivers actually using it since it'll
> > break them, which is a big no-go.
> > -Daniel
> 
> I will remove it. But, If you are using this code path the driver/kernel
> will have crashed. It covers a NULL pointer deference, so we are not
> changing the API that anyone is actually using.

With ums/dri1 userspace can crash the kernel. That's pretty much by
design, and it's pretty much unfixable. Trying to fix up the obvious ones
like here is just cargo-culting. The only real option is to make
absolutely sure we can't ever reach these codepaths at all from kms
drivers like i915. Which tends to be a lot more work for auditing, but
will be much more useful.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-05-05 Thread Antoine, Peter
On Mon, 2015-05-04 at 15:52 +0200, Daniel Vetter wrote:
> On Tue, Apr 28, 2015 at 10:52:32AM +0100, chris at chris-wilson.co.uk wrote:
> > On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote:
> > > On 24/04/15 06:52, Antoine, Peter wrote:
> > > > I picked up this work due to the following Jira ticket created by the
> > > > security team (on Android) and was asked to give it a second look and
> > > > found a few more issues with the hw lock code.
> > > > 
> > > > https://jira01.devtools.intel.com/browse/GMINL-5388
> > > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
> > > > 
> > > > It also stops Linux as it kills the driver, I guess it might be possible
> > > > to reload the gfx driver. On a unpatched system the test that is
> > > > included in the issue or the igt test that has been posted for the issue
> > > > will show the problem.
> > > > 
> > > > I ran the test on an unpatched system here and the gui stopped and the
> > > > keyboard stopped responding, so I rebooted. With the patched system I
> > > > did not need to reboot.
> > > > 
> > > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but
> > > > tooling is better at handling a segfault than a SIGTERM and the
> > > > application that calls this IOCTL is using an uninitialised hw lock so
> > > > it is kind of the same as differencing an uninitialised pointer (kind
> > > > of). Or, I could just remove it, but the bug has been in the code for at
> > > > least two years (and known about), and I would guess that any code that
> > > > is calling this is fuzzing the IOCTLs (as this is how the security team
> > > > found it) and we should reward them with a application exit.
> > > > 
> > > > Peter. 
> > > 
> > > SIGSEGV would be a better choice.
> > > 
> > > SIGTERM is normally sent by a user -- it's the default signal sent by
> > > kill(1). It's also commonly used to tell a long-running daemon process
> > > to tidy up and exit cleanly.
> > > 
> > > SIGSEGV commonly means "you accessed something that doesn't exist/isn't
> > > mapped/you don't have permissions for". There are specific subcases that
> > > can be indicated via the siginfo data; this is from the sigaction(1)
> > > manpage:
> > > 
> > > The following values can be placed in si_code for a SIGSEGV signal:
> > > 
> > > SEGV_MAPERRaddress not mapped to object
> > > 
> > > SEGV_ACCERRinvalid permissions for mapped object
> > > 
> > > SIGBUS would also be a possibility but that's generally taken to mean
> > > that an access got all the way to some physical bus and then faulted,
> > > whereas SIGSEGV suggests the access was rejected during the
> > > virtual-to-physical mapping process.
> > 
> > None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate.
> 
> Seconded, we really don't want to be in the business of fixing up the drm
> design mistakes of the past 15 years. As long as we can fully lock out
> this particular dragon when running i915 we're imo good enough. The dri1
> design of a kernel shim driver cooperating with the ums driver for hw
> ownership is fundamentally unfixable.
> 
> Also we can't change any of it for drivers actually using it since it'll
> break them, which is a big no-go.
> -Daniel

I will remove it. But, If you are using this code path the driver/kernel
will have crashed. It covers a NULL pointer deference, so we are not
changing the API that anyone is actually using.

Peter. 



[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-05-04 Thread Daniel Vetter
On Tue, Apr 28, 2015 at 10:52:32AM +0100, chris at chris-wilson.co.uk wrote:
> On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote:
> > On 24/04/15 06:52, Antoine, Peter wrote:
> > > I picked up this work due to the following Jira ticket created by the
> > > security team (on Android) and was asked to give it a second look and
> > > found a few more issues with the hw lock code.
> > > 
> > > https://jira01.devtools.intel.com/browse/GMINL-5388
> > > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
> > > 
> > > It also stops Linux as it kills the driver, I guess it might be possible
> > > to reload the gfx driver. On a unpatched system the test that is
> > > included in the issue or the igt test that has been posted for the issue
> > > will show the problem.
> > > 
> > > I ran the test on an unpatched system here and the gui stopped and the
> > > keyboard stopped responding, so I rebooted. With the patched system I
> > > did not need to reboot.
> > > 
> > > Should I change the SIGTERM to SIGSEGV, not quite the same thing but
> > > tooling is better at handling a segfault than a SIGTERM and the
> > > application that calls this IOCTL is using an uninitialised hw lock so
> > > it is kind of the same as differencing an uninitialised pointer (kind
> > > of). Or, I could just remove it, but the bug has been in the code for at
> > > least two years (and known about), and I would guess that any code that
> > > is calling this is fuzzing the IOCTLs (as this is how the security team
> > > found it) and we should reward them with a application exit.
> > > 
> > > Peter. 
> > 
> > SIGSEGV would be a better choice.
> > 
> > SIGTERM is normally sent by a user -- it's the default signal sent by
> > kill(1). It's also commonly used to tell a long-running daemon process
> > to tidy up and exit cleanly.
> > 
> > SIGSEGV commonly means "you accessed something that doesn't exist/isn't
> > mapped/you don't have permissions for". There are specific subcases that
> > can be indicated via the siginfo data; this is from the sigaction(1)
> > manpage:
> > 
> > The following values can be placed in si_code for a SIGSEGV signal:
> > 
> > SEGV_MAPERRaddress not mapped to object
> > 
> > SEGV_ACCERRinvalid permissions for mapped object
> > 
> > SIGBUS would also be a possibility but that's generally taken to mean
> > that an access got all the way to some physical bus and then faulted,
> > whereas SIGSEGV suggests the access was rejected during the
> > virtual-to-physical mapping process.
> 
> None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate.

Seconded, we really don't want to be in the business of fixing up the drm
design mistakes of the past 15 years. As long as we can fully lock out
this particular dragon when running i915 we're imo good enough. The dri1
design of a kernel shim driver cooperating with the ums driver for hw
ownership is fundamentally unfixable.

Also we can't change any of it for drivers actually using it since it'll
break them, which is a big no-go.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-28 Thread Dave Gordon
On 28/04/15 10:21, Dave Gordon wrote:
> On 24/04/15 06:52, Antoine, Peter wrote:
>> I picked up this work due to the following Jira ticket created by the
>> security team (on Android) and was asked to give it a second look and
>> found a few more issues with the hw lock code.
>>
>> https://jira01.devtools.intel.com/browse/GMINL-5388
>> I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
>>
>> It also stops Linux as it kills the driver, I guess it might be possible
>> to reload the gfx driver. On a unpatched system the test that is
>> included in the issue or the igt test that has been posted for the issue
>> will show the problem.
>>
>> I ran the test on an unpatched system here and the gui stopped and the
>> keyboard stopped responding, so I rebooted. With the patched system I
>> did not need to reboot.
>>
>> Should I change the SIGTERM to SIGSEGV, not quite the same thing but
>> tooling is better at handling a segfault than a SIGTERM and the
>> application that calls this IOCTL is using an uninitialised hw lock so
>> it is kind of the same as differencing an uninitialised pointer (kind
>> of). Or, I could just remove it, but the bug has been in the code for at
>> least two years (and known about), and I would guess that any code that
>> is calling this is fuzzing the IOCTLs (as this is how the security team
>> found it) and we should reward them with a application exit.
>>
>> Peter. 
> 
> SIGSEGV would be a better choice.

[snip]

Nope, I've changed my mind about this. I thought the problematic case
was just a process releasing the lock without having acquired it, but on
further examination it's really that the DRM master process has gone
away or otherwise deleted (or never created?) the lock. And THEN the
(non-master?) process tries to release the (now) non-existent lock.

But more importantly, there's existing code in the acquire-lock path
that sends SIGTERM for this case, so obviously the release-lock code
should be as similar as possible.

.Dave.


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-28 Thread ch...@chris-wilson.co.uk
On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote:
> On 24/04/15 06:52, Antoine, Peter wrote:
> > I picked up this work due to the following Jira ticket created by the
> > security team (on Android) and was asked to give it a second look and
> > found a few more issues with the hw lock code.
> > 
> > https://jira01.devtools.intel.com/browse/GMINL-5388
> > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
> > 
> > It also stops Linux as it kills the driver, I guess it might be possible
> > to reload the gfx driver. On a unpatched system the test that is
> > included in the issue or the igt test that has been posted for the issue
> > will show the problem.
> > 
> > I ran the test on an unpatched system here and the gui stopped and the
> > keyboard stopped responding, so I rebooted. With the patched system I
> > did not need to reboot.
> > 
> > Should I change the SIGTERM to SIGSEGV, not quite the same thing but
> > tooling is better at handling a segfault than a SIGTERM and the
> > application that calls this IOCTL is using an uninitialised hw lock so
> > it is kind of the same as differencing an uninitialised pointer (kind
> > of). Or, I could just remove it, but the bug has been in the code for at
> > least two years (and known about), and I would guess that any code that
> > is calling this is fuzzing the IOCTLs (as this is how the security team
> > found it) and we should reward them with a application exit.
> > 
> > Peter. 
> 
> SIGSEGV would be a better choice.
> 
> SIGTERM is normally sent by a user -- it's the default signal sent by
> kill(1). It's also commonly used to tell a long-running daemon process
> to tidy up and exit cleanly.
> 
> SIGSEGV commonly means "you accessed something that doesn't exist/isn't
> mapped/you don't have permissions for". There are specific subcases that
> can be indicated via the siginfo data; this is from the sigaction(1)
> manpage:
> 
> The following values can be placed in si_code for a SIGSEGV signal:
> 
> SEGV_MAPERRaddress not mapped to object
> 
> SEGV_ACCERRinvalid permissions for mapped object
> 
> SIGBUS would also be a possibility but that's generally taken to mean
> that an access got all the way to some physical bus and then faulted,
> whereas SIGSEGV suggests the access was rejected during the
> virtual-to-physical mapping process.

None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-28 Thread Dave Gordon
On 24/04/15 06:52, Antoine, Peter wrote:
> I picked up this work due to the following Jira ticket created by the
> security team (on Android) and was asked to give it a second look and
> found a few more issues with the hw lock code.
> 
> https://jira01.devtools.intel.com/browse/GMINL-5388
> I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
> 
> It also stops Linux as it kills the driver, I guess it might be possible
> to reload the gfx driver. On a unpatched system the test that is
> included in the issue or the igt test that has been posted for the issue
> will show the problem.
> 
> I ran the test on an unpatched system here and the gui stopped and the
> keyboard stopped responding, so I rebooted. With the patched system I
> did not need to reboot.
> 
> Should I change the SIGTERM to SIGSEGV, not quite the same thing but
> tooling is better at handling a segfault than a SIGTERM and the
> application that calls this IOCTL is using an uninitialised hw lock so
> it is kind of the same as differencing an uninitialised pointer (kind
> of). Or, I could just remove it, but the bug has been in the code for at
> least two years (and known about), and I would guess that any code that
> is calling this is fuzzing the IOCTLs (as this is how the security team
> found it) and we should reward them with a application exit.
> 
> Peter. 

SIGSEGV would be a better choice.

SIGTERM is normally sent by a user -- it's the default signal sent by
kill(1). It's also commonly used to tell a long-running daemon process
to tidy up and exit cleanly.

SIGSEGV commonly means "you accessed something that doesn't exist/isn't
mapped/you don't have permissions for". There are specific subcases that
can be indicated via the siginfo data; this is from the sigaction(1)
manpage:

The following values can be placed in si_code for a SIGSEGV signal:

SEGV_MAPERRaddress not mapped to object

SEGV_ACCERRinvalid permissions for mapped object

SIGBUS would also be a possibility but that's generally taken to mean
that an access got all the way to some physical bus and then faulted,
whereas SIGSEGV suggests the access was rejected during the
virtual-to-physical mapping process.

.Dave.


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-24 Thread Antoine, Peter
I picked up this work due to the following Jira ticket created by the
security team (on Android) and was asked to give it a second look and
found a few more issues with the hw lock code.

https://jira01.devtools.intel.com/browse/GMINL-5388
I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)

It also stops Linux as it kills the driver, I guess it might be possible
to reload the gfx driver. On a unpatched system the test that is
included in the issue or the igt test that has been posted for the issue
will show the problem.

I ran the test on an unpatched system here and the gui stopped and the
keyboard stopped responding, so I rebooted. With the patched system I
did not need to reboot.

Should I change the SIGTERM to SIGSEGV, not quite the same thing but
tooling is better at handling a segfault than a SIGTERM and the
application that calls this IOCTL is using an uninitialised hw lock so
it is kind of the same as differencing an uninitialised pointer (kind
of). Or, I could just remove it, but the bug has been in the code for at
least two years (and known about), and I would guess that any code that
is calling this is fuzzing the IOCTLs (as this is how the security team
found it) and we should reward them with a application exit.

Peter. 


On Thu, 2015-04-23 at 15:39 +0100, Chris Wilson wrote:
> On Thu, Apr 23, 2015 at 02:34:24PM +, Antoine, Peter wrote:
> > Before the patch the system required rebooting (driver crash and/or kernel 
> > panic).
> > Now the application gets terminated.
> 
> It should have an GPF which should not have required a reboot, but
> terminated the application. Unless you set it to automatically reboot.
> -Chris
> 



[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-23 Thread Chris Wilson
On Thu, Apr 23, 2015 at 02:34:24PM +, Antoine, Peter wrote:
> Before the patch the system required rebooting (driver crash and/or kernel 
> panic).
> Now the application gets terminated.

It should have an GPF which should not have required a reboot, but
terminated the application. Unless you set it to automatically reboot.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-23 Thread Chris Wilson
On Thu, Apr 23, 2015 at 03:07:54PM +0100, Peter Antoine wrote:
> This patch fixes a possible kernel crash when drm_unlock (DRM_IOCTL_UNLOCK)
> is called by a application that has not had a lock created by it. This
> crash can be caused by any application from all users.

Crashing the application is still a crash...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-23 Thread Peter Antoine
This patch fixes a possible kernel crash when drm_unlock (DRM_IOCTL_UNLOCK)
is called by a application that has not had a lock created by it. This
crash can be caused by any application from all users.

Issue: VIZ-5485
Signed-off-by: Peter Antoine 
---
 drivers/gpu/drm/drm_lock.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/drm_lock.c b/drivers/gpu/drm/drm_lock.c
index f861361..070dd5d 100644
--- a/drivers/gpu/drm/drm_lock.c
+++ b/drivers/gpu/drm/drm_lock.c
@@ -159,6 +159,14 @@ int drm_legacy_unlock(struct drm_device *dev, void *data, 
struct drm_file *file_
return -EINVAL;
}

+   if (!master->lock.hw_lock) {
+   DRM_ERROR(
+   "Device has been unregistered. Hard exit. Process %d\n",
+   task_pid_nr(current));
+   send_sig(SIGTERM, current, 0);
+   return -EPERM;
+   }
+
if (drm_legacy_lock_free(>lock, lock->context)) {
/* FIXME: Should really bail out here. */
}
-- 
1.9.1



[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-23 Thread Antoine, Peter
Before the patch the system required rebooting (driver crash and/or kernel 
panic).
Now the application gets terminated.

This follows the pattern in the other parts of this code that checks that 
pointer.

Peter.

-Original Message-
From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] 
Sent: Thursday, April 23, 2015 3:20 PM
To: Antoine, Peter
Cc: intel-gfx at lists.freedesktop.org; airlied at redhat.com; dri-devel at 
lists.freedesktop.org; daniel.vetter at ffwll.ch
Subject: Re: [Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

On Thu, Apr 23, 2015 at 03:07:54PM +0100, Peter Antoine wrote:
> This patch fixes a possible kernel crash when drm_unlock 
> (DRM_IOCTL_UNLOCK) is called by a application that has not had a lock 
> created by it. This crash can be caused by any application from all users.

Crashing the application is still a crash...
-Chris

--
Chris Wilson, Intel Open Source Technology Centre