Re: preventing GPU reset DoS

2009-09-23 Thread Bruno Kleinert
On Wednesday, 23.09.2009, at 06:10 +1000, Dave Airlie wrote:
 I'm just wondering for what other use case we'd need anything more
 aggressive?

Hi,

The computer rooms at my university came to mind: students do their
homework for the computer graphics lectures there. If it *is* possible
to lock up a machine for a longer time, they could accidentally lock up
machine after machine. The computer room guidelines forbid turning off
or resetting the computers, and the administrators aren't around all
the time. It could also be done intentionally to annoy other users...

Another scenario could be an Internet cafe or similar.

I don't care whether I can DoS myself on my own machine, but I see a
potential problem in multi-user environments. Being able to adjust the
protection at run time would be a nice-to-have feature.

Kind regards - Fuddl


--
Come build with us! The BlackBerry® Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9-12, 2009. Register now!
http://p.sf.net/sfu/devconf
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: preventing GPU reset DoS

2009-09-23 Thread Nicolai Hähnle
On Tuesday 22 September 2009 23:25:09, Pauli Nieminen wrote:
 Too bad GPU reset already prevents this use case, while it doesn't
 protect the user from an attack that causes multiple GPU resets in a row.
 So a long rendering operation blocking the GPU is more like a scheduler or
 Mesa bug: rendering isn't split into small enough parts for us to
 schedule something else in between for the user interface. Is it possible
 to schedule something else on the GPU while only part of the GPU runs the
 slow, long-running shader? If not, it looks like a big limitation in the
 hw design.

I would hope the hardware people thought of this on newer GPUs, but at least I 
haven't seen anything to support context switching in the docs released by 
AMD.

As for the rest, I agree that it's a problem. It is actually roughly the same 
problem as when the system goes into a swapping loop of death, except it may 
actually be easier to identify the culprit. After all, by simply checking 
which fences have already been written back by the GPU, we should be able to 
determine which client caused the currently executing command stream.

That probably does require adding some more tracking, but perhaps it can be 
integrated into the existing fence mechanisms.
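
The fence check described above could be sketched roughly as follows. This is only an illustration in Python, not driver code: the `Submission` record, the `find_culprit` helper and the assumption that command streams complete in ascending fence order (with no sequence wrap-around) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Submission:
    """One queued command stream (hypothetical bookkeeping record)."""
    fence_seq: int   # sequence number the GPU writes back on completion
    client_id: int   # DRM client that submitted the stream


def find_culprit(submissions: List[Submission],
                 last_signaled_seq: int) -> Optional[int]:
    """Return the client owning the stream that was executing at hang time.

    Assumes streams execute in ascending fence order, so the culprit is the
    owner of the oldest submission whose fence has not been written back.
    """
    pending = [s for s in submissions if s.fence_seq > last_signaled_seq]
    if not pending:
        return None  # every fence was written back; nobody to blame
    return min(pending, key=lambda s: s.fence_seq).client_id
```

With submissions from clients 100, 200, 100 carrying fences 1, 2, 3 and fence 1 written back, the helper would blame client 200.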

The second part would be to punish applications that have caused GPU hangs. 
Frankly, killing them seems like a bad idea; it seems better to de-prioritize 
them and force them to wait before sending a new command buffer.
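
One way to picture "force them to wait" is an exponential back-off on submissions. The sketch below is an assumption made for illustration, not an existing DRM policy: the function name, the 100 ms base delay and the 10 s cap are all made up.

```python
def submit_delay_ms(recent_hangs: int,
                    base_ms: int = 100,
                    cap_ms: int = 10_000) -> int:
    """Delay imposed before a client's next command buffer is accepted.

    The delay doubles with every hang the client recently caused and is
    capped, so a misbehaving client is slowed down rather than killed.
    """
    if recent_hangs <= 0:
        return 0  # well-behaved clients are never delayed
    return min(cap_ms, base_ms * 2 ** (recent_hangs - 1))
```

A client with no recent hangs is not delayed at all, so legitimate applications would only pay a price after they have actually caused a reset.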

Another major worry is that we should somehow make sure that the X server - or 
alternative future display servers - will not become victims of this regime. 
After all, if the X server services an indirect rendering GLX client, it could 
also be hoodwinked by this client into submitting too-long-running command 
streams.

If the DRM clients get appropriate feedback when they caused a GPU reset, the 
X server could potentially use this information to punish GLX clients 
accordingly.

cu,
Nicolai



Re: preventing GPU reset DoS

2009-09-23 Thread Pauli Nieminen
On Wed, Sep 23, 2009 at 10:52 PM, Nicolai Hähnle nhaeh...@gmail.com wrote:

 On Tuesday 22 September 2009 23:25:09, Pauli Nieminen wrote:
  Too bad GPU reset already prevents this use case, while it doesn't
  protect the user from an attack that causes multiple GPU resets in a row.
  So a long rendering operation blocking the GPU is more like a scheduler or
  Mesa bug: rendering isn't split into small enough parts for us to
  schedule something else in between for the user interface. Is it possible
  to schedule something else on the GPU while only part of the GPU runs the
  slow, long-running shader? If not, it looks like a big limitation in the
  hw design.

 I would hope the hardware people thought of this on newer GPUs, but at
 least I haven't seen anything to support context switching in the docs
 released by AMD.

 As for the rest, I agree that it's a problem. It is actually roughly the
 same problem as when the system goes into a swapping loop of death, except
 it may actually be easier to identify the culprit. After all, by simply
 checking which fences have already been written back by the GPU, we should
 be able to determine which client caused the currently executing command
 stream.


Now I remember some talk that the WDDM driver model requires preemptive
scheduling support from the driver, so maybe r600+ cards support preemptive
scheduling at least in some form.

 That probably does require adding some more tracking, but perhaps it can be
 integrated into the existing fence mechanisms.

 The second part would be to punish applications that have caused GPU hangs.
 Frankly, killing them seems like a bad idea; it seems better to
 de-prioritize them and force them to wait before sending a new command
 buffer.

The problem here is that each GPU hang lasts over 500 ms before the GPU is
reset. The policy might be to lower the priority first and then, if the hangs
continue, start killing the application. I think a GPU hang is more like a
memory access violation in a normal application, so it should cause a crash.

The application's rendering will be broken after a reset anyway, because some
of its rendering operations failed and the image would be corrupted.
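
The "lower the priority first, then kill" escalation could look something like the sketch below. The `Action` names and the threshold of three hangs are assumptions chosen only to make the idea concrete.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    DEPRIORITIZE = "deprioritize"
    KILL = "kill"


def action_for(hang_count: int, kill_threshold: int = 3) -> Action:
    """Pick a penalty from the number of hangs a client has caused so far.

    The first hangs only lower the client's priority; once the count
    reaches kill_threshold the client is treated like a crashed process.
    """
    if hang_count <= 0:
        return Action.NONE
    if hang_count < kill_threshold:
        return Action.DEPRIORITIZE
    return Action.KILL
```

Setting `kill_threshold` to 1 would give the "a hang is like a segfault" behaviour directly.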


 Another major worry is that we should somehow make sure that the X server -
 or alternative future display servers - will not become victims of this
 regime. After all, if the X server services an indirect rendering GLX
 client, it could also be hoodwinked by this client into submitting
 too-long-running command streams.

 If the DRM clients get appropriate feedback when they caused a GPU reset,
 the X server could potentially use this information to punish GLX clients
 accordingly.

 cu,
 Nicolai


So we would need a secure communication link with the X server so that it
could cooperate after a GPU hang and penalize the broken application in
whatever way is thought to be correct. In my opinion sending a fatal signal
is the best option, but if all the problems with keeping a broken application
running can be fixed somehow, then it could do something else.


preventing GPU reset DoS

2009-09-22 Thread Pauli Nieminen
Hi!

I have been thinking about GPU reset as a possible DoS attack from
user space. The problem is that the display stops working entirely if an
attacker chooses to run an application that constantly causes GPU hangs. It
would of course be ideal for the CS checker not to let in any problematic
combinations of commands. But in practice we can't assume that everything is
safe with all hardware, so we need to take some action to prevent possible
problems.

So the first defense would be terminating the application that sent the
command stream that caused the GPU hang. But an attacker could easily bypass
this protection by forking new processes all the time.

So we need a stronger defense if the same user account causes multiple hangs
in a short time frame. I think temporarily denying new DRI access would let
the user regain control of the system and take action to stop the
problematic program from running.
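
To make the idea concrete, here is a rough Python sketch of per-user hang counting with a temporary ban. Everything in it is hypothetical (the `HangLimiter` class, the limit of three hangs in 60 seconds, the 30 second ban); the point is only that the counter is keyed by user account, so forking new processes does not reset it.

```python
from collections import defaultdict, deque


class HangLimiter:
    """Deny new DRI access to a user who causes too many hangs too quickly."""

    def __init__(self, max_hangs: int = 3, window_s: float = 60.0,
                 ban_s: float = 30.0) -> None:
        self.max_hangs = max_hangs
        self.window_s = window_s
        self.ban_s = ban_s
        self._hangs = defaultdict(deque)   # uid -> timestamps of recent hangs
        self._banned_until = {}            # uid -> time the ban expires

    def record_hang(self, uid: int, now: float) -> None:
        """Note a hang caused by uid and start a ban if the limit is hit."""
        recent = self._hangs[uid]
        recent.append(now)
        # Forget hangs that fell out of the sliding window.
        while recent and now - recent[0] > self.window_s:
            recent.popleft()
        if len(recent) >= self.max_hangs:
            self._banned_until[uid] = now + self.ban_s
            recent.clear()

    def may_open(self, uid: int, now: float) -> bool:
        """Return True if uid is currently allowed to open a new DRI client."""
        return now >= self._banned_until.get(uid, float("-inf"))
```

The ban is temporary on purpose: it only has to last long enough for the user to get the desktop back and stop the offending program.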

Pauli


Re: preventing GPU reset DoS

2009-09-22 Thread Keith Whitwell
On Tue, 2009-09-22 at 12:13 -0700, Pauli Nieminen wrote:
 Hi!
 
 I have been thinking about GPU reset as a possible DoS attack from
 user space. The problem is that the display stops working entirely if an
 attacker chooses to run an application that constantly causes GPU hangs.
 It would of course be ideal for the CS checker not to let in any
 problematic combinations of commands. But in practice we can't assume
 that everything is safe with all hardware, so we need to take some
 action to prevent possible problems.
 
 So the first defense would be terminating the application that sent the
 command stream that caused the GPU hang. But an attacker could easily
 bypass this protection by forking new processes all the time.
 
 So we need a stronger defense if the same user account causes multiple
 hangs in a short time frame. I think temporarily denying new DRI
 access would let the user regain control of the system and take action
 to stop the problematic program from running.

OK, but you'd want to be able to turn it off for developers -- you've
just described my normal workflow...

Keith




Re: preventing GPU reset DoS

2009-09-22 Thread Dave Airlie
On Wed, Sep 23, 2009 at 5:13 AM, Pauli Nieminen suok...@gmail.com wrote:
 Hi!

 I have been thinking about GPU reset as a possible DoS attack from
 user space. The problem is that the display stops working entirely if an
 attacker chooses to run an application that constantly causes GPU hangs. It
 would of course be ideal for the CS checker not to let in any problematic
 combinations of commands. But in practice we can't assume that everything is
 safe with all hardware, so we need to take some action to prevent possible
 problems.

 So the first defense would be terminating the application that sent the
 command stream that caused the GPU hang. But an attacker could easily bypass
 this protection by forking new processes all the time.

 So we need a stronger defense if the same user account causes multiple hangs
 in a short time frame. I think temporarily denying new DRI access would let
 the user regain control of the system and take action to stop the
 problematic program from running.


It depends on what sort of system you are talking about. On a normal
desktop/laptop type of system, the DoS is either caused by the person
sitting at it, in which case there is the power button, or by an app
running on it, possibly X, in which case killing it will solve the issue.

In a multi-user GPGPU environment, if someone starts a DoS, it
should only lock up the GPU, not the CPU, unless they start a CPU
DoS as well, so in that case an admin can always ssh in and
kill the DoS user's account, etc.

I'm just wondering for what other use case we'd need anything more
aggressive?

Dave.



Re: preventing GPU reset DoS

2009-09-22 Thread Nicolai Hähnle
On Tuesday 22 September 2009 21:13:47, Pauli Nieminen wrote:
 Hi!

 I have been thinking about GPU reset as a possible DoS attack from
 user space. The problem is that the display stops working entirely if an
 attacker chooses to run an application that constantly causes GPU hangs. It
 would of course be ideal for the CS checker not to let in any problematic
 combinations of commands. But in practice we can't assume that everything is
 safe with all hardware, so we need to take some action to prevent possible
 problems.

I'm pretty confident that you can write a perfectly legal OpenGL application 
that creates commands that take *minutes* to run on decent graphics cards. 
Just produce a huge number of screen-sized primitives and use a very long 
fragment program that samples from a huge, non-mipmapped texture in an 
entirely non-locally-coherent way - and that doesn't even take GLSL loops into 
account!

So this is really not so much about safe-guarding against illegal hardware 
commands, but about how to deal with the fact that we can't do pre-emptive 
scheduling on the GPU.

In the end, the symptoms are the same, but it might change how you think about 
the problem.

 So the first defense would be terminating the application that sent the
 command stream that caused the GPU hang. But an attacker could easily bypass
 this protection by forking new processes all the time.

 So we need a stronger defense if the same user account causes multiple hangs
 in a short time frame. I think temporarily denying new DRI access would let
 the user regain control of the system and take action to stop the
 problematic program from running.

I have a feeling that this needs a solution that cooperates across the whole 
stack ...

cu,
Nicolai



Re: preventing GPU reset DoS

2009-09-22 Thread Corbin Simpson
On 09/22/2009 01:19 PM, Nicolai Hähnle wrote:
 I'm pretty confident that you can write a perfectly legal OpenGL application
 that creates commands that take *minutes* to run on decent graphics cards.
 Just produce a huge number of screen-sized primitives and use a very long
 fragment program that samples from a huge, non-mipmapped texture in an
 entirely non-locally-coherent way - and that doesn't even take GLSL loops
 into account!

I don't have any examples with me, but you could, as an admittedly
contrived possibility, take several hundred thousand verts, submit them
in immediate mode without VBOs, multitexture from all eight texture
samplers using 256x256x256 3D textures, use 4 4096x4096 render targets,
and 16x FSAA multisample the whole shebang. Should (barely) fit on a
512MB r500, and take at least half a minute to execute, probably more.
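
As a rough sanity check of the memory footprint, the arithmetic can be written out in Python. The formats are assumptions that the message does not state: one byte per texel for the 3D textures, four bytes per pixel for the render targets, no mipmaps, and no extra storage counted for multisampling or vertex data. Larger formats would change the total substantially.

```python
MIB = 1024 * 1024


def texture_3d_bytes(side: int, bytes_per_texel: int) -> int:
    """Size of a cubic 3D texture without mipmaps."""
    return side ** 3 * bytes_per_texel


def render_target_bytes(side: int, bytes_per_pixel: int) -> int:
    """Size of a square, single-sample render target."""
    return side * side * bytes_per_pixel


# Assumed formats: 1 byte/texel textures, 4 bytes/pixel render targets.
textures = 8 * texture_3d_bytes(256, 1)
targets = 4 * render_target_bytes(4096, 4)
total_mib = (textures + targets) // MIB
```

Under these assumptions the eight textures take 128 MiB and the four render targets 256 MiB, 384 MiB in total.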

And as you said, GLSL can be used to write *very* long-running shaders.

Really, there's no solution to this that won't also lock out legitimate
uses, I fear.

~ C.



Re: preventing GPU reset DoS

2009-09-22 Thread Pauli Nieminen
On Tue, Sep 22, 2009 at 11:51 PM, Corbin Simpson
mostawesomed...@gmail.com wrote:

 On 09/22/2009 01:19 PM, Nicolai Hähnle wrote:
  I'm pretty confident that you can write a perfectly legal OpenGL
  application that creates commands that take *minutes* to run on decent
  graphics cards. Just produce a huge number of screen-sized primitives and
  use a very long fragment program that samples from a huge, non-mipmapped
  texture in an entirely non-locally-coherent way - and that doesn't even
  take GLSL loops into account!

 I don't have any examples with me, but you could, as an admittedly
 contrived possibility, take several hundred thousand verts, submit them
 in immediate mode without VBOs, multitexture from all eight texture
 samplers using 256x256x256 3D textures, use 4 4096x4096 render targets,
 and 16x FSAA multisample the whole shebang. Should (barely) fit on a
 512MB r500, and take at least half a minute to execute, probably more.

 And as you said, GLSL can be used to write *very* long-running shaders.

 Really, there's no solution to this that won't also lock out legitimate
 uses, I fear.

 ~ C.

Too bad GPU reset already prevents this use case, while it doesn't protect
the user from an attack that causes multiple GPU resets in a row. So a long
rendering operation blocking the GPU is more like a scheduler or Mesa bug:
rendering isn't split into small enough parts for us to schedule something
else in between for the user interface. Is it possible to schedule something
else on the GPU while only part of the GPU runs the slow, long-running
shader? If not, it looks like a big limitation in the hw design.

I can also see possible attacks that use GPU hangs as a way to make
computers unusable locally. If this only requires access to a normal user
account, it is a real problem and should be protected against. Someone could
write a virus that targets the Linux desktop and tries to cause harm to users.
A virus like this in a corporate network, putting many computers into an
unusable state, would cost quite a lot. Even though an admin could clean out
the virus over ssh, it still costs a lot of money when a computer is unusable
even for a short time.

Do I have to come up with more scenarios for why GPU reset handling needs to
protect local access to the computer? I think it is just bad if a normal user
can make local access unusable for an extended period of time.

Of course this protection has to be configurable at run time or boot time.

And I posted the idea here before even exploring how it could be implemented
in practice, to get more thoughts on what should be taken into account.