Hi all,
I have a problem that can be split into pieces of different sizes.
Essentially, the larger the size is, the more efficiently it runs.
However, on Windows (and I understand similar limits exist on Linux) a
single GPU kernel launch on a display device cannot run for more than
about 5 seconds on XP or 2 seconds on Vista/Win7 before the driver's
watchdog (called Timeout Detection and Recovery, or TDR, on Vista and
later) terminates it and raises an error (also causing the screen to
flash). My problem is that I want to run my kernels for as long as
possible for maximum efficiency, but I don't know in advance how long a
launch will take as a function of problem size. I could profile my
kernels and derive a heuristic that would probably work,
but this is for a software package that will be used by third parties,
and I'd like it to be handled automatically (and preferably without the
screen flashes, which will disturb users).
Has anyone worked out a good way of dealing with this?
One option is to increase the TDR window as detailed in:
http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
This might have adverse effects though, and I'm not sure all users of my
package would be happy changing these values (it's also not automatic).
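For reference, as far as I can tell from that page the change boils down
to a couple of registry values under the GraphicsDrivers key (I'm going
from my reading of the document, so please double-check the names and
defaults before relying on this):

```reg
Windows Registry Editor Version 5.00

; Raise the TDR timeout from the default 2 seconds to 10 seconds
; (Vista/Win7 WDDM drivers; a reboot is required to take effect).
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
```

Of course this only postpones the problem, and asking users to edit the
registry is exactly the kind of manual step I'd like to avoid.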
Another option is to have two GPUs, one of which is not attached to a
monitor and only used in compute mode (as discussed at
http://forums.nvidia.com/index.php?showtopic=171630). Again, that's
fine for me (I have two), but not so good for users, many of whom will
presumably have only one.
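On that note, the driver does at least let you detect which situation
you're in: devices subject to the watchdog report a KERNEL_EXEC_TIMEOUT
attribute of 1, which PyCUDA exposes. A rough sketch of picking a
"safe" device where one exists (untested here, and it obviously only
helps users who do have a second, non-display GPU):

```python
def pick_compute_device():
    """Prefer a CUDA device not subject to the run-time watchdog
    (i.e. one not driving a display); fall back to device 0."""
    import pycuda.driver as cuda  # deferred so the sketch stands alone
    cuda.init()
    devices = [cuda.Device(i) for i in range(cuda.Device.count())]
    # KERNEL_EXEC_TIMEOUT is 1 when the watchdog applies to the device
    free = [d for d in devices
            if not d.get_attribute(
                cuda.device_attribute.KERNEL_EXEC_TIMEOUT)]
    return (free or devices)[0]
```

At least that way the package could warn the user up front rather than
fail mid-run.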
A final option I thought of is to check for a launch-timeout failure
after each kernel launch and, if one occurs, halve the problem size and
retry, repeating until the launches succeed. The trouble with this
approach is that I'll get multiple failures and screen flashes before it
settles on a size that works, wasting a little time but, more
importantly, being quite alarming to the user. It also doesn't feel very
elegant... ;-)
Any other ideas or experiences dealing with this problem?
Dan
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda