I don’t really know enough about the details of Python threading or the CUDA runtime model to understand why it trips up when multiple threads talk to the same context. But from my limited understanding, I would guess it is because, even though all function calls are issued serially, a Python thread switch may happen at any time, and that can break assumptions made in the CUDA runtime model.

It occurred to me that any of the concurrency libraries built on top of the greenlet model should be able to solve that problem, though: since the different execution paths yield only at explicitly specified points, one should be able to sidestep trouble of this kind (which, again, is the only kind I can think of). I have never worked with greenlets, and again do not really know what is tripping up CUDA, so I do not have a concrete solution and might be missing something. But I do know that a more Pythonic way of exposing kernel-level parallelism in PyCUDA/PyOpenCL would be awesome.
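To make this concrete, here is a minimal sketch of the idea (assuming only that the greenlet package linked below is installed; the kernel launch is just a placeholder comment). Because each task hands over control only at an explicit switch point, all CUDA calls stay on a single OS thread, and the context is never entered from two execution paths at once:

    from greenlet import greenlet

    def make_task(name, n_steps):
        def run():
            for i in range(n_steps):
                # a real version would issue an async kernel launch or
                # copy here, against the one shared context
                print("%s: step %d" % (name, i))
                sched.switch()  # explicit yield; switches happen only here
        return run

    def scheduler_loop(tasks):
        # cooperative round-robin: resume each unfinished task in turn
        while any(not t.dead for t in tasks):
            for t in tasks:
                if not t.dead:
                    t.switch()

    sched = greenlet(scheduler_loop)
    tasks = [greenlet(make_task(n, 3), parent=sched) for n in ("a", "b")]
    sched.switch(tasks)  # everything runs on this single OS thread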

http://pypi.python.org/pypi/greenlet



-----Original Message----- From: Andreas Kloeckner
Sent: Sunday, October 07, 2012 11:33 PM
To: Freddie Witherden ; [email protected]
Subject: Re: [PyCUDA] Fwd: Re: Contexts and Threading

Freddie Witherden <[email protected]> writes:
-------- Original Message --------
Subject: Re: [PyCUDA] Contexts and Threading
Date: Sat, 29 Sep 2012 19:08:09 +0200
From: Eelco Hoogendoorn <[email protected]>
To: Freddie Witherden <[email protected]>



Actually, it seems I should RTFM; see the PyCUDA FAQ.
Combining threads and streams does not seem to work at all (or I am doing
something really stupid). It seems you need to initialize the context in the
thread, and cannot share it between threads.
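For reference, a sketch of what I mean by initializing the context in the thread (along the lines of what the FAQ seems to suggest; the memcpy is just a stand-in for real work):

    import threading
    import numpy as np
    import pycuda.driver as cuda

    cuda.init()  # initialize the driver once, in the main thread

    def worker(device_index):
        # each thread creates, and becomes current in, its own context
        ctx = cuda.Device(device_index).make_context()
        try:
            a = np.ones(1024, dtype=np.float32)
            a_gpu = cuda.mem_alloc(a.nbytes)
            cuda.memcpy_htod(a_gpu, a)
        finally:
            ctx.pop()  # detach the context before the thread exits

    t = threading.Thread(target=worker, args=(0,))
    t.start()
    t.join()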

At least for the thing I have in mind, creating a context per thread wouldn't
really make sense; a context has a huge overhead, and trying to get multiple
contexts to play nicely on the same device at the same time has so far
eluded me as well.

That is rather disappointing, as it seems there is no way around the hacky
state-machine stream nonsense if you want to run a lot of small kernels in
parallel (I am thinking millions of calls, each of which would be lucky to
saturate a single SM).
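To spell out the pattern I am complaining about: from a single thread, you keep a pool of streams busy by hand, with the host code acting as the state machine. Roughly (a sketch; the toy kernel and sizes are made up, and with millions of calls you would also have to poll and recycle streams):

    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void twice(float *x) { x[threadIdx.x] *= 2.0f; }
    """)
    twice = mod.get_function("twice")

    n, size = 4, 32
    streams = [cuda.Stream() for _ in range(n)]
    host = [cuda.pagelocked_empty(size, np.float32) for _ in range(n)]
    dev = [cuda.mem_alloc(size * 4) for _ in range(n)]

    # issue all the small kernels asynchronously, one stream each
    for h, d, s in zip(host, dev, streams):
        h[:] = 1.0
        cuda.memcpy_htod_async(d, h, s)
        twice(d, block=(size, 1, 1), grid=(1, 1), stream=s)
        cuda.memcpy_dtoh_async(h, d, s)

    for s in streams:
        s.synchronize()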

Am I missing something?

Does CUDA allow multiple threads to be attached to a context nowadays?
Note that all of this sharing between threads is a non-issue (i.e. it works
without a problem) in OpenCL. Currently, each thread needs its own
kernel, though (see the message on the PyOpenCL list that I'll send in a
bit).
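For illustration, this is the kind of thing that works in PyOpenCL: one context shared across threads, with a per-thread queue and a per-thread kernel (a sketch; the toy kernel is made up):

    import threading
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()  # one context, shared by all threads
    src = """
    __kernel void twice(__global float *a)
    { int i = get_global_id(0); a[i] *= 2; }
    """

    def worker():
        # per-thread queue, and (per the note above) a per-thread kernel
        queue = cl.CommandQueue(ctx)
        prg = cl.Program(ctx, src).build()
        a = np.ones(1024, dtype=np.float32)
        mf = cl.mem_flags
        buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)
        prg.twice(queue, a.shape, None, buf)
        cl.enqueue_copy(queue, a, buf)
        queue.finish()

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()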

Andreas


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda

