Forgot to cc the list. Darn. :)

---------- Forwarded Message ----------
Subject: Re: [PyCUDA] broadcasting and strided data
Date: Thursday, 27 August 2009
From: Andreas Klöckner <[email protected]>
To: James Bergstra <[email protected]>

On Tuesday, 25 August 2009, you wrote:
> It probably requires the expertise of a few people to get the design
> right, so I'm reluctant even to try to put a patch together. First,
> it requires some changes to the data container. Some of the issues
> that come up are:
> - what should be the strides for broadcastable dimensions (I like 0,
>   but numpy does it differently)

Assigning a stride of zero seems to be a good "simple" way, even though it
seems like that might waste some processor power on unneeded index math. How
does numpy do it?

> - should strides be in data-type units or byte units

I find this somewhat irrelevant. For the kernels themselves, data-type units
are likely more useful, especially if texturing is used. For storage, looking
like numpy by using byte offsets might be the way to go. Since doing the
conversion on the host right ahead of the kernel invocation is easy and cheap,
I don't see why we can't have our cake and eat it, too. (See also the next
question.)

> - should strides and dimensions be stored in host memory, device
>   memory, or both (how/when should they be synchronized?)

Host memory seems to be the right place, as kernel parameters, originating
from there, are the only way by which a variable can be easily spread to each
thread without incurring a global memory access penalty.

> As the data structure gets more complicated, the kernels become more
> complex too. My experience is that all kernels have to have a
> "general" version that is pretty slow, and progressively, more and
> more special cases get optimized.

I find it helpful to do things the other way around. Solve a rather special
case first, then generalize. Even incremental solutions are valuable.

> Kernel code generators get bloated.

Deciding on the right complexity for the generators is definitely an issue.
Rome wasn't built in a day. Going about this incrementally and not rushing it
seems like a wise idea. You're not on your own.

> How many kinds of kernels are there in PyCUDA right now? (Given that
> the same code-generator can produce many elementwise kernels, I mean
> to count that as one *kind* of kernel.) How many things would break
> if arrays were strided?

Two. Elementwise kernels and reduction kernels are the kinds currently
implemented. All of the GPUArray functionality is written in terms of these
two.

Andreas

-------------------------------------------------------
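
As an illustration of the stride questions discussed in the message, here is a
minimal host-side sketch in plain numpy (no PyCUDA involved). It shows the byte
strides numpy reports for a broadcast view, the cheap byte-to-element
conversion that could be done right before a kernel launch, and the kind of
per-element index math a strided kernel would need. The helper names
element_strides and gather_strided are hypothetical and exist only for this
example; they are not part of numpy or PyCUDA.

```python
import numpy as np

def element_strides(arr):
    """Convert numpy's byte strides into strides counted in elements."""
    return tuple(s // arr.itemsize for s in arr.strides)

def gather_strided(flat, shape, strides, offset=0):
    """Emulate per-thread index math on the host (hypothetical helper).

    For each linear index i, unravel it into a multi-index over `shape`
    and take the dot product with the element strides. A stride of 0 makes
    every index along that axis read the same source element.
    """
    out = np.empty(int(np.prod(shape)), dtype=flat.dtype)
    for i in range(out.size):
        idx = np.unravel_index(i, shape)
        out[i] = flat[offset + sum(k * s for k, s in zip(idx, strides))]
    return out.reshape(shape)

row = np.arange(4, dtype=np.float32)

# A broadcast view: numpy reports a byte stride of 0 on the repeated axis.
b = np.broadcast_to(row, (3, 4))
print(b.strides)           # (0, 4)
print(element_strides(b))  # (0, 1)

# An ordinary array with a length-1 axis keeps a regular stride in storage;
# numpy only treats that axis as stride 0 while broadcasting.
print(np.zeros((1, 4), dtype=np.float32).strides)

# The emulated index math reproduces the broadcast result.
print(gather_strided(row, (3, 4), element_strides(b)))
```

The conversion is a single integer division per axis, which is consistent with
the point that it is easy and cheap to do on the host just before launching a
kernel.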
