Hi Max,

On Fri, Apr 25, 2014 at 4:24 AM, Max Argus <[email protected]> wrote:
> For my code, indexing to get coherent memory access is most important;
> for this I would (if possible) like to have abstract objects that
> handle this abstractly. Let's say that for a reduction in which the
> order is unimportant, I want to be able to do "for index in
> orange(array)" in such a way that orange is a Python object that will
> give me boring linear indices so the code still works as plain Python,
> but during optimization/translation it can be replaced by more
> elaborate options.
>
> (1) How were you planning to deal with these things?

I was not really planning anything more complicated than just a
translation from n-dimensional array indices to a flat pointer to GPU
memory (sketch [A] at the end of this message shows the standard
row-major formula). That is, I was only hoping to achieve a more
convenient way to write kernels, in Python instead of the Mako + C
mixture I use now in Reikna. The optimizations you are doing seem
interesting, but I think they are quite separate from the translation
process itself. Sketch [B] below shows how I read your "orange" idea in
plain Python terms.

> If you plan to just always unroll everything (I don't know if this is
> desirable), such things might need to happen before peval. Btw, how do
> I get peval to do that? At the moment I do:

Currently there is no unrolling functionality (as I mentioned earlier,
I'm currently making some architecture changes to simplify the addition
of new features). I was planning something along the lines of:

1) automatic unrolling (the heuristic is yet unclear);
2) forced unrolling, that is, recognizing code like
   "for i in unroll(range(n)):" (see sketch [C] below).

> (2) I quite liked my decorators to specify what is being optimized;
> would it be possible to preserve this interface? peval should take the
> outer decorators and ignore those that it doesn't know.

Peval preserves all the decorators at the moment (at least it should).
There are some caveats with the decorators though (due to the way peval
discovers the function code); see
https://github.com/Manticore/peval/blob/master/docs/source/index.rst
for details.

> (3) add_spec.replace("__binding_1", "range(100)"), how do I get peval
> to unroll my loop for me?

See above; currently this feature is not implemented.

> (4) It might be good to skip the parsing step.
> partial_apply(...).getAST() or something similar would be helpful.

partial_apply() returns a normal callable function, with the proper
signature, globals, closure and so on, so adding some getAST() method is
not really desirable. The problem here is that the source of this
function cannot be discovered by inspect.getsource(), and therefore you
cannot parse it and get its AST. What can be done is exposing the
internal Function class, which, in addition to encapsulating some Python
magic of extracting global and closure variables, knows where to look
for the source of the function object constructed by peval. It has a
'tree' attribute containing the AST (see sketch [D] below).

> (5) Would the threaded fenced reduction (CUDA sample) be a good program
> to demo eval->translate with? It seems to have templates, indexing for
> coherent memory access, and to be relatively fundamental/important.
> Plus it already has optimized C++ CUDA code for comparison.

Yes, I think it will be a very good example. It can also demonstrate
passing and using a custom predicate (as another GPU function) and
working with arbitrary structures instead of integers/floats (sketch [E]
below gives a rough idea of the Python-side shape of such a reduction).
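
To make some of the points above concrete, a few sketches follow.

[A] The index translation mentioned in my answer to (1): the textbook
row-major flattening of an n-dimensional index into an offset into flat
GPU memory, written out as plain Python:

    def flat_index(indices, shape):
        # Row-major (C order) flattening:
        # ((i0 * s1 + i1) * s2 + i2) * ...
        flat = 0
        for i, s in zip(indices, shape):
            flat = flat * s + i
        return flat

    # For a 3x4 array: flat_index((1, 2), (3, 4)) == 1 * 4 + 2 == 6
    assert flat_index((1, 2), (3, 4)) == 6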
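
[B] How I read your "orange" idea, as a minimal sketch (the name and the
interface are yours, the implementation is my guess): in plain Python it
just yields boring linear indices, but since it is a distinct,
recognizable type, the translator could substitute a more elaborate
access pattern for it:

    import numpy

    class orange:
        def __init__(self, array):
            self.array = array

        def __iter__(self):
            # Boring linear order, so the code still runs as normal
            # Python; a translator would recognize 'orange' in the AST
            # and replace it.
            return iter(range(self.array.size))

    def total(array):
        acc = 0
        for index in orange(array):
            acc += array.flat[index]
        return acc

    print(total(numpy.arange(10)))  # 45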
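
[C] The intended semantics of the forced unrolling marker from point 2)
above (not implemented yet): at run time unroll() is the identity, so
the function remains executable, and the optimizer would look for the
call by name and expand the loop body:

    def unroll(iterable):
        # Identity at run time; just a marker for the optimizer.
        return iterable

    def shift(x, n=4):
        for i in unroll(range(n)):
            x += i
        return x

    # With n fixed to 4, the unrolled function would effectively become:
    def shift_unrolled(x):
        x += 0
        x += 1
        x += 2
        x += 3
        return x

    assert shift(5) == shift_unrolled(5)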
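
[D] What working with the exposed Function class could look like. The
import path and the from_object() constructor are assumptions on my
part; only partial_apply() and the 'tree' attribute are settled:

    import ast
    from peval import partial_apply
    # Hypothetical location of the internal class:
    # from peval.core.function import Function

    def add(x, y):
        return x + y

    inc = partial_apply(add, 1)  # a normal callable with x bound to 1
    print(inc(5))  # 6

    # inspect.getsource(inc) fails because the source was generated by
    # peval, so one would instead do something like (hypothetical):
    # func = Function.from_object(inc)
    # print(ast.dump(func.tree))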
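
[E] A rough Python-side shape of the reduction from (5), with the binary
operation passed in as another function that partial evaluation could
specialize and inline before the translation to CUDA (the names here are
purely illustrative):

    def reduce_block(data, op):
        # Sequential stand-in for the per-block part of the CUDA sample;
        # the real kernel would use coherent indexing as in [A]/[B].
        result = data[0]
        for index in range(1, len(data)):
            result = op(result, data[index])
        return result

    def maximum(a, b):
        return a if a > b else b

    print(reduce_block([3, 1, 4, 1, 5], maximum))  # 5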

Best regards,
Bogdan