Hi Max,

On Fri, Apr 25, 2014 at 4:24 AM, Max Argus <[email protected]> wrote:

> For my code, indexing to get coherent memory access is most important;
> for this I would (if possible) like to have abstract objects that
> handle this abstractly. Let's say that for a reduction in which the
> order is unimportant, I want to be able to do "for index in
> orange(array)" in such a way that orange is a Python object that
> gives me plain linear indices, so the code still works there, but
> during optimization/translation it can be replaced by more elaborate
> options.
> (1) How were you planning to deal with these things?

I was not really planning anything more complicated than a
translation from n-dimensional array indices to a flat pointer into
GPU memory. That is, I was only hoping to achieve a more convenient
way to write kernels, in Python instead of the Mako + C mixture I
currently use in Reikna. The optimizations you are doing seem
interesting, but I think they are quite separate from the translation
process itself.

> If you plan to just always unroll everything (I don't know if this
> is desirable), such things might need to happen before peval. Btw,
> how do I get peval to do that? At the moment I do:

Currently there is no unrolling functionality (as I mentioned earlier,
I am making some architecture changes to simplify the addition of new
features). I was planning something along the lines of:
1) automatic unrolling (the heuristic is yet unclear);
2) forced unrolling, triggered by recognizing code like
"for i in unroll(range(n)):" (sketched below).
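
The marker itself can be trivial; none of this exists yet, it is just
how I imagine it:

def unroll(iterable):
    # A no-op in plain Python, so the kernel keeps working unmodified;
    # the translator would recognize the "for i in unroll(...)" pattern
    # in the AST and duplicate the loop body instead.
    return iterable

def kernel(a):
    s = 0
    for i in unroll(range(4)):
        s += a[i]
    return s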

> (2) I quite liked my decorators to specify what is being optimized;
> would it be possible to preserve this interface? peval should take
> the outer decorators and ignore those that it doesn't know.

Peval preserves all the decorators at the moment (at least it should).
There are some caveats with decorators, though (due to the way peval
discovers the function code); see
https://github.com/Manticore/peval/blob/master/docs/source/index.rst
for details.
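
For example, if your decorator is a simple pass-through marker, I
would expect something like this to work (gpu_kernel is a made-up
name here):

from peval import partial_apply

def gpu_kernel(func):
    # A made-up marker decorator: it returns the function unchanged,
    # so peval has nothing special to handle.
    func.is_gpu_kernel = True
    return func

@gpu_kernel
def add(x, y):
    return x + y

# The @gpu_kernel line should survive in the source of the
# specialized function (see the docs above for the caveats).
add2 = partial_apply(add, 2)
print(add2(3))  # 5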

> (3) add_spec.replace("__binding_1","range(100)"), how do I get peval
> to unroll my loop for me?

See above. Currently this feature is not implemented.

> (4) It might be good to skip the parsing step.
> partial_apply(...).getAST() or something similar would be helpful.

partial_apply() returns a normal callable function, with the proper
signature, globals, closure and so on, so adding some getAST() method
is not really desirable. The problem is that the source of this
function cannot be discovered by inspect.getsource(), so you cannot
parse it and get its AST. What can be done is exposing the internal
Function class, which, in addition to encapsulating the Python magic
of extracting global and closure variables, knows where to look for
the source of a function object constructed by peval. It has a 'tree'
attribute containing the AST.
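
With that exposed, getting the AST would look roughly like this (the
import path is internal and will most likely change):

import ast
from peval import partial_apply
from peval.core.function import Function  # internal, not public API yet

def add(x, y):
    return x + y

add2 = partial_apply(add, 2)

# Unlike inspect.getsource(), Function knows where peval stored the
# source of the constructed function, so this works for add2 too.
func = Function.from_object(add2)
print(ast.dump(func.tree))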

> (5) Would the threaded fenced reduction (CUDA sample) be a good
> program to demo peval->translate with? It seems to have templates
> and indexing for coherent memory access, and is relatively
> fundamental/important. Plus it already has optimized C++ CUDA code
> for comparison.

Yes, I think it will be a very good example. It can also demonstrate
passing and using a custom predicate (as another GPU function) and
working with arbitrary structures instead of integers/floats.
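
In plain Python, the kernel we would feed to the translator could
look like this (a sketch; the actual interface is yet to be decided):

def reduce_ref(op, data):
    # Sequential reference for the reduction; since the order of
    # operations is unimportant, the translator is free to replace
    # this loop with a tree-shaped parallel version on the GPU.
    acc = data[0]
    for i in range(1, len(data)):
        acc = op(acc, data[i])
    return acc

# A custom predicate over an arbitrary structure (pairs here)
# instead of plain integers/floats:
def pair_max(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]))

print(reduce_ref(pair_max, [(1, 5), (3, 2), (2, 7)]))  # (3, 7)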

Best regards,
Bogdan
