Hi Haael, hi 黄若尘,

On 21 November 2014 10:55, <ha...@interia.pl> wrote:
> I would suggest a different approach, more similar to Armin's idea of
> parallelization.
>
> You could just optimistically assume that the loop is parallelizable.
> Just execute a few steps at once (each in its own memory sandbox) and
> check for conflicts later. This also plays nice with STM.
I thought about that too, but the granularity is very wrong for STM: the overhead of running tiny transactions would completely dwarf any potential speed gains. If we're talking about tiny transactions, then maybe HTM would be more suitable. I have no idea if HTM will ever start appearing on GPUs, though. Moreover, you still have the general hard problems of automatic parallelization, like communicating progress between threads; unless it is carefully done on a case-by-case basis by a human, this often adds (again) considerable overhead.

To 黄若尘: here's a quick answer to your question. It's not very clean, but I would patch rpython/jit/backend/x86/regalloc.py, prepare_loop(), just after it calls _prepare(). At that point it has a list of rewritten operations ready to be turned into assembler. I guess you'd need to check there whether the loop contains only operations you support, and if so, produce some different code (possibly GPU). Then either abort the job by raising some exception, or, if it makes sense, change the 'operations' list so that it becomes just a few assembler instructions that start and stop the GPU code.

My own two cents about this project, however, is that it's relatively easy to support a few special cases, but it quickly becomes very, very hard to support more general code. You are likely to end up with a system that compiles to GPU only some very specific templates and nothing else. The end result is obscure for the user, because he won't get to use the GPU unless he writes loops that follow some very strict rules exactly. I certainly see why the end user might prefer to use a DSL instead: i.e. he knows he wants to use the GPU at specific places, and he is ready to use a separate, very restricted "language" to express what he wants to do, as long as it is guaranteed to use the GPU. (The needs in this case are very different from those of the general PyPy JIT, which tries to accelerate any Python code.)

A bientôt,

Armin.
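[Editor's note: the hook Armin describes — check the rewritten operation list after _prepare() and either bail out or swap in a GPU stub — could be sketched roughly as below. This is an illustrative standalone sketch, not actual PyPy source: SUPPORTED_OPS, GPUUnsupportedLoop, and try_offload_to_gpu are hypothetical names, and real resoperations are objects, not strings.]

```python
# Hypothetical sketch of the check described above. In the real code this
# would live in prepare_loop() in rpython/jit/backend/x86/regalloc.py,
# right after the call to _prepare(); here we model operations as plain
# strings to keep the example self-contained.

# Assumed whitelist of loop operations the GPU path can handle.
SUPPORTED_OPS = {
    "int_add", "int_mul", "float_add", "float_mul",
    "getarrayitem_raw", "setarrayitem_raw",
}

class GPUUnsupportedLoop(Exception):
    """Raised to abort the GPU path and fall back to normal compilation."""

def try_offload_to_gpu(operations):
    # Step 1: check that the loop contains only operations we support;
    # otherwise abort by raising, as Armin suggests.
    for op in operations:
        if op not in SUPPORTED_OPS:
            raise GPUUnsupportedLoop(op)
    # Step 2: everything is supported, so replace the operation list with
    # a short stub that would start and wait for the GPU kernel
    # (placeholder names here).
    return ["call_gpu_kernel_start", "call_gpu_kernel_wait"]

# A loop made only of supported ops gets replaced by the GPU stub:
print(try_offload_to_gpu(["int_add", "float_mul", "setarrayitem_raw"]))

# Anything else falls back to the normal x86 path:
try:
    try_offload_to_gpu(["int_add", "call_python_function"])
except GPUUnsupportedLoop as e:
    print("falling back, unsupported op:", e)
```

This also illustrates Armin's warning: the whitelist approach only ever matches very specific templates, and any loop containing one unsupported operation silently loses the GPU entirely.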
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev