I would like to compute i*j for a number of i's and j's, and I would like 
to do so concurrently. If I use the scan function over my sequence of i's 
and j's, I get the desired result, but the operations do not run 
concurrently. If I have 100 cores on my single GPU, I would like there to 
be 100 asynchronous computations (technically more, since each core has 
multiple threads), each doing one multiplication and assigning its result 
to a single vector that is returned. This is similar to how the 
multiprocessing module works with CPU cores in standard Python. The Theano 
tutorial claims that Theano uses the GPU's asynchronous capabilities, but I 
am not sure of that: I have run scan functions, and they seem to run no 
faster on the GPU than on the CPU, and sometimes slower.
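
To make the computation concrete, here is a minimal sketch of the result I 
am after. It assumes that the i's and j's are two equal-length vectors 
multiplied pairwise; if every i should be multiplied with every j, the 
result would be a matrix instead. NumPy is used only as a CPU reference for 
the expected values, and the names products_loop, products_vectorized, 
i_vals and j_vals are hypothetical, chosen for illustration.

```python
import numpy as np


def products_loop(i_vals, j_vals):
    """Sequential reference: one multiplication per step, as a scan would do."""
    out = np.empty(len(i_vals), dtype=np.result_type(i_vals, j_vals))
    for k in range(len(i_vals)):
        out[k] = i_vals[k] * j_vals[k]
    return out


def products_vectorized(i_vals, j_vals):
    """Same values computed as a single elementwise operation, no explicit loop."""
    return np.asarray(i_vals) * np.asarray(j_vals)


# Small illustrative inputs (assumed; the real vectors would be much longer).
i_vals = np.arange(1, 6, dtype=np.float32)        # 1, 2, 3, 4, 5
j_vals = np.arange(10, 60, 10, dtype=np.float32)  # 10, 20, 30, 40, 50

loop_result = products_loop(i_vals, j_vals)
vector_result = products_vectorized(i_vals, j_vals)
```

Both functions return the same vector. The loop version fixes an order of 
evaluation, while the vectorized version states the whole computation as one 
operation whose elements are independent of each other, which is the form I 
would hope a GPU could run in parallel.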

Should I not use scan? Can this even be done in Theano? Do I have to use 
PyCUDA?
