If you can create one Theano vector that holds all of the i's and a second Theano vector that holds all of the j's, then you can just compute i*j. Theano will perform all of the multiplications elementwise in a single operation, in parallel, with no need for scan.
On Wednesday, October 26, 2016 at 11:48:06 PM UTC-7, [email protected] wrote:
> I would like to compute the result of i*j for a number of i's and j's, and
> I would like to do so concurrently. If I use the scan function over my
> sequence of i's and j's, I will get my desired result, but it will not
> perform the operations concurrently. If I have 100 cores in my single GPU,
> I would like there to be 100 asynchronous computations (technically more,
> since each core has multiple threads) of the multiplication and final
> assignment to one vector that will be returned. This is similar to how
> multiprocessing works in base Python with CPU cores. The Theano tutorial
> claims that it uses GPU asynchronous capabilities, but I am not sure of
> that, as I have run scan functions and they seem to go as fast as or slower
> than the CPU.
>
> Should I not use scan? Can this even be done in Theano? Do I have to use
> PyCUDA?
