Re: [theano-users] Avoiding HostFromGPU at every Index into Shared Variable?

2018-02-07 Thread Frédéric Bastien
On the GPU, not all indexing is fast. Slices are fast (they are just views). For advanced indexing, only this pattern has been well optimized: a_tensor[a_vector_of_int]. From memory, the vector_of_int can be on any of the dimensions, but it is certainly well optimized on the first dimension. We have code that
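
(As a rough illustration of the optimized pattern described above, a minimal sketch, assuming a float32 shared variable on a CUDA device and int32 indices; the names are made up for the example. Indexing the first dimension with a symbolic integer vector compiles to the GPU-optimized advanced-indexing op, so the minibatch stays on the device and only the final scalar is transferred back.)

import numpy as np
import theano
import theano.tensor as T

# Dataset kept on the GPU as a shared variable (assumes device=cuda, floatX=float32).
data = theano.shared(np.random.rand(10000, 784).astype('float32'), name='data')

# Symbolic vector of row indices: advanced indexing on the first dimension
# is the well-optimized case and does not pull the data back to the host.
idx = T.ivector('idx')
minibatch = data[idx]

# Reduce on the GPU; only the scalar result comes back to the host.
f = theano.function([idx], minibatch.sum())
print(f(np.random.randint(0, 10000, size=256).astype('int32')))
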

[theano-users] Avoiding HostFromGPU at every Index into Shared Variable?

2018-01-19 Thread Adam Stooke
Hi, I am holding an array on the GPU (in a shared variable), and I'm sampling random minibatches from it, but it seems there is a HostFromGpu call at every index, which causes significant delay. Is there a way to avoid this? Here is a minimal code example, plus the debug and profiling
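
(The original poster's example and profiling output are truncated in this digest. The following is a hypothetical sketch of a setup along these lines, not the poster's code; the variable names are invented. It also shows how one might check for HostFromGpu nodes with theano.printing.debugprint and a profiled function.)

import numpy as np
import theano
import theano.tensor as T

# Dataset held on the GPU as a shared variable.
dataset = theano.shared(np.zeros((50000, 784), dtype='float32'), name='dataset')

# Symbolic minibatch indices and the indexed result.
indices = T.ivector('indices')
minibatch = dataset[indices]

# Compile with profiling enabled so per-op timings (including any
# HostFromGpu transfers) show up in the profile summary.
get_batch = theano.function([indices], minibatch, profile=True)

# Inspect the optimized graph for HostFromGpu nodes.
theano.printing.debugprint(get_batch)

get_batch(np.random.randint(0, 50000, size=128).astype('int32'))
get_batch.profile.summary()
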