Thinking of a different design:
1. Master python process builds and compiles all theano functions like
normal (for GPU), and pickles them.
2. Worker processes initialize on other GPUs and unpickle all the functions.
3. User calls wrapped theano functions in master process, which signals to
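Steps 1-2 hinge on compiled theano functions being picklable (which they are). A theano-free sketch of that round-trip, with `scale` as a stand-in for a compiled function and the worker simulated inline:

```python
import pickle

# Stand-in for a compiled theano function; in the real design this would be
# the object returned by theano.function(...), which is itself picklable.
def scale(x):
    return 2 * x

# 1. Master process builds/"compiles" the function and pickles it.
payload = pickle.dumps(scale)

# 2. A worker process (simulated here in the same process) unpickles it.
#    In the real design the worker first selects its own GPU, e.g. via
#    THEANO_FLAGS=device=cuda1, so the unpickled graph lands on that device.
worker_fn = pickle.loads(payload)

# 3. Master signals workers to run the function on their shard of the data.
result = worker_fn(21)
print(result)  # -> 42
```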
Ah, too bad! Ok thanks for the warning, I'll stick to the multiprocessing
approach for now. If I can think of a different way which is more
generalized, I'll let you know.
Aside from making the code nicer, I was also hoping to use the NCCL
collectives. I saw some work in libgpuarray towards
Hi,
I have a concept of how to implement data parallelism to utilize multiple
GPUs, and I'd appreciate any feedback before I start on this.
First, some background:
--I'm working within an established, fairly complex code base. It builds
NNs using Lasagne and computes gradients and other
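For context, the core of synchronous data parallelism can be sketched without theano or Lasagne: each worker computes a gradient on its own slice of the minibatch, and the per-worker gradients are averaged before the update. All names here are illustrative, with a linear least-squares gradient standing in for whatever the real graph computes:

```python
import numpy as np

def grad_fn(w, x, y):
    # Gradient of mean squared error 0.5*mean((x@w - y)**2) for a linear
    # model; stands in for the compiled gradient function.
    return x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
w = np.zeros(4)
x = rng.normal(size=(8, 4))
y = rng.normal(size=8)

# Data parallelism: split the minibatch across two "workers" and
# average their gradients.
shards = np.array_split(np.arange(8), 2)
grads = [grad_fn(w, x[s], y[s]) for s in shards]
avg_grad = np.mean(grads, axis=0)

# With equal shard sizes, the average matches the single-device gradient.
full_grad = grad_fn(w, x, y)
print(np.allclose(avg_grad, full_grad))  # -> True
```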
Might be related: https://github.com/Theano/libgpuarray/issues/404
On Tuesday, April 11, 2017 at 8:11:28 AM UTC-7, nouiz wrote:
>
> What is your cuda version? Can you update to cuda 8? Can you update cudnn
> to version 6?
>
> It seems the error is inside cudnn, so updating it could fix the

Hi,
I am holding an array on the GPU (in a shared variable), and I'm sampling
random minibatches from it, but it seems there is a call to HostFromGpu at
every index, which causes significant delay. Is there a way to avoid this?
Here is a minimal code example, plus the debug and profiling
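(Not the poster's example, but the usual culprit can be illustrated in plain numpy: gathering element by element, which in theano forces a separate HostFromGpu transfer per index, versus handing over the whole index vector in one advanced-indexing op, which theano can compile to a single GPU gather.)

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)   # stands in for the GPU-resident array
idxs = np.array([3, 7, 1, 4])           # minibatch indices

# Per-index gather: in theano, each data[i] becomes its own op, and pulling
# scalar indices to the host triggers a HostFromGpu per element.
slow_batch = np.stack([data[i] for i in idxs])

# One advanced-indexing op: indexing with the whole vector compiles to a
# single gather on the GPU, with no per-element transfers.
fast_batch = data[idxs]

print(np.array_equal(slow_batch, fast_batch))  # -> True
```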
to read about any of your experiences.
Thanks,
Adam
On Tuesday, January 23, 2018 at 4:19:13 PM UTC-8, Adam Stooke wrote:
>
> I realize now the above example might seem strange where I make the
> "selections" an explicit list, rather than just feeding the "idxs" direct
print("Theano map values pass: ", np.allclose(np_answer, dest_map.get_value()))
# print("map time: ", t_map)
if IDX:
    print("Theano idx values pass: ", np.allclose(np_answer, dest_idx.get_value()))
    # print("idx time: ", t_idx)
On Friday, January 19, 2018 at 12:42:16 PM