I have no idea if what you propose would work well. You can make a new Op that uses PyCUDA for the computation. We do that for our FFT Op in the new back-end:
https://github.com/Theano/Theano/blob/master/theano/gpuarray/fft.py
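To answer the PyCUDA question below: a custom Op can work directly on data that is already on the GPU, which is what fft.py does. As a rough illustration of the one-thread-per-row idea discussed below, here is a minimal standalone PyCUDA sketch (hypothetical code, not what fft.py or Theano's reduction actually does):

    import numpy as np
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on import
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    # One GPU thread reduces one full row sequentially.
    mod = SourceModule("""
    __global__ void row_sum(const float *x, float *out, int rows, int cols)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= rows)
            return;
        float acc = 0.0f;
        for (int j = 0; j < cols; j++)
            acc += x[row * cols + j];  // strided access: reads are not coalesced
        out[row] = acc;
    }
    """)
    row_sum = mod.get_function("row_sum")

    rows, cols = 10000, 1000
    x = gpuarray.to_gpu(np.random.rand(rows, cols).astype(np.float32))
    out = gpuarray.empty(rows, np.float32)
    block = (256, 1, 1)
    grid = ((rows + block[0] - 1) // block[0], 1)
    row_sum(x.gpudata, out.gpudata, np.int32(rows), np.int32(cols),
            block=block, grid=grid)
    assert np.allclose(out.get(), x.get().sum(axis=1), rtol=1e-3)

Note the comment in the loop: with one thread per row, neighbouring threads read addresses that are cols elements apart, so the memory accesses are not coalesced. That is one reason a naive row-per-core reduction is often slower than it looks on paper, and why tree-style reductions are used instead.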
On Sat, Feb 11, 2017 at 6:45 AM Kiuhnm Mnhuik <[email protected]> wrote:

> What do you mean by "reusing a row"? If each core does one and only one
> reduction on a single row, then there shouldn't be any reuse.
> I mean that one core of the GPU accesses and reduces one and only one
> specific row:
>
> *** row1 *** <----- just core 1
> *** row2 *** <----- just core 2
> *** row3 *** <----- just core 3
> *** row4 *** <----- just core 4
> *** row5 *** <----- just core 5
>
> This makes sense because there are so many rows that all cores can run in
> parallel, each one working on its own row.
>
> Reductions aren't usually a bottleneck, but I'm doing something quite
> unusual.
>
> Can I use PyCUDA to work *directly* on Theano data already allocated on
> the GPU? This might be my only option. I can't copy or move the data back
> to the CPU or it will kill performance.
>
> On Friday, February 10, 2017 at 7:37:57 PM UTC+1, nouiz wrote:
>
> X + Y is trivially parallelizable, but X.sum(axis=1) is not. I'm pretty
> sure we do something sensible; I checked the code and it is the case.
>
> A reduction isn't trivially parallelizable. This is why it gets less
> speedup: when we reuse a row, we can't parallelize it as much as when
> adding two matrices.
> But in all cases, in a real model, it shouldn't make a difference;
> reductions aren't normally a bottleneck. If you have such a case, I would
> like to see a profile that shows it.
>
> Fred
>
> On Tue, Feb 7, 2017 at 6:28 PM Kiuhnm Mnhuik <[email protected]> wrote:
>
> Hi Fred,
>
> I'm talking about the GPU. With a 10000x1000 matrix X, X.sum(axis=1) is
> 10 times slower than X + Y, where Y is another matrix of the same shape,
> according to my tests.
> I suspect that you're reducing each row with some O(log n) algorithm,
> which makes sense when one needs to reduce a single long vector. But in
> this case, shouldn't we assign each row to a single core of the GPU and
> reduce the row as we would on the CPU? The parallelism would come from
> having so many rows.
> Of course, if the matrix had just 10 rows this algorithm would be very
> slow, but with 10000 rows it should be faster than what you're doing
> right now. It might be almost as fast as X + Y.
> I'm speculating, since I've never looked into CUDA programming (it's on
> my TODO list!).
>
> On Tuesday, February 7, 2017 at 10:49:47 PM UTC+1, nouiz wrote:
>
> Hi,
>
> On the GPU, ints are only supported in the new back-end (device=cuda*).
> In the old back-end, they would end up on the CPU. This is why many
> places say not to use ints on the GPU, but it isn't true with the new
> back-end.
>
> As for the reduction being slow: we didn't parallelize it on the CPU. It
> wasn't a bottleneck on the CPU and we don't have much time to optimize
> the CPU. So I would recommend timing your real model on the CPU before
> spending much time thinking about parallel reduction on the CPU, as it is
> probably not a problem.
>
> Fred
>
> On Mon, Feb 6, 2017 at 8:11 PM Kiuhnm Mnhuik <[email protected]> wrote:
>
> Reductions are quite slow. Without the final reduction I get a 100x
> speedup.
> Why is Y.sum(axis=1) so slow? I think that if each core handled a single
> row it would be 10 times faster for matrices with many rows, like in this
> case.
> Theano is probably using an O(log n) algorithm, which is only useful when
> one needs to reduce a single but long vector.
> Can you confirm?
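For the profile mentioned above, Theano's built-in profiler is enough. A minimal sketch, assuming the new back-end is selected through THEANO_FLAGS (file name and sizes are illustrative):

    # Run with: THEANO_FLAGS=device=cuda0,floatX=float32 python profile_sum.py
    import numpy as np
    import theano

    X = theano.shared(np.random.rand(10000, 1000).astype('float32'))
    Y = theano.shared(np.random.rand(10000, 1000).astype('float32'))

    # profile=True attaches a ProfileStats object to each compiled function.
    f_sum = theano.function([], X.sum(axis=1), profile=True)
    f_add = theano.function([], X + Y, profile=True)

    f_sum()
    f_add()
    f_sum.profile.summary()  # per-Op timing breakdown for the reduction
    f_add.profile.summary()  # same for the elementwise addition

Comparing the two summaries shows directly how much of the runtime the reduction Op accounts for.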
> On Tuesday, February 7, 2017 at 12:37:02 AM UTC+1, Kiuhnm Mnhuik wrote:
>
> I tried the following code:
>
>     import time
>
>     import numpy as np
>     import theano
>
>     floatX = theano.config.floatX
>
>     def test_speed():
>         print('Computing X and X2...', end='', flush=True)
>         X_np = np.random.uniform(0, 100, size=(10000, 1000)).astype(floatX)
>         X2_np = np.random.uniform(0, 100, size=(10000, 1000)).astype(floatX)
>         print('done!', flush=True)
>
>         print('Moving X and X2 to the GPU...', end='', flush=True)
>         X = theano.shared(X_np)
>         X2 = theano.shared(X2_np)
>         print('done!', flush=True)
>
>         print('Building the graph...', end='', flush=True)
>         Y = X
>         for _ in range(100):
>             # Y = Y * (Y <= X2)
>             Y = Y * (Y - X2)
>         Y = Y.sum(axis=1)  # the final reduction under discussion
>         print('done!', flush=True)
>
>         print('compiling...', end='', flush=True)
>         f = theano.function([], Y)
>         print('done!', flush=True)
>
>         t = time.clock()
>         f()
>         print(time.clock() - t)
>
> Note that there is a line with '<=' and another with '-' in the loop.
> They're exclusive. Here are the timings in seconds:
>
>             CPU     GPU
>     '-'     0.21    0.016
>     '<='    0.39    0.019
>
> I'd say I don't need to worry about using comparisons.
>
> On Monday, February 6, 2017 at 1:20:13 PM UTC+1, Kiuhnm Mnhuik wrote:
>
> I'm using Theano 0.9.0b1 with the new back-end.
> Should I use float32 for everything (even for bool masks) for maximum
> speed on the GPU (GTX 970)?
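On the bool-mask question: in Theano a comparison such as Y <= X2 produces an int8 tensor, and multiplying it with a float32 tensor upcasts the result back to float32, so on the new back-end there is no need to keep masks in float32 by hand. A small illustrative sketch (variable names are hypothetical):

    import theano.tensor as T

    Y = T.matrix('Y')    # float32 when floatX=float32
    X2 = T.matrix('X2')

    mask_int8 = Y <= X2                    # comparisons yield an int8 mask
    masked_a = Y * mask_int8               # the mul upcasts back to float32

    mask_f32 = T.cast(Y <= X2, 'float32')  # explicit float32 mask
    masked_b = Y * mask_f32                # float32 throughout

Given the timings above, the comparison-based mask is not a bottleneck on the new back-end, which matches the conclusion that comparisons are nothing to worry about.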
