Profile your functions. That will give you an idea of how much time is being
spent in each op and where each op runs (CPU or GPU).
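Theano's built-in profiler can be switched on from the environment; for example (sh/bash syntax, whereas the post below uses csh's setenv; the script name is a placeholder):

```shell
# Turn on Theano's profiler for the whole run; a per-op timing summary,
# including host<->GPU transfer ops, is printed when the process exits.
export THEANO_FLAGS='mode=FAST_RUN,device=gpu,floatX=float32,profile=True'
# then run the script as usual, e.g.:  python my_script.py
```

You can also pass profile=True to an individual theano.function call to profile just that function.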

On Thursday, 14 July 2016 04:17:29 UTC-4, Lindley Lentati wrote:
>
> Hi there, 
>
> So I've recently started getting into Theano, and wanted to try to take 
> my existing code and put it on a GPU. I tried the simple example on the 
> website and got similar improvements to those quoted, so I was quite hopeful 
> going in!  The code that I'm evaluating is the following, hopefully 
> commented sufficiently to make it clear what's going on:
>
> amps   = tt.vector('amps', dtype=theano.config.floatX)
> offs   = tt.vector('offs', dtype=theano.config.floatX)
> sigs   = tt.vector('sigs', dtype=theano.config.floatX)
> phase  = tt.scalar('phase', dtype=theano.config.floatX)
>
> #TFlatTimes is a float32 shared vector that is 1024*useToAs long and
> #contains the observed times of useToAs light curves, each of which is
> #sampled with 1024 bins. useToAs is set to 100 in this case but will
> #eventually be tens of thousands.
>
> #ReferencePeriod is a float32 shared scalar
> #Tg1width, Tg2amp, Tg2width are float32 shared scalars that define a
> #double Gaussian model
> #phase is a single free parameter that defines when to evaluate the
> #Gaussian model jointly for each light curve
>
>
> #First shift TFlatTimes by phase, then wrap between -ReferencePeriod/2
> #and +ReferencePeriod/2; store as x.
> #Then evaluate the first Gaussian as y.
> #Repeat for the position of the second Gaussian and evaluate it as y2.
>
> x = (TFlatTimes - phase + ReferencePeriod/2) % ReferencePeriod - ReferencePeriod/2
> y = tt.exp(-0.5*x**2/Tg1width**2)
> x2 = (TFlatTimes - phase - gsep + ReferencePeriod/2) % ReferencePeriod - ReferencePeriod/2
> y2 = Tg2amp*tt.exp(-0.5*x2**2/Tg2width**2)
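As a side note, the shift-and-wrap step above is easy to check in plain NumPy; the values of TFlatTimes, phase, and ReferencePeriod below are made up for illustration:

```python
import numpy as np

# Toy stand-ins for the shared variables in the post.
P = 2.0                                   # ReferencePeriod
t = np.array([-1.4, -0.3, 0.9, 2.6])      # TFlatTimes (toy values)
phase = 0.1

# Shift by phase, then wrap every sample into [-P/2, P/2).
x = (t - phase + P / 2.0) % P - P / 2.0

print(x)  # every element now lies in [-P/2, P/2)
```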
>
> #AmpVec, OffVec, and SigVec contain the overall amplitude of each curve,
> #an offset, and the noise level.
> #Each is 1024*useToAs in length and consists of a single number (i.e. amps[0])
> #repeated 1024 times, then amps[1] 1024 times, etc.
>
>
> AmpVec = theano.tensor.extra_ops.repeat(amps, 1024)
> OffVec = theano.tensor.extra_ops.repeat(offs, 1024)
> SigVec = theano.tensor.extra_ops.repeat(sigs, 1024)
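The layout described in the comment matches NumPy's np.repeat; a small sketch with toy sizes (3 bins per curve instead of 1024):

```python
import numpy as np

amps = np.array([2.0, 5.0], dtype=np.float32)  # one amplitude per curve

# Each per-curve amplitude is repeated once per bin (3 here, 1024 in the post),
# giving a flat vector aligned with the flattened data.
AmpVec = np.repeat(amps, 3)

print(AmpVec)  # [2. 2. 2. 5. 5. 5.]
```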
>
> Nbins=Nbins.astype(int)
> TNbins=theano.shared(Nbins)
>
> #construct final signal vector: the sum of the two Gaussians multiplied
> #by the overall amplitude for that curve, plus the offset
>
> s = AmpVec*(y+y2) + OffVec
>
>
> #calculate log likelihood
>
> like = 0.5*tt.sum(((TFlatData-s)/SigVec)**2) + 0.5*tt.sum(TNbins[:useToAs]*tt.log(sigs**2))
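The same likelihood can be sketched in plain NumPy to sanity-check the shapes; all values below are toy stand-ins for the shared variables in the post (2 curves of 4 bins instead of 100 curves of 1024 bins):

```python
import numpy as np

# Toy data: 2 curves of 4 bins each.
TFlatData = np.array([1.0, 1.2, 0.8, 1.1, 2.0, 2.1, 1.9, 2.2])
s         = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0])  # model signal
sigs      = np.array([0.5, 0.4])   # per-curve noise level
SigVec    = np.repeat(sigs, 4)     # noise level per sample
Nbins     = np.array([4, 4])       # bins per curve

# Chi-squared term plus the per-curve log-determinant of the noise,
# mirroring the two tt.sum terms in the Theano expression.
like = 0.5 * np.sum(((TFlatData - s) / SigVec) ** 2) \
     + 0.5 * np.sum(Nbins * np.log(sigs ** 2))

print(like)
```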
>
>
> #calculate gradient with respect to the parameters
>
>
> glike = tt.grad(like, [phase, amps, offs, sigs])
>
>
> #define functions to return likelihood, gradient, and the signal vector
>
> getS = theano.function([phase, amps, offs], s)
> getX = theano.function([phase, amps, offs, sigs], like)    
> getG = theano.function([phase, amps, offs, sigs], glike)
>
>
> #Wrap these in a single function that is passed vectors of parameters
> def TheanoFunc2(phaseval, ampvec, offvec, sigvec):
>
>     l=getX(phaseval, ampvec, offvec, sigvec)*1
>     g=getG(phaseval, ampvec, offvec, sigvec)    
>     return l, g
>
>
> I then wanted to test this by evaluating TheanoFunc2 20000 times using 
> random numbers as the input:
>
>
> pval = np.float32(0.00288206)
> Tpval = theano.shared(pval)
>
>
> ltot = 0
>
> #define random number functions
>
> from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
>
> theano_rng = RandomStreams(189)
>
>
> avals = theano.function([], theano_rng.normal(size=(useToAs,), avg=0.0, std=1.0, dtype=theano.config.floatX))
> ovals = theano.function([], theano_rng.normal(size=(useToAs,), avg=0.0, std=1.0, dtype=theano.config.floatX))
> nvals = theano.function([], theano_rng.normal(size=(useToAs,), avg=0.0, std=1.0, dtype=theano.config.floatX)**2)
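For reference, a plain-NumPy sketch of the same three draws (same seed value, though MRG_RandomStreams will not produce the identical sequence; nvals squares a standard normal so the noise levels come out positive):

```python
import numpy as np

rng = np.random.RandomState(189)
useToAs = 100

# NumPy analogue of the three MRG_RandomStreams draws in the post.
avals = rng.normal(0.0, 1.0, size=useToAs).astype(np.float32)
ovals = rng.normal(0.0, 1.0, size=useToAs).astype(np.float32)
nvals = rng.normal(0.0, 1.0, size=useToAs).astype(np.float32) ** 2
```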
>
>
> start = time.clock()
>
>
> for i in range(20000):
>     if(i%100 == 0):
>         print i
>
>
>
>     l, g = TheanoFunc2(pval, avals(), ovals(), nvals())
>
>     ltot += l
>
> end = time.clock()
>
> print "time", end - start
>
>
>
> I then timed this on both CPU and GPU using: 
>
> setenv THEANO_FLAGS 'mode=FAST_RUN,device=cpu,floatX=float32' 
>
> and 
>
> setenv THEANO_FLAGS 'mode=FAST_RUN,device=gpu,floatX=float32'
>
>
> and get times of 469.33s on CPU, and 561.29s on a GPU.
>
>
> Unfortunately I have no idea why that might be. Is there any way to see 
> how much, and when, data is being copied to and from the GPU?  In principle all 
> I need to do is copy my initial vector of parameters to the GPU and then 
> just return the likelihood and gradient; everything else can be created and 
> kept on the GPU.
>
> If anyone was able to look through this and shed some light, I would 
> greatly appreciate it!
>
> Thanks
>
>

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"theano-users" group.
