Hi,

I have a simple bilinear interpolation code, which is part of a larger 
algorithm. To simplify things, I created a minimal example of my problem.

When using *two-dimensional array indexing* in the interpolation, the 
profiler shows a lot of time in "*AdvancedIncSubtensor*", which is 
implemented in Python. After some research I switched to *linear indexing* 
and got the expected speedup on CPU. The code now uses 
"*AdvancedIncSubtensor1*", which has a C implementation:

*CPU before:*
THEANO function evaluation took 0.124027 seconds.
THEANO gradient evaluation took 0.248830 seconds.

*CPU after:*
THEANO function evaluation took 0.087603 seconds.
THEANO gradient evaluation took 0.180211 seconds.
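
To make clear what I mean by the two indexing styles, here is a small NumPy-only sketch (not part of my actual code; the array and index names are made up for illustration). It assumes a row-major (C order) layout, which is also what the linear-indexing block in interp2D relies on:

```python
import numpy as np

# Hypothetical small example: a 4x5 array and its row-major flattening.
n_rows, n_cols = 4, 5
image = np.arange(n_rows * n_cols, dtype=np.float64).reshape(n_rows, n_cols)
flat = image.ravel()  # C-order flattening: element (r, c) lands at r * n_cols + c

row_idx = np.array([0, 2, 3, 2])
col_idx = np.array([1, 4, 0, 4])

# Two-dimensional (advanced) indexing with a pair of index vectors.
via_2d = image[row_idx, col_idx]

# Linear indexing into the flattened array with a single index vector.
via_linear = flat[row_idx * n_cols + col_idx]

print(np.array_equal(via_2d, via_linear))
```

Both variants read the same elements; only the op that Theano selects for the read (and for its gradient) differs.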

However, *on the GPU (GTX980)*, the switch to linear indexing increases the 
gradient runtime by more than 200 times (!):

*GPU before* (floatX=float32,device=cuda0):
THEANO function evaluation took 0.286931 seconds.
THEANO gradient evaluation took 0.203636 seconds.

*GPU after:*
THEANO function evaluation took 0.191236 seconds.
THEANO gradient evaluation took 44.522054 seconds.

Nearly all of the time is spent in *"GpuAdvancedIncSubtensor1"*. The same 
can be observed using the old GPU backend. I'm using Theano 0.9.0rc4.
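
As far as I understand, the IncSubtensor ops show up because the gradient of an indexed read such as I[idx] with respect to I is a scatter-add: every upstream gradient value is added to the position it was read from, and repeated indices must accumulate. In my test many indices repeat, since far more sample points are read than there are grid entries. The following NumPy sketch only illustrates that operation; the sizes and names are made up, and it is not how Theano implements it:

```python
import numpy as np

# Hypothetical toy sizes: 6 source entries, read 6 times with repeated indices.
n = 6
idx = np.array([1, 3, 3, 5, 1, 1])
upstream = np.ones(idx.size)  # gradient flowing back into each read value

# Scatter-add: np.add.at is unbuffered, so repeated indices accumulate.
grad = np.zeros(n)
np.add.at(grad, idx, upstream)

# The same accumulation expressed as a weighted histogram.
grad_bincount = np.bincount(idx, weights=upstream, minlength=n)

print(grad)
print(np.array_equal(grad, grad_bincount))
```

Note that a plain `grad[idx] += upstream` would not accumulate the repeated indices, which is why a dedicated op is needed for this step.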

Here is the code; comment/uncomment one of the blocks in interp2D to get 
the before/after behavior:


import theano
import numpy
import time

def interpolationTest():
  
  numpy.random.seed(42)
  m1=[2048,2048]
  m2 = [1025,1025]
  m2Grid = numpy.meshgrid(numpy.linspace(0,m1[0],m2[0]),
                          numpy.linspace(0,m1[1],m2[1]))
  y = m2Grid[0].flatten() + (numpy.random.random(m2Grid[0].size)-0.5)
  x = theano.tensor.vector()

  fval = theano.tensor.sum(interp2D(x,m1, m2))
  f = theano.function([x],fval,allow_input_downcast=True)
  grad = theano.tensor.grad(fval,x)
  df = theano.function([x],grad,allow_input_downcast=True)

  t0 = time.time()
  val_y = f(y)
  t1 = time.time()
  
  print("THEANO function evaluation took {0:f} seconds.".format(t1-t0))
  
  t0 = time.time()
  dF_y = df(y)
  t1 = time.time()
  
  print("THEANO gradient evaluation took {0:f} seconds.".format(t1-t0))
  
def interp2D(I,m1,m2):
  
  m1Grid = numpy.meshgrid(numpy.linspace(0.5,m1[1]-0.5,m1[1])*0.5,
                          numpy.linspace(0.5,m1[0]-0.5,m1[0])*0.5)

  xPoint = theano.shared(m1Grid[1].flatten().astype(theano.config.floatX))
  yPoint = theano.shared(m1Grid[0].flatten().astype(theano.config.floatX))
  
  integerX = theano.tensor.cast(theano.tensor.floor(xPoint),'int32')
  remainderX = xPoint-theano.tensor.cast(integerX,'floatX')

  integerY = theano.tensor.cast(theano.tensor.floor(yPoint),'int32')
  remainderY = yPoint-theano.tensor.cast(integerY,'floatX')
  
  # 2D indexing ("before"): uncomment this block and comment out the next one
  # I = theano.tensor.reshape(I,m2)
  # I00 = I[integerX,integerY]
  # I01 = I[integerX,integerY+1]
  # I10 = I[integerX+1,integerY]
  # I11 = I[integerX+1,integerY+1]
  ##########

  # linear indexing ("after"): row-major index = row*m2[1] + column
  I00 = I[integerX*m2[1]+integerY]
  I01 = I[integerX*m2[1]+(integerY+1)]
  I10 = I[(integerX+1)*m2[1]+integerY]
  I11 = I[(integerX+1)*m2[1]+(integerY+1)]
  ##########
   
  invRemainderX = 1.0-remainderX
  invRemainderY = 1.0-remainderY 
  
  return (invRemainderX*(invRemainderY*I00+remainderY*I01)
          + remainderX*(invRemainderY*I10+remainderY*I11))

if __name__ == '__main__':
  interpolationTest()
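
For checking that both variants compute the same thing, a plain NumPy reference of the same bilinear formula may be useful. This is only a sketch: `bilinear_lookup` and the test image are hypothetical names introduced here, and it assumes the same row-major flattening and in-bounds sample points as interp2D above:

```python
import numpy as np

def bilinear_lookup(flat_image, n_cols, x, y):
    """Bilinearly interpolate a row-major flattened 2D array at points (x, y).

    x indexes rows and y indexes columns, as in interp2D.
    Assumes 0 <= x < n_rows - 1 and 0 <= y < n_cols - 1 (no bounds handling).
    """
    ix = np.floor(x).astype(np.int64)
    iy = np.floor(y).astype(np.int64)
    rx = x - ix  # fractional part along rows
    ry = y - iy  # fractional part along columns
    i00 = flat_image[ix * n_cols + iy]
    i01 = flat_image[ix * n_cols + iy + 1]
    i10 = flat_image[(ix + 1) * n_cols + iy]
    i11 = flat_image[(ix + 1) * n_cols + iy + 1]
    return ((1 - rx) * ((1 - ry) * i00 + ry * i01)
            + rx * ((1 - ry) * i10 + ry * i11))

# Test image that is linear in both coordinates, so bilinear
# interpolation should reproduce it exactly at fractional positions.
rows, cols = np.meshgrid(np.arange(4), np.arange(5), indexing="ij")
image = rows + 2.0 * cols

x = np.array([0.25, 1.5, 2.75])
y = np.array([3.5, 0.0, 1.25])
result = bilinear_lookup(image.ravel(), 5, x, y)
expected = x + 2.0 * y

print(np.allclose(result, expected))
```

Comparing the Theano output against such a reference on a small grid should confirm that the 2D and linear indexing blocks agree numerically.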

Any ideas what is happening here, and how to fix this?

Thanks!







