I see, you can use batched_dot for that. I wrote a gist which compares the numpy matmul, Theano batched_dot, and Theano multiply-and-sum approaches:
https://gist.github.com/JesseLivezey/42cabcf87aa0033410f7520933942127
On the GPU, the multiply-and-sum approach seems to be fastest, but it will also use more memory.
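For readers who don't want to open the gist, here is a minimal sketch of what the two Theano formulations might look like for the shapes discussed in this thread. This is an illustrative reconstruction, not the gist's actual code; the variable names are chosen here, and prob is declared with a broadcastable leading axis so the multiply-and-sum form broadcasts as in the numpy version.

import numpy as np
import theano
import theano.tensor as T

# prob has a size-1 leading axis, so declare that axis broadcastable.
prob = T.TensorType(theano.config.floatX, (True, False, False))('prob')   # (1, 1000, 50)
cases = T.tensor3('cases')                                                 # (1000, 1000, 50)

# batched_dot version: batch over the last (size-50) axis.
# prob.dimshuffle(2, 0, 1) is (50, 1, 1000) and cases.T is (50, 1000, 1000),
# so each batch element is a (1, 1000) x (1000, 1000) matrix product.
batched = T.batched_dot(prob.dimshuffle(2, 0, 1), cases.T).T   # (1000, 1, 50)

# multiply-and-sum version: broadcast prob over the first axis, sum over axis 1.
mul_sum = (cases * prob).sum(axis=1, keepdims=True)            # (1000, 1, 50)

f = theano.function([prob, cases], [batched, mul_sum])

p = np.random.random((1, 1000, 50)).astype(theano.config.floatX)
c = np.random.random((1000, 1000, 50)).astype(theano.config.floatX)
out_batched, out_mul_sum = f(p, c)
# Loose tolerance in case floatX is float32.
assert np.allclose(out_batched, out_mul_sum, rtol=1e-4)

batched_dot batches over the leading axis, which is why the size-50 axis is moved to the front with dimshuffle before the product and moved back afterwards.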
On Monday, May 8, 2017 at 1:30:33 AM UTC-7, Šarūnas S. wrote:
>
> Currently, I have 3 approaches that are portable to theano:
>
> import numpy as np
>
> # 3D example
> axis = 0
> prob = np.random.random( ( 1, 1000, 50 ) )
> cases = np.random.random( ( 1000, 1000, 50 ) )
>
> # Elementwise + sum
> for i in xrange( 100 ):
>     result = ( cases * prob ).sum( axis=1-axis, keepdims=True )
>
> # Loop version
> result = np.zeros( ( 1000, 1, 50 ) )
> for i in xrange( 50 ):
>     result[ :, :, i ] = np.dot( cases[ :, :, i ], prob[ :, :, i ].T )
>
> # Block diagonal sparse dot version
> prob_big = np.zeros( ( 1, 1000, 50, 50 ) )
> cases_big = np.zeros( ( 1000, 1000, 50, 50 ) )
>
> for i in xrange( 50 ):
>     prob_big[ :, :, i, i ] = prob[ :, :, i ]
>     cases_big[ :, :, i, i ] = cases[ :, :, i ]
>
> intermediate = np.tensordot( prob_big, cases_big, axes=[ [ 1 ], [ 1 ] ] )
> result = np.zeros( ( 1000, 1, 50 ) )
> for i in range( 50 ):
>     result[ :, :, i ] = intermediate[ :, i, i, :, i, i ].T
>
> I think the one which structures this as a sparse block-diagonal matrix would work best, since I've seen some support for block-sparse matrices. However, it looks like I would still need some loop for blocksparse to iterate over all the blocks. Is there a way to somehow do all the blocks at once and collect the diagonal without using scan?
>
> On Saturday, 6 May 2017 10:41:06 UTC+3, Šarūnas S. wrote:
>>
>> I have tried that, but to no avail. The problem is that I have to multiply along 2 axes but sum along only 1.
>>
>> On Friday, 5 May 2017 19:23:12 UTC+3, Jesse Livezey wrote:
>>>
>>> I think tensordot should do what you want
>>> http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot
>>> something like
>>> result = T.tensordot(prob, cases, axes=1)
>>>
>>> On Friday, May 5, 2017 at 3:17:14 AM UTC-7, Šarūnas S. wrote:
>>>>
>>>> I was shown that in *numpy* I could speed it up in the following way:
>>>>
>>>> result = np.einsum('ijk,ijk->ik', prob, cases)[:,None,:]
>>>>
>>>> result = np.matmul(prob.transpose(2,0,1), cases.T).T
>>>>
>>>> Both give me the expected speedup in *numpy*, but neither is implemented in *Theano*. Is there a way to do the same in *Theano* on the *GPU*?
>>>>
>>>> On Friday, 5 May 2017 11:15:26 UTC+3, Šarūnas S. wrote:
>>>>>
>>>>> In my current theano script the bottleneck is equivalent to the following numpy code:
>>>>>
>>>>> import numpy as np
>>>>> import time
>>>>>
>>>>> # 3D example
>>>>> axis = 0
>>>>> prob = np.random.random( ( 1, 1000, 50 ) )
>>>>> cases = np.random.random( ( 1000, 1000, 50 ) )
>>>>>
>>>>> start = time.time( )
>>>>> for i in xrange( 1000 ):
>>>>>     result = ( cases * prob ).sum( axis=1-axis, keepdims=True )
>>>>> print '3D naive method took {} seconds'.format( time.time() - start )
>>>>> print result.shape
>>>>>
>>>>> I had seen in the 2D case that replacing elementwise+sum with a dot product gave me a 5x speedup. Are there any theano matrix operations that could help me out here?
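For completeness, the NumPy one-liners quoted above can be checked against the naive elementwise-multiply-and-sum baseline with a short script like the following. This sketch is added for illustration; the shapes follow the thread and the variable names are chosen here.

import numpy as np

prob = np.random.random((1, 1000, 50))
cases = np.random.random((1000, 1000, 50))

# Naive baseline: broadcasted multiply, then sum over the length-1000 axis.
baseline = (cases * prob).sum(axis=1, keepdims=True)             # (1000, 1, 50)

# einsum: contract over j, keep i and k, then restore the middle axis.
via_einsum = np.einsum('ijk,ijk->ik', prob, cases)[:, None, :]

# matmul: move the last axis to the front and do 50 batched
# (1, 1000) x (1000, 1000) products, then transpose back.
via_matmul = np.matmul(prob.transpose(2, 0, 1), cases.T).T

assert np.allclose(baseline, via_einsum)
assert np.allclose(baseline, via_matmul)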
