I see, you can use batched_dot for that. I wrote a gist which compares the 
numpy matmul, theano batched_dot, and theano multiply-and-sum approaches.
https://gist.github.com/JesseLivezey/42cabcf87aa0033410f7520933942127

On the GPU, the multiply-and-sum approach seems to be the fastest, but it will 
also use more memory.
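
For reference, a minimal sketch of the two theano expressions being compared, 
assuming the shapes from the thread (prob is (1, 1000, 50), cases is 
(1000, 1000, 50)); the variable names are illustrative and not taken from the 
gist:

import theano
import theano.tensor as T

prob_t = T.tensor3('prob')     # (1, 1000, 50)
cases_t = T.tensor3('cases')   # (1000, 1000, 50)

# multiply-and-sum: mark prob's leading length-1 axis as broadcastable,
# broadcast it over cases' first axis, then sum out axis 1
prob_b = T.addbroadcast(prob_t, 0)
ms_result = (cases_t * prob_b).sum(axis=1, keepdims=True)      # (1000, 1, 50)

# batched_dot: move the 50-sized axis to the front so it becomes the batch
# axis, then do 50 independent (1000, 1000) x (1000, 1) matrix products
A = cases_t.dimshuffle(2, 0, 1)                                # (50, 1000, 1000)
B = prob_t.dimshuffle(2, 1, 0)                                 # (50, 1000, 1)
bd_result = T.batched_dot(A, B).dimshuffle(1, 2, 0)            # (1000, 1, 50)

f = theano.function([prob_t, cases_t], [ms_result, bd_result])

Calling f(prob, cases) with the numpy arrays from the posts below should give 
two numerically matching (1000, 1, 50) results; the actual timings are in the 
gist.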


On Monday, May 8, 2017 at 1:30:33 AM UTC-7, Šarūnas S. wrote:
>
> Currently, I have 3 approaches that are portable to theano:
>
> # 3D example
> axis = 0
> prob = np.random.random( ( 1, 1000, 50 ) )
> cases = np.random.random( ( 1000, 1000, 50 ) )
>
> # Elementwise + sum
> for i in xrange( 100 ):
>     result = ( cases * prob ).sum( axis=1-axis, keepdims=True )
>
> # Loop version (one dot per slice along the last axis)
> result = np.zeros( ( 1000, 1, 50 ) )
> for i in xrange( 50 ):
>     result[ :, :, i ] = np.dot( cases[ :, :, i ], prob[ :, :, i ].T )
>
> # Block diagonal sparse dot version
> prob_big = np.zeros( ( 1, 1000, 50, 50 ) )
> cases_big = np.zeros( ( 1000, 1000, 50, 50 ) )
>
> for i in xrange( 50 ):
>     prob_big[ :, :, i, i ] = prob[ :, :, i ]
>     cases_big[ :, :, i, i ] = cases[ :, :, i ]
>
> intermediate = np.tensordot( prob_big, cases_big, axes=[ [ 1, 3 ], [ 1, 3 ] ] )
> result = np.zeros( ( 1000, 1, 50 ) )
> for i in xrange( 50 ):
>     result[ :, :, i ] = intermediate[ :, i, :, i ].T
>
> I think the one which structures this as a sparse block-diagonal 
> matrix would work best, since I've seen some support for block-sparse 
> matrices. However, it looks like I would still need some loop for 
> blocksparse to iterate over all the blocks. Is there a way to somehow do 
> all the blocks at once and collect the diagonal without using scan? 
>
> On Saturday, 6 May 2017 10:41:06 UTC+3, Šarūnas S. wrote:
>>
>> I have tried that, but to no avail. The problem is that I have to 
>> multiply along 2 axes but sum over only 1. 
>>
>> On Friday, 5 May 2017 19:23:12 UTC+3, Jesse Livezey wrote:
>>>
>>> I think tensordot should do what you want
>>>
>>> http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot
>>> something like
>>> result = T.tensordot(prob, cases, axes=1)
>>>
>>>
>>>
>>> On Friday, May 5, 2017 at 3:17:14 AM UTC-7, Šarūnas S. wrote:
>>>>
>>>> I was shown that in *numpy* I could speed it up in the following way:
>>>>
>>>> result = np.einsum('ijk,ijk->ik', prob, cases)[:,None,:]
>>>>
>>>>
>>>> result = np.matmul(prob.transpose(2,0,1), cases.T).T
>>>>
>>>>
>>>> Both give me the expected speedup in *numpy*, but neither is 
>>>> implemented in *Theano*. Is there a way to do the same in *Theano* on 
>>>> the *GPU*?
>>>>
>>>>
>>>>
>>>> On Friday, 5 May 2017 11:15:26 UTC+3, Šarūnas S. wrote:
>>>>>
>>>>> In my current theano script the bottleneck is equivalent to the 
>>>>> following numpy code:
>>>>>
>>>>> import time
>>>>> import numpy as np
>>>>>
>>>>> # 3D example
>>>>> axis = 0
>>>>> prob = np.random.random( ( 1, 1000, 50 ) )
>>>>> cases = np.random.random( ( 1000, 1000, 50 ) )
>>>>>
>>>>> start = time.time()
>>>>> for i in xrange( 1000 ):
>>>>>     result = ( cases * prob ).sum( axis=1-axis, keepdims=True )
>>>>> print '3D naive method took {} seconds'.format( time.time() - start )
>>>>> print result.shape
>>>>> print
>>>>>
>>>>> I had seen in the 2D case that replacing the elementwise multiply + sum 
>>>>> with a dot product gave me a 5x speedup. Are there any theano matrix 
>>>>> operations that could help me out here? 
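>>>>>
>>>>> (As an illustration of the 2D replacement mentioned above, a minimal 
>>>>> numpy sketch, assuming prob2d has shape (1, N) and cases2d has shape 
>>>>> (M, N); these names are hypothetical:)
>>>>>
>>>>> # elementwise multiply + sum over the shared axis N ...
>>>>> slow = ( cases2d * prob2d ).sum( axis=1, keepdims=True )   # (M, 1)
>>>>> # ... equals a single matrix-vector product
>>>>> fast = np.dot( cases2d, prob2d.T )                         # (M, 1)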
>>>>>
>>>>
