Currently I have three approaches that are portable to Theano:

import numpy as np

# 3D example
axis = 0
prob = np.random.random( ( 1, 1000, 50 ) )
cases = np.random.random( ( 1000, 1000, 50 ) )

# Elementwise + sum (repeated to get a stable timing)
for i in xrange( 100 ):
    result = ( cases * prob ).sum( axis=1-axis, keepdims=True )

# Loop version: one small dot product per slice of the last axis
result = np.zeros( ( 1000, 1, 50 ) )
for i in xrange( 50 ):
    result[ :, :, i ] = np.dot( cases[ :, :, i ], prob[ :, :, i ].T )
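
For comparison, the direct Theano translation of this loop is a scan
over the (shuffled) last axis. A minimal sketch, with prob_t and
cases_t standing in for the symbolic inputs (untested):

import theano
import theano.tensor as T

prob_t = T.tensor3( 'prob' )    # ( 1, 1000, 50 )
cases_t = T.tensor3( 'cases' )  # ( 1000, 1000, 50 )

def step( p_k, c_k ):
    # p_k: ( 1, 1000 ), c_k: ( 1000, 1000 )
    return T.dot( c_k, p_k.T )  # ( 1000, 1 )

# scan iterates over the first axis, so move the 50 axis to the front
outputs, _ = theano.scan( step,
                          sequences=[ prob_t.dimshuffle( 2, 0, 1 ),
                                      cases_t.dimshuffle( 2, 0, 1 ) ] )
result_t = outputs.dimshuffle( 1, 2, 0 )  # ( 1000, 1, 50 )

That still loops over the blocks inside scan, though, which is what
I'd like to avoid.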

# Block diagonal sparse dot version
prob_big = np.zeros( ( 1, 1000, 50, 50 ) )
cases_big = np.zeros( ( 1000, 1000, 50, 50 ) )

# Put each length-50 fibre on the diagonal of a 50x50 block
for i in xrange( 50 ):
    prob_big[ :, :, i, i ] = prob[ :, :, i ]
    cases_big[ :, :, i, i ] = cases[ :, :, i ]

# Contract the shared 1000 axis and one 50 axis; the answer then sits
# on the diagonal of the two remaining 50 axes
intermediate = np.tensordot( prob_big, cases_big, axes=[ [ 1, 3 ], [ 1, 2 ] ] )

result = np.zeros( ( 1000, 1, 50 ) )
for i in xrange( 50 ):
    result[ :, :, i ] = intermediate[ :, i, :, i ].T
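
As a quick sanity check, each variant should match the einsum
formulation that was suggested earlier in the thread:

reference = np.einsum( 'ijk,ijk->ik', prob, cases )[ :, None, : ]
print np.allclose( result, reference )  # should print True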

I think the one which structures this as a sparse block diagonal 
matrix would work best, since I've seen some support for block sparse 
matrices. However, it looks like I would still need some loop for 
blocksparse to iterate over all the blocks. Is there a way to somehow 
do all the blocks at once and collect the diagonal without using scan? 
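
One op that looks like it could do exactly this is
theano.tensor.batched_dot, which multiplies corresponding matrices
along a leading batch axis, so shuffling the 50 axis to the front
turns all 50 block products into a single call. A minimal sketch
(same symbolic inputs as in the scan version above; I haven't
benchmarked it on the GPU):

import theano
import theano.tensor as T

prob_t = T.tensor3( 'prob' )    # ( 1, 1000, 50 )
cases_t = T.tensor3( 'cases' )  # ( 1000, 1000, 50 )

# out[ k ] = dot( prob[ :, :, k ], cases[ :, :, k ].T ) for all k at once
out = T.batched_dot( prob_t.dimshuffle( 2, 0, 1 ),    # ( 50, 1, 1000 )
                     cases_t.dimshuffle( 2, 1, 0 ) )  # ( 50, 1000, 1000 )

result_t = out.dimshuffle( 2, 1, 0 )  # ( 1000, 1, 50 )
f = theano.function( [ prob_t, cases_t ], result_t )

This mirrors the np.matmul( prob.transpose( 2, 0, 1 ), cases.T ).T
trick quoted below, so it should give the same ( 1000, 1, 50 ) result
without a scan.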

On Saturday, 6 May 2017 10:41:06 UTC+3, Šarūnas S. wrote:
>
> I have tried that, but to no avail. The problem is that I have to 
> multiply over two axes but sum over only one. 
>
> On Friday, 5 May 2017 19:23:12 UTC+3, Jesse Livezey wrote:
>>
>> I think tensordot should do what you want
>>
>> http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.tensordot
>> something like
>> result = T.tensordot(prob, cases, axes=1)
>>
>>
>>
>> On Friday, May 5, 2017 at 3:17:14 AM UTC-7, Šarūnas S. wrote:
>>>
>>> I was shown that in *numpy* I could speed it up in the following way:
>>>
>>> result = np.einsum('ijk,ijk->ik', prob, cases)[:,None,:]
>>>
>>>
>>> result = np.matmul(prob.transpose(2,0,1), cases.T).T
>>>
>>>
>>> Both give me the expected speedup in *numpy*, but neither is 
>>> implemented in *Theano*. Is there a way to do the same in *Theano* on 
>>> the *GPU*?
>>>
>>>
>>>
>>> On Friday, 5 May 2017 11:15:26 UTC+3, Šarūnas S. wrote:
>>>>
>>>> In my current Theano script the bottleneck is equivalent to the 
>>>> following numpy code:
>>>>
>>>> import numpy as np
>>>> import time
>>>>
>>>> # 3D example
>>>> axis = 0
>>>> prob = np.random.random( ( 1, 1000, 50 ) )
>>>> cases = np.random.random( ( 1000, 1000, 50 ) )
>>>>
>>>> start = time.time()
>>>> for i in xrange( 1000 ):
>>>>     result = ( cases * prob ).sum( axis=1-axis, keepdims=True )
>>>> print '3D naive method took {} seconds'.format( time.time() - start )
>>>> print result.shape
>>>> print
>>>>
>>>> I had seen in the 2D case that replacing elementwise + sum with a dot 
>>>> product gave me a 5x speedup. Are there any Theano matrix operations 
>>>> that could help me out here? 
>>>>
>>>
