Hi,

AutoRec is a collaborative filtering method based on an autoencoder over sparse input features. Its main mechanism is that only the portions of the weight matrices connected to the small set of observed (sparse) features are involved in each computation. The paper can be found here: https://www.nicta.com.au/pub-download/full/8604/
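To make that concrete, this is the per-datapoint objective I have in mind, as a rough numpy sketch (just an illustration; the activation, the biases and the absence of regularization are my simplifications, not necessarily the paper's exact formulation):

import numpy as np

def autorec_loss(r, mask, W_e, b_e, W_d, b_d):
    # r:    full feature/rating vector, with unobserved entries stored as 0
    # mask: 1.0 where r is observed, 0.0 elsewhere
    h = np.tanh(W_e.dot(r) + b_e)            # encode into a k-dimensional latent code
    r_hat = W_d.dot(h) + b_d                 # decode back to the full feature space
    # Error is taken only on the observed entries. Everything here is computed
    # densely just to state the objective; the whole question below is how to
    # avoid these dense products.
    return np.sum(mask * (r_hat - r) ** 2)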
Since the input feature vector is extremely high-dimensional and only a few of its features are active at a time, the gradients for a given SGD update are computed only for the weights connected to the features visible in that datapoint; the other weights are ignored. Similarly, the only gradients needed for the decoder's weight matrix are those connected to the same visible features as in the input.

With a naive implementation the dot products are prohibitively expensive, while with a sparse CSR input vector at least the encoding step becomes much cheaper, because the dot product takes the sparsity into account ( http://www.deeplearning.net/software/theano/library/sparse/index.html ).

I still have a problem with the decoder stage: I did not find a way to transform the decoder's weight matrix into a sparse matrix in which only a very limited number of rows are set and all the others are zero. During learning, this weight matrix would be zeroed, datapoint by datapoint, in the rows that do not correspond to observed features.

One idea is to use a selector matrix, basically an 'eye' identity matrix with many of its diagonal entries set to 0 (a small scipy.sparse sketch of what I mean is in the P.S. below). For example, for a 3-dimensional datapoint [ NA, 12, NA ] (in this example the dimensionality is of course far lower than in reality), the decoder matrix would be handled as (W_d^T * S)^T, where W_d is the completely filled decoder weight matrix, for example [[1,2],[3,4],[5,6]] if the low-dimensional latent codes are assumed to have dimensionality 2, and S is a partial-identity 'selector' matrix [[0,0,0],[0,1,0],[0,0,0]], assumed to be stored in a sparse format. Ideally the resulting partial weight matrix would be [[0,0],[3,4],[0,0]], in sparse format, so that the dot product between the 2-d latent code and the weight matrix stays efficient and only the selected gradients are computed.

Sorry for being verbose, I wanted to make sure I am understood. Anyway, any idea on how best to implement this? Is there an easy way to create the 'partial-identity/selector' matrix starting from the sparse input feature vector?

regards,
-Francesco
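P.S. Here is the selector idea written with plain scipy.sparse, outside of Theano, just to be explicit about what I am after (the names are mine, and the symbolic Theano equivalent of this is exactly the part I am missing):

import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix(np.array([[0.0, 12.0, 0.0]]))  # the datapoint [NA, 12, NA], NA stored as 0
W_d = np.array([[1., 2.], [3., 4.], [5., 6.]])   # dense decoder weights, shape (3, 2)

# Build the partial-identity selector from the sparse input vector:
# a diagonal matrix with 1 only at the observed positions.
diag = np.zeros(x.shape[1])
diag[x.indices] = 1.0                            # x.indices holds the observed column indices
S = sp.diags(diag).tocsr()                       # [[0,0,0],[0,1,0],[0,0,0]]

# (W_d^T * S)^T equals S * W_d; keeping it sparse leaves only the selected rows.
W_partial = S.dot(sp.csr_matrix(W_d))            # sparse [[0,0],[3,4],[0,0]]

h = np.array([0.5, -0.5])                        # some 2-d latent code
r_hat = W_partial.dot(h)                         # non-zero only at the observed row

This is what I would like to express symbolically in Theano, so that the gradient with respect to W_d is effectively computed only for the selected rows.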
