Hi,

AutoRec is a collaborative filtering method based on an autoencoder over sparse input features. Its main mechanism is that only the portions of the weight matrices connected to the small set of observed (sparse) features are involved in each computation. The paper can be found here: https://www.nicta.com.au/pub-download/full/8604/
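To make that concrete, this is the per-datapoint objective I have in mind, as a rough numpy sketch (just an illustration; the activation, the biases and the absence of regularization are my simplifications, not necessarily the paper's exact formulation):

import numpy as np

def autorec_loss(r, mask, W_e, b_e, W_d, b_d):
    # r:    full feature/rating vector, with unobserved entries stored as 0
    # mask: 1.0 where r is observed, 0.0 elsewhere
    h = np.tanh(W_e.dot(r) + b_e)            # encode into a k-dimensional latent code
    r_hat = W_d.dot(h) + b_d                 # decode back to the full feature space
    # Error is taken only on the observed entries. Everything here is computed
    # densely just to state the objective; the whole question below is how to
    # avoid these dense products.
    return np.sum(mask * (r_hat - r) ** 2)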
Since the input feature vector is extremely high-dimensional and only a few of its features are active at a time, the gradients for a given SGD update are computed only for the weights connected to the features visible in that datapoint; the other weights are ignored. Similarly, the only gradients needed for the decoder's weight matrix are those connected to the same visible features as in the input.

With a naive implementation the dot products are prohibitively expensive, while with a sparse CSR input vector at least the encoding step becomes much cheaper, because the dot product takes the sparsity into account ( http://www.deeplearning.net/software/theano/library/sparse/index.html ).

I still have a problem with the decoder stage: I did not find a way to transform the decoder's weight matrix into a sparse matrix in which only a very limited number of rows are set and all the others are zero. During learning, this weight matrix would be zeroed, datapoint by datapoint, in the rows that do not correspond to observed features.

One idea is to use a selector matrix, basically an 'eye' identity matrix with many of its diagonal entries set to 0 (a small scipy.sparse sketch of what I mean is in the P.S. below). For example, for a 3-dimensional datapoint [ NA, 12, NA ] (in this example the dimensionality is of course far lower than in reality), the decoder matrix would be handled as (W_d^T * S)^T, where W_d is the completely filled decoder weight matrix, for example [[1,2],[3,4],[5,6]] if the low-dimensional latent codes are assumed to have dimensionality 2, and S is a partial-identity 'selector' matrix [[0,0,0],[0,1,0],[0,0,0]], assumed to be stored in a sparse format. Ideally the resulting partial weight matrix would be [[0,0],[3,4],[0,0]], in sparse format, so that the dot product between the 2-d latent code and the weight matrix stays efficient and only the selected gradients are computed.

Sorry for being verbose, I wanted to make sure I am understood. Anyway, any idea on how best to implement this? Is there an easy way to create the 'partial-identity/selector' matrix starting from the sparse input feature vector?

regards,
-Francesco
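P.S. Here is the selector idea written with plain scipy.sparse, outside of Theano, just to be explicit about what I am after (the names are mine, and the symbolic Theano equivalent of this is exactly the part I am missing):

import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix(np.array([[0.0, 12.0, 0.0]]))  # the datapoint [NA, 12, NA], NA stored as 0
W_d = np.array([[1., 2.], [3., 4.], [5., 6.]])   # dense decoder weights, shape (3, 2)

# Build the partial-identity selector from the sparse input vector:
# a diagonal matrix with 1 only at the observed positions.
diag = np.zeros(x.shape[1])
diag[x.indices] = 1.0                            # x.indices holds the observed column indices
S = sp.diags(diag).tocsr()                       # [[0,0,0],[0,1,0],[0,0,0]]

# (W_d^T * S)^T equals S * W_d; keeping it sparse leaves only the selected rows.
W_partial = S.dot(sp.csr_matrix(W_d))            # sparse [[0,0],[3,4],[0,0]]

h = np.array([0.5, -0.5])                        # some 2-d latent code
r_hat = W_partial.dot(h)                         # non-zero only at the observed row

This is what I would like to express symbolically in Theano, so that the gradient with respect to W_d is effectively computed only for the selected rows.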
