Here's one I've been puzzling over for a bit. In a factorization based on the SVD or what have you, you reconstruct the approximate original matrix (well, one row) by multiplying the matrices back together and looking for the largest elements. This is essentially multiplying a user feature vector by the entire item-feature matrix to reconstruct one approximate row of the input.
That's not necessarily so slow, but it's not the fastest thing. I want to speed it up. It seems like there ought to be some shortcut, even if it means a probabilistic approach that could get it slightly wrong at times. I say so because most item feature vectors are nowhere near the user feature vector in feature space. Their dot product is not going to be the largest. In fact, given the user feature vector you can say exactly where in feature space (which direction) you want to look for the top items. For example, if the user feature vector is (2,1) you are looking for item vector (x,y) that maximizes 2x+y and that is largest in the direction of (2,1). When feature space is 50+-dimensional though, I'm having a hard time thinking of an efficient way to index those item feature vectors such that one could quickly find a few buckets of items to check and with high confidence have found the best recommendations. The bases I have are not necessarily orthogonal let alone orthonormal either. I bet, hope, someone will have an insight? You could cluster the items with k-means, quickly, I suppose. I was hoping for something less heavy. Sean
