Yeah, picking the top-n things comes up repeatedly. There's one straightforward and efficient way to do it with a heap, but it's the same 15 lines of code every time, and it deserves refactoring.
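For reference, the heap logic looks roughly like the sketch below. This is not Mahout code and not the "TopN" class from my side project. The class name TopNSketch, the topN method, and the sample similarity values are all made up for illustration, and it assumes the similarity vector is available as a plain double[] with no NaN entries.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Mahout.
public class TopNSketch {

    /** Returns the indices of the n largest scores, highest score first. */
    public static int[] topN(double[] scores, int n) {
        if (n <= 0 || scores.length == 0) {
            return new int[0];
        }
        int capacity = Math.min(n, scores.length);
        // Min-heap on score: the head is always the weakest candidate kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
            capacity, Comparator.comparingDouble((Integer i) -> scores[i]));
        for (int i = 0; i < scores.length; i++) {
            if (heap.size() < capacity) {
                heap.add(i);
            } else if (scores[i] > scores[heap.peek()]) {
                // The new score beats the weakest kept one, so swap it in.
                heap.poll();
                heap.add(i);
            }
        }
        // Drain from weakest to strongest, filling the result back to front.
        int[] result = new int[heap.size()];
        for (int k = result.length - 1; k >= 0; k--) {
            result[k] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up similarity scores, one per indexed document.
        double[] similarities = {0.12, 0.87, 0.45, 0.91, 0.30};
        System.out.println(Arrays.toString(topN(similarities, 3))); // prints [3, 1, 2]
    }
}
```

The heap never holds more than n entries, so the cost is O(m log n) for a vector of length m, instead of the O(m log m) you'd pay for sorting the whole thing.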
In fact in my own Mahout-based side project I have a more general "TopN" class that does exactly this because it comes up so much. Hmm, maybe I should stick it in Mahout and rewire things to use this more general class. In the meantime, adapting the logic is just a little copy-and-paste work.

On Mon, Apr 25, 2011 at 8:32 PM, Julian Limon <[email protected]> wrote:
> Hello all,
>
> I'm using SVD to reduce the dimensionality of a text corpus. When I get
> queries, I generate a new matrix with them (based on the dictionary of the
> index) and apply the same matrix transformation. Finally, I
> multiply (SVD'd) the index matrix by the (SVD'd) query matrix to get a
> similarity vector for each query.
>
> My question is, is there a class (or a command-line instruction) that
> generates the top items from this vector? I know that Taste has a
> abstraction called "TopItems", but I wonder if a similar thing exists for
> vectors.
>
> Thanks a lot,
>
> Julian
