Yes, picking the top n items comes up repeatedly. There's one
straightforward, efficient way to do it with a heap, but it's the same 15
lines of code every time, and it deserves refactoring.
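
The heap approach can be sketched roughly as follows. This is a minimal
illustration in plain Java, assuming the scores sit in a dense double[]; the
class name TopN and the method topIndices are hypothetical names chosen for
this example and are not Mahout API. It keeps a min-heap of at most n
candidates, so the weakest one kept so far is always at the head and can be
evicted cheaply.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper for illustration only; not part of Mahout.
final class TopN {

  private TopN() {}

  /** Returns the indices of the n largest scores, highest score first. */
  static List<Integer> topIndices(double[] scores, int n) {
    if (n <= 0 || scores.length == 0) {
      return new ArrayList<>();
    }
    int capacity = Math.max(1, Math.min(n, scores.length));
    // Min-heap ordered by score: the head is the weakest candidate kept so far.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(capacity, (a, b) -> Double.compare(scores[a], scores[b]));
    for (int i = 0; i < scores.length; i++) {
      if (heap.size() < n) {
        heap.add(i);
      } else if (scores[i] > scores[heap.peek()]) {
        // The new score beats the weakest kept candidate, so swap it in.
        heap.poll();
        heap.add(i);
      }
    }
    // Draining a min-heap yields ascending order; reverse to get best first.
    List<Integer> result = new ArrayList<>(heap.size());
    while (!heap.isEmpty()) {
      result.add(heap.poll());
    }
    Collections.reverse(result);
    return result;
  }
}
```

With m scores this does O(m log n) work and holds only n indices in memory,
which is the reason to prefer it over sorting the whole vector.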

In fact, in my own Mahout-based side project I have a more general "TopN"
class that does exactly this, because it comes up so much. Hmm, maybe I
should stick it in Mahout and rewire things to use this more general class.

In the meantime, adapting the TopItems logic to vectors is just a little
copy-and-paste work.

On Mon, Apr 25, 2011 at 8:32 PM, Julian Limon <[email protected]> wrote:

> Hello all,
>
> I'm using SVD to reduce the dimensionality of a text corpus. When I get
> queries, I generate a new matrix with them (based on the dictionary of the
> index) and apply the same matrix transformation. Finally, I
> multiply (SVD'd) the index matrix by the (SVD'd) query matrix to get a
> similarity vector for each query.
>
> My question is, is there a class (or a command-line instruction) that
> generates the top items from this vector? I know that Taste has an
> abstraction called "TopItems", but I wonder if a similar thing exists for
> vectors.
>
> Thanks a lot,
>
> Julian
>
