The clustering, classification, and decomposition jobs all want the same kind of input. That input can be viewed as a matrix or as a sequence of vectors; the two amount to much the same thing. The gotcha is that the user often starts with tokens in fielded documents (ratings, documents, purchase history) rather than with vectors. Other than that, it should be pretty easy. Even the output of most of these programs can be a matrix or a vector sequence.
On Tue, Sep 13, 2011 at 6:59 AM, Sean Owen <[email protected]> wrote:

> > Another simpler use case is the system of making Vector files out of
> > Lucene analysis output. The Lanczos distributed solver takes matrices
> > instead of vectors. What if I want to run the Lucene vector file output
> > through this SVD? I have to somehow turn my (named) vectors into a
> > (labeled) matrix.
>
> I smell the need for a utility that can convert "n m-dimensional
> vectors" into an "mxn matrix". (2)
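
To make the quoted idea concrete, here is a rough sketch of such a converter. It is plain Python on in-memory lists, not Mahout code: the function name `vectors_to_matrix` and the `as_columns` flag are made up for illustration, and a real utility would read and write sequence files instead. It stacks n named m-dimensional vectors into an n x m matrix and keeps the names as row labels; with `as_columns=True` it transposes, so each vector becomes a column of an m x n matrix.

```python
def vectors_to_matrix(named_vectors, as_columns=False):
    """Stack (name, vector) pairs into a dense matrix plus a label list.

    Hypothetical helper for illustration only; not part of any library.
    """
    pairs = list(named_vectors)  # materialize so generators also work
    labels = [name for name, _ in pairs]
    rows = [list(vec) for _, vec in pairs]

    # Every vector must have the same dimension m.
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all vectors must have the same dimension")

    if as_columns:
        # Transpose: n x m becomes m x n, one vector per column.
        rows = [list(col) for col in zip(*rows)]
    return labels, rows


docs = [("doc1", [1.0, 0.0, 2.0]), ("doc2", [0.0, 3.0, 1.0])]
labels, by_row = vectors_to_matrix(docs)
_, by_col = vectors_to_matrix(docs, as_columns=True)
```

With the two example vectors, `by_row` has 2 rows of 3 values and `by_col` has 3 rows of 2 values; `labels` names the rows in the first case and the columns in the second.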
