I've never implemented LSI. Is there a way to incrementally build the model (by simply indexing documents) or is it something that one only runs after the fact once one has built up the much bigger matrix? If it's the former, I bet it wouldn't be that hard to just implement the appropriate new codecs and similarity, assuming Lucene trunk. If it's the latter, then Ted's comment about pushing back into Lucene gets a bit hairier. Still, I wonder if the Codecs/Similarity could help here, too.
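As far as I understand the textbook version, the answer is "both, sort of." The LSI basis itself comes from a batch SVD of the whole term-document matrix. Once that basis exists, new documents can be "folded in" by projecting them onto it (d_hat = S_k^-1 U_k^T d) without recomputing the SVD. Folded-in documents don't change the basis, though, so it presumably has to be rebuilt from time to time as the corpus drifts. Here is a toy, pure-Python sketch of that split. It rests on my own assumptions: a small dense matrix, power iteration standing in for a real large-scale SVD solver, and made-up helper names (top_k_svd, fold_in, cosine). It is not Mahout's or Lucene's API.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def top_k_svd(A, k, iters=200, seed=0):
    """Batch step (hypothetical helper): top-k left singular vectors and
    singular values of a terms x docs matrix A, via power iteration with
    Gram-Schmidt deflation on A A^T. Toy-sized only."""
    rng = random.Random(seed)
    B = [[dot(ri, rj) for rj in A] for ri in A]  # A A^T, terms x terms
    U, s = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(len(A))]
        for _ in range(iters):
            w = [dot(row, v) for row in B]
            for u in U:  # remove directions already found
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            n = norm(w)
            if n == 0.0:  # k exceeds the rank of A; stop iterating
                break
            v = [wi / n for wi in w]
        U.append(v)
        # singular value = sqrt of the eigenvalue of A A^T along v
        s.append(math.sqrt(max(dot(v, [dot(row, v) for row in B]), 0.0)))
    return U, s


def fold_in(U, s, doc):
    """Incremental step (hypothetical helper): project a term-count vector
    into the existing k-dim space, d_hat = S_k^-1 U_k^T d.
    The basis (U, s) is NOT updated. Assumes every s_i > 0."""
    return [dot(u, doc) / si for u, si in zip(U, s)]


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


# Rows = terms, columns = docs d0..d3 (made-up example data).
terms = ["lucene", "index", "search", "matrix", "svd", "vector"]
A = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
U, s = top_k_svd(A, k=2)
# Folding in an indexed column reproduces its row of V_k.
docs = [fold_in(U, s, [row[j] for row in A]) for j in range(len(A[0]))]
# A new "lucene search" document that was never part of the SVD.
new_doc = fold_in(U, s, [1, 0, 1, 0, 0, 0])
```

Here the new document lands right next to d0 and d1 and is roughly orthogonal to d2 and d3. That is the "just index it" behavior, but only because the basis was already built from a matrix that covers the same vocabulary.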
What does a typical workflow look like for building all of this?

On Nov 13, 2011, at 3:58 PM, Ted Dunning wrote:

> Essentially not.
>
> And I would worry about how to push the LSI vectors back into lucene in a
> coherent and usable way.
>
> On Sun, Nov 13, 2011 at 10:47 AM, Sebastian Schelter wrote:
>
>> Is there some documentation/tutorial available on how to build an LSI
>> pipeline with mahout and lucene?
>>
>> --sebastian
>>
