Yes, you can index/vectorize new documents using an existing projection without touching the projection itself. Building the projection is pretty much a from-scratch operation, but it only needs to be rebuilt infrequently.
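To make "vectorize new documents in an existing projection" concrete, here is a minimal sketch of the usual LSI folding-in step, d_hat = Sigma^-1 * U^T * d. It is plain Python, not Mahout or Lucene code. The matrix `U`, the singular values `SIGMA`, and the function name `fold_in` are all made up for illustration; in a real pipeline they would come from a truncated SVD of the term-document matrix. Some setups skip the Sigma^-1 scaling and use U^T * d directly, so treat the scaling as an assumption too.

```python
# Hypothetical, hand-picked projection for a 4-term vocabulary and k = 2.
# Real values would come from a truncated SVD; these are illustration only.
U = [
    [0.5, 0.5],
    [0.5, -0.5],
    [0.5, 0.5],
    [0.5, -0.5],
]  # rows = terms, columns = latent dimensions (columns are orthonormal)
SIGMA = [2.0, 1.0]  # one singular value per latent dimension


def fold_in(term_vector, u, sigma):
    """Project a term-space vector into the existing latent space.

    Computes d_hat = Sigma^-1 * U^T * d. The projection (u, sigma) is
    only read, never modified, so no rebuild is needed per document.
    """
    if len(term_vector) != len(u):
        raise ValueError("term vector length must match the vocabulary size")
    return [
        sum(u[t][j] * term_vector[t] for t in range(len(u))) / sigma[j]
        for j in range(len(sigma))
    ]


# Term counts for a document that was not in the original matrix.
new_doc = [1.0, 0.0, 1.0, 0.0]
vec = fold_in(new_doc, U, SIGMA)
print(vec)  # [0.5, 1.0]
```

Documents folded in this way never influence the projection, which is why it is worth rebuilding from scratch every so often once the corpus has drifted.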
On Thu, Nov 17, 2011 at 1:47 PM, Grant Ingersoll <[email protected]> wrote:

> I've never implemented LSI. Is there a way to incrementally build the
> model (by simply indexing documents) or is it something that one only runs
> after the fact once one has built up the much bigger matrix? If it's the
> former, I bet it wouldn't be that hard to just implement the appropriate
> new codecs and similarity, assuming Lucene trunk. If it's the latter, then
> Ted's comment about pushing back into Lucene gets a bit hairier. Still, I
> wonder if the Codecs/Similarity could help here, too.
>
> What's a typical workflow look like for building all of this?
>
> On Nov 13, 2011, at 3:58 PM, Ted Dunning wrote:
>
> > Essentially not.
> >
> > And I would worry about how to push the LSI vectors back into lucene in a
> > coherent and usable way.
> >
> > On Sun, Nov 13, 2011 at 10:47 AM, Sebastian Schelter <[email protected]> wrote:
> >
> >> Is there some documentation/tutorial available on how to build a LSI
> >> pipeline with mahout and lucene?
> >>
> >> --sebastian
