New documents can be indexed/vectorized against an existing projection
without rebuilding it.  Building the projection itself is essentially a
from-scratch operation, but it only needs to be redone infrequently.
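
To make the split concrete, here is a rough sketch in plain NumPy (not
Mahout or Lucene code; the function names and the toy matrix are made up
for illustration).  It assumes the projection is a rank-k truncated SVD of
a terms x documents matrix and that new documents are folded in with the
standard LSI formula d_hat = S_k^-1 U_k^T d.

```python
import numpy as np


def build_projection(term_doc, k):
    """From-scratch step: truncated SVD of a terms x docs matrix.

    Returns the top-k left singular vectors and singular values.
    """
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k]


def fold_in(u_k, s_k, doc_vec):
    """Incremental step: map a term vector into the existing k-dim space.

    The projection itself is not modified.
    """
    return (u_k.T @ doc_vec) / s_k


# Toy corpus (hypothetical): 5 terms x 4 documents.
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

# Expensive, infrequent: build the projection once.
u_k, s_k = build_projection(A, k=2)

# Cheap, per document: vectorize documents against that projection.
existing = np.array([fold_in(u_k, s_k, A[:, j]) for j in range(A.shape[1])])
new_doc = np.array([1, 1, 0, 0, 0], dtype=float)
new_vec = fold_in(u_k, s_k, new_doc)
```

Folded-in documents do not change u_k or s_k, so the projection slowly
drifts away from the corpus as documents accumulate, which is why it still
has to be rebuilt now and then.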

On Thu, Nov 17, 2011 at 1:47 PM, Grant Ingersoll <[email protected]> wrote:

> I've never implemented LSI.  Is there a way to incrementally build the
> model (by simply indexing documents) or is it something that one only runs
> after the fact once one has built up the much bigger matrix?  If it's the
> former, I bet it wouldn't be that hard to just implement the appropriate
> new codecs and similarity, assuming Lucene trunk.  If it's the latter, then
> Ted's comment about pushing back into Lucene gets a bit hairier.  Still, I
> wonder if the Codecs/Similarity could help here, too.
>
> What's a typical workflow look like for building all of this?
>
> On Nov 13, 2011, at 3:58 PM, Ted Dunning wrote:
>
> > Essentially not.
> >
> > And I would worry about how to push the LSI vectors back into lucene in a
> > coherent and usable way.
> >
> > On Sun, Nov 13, 2011 at 10:47 AM, Sebastian Schelter <[email protected]> wrote:
> >
> >> Is there some documentation/tutorial available on how to build a LSI
> >> pipeline with mahout and lucene?
> >>
> >> --sebastian
> >>
