I use seq2sparse + ssvd. What happens after that varies and, in my case, is proprietary, but it mainly revolves around fold-in updates into the pretrained term space and various locality-sensitive tricks, depending on your query patterns. My pattern is to scan the first n nearest neighbours in order of increasing distance, preferably without examining the entire neighbourhood, as opposed to finding all neighbours within a given distance radius, which is what most algorithms actually do out of the box.
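For readers who want something concrete, here is a minimal sketch of the two ideas above. It is not the proprietary pipeline, and every name in it is hypothetical. It assumes a documents-by-terms matrix A ≈ U Σ Vᵀ, so a new document's term vector d folds in as d V Σ⁻¹; `V` and `sigma` stand in for the right singular vectors and singular values an SVD run would produce. For the nearest-first scan it swaps in a plain best-first traversal of a k-d tree with per-node bounding boxes rather than any locality-sensitive scheme, simply because that is the shortest exact way to show neighbours arriving in ascending distance order without visiting every point.

```python
import heapq
import itertools
import math


def fold_in(doc_terms, V, sigma):
    """Project a term-weight vector into the k-dim latent space: d V Sigma^-1.

    doc_terms: list of term weights (length = number of terms).
    V: terms x k matrix as a list of rows (assumed right singular vectors).
    sigma: list of k singular values.
    """
    return [
        sum(w * V[t][j] for t, w in enumerate(doc_terms)) / sigma[j]
        for j in range(len(sigma))
    ]


def build_tree(items, depth=0):
    """Build a k-d tree from (vector, label) pairs.

    Each node is (box_lo, box_hi, item, left, right), where the box is the
    bounding box of every point stored in that subtree.
    """
    if not items:
        return None
    dim = len(items[0][0])
    axis = depth % dim
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    lo = [min(it[0][a] for it in items) for a in range(dim)]
    hi = [max(it[0][a] for it in items) for a in range(dim)]
    return (lo, hi, items[mid],
            build_tree(items[:mid], depth + 1),
            build_tree(items[mid + 1:], depth + 1))


def box_distance(q, lo, hi):
    """Smallest Euclidean distance from q to an axis-aligned box."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))


def nearest_first(tree, q):
    """Lazily yield (distance, label) pairs in ascending distance order.

    The heap mixes subtrees (keyed by a lower bound on the distance to
    anything inside them) with single points (keyed by exact distance).
    A point is only yielded once no unopened subtree could contain
    something closer, so the caller can stop after n results.
    """
    tie = itertools.count()  # unique tie-breaker; nodes are never compared
    heap = []
    if tree is not None:
        heap.append((box_distance(q, tree[0], tree[1]), next(tie), tree, None))
    while heap:
        dist, _, node, item = heapq.heappop(heap)
        if node is None:
            yield dist, item[1]
            continue
        _, _, point, left, right = node
        heapq.heappush(heap, (math.dist(q, point[0]), next(tie), None, point))
        for child in (left, right):
            if child is not None:
                bound = box_distance(q, child[0], child[1])
                heapq.heappush(heap, (bound, next(tie), child, None))
```

Taking the first n neighbours is then `itertools.islice(nearest_first(tree, fold_in(d, V, sigma)), n)`; the generator does no further work once the caller stops pulling. Exact tree indexes like this tend to lose their pruning power as the number of latent dimensions grows, which is presumably where approximate, locality-sensitive variants become attractive.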
I suspect, although I am not quite convinced, that document mixture models such as LDA would produce a better fit than classic SVD-based LSI.

On Nov 13, 2011 10:48 AM, "Sebastian Schelter" <[email protected]> wrote:
> Is there some documentation/tutorial available on how to build a LSI
> pipeline with mahout and lucene?
>
> --sebastian
