Re: LDA from Lucene Indexes

Ted Dunning Wed, 04 May 2011 10:47:20 -0700

Pipelining is good for abstraction and really bad for performance (in the
map-reduce world).

My thought is that we could have a multipurpose tool.  Input would be a
Lucene index, and the program would read term vectors or the original text,
whichever is available.  Output would be either a sequence file full of text
or a sequence file full of vectors.

This would allow pipelining if interesting, but would also allow the common
case of generating vectors to proceed in one step.
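
A rough sketch of that control flow, in Python for brevity. Everything here
is a hypothetical stand-in: the dict-based document records, the
`export_docs` name, and the whitespace tokenizer are assumptions for
illustration, not Lucene or Mahout APIs, and a real tool would write Hadoop
sequence files rather than yield tuples.

```python
from collections import Counter


def export_docs(index_docs, output="vectors"):
    """Yield (doc_id, payload) pairs from a hypothetical index dump.

    Each document is a dict with an "id" and, optionally, a stored
    "term_vector" (term -> frequency) and/or the original "text".
    """
    if output not in ("text", "vectors"):
        raise ValueError("output must be 'text' or 'vectors'")

    for doc in index_docs:
        term_vector = doc.get("term_vector")
        text = doc.get("text")

        if output == "text":
            # Text output is only possible when the original text was stored.
            if text is None:
                raise ValueError("no stored text for doc %r" % doc["id"])
            yield doc["id"], text
        elif term_vector is not None:
            # Common case: the index already holds term vectors, so the
            # vectors come out in a single step with no re-tokenizing.
            yield doc["id"], dict(term_vector)
        elif text is not None:
            # Fallback: build term counts from the stored text
            # (naive lowercase whitespace tokenizer, an assumption).
            yield doc["id"], dict(Counter(text.lower().split()))
        else:
            raise ValueError("doc %r has neither term vector nor text" % doc["id"])
```

Calling it with output="text" gives the pipelining path (the text can then be
fed to seq2sparse), while output="vectors" covers the one-step case.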

On Wed, May 4, 2011 at 10:41 AM, Jake Mannix <[email protected]> wrote:

> On Wed, May 4, 2011 at 10:33 AM, Ted Dunning <[email protected]>
> wrote:
>
> > It might be that the right thing is to just tweak the current seq2sparse
> > process.
> >
> > Jake,
> >
> > is that what you were thinking?
> >
>
> Well seq2sparse is really for grabbing sequence files, and lucene.vector
> grabs lucene indexes... I was just imagining another script that takes
> lucene indexes and produces text files (or sequence files of text), so you
> can just pipeline it.
>
> I haven't thought about it too carefully, however.
>
