On Wed, May 4, 2011 at 8:53 AM, Julian Limon <[email protected]> wrote:

> This sounds really interesting. Is there a way to dump certain fields from a
> Lucene index to text files?
>
> If so, I could use Lucene to do the parsing, and then seqdirectory and
> seq2sparse to generate Mahout vectors out of these files.
>

The fields need to be either Store.YES or TermVector.YES for this to work.
If you have the latter, then you don't need them in text files; you can use
the usual lucene.vector script to produce Mahout vectors.

We don't currently have a script to dump stored fields, but it should be
another 5 lines of code to write one (ok, 25 lines, including boilerplate,
damn Java). File a ticket; there are lots of people around here who could
write that code.

  -jake

> Thanks,
>
> Julian
>
> 2011/5/3 Jake Mannix <[email protected]>
>
> > On Tue, May 3, 2011 at 6:17 PM, Grant Ingersoll <[email protected]> wrote:
> >
> > > > Although technically, we could add the capability to take a Store.YES
> > > > field and re-tokenize and build vectors from this as well.
> > >
> > > True, or we could just dump stored fields out to text and use the
> > > existing text converter
> >
> > That would probably be the right way to do that, actually.
> >
> >   -jake
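
For anyone picking up that ticket, here is a rough sketch of the file-writing half of such a dump script. It is only an illustration under stated assumptions, not existing Mahout code: the class name StoredFieldDumper, the dump method, and the doc-N.txt naming are all hypothetical, and the index lookup is passed in as a function so the sketch compiles without Lucene on the classpath. The Lucene 3.x calls mentioned in the comment are from memory and have not been compiled here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntFunction;

public class StoredFieldDumper {

    /**
     * Writes one UTF-8 text file per document into outDir and returns the
     * number of files written. storedValue maps a document number to the
     * stored field's text, or null when there is nothing to dump (deleted
     * document, or field not stored).
     *
     * Assumption, not compiled against Lucene here: with a Lucene 3.x
     * IndexReader the lookup would be roughly
     *   reader.isDeleted(i) ? null : reader.document(i).get(fieldName)
     * plus IOException handling, with maxDoc = reader.maxDoc().
     */
    public static int dump(int maxDoc, IntFunction<String> storedValue, Path outDir)
            throws IOException {
        Files.createDirectories(outDir);
        int written = 0;
        for (int docId = 0; docId < maxDoc; docId++) {
            String text = storedValue.apply(docId);
            if (text == null) {
                continue; // nothing stored for this document, so no file
            }
            // Hypothetical naming scheme: one file per document number.
            Path out = outDir.resolve("doc-" + docId + ".txt");
            Files.write(out, text.getBytes(StandardCharsets.UTF_8));
            written++;
        }
        return written;
    }
}
```

The output directory could then be fed to seqdirectory and seq2sparse, as Julian suggested above. One file per document is a guess at what is most convenient there; a single file with one document per line would be just as easy to write.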
