I suggest asking this question on the lucene-users mailing list.

On Thu, Jul 5, 2012 at 8:56 AM, Praveen Chandar <[email protected]> wrote:
> Hi,
> I've used Lucene as a data source for Mahout in the past. Recently, I
> switched to Lucene 4.0 (trunk), and in Lucene 4.0 the indexing/term vector
> APIs have changed.
> I am not able to find an efficient way to read the term frequency vectors
> from the Lucene index as Mahout Vectors.
>
> To be clear, I am trying to read the term frequencies for a subset of
> documents from the Lucene index and load them into Mahout Vectors in order
> to apply various clustering algorithms. Lucene 4.0 provides an iterable
> class, "Terms", for reading the term frequencies of a document. My current
> implementation iterates over these terms in each document and adds them to
> Mahout's "RandomAccessSparseVector", using the "Dictionary" class to encode
> the term string.
>
> Is there an efficient implementation to read the term vectors directly from
> the index?
>
> Praveen
-- 
Lance Norskog
[email protected]
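For reference, here is a minimal, stdlib-only sketch of the encoding step Praveen describes: give each term string a stable index from a dictionary shared across documents, then store each document's term frequencies as a sparse index-to-frequency map. It assumes the per-document (term, frequency) pairs have already been read out of the document's Lucene "Terms". The class TermFreqEncoder and its methods are hypothetical. A HashMap stands in for Mahout's "Dictionary" and a TreeMap<Integer, Double> stands in for "RandomAccessSparseVector", so no actual Lucene or Mahout API is shown here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: encodes per-document term frequencies as sparse
 * vectors over a shared term dictionary. A HashMap stands in for Mahout's
 * Dictionary and a TreeMap<Integer, Double> for RandomAccessSparseVector.
 */
public class TermFreqEncoder {
    // Shared across documents so the same term always maps to the same index.
    private final Map<String, Integer> dictionary = new HashMap<>();

    /** Returns the index for a term, assigning the next free index on first sight. */
    public int indexOf(String term) {
        Integer idx = dictionary.get(term);
        if (idx == null) {
            idx = dictionary.size();
            dictionary.put(term, idx);
        }
        return idx;
    }

    /** Number of distinct terms seen so far (the vectors' cardinality). */
    public int size() {
        return dictionary.size();
    }

    /**
     * Converts one document's (term, frequency) pairs, e.g. as read from its
     * term vector, into a sparse index -> frequency map.
     */
    public Map<Integer, Double> encode(Map<String, Long> termFreqs) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (Map.Entry<String, Long> e : termFreqs.entrySet()) {
            vector.put(indexOf(e.getKey()), e.getValue().doubleValue());
        }
        return vector;
    }

    public static void main(String[] args) {
        TermFreqEncoder enc = new TermFreqEncoder();
        Map<String, Long> doc1 = new LinkedHashMap<>();
        doc1.put("lucene", 3L);
        doc1.put("mahout", 1L);
        Map<String, Long> doc2 = new LinkedHashMap<>();
        doc2.put("mahout", 2L);
        doc2.put("vector", 5L);
        System.out.println(enc.encode(doc1)); // {0=3.0, 1=1.0}
        System.out.println(enc.encode(doc2)); // {1=2.0, 2=5.0}
    }
}
```

Keeping the dictionary in one shared object is the design point: every vector produced for the subset of documents uses the same term-to-index mapping, which a clustering algorithm needs in order to compare vectors. The final dictionary size gives the cardinality to use when building the real Mahout vectors.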
