I suggest asking this question on the lucene-users mailing list.

On Thu, Jul 5, 2012 at 8:56 AM, Praveen Chandar
<[email protected]> wrote:
> Hi,
> I've used Lucene as a data source for Mahout in the past. Recently, I
> switched to Lucene 4.0 (trunk), and in Lucene 4.0 the indexing/term vector
> APIs have changed.
> I am not able to find an efficient way to read the term frequency vectors
> from the Lucene index as Mahout Vectors.
>
> To be clear, I am trying to read the term frequencies for a subset of
> documents from the Lucene index and load them into Mahout Vectors in order
> to apply various clustering algorithms. Lucene 4.0 provides an iterable
> class "Terms" to read the term frequencies of a document, and my current
> implementation iterates over these terms in each document and adds them to
> Mahout's "RandomAccessSparseVector", using the "Dictionary" class to encode
> the term string.
>
> Is there a more efficient way to read the term vectors directly from
> the index?
> Praveen
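
For reference, the approach described in the quoted message can be sketched
roughly as below. This is only an illustration in plain Java and does not call
Lucene or Mahout: a HashMap stands in for the term dictionary, a
Map<Integer, Double> stands in for a sparse vector such as
"RandomAccessSparseVector", and the per-document (term, frequency) pairs are
assumed to have already been read from the index. The class and method names
are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class TermVectorSketch {

    // Hypothetical stand-in for a term dictionary:
    // returns the term's index, assigning the next free index to unseen terms.
    static int encode(Map<String, Integer> dictionary, String term) {
        Integer index = dictionary.get(term);
        if (index == null) {
            index = dictionary.size();
            dictionary.put(term, index);
        }
        return index;
    }

    // Converts one document's (term -> frequency) pairs into a sparse vector
    // keyed by dictionary index. In real code the pairs would come from
    // iterating the document's term vector, and the result would be a
    // Mahout vector rather than a Map (both are assumptions of this sketch).
    static Map<Integer, Double> toSparseVector(Map<String, Integer> termFreqs,
                                               Map<String, Integer> dictionary) {
        Map<Integer, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            vector.put(encode(dictionary, e.getKey()), e.getValue().doubleValue());
        }
        return vector;
    }

    public static void main(String[] args) {
        // One dictionary is shared across documents so indices stay consistent.
        Map<String, Integer> dictionary = new HashMap<>();

        Map<String, Integer> doc1 = new LinkedHashMap<>();
        doc1.put("lucene", 3);
        doc1.put("mahout", 1);

        Map<String, Integer> doc2 = new LinkedHashMap<>();
        doc2.put("mahout", 2);
        doc2.put("vector", 5);

        System.out.println(toSparseVector(doc1, dictionary));
        System.out.println(toSparseVector(doc2, dictionary));
    }
}
```

The dictionary is shared across documents so that the same term always maps
to the same vector index, which the clustering step depends on.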



-- 
Lance Norskog
[email protected]
