I have never used lucene.vector myself, so I am just thinking out loud here.
Assuming that dict.out is in plain text format, you could use 'seqdirectory'
to convert the dictionary to sequence file format.

This can then be fed into cvb.
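
If seqdirectory alone does not give cvb what it wants, the dictionary may need to be rewritten as term/index pairs, which is how the seq2sparse dictionary is stored (Text key, IntWritable value). Below is a rough Python sketch of only the first half of that: pulling (term, index) pairs out of the text file. It assumes dict.out is tab-delimited with the columns term, doc freq, idx, possibly preceded by a count line and a header line. I have not verified that layout, and the function name is made up for illustration. Writing the pairs out as a sequence file would still need the Hadoop API and is not shown.

```python
def parse_text_dictionary(lines, delimiter="\t"):
    """Yield (term, index) pairs from a delimited text dictionary.

    Assumed (unverified) layout per line: term <tab> doc freq <tab> idx.
    Lines with fewer than three fields, or whose third field is not an
    integer (such as a count line or a header line), are skipped.
    """
    for raw in lines:
        fields = raw.rstrip("\r\n").split(delimiter)
        if len(fields) < 3:
            continue  # e.g. a leading line holding only the entry count
        try:
            index = int(fields[2])
        except ValueError:
            continue  # e.g. a header line whose third column is a label
        yield fields[0], index


if __name__ == "__main__":
    # Hypothetical sample in the assumed layout, not real lucene.vector output.
    sample = [
        "2\n",
        "#term\tdoc freq\tidx\n",
        "apple\t3\t0\n",
        "banana\t1\t1\n",
    ]
    for term, index in parse_text_dictionary(sample):
        print(term, index)
```

To run it on a real file, pass an open file handle instead of the sample list, for example `parse_text_dictionary(open("dict.out", encoding="utf-8"))`.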




________________________________
 From: James Forth <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Tuesday, June 4, 2013 8:00 PM
Subject: Dictionary file format in Lucene-Mahout integration
 

Hello,


I’m wondering if anyone can help with a question about the dictionary format in
lucene.vector-cvb integration.  I’ve previously used the pathway from text
files:  seqdirectory > seq2sparse > rowid > cvb  and it works fine.  The
dictionary created by seq2sparse is in sequence file format, and this is
accepted by cvb.

But when using a pathway from a lucene index:  lucene.vector > cvb  there is a 
problem with cvb throwing the error “dict.out not a SequenceFile”. 
Lucene.vector appears to generate a dictionary in plain text format, but cvb
requires it in sequence file format.

Does anyone know how to use lucene.vector with cvb, which I assume means
obtaining a dictionary as a sequence file from lucene.vector?

Thanks for your help.

James
