Never used lucene.vector myself, so I'm just thinking out loud here. Assuming that dict.out is in plain text format, you could use 'seqdirectory' to convert the dictionary to SequenceFile format.
This can then be fed into cvb.

________________________________
From: James Forth <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Tuesday, June 4, 2013 8:00 PM
Subject: Dictionary file format in Lucene-Mahout integration

Hello,

I’m wondering if anyone can help with a question about the dictionary format in lucene.vector-cvb integration.

I’ve previously used the pathway from text files:

seqdirectory > seq2sparse > rowid > cvb

and it works fine. The dictionary created by seq2sparse is in sequence file format, and this is accepted by cvb. But when using a pathway from a Lucene index:

lucene.vector > cvb

there is a problem: cvb throws the error “dict.out not a SequenceFile”. lucene.vector appears to generate a dictionary in plain text format, but cvb requires it in sequence file format.

Does anyone know how to use lucene.vector with cvb, which I assume means obtaining a dictionary as a sequence file from lucene.vector?

Thanks for your help.

James
