Re: Creating dataset from Lucene Index

Jack Pay Mon, 25 Mar 2013 04:59:40 -0700

You can use Solr to index your files.

If vanilla Solr cannot handle your file formats, take a look at Tika, which 
extracts text from many file types so they can be indexed.
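
As a rough sketch of the Tika route: Solr ships an extracting request handler (Solr Cell) that runs Tika on uploaded files. The loop below assumes that handler is enabled at /update/extract, that Solr answers at the URL shown, and that the files sit in a docs/ folder. All three are placeholders, not details from this thread.

```shell
# Assumed values; adjust to your own installation.
SOLR_URL="http://localhost:8983/solr"   # base URL of the Solr core (assumption)
DOCS_DIR="docs"                         # folder with the .doc/.docx/.pdf files (assumption)

count=0
for f in "$DOCS_DIR"/*.doc "$DOCS_DIR"/*.docx "$DOCS_DIR"/*.pdf; do
  [ -f "$f" ] || continue               # skip patterns that matched nothing
  id=$(basename "$f")                   # file name as document id; names with spaces
                                        # or special characters would need URL-encoding
  # Post the file; Solr passes it through Tika and indexes the extracted text.
  if curl -sf "$SOLR_URL/update/extract?literal.id=$id" -F "file=@$f" >/dev/null; then
    count=$((count + 1))
  fi
done

# Commit once at the end so the new documents become searchable.
if [ "$count" -gt 0 ]; then
  curl -sf "$SOLR_URL/update?commit=true" >/dev/null
fi
echo "posted $count file(s)"
```

Which field the extracted text ends up in depends on how the handler is mapped in your solrconfig.xml, so check that before vectorising.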


Once you have your Solr index (which is actually a standard Lucene index), you can 
create Mahout vectors from it using 

mahout lucene.vector  

For help, type

mahout lucene.vector -h
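
A fuller invocation might look like the sketch below. The paths and field names are placeholders, and the option names (--dir, --field, --idField, --output, --dictOut) should be checked against the -h output for your Mahout version. The field you vectorise generally needs term vectors stored in the index (termVectors="true" on that field in the Solr schema).

```shell
# All values below are placeholders (assumptions), not taken from this thread.
INDEX_DIR="/path/to/solr/data/index"   # directory holding the Lucene index written by Solr
FIELD="text"                           # indexed field to turn into vectors
ID_FIELD="id"                          # field used to label each vector
OUT_VECTORS="vectors.seq"              # output file for the vectors
OUT_DICT="dictionary.txt"              # output file for the term dictionary

# Run only when the mahout launcher is on the PATH.
if command -v mahout >/dev/null 2>&1; then
  mahout lucene.vector \
    --dir "$INDEX_DIR" \
    --field "$FIELD" \
    --idField "$ID_FIELD" \
    --output "$OUT_VECTORS" \
    --dictOut "$OUT_DICT"
else
  echo "mahout not found on PATH" >&2
fi
```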

Jack
On 25 Mar 2013, at 11:50, Fabrizio Macedonio wrote:

> 
> Hi all, 
> 
> Is it possible to create a Mahout dataset from a Lucene index?
> 
> How can I create a dataset from my document files (doc, docx, pdf)?
> 
> Thanks,
> 
> Fabrizio
