You can use Solr to index your files.

If you cannot do it using vanilla Solr then you may want to look at Apache Tika,
which extracts text from many file types (including doc, docx and pdf) so they
can be indexed.
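
As a rough sketch of the Tika route: Solr ships an extracting request handler (Solr Cell) that runs Tika for you when you post a binary file to it. The Solr URL, the file name and the document id below are assumptions for illustration, and the handler has to be enabled in your solrconfig.xml. On newer Solr versions the core name also goes into the URL path.

```shell
#!/bin/sh
# Assumed values: change these to match your own setup.
SOLR_URL="http://localhost:8983/solr"   # assumed local Solr instance
DOC="report.pdf"                        # hypothetical file to index
DOC_ID="doc1"                           # hypothetical unique id for the document

if [ -f "$DOC" ]; then
  # Post the file to the extracting handler; Tika pulls out the text
  # and Solr indexes it under the given id, then commits.
  curl "$SOLR_URL/update/extract?literal.id=$DOC_ID&commit=true" \
       -F "myfile=@$DOC"
else
  echo "no such file: $DOC" >&2
fi
```

You would repeat the post for each doc, docx or pdf file, giving each one its own id.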

Once you have your Solr index (which is actually a standard Lucene index) you
can create Mahout vectors from it using

mahout lucene.vector  

For help, type

mahout lucene.vector -h
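
A fuller invocation might look something like the sketch below. The paths and the field names ("text", "id") are assumptions, so substitute the ones from your own schema. Note that the field you vectorize needs to have term vectors stored in the index (termVectors="true" in the Solr schema), otherwise there is nothing for Mahout to read.

```shell
#!/bin/sh
# Assumed values: change these to match your own index and schema.
INDEX_DIR="/path/to/solr/data/index"   # hypothetical Lucene index directory written by Solr
OUT_DIR="/tmp/mahout-vectors"          # hypothetical output location
FIELD="text"                           # assumed field name; must have term vectors
ID_FIELD="id"                          # assumed unique key field

if command -v mahout >/dev/null 2>&1 && [ -d "$INDEX_DIR" ]; then
  mkdir -p "$OUT_DIR"
  # Read the index, write the vectors and the term dictionary.
  mahout lucene.vector \
    --dir "$INDEX_DIR" \
    --output "$OUT_DIR/part-out.vec" \
    --field "$FIELD" \
    --idField "$ID_FIELD" \
    --dictOut "$OUT_DIR/dict.out"
else
  echo "mahout or index directory not found; skipping" >&2
fi
```

The vector file and the dictionary can then be used as input to the Mahout clustering jobs.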

Jack
On 25 Mar 2013, at 11:50, Fabrizio Macedonio wrote:

> 
> Hi all, 
> 
> Is it possible to create a Mahout dataset from a Lucene index?
> 
> How can I create a dataset from my document files (doc, docx, pdf)?
> 
> Thanks,
> 
> Fabrizio                                        
