You can use Solr to index your files. If you cannot do it with vanilla Solr, you may want to look at Tika, which allows many file types (including doc, docx and pdf) to be indexed.
Once you have your Solr index (which is actually a standard Lucene index), you can create Mahout vectors from it using

mahout lucene.vector

For help, type

mahout lucene.vector -h

Jack

On 25 Mar 2013, at 11:50, Fabrizio Macedonio wrote:

> Hi all,
>
> is possible create mahout dataset from Lucene Index?
>
> How can i create a dataset from my docs file (doc, docx, pdf)?
>
> Thanks,
>
> Fabrizio
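
As a rough sketch of the lucene.vector step mentioned above: the paths and the field names ("text", "id") below are placeholders I made up, so substitute whatever your own index uses. As far as I know the field you vectorize needs to have term vectors stored in the index (termVectors="true" in the Solr schema). Check mahout lucene.vector -h for the options your version actually supports.

```shell
# Placeholder locations; all of these are assumptions, adjust to your setup.
INDEX_DIR="${INDEX_DIR:-/path/to/solr/data/index}"   # the Lucene index written by Solr
FIELD="${FIELD:-text}"                               # hypothetical field holding the document text
ID_FIELD="${ID_FIELD:-id}"                           # hypothetical unique-key field
DICT_OUT="${DICT_OUT:-/tmp/mahout/dict.txt}"         # where the term dictionary is written
VECTORS_OUT="${VECTORS_OUT:-/tmp/mahout/vectors}"    # where the vectors are written

# Build the command line first so it can be inspected before running.
CMD="mahout lucene.vector --dir $INDEX_DIR --field $FIELD --idField $ID_FIELD --dictOut $DICT_OUT --output $VECTORS_OUT"
echo "$CMD"

# Only run it if mahout is on the PATH and the index directory exists.
if command -v mahout >/dev/null 2>&1 && [ -d "$INDEX_DIR" ]; then
    $CMD
fi
```

The script prints the command and skips the run when Mahout or the index is missing, so it is safe to try before anything is installed.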
