Laurent,

Check out Solr 1.4.
You can download the trunk and build it on your machine. Solr 1.4 does this out of the box; no configuration is required. Send the document with an HTTP POST, for example using a command-line utility such as curl, and PDF, Word, RTF, PPT, XLS and similar files will be indexed. We tested this last week. Tika is already included in Solr 1.4.

Cheers,
Rajan

On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice <lbil...@yahoo.fr> wrote:
> Hi everybody.
>
> I hope this is the right place for questions; if not, sorry.
>
> I'm trying to index rich documents (PDF, MS docs, etc.) in Solr/Lucene.
> I have seen a few examples explaining how to use Tika to solve this, but
> most of these examples use curl to send documents to Solr, or an HTML
> POST with an input file.
> But I'd like to do it entirely in Java.
> Is there a way to use SolrJ to index the documents with the
> ExtractingRequestHandler of Solr, or at least to get the extracted XML back
> (with the extract.only option)?
>
> Many thanks.
>
> Laurent.
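For the all-Java route, the same HTTP POST that curl performs can be done with the JDK alone. This is a minimal sketch, not a tested recipe, and it rests on assumptions: Solr runs at `http://localhost:8983/solr` (the example Jetty default), the ExtractingRequestHandler is mapped at `/update/extract` as in the example `solrconfig.xml`, and the schema's unique key is `id` (filled via `literal.id`). The Solr 1.4 handler documentation spells the "return the extracted content instead of indexing" option `extractOnly`, so check that spelling against your trunk build. The class name `ExtractPost` and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: posts a rich document to Solr's ExtractingRequestHandler,
 * assumed to be mapped at /update/extract, using only the JDK (no SolrJ).
 */
public class ExtractPost {

    /** Builds the extract URL. Assumes the schema's unique key field is "id". */
    public static String buildExtractUrl(String solrBase, String docId, boolean extractOnly) {
        StringBuilder url = new StringBuilder(solrBase).append("/update/extract?");
        if (extractOnly) {
            // Ask Solr to return the extracted content instead of indexing it.
            url.append("extractOnly=true");
        } else {
            // Set the id field literally and commit so the document becomes searchable.
            url.append("literal.id=")
               .append(URLEncoder.encode(docId, StandardCharsets.UTF_8))
               .append("&commit=true");
        }
        return url.toString();
    }

    /** POSTs the raw file bytes with the given Content-Type; returns Solr's response body. */
    public static String post(String url, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out); // stream the document body, like curl --data-binary
        }
        int status = conn.getResponseCode();
        InputStream stream = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        String body = "";
        if (stream != null) {
            try (InputStream in = stream) {
                body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        if (status >= 400) {
            throw new IOException("Solr returned HTTP " + status + ": " + body);
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println("usage: ExtractPost <solrBaseUrl> <file> <contentType> [extractOnly]");
            return;
        }
        Path file = Paths.get(args[1]);
        boolean extractOnly = args.length > 3 && Boolean.parseBoolean(args[3]);
        // Use the file name as the document id (an arbitrary choice for this sketch).
        String url = buildExtractUrl(args[0], file.getFileName().toString(), extractOnly);
        System.out.println(post(url, file, args[2]));
    }
}
```

Usage, under the same assumptions: `java ExtractPost http://localhost:8983/solr report.pdf application/pdf` indexes the file, and appending `true` asks only for the extracted content. Posting the raw bytes with a Content-Type header mirrors the curl `--data-binary` approach and avoids any client library; SolrJ 1.4 also appears to provide `ContentStreamUpdateRequest` for targeting `/update/extract`, so check its javadoc if you would rather stay within SolrJ.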