OK, so I put my PDF files in a directory, /path/to/pdf, and edited example-DIH/solr/tika/conf/tika-data-config.xml so that the entity element reads:

<entity name="tika-test" processor="TikaEntityProcessor" url="/path/to/pdf" format="xml">

What should I do next?

Shawn Heisey-4 wrote
> On 10/11/2013 9:32 AM, PeteBleackley wrote:
>> I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a
>> 404 error, apparently caused by post.jar adding /extract to the end of
>> the URL
>
> In order to use post.jar, you would need the /update/extract handler,
> which is not defined in the tika core under example-DIH.
>
> The example-DIH configurations are intended to use and illustrate the
> dataimport handler - documents are imported using the /dataimport
> handler and its config file, not sent directly with post.jar.
>
> Here's a page covering what you would need in order to send PDFs
> directly rather than import them using DIH:
>
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Thanks,
> Shawn
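
For reference, here is a rough sketch of how a tika-data-config.xml for a whole directory of PDFs might look. It is only an illustration under some assumptions, not a tested config: as far as I understand, the url attribute of TikaEntityProcessor is meant to point at a single file rather than a directory, so the sketch wraps it in an outer FileListEntityProcessor entity that walks /path/to/pdf. The outer entity name "files", the data source name "bin", the fileName pattern, and the choice of format="text" are all my own placeholders, and the field names assume the schema of the stock tika example core.

```xml
<dataConfig>
  <!-- Binary file data source used by the Tika entity (the name "bin" is a placeholder) -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists every PDF under the directory; it produces no documents itself -->
    <entity name="files" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/pdf" fileName=".*\.pdf" recursive="true"
            rootEntity="false">
      <!-- Inner entity: runs Tika once per file found by the outer entity -->
      <entity name="tika-test" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text">
        <!-- Field names assume the stock example schema -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a config along these lines, the import would then be started through the /dataimport handler mentioned in the quoted reply, for example by requesting /solr/tika/dataimport?command=full-import on whatever host and port the example server is running on.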