OK, so I put my PDF files in a directory, /path/to/pdf, and edited
example-DIH/solr/tika/conf/tika-data-config.xml so that the entity reads:

<entity name="tika-test"
        processor="TikaEntityProcessor" url="/path/to/pdf"
        format="xml">

What should I do next?


Shawn Heisey-4 wrote
> On 10/11/2013 9:32 AM, PeteBleackley wrote:
>> I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a
>> 404 error, apparently caused by post.jar adding /extract to the end of
>> the URL.
> 
> In order to use post.jar, you would need the /update/extract handler,
> which is not defined in the tika core under example-DIH.
> 
> The example-DIH configurations are intended to use and illustrate the
> dataimport handler - documents are imported using the /dataimport
> handler and its config file, not sent directly with post.jar.
> 
> Here's a page covering what you would need in order to send PDFs
> directly rather than import them using DIH:
> 
> http://wiki.apache.org/solr/ExtractingRequestHandler
> 
> Thanks,
> Shawn
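
For reference, here is a rough sketch of what running an import through the /dataimport handler, as described above, might look like from a script. The host and port (localhost:8983) and the core name "tika" are assumptions based on the stock example-DIH layout, and dataimport_url and trigger are hypothetical helper names, not part of Solr. The full-import and status commands and the clean and commit parameters are standard DataImportHandler request parameters.

```python
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def dataimport_url(host, core, command, **params):
    """Build a DataImportHandler request URL for one core.

    host and core are assumptions about the local setup; command is a
    DIH command such as "full-import" or "status".
    """
    query = urlencode({"command": command, **params})
    return urlunsplit(("http", host, f"/solr/{core}/dataimport", query, ""))


def trigger(url):
    """Send the request to Solr and return the raw response body."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


# Assumed defaults for the example-DIH setup: Solr on localhost:8983, core "tika".
import_url = dataimport_url("localhost:8983", "tika", "full-import",
                            clean="true", commit="true")
status_url = dataimport_url("localhost:8983", "tika", "status")

print(import_url)
print(status_url)

# With Solr running, the import would be started and then polled like this:
# print(trigger(import_url))
# print(trigger(status_url))
```

The network calls are left commented out so the sketch only prints the URLs; the same URLs can also be pasted into a browser while Solr is running.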





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4095366.html
Sent from the Solr - User mailing list archive at Nabble.com.
