I am indexing PDFs, and a separate process converts any image-only PDFs to
searchable PDFs before Solr gets near them. I notice that Tika is very slow at
parsing some PDFs. I don't need any metadata (extracting it is, I suspect,
what slows Tika down), just the text. Has anyone used an alternative PDF text
extraction library in a SolrJ context?