Problem with Solr indexing "non-searchable" pdf files

2015-12-17 Thread RICARDO EITO BRUN
Hi, I am using Solr as part of the DSpace 5.4 software application. I have a problem when running the DSpace indexing command (index-discovery): most of the files are not added to the index, and an exception is raised. It seems that Solr does not process PDF files that are the result of scanning

Re: Problem with Solr indexing "non-searchable" pdf files

2015-12-17 Thread Erick Erickson
Not sure how much help I can be; I have no clue what DSpace is doing with Solr. If you're willing to try indexing straight to Solr, you can always use SolrJ to parse the files. It's actually not very hard. Here's an example: https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ some
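
To illustrate the "index straight to Solr" idea without DSpace in the middle, here is a minimal sketch. It is not the SolrJ approach from the linked post: it uses Python's standard library and Solr's JSON update handler over HTTP instead. It assumes the text has already been extracted from the PDF by some other tool, and that a scanned PDF with no text layer yields an empty string. The function name build_update_request, the core name, and the field names filename_s and content_txt are hypothetical; the field names would have to match your own schema.

```python
import json
import urllib.request


def build_update_request(solr_url, core, doc_id, filename, extracted_text):
    """Build (but do not send) a POST request for Solr's JSON update handler.

    All field names below are assumptions for illustration; adjust them
    to whatever your Solr schema actually defines.
    """
    doc = {"id": doc_id, "filename_s": filename}

    # A scanned PDF without a text layer typically produces no text.
    # In that case we index the metadata only instead of failing.
    text = (extracted_text or "").strip()
    if text:
        doc["content_txt"] = text

    body = json.dumps([doc]).encode("utf-8")
    url = f"{solr_url.rstrip('/')}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request would then be a call to urllib.request.urlopen(req) against a running Solr instance. Keeping the build step separate makes it easy to inspect what would be sent for a problematic file before anything touches the index.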