indexing pdf binary stored in mongodb?

2016-02-05 Thread Arnett, Gabriel
Does anyone have experience indexing PDFs stored in binary form in MongoDB?

Gabe Arnett, Senior Director, Moody's Analytics

Re: indexing pdf binary stored in mongodb?

2016-02-05 Thread Jack Krupansky
Check whether the PDFs are stored using GridFS, MongoDB's convention for splitting large files into chunk documents. If they are, you can use the mongofiles command to retrieve each PDF into a local file and then index that file in Solr, either through Solr Cell or by running Tika yourself. See: http://blog.mongodb.org/post/183689081/storing-large-objects-and-files-in-mongodb
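
If you would rather script this than run mongofiles by hand, a rough sketch might look like the following. It swaps mongofiles for PyMongo's gridfs module and posts each file straight to Solr Cell's /update/extract handler. The function names, the use of the GridFS file id as the Solr id, and the connection details in the usage note are assumptions made for illustration, not anything from the original setup.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_url(solr_base, core, doc_id, commit=True):
    """Build the URL for Solr Cell's /update/extract handler.

    literal.id sets the Solr document id; using the GridFS file id
    for it is an assumption made for this sketch.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


def index_gridfs_pdfs(mongo_uri, db_name, solr_base, core):
    """Send every file in the default GridFS bucket of db_name to Solr Cell."""
    # Imported here so extract_url can be used without PyMongo installed.
    from pymongo import MongoClient
    import gridfs

    db = MongoClient(mongo_uri)[db_name]
    fs = gridfs.GridFS(db)  # default bucket: fs.files / fs.chunks

    count = 0
    for grid_out in fs.find():
        url = extract_url(solr_base, core, str(grid_out._id))
        request = Request(
            url,
            data=grid_out.read(),  # whole file in memory; fine for modest PDFs
            headers={"Content-Type": "application/pdf"},
            method="POST",
        )
        with urlopen(request) as response:
            response.read()  # an HTTP error raises before we get here
        count += 1
    return count
```

A call would look something like index_gridfs_pdfs("mongodb://localhost:27017", "docs", "http://localhost:8983/solr", "pdfs"), where every one of those values is a placeholder. Committing on each document keeps the sketch short but is slow for large collections; passing commit=False and committing once at the end is probably the better choice there.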