I'm using the Regain search engine, which is powered by Lucene. It integrates with PDFBox through a special indexing preparator called PdfBoxPreparator.
Does anyone know whether PdfBoxPreparator extracts data from the title, author, and keywords fields of a PDF? Also, which PDF versions are compatible? Thank you!

Here is a post I submitted to the Regain forum, but I have not heard anything back:
http://forum.murfman.de/en/viewtopic.php?f=3&t=1216

I am saving PDFs on my system that are "scanned", so there is no text available in the body. I am looking for a good way to find these files, and I was thinking that I could do so by editing the title, keywords, and author fields in each PDF.

-- David
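On the version question, here is a minimal sketch (Python, standard library only) of how the version a PDF declares in its header could be checked before indexing. The function names are hypothetical and are not part of Regain or PDFBox. The sketch only reads the `%PDF-x.y` header comment; a file may override that value in its document catalog, which this does not look at.

```python
import re
from typing import Optional

# A PDF starts with a header comment such as "%PDF-1.4". It normally sits at
# byte 0, but some files carry a little leading junk, so scan the first 1024 bytes.
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def declared_pdf_version(data: bytes) -> Optional[str]:
    """Return the version named in the header comment, or None if absent."""
    match = _HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def declared_pdf_version_of_file(path: str) -> Optional[str]:
    """Hypothetical convenience wrapper: read only the start of the file."""
    with open(path, "rb") as handle:
        return declared_pdf_version(handle.read(1024))
```

Running `declared_pdf_version_of_file` over the saved files would at least show which versions are in play, which may help narrow down what the preparator has to handle.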

