pdfboxpreparator and Regain

David Picella Sun, 06 Dec 2009 14:04:41 -0800

I'm using the Regain search engine, which is powered by Lucene.

It integrates with PDFBox through a special indexing preparator called
PdfBoxPreparator.


Does anyone know whether PdfBoxPreparator extracts the title, author, and
keywords metadata fields of a PDF?  Also, which PDF versions are
compatible?  Thank you!

Here is a post I submitted in the Regain forum, but I have not received a
reply:
http://forum.murfman.de/en/viewtopic.php?f=3&t=1216

I am saving scanned PDFs on my system, so there is no text available in the
body.  I am looking for a good way to find these documents through search,
and I was thinking that I could do so by filling in the title, keywords, and
author fields of each PDF.

-- 
David
