Hi,

On 01.09.2012 04:24, Mac P wrote:

Hello Forum

Is there any way to split a master PDF file consisting of many pages into
separate pages based on the content or keywords in each page?

Each page has the person's first and last name. I would like to grep the last
name and write a script that separates each page and turns it into a new PDF
file, with the last name as part of the file name instead of the sequential
page numbers that normally end up at the end of each file name.

I know PDFs are binary documents. Are there any tools to look up the last names 
and manipulate them that way?
Use PDFSplit [1] to split your PDF into single pages and ExtractText [2] to get the string you're looking for. The first step should work out of the box; the second could be complicated, depending on the fonts used, etc. Just give it a try.
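
Once each single page has been run through ExtractText, the remaining work is picking the last name out of the page text and turning it into a file name. Here is a minimal sketch of that step in Python. It is only an illustration: the "Name: First Last" line layout is a guess (the original question does not say how the names appear on the page), and the helper names last_name and output_names are made up for this example.

```python
import re

# Assumed layout (hypothetical): each page has a line like "Name: Jane Doe".
# Adjust the pattern to whatever the extracted text really looks like.
NAME_LINE = re.compile(r"^\s*Name:\s*(\S+)\s+(\S.*?)\s*$", re.MULTILINE)


def last_name(page_text):
    """Return the last name found on a page, or None if no name line matches."""
    match = NAME_LINE.search(page_text)
    return match.group(2) if match else None


def output_names(page_texts):
    """Map each page's extracted text to a unique PDF file name."""
    seen = {}
    names = []
    for number, text in enumerate(page_texts, start=1):
        name = last_name(text)
        # Replace characters that are awkward in file names with underscores.
        stem = re.sub(r"[^\w-]+", "_", name).strip("_") if name else ""
        if not stem:
            # No usable name on this page: fall back to the page number.
            stem = f"page-{number}"
        seen[stem] = seen.get(stem, 0) + 1
        # Two people can share a last name, so add a counter to repeats.
        suffix = f"-{seen[stem]}" if seen[stem] > 1 else ""
        names.append(f"{stem}{suffix}.pdf")
    return names
```

The list returned by output_names lines up with the page order, so the n-th entry is the new name for the n-th single-page file produced by PDFSplit. Pages where no name can be found keep a page-number name, so nothing gets overwritten or lost.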

Thanks

Mac

BR
Andreas Lehmkühler

P.S.: Please subscribe to the mailing list [3] properly; otherwise you won't receive any replies.

[1] http://pdfbox.apache.org/commandlineutilities/PDFSplit.html
[2] http://pdfbox.apache.org/commandlineutilities/ExtractText.html
[3] http://pdfbox.apache.org/mail-lists.html
