Hi,
On 01.09.2012 04:24, Mac P wrote:
Hello Forum
Is there any way to split a master PDF file consisting of many pages into
separate pages based on the content or keywords on each page?
Each page has the person's first and last name. I would like to grep the last
name and write a script that separates each page and turns it into a new PDF
file with the last name as part of the file name, instead of a sequential
number at the end of each file name.
I know PDFs are binary documents. Are there any tools to look up the last names
and manipulate them that way?
Use PDFSplit [1] to split your PDF into single pages and ExtractText [2] to get
the string you're looking for. The first step should work out of the box; the
latter could be complicated, depending on the fonts used etc. Just give it a try.
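
Once ExtractText has given you the text of a single page, deriving the file
name is plain string handling. The following is only a rough sketch in Python,
not part of PDFBox: the "Name: First Last" line layout, the "page" prefix and
both function names are assumptions made for illustration, so adjust the
pattern to whatever text your pages really contain.

```python
import re

# Hypothetical page layout: each page contains a line such as "Name: Jane Doe".
# Adjust the pattern to the text that ExtractText actually produces for your file.
NAME_PATTERN = re.compile(
    r"^Name:\s*(?P<first>\S+)\s+(?P<last>\S+)\s*$", re.MULTILINE
)


def last_name_from_text(page_text):
    """Return the last name found in the page text, or None if nothing matches."""
    match = NAME_PATTERN.search(page_text)
    return match.group("last") if match else None


def output_filename(page_text, page_number, prefix="page"):
    """Build a file name from the last name, falling back to the page number."""
    last_name = last_name_from_text(page_text)
    if last_name is None:
        return f"{prefix}-{page_number}.pdf"
    # Replace characters that are awkward in file names with an underscore.
    safe = re.sub(r"[^\w-]", "_", last_name)
    return f"{prefix}-{safe}.pdf"
```

You would then rename each single-page file produced by PDFSplit to the value
returned by output_filename(). Note that two pages with the same last name get
the same file name in this sketch, and names with more than two parts fall back
to the page number, so you may want to extend it for your data.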
Thanks
Mac
BR
Andreas Lehmkühler
P.S.: Please subscribe properly to the mailing list [3]; otherwise you won't
receive any replies.
[1] http://pdfbox.apache.org/commandlineutilities/PDFSplit.html
[2] http://pdfbox.apache.org/commandlineutilities/ExtractText.html
[3] http://pdfbox.apache.org/mail-lists.html