You want to subclass PDFTextStripper. It can do all the things you’ve mentioned.
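A rough sketch of how that could look, under some assumptions. `KeywordLineFilter`, `Match`, and `FilteringStripper` are hypothetical names made up for illustration; they are not part of PDFBox. The filter itself is plain Java and plays the role of your `accept` method. The PDFBox hookup is shown only as a comment because it needs the PDFBox jar on the classpath; it assumes the 2.x API, where `PDFTextStripper` exposes `writeString(String, List<TextPosition>)`, `getCurrentPageNo()`, and `setSortByPosition(boolean)`, and `TextPosition` exposes `getXDirAdj()` and `getYDirAdj()`. The regex is only an example matching the sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): keeps the lines that match a
// pattern, together with the page number and position they came from.
class KeywordLineFilter {

    // One kept line: 1-based page number, x/y position, and trimmed text.
    static final class Match {
        final int page;
        final float x;
        final float y;
        final String text;

        Match(int page, float x, float y, String text) {
            this.page = page;
            this.x = x;
            this.y = y;
            this.text = text;
        }

        @Override
        public String toString() {
            return "page " + page + " @(" + x + ", " + y + "): " + text;
        }
    }

    private final Pattern pattern;
    private final List<Match> matches = new ArrayList<>();

    KeywordLineFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Records the line and returns true if the whole trimmed line matches.
    boolean accept(int page, float x, float y, String line) {
        String trimmed = line.trim();
        if (!pattern.matcher(trimmed).matches()) {
            return false;
        }
        matches.add(new Match(page, x, y, trimmed));
        return true;
    }

    List<Match> matches() {
        return matches;
    }
}

// Sketch of the PDFBox side (assumes PDFBox 2.x; kept as a comment because
// it needs the PDFBox jar to compile):
//
// class FilteringStripper extends PDFTextStripper {
//     final KeywordLineFilter filter =
//             new KeywordLineFilter("(ABC|zz)\\s+\\d+");
//
//     FilteringStripper() throws IOException {
//         setSortByPosition(true);
//     }
//
//     @Override
//     protected void writeString(String text, List<TextPosition> positions)
//             throws IOException {
//         // Caveat: writeString may receive less than a full line; if so,
//         // buffer the pieces here and flush them in writeLineSeparator().
//         TextPosition first = positions.get(0);
//         filter.accept(getCurrentPageNo(), first.getXDirAdj(),
//                       first.getYDirAdj(), text);
//         super.writeString(text, positions);
//     }
// }
```

Calling `getText(document)` on such a stripper would then fill `filter.matches()` as a side effect. If you want to handle one page at a time, `setStartPage(n)` and `setEndPage(n)` before each `getText` call should restrict the run to page n.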
-- John

> On 7 Oct 2015, at 05:13, robyp7 . <[email protected]> wrote:
>
> Hi,
>
> I would like to ask a question about PDFTextStripper.
>
> I need to extract only some keywords/text patterns while parsing
> every PDF line ON EACH PAGE (NOT ALL PDF PAGES).
>
> For example, given a PDF like:
>
> ABC 123
> xyg 4
> zz 2
>
> I only need to obtain the text:
>
> ABC 123
> zz 2
>
> and I also need to get the page position of every piece of text extracted.
>
> So I suppose I would use a parsing filter:
>
> public class myFilter {
>
>     public accept( String text){
>         ..
>     }
> }
>
> During the PDF parsing (line by line), PDFBox would call the accept method.
>
> Isn't there something like an extension (aka specialization/implementation)
> that does this, and how would I add it to PDFBox?
>
> I'm checking the source code but I can't find it. I see that the
> writeText method returns all pages and not each one.
>
> If there isn't a solution, I will have to run the filter over the entire
> text string and use page tags:
>
> Page n 1
> ABC 123
> xyg 4
> zz 1
>
> ..
> ..
>
> Page n 2
> ABC 456
> xyhk
> zz 2

