Hi,

I have a question about PDFTextStripper:

I need to extract only certain keywords/text patterns while parsing each
line of the PDF, ONE PAGE AT A TIME (NOT ALL PDF PAGES AT ONCE).


For example, given a PDF page like:
ABC 123
xyg 4
zz 2

I only need to obtain the text:

ABC 123
zz 2

I also need the page position of every piece of text extracted.

So I think I need some kind of filter applied during parsing:

public class myFilter {

    // returns true if the line should be kept
    public boolean accept(String text) {
        ..
    }
}

During the PDF parsing (line by line), PDFBox would call the accept method.
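
To make the idea concrete, here is a minimal sketch of such a filter in plain Java. It assumes the text of a single page is already available as a string (for example by restricting PDFTextStripper to one page with setStartPage and setEndPage, which is not shown here). The names KeywordLineFilter, Hit and filterPage are hypothetical and not part of PDFBox, and the "position" recorded is only the page number and the line index within the page text, not coordinates on the page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: keeps only the lines of one
// page that match a pattern and remembers where each line was found.
class KeywordLineFilter {

    // One accepted line, with the page and line index it came from.
    static final class Hit {
        final int page;
        final int lineIndex; // 0-based index of the line within the page text
        final String text;

        Hit(int page, int lineIndex, String text) {
            this.page = page;
            this.lineIndex = lineIndex;
            this.text = text;
        }
    }

    private final Pattern pattern;

    KeywordLineFilter(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Plays the role of accept(String text) from the question.
    boolean accept(String text) {
        return pattern.matcher(text).find();
    }

    // Splits the page text into lines and collects the accepted ones.
    List<Hit> filterPage(int page, String pageText) {
        List<Hit> hits = new ArrayList<>();
        String[] lines = pageText.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (accept(line)) {
                hits.add(new Hit(page, i, line));
            }
        }
        return hits;
    }
}
```

With the sample page above, a filter built with the (assumed) pattern "^(ABC|zz)\\s+\\d+$" would keep "ABC 123" and "zz 2" and drop "xyg 4".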

Is there an extension (i.e. a subclass or implementation) that does this,
and how would I add one to PDFBox?

I'm checking the source code but I can't find it. I see that the writeText
method produces the text of all pages at once, not page by page.

If there isn't a solution, I will have to filter the entire text string
afterwards and rely on page tags:

Page n 1
ABC 123
xyg 4
zz 1

..
..

Page n 2
ABC 456
xyhk
zz 2
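
The fallback just described could be sketched in plain Java roughly as follows. This assumes every page starts with a marker line of the form "Page n <number>", exactly as in the sample above. PageTaggedFilter and filter are hypothetical names, and the pattern for the wanted lines is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: filters the full extracted text and groups the
// kept lines by the page number taken from the "Page n <number>" tags.
class PageTaggedFilter {

    // Assumed tag format, copied from the sample in the question.
    private static final Pattern PAGE_TAG = Pattern.compile("^Page n (\\d+)$");

    static Map<Integer, List<String>> filter(String fullText, Pattern wanted) {
        Map<Integer, List<String>> byPage = new LinkedHashMap<>();
        int currentPage = 0; // 0 means no page tag has been seen yet
        for (String raw : fullText.split("\\R")) {
            String line = raw.trim();
            Matcher tag = PAGE_TAG.matcher(line);
            if (tag.matches()) {
                // A tag line starts a new page; it is never kept itself.
                currentPage = Integer.parseInt(tag.group(1));
                continue;
            }
            if (currentPage > 0 && wanted.matcher(line).find()) {
                byPage.computeIfAbsent(currentPage, p -> new ArrayList<>()).add(line);
            }
        }
        return byPage;
    }
}
```

Lines that appear before the first page tag are ignored in this sketch, since they cannot be assigned to a page.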
