Hi,
I have a question about PDFTextStripper.
I need to extract only certain keywords/text patterns while parsing
each line of a PDF, PAGE BY PAGE (not from all the pages at once).
For example,
given a PDF page like:
ABC 123
xyg 4
zz 2
I only need to get the text
ABC 123
zz 2
and I also need the position on the page of each extracted line.
So I suppose I should use a filter during parsing:
public class MyFilter {
    // Called for each extracted line; return true to keep the line.
    public boolean accept(String text) {
        ..
    }
}
During PDF parsing (line by line), PDFBox would call the accept method.
Is there an extension point (i.e. a class to subclass or an interface to
implement) that does this, and how do I plug it into PDFBox?
I am looking through the source code but I can't find one. I noticed that the
writeText method outputs the text of all pages, not of each page separately.
If there is no solution, I will have to filter the entire text
string and rely on page tags:
Page n 1
ABC 123
xyg 4
zz 1
..
..
Page n 2
ABC 456
xyhk
zz 2
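
For that fallback, here is a minimal sketch of what I have in mind. It assumes the page tag always has the form "Page n <number>" on its own line and that the lines to keep can be described by a regular expression. The names PageTagFilter, Match and filter are placeholders of mine, not anything from PDFBox, and the "position" is only the page number plus the line index on that page, not x/y coordinates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): filters text that was already
// extracted as one string and that contains "Page n <number>" tag lines.
final class PageTagFilter {

    // One kept line, with the page it came from and its line index on that page.
    static final class Match {
        final int page;
        final int lineOnPage;
        final String text;

        Match(int page, int lineOnPage, String text) {
            this.page = page;
            this.lineOnPage = lineOnPage;
            this.text = text;
        }
    }

    // Assumed tag format: "Page n 1", "Page n 2", ...
    private static final Pattern PAGE_TAG = Pattern.compile("^Page n (\\d+)$");

    // Returns every line that fully matches 'keep', tagged with its page.
    static List<Match> filter(String fullText, Pattern keep) {
        List<Match> result = new ArrayList<>();
        int page = 0;        // 0 means no page tag has been seen yet
        int lineOnPage = 0;  // 1-based; counts every non-tag line, blank ones too
        for (String raw : fullText.split("\\R")) {
            String line = raw.trim();
            Matcher tag = PAGE_TAG.matcher(line);
            if (tag.matches()) {
                // A new page starts: remember its number, restart the line count.
                page = Integer.parseInt(tag.group(1));
                lineOnPage = 0;
                continue;
            }
            lineOnPage++;
            if (keep.matcher(line).matches()) {
                result.add(new Match(page, lineOnPage, line));
            }
        }
        return result;
    }
}
```

Called on the tagged text above with Pattern.compile("(ABC|zz) \\d+"), filter keeps "ABC 123" (page 1, line 1), "zz 1" (page 1, line 3), "ABC 456" (page 2, line 1) and "zz 2" (page 2, line 3). I would still prefer a per-page hook inside PDFBox, because this approach breaks if the tag text also appears in the document itself.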