Re: use a filter in a PDFStripper parsing

John Hewson Thu, 08 Oct 2015 17:46:05 -0700

You want to subclass PDFTextStripper. It can do all the things you’ve mentioned.
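
As a rough illustration, here is a minimal sketch of the filtering part only, written without any PDFBox dependency so it runs on its own. The names LineFilterSketch, Match and onLine are hypothetical, not part of PDFBox. The assumption is that a PDFTextStripper subclass would override writeString(String, List<TextPosition>) and call onLine(getCurrentPageNo(), text) from there; how the stripper chunks its text, and whether each call corresponds to exactly one line, should be checked against the PDFBox version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the lines that match a pattern and
// remembers the page each kept line came from.
class LineFilterSketch {

    /** A kept line together with the 1-based page it was found on. */
    static final class Match {
        final int page;
        final String text;

        Match(int page, String text) {
            this.page = page;
            this.text = text;
        }

        @Override
        public String toString() {
            return "page " + page + ": " + text;
        }
    }

    private final Pattern pattern;
    private final List<Match> matches = new ArrayList<>();

    LineFilterSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** Plays the role of accept(String) from the question. */
    boolean accept(String line) {
        return pattern.matcher(line).find();
    }

    /**
     * Meant to be called once per extracted line. In a PDFTextStripper
     * subclass this would (by assumption) be invoked from an overridden
     * writeString, passing getCurrentPageNo() as the page.
     */
    void onLine(int page, String line) {
        if (accept(line)) {
            matches.add(new Match(page, line));
        }
    }

    List<Match> getMatches() {
        return matches;
    }

    public static void main(String[] args) {
        // Sample data taken from the two-page example in the question.
        String[][] pages = {
            {"ABC 123", "xyg 4", "zz 1"},
            {"ABC 456", "xyhk", "zz 2"}
        };
        LineFilterSketch filter = new LineFilterSketch("^(ABC|zz)\\b");
        for (int p = 0; p < pages.length; p++) {
            for (String line : pages[p]) {
                filter.onLine(p + 1, line);
            }
        }
        for (Match m : filter.getMatches()) {
            System.out.println(m);
        }
    }
}
```

If only some pages are of interest, setStartPage and setEndPage on the stripper can restrict extraction to that range before calling getText. For exact coordinates rather than just the page number, the TextPosition objects passed to writeString carry the position of each character.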


— John

> On 7 Oct 2015, at 05:13, robyp7 . <[email protected]> wrote:
> 
> Hi,
> 
> I have a question about PDFTextStripper.
> 
> I need to extract only certain keywords/text patterns while parsing each
> line of a PDF, page by page (not across all pages at once).
> 
> 
> For example, given a PDF like:
> ABC 123
> xyg 4
> zz 2
> 
> I only need to obtain the text
> 
> ABC 123
> zz 2
> 
> and I also need the page position of every piece of text extracted.
> 
> So I was thinking of using a filter during parsing:
> 
> public class MyFilter {
> 
>     // return true to keep the line, false to drop it
>     public boolean accept(String text) {
>         ..
>     }
> }
> 
> During parsing (line by line), PDFBox would call the accept method.
> 
> Is there an extension point (i.e. a specialization/implementation) that
> does this, and how would I add it to PDFBox?
> 
> I'm checking the source code but I can't find one. I see that the
> writeText method returns all pages rather than one page at a time.
> 
> If there isn't a solution, I'll have to run the filter over the entire
> text string and use page tags:
> 
> Page n 1
> ABC 123
> xyg 4
> zz 1
> 
> ..
> ..
> 
> Page n 2
> ABC 456
> xyhk
> zz 2
