Re: Add highlight/annotation to known string of text within a PDF

Joël Kuiper Tue, 09 Sep 2014 15:58:20 -0700

So I figured it out. Those were not a pleasant 6 hours ;-) 

I’ve subclassed PDFTextStripper to build a cache (called textCache) that 
maintains, per page, a mapping between the characters and their TextPositions, 
instead of just returning the final string. 
Using a regular expression you can then find the TextPositions in the cache 
that match the pattern. 
From that list of TextPositions the bounding boxes can be calculated and 
added to the page as PDAnnotationTextMarkup annotations.
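
For anyone who wants the idea without reading the whole gist, here is a rough 
sketch in plain Java with no PDFBox dependency. It is not the code from the 
gist: Glyph is a hypothetical stand-in for TextPosition, and TextCacheSketch, 
add and find are names made up for illustration. It assumes one character per 
glyph and a single page, which real PDFs do not always honour (ligatures, for 
example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative per-page cache: character i of the text maps to glyph i. */
class TextCacheSketch {

    /** Hypothetical stand-in for a PDFBox TextPosition: one char plus its box. */
    static final class Glyph {
        final char ch;
        final float x, y, width, height;

        Glyph(char ch, float x, float y, float width, float height) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    private final StringBuilder text = new StringBuilder();
    private final List<Glyph> glyphs = new ArrayList<>();

    /** Appends a glyph so the string index and the list index stay in step. */
    void add(Glyph g) {
        text.append(g.ch);
        glyphs.add(g);
    }

    /** Returns one bounding box {minX, minY, maxX, maxY} per regex match. */
    List<float[]> find(String regex) {
        List<float[]> boxes = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // an empty match covers no glyphs
            }
            float minX = Float.POSITIVE_INFINITY, minY = Float.POSITIVE_INFINITY;
            float maxX = Float.NEGATIVE_INFINITY, maxY = Float.NEGATIVE_INFINITY;
            // Union of the boxes of every glyph inside the matched range.
            for (Glyph g : glyphs.subList(m.start(), m.end())) {
                minX = Math.min(minX, g.x);
                minY = Math.min(minY, g.y);
                maxX = Math.max(maxX, g.x + g.width);
                maxY = Math.max(maxY, g.y + g.height);
            }
            boxes.add(new float[] {minX, minY, maxX, maxY});
        }
        return boxes;
    }
}
```

With PDFBox, the glyphs would presumably be collected in an override of 
PDFTextStripper's writeString(String, List<TextPosition>), and each box would 
be turned into the quad points of a PDAnnotationTextMarkup. Note that a match 
wrapping across lines yields one large box here; splitting per line would be 
needed for a clean highlight.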


The code is not pretty (haven’t done Java in a while and it was a rush job) but 
it may provide a nice starting point for more serious stuff! 

https://gist.github.com/joelkuiper/9eb52555e02edb653dcf

Hopefully this is useful to someone else as well! 

Joël
