Yes, that's pretty much how you can do it, and yes, it's very tricky to implement. I have in fact written code that does something like that, and I use it in many of my applications.
Acrobat and Preview probably do something similar, yes.

On Tue, Sep 2, 2014 at 11:11 PM, Joël Kuiper <[email protected]> wrote:

> Hey Maruan,
>
> Thought that would be easier … but unless there’s a way I’m overlooking
> it’s actually really tricky.
> I guess it would mean lifting the code from the PDFTextStripper that does
> the extraction, and instead of returning just the string … also return
> a mapping to the TextPositions.
> Then somehow figure out from the TextPositions the bounding boxes of the
> text … then write those as annotations separately, I guess.
>
> It all seems rather complicated … is this the route Acrobat and
> Preview.app etc. take to make the highlighting work?
>
> Joël
>
> On 02 Sep 2014, at 19:58, Maruan Sahyoun <[email protected]> wrote:
> >
> > Hi Joël,
> >
> > do you already have the text positions on the page?
> >
> > Maruan Sahyoun
> >
> >> On 02.09.2014 at 19:52, "Joël Kuiper" <[email protected]> wrote:
> >>
> >> Well, they're uploaded. Basically a user uploads a PDF, the system runs
> >> some prediction / pattern matching on the text, and the user receives the
> >> PDF with the predicted parts highlighted.
> >>
> >> I'm just a bit confused about how to (properly) do the last part.
> >> --
> >> https://joelkuiper.eu
> >>
> >>> On Tue, Sep 2, 2014 at 7:30 PM, Jan Tosovsky <[email protected]> wrote:
> >>>
> >>>> On 2014-09-02 Joël Kuiper wrote:
> >>>>
> >>>> The problem is that I have a PDF for which I want to highlight a known
> >>>> string with a color.
> >>>
> >>> What is the PDF produced from? It is always better to do this kind of
> >>> job in the source document.
> >>>
> >>> Jan
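
The bounding-box step described in the quoted message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code mentioned in this reply: `Glyph` is a hypothetical stand-in for the geometry that PDFBox's TextPosition objects would supply, all glyphs are assumed to sit on one unrotated line, and coordinates are assumed to be in PDF user space with the origin at the lower left. The corner order in `quadPoints` (top-left, top-right, bottom-left, bottom-right) is the order commonly used in practice for highlight annotations; check it against your viewer. Reading the positions from the text stripper and writing the annotation back to the page are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HighlightBoxes {

    /** Hypothetical glyph geometry: lower-left corner plus size, in PDF user space. */
    static final class Glyph {
        final float x, y, width, height;

        Glyph(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns {minX, minY, maxX, maxY} enclosing all glyphs (assumed to share one line). */
    static float[] boundingBox(List<Glyph> glyphs) {
        if (glyphs.isEmpty()) {
            throw new IllegalArgumentException("no glyphs to enclose");
        }
        float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
        float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
        for (Glyph g : glyphs) {
            minX = Math.min(minX, g.x);
            minY = Math.min(minY, g.y);
            maxX = Math.max(maxX, g.x + g.width);
            maxY = Math.max(maxY, g.y + g.height);
        }
        return new float[] {minX, minY, maxX, maxY};
    }

    /** Turns a box into 8 quad-point values: top-left, top-right, bottom-left, bottom-right. */
    static float[] quadPoints(float[] box) {
        float minX = box[0], minY = box[1], maxX = box[2], maxY = box[3];
        return new float[] {minX, maxY, maxX, maxY, minX, minY, maxX, minY};
    }

    public static void main(String[] args) {
        // Three made-up glyphs standing in for the characters of a matched string.
        List<Glyph> match = new ArrayList<>();
        match.add(new Glyph(100f, 700f, 6f, 10f));
        match.add(new Glyph(106f, 700f, 5f, 10f));
        match.add(new Glyph(111f, 700f, 7f, 10f));

        float[] quad = quadPoints(boundingBox(match));
        System.out.println(Arrays.toString(quad));
    }
}
```

A match that wraps across lines would need one box (and one set of quad points) per line, which is a large part of why this gets tricky.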

