Hey Maruan,

I thought that would be easier, but unless there's a way I'm overlooking, it's actually really tricky. I guess it would mean lifting the code that does the extraction out of PDFTextStripper and, instead of returning just the string, also returning a mapping to the TextPositions. Then I'd somehow have to work out the bounding boxes of the text from those TextPositions, and write them as annotations separately, I guess.
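To make the idea concrete, here is a rough sketch of just the mapping step, in plain Java with no PDFBox calls. `CharBox` and `boundingBox` are hypothetical names made up for illustration; `CharBox` stands in for whatever per-glyph data the TextPositions would provide, and the sketch assumes exactly one character per entry, which may not hold for real extracted text (ligatures, for instance).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HighlightSketch {

    /** Hypothetical stand-in for one extracted glyph: the character and its box in page coordinates. */
    static final class CharBox {
        final char ch;
        final double x, y, width, height;

        CharBox(char ch, double x, double y, double width, double height) {
            this.ch = ch;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Returns {minX, minY, maxX, maxY} enclosing the first occurrence of needle,
     * or null if needle is empty or not found. Assumes one CharBox per character,
     * so string indices and list indices line up.
     */
    static double[] boundingBox(List<CharBox> chars, String needle) {
        if (needle.isEmpty()) {
            return null;
        }
        // Rebuild the extracted string so the needle can be located by index.
        StringBuilder text = new StringBuilder();
        for (CharBox c : chars) {
            text.append(c.ch);
        }
        int start = text.indexOf(needle);
        if (start < 0) {
            return null;
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        // Union of the boxes of every character inside the match.
        for (int i = start; i < start + needle.length(); i++) {
            CharBox c = chars.get(i);
            minX = Math.min(minX, c.x);
            minY = Math.min(minY, c.y);
            maxX = Math.max(maxX, c.x + c.width);
            maxY = Math.max(maxY, c.y + c.height);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        // Made-up glyph positions: five characters on one line, 10 units wide each.
        List<CharBox> chars = new ArrayList<>();
        String word = "hello";
        for (int i = 0; i < word.length(); i++) {
            chars.add(new CharBox(word.charAt(i), 10 + 10 * i, 100, 10, 12));
        }
        System.out.println(Arrays.toString(boundingBox(chars, "ell")));
    }
}
```

A single rectangle like this only makes sense when the match sits on one line. A match that wraps across lines would need one box per line, and the resulting rectangles would still have to be written into the PDF as highlight annotations, which this sketch does not attempt.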
It all seems rather complicated. Is this the route Acrobat, Preview.app, etc. take to make the highlighting work?

Joël

> On 02 Sep 2014, at 19:58, Maruan Sahyoun <[email protected]> wrote:
>
> Hi Joël,
>
> do you already have the text positions on the page?
>
> Maruan Sahyoun
>
>> On 02.09.2014 at 19:52, "Joël Kuiper" <[email protected]> wrote:
>>
>> Well they're uploaded. Basically a user uploads a PDF, the system runs some
>> prediction / pattern matching on the text and the user receives the PDF
>> with the predicted parts highlighted.
>>
>> I'm just a bit confused on how to (properly) do the last part.
>> --
>> https://joelkuiper.eu
>>
>>> On Tue, Sep 2, 2014 at 7:30 PM, Jan Tosovsky <[email protected]> wrote:
>>>
>>>> On 2014-09-02 Joël Kuiper wrote:
>>>>
>>>> The problem is that I have a PDF for which I want to highlight a known
>>>> string with a color.
>>>
>>> From what the PDF is produced? It is always better to do this kind of job
>>> in the source document.
>>>
>>> Jan

