Hey Maruan, 

I thought that would be easier, but unless there's a way I'm overlooking, it's 
actually really tricky. 
I guess it would mean lifting the code from PDFTextStripper that does the 
extraction and, instead of returning just the string, also returning a 
mapping to the TextPositions.
Then I'd somehow have to work out the bounding boxes of the text from the 
TextPositions, and write those as annotations separately, I guess. 
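
A minimal sketch of that mapping step, under stated assumptions: `Glyph` is a
hypothetical stand-in for the per-character data a PDFBox TextPosition carries
(character, position, width, height), and `Box`, `boxesFor` and `union` are
illustrative names, not PDFBox API. It assumes one glyph per character of the
extracted string and that a match stays on a single line; a match that wraps
across lines would need one box per line.

```java
import java.util.ArrayList;
import java.util.List;

final class HighlightBoxes {

    /** Hypothetical stand-in for the data held by a PDFBox TextPosition. */
    static final class Glyph {
        final char ch;
        final float x, y, width, height; // lower-left corner plus size, page units

        Glyph(char ch, float x, float y, float width, float height) {
            this.ch = ch; this.x = x; this.y = y;
            this.width = width; this.height = height;
        }
    }

    /** Axis-aligned rectangle: lower-left (x0, y0), upper-right (x1, y1). */
    static final class Box {
        final float x0, y0, x1, y1;

        Box(float x0, float y0, float x1, float y1) {
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }

        /**
         * Eight numbers for a highlight annotation's quad points. The corner
         * order (upper-left, upper-right, lower-left, lower-right) is an
         * assumption based on what viewers tend to accept in practice.
         */
        float[] quadPoints() {
            return new float[] {x0, y1, x1, y1, x0, y0, x1, y0};
        }
    }

    /** Returns one bounding box per non-overlapping occurrence of needle. */
    static List<Box> boxesFor(List<Glyph> glyphs, String needle) {
        // Rebuild the extracted string so that index i maps to glyphs.get(i).
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.ch);
        }
        String text = sb.toString();

        List<Box> boxes = new ArrayList<>();
        if (needle.isEmpty()) {
            return boxes;
        }
        int from = 0;
        while ((from = text.indexOf(needle, from)) >= 0) {
            boxes.add(union(glyphs.subList(from, from + needle.length())));
            from += needle.length();
        }
        return boxes;
    }

    /** Smallest rectangle that encloses every glyph in the run. */
    static Box union(List<Glyph> run) {
        float x0 = Float.MAX_VALUE, y0 = Float.MAX_VALUE;
        float x1 = -Float.MAX_VALUE, y1 = -Float.MAX_VALUE;
        for (Glyph g : run) {
            x0 = Math.min(x0, g.x);
            y0 = Math.min(y0, g.y);
            x1 = Math.max(x1, g.x + g.width);
            y1 = Math.max(y1, g.y + g.height);
        }
        return new Box(x0, y0, x1, y1);
    }

    public static void main(String[] args) {
        // Fake a single line "hi hi": each glyph 5 units wide, 10 units tall.
        String line = "hi hi";
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            glyphs.add(new Glyph(line.charAt(i), i * 5f, 100f, 5f, 10f));
        }
        for (Box b : boxesFor(glyphs, "hi")) {
            System.out.println(b.x0 + "," + b.y0 + " -> " + b.x1 + "," + b.y1);
        }
    }
}
```

With real PDFBox data the glyph list would be filled from the TextPositions
collected during extraction, and each resulting box would then be written as
its own highlight annotation on the page.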

It all seems rather complicated. Is this the route Acrobat, Preview.app, etc. 
take to make the highlighting work? 

Joël


> On 02 Sep 2014, at 19:58, Maruan Sahyoun <[email protected]> wrote:
> 
> Hi Joël,
> 
> do you already have the text positions on the page?
> 
> Maruan Sahyoun
> 
>> On 02.09.2014 at 19:52, "Joël Kuiper" <[email protected]> wrote:
>> 
>> Well, they're uploaded. Basically, a user uploads a PDF, the system runs some 
>> prediction / pattern matching on the text, and the user receives the PDF 
>> with the predicted parts highlighted. 
>> 
>> 
>> I'm just a bit confused about how to (properly) do the last part. 
>> —
>> https://joelkuiper.eu
>> 
>>> On Tue, Sep 2, 2014 at 7:30 PM, Jan Tosovsky <[email protected]> wrote:
>>> 
>>>> On 2014-09-02 Joël Kuiper wrote:
>>>> 
>>>> The problem is that I have a PDF for which I want to highlight a known
>>>> string with a color.
>>> What is the PDF produced from? It is always better to do this kind of job 
>>> in the source document.
>>> Jan
