Yes, that's pretty much how you can do it, and yes, it's very tricky to
implement.
I have in fact written code that does something like that, and I use it
in many of my applications.

Acrobat and Preview probably do something similar, yes.
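A minimal sketch of the bounding-box step from the quoted approach. It assumes the glyphs of a match have already been collected from PDFTextStripper (for example by overriding its writeString(String, List<TextPosition>) hook in recent PDFBox versions) and copied into the hypothetical Glyph type below. In PDFBox those values would roughly come from TextPosition's getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(); check that against your version. The class name HighlightQuads, the line-splitting tolerance, and the point order (top-left, top-right, bottom-left, bottom-right, the order commonly used in practice and in PDFBox's own examples) are assumptions. Page rotation, CropBox offsets and vertical text are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class HighlightQuads {
    private HighlightQuads() {}

    /**
     * Hypothetical glyph box, measured from the top-left of the page with y
     * increasing downward; baselineY is the baseline and height extends upward.
     */
    static final class Glyph {
        final double x, baselineY, width, height;

        Glyph(double x, double baselineY, double width, double height) {
            this.x = x;
            this.baselineY = baselineY;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Unions the glyphs of one text line into a rectangle and returns its
     * QuadPoints in PDF user space (origin bottom-left, y up), ordered
     * top-left, top-right, bottom-left, bottom-right.
     */
    static double[] quadPoints(List<Glyph> line, double pageHeight) {
        if (line.isEmpty()) {
            throw new IllegalArgumentException("line has no glyphs");
        }
        double minX = Double.POSITIVE_INFINITY, maxX = Double.NEGATIVE_INFINITY;
        double top = Double.POSITIVE_INFINITY, bottom = Double.NEGATIVE_INFINITY;
        for (Glyph g : line) {
            minX = Math.min(minX, g.x);
            maxX = Math.max(maxX, g.x + g.width);
            top = Math.min(top, g.baselineY - g.height); // smallest y = highest point
            bottom = Math.max(bottom, g.baselineY);
        }
        // Flip from top-down page coordinates to bottom-up PDF user space.
        double yTop = pageHeight - top;
        double yBottom = pageHeight - bottom;
        return new double[] {minX, yTop, maxX, yTop, minX, yBottom, maxX, yBottom};
    }

    /** Starts a new line whenever the baseline jumps by more than tolerance. */
    static List<List<Glyph>> splitLines(List<Glyph> glyphs, double tolerance) {
        List<List<Glyph>> lines = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (!current.isEmpty()
                    && Math.abs(current.get(current.size() - 1).baselineY - g.baselineY) > tolerance) {
                lines.add(current);
                current = new ArrayList<>();
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        return lines;
    }

    /** One quad per line, concatenated into the flat array a highlight's QuadPoints entry holds. */
    static double[] quadPointsForMatch(List<Glyph> glyphs, double pageHeight, double tolerance) {
        List<List<Glyph>> lines = splitLines(glyphs, tolerance);
        double[] all = new double[lines.size() * 8];
        for (int i = 0; i < lines.size(); i++) {
            System.arraycopy(quadPoints(lines.get(i), pageHeight), 0, all, i * 8, 8);
        }
        return all;
    }
}
```

Splitting by baseline means a match that wraps onto a second line gets one quad per line instead of one huge rectangle covering both. With PDFBox 1.8/2.x the resulting array would then be copied into a float[] for PDAnnotationTextMarkup.setQuadPoints(float[]) on a highlight annotation, with its Rect set to the union of the quads; treat that wiring as an outline rather than tested code.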


On Tue, Sep 2, 2014 at 11:11 PM, Joël Kuiper <[email protected]> wrote:

> Hey Maruan,
>
> I thought that would be easier, but unless there’s a way I’m overlooking,
> it’s actually really tricky.
> I guess it would mean lifting the code from PDFTextStripper that does
> the extraction and, instead of returning just the string, also returning
> a mapping to the TextPositions.
> Then somehow figure out the bounding boxes of the text from the
> TextPositions, and then write those separately as annotations.
>
> It all seems rather complicated. Is this the route Acrobat,
> Preview.app, etc. take to make the highlighting work?
>
> Joël
>
>
> > On 02 Sep 2014, at 19:58, Maruan Sahyoun <[email protected]> wrote:
> >
> > Hi Joël,
> >
> > do you already have the text positions on the page?
> >
> > Maruan Sahyoun
> >
> >> Am 02.09.2014 um 19:52 schrieb "Joël Kuiper" <[email protected]>:
> >>
> >> Well, they're uploaded. Basically, a user uploads a PDF, the system runs
> >> some prediction/pattern matching on the text, and the user receives the
> >> PDF with the predicted parts highlighted.
> >>
> >>
> >> I'm just a bit confused about how to (properly) do the last part.
> >> —
> >> https://joelkuiper.eu
> >>
> >>> On Tue, Sep 2, 2014 at 7:30 PM, Jan Tosovsky <[email protected]>
> >>> wrote:
> >>>
> >>>> On 2014-09-02 Joël Kuiper wrote:
> >>>>
> >>>> The problem is that I have a PDF for which I want to highlight a known
> >>>> string with a color.
> >>> What is the PDF produced from? It is always better to do this kind of
> >>> job in the source document.
> >>> Jan
>
>
