Re: [poppler] On PDF Text Extraction

Behdad Esfahbod Wed, 19 Sep 2007 14:13:19 -0700

On Wed, 2007-09-19 at 16:55 +0200, Martin Schröder wrote:
> 2007/9/19, Behdad Esfahbod <[EMAIL PROTECTED]>:
> > Anyway, I wrote about PDF text extraction from the point of view of what
> > cairo should be doing to generate perfectly text-extractable PDFs.
> > Forwarding the message here as people may be interested.  I also point
> > out a few poppler bugs.  I plan to fix them at some point, but it may be
> > an obvious small fix to those familiar with the code base.
> 
> Two things to note, since you are talking about extracting information
> from PDFs you created yourself:
> - tagged PDF can embed more information in the PDF than pure glyphs and
> may help
> - if tagged PDF is not enough, you can embed even more information
> yourself using private structures


Thanks.  That's not really the goal though.  What I want to do is to
make pango+cairo generate PDFs whose text is extractable in all common
viewers.  Part of that work is to fix bugs in Poppler.

Tagged PDF allows for a lot more information to be stored, but it
doesn't solve the problem of glyph-to-text mapping.

> Best
>    Martin

Regards,
-- 
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
 Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759



_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
