Re: [poppler] On PDF Text Extraction

Martin Schröder Wed, 19 Sep 2007 07:55:18 -0700

2007/9/19, Behdad Esfahbod <[EMAIL PROTECTED]>:
> Anyway, I wrote about PDF text extraction from the point of view of what
> cairo should be doing to generate perfectly text-extractable PDFs.
> Forwarding the message here as people may be interested.  I also point
> out a few poppler bugs.  I plan to fix them at some point, but it may be
> an obvious small fix to those familiar with the code base.


Two things to note, since you are talking about extracting information
from PDFs you created yourself:
- tagged PDF can carry more information than the bare glyphs (logical
  structure and replacement text, for example), which may help
- if tagged PDF is not enough, you can embed even more information
yourself using private structures
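
To make the first point concrete, here is a minimal sketch of one
mechanism PDF offers for this: a /Span marked-content sequence whose
/ActualText entry carries the text that a run of glyphs stands for. The
helper name actual_text_span and the glyph code <0123> are hypothetical
and only for illustration. A fully tagged PDF needs more than this (a
structure tree, among other things), and whether a given extractor
honors /ActualText is a separate question.

```python
def actual_text_span(text: str, show_ops: str) -> str:
    """Wrap glyph-showing operators in a /Span marked-content sequence
    that carries `text` as /ActualText (replacement text).

    `show_ops` is an already-built content-stream fragment, e.g. a Tj
    operator; this helper (a hypothetical name) does not validate it.
    """
    # PDF text strings may be UTF-16BE with a leading byte-order mark
    # (FE FF); writing it as a hex string avoids any escaping issues.
    hex_text = (b"\xfe\xff" + text.encode("utf-16-be")).hex().upper()
    # BDC opens the marked-content sequence with a property list,
    # EMC closes it.
    return f"/Span <</ActualText <{hex_text}>>> BDC\n{show_ops}\nEMC"


if __name__ == "__main__":
    # A single "ffi" ligature glyph (hypothetical glyph code 0123)
    # annotated with the three characters it represents.
    print(actual_text_span("ffi", "<0123> Tj"))
```

The point of the design is that the producer states the intended text
next to the glyphs, so the extractor does not have to guess it from the
font's encoding.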

Best
   Martin
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
