On Wed, 2007-09-19 at 16:55 +0200, Martin Schröder wrote:
> 2007/9/19, Behdad Esfahbod <[EMAIL PROTECTED]>:
> > Anyway, I wrote about PDF text extraction from the point of view of what
> > cairo should be doing to generate perfectly text-extractable PDFs.
> > Forwarding the message here as people may be interested. I also point
> > out a few poppler bugs. I plan to fix them at some point, but it may be
> > an obvious small fix to those familiar with the code base.
>
> Two things to note, since you are talking about extracting information
> from PDFs you created yourself:
> - tagged PDF can embed more information in the PDF than pure glyphs and may
>   help
> - if tagged PDF is not enough, you can embed even more information
>   yourself using private structures

Thanks. That's not really the goal though. What I want to do is to make
pango+cairo generate PDFs whose text is extractable in all common viewers.
Part of that work is to fix bugs in Poppler. Tagged PDF allows a lot more
information to be stored, but it doesn't solve the problem of glyph-to-text
mapping.

> Best
> Martin

Regards,
--
behdad
http://behdad.org/

"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
        -- Benjamin Franklin, 1759
