Hi all,

My name is Akash Agrawal, and I am working on a full-fledged PDF-to-HTML solution. I investigated poppler and made a lot of custom changes for my requirements. I found you through the revision log in the pdftohtml source files. I would appreciate it if you could address my queries. I am currently stuck on two issues:
1. z-index
2. Fonts

*z-index:* In its current form, poppler's pdftohtml renders all non-text content into a single image and uses that image as the background image of the HTML page. Some PDFs, however, have images or graphics drawn over the text, and the current approach fails in that case. I looked into the Gfx and OutputDevice code but did not reach a workable solution. I would be highly indebted if you could suggest some pointers.

*Fonts:* Fonts are the biggest problem here. Currently pdftohtml outputs all fonts as Times (the default font), so I changed it to emit the exact font names (including the tag, because multiple versions of the same font may be present in one PDF). I also made non-horizontal text part of the image, because rotating the glyphs did not seem like a good idea given the time in hand. I am able to extract the font data, but I am having difficulty extracting encoding information such as the cmap. Your pointers on this would be very much appreciated.

FYI, I am using fontforge to convert the extracted fonts to a common format (TTF in my case). I am thinking of applying the cmaps using fontforge. Please let me know if you would suggest otherwise.

I look forward to your response and to a strong technical relationship.

Regards,
Akash Agrawal
http://tech-queries.blogspot.com/
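
P.S. To make the z-index issue concrete, here is a minimal sketch of the kind of layered output I have in mind. It is not poppler code: the function name render_page and its parameters are hypothetical, and it assumes the graphics drawn after the text could somehow be separated from the background image, which is exactly the part I have not solved.

```python
from html import escape


def render_page(width, height, background_png, text_spans, overlay_pngs):
    """Build one HTML page as stacked, absolutely positioned layers.

    Hypothetical inputs (not produced by poppler as-is):
      background_png: image holding graphics drawn *before* the text
      text_spans:     list of (left, top, text) tuples
      overlay_pngs:   list of (left, top, src) for graphics drawn *after* the text
    """
    parts = [
        f'<div style="position:relative;width:{width}px;height:{height}px">',
        # Layer 0: everything that sits underneath the text.
        f'<img src="{escape(background_png)}" '
        'style="position:absolute;left:0;top:0;z-index:0">',
    ]
    # Layer 1: the text itself.
    for left, top, text in text_spans:
        parts.append(
            f'<span style="position:absolute;left:{left}px;top:{top}px;'
            f'z-index:1">{escape(text)}</span>'
        )
    # Layers 2 and up: graphics that must cover the text, in drawing order.
    for index, (left, top, src) in enumerate(overlay_pngs):
        parts.append(
            f'<img src="{escape(src)}" style="position:absolute;'
            f'left:{left}px;top:{top}px;z-index:{2 + index}">'
        )
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_page(612, 792, "bg.png",
                      [(72, 100, "Hello")],
                      [(60, 90, "stamp.png")]))
```

The open question is how to tell, inside the output device, which drawing operations come after the text so that they can go into the overlay layers instead of the background.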
_______________________________________________ poppler mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/poppler
