pdf.js is an interesting project, but it's light-years behind Poppler both in its ability to render a PDF accurately and in the range of browsers it supports.
Poppler's solution also has some other advantages:

* Accessible environment. If you're planning to interact with or parse any of the "elements" in the rendered PDF, everyone knows how to do that with HTML. With a canvas-rendered PDF, I don't think that's going to be as easy.
* Pre-formatted HTML. Having a pre-rendered HTML document is always going to be faster to view than spinning up a JavaScript engine to render everything "on the fly".
* No JS needed. A lot of people have JavaScript turned off for security reasons. I was told by a credible source that around 30% of American workplaces do not allow JS-enabled web browsing.
* Added semantics. Poppler's "coalescence" functionality creates meaning in some cases where there is none in the underlying PDF.

On 7/20/11 11:03 AM, "Albert Astals Cid" <[email protected]> wrote:

>On Wednesday, 20 July 2011, Akash Agrawal wrote:
>> Hi All,
>
>Hi
>
>> My name is Akash Agrawal and I am working on producing a full-fledged
>> PDF-to-HTML solution. I investigated poppler and made a lot of custom
>> changes for my requirements. I got your reference from the revision
>> log in the pdftohtml source files.
>
>No one on this list is among the original programmers of pdftohtml, so
>there is no one with lots of knowledge of it (I for one basically
>ignore most of the things it does or tries to do).
>
>> I would appreciate it if you could address my queries. I am currently
>> stuck on 2 issues:
>>
>> 1. z-index
>> 2. Fonts
>>
>> *z-index:* In its current solution, poppler's pdftohtml puts all the
>> non-text data into an image and uses this image as a background image
>> in the HTML. But at times there are PDFs which have images/graphics
>> over the text, and the current solution fails in such cases. I looked
>> into the Gfx and OutputDevice code and didn't reach a good workable
>> solution for this case. I will be highly indebted if you can suggest
>> some pointers.
>
>The guys from pdf.js render everything into an image, and then they
>are planning on exposing the text to the user via some advanced
>HTML5/CSS3 trickery.
>
>> *Fonts:* Fonts are the biggest problem here. I saw that currently it
>> outputs all fonts as Times (the default font), so I fixed that with
>> exact font names (with a tag, because multiple versions of the same
>> font might be present in a PDF). I also made non-horizontal text part
>> of the image, because rotating the glyphs did not seem like a good
>> idea to me given the time in hand. I am also able to extract font
>> data, but I am having difficulty extracting encoding info like cmaps
>> etc.
>
>CMaps are extracted in the CMap.cc file. You might also want to have a
>look at FoFiTrueType::writeTTF, which is supposed to write a
>"corrected" TTF file to disk, from back when we did not use FreeType
>memory functions.
>
>> Your pointers on the same will be very much appreciated. FYI, I am
>> using fontforge to convert extracted fonts to a common format (TTF in
>> my case). I am thinking of applying cmaps using fontforge. Please let
>> me know if you suggest otherwise.
>
>Have a look at the list; there was already a discussion on extracting
>fonts from PDF files, and some people suggested you might get sued if
>you do that.
>
>On the other hand, I wonder if you guys should not just be helping the
>Mozilla dudes that implement pdf.js, since that will mean PDF viewing
>in browsers, which is what you seem to want.
>
>Albert
>
>> I am waiting for a positive response from your side regarding the
>> same. Looking forward to a strong technical relationship.
>>
>> Regards,
>> Akash Agrawal
>> http://tech-queries.blogspot.com/
>_______________________________________________
>poppler mailing list
>[email protected]
>http://lists.freedesktop.org/mailman/listinfo/poppler
