Hi Todd,

You're in the best position to comment on the suitability of the approaches. I really don't know what your goal is.

Having worked a bit on the librsvg, Cairo, and poppler projects, I know that one can render a poppler page to a Cairo context via the poppler_page_render() function, and that Cairo supports writing to SVG surfaces, preserving all of the vector goodness (when possible) that you seem to expect.

http://www.cairographics.org/manual/cairo-SVG-Surfaces.html

You can test this out using the "pdftocairo" command line tool without needing to write a line of code.

I believe that one can do something similar with the Qt backend, but that's outside of my area of expertise.

I hope that helps,
Dom

On Fri, Nov 4, 2011 at 7:58 AM, Todd Hubers <[email protected]> wrote:
> Hi Dom,
>
> You can probably tell me :) I'm not claiming to be a poppler genius. Please
> do elaborate on the suitability of the CairoOutputDevice to generate an SVG
> (remembering that SVGs are favoured for their vector ability for text, lines
> and filled shapes).
>
> Thanks, Todd.
>
> On 4 November 2011 22:55, Dominic Lachowicz <[email protected]> wrote:
>>
>> Just out of curiosity, how would the proposed SVGOutputDevice differ
>> from using (say) the existing CairoOutputDevice that was configured to
>> write to SVG? That can already be accomplished today.
>>
>> Thanks,
>> Dom
>>
>> On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers <[email protected]> wrote:
>> > Alec, I'm quite sold on the SVG idea. It is self contained and can even
>> > work outside the browser.
>> > Josh, it would seem that the HTMLOutputDevice is the better candidate for
>> > SVG. HTML would be a good interim solution as well, however with SVG,
>> > everything is packaged into a single file as a package. With HTML the
>> > browser is making repeated calls back to the web server (for image
>> > resources), but with SVG it's naturally all together. You can also achieve
>> > effects like gradients in SVG quite easily, and it is better supported by
>> > older browsers than alternative approaches to getting PDF into the browser.
>> > I am interested in seeing the latest version of the HTML solution. I may
>> > attempt some preliminary SVG rendering.
>> >
>> > Back on the topic of "Data" output device. I'm already using XML for RTF
>> > output (I'm doing this in my language of choice - C# though so it's not an
>> > easy task to contribute this back to poppler). It's true that direct
>> > implementation of device drivers is more efficient, however XML or the
>> > like does provide a convenient interface very accessible for many
>> > programming languages. I would not expect such a "data" output device to
>> > be used by PDF viewing applications. However it would be good for all
>> > other purposes, where such implementations are usually performed in batch
>> > processes and the extra processing in the presence of multi-threading is
>> > readily accepted in return for flexibility - that is, a larger community
>> > can make use of poppler.
>> > Cheers,
>> > Todd
>> > On 4 November 2011 17:24, Josh Richardson <[email protected]> wrote:
>> >>
>> >> Hi Todd,
>> >> Some of us who are working on pdftohtml utility have had similar
>> >> thoughts. It's on my wish list to completely remove the need for a
>> >> poppler output device by utilizing the SVG toolset available in modern
>> >> browsers. In any case, we are achieving high accuracy on Gecko and
>> >> Webkit browsers with the current version (not merged into the Poppler
>> >> main repo yet, but I can send you an invite for a git repo that Alec
>> >> Taylor made, which has all those latest changes.) I think it might meet
>> >> your needs as-is, or with some tweaks to make it work better on other
>> >> browsers.
>> >> We are currently extracting the text and fonts for the browser to render
>> >> directly, but still must rely on Splash, Cairo, etc. to rasterize other
>> >> graphic operations.
>> >> With the way we've done it, we have an easy path to change over to SVG,
>> >> one graphic operation at a time, if you'd be interested in doing that.
>> >> The idea of a separate "data" device is interesting, but I don't think
>> >> it's the right way to go. In effect, you are talking about changing the
>> >> PDF data to XML, and from there to other formats. I can appreciate the
>> >> sentiment, since PDF is such a difficult format to work with, but adding
>> >> a layer of abstraction is just going to make things more complex,
>> >> error-prone, and slow. To note, the current version of pdftohtml creates
>> >> a valid XML-compliant HTML format -- actually there's a small bug, but
>> >> you probably get the point. You can always use the XML-compliant HTML as
>> >> your easier-to-digest "data" format, which also allows us to represent
>> >> more semantics than are available in the original PDF document, and you
>> >> can always extend it with whatever XML tags you need. For example, I
>> >> extended it with an attribute describing bounding boxes for all of the
>> >> text spans.
>> >> Let me know if you want the repo invite.
>> >> Best, --josh
>> >> From: Todd Hubers <[email protected]>
>> >> Date: Thu, 3 Nov 2011 18:13:52 -0700
>> >> To: "[email protected]" <[email protected]>
>> >> Subject: [poppler] Poppler - SVG Device
>> >>
>> >> I'm currently using Poppler for Text extraction and using GhostScript
>> >> for PDF to Image functionality, all for viewing PDFs online without
>> >> requiring a PDF plugin in the browser.
>> >>
>> >> I noticed Mozilla was working on an interesting project, PDF.js
>> >> [https://wiki.mozilla.org/PDF.js]. It loads PDF files with pure
>> >> JavaScript (on an HTML5 compatible browser - probably needs canvas).
>> >>
>> >> This is an opportunity for poppler to steam ahead and get some headline
>> >> grabbing exposure. The SVG format is well supported by browsers.
>> >> PDFs are portable across systems, however SVGs are very portable (and
>> >> fast) across the web.
>> >>
>> >> I propose the building of an SVG Device - PDF to SVG. I am currently
>> >> considering using PDF to XML, to then perform XML to SVG. Given the
>> >> status quo, I believe it's time for PDF to SVG.
>> >>
>> >> I see SVG as a very efficient and therefore powerful web format, I hope
>> >> others in the poppler community will see the potential as I do.
>> >>
>> >> Thanks,
>> >>
>> >> Todd Hubers (BBIT Hons)
>> >> Alivate
>> >>
>> >> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then tools for
>> >> SVG>XML, SVG>HTML, SVG>Text. In any case it would be good to have simply
>> >> one direct rendering device and one "data" device.
>> >
>> >
>> > _______________________________________________
>> > poppler mailing list
>> > [email protected]
>> > http://lists.freedesktop.org/mailman/listinfo/poppler
>> >
>> >
>>
>>
>>
>> --
>> "I like to pay taxes. With them, I buy civilization." -- Oliver Wendell
>> Holmes
>
>

--
"I like to pay taxes. With them, I buy civilization." -- Oliver Wendell Holmes
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
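
For anyone who wants to script the "pdftocairo" route mentioned at the top of this message, here is a minimal sketch in Python. It is only an illustration, not a definitive implementation: it assumes a poppler build whose pdftocairo accepts the -svg, -f and -l options, and the function names and file names are hypothetical.

```python
import shutil
import subprocess


def build_svg_command(pdf_path, svg_path, page=None):
    """Build a pdftocairo command line that writes SVG output.

    Hypothetical helper: only the pdftocairo options themselves
    (-svg, -f, -l) come from the tool; everything else is illustrative.
    """
    cmd = ["pdftocairo", "-svg"]
    if page is not None:
        # -f / -l give the first and last page to convert;
        # using the same number for both selects a single page.
        cmd += ["-f", str(page), "-l", str(page)]
    return cmd + [pdf_path, svg_path]


def pdf_to_svg(pdf_path, svg_path, page=None):
    """Run pdftocairo; raises if the tool is missing or exits non-zero."""
    if shutil.which("pdftocairo") is None:
        raise FileNotFoundError("pdftocairo not found on PATH")
    subprocess.run(build_svg_command(pdf_path, svg_path, page), check=True)
```

Calling pdf_to_svg("input.pdf", "page1.svg", page=1) would then shell out to the tool. Building the argument list separately from running it keeps the command easy to inspect before anything touches the file system.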
