On Friday, 4 November 2011, Dominic Lachowicz wrote:
> Hi Todd,
>
> You're in the best position to comment on the suitability of the
> approaches. I really don't know what your goal is.
>
> Having worked a bit on the librsvg, Cairo, and poppler projects, I
> know that one can render a poppler page to a Cairo object via the
> poppler_page_render() function. And that Cairo supports writing to SVG
> surfaces, preserving all of the vector goodness (when possible) that
> you seem to expect.
Without knowing anything about what Cairo does behind the scenes, I guess the
harder part is vectorizing the fonts.

> http://www.cairographics.org/manual/cairo-SVG-Surfaces.html
>
> You can test this out using the "pdftocairo" command line tool without
> needing to write a line of code.
>
> I believe that one can do something similar with the Qt backend, but
> that's outside of my area of expertise.

Yeah, QPainter can do that too, but given that the Arthur backend cannot be
compared against the Splash or Cairo ones, I guess the results would not be
that great.

Albert

>
> I hope that helps,
> Dom
>
> On Fri, Nov 4, 2011 at 7:58 AM, Todd Hubers <[email protected]> wrote:
> > Hi Dom,
> > You can probably tell me :) I'm not claiming to be a poppler genius.
> > Please do elaborate on the suitability of the CairoOutputDevice to
> > generate an SVG (remembering that SVGs are favoured for their vector
> > ability for text, lines and filled shapes).
> >
> > Thanks, Todd.
> >
> > On 4 November 2011 22:55, Dominic Lachowicz <[email protected]> wrote:
> >> Just out of curiosity, how would the proposed SVGOutputDevice differ
> >> from using (say) the existing CairoOutputDevice that was configured to
> >> write to SVG? That can already be accomplished today.
> >>
> >> Thanks,
> >> Dom
> >>
> >> On Fri, Nov 4, 2011 at 7:38 AM, Todd Hubers <[email protected]> wrote:
> >> > Alec, I'm quite sold on the SVG idea. It is self contained and can
> >> > even work outside the browser.
> >> > Josh, it would seem that the HTMLOutputDevice is the better
> >> > candidate for SVG. HTML would be a good interim solution as well;
> >> > however, with SVG everything is packaged into a single file. With
> >> > HTML the browser is making repeated calls back to the web server
> >> > (for image resources), but with SVG it's naturally all together.
> >> > You can also achieve effects like gradients in SVG quite easily, and
> >> > it is better supported by older browsers than alternative approaches
> >> > to getting PDF into the browser.
> >> > I am interested in seeing the latest version of the HTML solution.
> >> > I may attempt some preliminary SVG rendering.
> >> >
> >> > Back on the topic of the "Data" output device. I'm already using XML
> >> > for RTF output (I'm doing this in my language of choice, C#, though,
> >> > so it's not an easy task to contribute this back to poppler). It's
> >> > true that direct implementation of device drivers is more efficient;
> >> > however, XML or the like does provide a convenient interface that is
> >> > very accessible from many programming languages. I would not expect
> >> > such a "data" output device to be used by PDF viewing applications.
> >> > However, it would be good for all other purposes, where such
> >> > implementations are usually performed in batch processes and the
> >> > extra processing in the presence of multi-threading is readily
> >> > accepted in return for flexibility - that is, a larger community can
> >> > make use of poppler.
> >> > Cheers,
> >> > Todd
> >> >
> >> > On 4 November 2011 17:24, Josh Richardson <[email protected]> wrote:
> >> >> Hi Todd,
> >> >> Some of us who are working on the pdftohtml utility have had similar
> >> >> thoughts. It's on my wish list to completely remove the need for a
> >> >> poppler output device by utilizing the SVG toolset available in
> >> >> modern browsers. In any case, we are achieving high accuracy on
> >> >> Gecko and Webkit browsers with the current version (not merged into
> >> >> the Poppler main repo yet, but I can send you an invite for a git
> >> >> repo that Alec Taylor made, which has all those latest changes.)
> >> >> I think it might meet your needs as-is, or with some tweaks to make
> >> >> it work better on other browsers.
> >> >> We are currently extracting the text and fonts for the browser to
> >> >> render directly, but still must rely on Splash, Cairo, etc. to
> >> >> rasterize other graphic operations. With the way we've done it, we
> >> >> have an easy path to change over to SVG, one graphic operation at a
> >> >> time, if you'd be interested in doing that.
> >> >> The idea of a separate "data" device is interesting, but I don't
> >> >> think it's the right way to go. In effect, you are talking about
> >> >> changing the PDF data to XML, and from there to other formats. I can
> >> >> appreciate the sentiment, since PDF is such a difficult format to
> >> >> work with, but adding a layer of abstraction is just going to make
> >> >> things more complex, error-prone, and slow. To note, the current
> >> >> version of pdftohtml creates a valid XML-compliant HTML format
> >> >> (actually there's a small bug, but you probably get the point). You
> >> >> can always use the XML-compliant HTML as your easier-to-digest
> >> >> "data" format, which also allows us to represent more semantics than
> >> >> are available in the original PDF document, and you can always
> >> >> extend it with whatever XML tags you need. For example, I extended
> >> >> it with an attribute describing bounding boxes for all of the text
> >> >> spans.
> >> >> Let me know if you want the repo invite.
> >> >> Best,
> >> >> --josh
> >> >> From: Todd Hubers <[email protected]>
> >> >> Date: Thu, 3 Nov 2011 18:13:52 -0700
> >> >> To: "[email protected]" <[email protected]>
> >> >> Subject: [poppler] Poppler - SVG Device
> >> >>
> >> >> I'm currently using Poppler for text extraction and GhostScript for
> >> >> PDF-to-image functionality, all for viewing PDFs online without
> >> >> requiring a PDF plugin in the browser.
> >> >>
> >> >> I noticed Mozilla was working on an interesting project, PDF.js
> >> >> [https://wiki.mozilla.org/PDF.js]. It loads PDF files with pure
> >> >> JavaScript (on an HTML5-compatible browser - probably needs canvas).
> >> >>
> >> >> This is an opportunity for poppler to steam ahead and get some
> >> >> headline-grabbing exposure. The SVG format is well supported by
> >> >> browsers. PDFs are portable across systems; however, SVGs are very
> >> >> portable (and fast) across the web.
> >> >>
> >> >> I propose the building of an SVG Device - PDF to SVG. I am currently
> >> >> considering using PDF to XML, to then perform XML to SVG. Given the
> >> >> status quo, I believe it's time for PDF to SVG.
> >> >>
> >> >> I see SVG as a very efficient and therefore powerful web format, and
> >> >> I hope others in the poppler community will see the potential as I
> >> >> do.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Todd Hubers (BBIT Hons)
> >> >> Alivate
> >> >>
> >> >> PS. Perhaps we could then have PDF>Cairo, PDF>SVG, and then tools
> >> >> for SVG>XML, SVG>HTML, SVG>Text. In any case it would be good to
> >> >> have simply one direct rendering device and one "data" device.
> >> >
> >> > _______________________________________________
> >> > poppler mailing list
> >> > [email protected]
> >> > http://lists.freedesktop.org/mailman/listinfo/poppler
> >>
> >> --
> >> "I like to pay taxes. With them, I buy civilization."
> >> -- Oliver Wendell Holmes
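
For anyone who wants to try the "pdftocairo" route Dom mentions before writing an output device, here is a minimal sketch in Python. It assumes the pdftocairo tool from poppler's utilities is installed and on the PATH. The function names and the file names in the usage note are hypothetical and only for illustration. It converts a single page by passing the same page number to -f and -l.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftocairo_svg_command(pdf_path, svg_path, page=1):
    """Return the argument list for converting one PDF page to SVG.

    Hypothetical helper: it only assembles the command line and runs nothing.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        "pdftocairo",
        "-svg",              # ask for SVG output
        "-f", str(page),     # first page to convert
        "-l", str(page),     # last page to convert (same page)
        str(pdf_path),
        str(svg_path),
    ]


def convert_page_to_svg(pdf_path, svg_path, page=1):
    """Run pdftocairo for one page and return the output path.

    Assumes pdftocairo is on PATH; raises FileNotFoundError otherwise.
    """
    if shutil.which("pdftocairo") is None:
        raise FileNotFoundError("pdftocairo not found on PATH")
    cmd = build_pdftocairo_svg_command(pdf_path, svg_path, page)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(svg_path)
```

Calling convert_page_to_svg("input.pdf", "page1.svg") should leave an SVG of the first page next to the script. Opening it in a text editor is a quick way to see how the text and fonts came out, which is the part Albert expects to be hardest.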
