Re: [poppler] smaller HTML images

Albert Astals Cid Thu, 23 Jun 2011 00:51:51 -0700

On Thursday, June 23, 2011, Josh Richardson wrote:
> Currently pdftohtml is creating one large image for each HTML page
> rendered.  In order to reduce the size of the HTML file bundles, as well
> as to improve the semantic value of the HTML, Stephen and I would like to
> extract and use only the portions of that background image that are not
> background white.
> 
> In order to accomplish this, our idea is to add hooks into the
> SplashOutputDevNoText to catch painting operations, and record coordinates
> of the bounding box for any painting operations.  After recording each
> bounding box, we'll draw a new bounding box to combine any contiguous
> regions.  Once we have a list of non-contiguous bounding boxes
> representing all graphics operations that have occurred on the page, we'll
> use those bounding boxes to extract only the relevant regions from the
> large background image, save each region as a separate file, and reference
> the files from the HTML.
> 
> Since we're extending the output device, we'll rename it from
> SplashOutputDevNoText to better capture the new role: 
> SplashOutputDevHtmlImages.  If you think we should retain the old behavior
> with a switch, please let me know — I don't see a significant benefit to
> it.
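
The box-merging step proposed above could look roughly like the following.
This is a minimal Python sketch, not poppler code: the Box tuple and the
functions touches, union, and merge_boxes are hypothetical names, and it
assumes "contiguous" means boxes that overlap or share an edge.

```python
from typing import List, Tuple

# Hypothetical representation: (x0, y0, x1, y1) in page pixels,
# with x0 <= x1 and y0 <= y1.
Box = Tuple[int, int, int, int]


def touches(a: Box, b: Box) -> bool:
    """Return True if the two boxes overlap or share an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Box, b: Box) -> Box:
    """Return the smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Combine recorded paint boxes into non-contiguous regions."""
    merged: List[Box] = []
    for box in boxes:
        # Absorb every existing region the new box touches. The box grows
        # as it absorbs, so rescan until nothing else touches it.
        changed = True
        while changed:
            changed = False
            for other in merged:
                if touches(box, other):
                    box = union(box, other)
                    merged.remove(other)
                    changed = True
                    break
        merged.append(box)
    return merged
```

Each returned box keeps its page coordinates, so a region cropped from the
background image still carries the offset at which it was painted.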


How are you planning to make the text overlap the image correctly if the
image size is changed?

Albert

> 
> As always, any comments appreciated.
> 
> --josh
_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler
