On Thursday, June 23, 2011, Josh Richardson wrote:
> Currently pdftohtml is creating one large image for each HTML page
> rendered. In order to reduce the size of the HTML file bundles, as well
> as to improve the semantic value of the HTML, Stephen and I would like to
> extract and use only the portions of that background image that are not
> background white.
>
> In order to accomplish this, our idea is to add hooks into the
> SplashOutputDevNoText to catch painting operations, and record coordinates
> of the bounding box for any painting operations. After recording each
> bounding box, we'll draw a new bounding box to combine any contiguous
> regions. Once we have a list of non-contiguous bounding boxes
> representing all graphics operations that have occurred on the page, we'll
> use those bounding boxes to extract only the relevant regions from the
> large background image, save each region as a separate file, and reference
> the files from the HTML.
>
> Since we're extending the output device, we'll rename it from
> SplashOutputDevNoText to better capture the new role:
> SplashOutputDevHtmlImages. If you think we should retain the old behavior
> with a switch, please let me know; I don't see a significant benefit to
> it.
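
For reference, the "combine any contiguous regions" step in the quoted proposal could look roughly like the sketch below. It is only an illustration under assumptions: Box, touches, unite and mergeBoxes are hypothetical names, not existing poppler or Splash API, and "contiguous" is taken to mean that two boxes overlap or share an edge or corner. Each resulting box keeps its page coordinates, so the top-left corner stays available as an offset for the extracted region.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical type (not part of poppler): an axis-aligned box in page
// pixel coordinates, (x0, y0) top-left and (x1, y1) bottom-right.
struct Box {
    int x0, y0, x1, y1;
};

// Assumption: boxes are "contiguous" when they overlap or touch.
static bool touches(const Box &a, const Box &b) {
    return a.x0 <= b.x1 && b.x0 <= a.x1 && a.y0 <= b.y1 && b.y0 <= a.y1;
}

// Smallest box that encloses both inputs.
static Box unite(const Box &a, const Box &b) {
    return { std::min(a.x0, b.x0), std::min(a.y0, b.y0),
             std::max(a.x1, b.x1), std::max(a.y1, b.y1) };
}

// Merge contiguous boxes until no two remaining boxes touch.
// A merged box can grow into a neighbour that was previously separate,
// so the scan restarts after every merge.
std::vector<Box> mergeBoxes(std::vector<Box> boxes) {
    bool merged = true;
    while (merged) {
        merged = false;
        for (std::size_t i = 0; i < boxes.size() && !merged; ++i) {
            for (std::size_t j = i + 1; j < boxes.size(); ++j) {
                if (touches(boxes[i], boxes[j])) {
                    boxes[i] = unite(boxes[i], boxes[j]);
                    boxes.erase(boxes.begin() + static_cast<std::ptrdiff_t>(j));
                    merged = true;
                    break;
                }
            }
        }
    }
    return boxes;
}
```

The restart-after-merge loop is quadratic or worse in the number of boxes, which is probably acceptable for the handful of painting operations on a typical page; a sweep over sorted boxes would be the obvious alternative if it turns out not to be.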

How are you planning to make the text overlay the image correctly if the image size changes?

Albert

>
> As always, any comments appreciated.
>
> --josh
_______________________________________________
poppler mailing list
http://lists.freedesktop.org/mailman/listinfo/poppler
