On Monday, 14 November 2011, Igor Slepchin wrote:
> I know that dumping images when running pdftohtml with -xml flag has
> been brought up before and it seems that the devs said they would accept
> a patch; however, it looks like nothing has made it into the source tree
> so far. I figured I could give this a try too so please take a look at
> my proposed changes if there is still some interest in this
> functionality: https://github.com/igors/poppler/tree/xml_images
>
> The first commit in the above branch fixes up pdf2xml.dtd to match what
> pdftohtml generates; the second patch adds support for images in -xml
> mode. With this patch applied, pdftohtml -xml will dump all image files
> just like it does in html mode and will add image elements at the
> beginning of each page that has images, i.e., you'll see something like
> the following in the generated xml:
>
> <page number="51" position="absolute" top="0" left="0"
> height="896" width="572">
> <image top="45" left="26" width="523" height="373" src="filename.jpg"/>
> <text top="534" left="81" width="17" height="15" font="18">In </text>
>
> The default behavior with -xml switch is to process images now; adding
> -i option restores the old behavior.
>
> The change is small enough that I hope it won't be very controversial
> but comments are certainly appreciated.
I'm a bit confused: you add encoding="US-ASCII" to the first line of pdf2xml.dtd and then you remove it altogether?

I'm wondering why you did not make GfxState *state a parameter of the constructor. It seems to be mandatory to call the transform method.

I'd prefer if you made HtmlImage a class.

It'd be cool if next time you attached the patches instead of making me go and lose time trying to navigate GitHub ;-)

Albert

>
> Thanks,
> Igor
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
