On 18/02/12 04:31, Albert Astals Cid wrote:
> On Thursday, 16 February 2012, at 22:51:10, Dan Filimon wrote:
>>>> I've been looking for ways to extract image and word positions (also
>>>> how words form sentences and paragraphs would be useful) from a PDF.
>>>> I'd like to get maps of words/images to rectangles (position, width,
>>>> height).
>>>>
>>>> Also, it would really be great if I could get the positions and
>>>> hierarchy for every object on a page (sorry about my vague terminology
>>>> when it comes to PDF, I've never worked with it). I tried looking at
>>>> the code but there don't seem to be many comments and I can't find any
>>>> documentation...
>>>>
>>>> Could you please point me in the right direction?
>>>
>>> Poppler::Page::textList seems to be what you want
>>>
>>> http://people.freedesktop.org/~aacid/docs/qt4/classPoppler_1_1Page.html#a75dea3bf58f339f224239b757b4c1bb2
>>>
>>> Albert
>>
>> Thanks for the quick reply!
>>
>> Yes, that seems to be exactly what I'm looking for, but there doesn't
>> seem to be a corresponding one for images.
>> Actually, there doesn't seem to be any dedicated image class (well,
>> besides QImage), and I can't seem to figure out how to get images from
>> a Page... I can see that there is support for rendering part of a page
>> to a QImage though.
>> I've managed to find some image generating code looking through the
>> utils/ folder in ImageOutputDev, but that seems to be using XPdf
>> directly and I can't find any documentation for that either.
>
> From what I remember none of the "public" frontends export the Image
> information.

The glib frontend can export the images and their position:
http://people.freedesktop.org/~ajohnson/docs/poppler-glib/PopplerPage.html#poppler-page-get-image-mapping

>
>>
>> Also, after having cloned the Poppler repo, I'm not sure where to look
>> first. What I gather is that there are multiple backends and frontends for
>> Poppler. Backends like Cairo, Splash and frontends like Qt4, GLib and a
>> vanilla C++ one. Which of these should I use?
>
> The one you like better :D
>
>> I'd kind of like minimal dependencies, but I've used Qt4 in the past
>> and liked it.
>>
>> Which of these should I look at first (and actually, how do they all
>> fit together)?
>
> Qt4 and cpp frontends use splash backend, glib one uses cairo backend.
>
> Albert
>
>>
>> Sorry for being really noob-ish, but I just can't find any info :(
>>
>> Thanks!
>> Dan
>> _______________________________________________
>> poppler mailing list
>> [email protected]
>> http://lists.freedesktop.org/mailman/listinfo/poppler
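
On the "how words form sentences and paragraphs" part of the question: as far as I know, a text list only gives you words with their rectangles, so any grouping has to be done on top of it. Below is a rough sketch of one way that could look, not something taken from Poppler itself. WordBox, groupIntoLines and joinLine are hypothetical names made up for this example; with the Qt4 frontend the text and rectangle would presumably come from each TextBox's text() and boundingBox(). The "same line" test (vertical centres within half a word height) is just an assumption that may need tuning for real documents.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one text list entry: a word plus its rectangle.
struct WordBox {
    std::string text;
    double x, y, width, height;  // top-left corner and size, in page units
};

// Group words into lines. A word joins the current line when its vertical
// centre is within half the height of that line's first word (an assumed
// heuristic, not Poppler behaviour).
std::vector<std::vector<WordBox>> groupIntoLines(std::vector<WordBox> words) {
    // Visit words top-to-bottom, breaking ties left-to-right.
    std::sort(words.begin(), words.end(),
              [](const WordBox& a, const WordBox& b) {
                  if (a.y != b.y) return a.y < b.y;
                  return a.x < b.x;
              });

    std::vector<std::vector<WordBox>> lines;
    for (const WordBox& w : words) {
        const double centre = w.y + w.height / 2.0;
        if (!lines.empty()) {
            const WordBox& first = lines.back().front();
            const double lineCentre = first.y + first.height / 2.0;
            if (std::fabs(centre - lineCentre) <= first.height / 2.0) {
                lines.back().push_back(w);
                continue;
            }
        }
        lines.push_back(std::vector<WordBox>{w});
    }

    // Within each line, order the words left-to-right.
    for (std::vector<WordBox>& line : lines) {
        std::sort(line.begin(), line.end(),
                  [](const WordBox& a, const WordBox& b) { return a.x < b.x; });
    }
    return lines;
}

// Join the words of one line with single spaces.
std::string joinLine(const std::vector<WordBox>& line) {
    std::string out;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (i > 0) out += ' ';
        out += line[i].text;
    }
    return out;
}
```

Paragraphs could be built the same way one level up, e.g. by starting a new paragraph whenever the vertical gap between consecutive lines is noticeably larger than usual, but that is again a heuristic rather than anything the PDF tells you.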
