Hey,

I'm using the XML output of pdftohtml to classify PDFs. I wondered if it
would be easy to add an option, in a custom build, that writes the image
tags to the XML without extracting the images themselves. I have to
classify a lot of PDFs, and some of them are PowerPoint presentations with
lots of small images (e.g. 26000 per page), which take several hours to
extract. I need the image tags for some of my classification features.
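
As an illustration of the kind of feature meant here, below is a minimal
sketch, not the actual classification code. It assumes the XML output
contains <page> elements with <image> children that carry width and height
attributes; the function name image_features and the SAMPLE document are
hypothetical.

```python
import xml.etree.ElementTree as ET


def image_features(xml_text):
    """Return per-page image counts and total image area from XML text.

    Assumes <page> elements with <image> children that have numeric
    width/height attributes; only the tags are read, never the image files.
    """
    root = ET.fromstring(xml_text)
    features = []
    for page in root.iter("page"):
        images = page.findall("image")
        # Area in the same units as the width/height attributes.
        area = sum(
            float(img.get("width", 0)) * float(img.get("height", 0))
            for img in images
        )
        features.append(
            {
                "page": page.get("number"),
                "image_count": len(images),
                "image_area": area,
            }
        )
    return features


# Hypothetical, hand-written sample in the assumed layout.
SAMPLE = """<pdf2xml>
<page number="1" width="100" height="100">
<image top="0" left="0" width="10" height="5" src="a.png"/>
<image top="20" left="0" width="2" height="3" src="b.png"/>
<text top="1" left="1" width="5" height="5" font="0">hi</text>
</page>
<page number="2" width="100" height="100"/>
</pdf2xml>"""

if __name__ == "__main__":
    for row in image_features(SAMPLE):
        print(row)
```

Features like these only read the attributes of the image tags, which is
why the extracted image files themselves are not needed.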

If someone could point me to the place in the code where I could make that
change, that would be very much appreciated. Otherwise I will have to dig
through the code myself.

Many Thanks,
Kai
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
