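If you don't find a ready-made solution, you could try an OCR tool that reports the x/y positions of words, such as 'cuneiform -l eng -f hocr', and then look for holes with no words. On an image-based page, large regions with no recognized text are good candidates for the pictures.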
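A rough sketch of the "look for holes" step, assuming the OCR output is hOCR where the page carries class "ocr_page" and text elements carry one of the classes "ocrx_word", "ocr_word" or "ocr_line", each with a "bbox x0 y0 x1 y1" entry in its title attribute. Which classes your engine actually emits is an assumption; check the file and adjust TEXT_CLASSES. The function name find_vertical_holes and the min_gap threshold are hypothetical. It only finds full-width horizontal bands without text, so a cartoon sitting beside a text column will not show up, and empty page margins are reported as holes too.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

# Assumption: these hOCR classes mark recognized text; adjust for your OCR engine.
TEXT_CLASSES = {"ocrx_word", "ocr_word", "ocr_line"}


class HocrBoxes(HTMLParser):
    """Collects the page bbox and the bboxes of text elements from an hOCR file."""

    def __init__(self):
        super().__init__()
        self.page = None
        self.text_boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        match = BBOX_RE.search(attrs.get("title") or "")
        if not match:
            return
        box = tuple(int(v) for v in match.groups())
        if "ocr_page" in classes:
            self.page = box
        elif classes & TEXT_CLASSES:
            self.text_boxes.append(box)


def find_vertical_holes(hocr, min_gap=100):
    """Return (top, bottom) pixel ranges with no text that are at least min_gap tall."""
    parser = HocrBoxes()
    parser.feed(hocr)
    if parser.page is None:
        raise ValueError("no ocr_page bbox found")
    _, page_top, _, page_bottom = parser.page

    # Mark every pixel row that is covered by at least one text box.
    covered = [False] * (page_bottom - page_top)
    for _, y0, _, y1 in parser.text_boxes:
        for y in range(max(y0, page_top), min(y1, page_bottom)):
            covered[y - page_top] = True

    # Collect runs of uncovered rows; the trailing True closes a hole at the page bottom.
    holes, start = [], None
    for i, is_covered in enumerate(covered + [True]):
        if not is_covered and start is None:
            start = i
        elif is_covered and start is not None:
            if i - start >= min_gap:
                holes.append((page_top + start, page_top + i))
            start = None
    return holes
```

To get hOCR in the first place, the page can be rasterized with poppler's pdftoppm (e.g. -r 300 -f 6 -l 6 -png) and the resulting PNG passed to cuneiform with -f hocr -o to write the hOCR file. The returned y-ranges can then be used to crop the cartoons from the same PNG.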
________________________________
From: poppler <[email protected]> on behalf of Albretch Mueller <[email protected]>
Sent: Tuesday, September 3, 2019 11:36 AM
To: [email protected] <[email protected]>
Subject: [poppler] (preferably Linux-based, OS) utility to extract images from image-based pdf files ...

The output of pdfimages is a whole-page image if the input is a non-searchable, image-based PDF file. Take, for example:

https://www.nysedregents.org/ushistorygov/Archive/20000126exam.pdf

Which utility would detect the cartoons on pages 6 and 7?

lbrtchx
_______________________________________________
poppler mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/poppler
