Ahh... You mean use the tool as a *ahm* tool? I'm so used to seeing these as parts of the command-line tools that I've totally forgotten that their inner elements are suitable for use in code. Thanks.
I think I'm going to create a Writer implementation that throws an exception if non-whitespace is written to it, and use writeText(PDDocument, Writer) to quickly cancel processing when non-whitespace is found.

2017-10-30 19:54 GMT+02:00 Tilman Hausherr <[email protected]>:
> On 30.10.2017 at 16:52, Lachezar Dobrev wrote:
>>
>> I have been looking at it. I am actually using (a similar) approach
>> to read embedded bar-codes, but there I can test all images.
>> The best I can see in ExtractImages is a way to check if there is
>> only one image. However I can not check if there is additional text or
>> other content, so that I do not mistakenly skip a page that has a
>> single logo (for instance) and lots of other text information.
>> I tried looking at PDFTextStripper, but that is hard to follow.
>
>
> That one is easy... just create the object, set start and end page, and then
> call getText().
>
> Tilman
>
>
>>
>> Is there any sure(-ish) sign that there is text on a page that I can
>> use? Can I check for the existence of something that would tell me
>> that there is additional content on the page other than the single
>> image?
>>
>> 2017-10-30 15:53 GMT+02:00 Tilman Hausherr <[email protected]>:
>>>
>>> On 30.10.2017 at 14:04, Lachezar Dobrev wrote:
>>>>
>>>> I have to process PDF files, that (supposedly) contain one big image
>>>> per page, which is a result from a Document-Scanner. I'd like to avoid
>>>> performing PDF-To-Image in these cases, and use the underlying image
>>>> instead.
>>>> I am not well-versed in all things PDF and have no idea how to
>>>> detect if a page has content other than a single image.
>>>> Please advise.
>>>
>>>
>>> Please have a look at the ExtractImages.java source code. You can change
>>> that one to your needs.
>>>
>>> Tilman
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
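
For reference, the Writer idea from the top of this message could look roughly like the sketch below. It uses only java.io, so it has not been tried against PDFBox itself. The class names WhitespaceOnlyWriter and TextFoundException are made up for illustration. I am also assuming that the text stripper only emits ordinary or non-breaking spaces when a page has no real text.

```java
import java.io.IOException;
import java.io.Writer;

/** Thrown to abort extraction as soon as real text shows up (hypothetical name). */
class TextFoundException extends IOException {
    TextFoundException(String message) {
        super(message);
    }
}

/** Discards whitespace and throws on the first non-whitespace character (hypothetical name). */
class WhitespaceOnlyWriter extends Writer {
    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            char c = cbuf[i];
            // isSpaceChar also accepts the non-breaking space, which isWhitespace rejects.
            if (!Character.isWhitespace(c) && !Character.isSpaceChar(c)) {
                throw new TextFoundException("Non-whitespace character written: '" + c + "'");
            }
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered, so there is nothing to flush.
    }

    @Override
    public void close() {
        // No underlying resource to release.
    }
}
```

The intent is to set the start and end page on a PDFTextStripper, call writeText(document, new WhitespaceOnlyWriter()), and treat a caught TextFoundException as "this page has text". Extending IOException should let the exception pass straight through writeText, which already declares IOException, but that part is untested.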

