Re: Detecting if PDF contains only/mostly images.

Lachezar Dobrev Mon, 30 Oct 2017 08:52:52 -0700

  I have been looking at it. I am actually using a similar approach
to read embedded bar-codes, but there I can simply test all images.
  The best I can see in ExtractImages is a way to check whether there
is only one image. However, I cannot check whether there is additional
text or other content, which I need so that I do not mistakenly skip a
page that has a single logo (for instance) and lots of other text.
  I tried looking at PDFTextStripper, but that is hard to follow.


  Is there any sure(-ish) sign that there is text on a page that I can
use? Can I check for the existence of something that would tell me
that there is additional content on the page other than the single
image?
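
  A minimal sketch of such a check, assuming the names of the content
stream operators on a page have already been collected into a list of
strings. In PDFBox this could presumably be done by parsing the page
contents with PDFStreamParser and keeping the name of each Operator
token; that wiring is not shown here. The class PageContentSummary and
its methods are hypothetical and not part of PDFBox. The operator names
themselves come from the PDF specification.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not a PDFBox class): summarizes a page by its operator names. */
final class PageContentSummary {
    // Text-showing operators defined by the PDF specification.
    private static final Set<String> TEXT_OPS =
            new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));
    // Path-painting operators plus the shading operator "sh".
    private static final Set<String> PAINT_OPS =
            new HashSet<>(Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "sh"));

    final int xObjects; // "Do" invocations plus inline images ("BI")
    final int textOps;  // text-showing operators
    final int paintOps; // vector drawing and shading operators

    private PageContentSummary(int xObjects, int textOps, int paintOps) {
        this.xObjects = xObjects;
        this.textOps = textOps;
        this.paintOps = paintOps;
    }

    /** Counts the operators of interest in the given operator name sequence. */
    static PageContentSummary of(List<String> operatorNames) {
        int x = 0, t = 0, p = 0;
        for (String name : operatorNames) {
            if ("Do".equals(name) || "BI".equals(name)) {
                x++;
            } else if (TEXT_OPS.contains(name)) {
                t++;
            } else if (PAINT_OPS.contains(name)) {
                p++;
            }
        }
        return new PageContentSummary(x, t, p);
    }

    /** True when exactly one XObject or inline image is drawn and nothing else is painted. */
    boolean looksLikeSingleImageOnly() {
        return xObjects == 1 && textOps == 0 && paintOps == 0;
    }
}
```

  For a scanned page whose operators are just "q cm Do Q",
PageContentSummary.of(...).looksLikeSingleImageOnly() is true; once a
"Tj" shows up, it is false. This is only a heuristic: "Do" can also
invoke a form XObject (which may itself contain text), so the resource
behind it would still need to be checked for being an image, and
scanners that add an invisible OCR text layer would make the page count
as having text.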

2017-10-30 15:53 GMT+02:00 Tilman Hausherr <[email protected]>:
> On 30.10.2017 at 14:04, Lachezar Dobrev wrote:
>>
>>    I have to process PDF files that (supposedly) contain one big image
>> per page, as produced by a document scanner. I'd like to avoid
>> performing PDF-to-image rendering in these cases and use the
>> underlying image instead.
>>    I am not well-versed in all things PDF and have no idea how to
>> detect if a page has content other than a single image.
>>    Please advise.
>
>
> Please have a look at the ExtractImages.java source code. You can change
> that one to your needs.
>
> Tilman
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

