Gandalf posted this from a non-subscribed address:

----- Forwarded message from [EMAIL PROTECTED] -----
The attached message has been automatically discarded.

Date: Wed, 21 Mar 2007 07:54:22 -0700 (PDT)
From: Gandalf Parker <[EMAIL PROTECTED]>
Subject: Re: [vox-tech] How to tell if a pdf is text or image?
To: lugod's technical discussion forum <[email protected]>

On Tue, 20 Mar 2007, Alex Mandel wrote:

>Well, I don't actually need the text, I just need to know if it is text.
>The idea is that once I separate them, all the ones that are images can
>then be ocr corrected to text versions.
>So my idea was either a yes/no answer or to say something like, if the
>document is more than 20%(arbitrary) text consider it text.

Try typing
    identify
If you have ImageMagick loaded then it will give you plenty. And you can
turn up the verbose setting. Then grep for the one line that tells you
what you need to know. I've never known it to not give enough info and
it's very fast.

Or you might consider loading ImageMagick tools if you don't have it.
There are a ton of very useful options (although it quickly loses you in
graphics jargon if you try really fancy things). There is a great
thumbprint webpage generator in it which might also speed up the process
for you.

Gandalf Parker

I did give that a shot, but it only gives me info about the pdf as an
image. It can't tell anything about the fonts embedded in the file.

Alex
_______________________________________________
vox-tech mailing list
[email protected]
http://lists.lugod.org/mailman/listinfo/vox-tech
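
For what it's worth, the "more than 20% text" idea from the quoted message could be sketched roughly as below. This is only a sketch under assumptions, not something anyone in the thread actually ran: it assumes the `pdftotext` tool (from xpdf/poppler, not mentioned in the thread) is installed, that it separates pages with a form feed, and that "20% text" means "more than 20% of the pages yield some extractable text". The function names and the `min_chars` cutoff are made up for illustration.

```python
import subprocess


def split_pages(pdftotext_output):
    """Split pdftotext output into per-page strings.

    Assumption: pdftotext ends every page with a form feed, so the piece
    after the final form feed is empty and is dropped.
    """
    pages = pdftotext_output.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def text_fraction(pages, min_chars=20):
    """Fraction of pages with at least min_chars non-whitespace characters.

    min_chars is an arbitrary (hypothetical) cutoff so that a stray page
    number on a scanned page does not count as "text".
    """
    if not pages:
        return 0.0
    with_text = sum(1 for p in pages if len("".join(p.split())) >= min_chars)
    return with_text / len(pages)


def is_text_pdf(pages, threshold=0.2, min_chars=20):
    """Yes/no answer: True if more than `threshold` of the pages have text."""
    return text_fraction(pages, min_chars) > threshold


def extract_pages(path):
    """Run pdftotext on `path`, writing to stdout ("-"), and split by page."""
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_pages(result.stdout)
```

Something like `is_text_pdf(extract_pages("scan.pdf"))` (hypothetical filename) would then give the yes/no answer, and the files that come back False are the candidates for OCR. The threshold is as arbitrary here as it was in the original message.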
