Does anyone know a way to check programmatically whether a PDF contains
actual text or was just scanned in as page images?
I basically just want to sort a directory with many thousands of PDFs.
I figured there must be something in the header or in the file info that
says whether it's an image or has text, or, to get fancier, gives a
quick percentage of how much of the document is text, which I could use
to set a sort threshold.
Alternatively, if this is easier to do on a PostScript file, there's no
reason I can't run pdf2ps on each one first and then decide how to sort.
It's really a one-time deal, so I can live with the overhead of that
conversion.
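For what it's worth, here is a rough sketch of the threshold idea in
Python. It assumes pdftotext (from xpdf or poppler-utils) is installed
and on the PATH, and that a scanned PDF with no OCR layer gives back
little or no extracted text. The names chars_per_page, classify and
MIN_CHARS_PER_PAGE, and the cutoff of 50, are made up for illustration
and not tuned against real files.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical cutoff: average non-whitespace characters per page below
# which a PDF is treated as a scan. Tune it against a few known files.
MIN_CHARS_PER_PAGE = 50


def chars_per_page(text: str) -> float:
    """Average number of non-whitespace characters per page.

    pdftotext ends each page with a form feed, so pages are counted by
    splitting on that character and dropping the empty trailing piece.
    """
    pages = text.split("\f")
    if len(pages) > 1 and pages[-1].strip() == "":
        pages.pop()
    total = sum(1 for ch in text if not ch.isspace())
    return total / len(pages)


def classify(text: str, threshold: float = MIN_CHARS_PER_PAGE) -> str:
    """Label extracted text as 'text' or 'scanned' using the cutoff."""
    return "text" if chars_per_page(text) >= threshold else "scanned"


def extract_text(pdf: Path) -> str:
    """Run pdftotext on one file; '-' sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True, text=True, errors="replace", check=False,
    )
    return result.stdout


def report(directory: Path) -> None:
    """Print one 'label<TAB>path' line per PDF; nothing is moved."""
    for pdf in sorted(directory.glob("*.pdf")):
        print(f"{classify(extract_text(pdf))}\t{pdf}")


if __name__ == "__main__" and len(sys.argv) == 2:
    report(Path(sys.argv[1]))
```

Run with the directory as the only argument, this just prints a label
next to each file so the actual moving can be done separately. A scan
that has been through OCR carries a text layer and would come out as
"text" under this test.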
Alex