[...]
> PDF is a scripting language. You can look at the raw PDF with a text
> editor and you'll see plain text PDF operators interspersed with
> possibly binary data. In principle PDF is a programming language and the
> only way to tell what it produces is to run it. But in practice, PDF
> code is all machine-written, and you could probably learn to distinguish
> font-using PDFs from pure-image PDFs by examining the raw PDF file.
>
> You could look for the font embedding operators. A document consisting
> only of scanned page images probably won't have any fonts embedded in
> it. Or, if the scanned-paper PDFs are all made by a particular program,
> you might be able to identify particular PDF operator sequences that it
> uses.
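The "look for fonts in the raw file" heuristic quoted above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not a PDF parser. It assumes fonts show up as `/Type /Font` dictionaries (with a `/FontFile`, `/FontFile2` or `/FontFile3` entry when the font program is embedded) and that scanned pages show up as `/Subtype /Image` XObjects. It also inflates Flate-compressed streams, because PDF 1.5 and later can hide those dictionaries inside compressed object streams, where a plain text editor won't find them. The names `classify` and `searchable_chunks` are hypothetical, made up for illustration.

```python
import re
import sys
import zlib

# Heuristic patterns (assumptions, not a full PDF parser). Whitespace between
# PDF name tokens is optional, so "/Type/Font" and "/Type /Font" both match.
FONT_DICT = re.compile(rb"/Type\s*/Font\b")        # skips /FontDescriptor
EMBEDDED_FONT = re.compile(rb"/FontFile[23]?\b")   # /FontFile, /FontFile2, /FontFile3
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")
STREAM_BODY = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)


def searchable_chunks(pdf: bytes):
    """Yield the raw file plus every stream body that inflates as zlib data.

    Since PDF 1.5, font and image dictionaries may live inside compressed
    object streams, so scanning only the raw bytes can miss them.
    """
    yield pdf
    for match in STREAM_BODY.finditer(pdf):
        try:
            # decompressobj tolerates trailing bytes (such as the EOL before
            # "endstream"); streams that are not zlib data raise zlib.error.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue


def classify(pdf: bytes) -> dict:
    """Count font and image markers; hypothetical helper for illustration."""
    counts = {"font_dicts": 0, "embedded_fonts": 0, "images": 0}
    for chunk in searchable_chunks(pdf):
        counts["font_dicts"] += len(FONT_DICT.findall(chunk))
        counts["embedded_fonts"] += len(EMBEDDED_FONT.findall(chunk))
        counts["images"] += len(IMAGE_XOBJECT.findall(chunk))
    # Guess "scanned" when there are images but no font dictionaries at all.
    counts["looks_scanned"] = counts["font_dicts"] == 0 and counts["images"] > 0
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, classify(f.read()))
```

Run it as, say, `python pdfscan.py *.pdf` (the script name is arbitrary). Keep in mind that a scanned page that has been OCR'd usually carries a font for its invisible text layer, so a nonzero font count doesn't by itself prove the PDF was born digital; that is where knowing which program produced the files, as suggested above, becomes useful.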
In that vein, I ask: is Alex looking for a general method that works on
any possible PDF file, or are the PDFs in this particular problem a
limited set created by one or a few programs?

-- 
Henry House
+1 530 753 3361 ext. 13
Please don't send me HTML mail! My mail system frequently rejects it.
The unintelligible text that may follow is a digital signature.
See <http://hajhouse.org/pgp> to find out how to use it.
My OpenPGP key: <http://hajhouse.org/hajhouse.asc>.
_______________________________________________
vox-tech mailing list
[email protected]
http://lists.lugod.org/mailman/listinfo/vox-tech
