PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________
Hello All,

Oops! My humble apologies for not being specific, and for using the wrong terminology for PDF, in my previous question. I work on another project in which the protocol uses the word "tag" for what is called (at least I think it is called) a name object that is a key in the PDF format.

So, let me rephrase. Do any statistics exist on the frequency of common name objects that are keys in PDF files? When I say "name object that is a key," I am talking about things like these:

    /Filter /MediaBox /CropBox /F1 /BitsPerComponent /ColorSpace /Type /Page /Parent /Kids /Contents /D /Title /Dest ...

Here is another way to ask the question: does a signature (meaning a unique identifier, not the approval kind of signature) exist for the "typical" PDF file? In other words, if I take a million PDF files and histogram the frequency of the keys, will most PDF files have the same basic kind of histogram, or will they be all over the map? I am leaving out lots of details here, like exactly how I would index the dependent axis of the histogram. In general, though, I am trying to see whether a "typical" PDF file, if one exists, can be described by examining the contents of the raw file (not the printed or viewed result from the reader).

Here is yet another try at what I am getting at. Suppose you take a raw PDF file, run it through a parser, and simply histogram the occurrences of keys. Could I look at the frequency of the keys and say to myself, "Hmmm. This is a catalog. This one is a form. This one is a ..."?

Please, no flames for what I am asking here. It is not a "development" kind of question, so I apologize if it wastes bitspace and does not belong here. I just thought that people who develop apps for PDF and deal with the files frequently might have some insight. Believe it or not, there are people out there who would like to know this.

Thanks in advance, and again, apologies for the vagueness of the first post.
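
To make the histogram idea above concrete, here is a minimal Python sketch, not a real PDF parser. It rests on several assumptions: it scans the raw bytes with a regular expression instead of parsing objects, so it counts every name token (values such as /Page as well as keys such as /Type); it does not decompress streams, so names inside compressed object streams are missed and binary stream data may produce spurious matches; and it does not decode `#xx` escapes inside names. The function names `name_histogram` and `relative_frequencies` are hypothetical, chosen only for this illustration.

```python
import re
from collections import Counter

# Assumption: a name token is "/" followed by one or more characters that
# are neither PDF whitespace nor one of the delimiters ( ) < > [ ] { } / %
NAME_RE = re.compile(rb"/([^\x00\t\n\x0c\r ()<>\[\]{}/%]+)")


def name_histogram(data: bytes) -> Counter:
    """Count every name token found in the raw bytes of a PDF file.

    This counts keys and values alike; telling them apart would need
    a real parser that tracks dictionary structure.
    """
    return Counter(m.group(1).decode("latin-1") for m in NAME_RE.finditer(data))


def relative_frequencies(hist: Counter) -> dict:
    """Normalize counts so files of different sizes can be compared."""
    total = sum(hist.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in hist.items()}
```

One possible use: read a file with `open(path, "rb").read()`, pass the bytes to `name_histogram`, and compare the normalized result from `relative_frequencies` across many files to see whether the distributions cluster by document type.
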
Carolyn

We are what we repeatedly do. Excellence, then, is not an act, but a habit.
-- Aristotle

/****************************************************/
* M. Carolyn Briles
* Software Engineer P-21/NIS-9
* MS J570
* Los Alamos National Laboratory
* Los Alamos, NM 87545
* office: 505-665-0980 cell: 505-690-6660
/****************************************************/

To change your subscription: http://www.pdfzone.com/discussions/lists-pdfdev.html
