PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________

Hello All,

Oops!  My humble apologies for not being specific and for using the
wrong PDF terminology in my previous question.  I work on another
project whose protocol uses the word "tag" for what PDF calls (at
least I think it is called) a name object that is used as a key.

So, let me rephrase.  Do any statistics exist on the frequency of 
common name objects that are keys in PDF files?  

When I say "name object that is a key," I am talking about things 
like these:
/Filter   /MediaBox  /CropBox  /F1  /BitsPerComponent  /ColorSpace 
/Type  /Page  /Parent  /Kids  /Contents  /D  /Title  /Dest  ...

Here is another way to ask this question:  
Does a signature (meaning a unique identifier, not the approval kind
of signature) exist for the "typical" PDF file?  In other words, if I
take a million PDF files and histogram the frequency of the keys,
will "most" PDF files have the same basic kind of histogram, or
will they be all over the map?
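
One possible way to put a number on "same basic kind of histogram"
is to compare two key histograms with cosine similarity.  This is
only a sketch of one option, not an established method for PDF files.
The function name cosine_similarity and the counts below are made up
for illustration; they are not measurements from real files.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Return the cosine similarity of two key histograms (0.0 to 1.0)."""
    # Only keys present in both histograms contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty histogram is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical counts, invented purely for illustration.
file_a = Counter({"Type": 12, "Page": 4, "Font": 6, "Contents": 4})
file_b = Counter({"Type": 30, "Page": 10, "Font": 14, "Contents": 10})
file_c = Counter({"Type": 9, "Annot": 25, "FT": 25, "Contents": 1})

print(cosine_similarity(file_a, file_b))  # close to 1 means similar shape
print(cosine_similarity(file_a, file_c))  # lower means the shapes differ
```

Cosine similarity ignores the overall size of a file, so a long
document and a short one with the same mix of keys score close to 1.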

There are lots of details that I am leaving out here, like exactly how
I would index the dependent axis of the histogram.  But in general,
I am trying to see whether a "typical" PDF file, if one exists, can be
described by examining the contents of the raw file (not the printed
or viewed output of a reader).

Here is yet another attempt at what I'm getting at.  Suppose you take
a raw PDF file, run it through a parser, and simply histogram
the occurrence of keys.  Could I look at the frequency of the keys and
say to myself, "Hmmmm.  This is a catalog.  This one is a form.  This
one is a ..."?
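
The parse-and-histogram step described above might look roughly like
the following Python sketch.  It is not a real PDF parser: it assumes
that a plain regular-expression scan of the raw bytes is good enough
for a first look.  It counts every name object it sees (keys and name
values alike), it cannot see anything stored inside a compressed
stream, and binary stream data may produce a few false matches.  The
function name name_histogram and the sample bytes are made up for
illustration.

```python
import re
from collections import Counter

# A PDF name object starts with "/" and runs until whitespace or a
# delimiter character such as ( ) < > [ ] { } / %.
NAME_RE = re.compile(rb"/([^\x00\t\n\x0c\r ()<>\[\]{}/%]+)")


def name_histogram(raw: bytes) -> Counter:
    """Count every name object found in the raw bytes of a PDF file."""
    return Counter(m.decode("latin-1") for m in NAME_RE.findall(raw))


# A tiny hand-written page dictionary standing in for a real file.
sample = b"<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 4 0 R >>"
hist = name_histogram(sample)

for name, count in hist.most_common():
    print(f"/{name}\t{count}")
```

For a real file, the bytes would come from something like
open(path, "rb").read(), with path being whatever file is under study.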

Please no flames for what I am asking here.  It is not a "development"
kind of question, so I apologize if it wastes bitspace and does not
belong here.  I just thought that people who develop apps for PDF 
and deal with the files frequently might have insight.  Believe it or not, 
there are people out there who would like to know this.

Thanks in advance, and again, apologies for the vagueness of the
first post.

Carolyn

We are what we repeatedly do.  Excellence, then, is not an act, 
  but a habit.   -- Aristotle

/****************************************************/
  * M. Carolyn Briles
  * Software Engineer    P-21/NIS-9 
  * MS J570
  * Los Alamos National Laboratory               
  * Los Alamos, NM   87545                          
  * office: 505-665-0980  cell: 505-690-6660 
/****************************************************/


