PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________

I don't know of any source for this kind of information, though it
shouldn't be too hard to generate.  You could write an app that crawls
Google and downloads the PDF files it finds.

It sounds to me like you're trying to sort your keys based on probability
for lookup purposes.  My advice (if I'm right):  Don't.

Use a map.  In C++:

std::map<int, std::string>
and a parallel:
std::map<std::string, int>

std::wstring might be more appropriate, depending on what you're doing.

Each entry goes in both maps.  You can then look up an entry by either its
integer or its string in logarithmic time, rather than linear time.  A map's
worst-case performance is MUCH better than a simple linear search.
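
To make the two-map idea concrete, here is a minimal sketch.  The class
name NameTable and its member functions are made up for illustration; the
only real API used is std::map.  It assumes ids and names are both unique.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: keeps an id -> name map and a name -> id map in sync.
class NameTable {
public:
    // Adds the pair to both maps. Returns false and changes nothing
    // if either the id or the name is already present.
    bool add(int id, const std::string& name) {
        if (byId_.count(id) || byName_.count(name)) return false;
        byId_[id] = name;
        byName_[name] = id;
        assert(byId_.size() == byName_.size());  // the maps must stay parallel
        return true;
    }

    // O(log n) lookup by id; returns nullptr if the id is unknown.
    const std::string* findName(int id) const {
        auto it = byId_.find(id);
        return it == byId_.end() ? nullptr : &it->second;
    }

    // O(log n) lookup by name; returns nullptr if the name is unknown.
    const int* findId(const std::string& name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? nullptr : &it->second;
    }

private:
    std::map<int, std::string> byId_;
    std::map<std::string, int> byName_;
};
```

Both lookups go through std::map::find, so neither direction ever needs a
linear scan.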

If, on the other hand, you're trying to do some sort of PDF fingerprinting,
then I suggest you write that Google crawler I mentioned earlier.
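
Once you have the files, the per-file key histogram could be gathered with
something like the rough sketch below.  The function names are made up, and
it makes a big simplifying assumption: it scans the raw bytes for anything
that looks like a name object (a '/' followed by regular characters), so it
will miss names inside compressed streams and may count stray matches in
binary data.  A real tool would want a proper PDF parser.

```cpp
#include <cstddef>
#include <map>
#include <string>

// True if c can appear inside a name token, i.e. it is neither PDF
// whitespace nor one of the PDF delimiter characters.
bool isNameChar(unsigned char c) {
    switch (c) {
        case '\0': case '\t': case '\n': case '\f': case '\r': case ' ':
        case '(': case ')': case '<': case '>': case '[': case ']':
        case '{': case '}': case '/': case '%':
            return false;
        default:
            return true;
    }
}

// Naive scan: counts every "/Name"-shaped token in the buffer.
// Keys in the result are the names without the leading slash.
std::map<std::string, int> countNames(const std::string& data) {
    std::map<std::string, int> histogram;
    std::size_t i = 0;
    while (i < data.size()) {
        if (data[i] != '/') { ++i; continue; }
        std::size_t start = ++i;  // first character after the slash
        while (i < data.size() &&
               isNameChar(static_cast<unsigned char>(data[i]))) {
            ++i;
        }
        if (i > start) ++histogram[data.substr(start, i - start)];
    }
    return histogram;
}
```

Comparing the resulting maps across many files would show whether the
histograms cluster or not.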

--Mark Storer
  Software Engineer
  Cardiff Software
#include <disclaimer>
typedef std::disclaimer<Cardiff> Discard;


> -----Original Message-----
> From: Max Khesin [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 26, 2003 1:40 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [PDFdev] stats on PDF Name Objects
> 
> 
> 
> Still not enough info to help you (top-secret PDF classification
> project? :), but I would guess the PDFs would cluster by the producing
> software (of course you can just look that up in the Producer tag most
> of the time) and by the source document (HTML, LaTeX, MS Word, scanned
> file), etc.
> 
> > Here is another way to ask this question:
> > Does a signature (meaning a unique identifier, not the approval kind
> > of signature) exist for the "typical" PDF file?  In other words, if I
> > take a million PDF files and histogram the frequency of the keys,
> > will "most" PDF files have the same basic kind of histogram or
> > will they be all over the map?
> 
> This is confusing. Are you looking for a unique signature of a PDF
> file from a set of many PDF files, or a signature of a PDF file vs.
> other document types?
> 
> max.
> 
> 
> To change your subscription:
> http://www.pdfzone.com/discussions/lists-pdfdev.html
> 
