PDFdev is a service provided by PDFzone.com | http://www.pdfzone.com
_____________________________________________________________

Leonard,

Thanks for responding.

I do realize that not all PDFs use /ToUnicode, although a significant number do: 
roughly 600 of the 2000 files in my randomly picked batch have it.

My goal is to minimize the number of files WITH /ToUnicode that my application chokes 
on during text extraction. The alleged irregularity I described in the original 
message was observed in 3 of the 600 PDFs with /ToUnicode, and that is too 
significant a number for me to simply ignore.
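
To make the goal concrete, here is a minimal sketch of one tolerant way to read a 
ToUnicode CMap: look only at the beginbfchar/endbfchar and beginbfrange/endbfrange 
sections and skip the surrounding PostScript scaffolding, so oddities elsewhere in 
the stream cannot break extraction. This is an illustration under stated 
assumptions, not my actual code: the names parse_tounicode, decode_utf16_hex and 
SAMPLE are hypothetical, the stream is assumed to be already decompressed and 
decoded to text, and a bfrange destination is incremented as a plain integer, 
which is a simplification.

```python
import re

# Hypothetical helper names; the input is the decoded text of a ToUnicode stream.
SECTION = re.compile(r"beginbf(char|range)(.*?)endbf\1", re.S)
HEX = re.compile(r"<([0-9A-Fa-f]+)>")
RANGE = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(?:<([0-9A-Fa-f]+)>|\[(.*?)\])",
    re.S,
)


def decode_utf16_hex(hex_digits):
    """Decode a hex string such as '0041' as UTF-16BE text."""
    if len(hex_digits) % 2:
        hex_digits = "0" + hex_digits  # pad odd-length input so fromhex accepts it
    return bytes.fromhex(hex_digits).decode("utf-16-be", errors="replace")


def parse_tounicode(cmap_text):
    """Return {character code: Unicode string} from bfchar/bfrange sections only."""
    mapping = {}
    for kind, body in SECTION.findall(cmap_text):
        if kind == "char":
            # bfchar entries are <source code> <destination> pairs.
            tokens = HEX.findall(body)
            for src, dst in zip(tokens[0::2], tokens[1::2]):
                mapping[int(src, 16)] = decode_utf16_hex(dst)
            continue
        # bfrange entries are <lo> <hi> <dst> or <lo> <hi> [<dst1> <dst2> ...].
        for lo, hi, dst, array in RANGE.findall(body):
            codes = range(int(lo, 16), int(hi, 16) + 1)
            if dst:
                # Simplification: increment the destination as one integer.
                for offset, code in enumerate(codes):
                    value = f"{int(dst, 16) + offset:0{len(dst)}X}"
                    mapping[code] = decode_utf16_hex(value)
            else:
                for code, item in zip(codes, HEX.findall(array)):
                    mapping[code] = decode_utf16_hex(item)
    return mapping


# A small hand-written CMap in the usual layout, used only for illustration.
SAMPLE = """
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
1 beginbfrange
<0025> <0027> <0042>
endbfrange
endcmap
"""

if __name__ == "__main__":
    print(parse_tounicode(SAMPLE))
```

Because the sketch never interprets anything outside the bf sections, tokens such 
as "def" are skipped rather than understood, so it says nothing about what they 
mean.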

Would you happen to know the meaning of the word "def" in a CMap (see original 
message)? And why would it be placed inside a dictionary?

Thanks!

Peter


---------- Original Message ----------------------------------
From: Leonard Rosenthol <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Date:  Thu, 25 Sep 2003 21:41:14 -0400

>At 7:10 PM -0400 9/25/03, Peter Persits wrote:
>>An application I am developing is capable of extracting text data 
>>from almost every PDF document, and for this to happen I have to 
>>parse a font's ToUnicode stream which contains a CMap.
>>
>
>       What do you do with the HUNDREDS OF THOUSANDS of PDFs that 
>don't have a ToUnicode stream?
>
>
>Leonard
>-- 
>---------------------------------------------------------------------------
>Leonard Rosenthol                            <mailto:[EMAIL PROTECTED]>
>Chief Technical Officer                      <http://www.pdfsages.com>
>PDF Sages, Inc.                              215-629-3700 (voice)
>                                              215-629-0789 (fax)
>
>To change your subscription:
>http://www.pdfzone.com/discussions/lists-pdfdev.html
>
>

