Thank you very much for your explanation. I'll try converting the PDF to images and then extracting the text via OCR. What is the most accurate way to do this?



----- Original Message -----
From: <[email protected]>
To: <[email protected]>
Cc: <[email protected]>
Sent: Thursday, June 23, 2011 6:12 PM
Subject: Re: Text extraction results in strange characters


Dani,
The type of font being used is probably embedded and mapped to images of
the characters.  This works great for viewing the document, but if you
don't have characters (ASCII or Unicode), you're not going to get
reasonable results when copying and pasting.  If my theory is correct,
you'll find that you will also be unable to copy & paste using Adobe
Reader.  The only way to get the text out of a file like this would be to
convert it to an image, and then try to use OCR (optical character
recognition) to extract the text.  As you probably already know, OCR is
not 100% accurate, but it'd be better than nothing.
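
A rough sketch of the approach described above, in Python: first check whether
extracted text looks unusable, and only then fall back to rendering the pages
and running OCR. Everything here is an assumption for illustration, not part
of the original advice: the helper names (suspicious_ratio, looks_garbled,
ocr_pdf), the 0.3 threshold, and the choice of the third-party pdf2image and
pytesseract packages (which in turn need Poppler and Tesseract installed).
The heuristic only catches control, private-use, unassigned, and replacement
characters; a font whose codes map to ordinary but wrong letters would slip
past it.

```python
import unicodedata

# Unicode general categories that rarely appear in real text:
# control, private use, unassigned, surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like debris."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    bad = sum(
        1
        for ch in visible
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(visible)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical heuristic: True if too many characters look like debris."""
    return suspicious_ratio(text) > threshold


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """Render each page to an image and OCR it (assumes pdf2image + pytesseract)."""
    # Imported lazily so the heuristic above works without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)
```

A caller would run the normal text extraction first, pass the result to
looks_garbled, and call ocr_pdf only when it returns True. A higher dpi
generally gives the OCR engine more to work with, at the cost of speed.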

Developers,
I suggest we add this to the FAQ on the website.  I've seen it come up a
few times, and it's a very interesting explanation.

---- Thanks,
Adam



From: Daniel Sánchez González <[email protected]>
To: [email protected]
Date: 06/23/2011 04:55
Subject: Text extraction results in strange characters



When I try to convert a PDF to text, the operation results in strange
characters. If I copy some text from the PDF file and paste it into a text
editor, I get the same result.

What is wrong?

Thanks in advance.

Dani




