Thank you very much for your explanation. I'll try to convert the PDF to
images and then to text via OCR. What is the most accurate way to do this?
----- Original Message -----
From: <[email protected]>
To: <[email protected]>
Cc: <[email protected]>
Sent: Thursday, June 23, 2011 6:12 PM
Subject: Re: Text extraction results in strange characters
Dani,
The font being used is probably embedded, with its character codes mapped
to images of the characters rather than to real characters. This works
great for viewing the document, but if you don't have characters (ASCII or
Unicode), you're not going to get reasonable results when copying and
pasting. If my theory is correct, you'll find that you are also unable to
copy & paste using Adobe Reader. The only way to get the text out of a file
like this would be to convert it to an image, and then try to use OCR
(optical character recognition) to extract the text. As you probably
already know, OCR is not 100% accurate, but it'd be better than nothing.
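As a rough illustration of the image-then-OCR route, here is a minimal
Python sketch. It assumes two external command-line tools that are not
mentioned anywhere in this thread: pdftoppm (from Poppler) to render each
page to a PNG, and tesseract to recognize the text. The function names,
the 300 DPI default, and the "eng" language code are my own illustrative
choices, not a recommendation from the list; other renderers or OCR
engines would fit the same shape.

```python
import subprocess
import tempfile
from pathlib import Path


def render_command(pdf_path, out_prefix, dpi=300):
    # pdftoppm writes one PNG per page, named <out_prefix>-<page>.png.
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]


def ocr_command(image_path, lang="eng"):
    # Passing "stdout" as the output base makes tesseract print the text.
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def page_number(image_path):
    # Extract the numeric page suffix so pages sort as 1, 2, ..., 10.
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    # Render every page to an image, OCR each image, and join the results.
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(render_command(pdf_path, prefix, dpi), check=True)
        for image in sorted(Path(tmp).glob("page-*.png"), key=page_number):
            result = subprocess.run(
                ocr_command(image, lang),
                check=True,
                capture_output=True,
                text=True,
            )
            pages.append(result.stdout)
    return "\n".join(pages)
```

With both tools installed, calling ocr_pdf("document.pdf") (a hypothetical
file name) would return the recognized text as a single string. A higher
rendering resolution generally gives the OCR engine more to work with, at
the cost of slower processing.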
Developers,
I suggest we add this to the FAQ on the website. I've seen it come up a
few times, and it's a very interesting explanation.
----
Thanks,
Adam
From: Daniel Sánchez González <[email protected]>
To: [email protected]
Date: 06/23/2011 04:55
Subject: Text extraction results in strange characters
When I try to convert a PDF to text, the operation results in strange
characters. If I copy some text from the PDF file and paste it into a text
editor, I get the same result.
What is wrong?
Thanks in advance.
Dani