Hi David,
First, thanks for your help. I don't know the PDF format very well, but I tried a couple of tools I found, and none of them could convert the file properly. BTW, the PDF is here: http://mjai.co.il/NEW_TESTAMENT/nt.zip if anyone wants to try.
As far as I know, it's not password-protected. I also tried to convert the gibberish to Hebrew myself, treating it like a substitution code to decode. I succeeded with most of it, but I still wasn't able to convert the niqud signs (the vowel dots), which are very important.
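To make the decoding idea concrete, here is a minimal Python sketch of that kind of remapping. The mapping table is entirely hypothetical: the real codes depend on the fonts embedded in this PDF, which I have not worked out, so every entry below is a placeholder. The helper names `remap` and `unmapped` are also just made up for this example. The sketch only shows that niqud can be handled the same way as letters, since in Unicode they are combining marks that follow the base letter.

```python
import unicodedata

# HYPOTHETICAL table: legacy font code -> Unicode character.
# These pairs are placeholders, not the real encoding of the PDF's fonts.
GLYPH_TO_UNICODE = {
    "a": "\u05D0",  # alef
    "b": "\u05D1",  # bet
    "g": "\u05D2",  # gimel
    "1": "\u05B0",  # sheva (combining mark)
    "2": "\u05B7",  # patah (combining mark)
    "3": "\u05B8",  # qamats (combining mark)
    "4": "\u05BC",  # dagesh (combining mark)
}

_TABLE = str.maketrans(GLYPH_TO_UNICODE)


def remap(text):
    """Translate legacy codes to Unicode; unknown characters pass through."""
    converted = text.translate(_TABLE)
    # NFC puts several marks on one letter into canonical order.
    return unicodedata.normalize("NFC", converted)


def unmapped(text):
    """Return the characters that have no entry in the table yet."""
    return sorted(set(text) - set(GLYPH_TO_UNICODE))


if __name__ == "__main__":
    sample = "a3b?"  # made-up extracted text
    print(remap(sample))
    print("still unmapped:", unmapped(sample))
```

Running `unmapped` over the extracted text would list the codes that still need an entry, which is probably where the niqud glyphs are hiding. Note the sketch assumes each mark comes after its letter in the extracted text; if the font stores them the other way round, they would need to be swapped before translating.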
The file properties show a lot of embedded fonts. Could that help somehow? Thanks,
On 02/23/2011 10:11 AM, David Haslam wrote:
PDF files with embedded custom fonts can be a pain for extracting text. Have you checked document properties | fonts to see what these are?
Also, some PDF files are encrypted to prevent copying of content. If printing is allowed, it might sometimes work to intercept the printing output stream. You might still get gibberish as a result of the embedded fonts, though.
BTW, there are several utilities available that can convert other encodings to Unicode. Unless the embedded font is properly documented, it's a hard slog to remap the encoding. I once tried this for an Indian language, but gave up after a few hours.
David
_______________________________________________ sword-devel mailing list: [email protected] http://www.crosswire.org/mailman/listinfo/sword-devel Instructions to unsubscribe/change your settings at above page
