On Jun 8, 2004, at 7:41 AM, C Bobroff wrote:
By the way, I have received a PDF file from Iran recently in Persian and
it was possible to copy and paste from the PDF text into Notepad and all
the letters came out perfectly, only the letters were running backwards
from left to right. I can't seem to copy and paste with yours. It ends up
in garbage characters. Wish I knew these PDF secrets!

Regarding PDFs:

A PDF file stores only display glyphs (not characters), in left-to-right visual order, and by definition can't do anything else. (It is intended to capture the exact printout after all processing of the text is done.) For this reason, text extraction and search in Arabic/Persian/etc. PDFs is always a bit tricky, although good fonts and PDF viewer software can conceal that inherent complexity.
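
To illustrate the "letters running backwards" effect you saw: in the simplest case, text copied out in visual order can be put back into logical order by reversing each line. Here is a minimal Python sketch of that idea. The function name visual_to_logical is my own invention, and the sketch assumes the line is purely right-to-left (no embedded numbers or Latin words, which would need the full Unicode bidirectional algorithm) and that any shaped glyphs came out as Unicode Arabic presentation forms, which is not guaranteed for every PDF.

```python
import unicodedata


def visual_to_logical(line: str) -> str:
    """Turn one visually ordered, purely right-to-left line into logical order.

    Assumption (hypothetical helper, not part of any PDF tool): the line
    holds only RTL text, so a plain reversal is enough.
    """
    # Step 1: undo the left-to-right visual ordering.
    reversed_line = line[::-1]
    # Step 2: fold presentation forms (shaped glyphs and ligatures such as
    # lam-alef) back to base letters. This is done after the reversal so the
    # letters inside a decomposed ligature end up in logical order.
    return unicodedata.normalize("NFKC", reversed_line)


if __name__ == "__main__":
    # "salam" as it might be extracted: meem, lam-alef ligature, seen,
    # stored left to right in visual order.
    extracted = "\uFEE1\uFEFC\uFEB3"
    print(visual_to_logical(extracted))
```

Mixed-direction lines would be reordered incorrectly by this, which is part of why viewer support matters.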

For a PDF to be well formed for text search and extraction, the font glyph names should conform to the old (1990s) version of the Adobe Glyph List glyph naming standard. Also, you should use a recent release of Acrobat Distiller (not the PDFWriter virtual printer driver) to create the PDFs. This may involve additional steps, such as first saving to a PostScript print file. PDFWriter can't work reliably because of the way the Windows printer driver architecture works, so don't expect PDFWriter to be fixed until, say, after Longhorn in 2006.

The latest version of Acrobat Reader is somewhat improved in this regard, but to get something that works properly with well-formed PDFs, you will need Adobe Acrobat ME (Middle East Edition). You can find more information on the Adobe Central Europe/Middle East site: <http://www.adobeceea.com/products/ME/main.html>. They claim that the generic Reader 6 can search and copy text, but actually only the 6.0 ME editions of Acrobat Standard and Pro work properly, and they are expensive if you just want to search or copy text...

By the way, one potential project that we need to define within the FarsiLinux project is a Persian-compatible PDF generator and viewer.

- Hooman Mehr

_______________________________________________
PersianComputing mailing list
[EMAIL PROTECTED]
http://lists.sharif.edu/mailman/listinfo/persiancomputing
