[Moving over to fop-dev as this is getting technical]

On 30/01/13 15:58, Glenn Adams wrote:
> On Wed, Jan 30, 2013 at 6:44 AM, Neeraj <neerajii...@gmail.com> wrote:
>
>> Yes, my editor can handle used font.
>> If you highlight the text in the editor and set the font to Arial do you
>> see any glyph? For PDF text - No
>>
>> For embedding this, maybe I added embedding mode full later, after
>> generating PDF, but in both the cases it is giving same results.
>>
>> The issue I reported was for non-Base14 font. You are using Arial which is
>> Base14 font and FOP has full support for these kinds of fonts.
>>
>> Well as you said, I tried same functionality with Arial font also and
>> found same issue in different form.
>>
>> Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
>> PDF Arabic text - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح
>>
>> If I compare PDF and MS-Word files, it looks exactly similar but when I
>> copy it to an editor (font supported), the words look different (glyphs
>> are missing). You can check the above text.
>>
>> Why am I losing text while doing copy/paste?
>
> One thing to keep in mind is that some fonts do not include entries in the
> CMAP table for all glyphs that can be referenced by performing the
> character to glyph transformation process. In this case FOP synthesizes a
> CMAP entry which is used in the embedded font, where this entry uses a
> dynamically generated Unicode value in the PUA (private use area). This
> latter is necessary since PDF requires specifying *some* character code
> (and not glyph index directly) when performing text drawing.

I may be missing something, but I don’t understand this ‘PDF requires
specifying some character code’. AFAIU you can put glyph indices directly
in the PDF string; you just have to specify Identity-H as the font’s
encoding and Identity in the CIDToGIDMap. So I’m not sure why it is
necessary to use codes in the private use area.

Then, to have copy-paste working, you ‘just’ have to provide an appropriate
ToUnicode CMap that maps each shaped glyph back to the original Unicode
code point(s).

> If you then attempt to copy this text and paste into another editor that
> isn't aware of this dynamic mapping using the embedded font's CMAP, then
> you may lose that mapping information. One possible way to fix this, which
> I haven't investigated in detail, is to provide a separately encoded
> Unicode string that contains the original, pre-transformed text, and
> associate this string with the displayed post-transformed character string
> that may contain these dynamic PUA characters. The PDF viewer would then
> need to make use of the pre-transformed string when performing copy
> operations. However, I haven't researched this to see if PDF supports it.
>
> Anyway, I suspect this is what is causing your problem. I've opened a bug
> on this at [1].
>
> [1] https://issues.apache.org/jira/browse/FOP-2204

Vincent
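
P.S. To make the Identity-H / ToUnicode idea above a bit more concrete, here is a rough sketch in Python (not FOP's Java, just to keep it short). It is not how FOP does things: the function names `glyph_string` and `to_unicode_cmap` and the glyph indices are made up for illustration, and I am assuming a composite font with Identity-H encoding and an Identity CIDToGIDMap, so that each 2-byte code in the PDF string is simply the glyph index.

```python
def glyph_string(glyph_ids):
    """Hex string operand for a text-showing operator under Identity-H:
    every glyph index is written as one big-endian 2-byte code."""
    for gid in glyph_ids:
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"glyph index out of range: {gid}")
    return "<" + "".join(f"{gid:04X}" for gid in glyph_ids) + ">"


def to_unicode_cmap(glyph_to_text):
    """Build the text of a ToUnicode CMap from {glyph index: original text}.

    The destination is the original text in UTF-16BE, so a ligature glyph
    can map back to several code points."""
    entries = []
    for gid, text in sorted(glyph_to_text.items()):
        dst = text.encode("utf-16-be").hex().upper()
        entries.append(f"<{gid:04X}> <{dst}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # A single bfchar section may hold at most 100 entries, so chunk them.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical glyph indices: 0x0123 stands for a lam-alef ligature,
    # 0x0005 for a shaped form of heh. Real indices depend on the font.
    shaped = {0x0123: "\u0644\u0627", 0x0005: "\u0647"}
    print(glyph_string([0x0123, 0x0005]))
    print(to_unicode_cmap(shaped))
```

With something along these lines no PUA code is involved at all: the content stream carries glyph indices, and the viewer recovers the original characters for copy/paste from the ToUnicode stream attached to the font dictionary.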