I know for a fact (because I did it and just verified) that the font used for 
those codes uses the real UCS code points. The conversion happens in the PDF 
embedding magic. I could look into it, but I have no easy way to debug the Adobe 
Distiller path here. Apparently, when you stray off the beaten path for new 
characters, the preservation of code points in copy-and-paste operations is not 
bulletproof.

Michel

-----Original Message-----
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Jukka K. Korpela
Sent: Friday, October 24, 2014 4:51 AM
To: unicode@unicode.org
Subject: Re: Code charts and code points

2014-10-24 11:17, "Martin J. Dürst" wrote:

> The code charts are published as PDFs. In general, text in PDFs can be 
> copypasted elsewhere. Is there something in place that makes sure that 
> "wrong" Unicode encodings for glyphs published in code charts don't 
> leak elsewhere?

It seems that there isn’t. Whether this is serious is a different issue.

I tested with the arbitrarily chosen Ornamental Dingbats block, with the chart 
http://www.unicode.org/charts/PDF/Unicode-7.0/U70-1F780.pdf
Opening it in Adobe Reader XI on Win 7, I was able to select the characters 
with the mouse and copy and paste them to a text editor, BabelPad. It shows 
most of them as just boxes, identified with the correct Unicode numbers; this 
is the expected behavior when the editor has no suitable font at its disposal. 
But instead of U+1F67C VERY HEAVY SOLIDUS and U+1F67D VERY HEAVY REVERSE 
SOLIDUS, it shows “/” and “\”, identified as U+002F SOLIDUS and U+005C REVERSE 
SOLIDUS.
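
The same check can be scripted instead of done by eye in an editor. What 
follows is only a minimal sketch in Python, assuming the pasted text is 
available as a string. The helper name `describe` is hypothetical, and the two 
sample strings simply reproduce the case reported above rather than reading 
anything from the PDF.

```python
import unicodedata


def describe(text):
    """Return a (code point, character name) pair for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Assumption: what the chart is meant to contain versus what the paste yielded.
expected = "\U0001F67C\U0001F67D"  # VERY HEAVY SOLIDUS, VERY HEAVY REVERSE SOLIDUS
pasted = "/\\"                     # what reportedly arrived in the editor

for (want, _), (got, got_name) in zip(describe(expected), describe(pasted)):
    status = "ok" if want == got else "MISMATCH"
    print(f"{status}: expected {want}, got {got} {got_name}")
```

The names come from the Unicode database bundled with the Python 
interpreter, so an older interpreter may report a newer character as 
`<unnamed>` even though its code point is shown correctly.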

So apparently the font designer had mapped the glyphs to SOLIDUS and REVERSE 
SOLIDUS, which is understandable. But this means that when the characters in 
the code charts are copied and pasted, or otherwise accessed at the character 
level, they are the wrong characters.

I think it is imaginable that someone wants to copy a block of characters from 
the code charts, as a handy way of getting them for inspection, e.g. for 
testing how some particular software renders them using some particular 
font(s). I would expect some confusion if some of what you got turned out to 
be the wrong characters (code points).

Yucca



_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
