Hi Peter, Jonathan,

On 16/10/2012, at 2:02, Peter Baker <ps...@virginia.edu> wrote:

> On 10/15/12 10:59 AM, Jonathan Kew wrote:
>> 
>> That's exactly the problem - these glyphs are encoded at PUA codepoints, so 
>> that's what (most) tools will give you as the corresponding character data. 
>> If they were unencoded, (some) tools would use the glyph names to infer the 
>> relevant characters, which would work better.
>> 
>>> Small caps are named like "a.sc" and they are unencoded.
>> And as they're unencoded, (some) tools will look at the glyph name and map 
>> it to the appropriate character.
> 
> I've been trying to explain this:  but Jonathan does it much better than I 
> did, and with more authority.

Yes, but why would the tools be designed this way?
Surely unencoded means that the code point has not been assigned yet, and may 
be assigned in future, so using these is asking for trouble.
Wasn't the intention of the PUA to be the place to put characters that you need 
now but have no corresponding Unicode point? This is precisely where using the 
glyph name should work. Or am I missing something?

So why would a tool be designed to infer the right composition of characters 
when a ligature is properly named at an unencoded point, but not apply that same 
algorithm when it is at a PUA point?
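
To make the asymmetry concrete, here is a rough Python sketch of glyph-name inference, loosely following the Adobe Glyph List specification's rules: drop any suffix after the first period, split ligature names on underscores, and map each component through the glyph list or a uniXXXX / uXXXX form. The tiny AGL_EXCERPT table and the text_for_glyph policy (report the encoded code point if there is one, fall back to the name only for unencoded glyphs) are assumptions for illustration, not a description of how any particular tool actually works. The prefer_name_for_pua flag is a hypothetical version of the alternative asked about above.

```python
import re

# Hypothetical excerpt of the Adobe Glyph List; a real tool would load the full list.
AGL_EXCERPT = {"a": "a", "f": "f", "i": "i", "l": "l", "fi": "\uFB01"}


def _component_to_text(comp):
    """Map one underscore-separated component of a glyph name to text ('' if unknown)."""
    if comp in AGL_EXCERPT:
        return AGL_EXCERPT[comp]
    # 'uni' followed by one or more groups of four uppercase hex digits
    m = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", comp)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if not any(0xD800 <= cp <= 0xDFFF for cp in cps):
            return "".join(chr(cp) for cp in cps)
        return ""
    # 'u' followed by four to six uppercase hex digits
    m = re.fullmatch(r"u([0-9A-F]{4,6})", comp)
    if m:
        cp = int(m.group(1), 16)
        if cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF:
            return chr(cp)
    return ""


def glyph_name_to_text(name):
    """Infer text from a glyph name: 'a.sc' -> 'a', 'f_i' -> 'fi'."""
    base = name.split(".", 1)[0]  # drop variant suffixes such as .sc or .liga
    return "".join(_component_to_text(c) for c in base.split("_"))


def is_pua(cp):
    """True for code points in the BMP Private Use Area or the supplementary PUA planes."""
    return 0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD


def text_for_glyph(name, codepoint=None, prefer_name_for_pua=False):
    """Hypothetical extraction policy: an encoded glyph reports its code point as-is;
    only unencoded glyphs fall back to the glyph name. With prefer_name_for_pua=True,
    PUA code points are treated like unencoded glyphs."""
    if codepoint is None or (prefer_name_for_pua and is_pua(codepoint)):
        return glyph_name_to_text(name)
    return chr(codepoint)
```

With this sketch, text_for_glyph("f_i", 0xE000) returns the bare PUA character U+E000, while text_for_glyph("f_i") and text_for_glyph("f_i", 0xE000, prefer_name_for_pua=True) both return "fi".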

> 
> P.

Perplexed.

    Ross

PS. Wouldn't this particular issue with ligatures be resolved by a 
/ToUnicode CMap for the font, which can do one-to-many assignments? 
Yes, this does not handle the many-to-one and many-to-many requirements of 
complex scripts, but that isn't what was being reported here, and it is a much 
harder recognition problem.
Besides, in that case it isn't clear what copy-paste should best produce, nor 
how to specify the desired search.
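
For the PS, a minimal sketch of what such a /ToUnicode CMap could look like, generated in Python. It assumes the font is embedded with Identity-H encoding, so the character codes are two-byte glyph IDs; the glyph IDs in the usage example are hypothetical. Each bfchar entry maps one code to a UTF-16BE destination string, which is what lets a single ligature glyph map to several characters.

```python
from typing import Dict


def utf16be_hex(text: str) -> str:
    """Encode text as uppercase UTF-16BE hex, the destination format in bfchar entries."""
    return text.encode("utf-16-be").hex().upper()


def to_unicode_cmap(mapping: Dict[int, str]) -> str:
    """Build a ToUnicode CMap mapping two-byte glyph codes to (possibly multi-character) text."""
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # A single beginbfchar block may hold at most 100 entries.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            lines.append(f"<{code:04X}> <{utf16be_hex(text)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical glyph IDs for an "fi" and an "ffl" ligature.
    print(to_unicode_cmap({0x0012: "fi", 0x0013: "ffl"}))
```

Running it prints the entries "<0012> <00660069>" and "<0013> <00660066006C>", i.e. one glyph code mapped to two and three characters respectively.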


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
