Marco Cimarosti wrote,

> 
> So far so good. Now I want to use your PUA Plane-14 tags, if present, to
> override the above assumption about PUA characters. E.g., imagine that my
> string contains this:
> 
>       
> 󠀀󠀂󠁆󠁯󠁏󠁢󠁡󠁲󠀮󠁴󠁴󠁦󠁿> ?
>       (U+0E0000 U+0E0002 U+0E0046 U+0E006F U+0E006F U+0E0062 U+0E0061
> U+0E0072 U+0E002E U+0E0074 U+0E0074 U+0E0066 U+0E007F U+E017 U+E009)
> 
> This is what I am going to do:
> 
> 1) I parse the tags at the beginning of the string and save the relevant
> information in a temporary variable which we will call PuaInterpretation;
> 
> 2) I remove the tags.
> 
> Now, my PuaInterpretation variable contains the following information:
> 
>       Foobar.ttf
> 
> And my string contains the following text:
> 
>       
>       (U+E017 U+E009)
> 
> Now, what's the next step? What am I supposed to do to find out whether,
> according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
> are letters or not?
> 

Hmmm, the UTF-8 non-BMP string apparently got munged.

Anyway, the next step is for your function to load the file 
"Foobar.puapropertiesclass".

This file is a plain-text file following the same format as UNIDATA.  It's
extensible -- if the font vendor doesn't include it with the font download,
then the savvy end-user can simply construct it with a plain-text editor.

Now your function has all the necessary information and can determine 
whether the PUA code points are letters, or not.
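
A minimal sketch of the steps above in Python, under several assumptions
that are not spelled out in this thread: that each tag character maps to
ASCII by subtracting 0xE0000, that the properties file name is the font
name with its extension swapped for ".puapropertiesclass", and that each
line of that file is semicolon-separated with the code point in the first
field and the General Category in the third, as in UnicodeData.txt. All
function names and the sample property lines are hypothetical.

```python
import os

TAG_FIRST, TAG_LAST = 0xE0000, 0xE007F  # Plane-14 tag block


def split_tags(text):
    """Return (tag payload as ASCII, text with the Plane-14 tags removed)."""
    payload, rest = [], []
    for ch in text:
        cp = ord(ch)
        if TAG_FIRST <= cp <= TAG_LAST:
            low = cp - TAG_FIRST
            # Assumption: only printable ASCII is payload; the rest is framing.
            if 0x20 <= low <= 0x7E:
                payload.append(chr(low))
        else:
            rest.append(ch)
    return "".join(payload), "".join(rest)


def properties_filename(font_name):
    """Foobar.ttf -> Foobar.puapropertiesclass (assumed naming rule)."""
    stem, _ext = os.path.splitext(font_name)
    return stem + ".puapropertiesclass"


def parse_properties(lines):
    """Map code point -> General Category from UnicodeData-style lines."""
    categories = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        categories[int(fields[0], 16)] = fields[2]
    return categories


def is_letter(ch, categories):
    """True if the file assigns ch a letter category (Lu, Ll, Lo, ...)."""
    # Code points missing from the file fall back to Co (private use).
    return categories.get(ord(ch), "Co").startswith("L")


if __name__ == "__main__":
    tagged = "".join(chr(0xE0000 + ord(c)) for c in "Foobar.ttf") + "\uE017\uE009"
    name, text = split_tags(tagged)
    # Made-up property lines; a real file would come from the font vendor.
    sample = ["E017;FOOBAR LETTER A;Lo;0;L;;;;;N;;;;;",
              "E009;FOOBAR SIGN;So;0;ON;;;;;N;;;;;"]
    cats = parse_properties(sample)
    print(properties_filename(name), [is_letter(c, cats) for c in text])
```

In real use the lines would be read from the file that
properties_filename() names rather than from an in-memory list.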

Best regards,

James Kass