Marco Cimarosti wrote,

> So far so good. Now I want to use your PUA Plane-14 tags, if present, to
> override the above assumption about PUA characters. E.g., imagine that my
> string contains this:
>
> ?
> (U+0E0000 U+0E0002 U+0E0046 U+0E006F U+0E004F U+0E0062 U+0E0061
> U+0E0072 U+0E002E U+0E0074 U+0E0074 U+0E0066 U+0E007F U+E017 U+E009)
>
> This is what I am going to do:
>
> 1) I parse the tags at the beginning of the string and save the relevant
> information in a temporary variable which we will call PuaInterpretation;
>
> 2) I remove the tags.
>
> Now, my PuaInterpretation variable contains the following information:
>
> Foobar.ttf
>
> And my string contains the following text:
>
> (U+E017 U+E009)
>
> Now, what's the next step? What am I supposed to do to find out whether,
> according to the PUA interpretation called "Foobar.ttf", U+E017 and U+E009
> are letters or not?
>
Hmmm, the UTF-8 non-BMP string apparently got munged.

Anyway, the next step is for your function to load the file
"Foobar.puapropertiesclass". This file is a plain-text file following
the same format as UNIDATA. It's extensible -- if the font vendor doesn't
include it with the font download, then the savvy end-user can simply
construct it with a plain-text editor.

Now your function has all the necessary information and can determine
whether the PUA code points are letters or not.

Best regards,

James Kass
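
For illustration only, the steps discussed above might be sketched in Python roughly as follows. Everything here is an assumption rather than part of any agreed proposal: the function names are made up, the tag payload is taken to be the Plane-14 tag characters that map onto printable ASCII, the properties file is taken to sit in a known directory and share the font's base name, and "same format as UNIDATA" is read as semicolon-separated fields with the code point in field 0 and the General Category in field 2, as in UnicodeData.txt.

```python
from pathlib import Path

# Plane-14 tag block. Treating a leading run of these as a "PUA tag"
# is an assumption made for this sketch.
TAG_FIRST, TAG_LAST = 0xE0000, 0xE007F


def split_pua_tag(text):
    """Return (interpretation, rest).

    interpretation: leading tag characters mapped back to printable ASCII.
    rest: the text with the leading run of tag characters removed.
    """
    i = 0
    name = []
    while i < len(text) and TAG_FIRST <= ord(text[i]) <= TAG_LAST:
        low = ord(text[i]) - TAG_FIRST
        if 0x20 <= low <= 0x7E:  # skip introducer/terminator style tags
            name.append(chr(low))
        i += 1
    return "".join(name), text[i:]


def properties_path(interpretation, directory="."):
    """Map e.g. 'Foobar.ttf' to '<directory>/Foobar.puapropertiesclass'."""
    return Path(directory) / (Path(interpretation).stem + ".puapropertiesclass")


def load_pua_properties(path):
    """Parse a UnicodeData-style file into {code point: general category}."""
    categories = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        # Tolerate '#' comments and blank lines in hand-edited files.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) < 3:
            continue  # not a usable record
        categories[int(fields[0], 16)] = fields[2].strip()
    return categories


def is_letter(code_point, categories):
    """True if the file assigns a letter category (Lu, Ll, Lt, Lm, Lo)."""
    # Code points the file does not mention keep the default category Co.
    return categories.get(code_point, "Co").startswith("L")
```

With these helpers, a caller would split the tagged string, pass the resulting interpretation to `properties_path`, load that file once, and then call `is_letter` for each remaining PUA code point. Code points the file does not list fall back to Co (private use), so they are reported as non-letters.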

