I can confirm that \string does convert a character token outside the BMP into two tokens giving its UTF-16 (surrogate pair) representation.
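I haven't reproduced the attachment inline, but a minimal sketch of the kind of test involved is below (hypothetical; the \test definition and the six-caret input notation are my reconstruction from the output, so the actual nonbmp2.tex may differ):

  % hypothetical sketch of the kind of test in nonbmp2.tex
  \def\test#1#2{\message{\number`#1,\number`#2}}% report the codes of two tokens
  \expandafter\test\string Z!            % BMP character: expect 90,33 everywhere
  \expandafter\test\string ^^^^^^010001! % non-BMP U+10001: 65537,33 in luatex,
                                         % 55296,56321 (a surrogate pair) in xetex
  \bye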
With the attached file, luatex produces

  90,33
  34,33
  233,33
  233,33
  65530,33
  65537,33
  65537,33

which is in each case the Unicode value of the character followed by that of ! (33). xetex produces

  90,33
  34,33
  233,33
  233,33
  65530,33
  55296,56321
  55296,56321

where the last two lines show that \string has generated U+D800 U+DC01, which does correspond to the UTF-16 encoding of U+10001, confirming that \string on a character token has produced two tokens that have been picked up separately as #1 and #2 of the \test macro.

If I am reading it right, the UTF-16 comes from here:

procedure print_char(@!s:integer); {prints a single character}
label exit;
var l: small_number;
begin
if (selector>pseudo) and (not doing_special) then
  {``printing'' to a new string, encode as UTF-16 rather than UTF-8}
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  begin
  if s>=@"10000 then
    begin
    print_visible_char(@"D800 + (s - @"10000) div @"400);
    print_visible_char(@"DC00 + (s - @"10000) mod @"400);
    end
  else print_visible_char(s);
  return;
  end;

so a fix could be to not do that and instead just call print_visible_char(s); but perhaps some other context requires the UTF-16 form, in which case the selector perhaps needs another state to allow a code path that encodes as neither UTF-8 nor UTF-16 but just generates the internal UTF-32 representation?
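As a sanity check on that arithmetic, here is a minimal plain TeX sketch (my own, not from the xetex sources) that reproduces the surrogate pair for U+10001 with count registers; \divide truncates toward zero, which matches WEB's div on these non-negative values:

  % check the surrogate-pair arithmetic from print_char for s = "10001
  \newcount\s \newcount\hi \newcount\lo
  \s="10001
  \advance\s by -"10000          % offset into the supplementary planes
  \hi=\s \divide\hi by "400      % (s - "10000) div "400
  \lo=\hi \multiply\lo by -"400
  \advance\lo by \s              % (s - "10000) mod "400
  \advance\hi by "D800           % high surrogate
  \advance\lo by "DC00           % low surrogate
  \message{\number\hi,\number\lo}% expect 55296,56321 as in the xetex output
  \bye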
David

[Attachment: nonbmp2.tex, a TeX document]