I can confirm that \string does convert a character token outside the BMP into two tokens giving its UTF-16 (surrogate pair) representation.
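I haven't reproduced the attachment inline, but a minimal sketch of the kind of test involved is below (hypothetical; the \test definition and the six-caret input notation are my reconstruction from the output, so the actual nonbmp2.tex may differ):

  % hypothetical sketch of the kind of test in nonbmp2.tex
  \def\test#1#2{\message{\number`#1,\number`#2}}% report the codes of two tokens
  \expandafter\test\string Z!            % BMP character: expect 90,33 everywhere
  \expandafter\test\string ^^^^^^010001! % non-BMP U+10001: 65537,33 in luatex,
                                         % 55296,56321 (a surrogate pair) in xetex
  \bye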
With the attached file, luatex produces

  90,33
  34,33
  233,33
  233,33
  65530,33
  65537,33
  65537,33

which is in each case the Unicode value of the character followed by that of ! (33). xetex produces

  90,33
  34,33
  233,33
  233,33
  65530,33
  55296,56321
  55296,56321

where the last two lines show that \string has generated U+D800 U+DC01, which does correspond to the UTF-16 encoding of U+10001, confirming that \string on a character token has produced two tokens that have been picked up separately as #1 and #2 of the \test macro.

If I am reading it right, the UTF-16 comes from here:

procedure print_char(@!s:integer); {prints a single character}
label exit;
var l: small_number;
begin
if (selector>pseudo) and (not doing_special) then
  {``printing'' to a new string, encode as UTF-16 rather than UTF-8}
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  begin
  if s>=@"10000 then
    begin
    print_visible_char(@"D800 + (s - @"10000) div @"400);
    print_visible_char(@"DC00 + (s - @"10000) mod @"400);
    end
  else print_visible_char(s);
  return;
  end;

so a fix could be to not do that and instead just call print_visible_char(s); but perhaps some other context requires the UTF-16 form, in which case the selector perhaps needs another state to allow a code path that encodes as neither UTF-8 nor UTF-16 but just generates the internal UTF-32 representation?
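As a sanity check on that arithmetic, here is a minimal plain TeX sketch (my own, not from the xetex sources) that reproduces the surrogate pair for U+10001 with count registers; \divide truncates toward zero, which matches WEB's div on these non-negative values:

  % check the surrogate-pair arithmetic from print_char for s = "10001
  \newcount\s \newcount\hi \newcount\lo
  \s="10001
  \advance\s by -"10000          % offset into the supplementary planes
  \hi=\s \divide\hi by "400      % (s - "10000) div "400
  \lo=\hi \multiply\lo by -"400
  \advance\lo by \s              % (s - "10000) mod "400
  \advance\hi by "D800           % high surrogate
  \advance\lo by "DC00           % low surrogate
  \message{\number\hi,\number\lo}% expect 55296,56321 as in the xetex output
  \bye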
David

[Attachment: nonbmp2.tex, a TeX document]