On Mon, 11 Nov 2002, John Cowan wrote:

> On *ix systems, use the "bc" command; type "obase=16" and "ibase=16".

  Thank you for this. I should have read the man page of bc more
carefully (or perhaps I used to know it but forgot...).

> For this program, you must use capital letters for the hex digits.
> To get the high surrogate, type "(xxxxx-10000)/400+DC00" for the high

  s/DC00/D800/

> surrogate ("xxxxx" is the scalar value); to get the low surrogate,
> type "(xxxxx-10000)%400+DC00".

And one can define a function....

> On the Macintosh, I have no clue.

  As you know so well, Mac OS X is a Unix, and 'bc' should be available
there, too.  If not by default, one can certainly grab the source and
compile it, or get a precompiled binary somewhere.

  It seems to me a waste of bandwidth (however abundant it may have
become recently; I have heard several times on this list that it is not
abundant in a certain country in Europe ;-) ) to go all the way across
the Atlantic or the continent to convert between UCVs and surrogate
pairs.  There are several ways to do it locally, including the two
suggested above. On *nix, including Mac OS X
(http://developer.apple.com/internet/macosx/perl.html), one can open a
small terminal window (yes, Mac OS X has a terminal window!) and run a
script like the following, assuming Perl is installed. (If a GUI is
desired, make one up in Perl/Tk, Tcl/Tk, pdksh, Python+Tk?...) This
should also work in a Windows command prompt. Alternatively, I guess a
local HTML file with ECMAScript should also work.

------------Cut--------here----------------
#!/usr/bin/perl -w
# use the full path of your perl binary in place of /usr/bin/perl
use strict;

$| = 1;                 # unbuffered STDOUT, so the prompt shows before we read

while ( 1 ) {
  print "** Enter Unicode code point in hexadecimal \n" .
        "  (to end, press [enter]) : ";
  my $ucs = <STDIN>;
  last unless defined $ucs;     # end of input (e.g. Ctrl-D)
  chomp $ucs;

  last if $ucs eq "";

  if ( $ucs =~ /[^a-f0-9A-F]/ ) {
    printf "  Error: %s is invalid. Try again\n", $ucs;
    next;
  }

  my $usv = hex $ucs;
  if ( 0xffff < $usv && $usv < 0x110000 ) {
    # supplementary plane: split the 20-bit offset into two 10-bit halves
    my $offset = $usv - 0x10000;
    printf "UTF-16: %04x %04x\n", int( $offset / 0x400 ) + 0xd800,
                                  $offset % 0x400 + 0xdc00;
  }
  elsif ( $usv < 0xd800 || ( 0xdfff < $usv && $usv < 0x10000 ) ) {
    # BMP code point outside the surrogate range: a single code unit
    printf "UTF-16: %04x\n", $usv;
  }
  else {
    # a surrogate code point (D800..DFFF) or a value beyond 10FFFF
    printf "Your input %s is not valid. Try again\n", $ucs;
  }
}

print "Bye !!\n";
--------------------Cut---------here--------------
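
  Going the other way (surrogate pair to code point) is the same
arithmetic run backwards. Here is a minimal sketch, assuming the two
surrogates are already available as numbers; the sub name
surrogates_to_usv is made up for illustration and is not part of the
script above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper (name invented for this sketch): combine a high and
# a low surrogate, given as numbers, into the scalar value they encode.
# Returns undef if the arguments are not a high/low surrogate pair.
sub surrogates_to_usv {
  my ( $hi, $lo ) = @_;
  return unless 0xd800 <= $hi && $hi <= 0xdbff;   # high: D800..DBFF
  return unless 0xdc00 <= $lo && $lo <= 0xdfff;   # low:  DC00..DFFF
  return ( $hi - 0xd800 ) * 0x400 + ( $lo - 0xdc00 ) + 0x10000;
}

printf "U+%04X\n", surrogates_to_usv( 0xd834, 0xdd1e );   # prints U+1D11E
```

  Feeding the result back into the script above should give the original
pair again, which makes a quick sanity check.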

  Jungshik

