On Mon, 2004-08-09 at 14:14, Dan Sugalski wrote:
> Additionally if we have source text which is
> Latin-n, EBCDIC, ASCII, or whatever we must be
> able to convert it with no loss to Unicode.
> (Which I believe is now doable with Unicode 4.0)
> Losslessly converting Unicode to
> ASCII/EBCDIC/whatever is *not* required, which is
> fine as it's theoretically (and often
> practically) impossible.
Can I suggest instead:
If we have source text in a non-Unicode character set, we must be
able to convert it to Unicode with minimal loss (minimal being
defined as zero for all character sets that are subsets of
Unicode).
Converting Unicode to non-Unicode character sets will be
lossless where possible. Where a character has no equivalent in
the target character set, the conversion will attempt to encode
the name of the character, spelled in ASCII characters, into the
target character set.
An example would be the conversion of the UTF-8 string (in Perl
5 notation):
"foo \x{263a} bar"
to the ASCII representation:
"foo {SMILING FACE, WHITE} bar"
There are 4 possible failure modes, each resulting in a
conversion exception:
  1) the ASCII name is not available
  2) the ASCII name cannot be converted into the target character
     set (recursive name-lookups are not allowed, nor would they
     be very useful)
  3) a VM parameter requesting exceptions on failed character-set
     conversions has been set to a true value
  4) the source is a PMC and that PMC has a property indicating
     that exceptions should be generated on failed conversions.
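To make the proposal concrete, here is a rough sketch of the behaviour in Python (not Parrot code). Everything in it is an assumption for illustration: `convert`, `ConversionError`, and the `strict` flag are hypothetical names, with `strict` standing in for both the VM parameter and the per-PMC property. It also assumes a stateless target encoding such as ASCII or Latin-1, since it encodes one character at a time. Note that Python's `unicodedata` reports the formal Unicode name "WHITE SMILING FACE" rather than the index-style "SMILING FACE, WHITE" used in the example above.

```python
import unicodedata


class ConversionError(Exception):
    """Hypothetical exception for a failed character-set conversion."""


def convert(text, target="ascii", strict=False):
    """Encode text into target, replacing unrepresentable characters
    with their Unicode name in braces.

    strict=True models failure modes 3 and 4: the caller asked for an
    exception instead of a name substitution.
    """
    out = []
    for ch in text:
        try:
            # Lossless path: the character exists in the target set.
            out.append(ch.encode(target))
            continue
        except UnicodeEncodeError:
            pass

        if strict:
            raise ConversionError(
                f"U+{ord(ch):04X} is not representable in {target}")

        # Failure mode 1: the character has no name to fall back on.
        name = unicodedata.name(ch, None)
        if name is None:
            raise ConversionError(f"no name available for U+{ord(ch):04X}")

        # Failure mode 2: the name itself cannot be encoded. No
        # recursive lookup is attempted.
        try:
            out.append(("{" + name + "}").encode(target))
        except UnicodeEncodeError as exc:
            raise ConversionError(
                f"name of U+{ord(ch):04X} is not representable in {target}"
            ) from exc
    return b"".join(out)
```

With this sketch, `convert("foo \u263a bar")` yields the ASCII bytes for `foo {WHITE SMILING FACE} bar`, while the same call with `strict=True` raises `ConversionError`.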
This just seems a bit more useful in the general case to me, while
allowing the language implementation the option of requesting an
exception either globally or per-PMC.
Thoughts?
--
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs