On 12 Aug 01, at 18:42, Prymmer/Kahn wrote:
> On Sun, 12 Aug 2001, Jarkko Hietaniemi wrote:
>
> > Summary: I suggest deprecating E<nn> where nn < 256 since it is not portable.
>
> I think that strategy might be too drastic. Why deprecate? Why not
> simply warn about the unportability but still allow the flexibility
> afforded by numeric character specification?
What's flexible about something where you have no idea what will come
out? Put another way, it's exactly as "flexible" as putting in raw
bytes without specifying the character set -- it'll look OK if the
receiver uses the same coded character set as the sender, but otherwise
possibly not.
> Specifying numeric codepoints may prove to be a popular thing given
> the rather sorry state of input methods among common text editors.
If non-ASCII characters are replaced by E<234> automatically (say, by a
script run over the finished text), then they could (nearly) as easily
be replaced by E<ecirc> references; and if they are entered by hand,
then IMO the mnemonic E<ecirc> is easier to remember than E<234>. (I
doubt that text editors will produce such things themselves on a
keypress of ê due to the "rather sorry state of input methods" you
mentioned.) I'm not sure what's gained by allowing E<234> if you don't
also mandate "this means code point 234 in the character set X" --
regardless of whether "X" eq "Unicode" or "EBCDIC" or "Latin-9" or
whatever.
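To illustrate the portability problem (a sketch in Python rather than Pod, since Python's standard codecs make the comparison easy to run): the single code point 234 names a different character depending on which coded character set is assumed, which is exactly why E<234> is ambiguous without a mandated "character set X".

```python
# Code point / byte 234 (the E<234> in question) resolves differently
# depending on the coded character set assumed by the reader.
byte = bytes([234])

latin1_char = byte.decode("latin-1")   # LATIN SMALL LETTER E WITH CIRCUMFLEX
ebcdic_char = byte.decode("cp500")     # EBCDIC code page 500: something else
unicode_char = chr(234)                # Unicode code point 234, same as Latin-1

print(latin1_char == unicode_char)     # Latin-1 and Unicode agree here...
print(latin1_char == ebcdic_char)      # ...but EBCDIC does not.
```

Note that Latin-1 and Unicode agree only because Latin-1 happens to occupy the first 256 Unicode code points; EBCDIC disagrees, which is the portability hazard.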
I think I agree with Jarkko that E<nnn> for nnn < 256 should either be
deprecated (yes, even for nnn < 127) or be specified as being in a
specific character set (for example, Unicode, for compatibility with
E<nnn> for nnn > 255).
I would also suggest that raw bytes be interpreted as UTF-8 in the
absence of other indications of encoding (such as UTF-16 BOMs); this
would automatically mean that text written in ASCII environments would
be interpreted correctly, since the byte representation for the subset
of Unicode corresponding to ASCII is identical between ASCII and UTF-8.
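The compatibility claim above is easy to check (again a Python sketch, purely illustrative): any byte sequence that is valid ASCII decodes to the same text under both ASCII and UTF-8, while an isolated high byte such as 0xEA is not valid UTF-8 at all, so it cannot be silently misread.

```python
# ASCII (bytes 0x00-0x7F) is a proper subset of UTF-8: such bytes
# decode identically under either encoding.
text = b"Plain ASCII text survives a UTF-8 interpretation unchanged."
assert text.decode("ascii") == text.decode("utf-8")

# A lone Latin-1 high byte, by contrast, is malformed UTF-8 (0xEA is
# a UTF-8 lead byte expecting two continuation bytes), so decoding
# fails loudly instead of producing the wrong character.
try:
    bytes([234]).decode("utf-8")
except UnicodeDecodeError:
    print("0xEA alone is not valid UTF-8")
```

So defaulting to UTF-8 is safe for existing ASCII documents and tends to reject, rather than misinterpret, stray single-byte Latin-1 data.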
Cheers,
Philip
--
Philip Newton <[EMAIL PROTECTED]>