Chip Salzenberg writes:
: So: The _string_encoding_ state of each OP must be one of these:
:
: 0. the default -- follow each string's current encoding
: 1. "use byte" -- all strings are one-byte
: 2. "use utf8" -- all strings are UTF-8 (*not* necessarily Unicode!)
There is no 2.
: And t
Chip Salzenberg writes:
: > Not really a superset anymore, unless you're into defining your own
: > characters outside of U+10.
:
: I don't understand... Could someone point me to a description of the
: current Unicode <-> ISO 10646 relationship?
Well, http://www.unicode.org/unicode/standard
Russ Allbery writes:
: FWIW, from the standards front, the next revision of the news standards
: will almost certainly be standardizing on UTF-8 as the character set for
: headers (headers being particularly tricky since while you can use MIME to
: specify a character set for the body, doing the s
Johan Vromans writes:
: Larry Wall writes:
:
: > If a subject has more than 50% high-bit characters in the subject,
: > it goes straight into my spam mailbox without trying any of the
: > other heuristics.
:
: I use 'more than 5 high-bit characters in
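A minimal sketch of the two subject-line heuristics quoted above, written in Perl. It assumes the subject is still raw, undecoded header bytes, so a "high-bit character" means a byte in the range 0x80..0xFF. The function names are hypothetical, and the default limit of 5 is taken from Johan's message as far as it survives.

```perl
use strict;
use warnings;

# Hypothetical helper: count bytes with the high bit set (0x80..0xFF).
# Assumes $subject is raw, undecoded header bytes, not a character string.
sub count_high_bit {
    my ($subject) = @_;
    my $high = () = $subject =~ /[\x80-\xFF]/g;
    return $high;
}

# Larry's rule as quoted: more than 50% high-bit characters means spam.
sub spam_by_ratio {
    my ($subject) = @_;
    return 0 unless length $subject;    # empty subject: nothing to judge
    return count_high_bit($subject) / length($subject) > 0.5 ? 1 : 0;
}

# Johan's variant: an absolute limit instead of a ratio (default 5).
sub spam_by_count {
    my ($subject, $limit) = @_;
    $limit //= 5;
    return count_high_bit($subject) > $limit ? 1 : 0;
}
```

The ratio form scales with subject length, while the count form flags even a long, mostly ASCII subject once it passes the limit.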
Bart Schuller writes:
: On Fri, Feb 04, 2000 at 09:21:04AM -0800, Tim Bray wrote:
: > It should be noted that over in Java-land, UTF-16 is more or less the
: > native dialect, and UTF-8 is a royal pain in the butt to deal with. Sigh.
:
: I was just reading up on the Java Native Interface and the
Tim Bray writes:
: BTW, should ord($c) return different values depending on whether or not
: I've said "use utf8;"?
The short answer is no.
The medium answer is that you'll have to say "use byte" if you want ord($c)
to return the first byte rather than the first character.
The long answer is
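To make the distinction concrete in released Perl: the pragma shipped in Perl 5.6 as "use bytes" (plural), and in current Perl "use utf8" only tells the parser that the program source is UTF-8, so it does not change what ord returns for an existing string. A minimal sketch, assuming an ASCII platform:

```perl
use strict;
use warnings;

# A character string holding U+263A WHITE SMILING FACE. Perl stores it
# internally as the UTF-8 bytes E2 98 BA.
my $c = "\x{263A}";

# By default ord() returns the first character's code point.
my $char_ord = ord($c);                      # 0x263A == 9786

# Under the (lexically scoped) bytes pragma ord() sees the internal
# encoding instead and returns the first byte.
my $byte_ord = do { use bytes; ord($c) };    # 0xE2 == 226

# prints "character: U+263A, first byte: 0xE2" on ASCII platforms
printf "character: U+%04X, first byte: 0x%02X\n", $char_ord, $byte_ord;
```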
Ilya Zakharevich writes:
: On Fri, Feb 04, 2000 at 12:12:25AM -0800, Chip Salzenberg wrote:
: > > > So: The _string_encoding_ state of each OP must be one of these:
: > > > 0. the default -- follow each string's current encoding
: > > > 1. "use byte" -- all strings are one-byte
: > > > 2. "
Gurusamy Sarathy writes:
: Treating literals as utf8 is a bit of a compatibility issue, but
: I think we should get around that by treating the lex input stream
: as any other discipline. IOW, default PL_rsfp to byte mode,
: and let users push a utf8/utf16/whatever discipline on it if they
: wann
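Sarathy's proposal maps onto what later shipped as PerlIO layers (formerly called disciplines, Perl 5.8): a handle delivers bytes unless a decoding layer is pushed onto it. The sketch below uses an ordinary in-memory filehandle as an analogy only; it does not touch the lexer's own input stream (PL_rsfp), whose source encoding current Perl selects with "use utf8" instead.

```perl
use strict;
use warnings;

# The raw bytes a file or socket would deliver: "caf" followed by the
# UTF-8 encoding of U+00E9 (C3 A9).
my $bytes = "caf\xC3\xA9";

# Opened with no layer, the handle is in byte mode: each octet is one
# character, so the line has length 5.
open(my $raw, '<', \$bytes) or die "open: $!";
my $as_bytes = <$raw>;
close $raw;

# Pushing a decoding layer onto a handle over the same data turns the
# two octets C3 A9 into the single character U+00E9: length 4.
open(my $dec, '<', \$bytes) or die "open: $!";
binmode($dec, ':encoding(UTF-8)') or die "binmode: $!";
my $as_chars = <$dec>;
close $dec;

# prints "byte mode: 5, with :encoding(UTF-8): 4"
printf "byte mode: %d, with :encoding(UTF-8): %d\n",
    length $as_bytes, length $as_chars;
```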
Tom Christiansen writes:
: >Well, I hope they enforce it. We're starting to get all sorts of
: >gobbledygook in the subjects of mail messages. I'd love it if mailers
: >rejected messages whose headers contain illegal UTF-8 sequences.
:
: That's not too hard to do. :-)
Technologically, yes. Bu
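For the header check Tom and Larry discuss, a minimal sketch using the core Encode module (bundled since Perl 5.8). It assumes the header value arrives as raw 8-bit octets (no MIME encoded-words), and is_valid_utf8 is a hypothetical helper name. Encode's "UTF-8" (with the hyphen) is its strict decoder, as opposed to Perl's laxer internal "utf8"; FB_CROAK makes decode die at the first malformed sequence, which eval turns into a false result.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: true if $octets is well-formed UTF-8.
# $octets is a private copy, so it does not matter that decode() with
# a CHECK flag consumes the processed input in place.
sub is_valid_utf8 {
    my ($octets) = @_;
    return eval { decode('UTF-8', $octets, FB_CROAK); 1 } ? 1 : 0;
}
```

A mailer could call it on each header value and reject the message when it returns false; whether to accept legacy 8-bit charsets as well is a separate policy decision.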