Peter Constable wrote:
UTF-8 sequences, as originally defined, could be longer than four bytes,
in order to address codepoints in the vast expanse of UCS-4 at
U+110000..U+FFFFFFFF. Since the accepted code space has been constrained
to U+0000..U+10FFFF, only four bytes are needed. There
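The difference Peter describes shows up directly in which lead bytes may start a sequence. Below is a minimal Python sketch. It assumes the bit patterns of the original (RFC 2279-era) definition and of the current RFC 3629 one. sequence_length() is a hypothetical helper, not part of any library.

```python
# Sequence length implied by a UTF-8 lead byte, under the original
# (up to 6 bytes) and the current RFC 3629 (up to 4 bytes) rules.
# sequence_length() is a hypothetical helper for illustration only.

# (sequence length, mask, expected lead-byte pattern after masking)
LEAD_PATTERNS = [
    (2, 0xE0, 0xC0),  # 110xxxxx
    (3, 0xF0, 0xE0),  # 1110xxxx
    (4, 0xF8, 0xF0),  # 11110xxx
    (5, 0xFC, 0xF8),  # 111110xx  (original definition only)
    (6, 0xFE, 0xFC),  # 1111110x  (original definition only)
]


def sequence_length(lead, original=False):
    """Return how many bytes a sequence starting with `lead` has, or None."""
    if lead < 0x80:
        return 1                     # 0xxxxxxx: ASCII
    if lead < 0xC0:
        return None                  # 10xxxxxx: continuation byte
    if not original and (lead in (0xC0, 0xC1) or lead >= 0xF5):
        return None                  # C0, C1 and F5..FF never occur in well-formed UTF-8 today
    for nbytes, mask, pattern in LEAD_PATTERNS:
        if (lead & mask) == pattern:
            return nbytes
    return None                      # FE, FF: never lead bytes


for b in (0x2F, 0xC0, 0xF4, 0xF8, 0xFD):
    print(f"{b:02X}: current={sequence_length(b)}, "
          f"original={sequence_length(b, original=True)}")
```

Under the current rules the 5- and 6-byte lead bytes F8..FD simply disappear. C0 and C1 are excluded too, because they could only ever start overlong forms.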
Kent,
It's time to nitpick the nitpicker. ;-)
1. UCS-4, which is still defined by 10646 (but never by Unicode),
is limited to U-7FFF FFFF
U-7FFF FFFF (~ U7FFFFFFF ~ 7FFFFFFF ~ -7FFFFFFF [!])
The space in U-7FFF FFFF is a Swedishism, not specified in
the standard. The U and the - are
Kenneth Whistler scripsit:
It was only with Unicode 3.0 (and the correlated 10646-1:2000)
that this was rationalized to the Unicode definition of
UTF-8 formally consisting of only 1- to 4-byte sequences, while
simultaneously the potential need for 5- and 6-byte sequences
in 10646 was removed,
John Cowan asked:
Tell us, O Keen-Eyed Peerer Into The Future: is there any hope that
the code space above 10FFFF will ever be removed from 10646, so that
the "Unicode's a subset of 10646" meme can be stomped once and for
all? I grow weary of explaining this pointless difference.
Anything is
- Original Message -
From: Chan Fook Sheng [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, May 07, 2004 10:03 AM
Subject: any unicode conversion tools?
Hi
I am looking for Unicode and UTF-8 conversion tools for the Windows platform,
but can't find any on the web.
Can anyone
From: Chan Fook Sheng [EMAIL PROTECTED]
I am looking for Unicode and UTF-8 conversion tools for the Windows platform,
but can't find any on the web.
Can anyone direct me to some links?
For example, the / character is 47 in decimal, 2F in hex.
It can be represented in UTF-8 format as:
1 byte:
Philippe Verdy scripsit:
A free converter tool exists in the Java SDK for Windows: look for
native2ascii.
Beware of trying to use this as a general converter: it's meant only
for Java code, or code from a closely related programming language.
In particular, it treats strings inside "" or ''
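To make John's caveat concrete, here is a rough Python imitation of the kind of transformation native2ascii performs. ASCII passes through, and every other character becomes a Java-style \uXXXX escape over UTF-16 code units. to_java_escapes() is a hypothetical name. This sketch does not reproduce the real tool's options, its exact output format (the lowercase hex here is an arbitrary choice), or any special handling of quoted strings. As John notes, output of this kind is designed for Java source rather than arbitrary text.

```python
# A rough imitation of a native2ascii-style conversion: ASCII is kept,
# everything else becomes a Java \uXXXX escape over UTF-16 code units.
# to_java_escapes() is hypothetical; the real tool's output may differ.

def to_java_escapes(text):
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")          # one UTF-16 code unit
        else:
            v = cp - 0x10000                    # supplementary plane:
            high = 0xD800 + (v >> 10)           # split into a surrogate pair
            low = 0xDC00 + (v & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


print(to_java_escapes("caf\u00e9 / \U0001F600"))
```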
On May 07, 2004, at 08:08, Philippe Verdy wrote:
From: Chan Fook Sheng [EMAIL PROTECTED]
I am looking for Unicode and UTF-8 conversion tools for the Windows
platform, but can't find any on the web.
Can anyone direct me to some links?
For example, the / character is 47 in decimal, 2F in hex.
It can
See also http://www.unicode.org/review/index.html#pri33
Rick
It can be represented in UTF-8 format as:
1 byte: still 2F
2 bytes: C0 AF (illegal)
3 bytes: E0 80 AF (illegal)
Thanks for noting that the last two are illegal in UTF-8. But
it would have been better never to list them (even if there still exist
some legacy
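Chan's two extra forms of / are overlong (non-shortest) encodings. The Python sketch below builds them with a hypothetical helper, overlong(). It then checks that Python's built-in UTF-8 codec, which follows the current rules, refuses to decode them.

```python
# Build overlong (non-shortest) UTF-8 forms of a code point and check that
# Python's UTF-8 codec, which follows the current rules, rejects them.
# overlong() is a hypothetical helper written for this illustration.

def overlong(cp, nbytes):
    """Encode cp in exactly nbytes (2..4) bytes, even if fewer would do."""
    lead_marker = {2: 0xC0, 3: 0xE0, 4: 0xF0}[nbytes]
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))   # continuation byte 10xxxxxx
        cp >>= 6
    if cp >> (7 - nbytes):                # remaining bits must fit the lead byte
        raise ValueError("code point too large for this length")
    return bytes([lead_marker | cp] + tail[::-1])


def decodes(seq):
    """True if Python's strict UTF-8 decoder accepts seq."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


slash = ord("/")                          # 47 decimal, 2F hex
for seq in (chr(slash).encode("utf-8"), overlong(slash, 2), overlong(slash, 3)):
    print(seq.hex(" "), "accepted" if decodes(seq) else "rejected")
```

Only the single byte 2F decodes. C0 AF and E0 80 AF raise UnicodeDecodeError. Accepting them would let the same character hide behind several byte patterns, which is why conforming decoders must reject them.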
UTF-8 encoded sequences can be up to 5 bytes long...
How is that possible? I was under the impression that a UTF-8 sequence
could never be more than 4 bytes (i.e. U+10FFFF becomes F4 8F BF BF).
Philippe chastised Chan for mentioning illegal sequences, but then went on
to make
Clark Cox wrote:
Note also that UTF-8 encoded sequences can be up to 5 bytes long...
How is that possible? I was under the impression that a UTF-8
sequence could never be more than 4 bytes (i.e. U+10FFFF becomes F4 8F
BF BF).
Unicode and ISO/IEC 10646 define UTF-8 differently; Unicode stops at
UTF-8 sequences, as originally defined, could be longer than four bytes,
in order to address codepoints in the vast expanse of UCS-4 at
U+110000..U+FFFFFFFF.
U+FFFFFFFF or U+7FFFFFFF? (not nit-picking, genuinely unsure).
Thanks to Jon Hanna for catching this: it was U+7FFFFFFF.
Peter
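For comparison with the 4-byte limit Clark asks about, here is a Python sketch of the original up-to-6-byte encoding pattern. It assumes the RFC 2279-era bit layout. encode_utf8_original() is a hypothetical helper, and modern codecs such as Python's refuse anything above U+10FFFF. For code points up to U+10FFFF outside the surrogate range, it should agree with Python's own encoder.

```python
# Encode a code point with the original (pre-RFC 3629) UTF-8 bit layout,
# which allowed up to 6 bytes and covered all of 31-bit UCS-4.
# encode_utf8_original() is a hypothetical helper for illustration only.

# (exclusive upper bound, lead-byte marker) for 2- to 6-byte forms
FORMS = [
    (0x800, 0xC0),
    (0x10000, 0xE0),
    (0x200000, 0xF0),
    (0x4000000, 0xF8),
    (0x80000000, 0xFC),
]


def encode_utf8_original(cp):
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError(f"{cp:#x} is outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])                # 1 byte: ASCII
    for nbytes, (bound, marker) in enumerate(FORMS, start=2):
        if cp < bound:
            break                         # shortest form that fits
    tail = []
    for _ in range(nbytes - 1):
        tail.append(0x80 | (cp & 0x3F))   # continuation bytes, 6 bits each
        cp >>= 6
    return bytes([marker | cp] + tail[::-1])


for cp in (0x2F, 0x10FFFF, 0x110000, 0x200000, 0x7FFFFFFF):
    print(f"U+{cp:04X}:", encode_utf8_original(cp).hex(" "))
```

U+10FFFF gives F4 8F BF BF. U+110000 still fits in four bytes (F4 90 80 80) because the 4-byte pattern carries 21 bits. Only U+200000 and above needed 5 or 6 bytes, ending at FD BF BF BF BF BF for U+7FFFFFFF.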