In a message dated 2001-09-17 16:24:05 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
It doesn't reopen that specific type of security hole, because irregular UTF-8
sequences (as defined by Unicode 3.1) can only decode to characters above
0xFFFF, and those characters are unlikely to be
Doug,
It is true that the *specific* irregular UTF-8 sequences introduced (and
required) by CESU-8 decode to characters above 0xFFFF when interpreted as
CESU-8, and to pairs of surrogate code points when (incorrectly) interpreted
as UTF-8. Since definition D29, arguably my least favorite
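
The two readings described above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the helper name cesu8_encode and the sample code point U+10400 are assumptions, and Python's "surrogatepass" error handler stands in for a lenient decoder that accepts three-byte surrogate forms.

```python
def cesu8_encode(code_point: int) -> bytes:
    """Encode one supplementary code point (above 0xFFFF) the CESU-8 way.

    Hypothetical helper for illustration: the code point is split into a
    UTF-16 surrogate pair and each surrogate is written as three bytes.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    # "surrogatepass" lets Python emit the three-byte form of a surrogate.
    return b"".join(chr(s).encode("utf-8", "surrogatepass") for s in (high, low))


cesu8_bytes = cesu8_encode(0x10400)          # six bytes
utf8_bytes = chr(0x10400).encode("utf-8")    # four bytes

# A strict UTF-8 decoder rejects the six-byte form outright.
try:
    cesu8_bytes.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# A lenient decoder yields a pair of surrogate code points instead.
lenient = [hex(ord(c)) for c in cesu8_bytes.decode("utf-8", "surrogatepass")]
```

The design point is that neither reading of the six-byte form can produce a character at or below 0xFFFF: it is either one supplementary character or two surrogate code points.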
-BEGIN PGP SIGNED MESSAGE-
[EMAIL PROTECTED] wrote:
In a message dated 2001-09-17 16:24:05 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
It doesn't reopen that specific type of security hole, because irregular
UTF-8 sequences (as defined by Unicode 3.1) can only decode to
Doug Ewell wrote:
All Unicode code points of the form U+FE and U+FF are
special, in
that they are non-characters and can be treated in a special way by
applications (e.g. as sentinels).
I think this should be All Unicode code points of the form U+xxFFFE and
U+xxFFFF are special
-Original Message-
From: Bernard Miller [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 17, 2001 5:19 PM
To: [EMAIL PROTECTED]
Subject: 6 questions
Hello,
These are the questions I wanted to ask:
1. Why does Unicode say that there are 63486 code values available to
eTranslate would like to invite everyone to a free online seminar
Automating the Web Localization Process led by Christoph Mosing,
eTranslate Vice-President.
When: Wednesday September 26, 2001
9:00 AM Pacific/12:00 PM Eastern
and again at 12:00 PM Pacific/3:00 PM Eastern
Event
Bernard Miller asked:
1. Why does Unicode say that there are 63486 code
values available to represent characters with single
16 bit values and 2048 available to represent an
additional 1,048,544 characters as surrogates? 65536 -
2048 = 63488 (difference of 2) --I guess it's due to
the 2
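
The figures in the question can be checked with a short calculation. The sketch below assumes the usual explanation, which the truncated preview does not spell out: the difference of 2 is the noncharacters U+FFFE and U+FFFF, and the supplementary total excludes the two noncharacters at the end of each of planes 1 to 16. All variable names are illustrative.

```python
# Assumption: the excluded values are the noncharacters U+nFFFE and U+nFFFF.
BMP_CODE_VALUES = 0x10000                    # 65536 sixteen-bit values
SURROGATE_VALUES = 0xDFFF - 0xD800 + 1       # 2048 values reserved for pairs
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1        # 1024
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1         # 1024

non_surrogate = BMP_CODE_VALUES - SURROGATE_VALUES      # 63488
single_value_chars = non_surrogate - 2                  # minus U+FFFE, U+FFFF

pair_combinations = HIGH_SURROGATES * LOW_SURROGATES    # 1,048,576
supplementary_planes = 16
surrogate_pair_chars = pair_combinations - 2 * supplementary_planes

print(single_value_chars, surrogate_pair_chars)
```

Under those assumptions both published numbers, 63486 and 1,048,544, fall out of the same rule.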
On Tue, 18 Sep 2001, Magda Danish (Unicode) wrote:
-Original Message-
From: Bernard Miller [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 17, 2001 5:19 PM
To: [EMAIL PROTECTED]
Subject: 6 questions
Hello,
These are the questions I wanted to
ask:
1. [snip]
6.
I happened across these links:
http://acharya.iitm.ac.in/multi_sys/exist_codes.html
http://acharya.iitm.ac.in/multi_sys/uni_iscii.html
which do contain a nice discussion about ISCII but then they
discuss Unicode in, ummm, somewhat negative terms.
Myself knowing next to nothing about Indic
Bernard,
Many of your questions have been answered by others but I want to add a few
comments.
1. Why does Unicode say that there are 63486 code
values available to represent characters with single
16 bit values and 2048 available to represent an
additional 1,048,544 characters as
This is the same problem that was discussed extensively for Tamil at TI2001
in Kuala Lumpur last month. Basically, it boils down to three problems:
1) Most of the people involved do not understand Unicode or how it works.
2) Most of the people involved expect natural language processing to be a
Carl Brown said:
U+FDD0 to U+FDEF are also noncharacters that represent a range that can be
used by font rendering engines as an internal working set.
An urban legend seems to have sprung up here.
So once again, I assure everyone that there are no alligators
living in the sewers resulting
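
For reference, the noncharacter code points mentioned in this thread can be tested with a small predicate. This is a sketch with an assumed function name, is_noncharacter; it covers the range U+FDD0 to U+FDEF plus the last two code points of every plane, and it takes no position on how applications use them.

```python
def is_noncharacter(code_point: int) -> bool:
    """Return True for Unicode noncharacters (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The last two code points of each plane: U+nFFFE and U+nFFFF.
    return (code_point & 0xFFFE) == 0xFFFE


noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))
```

Counting over the whole code space gives 32 code points in the U+FDD0 block plus 2 in each of 17 planes.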
Jarkko reported:
I happened across these links:
http://acharya.iitm.ac.in/multi_sys/exist_codes.html
http://acharya.iitm.ac.in/multi_sys/uni_iscii.html
which do contain a nice discussion about ISCII but then they
discuss Unicode in, ummm, somewhat negative terms.
Myself knowing next
At 12:26 PM 9/18/01 -0700, Kenneth Whistler wrote:
3. Why don't noBreak formatted Unicode characters
have a canonical decomposition (the compatibility
decomposition surrounded by glue)?
A long story. But the short answer is that such a decomposition
would cause problems for
Ken,
Even those who do not know the details of Indic processing know that you
cannot argue both sides of the issue. There was a lot of criticism of the fact
that there were differences in scripts yet there was no mention that Unicode
because of its extended code base does support
Carl,
Carl W. Brown wrote:
What was really missing was the point that Unicode is designed to support
multi-lingual text.
So is ISCII. In fact, Unicode support for Indic scripts is based on ISCII.
If we use ISCII, how can we support a page that
contains different Indic scripts?
ISCII has
Ram,
ISCII has escape sequences which announce the start of a new Indic script.
An ATR char followed by special codepoint forms the escape sequence.
It is possible to support a page that contains different Indic scripts.
There are problems with the standard; for example, it assumes a default
Because of the recent tragic events and the resulting disruption we are
sending you a reminder that this is the final week for submissions for
the Twentieth International Unicode Conference (IUC20).
Last Call for Papers!
Twentieth International Unicode Conference (IUC20)
In a message dated 2001-09-18 9:22:17 Pacific Daylight Time,
[EMAIL PROTECTED] writes:
Doug Ewell wrote:
All Unicode code points of the form U+FE and U+FF are
special, in
that they are non-characters and can be treated in a special way by
applications (e.g. as sentinels).
I
David Hopwood and Carl Brown graciously corrected me:
I don't agree that irregular UTF-8 sequences in general can only decode to
characters above 0xFFFF.
That's why I specifically referred to irregular sequences as defined by
Unicode 3.1 (i.e. UAX #27).
I stand corrected. That's what I