Re: PDUTR #26 posted

2001-09-18 Thread DougEwell2
In a message dated 2001-09-17 16:24:05 Pacific Daylight Time, [EMAIL PROTECTED] writes: It doesn't reopen that specific type of security hole, because irregular UTF-8 sequences (as defined by Unicode 3.1) can only decode to characters above 0x, and those characters are unlikely to be

RE: PDUTR #26 posted

2001-09-18 Thread Carl W. Brown
Doug, It is true that the *specific* irregular UTF-8 sequences introduced (and required) by CESU-8 decode to characters above 0x when interpreted as CESU-8, and to pairs of surrogate code points when (incorrectly) interpreted as UTF-8. Since definition D29, arguably my least favorite

Re: PDUTR #26 posted

2001-09-18 Thread David Hopwood
[EMAIL PROTECTED] wrote: In a message dated 2001-09-17 16:24:05 Pacific Daylight Time, [EMAIL PROTECTED] writes: It doesn't reopen that specific type of security hole, because irregular UTF-8 sequences (as defined by Unicode 3.1) can only decode to

RE: PDUTR #26 posted

2001-09-18 Thread Marco Cimarosti
Doug Ewell wrote: All Unicode code points of the form U+FE and U+FF are special, in that they are non-characters and can be treated in a special way by applications (e.g. as sentinels). I think this should be All Unicode code points of the form U+xxFFFE and U+xx are special

FW: 6 questions

2001-09-18 Thread Magda Danish (Unicode)
-----Original Message----- From: Bernard Miller [mailto:[EMAIL PROTECTED]] Sent: Monday, September 17, 2001 5:19 PM To: [EMAIL PROTECTED] Subject: 6 questions Hello, These are the questions I wanted to ask: 1. Why does Unicode say that there are 63486 code values available to

Education on Automating the Web Localization Process

2001-09-18 Thread Burke, Jon
eTranslate would like to invite everyone to a free online seminar Automating the Web Localization Process led by Christoph Mosing, eTranslate Vice-President. When: Wednesday September 26, 2001 9:00 AM Pacific/12:00 PM Eastern and again at 12:00 PM Pacific/3:00 PM Eastern Event

Re: FW: 6 questions

2001-09-18 Thread Kenneth Whistler
Bernard Miller asked: 1. Why does Unicode say that there are 63486 code values available to represent characters with single 16 bit values and 2048 available to represent an additional 1,048,544 characters as surrogates? 65536 - 2048 = 63488 (difference of 2) --I guess it's due to the 2

Re: FW: 6 questions

2001-09-18 Thread James E. Agenbroad
On Tue, 18 Sep 2001, Magda Danish (Unicode) wrote: -----Original Message----- From: Bernard Miller [mailto:[EMAIL PROTECTED]] Sent: Monday, September 17, 2001 5:19 PM To: [EMAIL PROTECTED] Subject: 6 questions Hello, These are the questions I wanted to ask: 1. [snip] 6.

discontent about Indic scripts and Unicode

2001-09-18 Thread Hietaniemi Jarkko (NRC/Boston)
I happened across these links: http://acharya.iitm.ac.in/multi_sys/exist_codes.html http://acharya.iitm.ac.in/multi_sys/uni_iscii.html which do contain a nice discussion about ISCII but then they discuss Unicode in, ummm, somewhat negative terms. Myself knowing next to nothing about Indic

RE: 6 questions

2001-09-18 Thread Carl W. Brown
Bernard, Many of your questions have been answered by others but I want to add a few comments. 1. Why does Unicode say that there are 63486 code values available to represent characters with single 16 bit values and 2048 available to represent an additional 1,048,544 characters as

Re: discontent about Indic scripts and Unicode

2001-09-18 Thread Michael (michka) Kaplan
This is the same problem that was discussed extensively for Tamil at TI2001 in Kuala Lumpur last month. Basically, it boils down to three problems: 1) Most of the people involved do not understand Unicode or how it works. 2) Most of the people involved expect natural language processing to be a

No alligators (was RE: 6 questions)

2001-09-18 Thread Kenneth Whistler
Carl Brown said: U+FDD0 to U+FDEF are also noncharacters that represent a range that can be used by font rendering engines as an internal working set. An urban legend seems to have sprung up here. So once again, I assure everyone that there are no alligators living in the sewers resulting

Re: discontent about Indic scripts and Unicode

2001-09-18 Thread Kenneth Whistler
Jarkko reported: I happened across these links: http://acharya.iitm.ac.in/multi_sys/exist_codes.html http://acharya.iitm.ac.in/multi_sys/uni_iscii.html which do contain a nice discussion about ISCII but then they discuss Unicode in, ummm, somewhat negative terms. Myself knowing next

Re: FW: 6 questions

2001-09-18 Thread Asmus Freytag
At 12:26 PM 9/18/01 -0700, Kenneth Whistler wrote: 3. Why don't noBreak formatted Unicode characters have a canonical decomposition (the compatibility decomposition surrounded by glue)? A long story. But the short answer is that such a decomposition would cause problems for

RE: discontent about Indic scripts and Unicode

2001-09-18 Thread Carl W. Brown
Ken, Even those who do not know the details of Indic processing know that you can not argue both sides of the issue. There was a lot of criticism of the fact that there were differences in scripts yet there was no mention that Unicode because of its extended code base does support

Re: discontent about Indic scripts and Unicode

2001-09-18 Thread Ram Viswanadha
Carl, Carl W. Brown wrote: What was really missing was the point that Unicode is designed to support multi-lingual text. So is ISCII. In fact Unicode support for Indic scripts is based on ISCII. If we use ISCII how can we support a page that contains different Indic scripts? ISCII has

RE: discontent about Indic scripts and Unicode

2001-09-18 Thread Carl W. Brown
Ram, ISCII has escape sequences which announce the start of a new Indic script. An ATR char followed by a special codepoint forms the escape sequence. It is possible to support a page that contains different Indic scripts. There are problems with the standard like, it assumes a default

Last Call for Papers - 20th Unicode Conference - Jan/Feb 2001 - Washington DC

2001-09-18 Thread Misha . Wolf
Because of the recent tragic events and the resulting disruption we are sending you a reminder that this is the final week for submissions for the Twentieth International Unicode Conference (IUC20). Last Call for Papers! Twentieth International Unicode Conference (IUC20)

Re: PDUTR #26 posted

2001-09-18 Thread DougEwell2
In a message dated 2001-09-18 9:22:17 Pacific Daylight Time, [EMAIL PROTECTED] writes: Doug Ewell wrote: All Unicode code points of the form U+FE and U+FF are special, in that they are non-characters and can be treated in a special way by applications (e.g. as sentinels). I

Re: PDUTR #26 posted

2001-09-18 Thread DougEwell2
David Hopwood and Carl Brown graciously corrected me: I don't agree that irregular UTF-8 sequences in general can only decode to characters above 0x. That's why I specifically referred to irregular sequences as defined by Unicode 3.1 (i.e. UAX #27). I stand corrected. That's what I