Re: CGJ , RLM

2004-11-24 Thread Doug Ewell
"kefas" wrote: > 1. U+034F CGJ, Combining Grapheme Joiner, is > displayed as a tall rectangle in MSKLCexe-test and as > a capital square in OutlookExpress AÍE aÍeÍaÍe. But > CGJ "has no visible glyph"! Thus CGJ is not > implemented correctly in Arial Unicode MS. Or are the > editors not imp

Shift-JIS conversion.

2004-11-24 Thread pragati
Hello, Can anyone please tell me how to convert from UTF-8 to Shift-JIS? Please let me know if there is any formula to do it other than using the ready-made functions provided by Perl, because these functions do not provide a mapping for all characters. Warm Regards, Pragati Desai. Cybag
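The following is only an illustrative sketch of the conversion asked about above, not a reply from the list. Shift-JIS is a table-driven encoding, so there is no arithmetic formula from UTF-8; the usual route is to decode UTF-8 to Unicode code points and then look each one up in a mapping table. The sketch uses Python's built-in `shift_jis` codec for that table. The function name `utf8_to_shift_jis` is hypothetical, and whether this codec's coverage is sufficient for the poster's data is an assumption.

```python
def utf8_to_shift_jis(data: bytes, errors: str = "strict") -> bytes:
    """Decode UTF-8 bytes to text, then encode that text as Shift-JIS."""
    text = data.decode("utf-8")
    return text.encode("shift_jis", errors=errors)


# Round trip for characters that Shift-JIS can represent.
utf8_bytes = "日本語".encode("utf-8")
sjis_bytes = utf8_to_shift_jis(utf8_bytes)

# DEVANAGARI LETTER A (U+0905) has no Shift-JIS mapping. Under "strict"
# this raises UnicodeEncodeError; "replace" substitutes "?" instead.
lossy = utf8_to_shift_jis("\u0905".encode("utf-8"), errors="replace")
```

Characters outside the target repertoire cannot be converted by any method; the caller has to decide whether to fail, substitute, or escape them.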

New Public Review Issue

2004-11-24 Thread Rick McGowan
The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review period for the new item closes on January 31, 2005. Please see the page for links to discussion and relevant documents. Brief

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Peter Kirk
On 24/11/2004 22:23, Peter Kirk wrote: On 24/11/2004 22:00, Asmus Freytag wrote: ... The sequence SPACE NBSP *does* not allow a break after the SPACE under the line breaking rules we publish in UAX#14. The common usage in HTML is to use one or more NBSP followed by SPACE to mark a wider space,

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Asmus Freytag
At 04:53 PM 11/24/2004, Peter Kirk wrote: On 24/11/2004 22:23, Peter Kirk wrote: On 24/11/2004 22:00, Asmus Freytag wrote: ... The sequence SPACE NBSP *does* not allow a break after the SPACE under the line breaking rules we publish in UAX#14. I tried to change does not into *does* and missed dele

Public Review Issues updated

2004-11-24 Thread Rick McGowan
There have been a number of updates to Public Review Issues on the Unicode web site. The comment periods for Public Review Issues 51, 53, 54, and 56 have been extended to January 31, 2005. During the review period, new drafts may be issued, and if so, they will be announced at the time.

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Peter Kirk
On 24/11/2004 22:00, Asmus Freytag wrote: ... The sequence SPACE NBSP *does* not allow a break after the SPACE under the line breaking rules we publish in UAX#14. The common usage in HTML is to use one or more NBSP followed by SPACE to mark a wider space, that allows a break at the end. NBSPs a

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Peter Kirk
On 24/11/2004 20:22, Jony Rosenne wrote: Ketiv and Qere, where two different words are written together, are not plain text and are thus out of scope for Unicode. For Unicode, one could either choose one version or the other or write them both separately. The forms I refer to are the ones print

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Dean Snyder
Jony Rosenne wrote at 10:22 PM on Wednesday, November 24, 2004: >Ketiv and Qere, where two different words are written together, are not plain >text and are thus out of scope for Unicode. Actually, it's the vowels of one word written with the consonants of another (or just written by themselves w

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread John Hudson
Jony Rosenne wrote: This isn't what I said. I said it isn't a Unicode problem because it isn't plain text. And I don't understand how you are making this distinction between writing two words separately being plain text and combining them being not plain text. In what way is it not plain text? W

Re: My Querry

2004-11-24 Thread Asmus Freytag
At 04:23 PM 11/23/2004, Chris Jacobs wrote: Now, this implies that UTF-8 does interpret U+0000 as an ASCII NULL control char. This is incompatible with using it as a string terminator. Except that it's up to you how to interpret the C0 control codes in Unicode. You can do it according to ISO 6429
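As a small illustration of the point under discussion (a sketch, not part of the original message): the code point U+0000 encodes in UTF-8 as the single byte 0x00, so code that treats 0x00 as a terminator sees only the part of the string before it. The variable names below are hypothetical.

```python
text = "ab\u0000cd"
encoded = text.encode("utf-8")

# U+0000 becomes exactly one zero byte; nothing else in UTF-8 produces 0x00.
nul_index = encoded.index(b"\x00")

# What a NUL-terminated (C-style) reader would consider "the string".
c_style_view = encoded[:nul_index]
```

A length-prefixed or explicitly sized buffer keeps all five bytes, which is one way to carry U+0000 as data rather than as a terminator.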

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Asmus Freytag
At 04:36 AM 11/24/2004, Peter Kirk wrote: I understand that the proposed INVISIBLE CHARACTER was rejected at the recent UTC meeting. I presume that the intention is that NBSP should be used instead. At the moment, NBSP is the only sanctioned base character without 'ink'. There are cases of words

RE: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Jony Rosenne
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of John Hudson > Sent: Wednesday, November 24, 2004 11:01 PM > To: 'Unicode List' > Subject: Re: No Invisible Character - NBSP at the start of a word > > > Jony Rosenne wrote: > > > Ketiv and Qere, we

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread John Hudson
Jony Rosenne wrote: Ketiv and Qere, where two different words are written together, are not plain text and are thus out of scope for Unicode. Writing them in a combined way results in some sequences of characters that are very problematic from a rendering perspective, but there is a long standing

RE: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Jony Rosenne
Ketiv and Qere, where two different words are written together, are not plain text and are thus out of scope for Unicode. For Unicode, one could either choose one version or the other or write them both separately. Jony > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PRO

RE: Question on Canonical equivilance

2004-11-24 Thread Kenneth Whistler
Tim Greenwood asked: > > All of the spacing combining marks (general category Mc) except > > musical symbols have a canonical combining class of 0. So, for example > > > > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left > > of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on th

RE: Question on Canonical equivilance

2004-11-24 Thread Peter Constable
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Of Tim Greenwood > All of the spacing combining marks (general category Mc) except > musical symbols have a canonical combining class of 0. So, for example > > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left >

No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Peter Kirk
I understand that the proposed INVISIBLE CHARACTER was rejected at the recent UTC meeting. I presume that the intention is that NBSP should be used instead. There are cases of words which start with spacing combining marks, for which there are no separate Unicode characters. For example, there

Question on Canonical equivilance

2004-11-24 Thread Tim Greenwood
All of the spacing combining marks (general category Mc) except musical symbols have a canonical combining class of 0. So, for example 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on the right) is canonically distinct from 0B95
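The property claims in the question above can be checked with a short sketch using Python's standard `unicodedata` module. It looks up the general category and canonical combining class of the two Tamil vowel signs named in the message, then compares the quoted order against a reordered sequence. The second ordering is a hypothetical one chosen for illustration, since the message is truncated before its own second sequence; results also depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

KA = "\u0b95"       # TAMIL LETTER KA
SIGN_EE = "\u0bc7"  # TAMIL VOWEL SIGN EE
SIGN_AA = "\u0bbe"  # TAMIL VOWEL SIGN AA

# General category and canonical combining class of each vowel sign.
properties = {
    unicodedata.name(ch): (unicodedata.category(ch), unicodedata.combining(ch))
    for ch in (SIGN_EE, SIGN_AA)
}

# Marks with combining class 0 are never reordered by normalization,
# so the two orders below stay different after NFD.
order_a = KA + SIGN_EE + SIGN_AA
order_b = KA + SIGN_AA + SIGN_EE  # hypothetical reordering
same_after_nfd = (
    unicodedata.normalize("NFD", order_a) == unicodedata.normalize("NFD", order_b)
)
```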

CGJ , RLM

2004-11-24 Thread kefas
1. U+034F CGJ, Combining Grapheme Joiner, is displayed as a tall rectangle in MSKLCexe-test and as a capital square in OutlookExpress AÍE aÍeÍaÍe. But CGJ "has no visible glyph"! Thus CGJ is not implemented correctly in Arial Unicode MS. Or are the editors not implemented correctly? Should A+
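A brief sketch, using Python's standard `unicodedata` module, of what the character database records for U+034F. It also shows one plausible explanation, offered here as an assumption, for the "Í" that appears in the sample text as archived: the two UTF-8 bytes of CGJ read as Latin-1 yield "Í" followed by a control character.

```python
import unicodedata

CGJ = "\u034f"

name = unicodedata.name(CGJ)
category = unicodedata.category(CGJ)        # nonspacing mark
combining_class = unicodedata.combining(CGJ)

# CGJ has no decomposition, so normalization leaves the sequence intact.
sample = "A" + CGJ + "E"
unchanged_by_nfc = unicodedata.normalize("NFC", sample) == sample

# UTF-8 bytes of CGJ misread as Latin-1: 0xCD is "Í", 0x8F is a C1 control.
utf8_bytes = CGJ.encode("utf-8")
misread = utf8_bytes.decode("latin-1")
```

Whether a given font or editor draws a visible box for the character is a rendering matter that these properties do not settle.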

Re: Another Querry

2004-11-24 Thread Antoine Leca
On Wednesday, November 24th, 2004 04:02Z Harshal Trivedi wrote: > How can I determine end of UCS-2/UCS-4 string while encoding it in C > program? It depends how you are storing and more importantly managing it. If you consider it as mere arrays of uint16_t/uint32_t, with your own function
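To make the array-of-code-units view concrete, here is a minimal sketch in Python rather than C. It assumes one particular convention, a little-endian UCS-2 buffer terminated by a 16-bit zero code unit; the function name `ucs2_length` is hypothetical. The point it illustrates is that the terminator must be tested per code unit, because ordinary characters already contain zero bytes.

```python
import struct


def ucs2_length(buffer: bytes) -> int:
    """Count 16-bit code units before the first 0x0000 unit (little-endian)."""
    count = 0
    for (unit,) in struct.iter_unpack("<H", buffer):
        if unit == 0:
            break
        count += 1
    return count


# "AB" as little-endian 16-bit units, followed by a 16-bit zero terminator.
raw = "AB".encode("utf-16-le") + b"\x00\x00"

units = ucs2_length(raw)

# A byte-wise scan (what strlen does in C) stops inside the first character.
first_zero_byte = raw.index(b"\x00")
```

The same idea carries over to UCS-4 with 32-bit units (format `"<I"`), under the same assumption about byte order and terminator.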