Re: Bengali Script

2010-07-12 Thread John Burger
Am I missing something? Isn't the name of the =language= irrelevant? Unicode encodes =scripts=, not languages, yes? There's no English block in the standard, after all. The script we're talking about in this thread is used to write lots of languages - Wikipedia lists Assamese, Meitei,

Status of Unihan

2010-07-12 Thread Martin Heijdra
When will Unihan be back? It has been down for quite a while now, and there are librarians for whom checking this is part of their workflow… Martin

Re: Status of Unihan

2010-07-12 Thread Jeroen Ruigrok van der Werven
Martin, -On [20100712 16:52], Martin Heijdra (mheij...@princeton.edu) wrote: When will Unihan be back? It has been down for quite a while now, and there are librarians for whom checking this is part of their workflow… Can I offer http://www.cojak.org/ and http://www.jisho.org/kanji/radicals

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Mark Davis ☕
A few comments. A tailoring that sorts word-by-word would certainly be possible, and is certainly allowed by the UCA. As to whether it is necessary or not, that is another matter. Sorting is about matching user expectations, and of all of the French that I have ever asked, none except for

Re: Status of Unihan

2010-07-12 Thread John H. Jenkins
We hope to have it back in the next few days. On Jul 12, 2010, at 8:34 AM, Martin Heijdra wrote: When will Unihan be back? It has been down for quite a while now, and there are librarians for whom checking this is part of their workflow… Martin = Siôn ap-Rhisiart John H. Jenkins

Re: Bengali Script

2010-07-12 Thread Eric Muller
On 7/8/2010 5:09 PM, Tulasi wrote: Ok I am correcting - Bangladeshi to Bengali. The Government of West Bengal / Society for Natural Language Technology Research (a member of the Consortium) has a very strong preference for the term Bengla rather than Bengali. Eric.

Re: Bengali Script

2010-07-12 Thread Michael Everson
On 12 Jul 2010, at 20:32, Eric Muller wrote: The Government of West Bengal / Society for Natural Language Technology Research (a member of the Consortium) has a very strong preference for the term Bengla rather than Bengali. As a speaker of English I have a very strong preference for the

Re: charset parameter in Google Groups

2010-07-12 Thread Philippe Verdy
The problem in this message is probably not in the specified charset (windows-1252) but in the way the MIME type is specified just before it: TEXT/PLAIN. Traditionally, the MIME types are only given in lowercase, so if you had written text/plain; charset=windows-1252, it would have been correctly

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy said: If we don't limit the backwards reordering, then all accents in the full sentences will be reordered, so this is the final word that will drive the order. not only this is incorrect, I understand that you think that the ordering should be done word-by-word, with the
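The effect under discussion can be illustrated with a small sketch. This is not the UCA and uses no real collation weights: it assumes a toy two-level key in which the primary level is the lowercased base letter and the secondary level is the code points of the combining marks attached to it. The function name `collation_key` and the `backwards_secondary` flag are hypothetical names chosen for the illustration.

```python
import unicodedata

def collation_key(text, backwards_secondary=False):
    """Toy two-level sort key: base letters first, then accents.

    With backwards_secondary=True the accent level is compared from the
    end of the string, which mimics the French "backwards level 2" rule
    applied to the whole string.
    """
    primary = []    # one entry per base character
    secondary = []  # tuple of combining-mark code points per base character
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and secondary:
            # Attach the combining mark to the preceding base character.
            secondary[-1] += (ord(ch),)
        else:
            primary.append(ch.lower())
            secondary.append(())
    if backwards_secondary:
        secondary.reverse()
    return (tuple(primary), tuple(secondary))

words = ["côté", "cote", "coté", "côte"]
forward = sorted(words, key=collation_key)
backward = sorted(words, key=lambda w: collation_key(w, backwards_secondary=True))
print(forward)   # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']

# Applied to a whole phrase, the accent nearest the end decides first,
# so the last word drives the order.
phrases = ["côte sale", "cote salé"]
print(sorted(phrases, key=lambda p: collation_key(p, backwards_secondary=True)))
# ['côte sale', 'cote salé']
```

A word-by-word variant would build such a key per word instead of per string; how the per-word keys would then be combined is a design question this sketch does not address.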

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Philippe Verdy
Mark Davis ☕ m...@macchiato.com A few comments. A tailoring that sorts word-by-word would certainly be possible, and is certainly allowed by the UCA. As to whether it is necessary or not, that is another matter. Sorting is about matching user expectations, and of all of the French that I

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Philippe Verdy
Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into the UCA algorithm itself is a non-starter. I have definitely not asked

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy wrote: Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into the UCA algorithm itself is a non-starter.

Re: Bengali Script

2010-07-12 Thread Tulasi
In this thread I am looking to explore a specific/distinctive answer to: Among both, which standard has more letters/symbols? So I needed 2 lists of letters/symbols including cascaded conjuncts, one GOB-standard and the other WBG-standard. Coming up with a list of names was not the theme/target,

Re: charset parameter in Google Groups

2010-07-12 Thread Mark Crispin
On Mon, 12 Jul 2010, Philippe Verdy wrote: Traditionally, the MIME types are only given in lowercase, so if you had written text/plain; charset=windows-1252, it would have been correctly detected. Nonsense. Pure, unadulterated nonsense. I helped write the MIME RFCs, and I can assure you that
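For reference, RFC 2045 specifies that media type and subtype names are matched case-insensitively, so TEXT/PLAIN and text/plain denote the same type. A minimal sketch using Python's standard email package shows a conforming parser treating both spellings identically; the helper name `parse_content_type` is hypothetical, and this says nothing about how Google Groups itself parses the header.

```python
from email import message_from_string

def parse_content_type(header_value):
    """Return (media type, charset) parsed from a Content-Type header value."""
    # Build a minimal message carrying only the header under test.
    msg = message_from_string(f"Content-Type: {header_value}\n\nbody\n")
    # get_content_type() and get_content_charset() both return
    # lowercased values, whatever case the sender used.
    return msg.get_content_type(), msg.get_content_charset()

print(parse_content_type("TEXT/PLAIN; charset=windows-1252"))
print(parse_content_type("text/plain; charset=windows-1252"))
# Both calls print ('text/plain', 'windows-1252')
```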

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread verdy_p
Kenneth Whistler To: verd...@wanadoo.fr Cc: unicode@unicode.org, k...@sybase.com Subject: Re: UTS#10 (collation) : French backwards level 2, and word-breakers. Philippe Verdy said: A basic word-breaker using only the space separator would marvelously improve the speed of French

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Philippe Verdy
From: Kenneth Whistler k...@sybase.com Philippe Verdy wrote: Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into