date:20100712

Re: Bengali Script

2010-07-12 Thread John Burger

Am I missing something? Isn't the name of the =language= irrelevant? Unicode encodes =scripts=, not languages, yes? There's no English block in the standard, after all. The script we're talking about in this thread is used to write lots of languages - Wikipedia lists Assamese, Meitei,
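John's point can be checked directly: character names and code points identify a script, never a language. The following sketch uses Python's standard unicodedata module (our choice; nothing in the thread uses it) on the string অসম, the Assamese name for Assam. Every character comes back with a BENGALI name from the Bengali block (U+0980 to U+09FF), even though the text is Assamese.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# "অসম" (Assam, written in Assamese), given as escapes for clarity
for code_point, name in describe("\u0985\u09B8\u09AE"):
    print(code_point, name)
# U+0985 BENGALI LETTER A
# U+09B8 BENGALI LETTER SA
# U+09AE BENGALI LETTER MA
```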

Status of Unihan

2010-07-12 Thread Martin Heijdra

When will Unihan be back? It has been down for quite a while now, and there are librarians for whom checking this is part of their workflow… Martin

Re: Status of Unihan

2010-07-12 Thread Jeroen Ruigrok van der Werven

Martin, -On [20100712 16:52], Martin Heijdra (mheij...@princeton.edu) wrote: When will Unihan be back? It has been down for quite a while now, and there are librarians for whom checking this is part of their workflow… Can I offer http://www.cojak.org/ and http://www.jisho.org/kanji/radicals

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Mark Davis ☕

A few comments. A tailoring that sorts word-by-word would certainly be possible, and is certainly allowed by the UCA. As to whether it is necessary or not, that is another matter. Sorting is about matching user expectations, and of all of the French that I have ever asked, none except for

Re: Status of Unihan

2010-07-12 Thread John H. Jenkins

We hope to have it back in the next few days. On Jul 12, 2010, at 8:34 AM, Martin Heijdra wrote: When will Unihan be back? It has been down for quite a while now, and there are librarians for whom checking this is part of their workflow… Martin = Siôn ap-Rhisiart John H. Jenkins

Re: Bengali Script

2010-07-12 Thread Eric Muller

On 7/8/2010 5:09 PM, Tulasi wrote: Ok I am correcting - Bangladeshi to Bengali. The Government of West Bengal / Society for Natural Language Technology Research (a member of the Consortium) has a very strong preference for the term Bengla rather than Bengali. Eric.

Re: Bengali Script

2010-07-12 Thread Michael Everson

On 12 Jul 2010, at 20:32, Eric Muller wrote: The Government of West Bengal / Society for Natural Language Technology Research (a member of the Consortium) has a very strong preference for the term Bengla rather than Bengali. As a speaker of English I have a very strong preference for the

Re: charset parameter in Google Groups

2010-07-12 Thread Philippe Verdy

The problem in this message is probably not in the specified charset (windows-1252) but in the way the MIME type is specified just before it: TEXT/PLAIN. Traditionally, the MIME types are only given in lowercase, so if you had written text/plain; charset=windows-1252, it would have been correctly

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler

Philippe Verdy said: If we don't limit the backwards reordering, then all accents in the full sentences will be reordered, so this is the final word that will drive the order. Not only is this incorrect, I understand that you think that the ordering should be done word-by-word, with the
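To make the disputed behaviour concrete, here is a deliberately simplified two-level sketch in Python. It is not the UCA and not any implementation discussed in the thread: the accent weights in ACCENT_WEIGHT are hypothetical, there is no tertiary (case) level, and per_word=True only models the space-delimited word-by-word variant being argued about, not a tailoring anyone has specified. With backwards=True the secondary (accent) weights are compared from the end of the string, which reproduces the familiar French order cote < côte < coté < côté; forward comparison gives cote < coté < côte < côté.

```python
import unicodedata

# Hypothetical secondary weights for a few combining marks (illustration only,
# NOT the DUCET values). Unaccented letters get weight 0.
ACCENT_WEIGHT = {"\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4, "\u0327": 5}


def _split(word):
    """Decompose to NFD; return the base letters and one accent weight per letter."""
    bases, weights = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the mark to the preceding letter; several marks are simply
            # summed (a simplification). A leading mark with no base is ignored.
            if weights:
                weights[-1] += ACCENT_WEIGHT.get(ch, 9)
        else:
            bases.append(ch)
            weights.append(0)
    return "".join(bases), weights


def sort_key(text, backwards=False, per_word=False):
    """Two-level key: (base letters, accent weights).

    backwards=True compares accent weights from the end (French style);
    per_word=True applies that reversal inside each space-separated word only.
    """
    words = text.split(" ") if per_word else [text]
    primary, secondary = [], []
    for word in words:
        base, weights = _split(word)
        primary.append(base)
        secondary.extend(reversed(weights) if backwards else weights)
    return " ".join(primary), tuple(secondary)


words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=sort_key))                               # ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda s: sort_key(s, backwards=True)))  # ['cote', 'côte', 'coté', 'côté']
```

Comparing "côte cote" with "cote côte" under backwards=True shows the concern raised here: reversing the whole string lets the accent in the last word decide ("côte cote" sorts first), while per_word=True lets the first word decide ("cote côte" sorts first).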

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Philippe Verdy

Mark Davis ☕ m...@macchiato.com A few comments. A tailoring that sorts word-by-word would certainly be possible, and is certainly allowed by the UCA. As to whether it is necessary or not, that is another matter. Sorting is about matching user expectations, and of all of the French that I

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Philippe Verdy

Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into the UCA algorithm itself is a non-starter. I have definitely not asked

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler

Philippe Verdy wrote: Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into the UCA algorithm itself is a non-starter.

Re: Bengali Script

2010-07-12 Thread Tulasi

In this thread I am looking to explore a specific/distinctive answer to: Of the two, which standard has more letters/symbols? So I needed 2 lists of letters/symbols including cascaded conjuncts, one GOB-standard and the other WBG-standard. Coming up with a list of names was not the theme/target,

Re: charset parameter in Google Groups

2010-07-12 Thread Mark Crispin

On Mon, 12 Jul 2010, Philippe Verdy wrote: Traditionally, the MIME types are only given in lowercase, so if you had written text/plain; charset=windows-1252, it would have been correctly detected. Nonsense. Pure, unadulterated nonsense. I helped write the MIME RFCs, and I can assure you that
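Mark's correction matches RFC 2045, which states that the MIME type, subtype and parameter names are not case sensitive; RFC 2046 adds that charset values are not case sensitive either. As a hedged illustration (Python's standard email package, not anything Google Groups is known to use), a conforming parser normalizes both spellings of the header to the same result:

```python
from email.message import Message
from typing import Optional, Tuple


def content_type_and_charset(header_value: str) -> Tuple[str, Optional[str]]:
    """Parse a Content-Type value; the email package lowercases both results."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


for value in ("text/plain; charset=windows-1252", "TEXT/PLAIN; CHARSET=Windows-1252"):
    print(content_type_and_charset(value))
# both lines print ('text/plain', 'windows-1252')
```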

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread verdy_p

Kenneth Whistler To: verd...@wanadoo.fr Cc: unicode@unicode.org, k...@sybase.com Subject: Re: UTS#10 (collation) : French backwards level 2, and word-breakers. Philippe Verdy said: A basic word-breaker using only the space separator would marvelously improve the speed of French

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Philippe Verdy

From: Kenneth Whistler k...@sybase.com Philippe Verdy wrote: Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into
