Re: Unicode collation algorithm - Khmer/Cambodian

2001-02-10 Thread Mark Davis
I have not been following this discussion until now. Typically, the issue with syllables is like that with word sorting: no matter what is in the second word, any difference in the first word swamps it. Example: "ab xyz ghi" vs. "abc def ghi". In many cases, UCA does handle syllabic
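
To make the word-sorting point concrete, here is a minimal sketch in Python. It is not the UCA: it compares plain code points, and the helper names `word_sort_key` and `ignore_space_key` are hypothetical, chosen only for this illustration. It shows that comparing word by word gives a different order for the two example strings than comparing them with the spaces ignored.

```python
# Illustration only: plain code point comparison stands in for real collation.
# Both helper names are hypothetical and not part of any library.

def word_sort_key(text):
    """Compare word by word: the first word decides before the second is seen."""
    return tuple(text.split())

def ignore_space_key(text):
    """Compare as one run of letters, with spaces ignored entirely."""
    return text.replace(" ", "")

samples = ["abc def ghi", "ab xyz ghi"]

# Word by word, "ab" < "abc", so "ab xyz ghi" comes first.
by_word = sorted(samples, key=word_sort_key)

# With spaces ignored, "abx..." > "abc...", so the order flips.
by_letters = sorted(samples, key=ignore_space_key)

print(by_word)
print(by_letters)
```

With a word-by-word key, the difference between "ab" and "abc" settles the comparison before "xyz" and "def" are ever looked at.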

Fw: Unicode collation algorithm - interpretation

2001-02-10 Thread J M Sykes
Because I have not received a copy of the following via the Unicode List, I have assumed the sender (who is probably well known to at least some as editor of the SQL standard) may not currently be a member of the list. Since he clearly intended this message to go to the list, and because it is

Re: extracting words

2001-02-10 Thread Edward Cherlin
At 1:03 AM -0800 1/29/01, Brahim Mouhdi wrote: Hello all, I'm writing a C program called Blacklist. Its purpose is to accept a (Unicode) string and extract words from it, then hash the found words according to a hashing algorithm and see if each word is in the blacklist hash table. This is
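
The pipeline described in the question (extract words, hash them, look them up) can be sketched as follows. This is a rough Python illustration, not the poster's C program: the names `extract_words`, `find_blacklisted`, and the sample blacklist are assumptions made up for the example. It also assumes that words are separated by spaces or punctuation, which does not hold for every script.

```python
import re

# Hypothetical blacklist; a Python set is a hash table under the hood.
BLACKLIST = {"spam", "scam"}

def extract_words(text):
    """Return runs of Unicode word characters (letters, digits, underscore).

    Assumption: words are delimited by non-word characters. Scripts written
    without spaces between words need dictionary-based segmentation instead.
    """
    return re.findall(r"\w+", text)

def find_blacklisted(text, blacklist=BLACKLIST):
    """Return the extracted words whose case-folded form is in the blacklist."""
    return [word for word in extract_words(text) if word.casefold() in blacklist]

print(extract_words("Hello, wörld! This is SPAM."))
print(find_blacklisted("Hello, wörld! This is SPAM."))
```

Case folding before the lookup is a design choice of this sketch; whether matching should be case-insensitive is not stated in the original message.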

RE: extracting words

2001-02-10 Thread Makarand Gadre
As Edward said, getting words from a string is nontrivial. You get similar issues in Thai. Thai does not have any space between words, but the script is Indic-based (phonetic). You have to continuously look up the speller, and even then it can't be correct in all cases. E.g. Sunday or
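
As a rough illustration of dictionary lookup for text written without spaces, the sketch below uses greedy longest-match segmentation in Python. This is one simple technique swapped in for illustration, not necessarily what any Thai word breaker does. The function name `segment` and the toy English dictionary are assumptions; English letters stand in for an unspaced script.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation of unspaced text.

    At each position, take the longest dictionary entry that matches.
    If nothing matches, emit a single character and move on.
    """
    max_len = max(len(entry) for entry in dictionary)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No dictionary entry starts here: fall back to one character.
            words.append(text[i])
            i += 1
    return words

# Toy dictionary (hypothetical); note it contains "ice", "cream" and "icecream".
toy_dictionary = {"ice", "cream", "icecream", "is", "cold"}

print(segment("icecreamiscold", toy_dictionary))
```

The greedy choice always prefers the longest entry, so it cannot tell when the shorter reading was intended; this is the kind of case where a dictionary alone can't be correct every time.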