RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
I can't really believe that this would be a problem, but if they're integrated alphabets from different locales, will there be issues with sorting (if we're not planning to use the locale)? Are there instances where like characters were combined that will affect the sort orders?

RE: Unicode sorting...

2001-06-08 Thread NeonEdge
Another example is that Chinese has no definite sorting order, period. The commonly used schemes are phonetic-based or stroke-based, and many characters have more than one pronunciation (context-sensitive) and more than one form (simplified and traditional). So if we have mixed content
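
The point about competing orderings can be sketched in a few lines of Python. This is only an illustration, not anything proposed in the thread: the STROKES table and both function names are hypothetical, and the table covers just six characters. It shows that sorting by raw code point and sorting by stroke count give different results for the same input.

```python
# Hypothetical lookup table (an assumption for this sketch): stroke
# counts for six characters. A real application would need a full
# dictionary, and a phonetic sort would need pronunciation data too.
STROKES = {"一": 1, "二": 2, "三": 3, "十": 2, "人": 2, "大": 3}


def sort_by_codepoint(chars):
    """Sort by Unicode code point, which is what a plain sort does."""
    return sorted(chars)


def sort_by_strokes(chars):
    """Sort by stroke count, breaking ties by code point."""
    return sorted(chars, key=lambda c: (STROKES[c], c))


words = ["大", "三", "十", "二", "人", "一"]
print(sort_by_codepoint(words))
print(sort_by_strokes(words))
```

Neither result is "the" correct order; which one a reader expects depends on the convention they are used to.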

RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will the regex engine make the distinction? This syntax

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
If this is the case, how would a regex like ^[a-zA-Z] work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will the regex engine make the distinction? This

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
I can't really believe that this would be a problem, but if they're integrated alphabets from different locales, will there be issues with sorting (if we're not planning to use the locale)? Are there instances where like characters were combined that will affect the sort orders? Yes, it is

RE: Unicode sorting...

2001-06-08 Thread Dan Sugalski
At 11:29 AM 6/8/2001 -0700, Hong Zhang wrote: If this is the case, how would a regex like ^[a-zA-Z] work (or other, more sensitive characters)? If just about anything can come between A and Z, and letters that might be there in a particular locale aren't in another locale, then how will

Re: Unicode sorting...

2001-06-08 Thread Jarkko Hietaniemi
The A-Z syntax is really a shorthand for "all the uppercase letters" (originally, at least). I won't argue the problems with sorting various sets of characters in various locales, but for regexes at least it's not an issue, because the point isn't sorting or ordering, it's identifying
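
The distinction between ordering and identifying can be sketched in Python, used here only for illustration; the thread itself is about Perl, and the two helper names below are hypothetical. In Python's re module a range such as [A-Z] is evaluated by code point and does not consult the locale, so it matches exactly the 26 ASCII capitals, while a property test such as str.isupper() identifies uppercase letters wherever they sit in the code space.

```python
import re

# A range class is a span of code points (U+0041..U+005A here),
# not a span in some locale's collation order.
ASCII_UPPER = re.compile(r"^[A-Z]")


def starts_with_ascii_upper(s):
    """True if the first character is one of the ASCII letters A-Z."""
    return bool(ASCII_UPPER.match(s))


def starts_with_any_upper(s):
    """True if the first character is uppercase per Unicode properties."""
    return bool(s) and s[0].isupper()


for word in ["Zebra", "\u00c4rger", "zebra"]:  # \u00c4 is A with diaeresis
    print(word, starts_with_ascii_upper(word), starts_with_any_upper(word))
```

With these definitions the word beginning with U+00C4 fails the range test but passes the property test, which is the gap the "shorthand" reading of A-Z leaves open.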