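You will have to normalize the strings before processing them, and you need to make sure it is done the same way every time. Pick one Unicode normalization form and apply it everywhere strings are stored or compared. For example, NFC composes an accented letter into a single code point, while NFD decomposes it into a base letter plus a combining mark. Check out ICU (International Components for Unicode) for this purpose.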
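Here is a minimal sketch of what "normalize the same way every time" means in practice. It uses Python's standard `unicodedata` module rather than ICU, although ICU's normalizer offers the same normalization forms for C/C++ and Java. The helper names `canonical_key` and `same_text` are hypothetical, and NFC is only an assumed choice of form. What matters is that one form is applied consistently.

```python
import unicodedata

# Hypothetical helpers for illustration; NFC is an assumed choice of form.
CANONICAL_FORM = "NFC"

def canonical_key(s: str) -> str:
    """Return s in the one normalization form used everywhere."""
    return unicodedata.normalize(CANONICAL_FORM, s)

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both the same way."""
    return canonical_key(a) == canonical_key(b)

# The same visible word encoded two ways:
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # U+0065 plus U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)              # False: different code points
print(same_text(precomposed, decomposed))     # True: canonically equivalent
print(len(precomposed), len(decomposed))      # 4 5
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

Normalize both when storing and when looking up. If you normalize only one side, the mismatch comes back.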
http://oss.software.ibm.com/icu/

Dave

--- "Theodore H. Smith" <[EMAIL PROTECTED]> wrote:
> What is going to be done about the confusion generated from
> having multiple ways to encode the same character?
>
> For example, for filenames, OSX will encode an accented Roman
> letter one way, while for filenames Windows will encode it the
> other way. These kinds of confusions are totally expected if
> Unicode allows more than one way to encode the same
> character.
>
> This means that matching algorithms won't work, because the
> characters are different!
>
> Will there be some kind of recommendation of which to avoid?
> Will the Unicode Consortium make a standard to say that one of
> these encodings is strongly not recommended, and in fact
> deprecated?
>
> And what about the OS that uses this encoding? How will the
> Unicode Consortium make the newly-offending OS change its ways?
>
> And what about the hordes of apps that expect one format but
> don't expect the other? And the hordes of OS-independent apps
> (Java? Perl?) that might generate conflicting versions?

=====
Dave Possin
Globalization Consultant
www.Welocalize.com
http://groups.yahoo.com/group/locales/

