Toby,

I believe that PeopleSoft does not have a unique problem. A "just say no to UTF-8s" attitude does not solve your problem or the problem of other companies in your situation.

The problem that I see with the UTF-8s proposal is that it needs different support than UTF-8. What do UTF-8 support services add? UTF-8 support is the same as multi-byte support. Most of the same code can be used with one change: UTF-8 calculates character lengths differently from other multi-byte implementations. All other changes, such as case-insensitive compares, are based on being able to detect the proper character boundaries. Functions such as strcpy will still work with both UTF-8 and UTF-8s, but you don't need a multi-byte implementation of strcpy to work with multi-byte characters either, because you are working at the string level without regard to the string content.

I think that most people will initially think that they can use the UTF-8 services with UTF-8s. They think that it can be treated as a UTF-8 that sorts like UTF-16. That is not true. Once they start using surrogates they are no longer dealing with character units but with fractional character encoding units. They are back to dealing with the same problems that they had using SBCS functions on MBCS data.

This may be enough for you. It is difficult to say whether the kind of processing that you intend to perform on the data is the same as handling DBCS data with SBCS functions. If so, then UTF-8s might be for you. If not, I think that it might be helpful to work on a real solution before you and Oracle get in over your heads. I think that it would be far more productive for this forum to work on answers rather than just criticize.

I am still puzzled. I don't understand what is so special about UTF-8s. Why can't you use UTF-16? If you are going to break a standard, it seems to me that an encoding scheme in which you cannot determine the length of a character from its first byte has a serious flaw.
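
As a rough illustration of the first-byte point, here is a minimal Python sketch. It assumes that UTF-8s encodes a supplementary character as two three-byte sequences, one per UTF-16 surrogate, as the proposal describes. The function names (utf8_len_from_lead, utf8s_encode, utf8s_char_len) are hypothetical and exist only for this example; this is not anyone's actual support library.

```python
def utf8_len_from_lead(b: int) -> int:
    """Length of a standard UTF-8 sequence, decided by the lead byte alone."""
    if b < 0x80:
        return 1
    if 0xC2 <= b <= 0xDF:
        return 2
    if 0xE0 <= b <= 0xEF:
        return 3
    if 0xF0 <= b <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % b)


def utf8s_encode(s: str) -> bytes:
    """Illustrative UTF-8s encoder (assumption: each UTF-16 code unit,
    including each surrogate, gets its own UTF-8-style sequence)."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp >= 0x10000:
            v = cp - 0x10000
            units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
        else:
            units = [cp]
        for u in units:
            if u < 0x80:
                out.append(u)
            elif u < 0x800:
                out += bytes([0xC0 | (u >> 6), 0x80 | (u & 0x3F)])
            else:
                out += bytes([0xE0 | (u >> 12),
                              0x80 | ((u >> 6) & 0x3F),
                              0x80 | (u & 0x3F)])
    return bytes(out)


def utf8s_char_len(buf: bytes, i: int) -> int:
    """Character length in UTF-8s: the lead byte is not enough.
    Lead byte 0xED may start U+D000..U+D7FF (3 bytes) or a surrogate
    pair (6 bytes); only the second byte tells them apart."""
    if buf[i] == 0xED and i + 1 < len(buf) and 0xA0 <= buf[i + 1] <= 0xAF:
        return 6
    return utf8_len_from_lead(buf[i])


supp = "\U00010000"                      # first supplementary character
print(supp.encode("utf-8").hex())        # f0908080
print(utf8s_encode(supp).hex())          # eda080edb080
```

In standard UTF-8 the lead byte 0xF0 announces all four bytes. In the UTF-8s form the lead byte 0xED looks exactly like the start of an ordinary three-byte character such as U+D7FF, so a boundary routine written for UTF-8 would stop in the middle of the character.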
You might be better off violating the Unicode standard and relocating the characters after the surrogates to a plane 17. This would not be Unicode, but it would work with the UTF-8 support functions and would be easy to convert to and from Unicode. It also might be easier to compare UTF-16 in code point order. There are lots of alternatives, but from what I know now it looks to me like UTF-8s is the worst choice. Worse still, it is a deceptive choice that looks like an easy way out but will probably cause more grief later. It looks like it has already put Oracle in a difficult position.

If it turns out that UTF-8s is the best solution, then we should spell out guidelines for the user community on how to implement and use UTF-8s.

Carl
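
For readers who want to see what "compare UTF-16 in code point order" could look like, the following is a minimal Python sketch of one well-known approach: adjust each 16-bit code unit before comparing so that surrogates sort after U+E000..U+FFFF. The helper names (utf16_units, fixup, code_point_key) are hypothetical, and this is only one possible way to do it, not a method proposed in the message above.

```python
def utf16_units(s: str) -> list:
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


def fixup(u: int) -> int:
    """Rotate the top of the 16-bit range so that surrogates
    (D800..DFFF) sort above the BMP characters E000..FFFF."""
    if u >= 0xE000:
        return u - 0x0800        # E000..FFFF -> D800..F7FF
    if u >= 0xD800:
        return u + 0x2000        # D800..DFFF -> F800..FFFF
    return u


def code_point_key(s: str) -> list:
    """Sort key that orders UTF-16 data by code point."""
    return [fixup(u) for u in utf16_units(s)]


bmp = "\uff5e"          # BMP character above the surrogate range
supp = "\U00010000"     # supplementary character (a surrogate pair in UTF-16)

# Raw UTF-16 code unit order puts the supplementary character first.
print(sorted([bmp, supp], key=utf16_units) == [supp, bmp])                  # True
# With the fix-up, the order matches code point order ...
print(sorted([supp, bmp], key=code_point_key) == [bmp, supp])               # True
# ... which is also the binary order of standard UTF-8.
print(sorted([supp, bmp], key=lambda s: s.encode("utf-8")) == [bmp, supp])  # True
```

The adjustment touches only code units at or above 0xD800, so ordinary text pays almost nothing for it, and the stored data remains standard UTF-16.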

