As I see it, there are two reasons why you might need to transcode. First, to get access to a particular algorithm: you have some nice tokenizer class, or a regular-expression class, that takes a char*. Instead of rewriting the class to take XMLCh*, you transcode, process, and perhaps convert the result back.
The second scenario is transcoding for output: to display a string to the user, or to write it to a file, you need to call a platform function that assumes ASCII or some other encoding.

As I see it, the C++ standard library deals with the first issue well. The char_traits classes and the templated std::basic_string class make it possible to deal with strings abstractly. Searching, sorting, etc. work the same whether your XMLCh is an 8-bit signed char or a 64-bit unsigned long. Writing good, char-size-independent algorithms is possible and simple.

The second issue is more complex. When it comes time to deal with encodings, you just have to bite the bullet and do it. So, while an algorithm can be designed to be independent of a particular character representation, a program can't escape that representation for I/O.

My proposal was to replace DOMString with basic_string<XMLCh>, with a possibly conditional definition of XMLCh. But I'd be happy if we just used std::basic_string<XMLCh> where XMLCh was always a 16-bit unsigned integer, as it is today. This would allow the use of generic string algorithms in the style of the standard library.

-Rob

Julian Pardoe wrote:
>Having XMLCh be something other than char makes life a major pain. Suddenly
>all the regular facilities you're used to using aren't there any more.
>Suddenly you're having to convert strings before you can pass them to any
>part of your existing system. The answer is of course transcoding: one can
>wrap every string access in a call to a transcoder. But this is clumsy and
>inefficient -- it would be nice if the transcoding were done long before
>the input ever reached you!