Arcane Jill <arcanejill at ramonsky dot com> wrote:

> Here's something that's been bothering me. Suppose I write a function
> - let's call it trim(), which removes leading and trailing spaces from
> a string, represented as one of the UTFs. If I've understood this
> correctly, I'm supposed to validate the input, yes?
>
> Okay, now suppose I write a second function - let's call it tolower(),
> which lowercases a string, again represented as one of the UTFs.
> Again, I guess I'm supposed to validate the input. yes?...
This is one reason why I work with "strings" of code points, and only convert strings of UTF code units when I read them in and write them out. The read and write functions do the necessary validation, allowing the rest of the code to focus on characters. If you operate directly on strings of UTF-8 bytes, you have to worry about things like this.

To answer your question: if you've already validated your input, you generate only valid output (which I hope is the case :-), and your second function ONLY gets (valid) data from your first function, then you probably don't need to re-validate it. But I'd hate to have to do tolower() for non-Basic-Latin characters on strings of UTF-8 bytes.

For me, conversion from any CES or TES always implies validation.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
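The "validate once at the boundary, then work on code points" approach from the reply can be sketched in Python, whose `str` type is already a sequence of code points. This is a minimal illustration under stated assumptions, not Doug's actual code: `read_text` and `write_text` are hypothetical names for the read/write functions, and `trim()` and `tolower()` are hypothetical versions of the functions from the question (here `trim()` strips only U+0020 SPACE). It also shows why lowercasing raw UTF-8 bytes is unattractive: `bytes.lower()` only touches ASCII letters.

```python
def read_text(raw: bytes) -> str:
    """Hypothetical read function: decode UTF-8 once, at the boundary.

    bytes.decode with the default errors="strict" raises
    UnicodeDecodeError on ill-formed UTF-8, so decoding is validation.
    """
    return raw.decode("utf-8")


def write_text(text: str) -> bytes:
    """Hypothetical write function: encode back to UTF-8 on output.

    The strict UTF-8 encoder refuses lone surrogates, so only
    well-formed UTF-8 comes out.
    """
    return text.encode("utf-8")


def trim(text: str) -> str:
    # Hypothetical trim(): removes leading/trailing U+0020 SPACE only.
    return text.strip(" ")


def tolower(text: str) -> str:
    # Works on code points, so letters outside Basic Latin are handled too.
    return text.lower()


raw = "  ÉCOLE  ".encode("utf-8")
text = read_text(raw)                      # validated exactly once
result = write_text(tolower(trim(text)))   # no re-validation in between
print(result.decode("utf-8"))              # école

# Operating on the UTF-8 bytes directly: only ASCII letters are lowered,
# so the two-byte sequence for É (0xC3 0x89) is left unchanged.
print(raw.strip(b" ").lower())             # b'\xc3\x89cole'

# Ill-formed input (an overlong encoding of '/') is rejected at the boundary.
try:
    read_text(b"\xc0\xaf")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Because `trim()` and `tolower()` only ever see the output of `read_text`, neither needs its own validation step, which matches the reasoning in the reply.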

