Jill, I think the best practice is to validate input once, at the point where it comes in.
Besides the overhead of revalidating, there is the issue of what to do with data that contains invalid characters. This has to be handled explicitly. Once the data is validated, all transforms should keep it valid. If you also provide a modified strncpy function that returns a length instead of a pointer and copies only complete code points in whatever UTF you are using, you should not break the validity.
Carl
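
P.S. Here is a rough sketch of the kind of modified strncpy I mean, assuming UTF-8. The name utf8_copy and its exact signature are made up for illustration and are not a standard function. It also assumes the source has already been validated, so it only has to avoid cutting a code point in half. Unlike strncpy, it always NUL-terminates and does not pad the rest of the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any standard library).
 * Copies at most dst_size - 1 bytes of src into dst, never splitting a
 * UTF-8 code point, and always NUL-terminates dst when dst_size > 0.
 * Returns the number of bytes copied, not counting the terminator.
 * Assumption: src is NUL-terminated and already validated UTF-8.
 */
size_t utf8_copy(char *dst, const char *src, size_t dst_size)
{
    if (dst_size == 0)
        return 0;

    size_t n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;
        /* If src[n] is a continuation byte (10xxxxxx), cutting here would
         * split a code point, so back up to the start of that code point. */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For example, copying "h\xC3\xA9llo" (the word "héllo", 6 bytes) into a 3-byte buffer stores just "h" and returns 1, because the two-byte "é" does not fit whole. The returned length tells the caller where the copy stopped, so the rest of the string can be picked up from src + length.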

