Asmus Freytag <asmusf at ix dot netcom dot com> wrote:

>> Doing conversion and validation at different stages isn't a great
>> idea; that's how character encodings get involved with security
>> problems.
>
> Note that I am careful not to suggest that (and I'm sure Markus isn't
> either). "Handling" includes much more than code conversion. It
> includes uppercasing, spell checking, sorting, searching, the whole
> lot. Burdening every single one of those tasks with policing the
> integrity of the encoding seems wasteful, and, as I tried to explain,
> puts the error detection in a place where you'll be most likely
> prevented from doing something useful in recovery.
Right, but as I said, those downstream tasks shouldn't be consumers of
UTF-16 code units anyway. They should be consumers of Unicode code
points (strictly, Unicode scalar values), which by definition exclude
loose surrogates.

-- 
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
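A minimal Python sketch of the split described above, assuming UTF-16 input arrives as a list of 16-bit integers. The hypothetical `decode_utf16_units` converts code units to code points once, at the boundary, and rejects loose surrogates there; the hypothetical `upper_from_code_points` then stands in for a downstream task (uppercasing) that only ever sees validated code points. Neither name comes from any library.

```python
from typing import List, Sequence


def decode_utf16_units(units: Sequence[int]) -> List[int]:
    """Hypothetical boundary decoder: UTF-16 code units -> Unicode scalar values.

    Raises ValueError on any unpaired (loose) surrogate, so nothing
    downstream has to police encoding integrity.
    """
    code_points: List[int] = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit: {u!r} at index {i}")
        if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                # Combine the pair into a supplementary-plane code point.
                code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate U+{u:04X} at index {i}")
        if 0xDC00 <= u <= 0xDFFF:  # low (trailing) surrogate with no leader
            raise ValueError(f"unpaired low surrogate U+{u:04X} at index {i}")
        code_points.append(u)  # BMP character, used as-is
        i += 1
    return code_points


def upper_from_code_points(code_points: Sequence[int]) -> str:
    """Hypothetical downstream task: works on validated code points only."""
    return "".join(chr(cp) for cp in code_points).upper()
```

For example, `decode_utf16_units([0x0041, 0xD83D, 0xDE00])` yields `[0x41, 0x1F600]`, while `[0x0061, 0xD800]` raises `ValueError` before any downstream task runs. The point of the design is that the error surfaces in one place, where the caller can still decide how to recover.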