I remember Marco's original post in 2002. His intent was to give people with an actual U+ code point that needed converting—like James Lin ten years later—a quick way to do so without getting immersed in all the bit-shifting math.

If this were a routine being run by a computer, or a tutorial on UTF-8, I would agree that it should have taken loose surrogates into account. But it's not. It's just a quick manual reference guide, and loose surrogates are 0.0001% of the real-world problem for users like James.

While I note that Philippe's amended version seems straightforward and in keeping with Marco's original intent (short and simple), I'd like to suggest that neither Marco for creating the original guide, nor anyone else for doing up UTF-16 and UTF-32 versions, nor Otto for reposting them on the list this week, needs to be beaten up any further over this edge case.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
