[ I don't like writing me-too e-mails, but you beat me by a minute to sending the exact same mail, so I'm doing it anyway ;-) ]
> So I'd say that we should refer to characters in a string, and deal with
> Unicode code-points in the abstract. I'd expect that implementations would
> convert this internally into whatever made sense for them.

I think it would be the first protocol to depend on knowing how to count code points (I haven't needed it before), but I also think it's the only sensible thing to do, because you could end up with incorrect encodings using the protocol otherwise.

Anyway, for applications that don't use Unicode libraries, rolling your own codepoint count isn't very hard, at least for utf-8.

cheers,
Remko
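
To illustrate the remark about rolling your own codepoint count for utf-8, here is a minimal sketch. It assumes the input is already well-formed UTF-8 and does no validation; the function name `count_code_points` is made up for this example and does not come from the protocol under discussion. The idea is that every code point is encoded as one lead byte followed by zero or more continuation bytes of the form 10xxxxxx, so counting the bytes that are not continuation bytes gives the number of code points.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in UTF-8 encoded bytes.

    Assumption: `data` is well-formed UTF-8. Malformed input is not
    detected and would simply give a meaningless count.
    """
    # Continuation bytes have the bit pattern 10xxxxxx. Every other byte
    # (ASCII or a multi-byte lead byte) starts a new code point.
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


if __name__ == "__main__":
    # "naïve €" written with escapes so the source file encoding does not matter.
    sample = "na\u00efve \u20ac".encode("utf-8")
    print(len(sample), count_code_points(sample))
```

Note that this counts code points, not user-perceived characters: a base letter followed by a combining accent counts as two.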
