On Thu, Jul 26, 2012 at 11:04 PM, Mark Rejhon <[email protected]> wrote:
> Generally, in most reasonable situations in XMPP, normalizing an
> already-normalized Unicode string, results in no changes. Kevin says to
> specify a normalization format, but how do we know what normalization
> network equipment uses? So we have to carefully choose the normalization
> standard that is least likely to be affected by further unexpected passes of
> normalization.

My Unicode knowledge is hazy at best, but I think this works if we
normalise with e.g. NFC before the sender calculates the edits (that is,
the sender calculates the NFC pre-string and the NFC post-string, so that
what is sent on the wire is NFC) and the recipient normalises the incoming
packets to the same form. Then, even if the network (or the language) has
renormalised to e.g. NFD, the recipient will have renormalised to the same
form as the sender, and so will perform the same transforms and end up
with an identical NFC buffer to the sender.

> - Again, rare normalization damage (which I have never seen, not even with
> realjabber.org, talk.l.google.com, or Openfire) is self repairing anyway via
> Message Reset.

I believe that some libraries will change the normal form of strings, at
least, so without explicit normalisation rules someone implementing
clients in these situations would end up with RTT that didn't quite work
right. It's true that resets will fix it every 10 seconds or whenever, but
if we have the ability to easily resolve the issue I think we should (the
normalisation won't be a particular code burden for devs, as all XMPP
entities need to do Unicode mangling elsewhere anyway, so will have the
relevant tools).

/K
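
For illustration, here is a minimal Python sketch of the scheme described
above: the sender diffs the NFC forms, and the recipient renormalises
whatever arrives to NFC before applying it. The edit format (position,
erase count, inserted text) and the names nfc, make_edit and apply_edit
are hypothetical, made up for this example; this is not the RTT wire
format, and it ignores edge cases such as edits that split a combining
sequence.

```python
import unicodedata
from typing import Tuple

# Hypothetical edit: (position, number of characters to erase, text to insert)
Edit = Tuple[int, int, str]


def nfc(text: str) -> str:
    """Return the NFC form of text."""
    return unicodedata.normalize("NFC", text)


def make_edit(old: str, new: str) -> Edit:
    """Sender side: normalise both strings to NFC, then diff them.

    The diff is deliberately naive: keep the common prefix, erase the
    rest of the old string, and insert the rest of the new string.
    """
    old, new = nfc(old), nfc(new)
    prefix = 0
    while prefix < min(len(old), len(new)) and old[prefix] == new[prefix]:
        prefix += 1
    return prefix, len(old) - prefix, new[prefix:]


def apply_edit(buffer: str, edit: Edit) -> str:
    """Recipient side: renormalise the incoming text to NFC, then apply it."""
    position, erase, inserted = edit
    buffer = nfc(buffer)
    return buffer[:position] + nfc(inserted) + buffer[position + erase:]


# Sender's view of the message before and after the user types.
old_text = "cafe"
new_text = "caf\u00e9 au lait"
edit = make_edit(old_text, new_text)

# Simulate an intermediary (or a library) that renormalises to NFD in transit.
position, erase, inserted = edit
mangled = (position, erase, unicodedata.normalize("NFD", inserted))

# Applying the mangled text as-is diverges from the sender's buffer;
# renormalising on receipt does not.
naive_result = old_text[:position] + mangled[2] + old_text[position + erase:]
recipient_result = apply_edit(old_text, mangled)
print(naive_result == nfc(new_text), recipient_result == nfc(new_text))
```

The point of the sketch is only that both ends agree on one normal form
before counting positions or comparing text, so an extra normalisation
pass in between does not leave the two buffers out of step.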
