Sorry. Whenever I wrote NFC, I meant NFD. Typo.

> On Sep 19, 2016, at 23:16, Xuelei Fan <xuelei....@oracle.com> wrote:
>
>> On 9/19/2016 11:03 PM, Wang Weijun wrote:
>> After some thinking, my current opinion is:
>>
>> 1. Maybe NFC is better than NFKD, but I am not a Unicode expert.
>>
> It is updated from NFKD to NFD. I did not get the point. Do you mean NFC is
> better than NFD?
>
>> 2. I think the real bug is the order of escaping and normalization. The
>> normalization (if a must) should be performed earlier right after valStr is
>> created and only performed on valStr. Otherwise the NFKD normalization would
>> generate new chars that need to be escaped. Again I am not a Unicode expert
>> and I don't know if NFC will also do the same.
>>
> I don't get the point. The update is moving from NFKD to NFD. No NFKD
> normalization any more.
>
>> If 2) is fixed, whatever is correct in 1) does not matter much.
>>
> If we continue to use NFKD, normalization before escaping would result in
> unexpected string as we talked for the hello-world example.

In this case, a comma appears but then it is escaped. You might say it is
unexpected, but at least after escaping, it becomes a legal string.

> It is something I want to avoid, so that it is fixed to use NFD instead. I
> think if we are moving to use NFD, it does not matter to escaping first or
> normalization first if I understand the UTF-8 correctly.

Maybe, but IMO this is not the correct fix. The root cause of the bug is not
the form chosen, but the order.

--Max

>
> Thanks,
> Xuelei
>
>> Thanks
>> Max
>>
>>>> On Sep 19, 2016, at 10:32 AM, Xuelei Fan <xuelei....@oracle.com> wrote:
>>>>
>>>> 4. Is it possible to perform normalization before escaping special
>>>> characters?
>>>>
>>> Yes. I thought about this case. The current fix comes from the fact that
>>> UTF-8 "Hello, world!" and "Hello, world!" should be different. Parsing them
>>> as the same thing may result in unexpected serious issues.
>>
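
For reference, the ordering point argued in this thread can be sketched in Java roughly as follows. This is only an illustration, not the code under review: the class name, the helper methods, and the comma-only escaper are hypothetical, and the input string (with U+FF0C FULLWIDTH COMMA) is an assumed stand-in for the hello-world example, whose exact characters are not shown in this message. Only java.text.Normalizer is a real JDK API here.

```java
import java.text.Normalizer;

public class NormalizeOrderDemo {

    // Hypothetical, simplified escaper: backslash-escapes ASCII commas only.
    // Real RFC 2253 escaping covers more characters than this.
    static String escapeCommas(String s) {
        return s.replace(",", "\\,");
    }

    // Order suggested in the thread: normalize the raw value first, then escape.
    static String normalizeThenEscape(String s, Normalizer.Form form) {
        return escapeCommas(Normalizer.normalize(s, form));
    }

    // The problematic order: escape first, then normalize the escaped string.
    static String escapeThenNormalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(escapeCommas(s), form);
    }

    public static void main(String[] args) {
        // Assumed input: U+FF0C has a compatibility mapping to ',' (U+002C),
        // so NFKD rewrites it, while NFD leaves it alone.
        String input = "Hello\uFF0C world!";

        // NFKD after escaping introduces a new, unescaped ASCII comma.
        System.out.println(escapeThenNormalize(input, Normalizer.Form.NFKD)); // Hello, world!

        // NFKD before escaping: the new comma gets escaped.
        System.out.println(normalizeThenEscape(input, Normalizer.Form.NFKD)); // Hello\, world!

        // With NFD the fullwidth comma is unchanged, so the order makes no
        // difference for this particular input.
        System.out.println(escapeThenNormalize(input, Normalizer.Form.NFD)
                .equals(normalizeThenEscape(input, Normalizer.Form.NFD))); // true
    }
}
```

For this input, switching to NFD hides the problem, while normalizing before escaping yields a legal string under either form. Whether NFD can ever produce a character that needs escaping is not something this sketch settles.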