On 9/19/2016 11:03 PM, Wang Weijun wrote:
The fix changes the normalization from NFKD to NFD. I do not get the point.
Do you mean NFC is better than NFD?
After some thinking, my current opinion is:
1. Maybe NFC is better than NFKD, but I am not a Unicode expert.
I don't get the point. The update moves from NFKD to NFD. There is no NFKD
normalization any more.
2. I think the real bug is the order of escaping and normalization. The
normalization (if it is a must) should be performed earlier, right after valStr
is created, and only on valStr. Otherwise the NFKD normalization would generate
new chars that need to be escaped. Again, I am not a Unicode expert and I don't
know whether NFC would do the same.
If we continue to use NFKD, normalization before escaping would result
in an unexpected string, as we discussed for the hello-world example. That
is something I want to avoid, so the fix uses NFD instead. I think that
once we move to NFD, it does not matter whether escaping or normalization
comes first, if I understand UTF-8 correctly.
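
A minimal Java sketch of the ordering question above, using
java.text.Normalizer. The escape() helper is hypothetical and deliberately
simplified (it only backslash-escapes the ASCII comma); it is not the escaping
code from the patch. The sketch checks a single input, so it illustrates the
hello-world case rather than proving the general claim about NFD.

```java
import java.text.Normalizer;

class EscapeOrderDemo {
    // Hypothetical, simplified escaper: backslash-escapes ASCII commas only.
    static String escape(String s) {
        return s.replace(",", "\\,");
    }

    // Normalize first, then escape whatever the normalization produced.
    static String normalizeThenEscape(String s, Normalizer.Form form) {
        return escape(Normalizer.normalize(s, form));
    }

    // Escape first, then normalize the already-escaped string.
    static String escapeThenNormalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(escape(s), form);
    }

    public static void main(String[] args) {
        // "Hello" + FULLWIDTH COMMA (U+FF0C) + " world!"
        String wide = "Hello\uFF0C world!";
        Normalizer.Form[] forms = {Normalizer.Form.NFKD, Normalizer.Form.NFD};
        for (Normalizer.Form form : forms) {
            String a = normalizeThenEscape(wide, form);
            String b = escapeThenNormalize(wide, form);
            System.out.println(form + ": order matters = " + !a.equals(b));
        }
    }
}
```

With NFKD the fullwidth comma becomes an ASCII comma, so escaping first leaves
a comma that was never escaped. With NFD the string is unchanged for this
input and both orders agree.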
If 2) is fixed, whatever is correct in 1) does not matter much.
On Sep 19, 2016, at 10:32 AM, Xuelei Fan <xuelei....@oracle.com> wrote:
4. Is it possible to perform normalization before escaping special characters?
Yes. I thought about this case. The current fix comes from the fact that UTF-8 "Hello,
world!" and "Hello， world!" (the second uses a fullwidth comma, U+FF0C) should be
different. Parsing them as the same thing may lead to unexpected, serious issues.
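
A minimal Java sketch of the hello-world case, using java.text.Normalizer. The
class and helper names are only for illustration and are not part of the
patch. It compares the two strings after NFKD and after NFD normalization.

```java
import java.text.Normalizer;

class HelloWorldForms {
    // ASCII comma (U+002C) versus fullwidth comma (U+FF0C).
    static final String ASCII = "Hello, world!";
    static final String WIDE = "Hello\uFF0C world!";

    // Compatibility decomposition: folds the fullwidth comma to U+002C.
    static String nfkd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Canonical decomposition: leaves the fullwidth comma as it is.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        System.out.println("NFKD equal: " + nfkd(WIDE).equals(nfkd(ASCII))); // true
        System.out.println("NFD equal:  " + nfd(WIDE).equals(nfd(ASCII)));   // false
    }
}
```

Under NFKD the two strings compare equal, which is the collision described
above. Under NFD they stay distinct.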