Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

Xuelei Fan Mon, 19 Sep 2016 08:18:16 -0700

On 9/19/2016 11:03 PM, Wang Weijun wrote:

After some thinking, my current opinion is:


1. Maybe NFC is better than NFKD, but I am not a Unicode expert.

The update moves from NFKD to NFD. I do not get the point. Do you mean NFC is better than NFD?
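
For reference, the forms under discussion can be compared with `java.text.Normalizer`. The following is only an illustrative sketch; the class name `NormalFormsDemo` and the two sample characters are chosen for this example and are not part of the change under review.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class NormalFormsDemo {
    // Thin wrapper so the comparisons below stay readable.
    static String norm(String s, Form f) {
        return Normalizer.normalize(s, f);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9";    // LATIN SMALL LETTER E WITH ACUTE (precomposed)
        String ligature = "\uFB01";  // LATIN SMALL LIGATURE FI (compatibility character)

        // NFC keeps the precomposed form; NFD splits it into 'e' + U+0301.
        System.out.println(norm(eAcute, Form.NFC).length());
        System.out.println(norm(eAcute, Form.NFD).length());

        // NFD leaves the ligature alone; NFKD replaces it with plain "fi".
        System.out.println(norm(ligature, Form.NFD).equals(ligature));
        System.out.println(norm(ligature, Form.NFKD).equals("fi"));
    }
}
```

NFC and NFD differ only in whether canonically equivalent sequences are stored composed or decomposed, while NFKD additionally folds compatibility characters into other characters.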

2. I think the real bug is the order of escaping and normalization. The
normalization (if it is a must) should be performed earlier, right after valStr
is created, and only on valStr. Otherwise the NFKD normalization would
generate new chars that need to be escaped. Again, I am not a Unicode expert and
I don't know if NFC will also do the same.

I don't get the point. The update is moving from NFKD to NFD. No NFKD normalization any more.

If 2) is fixed, whatever is correct in 1) does not matter much.

If we continue to use NFKD, normalization before escaping would result in an unexpected string, as we discussed for the hello-world example. That is something I want to avoid, so the fix uses NFD instead. I think that if we move to NFD, it does not matter whether escaping or normalization comes first, if I understand UTF-8 correctly.
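
The ordering question can be tried out with a small sketch. The `escape` method below is a hypothetical stand-in that backslash-escapes a few RFC 2253 special characters; it is not the actual AVA code, and the class name `EscapeOrderDemo` is invented for this example. The sketch only checks the fullwidth-comma input from the hello-world example, so it does not show that the two orders agree for every string under NFD.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

public class EscapeOrderDemo {
    // Hypothetical escaping: prefix a handful of special characters with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (",+\"\\<>;".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String escapeThenNormalize(String s, Form f) {
        return Normalizer.normalize(escape(s), f);
    }

    static String normalizeThenEscape(String s, Form f) {
        return escape(Normalizer.normalize(s, f));
    }

    public static void main(String[] args) {
        String input = "Hello\uFF0C world!"; // U+FF0C FULLWIDTH COMMA

        // NFKD: the two orders disagree, because normalization creates an ASCII comma.
        System.out.println(escapeThenNormalize(input, Form.NFKD));
        System.out.println(normalizeThenEscape(input, Form.NFKD));

        // NFD: the fullwidth comma is left alone, so both orders agree for this input.
        System.out.println(escapeThenNormalize(input, Form.NFD)
                .equals(normalizeThenEscape(input, Form.NFD)));
    }
}
```

With NFKD, escaping first leaves an unescaped ASCII comma in the result, while normalizing first produces the same escaped string as for the ASCII "Hello, world!" input.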


Thanks,
Xuelei

Thanks
Max

On Sep 19, 2016, at 10:32 AM, Xuelei Fan <xuelei....@oracle.com> wrote:

4. Is it possible to perform normalization before escaping special characters?

Yes. I thought about this case. The current fix comes from the fact that UTF-8 "Hello,
world!" and "Hello， world!" should be different. Parsing them as the same thing may
result in unexpected serious issues.
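
The difference between the two strings can be checked directly. This is a minimal sketch using `java.text.Normalizer`; the class name `FullwidthCommaDemo` and its helper methods are made up for illustration.

```java
import java.text.Normalizer;

public class FullwidthCommaDemo {
    static final String ASCII = "Hello, world!";
    static final String FULLWIDTH = "Hello\uFF0C world!"; // U+FF0C FULLWIDTH COMMA

    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    static String nfkd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    public static void main(String[] args) {
        // NFD keeps the fullwidth comma, so the two strings stay distinct.
        System.out.println(nfd(FULLWIDTH).equals(ASCII));   // false

        // NFKD folds U+FF0C into the ASCII comma, so the strings collapse into one.
        System.out.println(nfkd(FULLWIDTH).equals(ASCII));  // true
    }
}
```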
