Sorry. Whenever I wrote NFC, I meant NFD. Typo. 

> On Sep 19, 2016, at 23:16, Xuelei Fan <xuelei....@oracle.com> wrote:
> 
>> On 9/19/2016 11:03 PM, Wang Weijun wrote:
>> After some thought, my current opinion is:
>> 
>> 1. Maybe NFC is better than NFKD, but I am not a Unicode expert.
>> 
> It is updated from NFKD to NFD.  I did not get the point.  Do you mean NFC is 
> better than NFD?
> 
>> 2. I think the real bug is the order of escaping and normalization. The 
>> normalization (if it is a must) should be performed earlier, right after valStr is 
>> created and only performed on valStr. Otherwise the NFKD normalization would 
>> generate new chars that need to be escaped. Again I am not a Unicode expert 
>> and I don't know if NFC will also do the same.
>> 
> I don't get the point.  The update is moving from NFKD to NFD.  No NFKD 
> normalization any more.
> 
>> If 2) is fixed, whatever is correct in 1) does not matter much.
>> 
> If we continue to use NFKD, normalization before escaping would result in an 
> unexpected string, as we discussed for the hello-world example.  

In this case, a comma appears but then it is escaped. You might say it is 
unexpected, but at least after escaping, it becomes a legal string. 
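
For reference, the effect being discussed can be reproduced with 
java.text.Normalizer. This is only a sketch: it assumes the hello-world 
example contains a FULLWIDTH COMMA (U+FF0C), and the class and method names 
are made up for illustration.

```java
import java.text.Normalizer;

class FormDemo {
    // Assumed input: FULLWIDTH COMMA (U+FF0C) where an ASCII comma could be.
    static final String INPUT = "Hello\uFF0C world!";

    // Compatibility decomposition: maps U+FF0C to the ASCII comma U+002C.
    static String nfkd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Canonical decomposition: U+FF0C has no canonical mapping, so it stays.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // NFKD introduces a ',' that was not in the input.
        System.out.println(nfkd(INPUT).indexOf(',') >= 0);   // true
        // NFD leaves this particular input unchanged.
        System.out.println(nfd(INPUT).equals(INPUT));         // true
    }
}
```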

> It is something I want to avoid, so the fix uses NFD instead.  I 
> think that if we move to NFD, it does not matter whether escaping or 
> normalization comes first, if I understand UTF-8 correctly.

Maybe, but IMO this is not the correct fix. The root cause of the bug is 
not the form chosen, but the order. 
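
A minimal sketch of the ordering point, under stated assumptions: escape() 
below is a hypothetical stand-in that backslash-escapes the RFC 2253 special 
characters and is not the actual JDK code, and the input is chosen for 
illustration. GREEK QUESTION MARK (U+037E) canonically decomposes to ';', so 
even NFD can produce a character that needs escaping.

```java
import java.text.Normalizer;

class OrderDemo {
    // Hypothetical escaping step: prefix RFC 2253 special characters with '\'.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (",+\"\\<>;".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Escaping runs before the ';' exists, so the ';' ends up unescaped.
    static String escapeThenNormalize(String valStr) {
        return nfd(escape(valStr));
    }

    // Normalizing first lets the escaping step see the generated ';'.
    static String normalizeThenEscape(String valStr) {
        return escape(nfd(valStr));
    }

    public static void main(String[] args) {
        String valStr = "a\u037Eb";  // 'a', GREEK QUESTION MARK, 'b'
        System.out.println(escapeThenNormalize(valStr));  // a;b
        System.out.println(normalizeThenEscape(valStr));  // a\;b
    }
}
```

With normalization applied to valStr first, the output is a legal string 
whichever form is chosen.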

--Max

> 
> Thanks,
> Xuelei
> 
>> Thanks
>> Max
>> 
>>>> On Sep 19, 2016, at 10:32 AM, Xuelei Fan <xuelei....@oracle.com> wrote:
>>>> 
>>>> 4. Is it possible to perform normalization before escaping special 
>>>> characters?
>>>> 
>>> Yes.  I thought about this case.  The current fix comes from the fact that 
>>> UTF-8 "Hello， world!" and "Hello, world!" should be different. Parsing them 
>>> as the same thing may result in unexpected serious issues.
>> 
