> On Sep 20, 2016, at 8:58 AM, Xuelei Fan <xuelei....@oracle.com> wrote:
>> In this case, a comma appears but then it is escaped. You might say it is
>> unexpected, but at least after escaping, it becomes a legal string.
> I did not get the point. A comma (",") should be escaped and it does get
> escaped and the string is legal. Do you mean "，" (double-byte comma) should
> be converted to ","? Can you give more details?
I'll write the double-byte comma (U+FF0C, FULLWIDTH COMMA) as ,, below.
With the current code, "Hello,,world" is not modified by escaping, and becomes
"Hello,world" after normalization. This is illegal, because the resulting
comma is left unescaped.
With my fix, "Hello,,world" becomes "Hello,world" after normalization, and then
"Hello\,world" after escaping. This is legal.
With your fix, "Hello,,world" is not modified by either step, and it's legal.
So both your fix and mine make it legal, and the test will succeed.
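
To make the three cases concrete, here is a minimal sketch. It is not the JDK's actual code: the class and method names are hypothetical, escapeCommas is a simplified stand-in that only backslash-escapes the ASCII comma, and I am assuming the current normalization is a compatibility form such as NFKD (which maps U+FF0C to ","), while NFD leaves U+FF0C alone.

```java
import java.text.Normalizer;

// Hypothetical demo class; names are illustrative, not the JDK's internal code.
final class NormalizationOrderDemo {

    // Simplified stand-in for escaping: only the ASCII comma is escaped.
    static String escapeCommas(String s) {
        return s.replace(",", "\\,");
    }

    // Escape first, then normalize (the order described as "current code").
    static String escapeThenNormalize(String s, Normalizer.Form form) {
        return Normalizer.normalize(escapeCommas(s), form);
    }

    // Normalize first, then escape (the order proposed as "my fix").
    static String normalizeThenEscape(String s, Normalizer.Form form) {
        return escapeCommas(Normalizer.normalize(s, form));
    }

    public static void main(String[] args) {
        String input = "Hello\uFF0Cworld"; // contains FULLWIDTH COMMA

        // Escape then NFKD: the fullwidth comma slips past escaping and
        // becomes a bare "," afterwards, giving Hello,world (illegal).
        System.out.println(escapeThenNormalize(input, Normalizer.Form.NFKD));

        // NFKD then escape: the comma is produced first and then escaped,
        // giving Hello\,world (legal).
        System.out.println(normalizeThenEscape(input, Normalizer.Form.NFKD));

        // Escape then NFD: NFD applies no compatibility mapping, so the
        // string keeps its fullwidth comma and nothing needs escaping.
        System.out.println(
            escapeThenNormalize(input, Normalizer.Form.NFD).equals(input));
    }
}
```

With NFD the two orders give the same result for this input, which is why the choice of order makes no practical difference there.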
>>> It is something I want to avoid, so the fix is to use NFD instead. I
>>> think if we are moving to NFD, it does not matter whether escaping or
>>> normalization comes first, if I understand the UTF-8
>> Maybe, but IMO this is not the correct fix. The root cause of the bug is
>> not the form chosen, but the order.
> I'm not with you on this bug. The bug complains about an escaping issue,
> but actually the character should not be escaped, so it is not an escaping
> issue. This fix is not trying to fix escaping, but to fix normalization.
Yes, it is complaining about escaping, but there are two ways to fix it:
1) escape the character, or 2) make escaping unnecessary.
I just prefer my fix, because I think that's where the bug is. Even if we
switch to NFD, I would still like to put normalization before escaping, even if
practically it makes no difference.
>>>>> On Sep 19, 2016, at 10:32 AM, Xuelei Fan <xuelei....@oracle.com> wrote:
>>>>>> 4. Is it possible to perform normalization before escaping special
>>>>> Yes. I thought about this case. The current fix comes from the fact
>>>>> that UTF-8 "Hello, world!" and "Hello， world!" should be different.
>>>>> Parsing them as the same thing may result in unexpected serious issues.