Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

2016-09-19 Thread Wang Weijun

> On Sep 20, 2016, at 8:58 AM, Xuelei Fan  wrote:
>> 
>> I this case, a comma appears but then it is escaped. You might say it is
>> unexpected, but at least after escaping, it becomes a legal string.
>> 
> I did not get the point.  A comma (",") should be escaped and it does get 
> escaped and the string is legal.  Do you mean "," (double bytes comma) should 
> be converted to ","?  Can you have more details?

I'll write double bytes comma as ,, below.

Current code, "Hello,,world" is not modified at escaping, and becomes 
"Hello,world" after normalization. This is illegal.

With my fix, "Hello,,world" becomes "Hello,world" after normalization, and then 
"Hello\,world" after escaping. This is legal.

With your fix, "Hello,,world" is not modified after both steps, and it's legal.

So both your and my fixes will make it legal and the test will succeed.

> 
>>> It is something I want to avoid, so that it is fixed to use NFD
>>> instead.  I think if we are moving to use NFD, it is does not matter
>>> to escaping first or normalization first if I understand the UTF-8
>>> correctly.
>> 
>> Maybe, but IMO this is not the correct fix. The ultimate reason of the
>> bug is not the form chosen, but the order.
>> 
> I'm not with you for this bug. The bug is complain about the escaping issue, 
> but actually the character should not be escaped.  So it is not an issue of 
> escaping.  So this fix is not trying to fix the escaping issue, but trying to 
> fix the normalization issue.

Yes it is complaining about escaping, but there are 2 ways to amend it. 1) 
escape it. 2) make it not necessary to escape.

I just prefer my fix, because I think that's where the bug is. Even if we 
switch to NFD, I would still like to put normalization before escaping, even if 
practically it makes no difference.

Thanks
Max

> 
> Thanks,
> Xuelei
> 
>> --Max
>> 
>>> 
>>> Thanks,
>>> Xuelei
>>> 
 Thanks
 Max
 
> On Sep 19, 2016, at 10:32 AM, Xuelei Fan  > wrote:
> 
>> 4. Is it possible to perform normalization before escaping special
>> characters?
>> 
> Yes.  I though about this case.  The current fix comes from the fact
> that UTF-8 "Hello, world!" and "Hello, world!" should be different.
> Parsing them as the same thing may result in unexpected serious issues.
 



Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

2016-09-19 Thread Wang Weijun
Sorry. Whenever I wrote NFC, I meant NFD. Typo. 

> 在 2016年9月19日,23:16,Xuelei Fan  写道:
> 
>> On 9/19/2016 11:03 PM, Wang Weijun wrote:
>> After some thinking, my current opinion is.
>> 
>> 1. Maybe NFC is better than NFKD, but I am not a Unicode expert.
>> 
> It is updated from NFKD to NFD.  I did not get the point.  Do you mean NFC is 
> better than NFD?
> 
>> 2. I think the real bug is the order of escaping and normalization. The 
>> normalization (if a must) should be performed earlier right after valStr is 
>> created and only performed on valStr. Otherwise the NFKD normalization would 
>> generate new chars that need to be escaped. Again I am not a Unicode expert 
>> and I don't know if NFC will also do the same.
>> 
> I don't get the point.  The update is moving from NFKD to NFD.  No NFKD 
> normalization any more.
> 
>> If 2) is fixed, whatever is correct in 1) does not matter much.
>> 
> If we continue to use NFKD, normalization before escaping would result in 
> unexpected string as we talked for the hello-world example.  

I this case, a comma appears but then it is escaped. You might say it is 
unexpected, but at least after escaping, it becomes a legal string. 

> It is something I want to avoid, so that it is fixed to use NFD instead.  I 
> think if we are moving to use NFD, it is does not matter to escaping first or 
> normalization first if I understand the UTF-8 correctly.

Maybe, but IMO this is not the correct fix. The ultimate reason of the bug is 
not the form chosen, but the order. 

--Max

> 
> Thanks,
> Xuelei
> 
>> Thanks
>> Max
>> 
 On Sep 19, 2016, at 10:32 AM, Xuelei Fan  wrote:
 
 4. Is it possible to perform normalization before escaping special 
 characters?
 
>>> Yes.  I though about this case.  The current fix comes from the fact that 
>>> UTF-8 "Hello, world!" and "Hello, world!" should be different. Parsing them 
>>> as the same thing may result in unexpected serious issues.
>> 


Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

2016-09-19 Thread Xuelei Fan

On 9/19/2016 11:03 PM, Wang Weijun wrote:

After some thinking, my current opinion is.

1. Maybe NFC is better than NFKD, but I am not a Unicode expert.

It is updated from NFKD to NFD.  I did not get the point.  Do you mean 
NFC is better than NFD?



2. I think the real bug is the order of escaping and normalization. The 
normalization (if a must) should be performed earlier right after valStr is 
created and only performed on valStr. Otherwise the NFKD normalization would 
generate new chars that need to be escaped. Again I am not a Unicode expert and 
I don't know if NFC will also do the same.

I don't get the point.  The update is moving from NFKD to NFD.  No NFKD 
normalization any more.



If 2) is fixed, whatever is correct in 1) does not matter much.

If we continue to use NFKD, normalization before escaping would result 
in unexpected string as we talked for the hello-world example.  It is 
something I want to avoid, so that it is fixed to use NFD instead.  I 
think if we are moving to use NFD, it is does not matter to escaping 
first or normalization first if I understand the UTF-8 correctly.


Thanks,
Xuelei


Thanks
Max


On Sep 19, 2016, at 10:32 AM, Xuelei Fan  wrote:


4. Is it possible to perform normalization before escaping special characters?


Yes.  I though about this case.  The current fix comes from the fact that UTF-8 "Hello, 
world!" and "Hello, world!" should be different. Parsing them as the same thing may 
result in unexpected serious issues.




Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

2016-09-19 Thread Wang Weijun
After some thinking, my current opinion is.

1. Maybe NFC is better than NFKD, but I am not a Unicode expert.

2. I think the real bug is the order of escaping and normalization. The 
normalization (if a must) should be performed earlier right after valStr is 
created and only performed on valStr. Otherwise the NFKD normalization would 
generate new chars that need to be escaped. Again I am not a Unicode expert and 
I don't know if NFC will also do the same.

If 2) is fixed, whatever is correct in 1) does not matter much.

Thanks
Max

> On Sep 19, 2016, at 10:32 AM, Xuelei Fan  wrote:
> 
>> 4. Is it possible to perform normalization before escaping special 
>> characters?
>> 
> Yes.  I though about this case.  The current fix comes from the fact that 
> UTF-8 "Hello, world!" and "Hello, world!" should be different. Parsing them 
> as the same thing may result in unexpected serious issues.



Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

2016-09-18 Thread Xuelei Fan



On 9/19/2016 9:46 AM, Wang Weijun wrote:

I am not sure of this change for several reasons:

1. I cannot find anywhere in RFC 2253 (or its new versions) mentioning 
normalizations. Do you know elsewhere?

Normalization is not a part of RFC 2253.  The spec is described in 
Unicode standards.

   http://www.unicode.org/reports/tr15/tr15-23.html


2. It's not obvious to say "Hello, world!" and "Hello, world!" should be 
different if NFKD thinks they are.

ASN.1 and RFC 2253 require UTF-8 encoding.  "Hello, world!" is encoded 
as "Hello%2C%20world%21".  "Hello, world!" is encoded as 
"Hello%EF%BC%8C%20world%21".   The encoded code should be different. 
When signing a certificate, "," is not converted to ",",  I don't think 
it is fine to convert it while parsing the field.



3. Why not NFC? Although I did't find normalization on X500 names in RFC 5280, 
I do see in several other cases NFV is used.

Actually, I'm not sure why normalization is required here.  So I don't 
want to update the code too much.  The previous form is NFKD.  If 
removing the "compatibility" impact part, the form is NFD, then.


What's the form of NFV?  Any typo?


4. Is it possible to perform normalization before escaping special characters?

Yes.  I though about this case.  The current fix comes from the fact 
that UTF-8 "Hello, world!" and "Hello, world!" should be different. 
Parsing them as the same thing may result in unexpected serious issues.



5. Why is normalization necessary? At least in RFC 5280 4.1.2.6, it says

   When the subject of the certificate is a CA, the subject
   field MUST be encoded in the same way as it is encoded in the
   issuer field (Section 4.1.2.4 ) in all certificates issued by
   the subject CA.

which implies comparison should be on encoding instead of toString.

I have to say I agree with this point.  I don't see the point to use 
normalization.  But I'm not sure I get the full information to remove 
the normalization.  I don't want to fix it until it is broken.


Xuelei



Thanks
Max


On Sep 15, 2016, at 8:09 AM, Xuelei Fan  wrote:

Hi,

Please review this fix:
   http://cr.openjdk.java.net/~xuelei/8146600/webrev.00/

The Normalizer.Form.NFKD is used to normalize attribute-value assertion in X.509 certificate processing.  The normalizer may convert some 
UTF-8 character into ASCII code.  For example, ","(two bytes) will be converted to ","(one byte), and "Hello, 
world!" is normalize to "Hello, world!".  However, "Hello, world!" and "Hello, world!" should be 
different because of the comma code.  This conversion may result in unexpected weird behaviors for name comparing and conversions.

This fix will update to use "Normalizer.Form.NFD".

Thanks,
Xuelei




Re: Code Review Request, JDK-8146600 AVA Normalizer.Form issue

2016-09-18 Thread Wang Weijun
I am not sure of this change for several reasons:

1. I cannot find anywhere in RFC 2253 (or its new versions) mentioning 
normalizations. Do you know elsewhere?

2. It's not obvious to say "Hello, world!" and "Hello, world!" should be 
different if NFKD thinks they are.

3. Why not NFC? Although I did't find normalization on X500 names in RFC 5280, 
I do see in several other cases NFV is used.

4. Is it possible to perform normalization before escaping special characters?

5. Why is normalization necessary? At least in RFC 5280 4.1.2.6, it says

   When the subject of the certificate is a CA, the subject
   field MUST be encoded in the same way as it is encoded in the
   issuer field (Section 4.1.2.4 ) in all certificates issued by
   the subject CA.

which implies comparison should be on encoding instead of toString.

Thanks
Max

> On Sep 15, 2016, at 8:09 AM, Xuelei Fan  wrote:
> 
> Hi,
> 
> Please review this fix:
>http://cr.openjdk.java.net/~xuelei/8146600/webrev.00/
> 
> The Normalizer.Form.NFKD is used to normalize attribute-value assertion in 
> X.509 certificate processing.  The normalizer may convert some UTF-8 
> character into ASCII code.  For example, ","(two bytes) will be converted to 
> ","(one byte), and "Hello, world!" is normalize to "Hello, world!".  However, 
> "Hello, world!" and "Hello, world!" should be different because of the comma 
> code.  This conversion may result in unexpected weird behaviors for name 
> comparing and conversions.
> 
> This fix will update to use "Normalizer.Form.NFD".
> 
> Thanks,
> Xuelei