Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-10-06 Thread Jie Fu
On Tue, 5 Oct 2021 06:38:05 GMT, Ioi Lam  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> My experiments above with ` -XX:CompileCommand='compileonly,*::爪哇'` was done 
> on Linux. I tried doing the same on Windows. On US-English Windows, the 
> default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then 
> Java is able to output CJK characters to the console.
> 
> 
> public class CJK {
> public static void main(String args[]) {
>   System.out.println(args[0]);
> \u722a\u54c7();
> }
> 
> static void \u722a\u54c7() { // Chinese word for "Java"
> Thread.dumpStack();
> }
> }
> 
> 
> 
> c:\ade>chcp
> Active code page: 437
> 
> c:\ade>jdk-17\bin\java -cp . CJK 123
> 123
> java.lang.Exception: Stack trace
> at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
> at CJK.??(CJK.java:8)
> at CJK.main(CJK.java:4)
> 
> c:\ade>chcp 65001
> Active code page: 65001
> 
> c:\ade>jdk-17\bin\java -cp . CJK 爪哇
> ??
> java.lang.Exception: Stack trace
> at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
> at CJK.爪哇(CJK.java:8)
> at CJK.main(CJK.java:4)
> 
> 
> As you can see, the CJK characters in the command-line arguments can't even 
> be correctly passed as arguments to the Java main class. If that doesn't 
> work, I can't see how we can get `-XX:CompileCommand` to work with CJK 
> characters.

Thanks @iklam @magicus and @vnkozlov .

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]

2021-10-06 Thread Vladimir Kozlov
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> Jie Fu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII

Passed my tier1-3 testing

-

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]

2021-10-06 Thread Vladimir Kozlov
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> Jie Fu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII

Looks good to me. Let me test it before approval.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]

2021-10-06 Thread Magnus Ihse Bursie
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> Jie Fu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII

I think this was the best possible solution.

-

Marked as reviewed by ihse (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]

2021-10-05 Thread Ioi Lam
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> Jie Fu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII

Marked as reviewed by iklam (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]

2021-10-05 Thread Jie Fu
> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

Jie Fu has updated the pull request incrementally with one additional commit 
since the last revision:

  Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5704/files
  - new: https://git.openjdk.java.net/jdk/pull/5704/files/e1271085..d0070680

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=01-02

  Stats: 15 lines in 1 file changed: 1 ins; 9 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5704.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2]

2021-10-05 Thread Jie Fu
On Wed, 6 Oct 2021 04:30:12 GMT, Ioi Lam  wrote:

> It's hard to tell what's the difference between these two RANGEBASE 
> definitions. How about doing it like this to make the code more readable?
> 
> ```
> #define RANGEBASE_ASCII "."
> #define RANGEBASE_NON_ASCII "."
> #ifdef WINDOWS
> #define RANGEBASE RANGEBASE_ASCII
> #else  
> #define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII 
> #endif
> ```

Good suggestion!
Updated.
Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2]

2021-10-05 Thread Ioi Lam
On Wed, 6 Oct 2021 02:33:30 GMT, Jie Fu  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> Jie Fu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   Disable non-ASCII for Windows only

The idea looks good to me. I just have a suggestion to make the code more 
readable.

src/hotspot/share/compiler/methodMatcher.cpp line 77:

> 75: "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
> 76: "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
> 77: #endif

It's hard to tell what's the difference between these two RANGEBASE 
definitions. How about doing it like this to make the code more readable?


#define RANGEBASE_ASCII "."
#define RANGEBASE_NON_ASCII "."
#ifdef WINDOWS
#define RANGEBASE RANGEBASE_ASCII
#else  
#define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII 
#endif

-

Marked as reviewed by iklam (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-10-05 Thread Jie Fu
On Tue, 5 Oct 2021 06:38:05 GMT, Ioi Lam  wrote:

>> Hi all,
>> 
>> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
>> However, I failed with C4474 and C4778 warnings as below:
>> 
>> Compiling 100 properties into resource bundles for java.desktop
>> Compiling 3038 files for java.base
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
>> C2220: the following warning is treated as an error
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
>> placeholders and their parameters expect 1 variadic arguments, but 3 were 
>> provided
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4778: 'sscanf' : unterminated format string 
>> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
>> C4474: 'sscanf' : too many arguments passed for format string
>> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
>> placeholders and their parameters expect 0 variadic arguments, but 2 were 
>> provided
>> 
>> 
>> The failure is caused by non-ASCII chars in the format string of sscanf 
>> [1][2], which is non-portable on our Windows platform.
>> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
>> disabled in JDK-8216154 [3].
>> And I also found an article showing that sscanf may fail with non-ASCII in 
>> the format string [4].
>> 
>> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
>> And I think it's safe to do so.
>> 
>> This is because:
>>  1) There are actually no non-ASCII chars for package/class/method/signature 
>> names.
>>  2) I don't think there is a use case, in which people will input non-ASCII 
>> for `CompileCommand`.
>> 
>> You may argue that the non-ASCII may be used by the parser itself.
>> But I didn't find that usage at all. (Please let me know if I miss 
>> something.)
>> 
>> So I suggest to remove these non-ASCII code to make HotSpot to be more 
>> portable.
>> And if we do so, we can also remove the only one 
>> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
>> 
>> Testing:
>>  - Build tests on Windows
>>  - tier1~3 on Linux/x64
>> 
>> Thanks.
>> Best regards,
>> Jie
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
>> [2] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
>> [3] 
>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
>> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
>> [5] 
>> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246
>
> My experiments above with ` -XX:CompileCommand='compileonly,*::爪哇'` was done 
> on Linux. I tried doing the same on Windows. On US-English Windows, the 
> default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then 
> Java is able to output CJK characters to the console.
> 
> 
> public class CJK {
> public static void main(String args[]) {
>   System.out.println(args[0]);
> \u722a\u54c7();
> }
> 
> static void \u722a\u54c7() { // Chinese word for "Java"
> Thread.dumpStack();
> }
> }
> 
> 
> 
> c:\ade>chcp
> Active code page: 437
> 
> c:\ade>jdk-17\bin\java -cp . CJK 123
> 123
> java.lang.Exception: Stack trace
> at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
> at CJK.??(CJK.java:8)
> at CJK.main(CJK.java:4)
> 
> c:\ade>chcp 65001
> Active code page: 65001
> 
> c:\ade>jdk-17\bin\java -cp . CJK 爪哇
> ??
> java.lang.Exception: Stack trace
> at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
> at CJK.爪哇(CJK.java:8)
> at CJK.main(CJK.java:4)
> 
> 
> As you can see, the CJK characters in the command-line arguments can't even 
> be correctly passed as arguments to the Java main class. If that doesn't 
> work, I can't see how we can get `-XX:CompileCommand` to work with CJK 
> characters.

Thanks @iklam and @magicus for your experiments and comments.

My experiments show that CompileCommand doesn't work with non-US-English env 
Windows.
And @iklam 's experiments show that it doesn't work with US-English env Windows 

Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2]

2021-10-05 Thread Jie Fu
> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

Jie Fu has updated the pull request incrementally with one additional commit 
since the last revision:

  Disable non-ASCII for Windows only

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5704/files
  - new: https://git.openjdk.java.net/jdk/pull/5704/files/d4b84f2b..e1271085

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=00-01

  Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5704.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-10-05 Thread Magnus Ihse Bursie

On 2021-10-05 08:41, Ioi Lam wrote:
As you can see, the CJK characters in the command-line arguments can't 
even be correctly passed as arguments to the Java main class. If that 
doesn't work, I can't see how we can get `-XX:CompileCommand` to work 
with CJK characters. 
So, what does that mean? That we should explicitly limit 
`-XX:CompileCommand`to work with ASCII-only arguments? I accept that we 
might not get all characters to work in all circumstances due to 
limitations in Windows, but the current state of affairs still feel 
unsatisfactory. We should at least have a better failure mode, and 
document any limitations.


/Magnus


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-10-04 Thread Ioi Lam
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu  wrote:

> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

My experiments above with ` -XX:CompileCommand='compileonly,*::爪哇'` was done on 
Linux. I tried doing the same on Windows. On US-English Windows, the default 
codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then Java is 
able to output CJK characters to the console.


public class CJK {
public static void main(String args[]) {
System.out.println(args[0]);
\u722a\u54c7();
}

static void \u722a\u54c7() { // Chinese word for "Java"
Thread.dumpStack();
}
}



c:\ade>chcp
Active code page: 437

c:\ade>jdk-17\bin\java -cp . CJK 123
123
java.lang.Exception: Stack trace
at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
at CJK.??(CJK.java:8)
at CJK.main(CJK.java:4)

c:\ade>chcp 65001
Active code page: 65001

c:\ade>jdk-17\bin\java -cp . CJK 爪哇
??
java.lang.Exception: Stack trace
at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
at CJK.爪哇(CJK.java:8)
at CJK.main(CJK.java:4)


As you can see, the CJK characters in the command-line arguments can't even be 
correctly passed as arguments to the Java main class. If that doesn't work, I 
can't see how we can get `-XX:CompileCommand` to work with CJK characters.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-10-01 Thread Jie Fu
On Thu, 30 Sep 2021 23:41:20 GMT, Jie Fu  wrote:

> > I will do your experiment next week. This is because it's already our 
> > National Day week and I can't find an English Windows machine until next 
> > week. I'll let you know the result as soon as possible. Thanks.
> 
> No need to hurry :-). In case you can't find an English Windows, I think you 
> can use the `chcp 65001` command mentioned in 
> https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line
>  to change your command-line window to use the UTF8 codepage.

Hi @iklam ,

methodMatcher.obj [1] built with `System Locale: zh-cn;Chinese 
(China)`
methodMatcher.obj [2] built with `System Locale: en-us;English 
(United States)"`

There seems no difference when checking with `strings methodMatcher.obj`.

The warnings disappear when the system locale is `en-us;English (United 
States)`.
But unfortunately, I can't reproduce the "CJK" test example, which means 
non-ASCII chars for CompileCommand still fail for both jdk images (even when 
built with en-us locale, no warnings at all).

So it's far more complicated than I had thought.
I will just close this pr since we can't remove the non-ASCII code, which works 
in some countries.

Thank you all for your help and valuable comments.

Best regards,
Jie

[1] 
https://github.com/DamonFool/experiment/blob/main/JDK-8274329/ch-methodMatcher.obj
[2] 
https://github.com/DamonFool/experiment/blob/main/JDK-8274329/en-methodMatcher.obj

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Jie Fu
On Thu, 30 Sep 2021 23:30:29 GMT, Ioi Lam  wrote:

> > I will do your experiment next week. This is because it's already our 
> > National Day week and I can't find an English Windows machine until next 
> > week. I'll let you know the result as soon as possible. Thanks.
> 
> No need to hurry :-). In case you can't find an English Windows, I think you 
> can use the `chcp 65001` command mentioned in 
> https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line
>  to change your command-line window to use the UTF8 codepage.

Okay.

I also note the warning

e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
C4778: 'sscanf' : unterminated format string 
'%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'


It is already different with the original `RANGEBASE` sequence [1].

~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n

vs.

"\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" \
"\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" \
"\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf" \
"\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" \
"\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" \
"\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" \
"\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef"


Not sure if this fact is sufficient to say the literal strings will be 
different in methodMatcher.obj.
[1] 
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L49

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Ioi Lam
On Thu, 30 Sep 2021 23:04:17 GMT, Jie Fu  wrote:

> I will do your experiment next week. This is because it's already our 
> National Day week and I can't find an English Windows machine until next 
> week. I'll let you know the result as soon as possible. Thanks.

No need to hurry :-). In case you can't find an English Windows, I think you 
can use the `chcp 65001` command mentioned in 
https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line
 to change your command-line window to use the UTF8 codepage.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Jie Fu
On Thu, 30 Sep 2021 17:53:39 GMT, Ioi Lam  wrote:

> The current limitations of the MethodMather class are:
> 
> [1] `parse_method_pattern(char*& line, ...)` requires `line` to be a 
> UTF8-encoded byte sequence. Essentially, this means that for 
> -XX:CompileCommand to support non-ASCII characters, the JVM (and the shell 
> that runs the JVM) must be using UTF8 character encoding.
> 
> Note that a "locale" contains 3 parts: language, country and character 
> encoding. For example,
> 
> * en_US.utf8 (English language, United States, UTF8 character encoding)
> * zh_CN.utf8 (Chinese language, China,  UTF8 character encoding)
> * zh_CN.gbk (Chinese language, China,  GBK character encoding)
> 
> The first two support non-ASCII characters in -XX:CompileCommand, but the 
> third one doesn't.
> 
> [2] MethodMather uses `sscanf`. It assumes that the JVM is running with UTF8 
> character encoding because
> 
> * It uses `char*` strings returned by `sscanf` to compare with the bytes 
> stored in Symbols. This requires `sscanf`  to return strings that are encoded 
> in UTF8, because Symbols stores the string with UTF8-encoded bytes.
> * It relies on range checking by `sscanf` to enforce the following 
> restrictions. However, these restrictions are given with the RANGE macro 
> which is UTF8 encoded bytes (and I suspect that these are incorrect when 
> handling multi-byte UTF8-encoded characters):
> 
> ```
> // '\0' and 0xf0-0xff are disallowed in constant string values
> // 0x20 ' ', 0x09 '\t' and, 0x2c ',' are used in the matching
> // 0x5b '[' and 0x5d ']' can not be used because of the matcher
> // 0x28 '(' and 0x29 ')' are used for the signature
> // 0x2e '.' is always replaced before the matching
> // 0x2f '/' is only used in the class name as package separator
> ```
> 
> == Proposed solutions:
> 
> I don't think we can solve [1] easily. To handle non-ASCII characters that 
> are non encoded in UTF8, we need to call NewPlatformString() in 
> src/java.base/share/native/libjli/java.c. However, this executes Java code, 
> but -XX:CompileCommand needs to be processed before any Java code execution. 
> ==> Proposal: IGNORE it for now.
> 
> For [2], there are two distinct issues:
> 
> (a) The restriction checks are invalid when the JVM is running in an non-UTF8 
> encoding -- this is a moot point because we can't handle [1] anyway, so the 
> data given to sscanf() is already bad. => Proposal: IGNORE it for now
> 
> (b) VC++ compilation warning when methodMather.cpp is compiled in non-UTF8 
> environments
> 
> This is just a warning, and (I think .) it doesn't change the object file 
> at all. I.e., the literal strings in methodMatcher.obj are exactly the same 
> as if methodMather.cpp is compiled under a UTF8 environment.
> 
> Proposal: use pragma to disable the warning. Assuming that my analysis for 
> [1] and (a) is correct, there's no reason to fix the sscanf code. Disabling 
> the warnings with pragma is the most painless and easiest way to handle this.
> 
> @DamonFool could you try this experiment:
> 
> * Implement the pragma and build two JDKs -- one in a Chinese Windows 
> environment, and another in an English Windows environment
> * run `strings methodMatcher.obj` and see if the output is identical
> * run the "CJK" test example in my previous comment, and see if you get 
> identical results with both JDKs
>   
>   * On Windows, you may need to do this to force the terminal to be using 
> UTF8 code page. See 
> https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line
> 
> (If this doesn't work, an alternative is to avoid using sscanf and write our 
> own parser).
> 
> Thanks

Thanks @iklam for your excellent analysis.

So HotSpot does support non-ASCII chars as names.
Then we shouldn't simply remove such non-ASCII code.

I will do your experiment next week.
This is because it's already our National Day week and I can't find an English 
Windows machine until next week.
I'll let you know the result as soon as possible.
Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Ioi Lam
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu  wrote:

> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

The current limitations of the MethodMather class are:

[1] `parse_method_pattern(char*& line, ...)` requires `line` to be a 
UTF8-encoded byte sequence. Essentially, this means that for -XX:CompileCommand 
to support non-ASCII characters, the JVM (and the shell that runs the JVM) must 
be using UTF8 character encoding.

Note that a "locale" contains 3 parts: language, country and character 
encoding. For example,

- en_US.utf8 (English language, United States, UTF8 character encoding)
- zh_CN.utf8 (Chinese language, China,  UTF8 character encoding)
- zh_CN.gbk (Chinese language, China,  GBK character encoding)

The first two support  non-ASCII characters in  -XX:CompileCommand, but the 
third one doesn't.

[2] MethodMather uses `sscanf`. It assumes that the JVM is running with UTF8 
character encoding because

- It uses `char*` strings returned by `sscanf` to compare with the bytes stored 
in Symbols. This requires `sscanf`  to return strings that are encoded in UTF8, 
because Symbols stores the string with UTF8-encoded bytes.
- It relies on range checking by `sscanf` to enforce the following 
restrictions. However, these restrictions are given with the RANGE macro which 
is UTF8 encoded bytes (and I suspect that these are incorrect when handling 
multi-byte UTF8-encoded characters):


// '\0' and 0xf0-0xff are disallowed in constant string values
// 0x20 ' ', 0x09 '\t' and, 0x2c ',' are used in the matching
// 0x5b '[' and 0x5d ']' can not be used because of the matcher
// 0x28 '(' and 0x29 ')' are used for the signature
// 0x2e '.' is always replaced before the matching
// 0x2f '/' is only used

Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Ioi Lam
On Thu, 30 Sep 2021 08:31:51 GMT, Jie Fu  wrote:

> > `RANGEBASE` was added by 
> > [JDK-6500501](https://bugs.openjdk.java.net/browse/JDK-6500501) and later 
> > was modified by 
> > [JDK-8027829](https://bugs.openjdk.java.net/browse/JDK-8027829)
> > Note the original comment from 6500501:
> > ```
> > // The characters allowed in a class or method name.  All characters > 0x7f
> > // are allowed in order to handle obfuscated class files (e.g. Volano)
> > ```
> 
> Thanks @vnkozlov for your very helpful comments.
> 
> I have one question: how can we specify (non-ascii chars) and (non-printable 
> ascii chars) through `-XX:CompileCommand`?
> 
> I just learned from https://bugs.openjdk.java.net/browse/JDK-8027829 that we 
> can use unicode like `\u`. But it doesn't work in my experiments.
> 
> My example was made from: 
> https://bugs.openjdk.java.net/secure/attachment/17128/UnicodeIdentifierTest.java
> 
> ```
> public class UnicodeIdentifierTest {
> public static void main(String args[]) {
> System.out.println("Can I use \\u0001 in identifier name? " +
>(Character.isJavaIdentifierPart(1) ? "yes" : 
> "no"));
> for (int i = 0; i < 10; i++ )
> methodWithUnicode\u0001Char();
> 
> System.out.println("Can I use \\u00aa in identifier name? " +
>(Character.isJavaIdentifierPart(0xaa) ? "yes" : 
> "no"));
> for (int i = 0; i < 10; i++ )
> methodWithUnicode\u00aaChar();
> 
> System.out.println("Can I use \\u006b in identifier name? " +
>(Character.isJavaIdentifierPart(0x6b) ? "yes" : 
> "no"));
> for (int i = 0; i < 10; i++ )
> methodWithUnicode\u006bChar();
> 
> }
> public static int a = 0;
> public static void methodWithUnicode\u0001Char() {
> a++;
> }
> 
> public static void methodWithUnicode\u00aaChar() {
> a++;
> }
> 
> public static void methodWithUnicode\u006bChar() {
> a++;
> }
> }
> ```
> 
> And I tried to exclude some specific methods like this
> 
> ```
> ${JDK}/bin/java \
>-XX:+PrintCompilation \
>-XX:CompileCommand=exclude,`echo -e 
> "UnicodeIdentifierTest::methodWithUnicode\u0001Char"` \
>-XX:CompileCommand=exclude,`echo -e 
> "UnicodeIdentifierTest.methodWithUnicode\u0001Char"` \
>
> -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u0001Char"
>  \
>
> -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u0001Char'
>  \
>
> -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u0001Char \
>-XX:CompileCommand=exclude,`echo -e 
> "UnicodeIdentifierTest::methodWithUnicode\u00aaChar"` \
>-XX:CompileCommand=exclude,`echo -e 
> "UnicodeIdentifierTest.methodWithUnicode\u00aaChar"` \
>
> -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u00aaChar"
>  \
>
> -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u00aaChar'
>  \
>
> -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u00aaChar \
>-XX:CompileCommand=exclude,`echo -e 
> "UnicodeIdentifierTest::methodWithUnicode\u006bChar"` \
>-XX:CompileCommand=exclude,`echo -e 
> "UnicodeIdentifierTest.methodWithUnicode\u006bChar"` \
>
> -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u006bChar"
>  \
>
> -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u006bChar'
>  \
>
> -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u006bChar \
>${TEST}
> ```
> 
> But none of them worked.
> 
> So if there is no other way to specify a non-ascii chars, it seems safe to 
> remove the non-ascii code.
> 
> If I miss something, please let me know. Thanks.

(The Chinese characters in this comment may not be displayed properly inside an 
e-mail reader. Please see this comment on GitHub 
https://github.com/openjdk/jdk/pull/5704)

-XX:CompileCommand does not process \u sequences. However, if your shell's 
locale is UTF8, you can do something like this, by directly entering them on 
the command-line, without escaping with \u: 


public class CJK {
public static void main(String args[]) {
\u722a\u54c7();
}

static void \u722a\u54c7() { // Chinese word for "Java"
Thread.dumpStack();
}
}
===
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ javac CJK.java
$ java -Xcomp -XX:-BackgroundCompilation -XX:CompileCommand='compileonly,*::爪哇' 
-XX:+PrintCompilation -cp . CJK > log.txt
java.lang.Exception: Stack trace
at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
at CJK.爪哇  (CJK.java:7)
at C

Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Jie Fu
On Thu, 30 Sep 2021 10:01:52 GMT, Magnus Ihse Bursie  wrote:

> Some misc remarks from a build PoV:
> 
> * We count language and region settings as a build environment requirement, 
> not a portability issue.
> * It is really a shame that Microsoft is making changes to these so darned 
> hard. On all other platforms, LC_ALL=C in the make file just fixes the 
> problem. :-(
> * But we do want the JDK to be easy to build, so this means that we might 
> need to support building on more than en_US on Windows, at least until 
> Microsoft get's their act together.
> 
> From what I see in the discussion here there seems to be no clarity in what 
> range of character the specification allows. This needs to be absolutely 
> clear for any changes here -- we can't filter out legal characters just 
> because they are problematic to build on non en_US platforms.
> 
> However, I'm thinking that you need to take a step back and see what you are 
> really trying to solve. To me, it seems that sscanf is not the right tool for 
> the job, and the fact that it has worked until now is more a lucky 
> coincidence. It seems, from a quick glance, that you should consider the 
> input a byte array, and process it like that, instead of a string, if the 
> encoding is unclear, and the spec is talking about character values (like 
> 0x7f) rather than what characters they are supposed to represent in a 
> specific encoding.

Thanks @magicus .

The background is that we want to build CI/CD pipelines for Windows platforms 
to help the OpenJDK development.

We already have enough Linux and MacOS pipelines but still not have one for 
Windows.
So we just plan to setup some Windows pipelines to further improve OpenJDK 
product quality.

But to my surprise, OpenJDK fails to build on our Windows platforms.

You may suggest changing the locale settings.
But many of our Apps don't allow us to changet it since we are non-English 
speaking country.
It's unfortunate that OpenJDK can't build on our Windows platforms.

It's not our goal to make CompileCommand work with non-ASCII chars.
If it doesn't make anything worse, we can just remove the non-ASCII code to 
make it to be more portable.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Magnus Ihse Bursie
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu  wrote:

> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

Some misc remarks from a build PoV:
 * We count language and region settings as a build environment requirement, 
not a portability issue. 
 * It is really a shame that Microsoft is making changes to these so darned 
hard. On all other platforms, LC_ALL=C in the make file just fixes the problem. 
:-(
 * But we do want the JDK to be easy to build, so this means that we might need 
to support building on more than en_US on Windows, at least until Microsoft 
get's their act together.

>From what I see in the discussion here there seems to be no clarity in what 
>range of character the specification allows. This needs to be absolutely clear 
>for any changes here -- we can't filter out legal characters just because they 
>are problematic to build on non en_US platforms.

However, I'm thinking that you need to take a step back and see what you are 
really trying to solve. To me, it seems that sscanf is not the right tool for 
the job, and the fact that it has worked until now is more a lucky coincidence. 
It seems, from a quick glance, that you should consider the input a byte array, 
and process it like that, instead of a string, if the encoding is unclear, and 
the spec is talking about character values (like 0x7f) rather than what 
characters they are supposed to represent in a specific encoding.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-30 Thread Jie Fu
On Wed, 29 Sep 2021 19:40:20 GMT, Vladimir Kozlov  wrote:

> `RANGEBASE` was added by 
> [JDK-6500501](https://bugs.openjdk.java.net/browse/JDK-6500501) and later was 
> modified by [JDK-8027829](https://bugs.openjdk.java.net/browse/JDK-8027829)
> 
> Note the original comment from 6500501:
> 
> ```
> // The characters allowed in a class or method name.  All characters > 0x7f
> // are allowed in order to handle obfuscated class files (e.g. Volano)
> ```

Thanks @vnkozlov for your very helpful comments.

I have one question: how can we specify (non-ascii chars) and (non-printable 
ascii chars) through `-XX:CompileCommand`?


I just learned from https://bugs.openjdk.java.net/browse/JDK-8027829 that we 
can use unicode like `\u`.
But it doesn't work in my experiments.

My example was made from: 
https://bugs.openjdk.java.net/secure/attachment/17128/UnicodeIdentifierTest.java

public class UnicodeIdentifierTest {
public static void main(String args[]) {
System.out.println("Can I use \\u0001 in identifier name? " +
   (Character.isJavaIdentifierPart(1) ? "yes" : "no"));
for (int i = 0; i < 10; i++ )
methodWithUnicode\u0001Char();

System.out.println("Can I use \\u00aa in identifier name? " +
   (Character.isJavaIdentifierPart(0xaa) ? "yes" : 
"no"));
for (int i = 0; i < 10; i++ )
methodWithUnicode\u00aaChar();

System.out.println("Can I use \\u006b in identifier name? " +
   (Character.isJavaIdentifierPart(0x6b) ? "yes" : 
"no"));
for (int i = 0; i < 10; i++ )
methodWithUnicode\u006bChar();

}
public static int a = 0;
public static void methodWithUnicode\u0001Char() {
a++;
}

public static void methodWithUnicode\u00aaChar() {
a++;
}

public static void methodWithUnicode\u006bChar() {
a++;
}
}


And I tried to exclude some specific methods like this

${JDK}/bin/java \
   -XX:+PrintCompilation \
   -XX:CompileCommand=exclude,`echo -e 
"UnicodeIdentifierTest::methodWithUnicode\u0001Char"` \
   -XX:CompileCommand=exclude,`echo -e 
"UnicodeIdentifierTest.methodWithUnicode\u0001Char"` \
   
-XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u0001Char" \
   
-XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u0001Char' \
   -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u0001Char 
\
   -XX:CompileCommand=exclude,`echo -e 
"UnicodeIdentifierTest::methodWithUnicode\u00aaChar"` \
   -XX:CompileCommand=exclude,`echo -e 
"UnicodeIdentifierTest.methodWithUnicode\u00aaChar"` \
   
-XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u00aaChar" \
   
-XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u00aaChar' \
   -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u00aaChar 
\
   -XX:CompileCommand=exclude,`echo -e 
"UnicodeIdentifierTest::methodWithUnicode\u006bChar"` \
   -XX:CompileCommand=exclude,`echo -e 
"UnicodeIdentifierTest.methodWithUnicode\u006bChar"` \
   
-XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u006bChar" \
   
-XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u006bChar' \
   -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u006bChar 
\
   ${TEST}


But none of them worked.

So if there is no other way to specify a non-ascii chars, it seems safe to 
remove the non-ascii code.

If I miss something, please let me know.
Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-29 Thread Vladimir Kozlov
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu  wrote:

> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

`RANGEBASE` was added by 
[JDK-6500501](https://bugs.openjdk.java.net/browse/JDK-6500501) and later was 
modified by [JDK-8027829](https://bugs.openjdk.java.net/browse/JDK-8027829)

Note the original comment from 6500501:

// The characters allowed in a class or method name.  All characters > 0x7f
// are allowed in order to handle obfuscated class files (e.g. Volano)

-

PR: https://git.openjdk.java.net/jdk/pull/5704


Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern

2021-09-29 Thread Vladimir Kozlov
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu  wrote:

> Hi all,
> 
> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019).
> However, I failed with C4474 and C4778 warnings as below:
> 
> Compiling 100 properties into resource bundles for java.desktop
> Compiling 3038 files for java.base
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error 
> C2220: the following warning is treated as an error
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4778: 'sscanf' : unterminated format string 
> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: 
> placeholders and their parameters expect 1 variadic arguments, but 3 were 
> provided
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4778: 'sscanf' : unterminated format string 
> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n'
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning 
> C4474: 'sscanf' : too many arguments passed for format string
> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: 
> placeholders and their parameters expect 0 variadic arguments, but 2 were 
> provided
> 
> 
> The failure is caused by non-ASCII chars in the format string of sscanf 
> [1][2], which is non-portable on our Windows platform.
> In fact, these non-ASCII coding also triggers C4819 warning, which had been 
> disabled in JDK-8216154 [3].
> And I also found an article showing that sscanf may fail with non-ASCII in 
> the format string [4].
> 
> So it would be nice to remove these non-ASCII chars  (`\x80 ~ \xef`).
> And I think it's safe to do so.
> 
> This is because:
>  1) There are actually no non-ASCII chars for package/class/method/signature 
> names.
>  2) I don't think there is a use case, in which people will input non-ASCII 
> for `CompileCommand`.
> 
> You may argue that the non-ASCII may be used by the parser itself.
> But I didn't find that usage at all. (Please let me know if I miss something.)
> 
> So I suggest to remove these non-ASCII code to make HotSpot to be more 
> portable.
> And if we do so, we can also remove the only one 
> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5].
> 
> Testing:
>  - Build tests on Windows
>  - tier1~3 on Linux/x64
> 
> Thanks.
> Best regards,
> Jie
> 
> [1] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269
> [2] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319
> [3] 
> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html
> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/
> [5] 
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246

This should be discussed with `build` group and may be `runtime` to get more 
comments.

-

PR: https://git.openjdk.java.net/jdk/pull/5704