Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Tue, 5 Oct 2021 06:38:05 GMT, Ioi Lam wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > My experiments above with ` -XX:CompileCommand='compileonly,*::爪哇'` was done > on Linux. I tried doing the same on Windows. On US-English Windows, the > default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then > Java is able to output CJK characters to the console. > > > public class CJK { > public static void main(String args[]) { > System.out.println(args[0]); > \u722a\u54c7(); > } > > static void \u722a\u54c7() { // Chinese word for "Java" > Thread.dumpStack(); > } > } > > > > c:\ade>chcp > Active code page: 437 > > c:\ade>jdk-17\bin\java -cp . CJK 123 > 123 > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.??(CJK.java:8) > at CJK.main(CJK.java:4) > > c:\ade>chcp 65001 > Active code page: 65001 > > c:\ade>jdk-17\bin\java -cp . CJK 爪哇 > ?? > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.爪哇(CJK.java:8) > at CJK.main(CJK.java:4) > > > As you can see, the CJK characters in the command-line arguments can't even > be correctly passed as arguments to the Java main class. If that doesn't > work, I can't see how we can get `-XX:CompileCommand` to work with CJK > characters. Thanks @iklam @magicus and @vnkozlov . - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit > since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII Passed my tier1-3 testing - Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit > since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII Looks good to me. Let me test it before approval. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit > since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII I think this was the best possible solution. - Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]
On Wed, 6 Oct 2021 05:17:28 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit > since the last revision: > > Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII Marked as reviewed by iklam (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v3]
> Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Split with RANGEBASE_ASCII and RANGEBASE_NON_ASCII - Changes: - all: https://git.openjdk.java.net/jdk/pull/5704/files - new: https://git.openjdk.java.net/jdk/pull/5704/files/e1271085..d0070680 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=01-02 Stats: 15 lines in 1 file changed: 1 ins; 9 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704 PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2]
On Wed, 6 Oct 2021 04:30:12 GMT, Ioi Lam wrote: > It's hard to tell what's the difference between these two RANGEBASE > definitions. How about doing it like this to make the code more readable? > > ``` > #define RANGEBASE_ASCII "." > #define RANGEBASE_NON_ASCII "." > #ifdef WINDOWS > #define RANGEBASE RANGEBASE_ASCII > #else > #define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII > #endif > ``` Good suggestion! Updated. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2]
On Wed, 6 Oct 2021 02:33:30 GMT, Jie Fu wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > Jie Fu has updated the pull request incrementally with one additional commit > since the last revision: > > Disable non-ASCII for Windows only The idea looks good to me. I just have a suggestion to make the code more readable. src/hotspot/share/compiler/methodMatcher.cpp line 77: > 75: "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \ > 76: "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f" > 77: #endif It's hard to tell what's the difference between these two RANGEBASE definitions. How about doing it like this to make the code more readable? #define RANGEBASE_ASCII "." #define RANGEBASE_NON_ASCII "." #ifdef WINDOWS #define RANGEBASE RANGEBASE_ASCII #else #define RANGEBASE RANGEBASE_ASCII RANGEBASE_NON_ASCII #endif - Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Tue, 5 Oct 2021 06:38:05 GMT, Ioi Lam wrote: >> Hi all, >> >> I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). >> However, I failed with C4474 and C4778 warnings as below: >> >> Compiling 100 properties into resource bundles for java.desktop >> Compiling 3038 files for java.base >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error >> C2220: the following warning is treated as an error >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4778: 'sscanf' : unterminated format string >> '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: >> placeholders and their parameters expect 1 variadic arguments, but 3 were >> provided >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4778: 'sscanf' : unterminated format string >> '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning >> C4474: 'sscanf' : too many arguments passed for format string >> e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: >> placeholders and their parameters expect 0 variadic arguments, but 2 were >> provided >> >> >> The failure is caused by non-ASCII chars in the format string of sscanf >> [1][2], which is non-portable on our Windows platform. >> In fact, these non-ASCII coding also triggers C4819 warning, which had been >> disabled in JDK-8216154 [3]. >> And I also found an article showing that sscanf may fail with non-ASCII in >> the format string [4]. >> >> So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). >> And I think it's safe to do so. >> >> This is because: >> 1) There are actually no non-ASCII chars for package/class/method/signature >> names. >> 2) I don't think there is a use case, in which people will input non-ASCII >> for `CompileCommand`. >> >> You may argue that the non-ASCII may be used by the parser itself. >> But I didn't find that usage at all. (Please let me know if I miss >> something.) >> >> So I suggest to remove these non-ASCII code to make HotSpot to be more >> portable. >> And if we do so, we can also remove the only one >> `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. >> >> Testing: >> - Build tests on Windows >> - tier1~3 on Linux/x64 >> >> Thanks. >> Best regards, >> Jie >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 >> [2] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 >> [3] >> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html >> [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ >> [5] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 > > My experiments above with ` -XX:CompileCommand='compileonly,*::爪哇'` was done > on Linux. I tried doing the same on Windows. On US-English Windows, the > default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then > Java is able to output CJK characters to the console. > > > public class CJK { > public static void main(String args[]) { > System.out.println(args[0]); > \u722a\u54c7(); > } > > static void \u722a\u54c7() { // Chinese word for "Java" > Thread.dumpStack(); > } > } > > > > c:\ade>chcp > Active code page: 437 > > c:\ade>jdk-17\bin\java -cp . CJK 123 > 123 > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.??(CJK.java:8) > at CJK.main(CJK.java:4) > > c:\ade>chcp 65001 > Active code page: 65001 > > c:\ade>jdk-17\bin\java -cp . CJK 爪哇 > ?? > java.lang.Exception: Stack trace > at java.base/java.lang.Thread.dumpStack(Thread.java:1380) > at CJK.爪哇(CJK.java:8) > at CJK.main(CJK.java:4) > > > As you can see, the CJK characters in the command-line arguments can't even > be correctly passed as arguments to the Java main class. If that doesn't > work, I can't see how we can get `-XX:CompileCommand` to work with CJK > characters. Thanks @iklam and @magicus for your experiments and comments. My experiments show that CompileCommand doesn't work with non-US-English env Windows. And @iklam 's experiments show that it doesn't work with US-English env Windows
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern [v2]
> Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Disable non-ASCII for Windows only - Changes: - all: https://git.openjdk.java.net/jdk/pull/5704/files - new: https://git.openjdk.java.net/jdk/pull/5704/files/d4b84f2b..e1271085 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5704&range=00-01 Stats: 30 lines in 1 file changed: 30 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5704/head:pull/5704 PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On 2021-10-05 08:41, Ioi Lam wrote: As you can see, the CJK characters in the command-line arguments can't even be correctly passed as arguments to the Java main class. If that doesn't work, I can't see how we can get `-XX:CompileCommand` to work with CJK characters. So, what does that mean? That we should explicitly limit `-XX:CompileCommand`to work with ASCII-only arguments? I accept that we might not get all characters to work in all circumstances due to limitations in Windows, but the current state of affairs still feel unsatisfactory. We should at least have a better failure mode, and document any limitations. /Magnus
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 My experiments above with ` -XX:CompileCommand='compileonly,*::爪哇'` was done on Linux. I tried doing the same on Windows. On US-English Windows, the default codepage is 437 (DOS Latin US). If I change it to 65001 (UTF8) then Java is able to output CJK characters to the console. public class CJK { public static void main(String args[]) { System.out.println(args[0]); \u722a\u54c7(); } static void \u722a\u54c7() { // Chinese word for "Java" Thread.dumpStack(); } } c:\ade>chcp Active code page: 437 c:\ade>jdk-17\bin\java -cp . CJK 123 123 java.lang.Exception: Stack trace at java.base/java.lang.Thread.dumpStack(Thread.java:1380) at CJK.??(CJK.java:8) at CJK.main(CJK.java:4) c:\ade>chcp 65001 Active code page: 65001 c:\ade>jdk-17\bin\java -cp . CJK 爪哇 ?? java.lang.Exception: Stack trace at java.base/java.lang.Thread.dumpStack(Thread.java:1380) at CJK.爪哇(CJK.java:8) at CJK.main(CJK.java:4) As you can see, the CJK characters in the command-line arguments can't even be correctly passed as arguments to the Java main class. If that doesn't work, I can't see how we can get `-XX:CompileCommand` to work with CJK characters. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Thu, 30 Sep 2021 23:41:20 GMT, Jie Fu wrote: > > I will do your experiment next week. This is because it's already our > > National Day week and I can't find an English Windows machine until next > > week. I'll let you know the result as soon as possible. Thanks. > > No need to hurry :-). In case you can't find an English Windows, I think you > can use the `chcp 65001` command mentioned in > https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line > to change your command-line window to use the UTF8 codepage. Hi @iklam , methodMatcher.obj [1] built with `System Locale: zh-cn;Chinese (China)` methodMatcher.obj [2] built with `System Locale: en-us;English (United States)"` There seems no difference when checking with `strings methodMatcher.obj`. The warnings disappear when the system locale is `en-us;English (United States)`. But unfortunately, I can't reproduce the "CJK" test example, which means non-ASCII chars for CompileCommand still fail for both jdk images (even when built with en-us locale, no warnings at all). So it's far more complicated than I had thought. I will just close this pr since we can't remove the non-ASCII code, which works in some countries. Thank you all for your help and valuable comments. Best regards, Jie [1] https://github.com/DamonFool/experiment/blob/main/JDK-8274329/ch-methodMatcher.obj [2] https://github.com/DamonFool/experiment/blob/main/JDK-8274329/en-methodMatcher.obj - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Thu, 30 Sep 2021 23:30:29 GMT, Ioi Lam wrote: > > I will do your experiment next week. This is because it's already our > > National Day week and I can't find an English Windows machine until next > > week. I'll let you know the result as soon as possible. Thanks. > > No need to hurry :-). In case you can't find an English Windows, I think you > can use the `chcp 65001` command mentioned in > https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line > to change your command-line window to use the UTF8 codepage. Okay. I also note the warning e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning C4778: 'sscanf' : unterminated format string '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' It is already different with the original `RANGEBASE` sequence [1]. ~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n vs. "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f" \ "\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f" \ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf" \ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf" \ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf" \ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf" \ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef" Not sure if this fact is sufficient to say the literal strings will be different in methodMatcher.obj. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L49 - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Thu, 30 Sep 2021 23:04:17 GMT, Jie Fu wrote: > I will do your experiment next week. This is because it's already our > National Day week and I can't find an English Windows machine until next > week. I'll let you know the result as soon as possible. Thanks. No need to hurry :-). In case you can't find an English Windows, I think you can use the `chcp 65001` command mentioned in https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line to change your command-line window to use the UTF8 codepage. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Thu, 30 Sep 2021 17:53:39 GMT, Ioi Lam wrote: > The current limitations of the MethodMather class are: > > [1] `parse_method_pattern(char*& line, ...)` requires `line` to be a > UTF8-encoded byte sequence. Essentially, this means that for > -XX:CompileCommand to support non-ASCII characters, the JVM (and the shell > that runs the JVM) must be using UTF8 character encoding. > > Note that a "locale" contains 3 parts: language, country and character > encoding. For example, > > * en_US.utf8 (English language, United States, UTF8 character encoding) > * zh_CN.utf8 (Chinese language, China, UTF8 character encoding) > * zh_CN.gbk (Chinese language, China, GBK character encoding) > > The first two support non-ASCII characters in -XX:CompileCommand, but the > third one doesn't. > > [2] MethodMather uses `sscanf`. It assumes that the JVM is running with UTF8 > character encoding because > > * It uses `char*` strings returned by `sscanf` to compare with the bytes > stored in Symbols. This requires `sscanf` to return strings that are encoded > in UTF8, because Symbols stores the string with UTF8-encoded bytes. > * It relies on range checking by `sscanf` to enforce the following > restrictions. However, these restrictions are given with the RANGE macro > which is UTF8 encoded bytes (and I suspect that these are incorrect when > handling multi-byte UTF8-encoded characters): > > ``` > // '\0' and 0xf0-0xff are disallowed in constant string values > // 0x20 ' ', 0x09 '\t' and, 0x2c ',' are used in the matching > // 0x5b '[' and 0x5d ']' can not be used because of the matcher > // 0x28 '(' and 0x29 ')' are used for the signature > // 0x2e '.' is always replaced before the matching > // 0x2f '/' is only used in the class name as package separator > ``` > > == Proposed solutions: > > I don't think we can solve [1] easily. To handle non-ASCII characters that > are non encoded in UTF8, we need to call NewPlatformString() in > src/java.base/share/native/libjli/java.c. However, this executes Java code, > but -XX:CompileCommand needs to be processed before any Java code execution. > ==> Proposal: IGNORE it for now. > > For [2], there are two distinct issues: > > (a) The restriction checks are invalid when the JVM is running in an non-UTF8 > encoding -- this is a moot point because we can't handle [1] anyway, so the > data given to sscanf() is already bad. => Proposal: IGNORE it for now > > (b) VC++ compilation warning when methodMather.cpp is compiled in non-UTF8 > environments > > This is just a warning, and (I think .) it doesn't change the object file > at all. I.e., the literal strings in methodMatcher.obj are exactly the same > as if methodMather.cpp is compiled under a UTF8 environment. > > Proposal: use pragma to disable the warning. Assuming that my analysis for > [1] and (a) is correct, there's no reason to fix the sscanf code. Disabling > the warnings with pragma is the most painless and easiest way to handle this. > > @DamonFool could you try this experiment: > > * Implement the pragma and build two JDKs -- one in a Chinese Windows > environment, and another in an English Windows environment > * run `strings methodMatcher.obj` and see if the output is identical > * run the "CJK" test example in my previous comment, and see if you get > identical results with both JDKs > > * On Windows, you may need to do this to force the terminal to be using > UTF8 code page. See > https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line > > (If this doesn't work, an alternative is to avoid using sscanf and write our > own parser). > > Thanks Thanks @iklam for your excellent analysis. So HotSpot does support non-ASCII chars as names. Then we shouldn't simply remove such non-ASCII code. I will do your experiment next week. This is because it's already our National Day week and I can't find an English Windows machine until next week. I'll let you know the result as soon as possible. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 The current limitations of the MethodMather class are: [1] `parse_method_pattern(char*& line, ...)` requires `line` to be a UTF8-encoded byte sequence. Essentially, this means that for -XX:CompileCommand to support non-ASCII characters, the JVM (and the shell that runs the JVM) must be using UTF8 character encoding. Note that a "locale" contains 3 parts: language, country and character encoding. For example, - en_US.utf8 (English language, United States, UTF8 character encoding) - zh_CN.utf8 (Chinese language, China, UTF8 character encoding) - zh_CN.gbk (Chinese language, China, GBK character encoding) The first two support non-ASCII characters in -XX:CompileCommand, but the third one doesn't. [2] MethodMather uses `sscanf`. It assumes that the JVM is running with UTF8 character encoding because - It uses `char*` strings returned by `sscanf` to compare with the bytes stored in Symbols. This requires `sscanf` to return strings that are encoded in UTF8, because Symbols stores the string with UTF8-encoded bytes. - It relies on range checking by `sscanf` to enforce the following restrictions. However, these restrictions are given with the RANGE macro which is UTF8 encoded bytes (and I suspect that these are incorrect when handling multi-byte UTF8-encoded characters): // '\0' and 0xf0-0xff are disallowed in constant string values // 0x20 ' ', 0x09 '\t' and, 0x2c ',' are used in the matching // 0x5b '[' and 0x5d ']' can not be used because of the matcher // 0x28 '(' and 0x29 ')' are used for the signature // 0x2e '.' is always replaced before the matching // 0x2f '/' is only used
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Thu, 30 Sep 2021 08:31:51 GMT, Jie Fu wrote: > > `RANGEBASE` was added by > > [JDK-6500501](https://bugs.openjdk.java.net/browse/JDK-6500501) and later > > was modified by > > [JDK-8027829](https://bugs.openjdk.java.net/browse/JDK-8027829) > > Note the original comment from 6500501: > > ``` > > // The characters allowed in a class or method name. All characters > 0x7f > > // are allowed in order to handle obfuscated class files (e.g. Volano) > > ``` > > Thanks @vnkozlov for your very helpful comments. > > I have one question: how can we specify (non-ascii chars) and (non-printable > ascii chars) through `-XX:CompileCommand`? > > I just learned from https://bugs.openjdk.java.net/browse/JDK-8027829 that we > can use unicode like `\u`. But it doesn't work in my experiments. > > My example was made from: > https://bugs.openjdk.java.net/secure/attachment/17128/UnicodeIdentifierTest.java > > ``` > public class UnicodeIdentifierTest { > public static void main(String args[]) { > System.out.println("Can I use \\u0001 in identifier name? " + >(Character.isJavaIdentifierPart(1) ? "yes" : > "no")); > for (int i = 0; i < 10; i++ ) > methodWithUnicode\u0001Char(); > > System.out.println("Can I use \\u00aa in identifier name? " + >(Character.isJavaIdentifierPart(0xaa) ? "yes" : > "no")); > for (int i = 0; i < 10; i++ ) > methodWithUnicode\u00aaChar(); > > System.out.println("Can I use \\u006b in identifier name? " + >(Character.isJavaIdentifierPart(0x6b) ? "yes" : > "no")); > for (int i = 0; i < 10; i++ ) > methodWithUnicode\u006bChar(); > > } > public static int a = 0; > public static void methodWithUnicode\u0001Char() { > a++; > } > > public static void methodWithUnicode\u00aaChar() { > a++; > } > > public static void methodWithUnicode\u006bChar() { > a++; > } > } > ``` > > And I tried to exclude some specific methods like this > > ``` > ${JDK}/bin/java \ >-XX:+PrintCompilation \ >-XX:CompileCommand=exclude,`echo -e > "UnicodeIdentifierTest::methodWithUnicode\u0001Char"` \ >-XX:CompileCommand=exclude,`echo -e > "UnicodeIdentifierTest.methodWithUnicode\u0001Char"` \ > > -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u0001Char" > \ > > -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u0001Char' > \ > > -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u0001Char \ >-XX:CompileCommand=exclude,`echo -e > "UnicodeIdentifierTest::methodWithUnicode\u00aaChar"` \ >-XX:CompileCommand=exclude,`echo -e > "UnicodeIdentifierTest.methodWithUnicode\u00aaChar"` \ > > -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u00aaChar" > \ > > -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u00aaChar' > \ > > -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u00aaChar \ >-XX:CompileCommand=exclude,`echo -e > "UnicodeIdentifierTest::methodWithUnicode\u006bChar"` \ >-XX:CompileCommand=exclude,`echo -e > "UnicodeIdentifierTest.methodWithUnicode\u006bChar"` \ > > -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u006bChar" > \ > > -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u006bChar' > \ > > -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u006bChar \ >${TEST} > ``` > > But none of them worked. > > So if there is no other way to specify a non-ascii chars, it seems safe to > remove the non-ascii code. > > If I miss something, please let me know. Thanks. (The Chinese characters in this comment may not be displayed properly inside an e-mail reader. Please see this comment on GitHub https://github.com/openjdk/jdk/pull/5704) -XX:CompileCommand does not process \u sequences. However, if your shell's locale is UTF8, you can do something like this, by directly entering them on the command-line, without escaping with \u: public class CJK { public static void main(String args[]) { \u722a\u54c7(); } static void \u722a\u54c7() { // Chinese word for "Java" Thread.dumpStack(); } } === $ locale LANG=en_US.UTF-8 LANGUAGE= LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= $ javac CJK.java $ java -Xcomp -XX:-BackgroundCompilation -XX:CompileCommand='compileonly,*::爪哇' -XX:+PrintCompilation -cp . CJK > log.txt java.lang.Exception: Stack trace at java.base/java.lang.Thread.dumpStack(Thread.java:1380) at CJK.爪哇 (CJK.java:7) at C
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Thu, 30 Sep 2021 10:01:52 GMT, Magnus Ihse Bursie wrote: > Some misc remarks from a build PoV: > > * We count language and region settings as a build environment requirement, > not a portability issue. > * It is really a shame that Microsoft is making changes to these so darned > hard. On all other platforms, LC_ALL=C in the make file just fixes the > problem. :-( > * But we do want the JDK to be easy to build, so this means that we might > need to support building on more than en_US on Windows, at least until > Microsoft get's their act together. > > From what I see in the discussion here there seems to be no clarity in what > range of character the specification allows. This needs to be absolutely > clear for any changes here -- we can't filter out legal characters just > because they are problematic to build on non en_US platforms. > > However, I'm thinking that you need to take a step back and see what you are > really trying to solve. To me, it seems that sscanf is not the right tool for > the job, and the fact that it has worked until now is more a lucky > coincidence. It seems, from a quick glance, that you should consider the > input a byte array, and process it like that, instead of a string, if the > encoding is unclear, and the spec is talking about character values (like > 0x7f) rather than what characters they are supposed to represent in a > specific encoding. Thanks @magicus . The background is that we want to build CI/CD pipelines for Windows platforms to help the OpenJDK development. We already have enough Linux and MacOS pipelines but still not have one for Windows. So we just plan to setup some Windows pipelines to further improve OpenJDK product quality. But to my surprise, OpenJDK fails to build on our Windows platforms. You may suggest changing the locale settings. But many of our Apps don't allow us to changet it since we are non-English speaking country. It's unfortunate that OpenJDK can't build on our Windows platforms. It's not our goal to make CompileCommand work with non-ASCII chars. If it doesn't make anything worse, we can just remove the non-ASCII code to make it to be more portable. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 Some misc remarks from a build PoV: * We count language and region settings as a build environment requirement, not a portability issue. * It is really a shame that Microsoft is making changes to these so darned hard. On all other platforms, LC_ALL=C in the make file just fixes the problem. :-( * But we do want the JDK to be easy to build, so this means that we might need to support building on more than en_US on Windows, at least until Microsoft get's their act together. >From what I see in the discussion here there seems to be no clarity in what >range of character the specification allows. This needs to be absolutely clear >for any changes here -- we can't filter out legal characters just because they >are problematic to build on non en_US platforms. However, I'm thinking that you need to take a step back and see what you are really trying to solve. To me, it seems that sscanf is not the right tool for the job, and the fact that it has worked until now is more a lucky coincidence. It seems, from a quick glance, that you should consider the input a byte array, and process it like that, instead of a string, if the encoding is unclear, and the spec is talking about character values (like 0x7f) rather than what characters they are supposed to represent in a specific encoding. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Wed, 29 Sep 2021 19:40:20 GMT, Vladimir Kozlov wrote: > `RANGEBASE` was added by > [JDK-6500501](https://bugs.openjdk.java.net/browse/JDK-6500501) and later was > modified by [JDK-8027829](https://bugs.openjdk.java.net/browse/JDK-8027829) > > Note the original comment from 6500501: > > ``` > // The characters allowed in a class or method name. All characters > 0x7f > // are allowed in order to handle obfuscated class files (e.g. Volano) > ``` Thanks @vnkozlov for your very helpful comments. I have one question: how can we specify (non-ascii chars) and (non-printable ascii chars) through `-XX:CompileCommand`? I just learned from https://bugs.openjdk.java.net/browse/JDK-8027829 that we can use unicode like `\u`. But it doesn't work in my experiments. My example was made from: https://bugs.openjdk.java.net/secure/attachment/17128/UnicodeIdentifierTest.java public class UnicodeIdentifierTest { public static void main(String args[]) { System.out.println("Can I use \\u0001 in identifier name? " + (Character.isJavaIdentifierPart(1) ? "yes" : "no")); for (int i = 0; i < 10; i++ ) methodWithUnicode\u0001Char(); System.out.println("Can I use \\u00aa in identifier name? " + (Character.isJavaIdentifierPart(0xaa) ? "yes" : "no")); for (int i = 0; i < 10; i++ ) methodWithUnicode\u00aaChar(); System.out.println("Can I use \\u006b in identifier name? " + (Character.isJavaIdentifierPart(0x6b) ? "yes" : "no")); for (int i = 0; i < 10; i++ ) methodWithUnicode\u006bChar(); } public static int a = 0; public static void methodWithUnicode\u0001Char() { a++; } public static void methodWithUnicode\u00aaChar() { a++; } public static void methodWithUnicode\u006bChar() { a++; } } And I tried to exclude some specific methods like this ${JDK}/bin/java \ -XX:+PrintCompilation \ -XX:CompileCommand=exclude,`echo -e "UnicodeIdentifierTest::methodWithUnicode\u0001Char"` \ -XX:CompileCommand=exclude,`echo -e "UnicodeIdentifierTest.methodWithUnicode\u0001Char"` \ -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u0001Char" \ -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u0001Char' \ -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u0001Char \ -XX:CompileCommand=exclude,`echo -e "UnicodeIdentifierTest::methodWithUnicode\u00aaChar"` \ -XX:CompileCommand=exclude,`echo -e "UnicodeIdentifierTest.methodWithUnicode\u00aaChar"` \ -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u00aaChar" \ -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u00aaChar' \ -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u00aaChar \ -XX:CompileCommand=exclude,`echo -e "UnicodeIdentifierTest::methodWithUnicode\u006bChar"` \ -XX:CompileCommand=exclude,`echo -e "UnicodeIdentifierTest.methodWithUnicode\u006bChar"` \ -XX:CompileCommand=exclude,"UnicodeIdentifierTest.methodWithUnicode\u006bChar" \ -XX:CompileCommand=exclude,'UnicodeIdentifierTest.methodWithUnicode\u006bChar' \ -XX:CompileCommand=exclude,UnicodeIdentifierTest.methodWithUnicode\u006bChar \ ${TEST} But none of them worked. So if there is no other way to specify a non-ascii chars, it seems safe to remove the non-ascii code. If I miss something, please let me know. Thanks. - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 `RANGEBASE` was added by [JDK-6500501](https://bugs.openjdk.java.net/browse/JDK-6500501) and later was modified by [JDK-8027829](https://bugs.openjdk.java.net/browse/JDK-8027829) Note the original comment from 6500501: // The characters allowed in a class or method name. All characters > 0x7f // are allowed in order to handle obfuscated class files (e.g. Volano) - PR: https://git.openjdk.java.net/jdk/pull/5704
Re: RFR: 8274329: Fix non-portable HotSpot code in MethodMatcher::parse_method_pattern
On Sun, 26 Sep 2021 09:55:00 GMT, Jie Fu wrote: > Hi all, > > I tried to build OpenJDK on Cygwin (Windows 2016 + VS2019). > However, I failed with C4474 and C4778 warnings as below: > > Compiling 100 properties into resource bundles for java.desktop > Compiling 3038 files for java.base > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): error > C2220: the following warning is treated as an error > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4778: 'sscanf' : unterminated format string > '%255[*\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e\x97%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(269): note: > placeholders and their parameters expect 1 variadic arguments, but 3 were > provided > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4778: 'sscanf' : unterminated format string > '%1022[[);/\x01\x02\x03\x04\x05\x06\a\b\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f!"#$%&'*+,-0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\^_`abcdefghijklmnopqrstuvwxyz{|}~\xe2\x82\xac\xe4\xba\x97\xe5\x84\x8e\xe5\x8e%n' > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): warning > C4474: 'sscanf' : too many arguments passed for format string > e:\jiefu\ws\jdk\src\hotspot\share\compiler\methodMatcher.cpp(319): note: > placeholders and their parameters expect 0 variadic arguments, but 2 were > provided > > > The failure is caused by non-ASCII chars in the format string of sscanf > [1][2], which is non-portable on our Windows platform. > In fact, these non-ASCII coding also triggers C4819 warning, which had been > disabled in JDK-8216154 [3]. > And I also found an article showing that sscanf may fail with non-ASCII in > the format string [4]. > > So it would be nice to remove these non-ASCII chars (`\x80 ~ \xef`). > And I think it's safe to do so. > > This is because: > 1) There are actually no non-ASCII chars for package/class/method/signature > names. > 2) I don't think there is a use case, in which people will input non-ASCII > for `CompileCommand`. > > You may argue that the non-ASCII may be used by the parser itself. > But I didn't find that usage at all. (Please let me know if I miss something.) > > So I suggest to remove these non-ASCII code to make HotSpot to be more > portable. > And if we do so, we can also remove the only one > `PRAGMA_DISABLE_MSVC_WARNING(4819)` [5]. > > Testing: > - Build tests on Windows > - tier1~3 on Linux/x64 > > Thanks. > Best regards, > Jie > > [1] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L269 > [2] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L319 > [3] > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2019-January/032014.html > [4] https://jeffpar.github.io/kbarchive/kb/047/Q47369/ > [5] > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/methodMatcher.cpp#L246 This should be discussed with `build` group and may be `runtime` to get more comments. - PR: https://git.openjdk.java.net/jdk/pull/5704