Disclaimer: I am Reviewer but not versed in this corner of the jdk, nor have knowledge of the history, so take what I suggest with a grain of salt.
That said, I'd probably just add a variant to CStringUtilities::getString which takes encoding as an argument, and then pass "US-ASCII" to remain compatible with the current version of jsnap. ..Thomas On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasue...@gmail.com> wrote: > Hi Thomas, > > Thank you for your comment. > > CStringUtilities.java says the String which is constructed from byte array > should use Charset.defaultCharset(). [1] > I'm not convinced how should we fix - US-ASCII or default charset. > > IMHO we should use default charset. However I concern I need to get CSR > approvement. > (In addition, I want to refactor CStringUtilities. I guess we can make more > simple code :-) > > > What do you think about it? > > Yasumasa > > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73 > > > > On 2018/06/13 21:18, Thomas Stüfe wrote: >> >> Hi Yasumasa, >> >> your patch will change behavior of jsnap if file.encoding is set to >> something different than US-ASCII. >> >> Old version uses, hard wired, US-ASCII. >> >> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses >> >> private static String encoding = System.getProperty("file.encoding", >> "US-ASCII"); >> >> and thus may use a different encoding. >> >> Best Regards, Thomas >> >> >> >> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasue...@gmail.com> >> wrote: >>> >>> PING: Could you review it? >>> I want to merge this change before JDK 11 RDP 1. >>> >>>>>> webrev: >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/ >>>>>> >>>>>> JBS: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8204531 >>> >>> >>> >>> >>> Yasumasa >>> >>> >>> On 2018/06/08 15:38, Yasumasa Suenaga wrote: >>>> >>>> >>>> Hi David, >>>> >>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.hol...@oracle.com>: >>>>> >>>>> >>>>> Hi Yasumasa, >>>>> >>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Please review this change: >>>>>> >>>>>> webrev: >>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/ >>>>>> >>>>>> JBS: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8204531 >>>>>> >>>>>> >>>>>> We can use `jhsdb jsnap` to check all PerfData. >>>>>> String values in PerfData are defined as jbyte array, but JSnap cannot >>>>>> handle it well as following: >>>>>> >>>>>> ``` >>>>>> $ jhsdb jsnap --pid 28542 --all | less >>>>>> >>>>>> sun.gc.cause=No >>>>>> >>>>>> >>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ >>>>>> ``` >>>>>> >>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it is >>>>>> non-ascii character. >>>>> >>>>> >>>>> >>>>> >>>>> So your synopsis says "remove unused chars following \0". Is that >>>>> really >>>>> what is happening? I would have expected "new String(bytes, encoding)" >>>>> to >>>>> stop processing bytes when it encounters a NUL! >>>> >>>> >>>> >>>> String c'tor will continue to parse byte array even if it finds '\0'. >>>> You can see it on JShell as below: >>>> >>>> ------------- >>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c', >>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'} >>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 } >>>> >>>> jshell> new String(bytes, "US-ASCII") >>>> $2 ==> "abc\000def" >>>> ------------- >>>> >>>> >>>>>> PerfDataEntry has null-terminated C string. So we should restore as it >>>>>> in >>>>>> Java layer. >>>>> >>>>> >>>>> >>>>> >>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did we >>>>> get >>>>> such values? Shouldn't the byte array already be terminated at the NUL? >>>>> Or >>>>> is this just a raw representation of an array in the VM using the exact >>>>> length of the array regardless of content? >>>> >>>> >>>> >>>> I think JSnap shows the array regardless of content. >>>> >>>> >>>>> I see CStringUtilities.getString() both stops when it encounters a 0 >>>>> value, >>>>> and uses the default file encoding (else ASCII). >>>>> >>>>> The fix seems perfectly reasonable, I'm just unclear where exactly this >>>>> "bad" data should have been stopped. >>>> >>>> >>>> >>>> Thanks! >>>> >>>> >>>> Yasumasa >>>> >>>> >>>>> Thanks, >>>>> David >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Yasumasa >>>>>> >>>>> >>> >