Hi Yasumasa, looks good to me.
small nit: import java.nio.charset.* -> import java.nio.charset.Charset; We loose the assert for type now from byteArrayValue(). Can you re-add it to the location where you call CStringUtils? We do not seem to need byteArrayValue(), can this be removed? Best Regards, Thomas On Wed, Jun 13, 2018 at 3:33 PM, Yasumasa Suenaga <yasue...@gmail.com> wrote: > Thanks Thomas, > >> That said, I'd probably just add a variant to >> CStringUtilities::getString which takes encoding as an argument, and >> then pass "US-ASCII" to remain compatible with the current version of >> jsnap. > > > I agree with you. My opinion is one of the ideals. > Anyway, I uploaded new webrev which is added charset support in > CStringUtils: > > http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/ > > > Thanks, > > Yasumasa > > > > On 2018/06/13 21:46, Thomas Stüfe wrote: >> >> Disclaimer: I am Reviewer but not versed in this corner of the jdk, >> nor have knowledge of the history, so take what I suggest with a grain >> of salt. >> >> That said, I'd probably just add a variant to >> CStringUtilities::getString which takes encoding as an argument, and >> then pass "US-ASCII" to remain compatible with the current version of >> jsnap. >> >> ..Thomas >> >> On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasue...@gmail.com> >> wrote: >>> >>> Hi Thomas, >>> >>> Thank you for your comment. >>> >>> CStringUtilities.java says the String which is constructed from byte >>> array >>> should use Charset.defaultCharset(). [1] >>> I'm not convinced how should we fix - US-ASCII or default charset. >>> >>> IMHO we should use default charset. However I concern I need to get CSR >>> approvement. >>> (In addition, I want to refactor CStringUtilities. I guess we can make >>> more >>> simple code :-) >>> >>> >>> What do you think about it? >>> >>> Yasumasa >>> >>> >>> [1] >>> >>> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73 >>> >>> >>> >>> On 2018/06/13 21:18, Thomas Stüfe wrote: >>>> >>>> >>>> Hi Yasumasa, >>>> >>>> your patch will change behavior of jsnap if file.encoding is set to >>>> something different than US-ASCII. >>>> >>>> Old version uses, hard wired, US-ASCII. >>>> >>>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses >>>> >>>> private static String encoding = System.getProperty("file.encoding", >>>> "US-ASCII"); >>>> >>>> and thus may use a different encoding. >>>> >>>> Best Regards, Thomas >>>> >>>> >>>> >>>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasue...@gmail.com> >>>> wrote: >>>>> >>>>> >>>>> PING: Could you review it? >>>>> I want to merge this change before JDK 11 RDP 1. >>>>> >>>>>>>> webrev: >>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/ >>>>>>>> >>>>>>>> JBS: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8204531 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Yasumasa >>>>> >>>>> >>>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote: >>>>>> >>>>>> >>>>>> >>>>>> Hi David, >>>>>> >>>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.hol...@oracle.com>: >>>>>>> >>>>>>> >>>>>>> >>>>>>> Hi Yasumasa, >>>>>>> >>>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Please review this change: >>>>>>>> >>>>>>>> webrev: >>>>>>>> http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/ >>>>>>>> >>>>>>>> JBS: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8204531 >>>>>>>> >>>>>>>> >>>>>>>> We can use `jhsdb jsnap` to check all PerfData. >>>>>>>> String values in PerfData are defined as jbyte array, but JSnap >>>>>>>> cannot >>>>>>>> handle it well as following: >>>>>>>> >>>>>>>> ``` >>>>>>>> $ jhsdb jsnap --pid 28542 --all | less >>>>>>>> >>>>>>>> sun.gc.cause=No >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ >>>>>>>> ``` >>>>>>>> >>>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it >>>>>>>> is >>>>>>>> non-ascii character. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> So your synopsis says "remove unused chars following \0". Is that >>>>>>> really >>>>>>> what is happening? I would have expected "new String(bytes, >>>>>>> encoding)" >>>>>>> to >>>>>>> stop processing bytes when it encounters a NUL! >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> String c'tor will continue to parse byte array even if it finds '\0'. >>>>>> You can see it on JShell as below: >>>>>> >>>>>> ------------- >>>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c', >>>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'} >>>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 } >>>>>> >>>>>> jshell> new String(bytes, "US-ASCII") >>>>>> $2 ==> "abc\000def" >>>>>> ------------- >>>>>> >>>>>> >>>>>>>> PerfDataEntry has null-terminated C string. So we should restore as >>>>>>>> it >>>>>>>> in >>>>>>>> Java layer. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did >>>>>>> we >>>>>>> get >>>>>>> such values? Shouldn't the byte array already be terminated at the >>>>>>> NUL? >>>>>>> Or >>>>>>> is this just a raw representation of an array in the VM using the >>>>>>> exact >>>>>>> length of the array regardless of content? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> I think JSnap shows the array regardless of content. >>>>>> >>>>>> >>>>>>> I see CStringUtilities.getString() both stops when it encounters a 0 >>>>>>> value, >>>>>>> and uses the default file encoding (else ASCII). >>>>>>> >>>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly >>>>>>> this >>>>>>> "bad" data should have been stopped. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Thanks! >>>>>> >>>>>> >>>>>> Yasumasa >>>>>> >>>>>> >>>>>>> Thanks, >>>>>>> David >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Yasumasa >>>>>>>> >>>>>>> >>>>> >>> >