Re: PING: RFR: 8204531: Remove unused chars following '\0'

Thomas Stüfe Wed, 13 Jun 2018 11:36:02 -0700

Hi Yasumasa,

looks good to me.


small nit:
import java.nio.charset.* -> import java.nio.charset.Charset;

We loose the assert for type now from byteArrayValue(). Can you re-add
it to the location where you call CStringUtils? We do not seem to need
byteArrayValue(), can this be removed?

Best Regards, Thomas



On Wed, Jun 13, 2018 at 3:33 PM, Yasumasa Suenaga <[email protected]> wrote:
> Thanks Thomas,
>
>> That said, I'd probably just add a variant to
>> CStringUtilities::getString which takes encoding as an argument, and
>> then pass "US-ASCII" to remain compatible with the current version of
>> jsnap.
>
>
> I agree with you. My opinion is one of the ideals.
> Anyway, I uploaded new webrev which is added charset support in
> CStringUtils:
>
>   http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/
>
>
> Thanks,
>
> Yasumasa
>
>
>
> On 2018/06/13 21:46, Thomas Stüfe wrote:
>>
>> Disclaimer: I am Reviewer but not versed in this corner of the jdk,
>> nor have knowledge of the history, so take what I suggest with a grain
>> of salt.
>>
>> That said, I'd probably just add a variant to
>> CStringUtilities::getString which takes encoding as an argument, and
>> then pass "US-ASCII" to remain compatible with the current version of
>> jsnap.
>>
>> ..Thomas
>>
>> On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <[email protected]>
>> wrote:
>>>
>>> Hi Thomas,
>>>
>>> Thank you for your comment.
>>>
>>> CStringUtilities.java says the String which is constructed from byte
>>> array
>>> should use Charset.defaultCharset(). [1]
>>> I'm not convinced how should we fix - US-ASCII or default charset.
>>>
>>> IMHO we should use default charset. However I concern I need to get CSR
>>> approvement.
>>> (In addition, I want to refactor CStringUtilities. I guess we can make
>>> more
>>> simple code :-)
>>>
>>>
>>> What do you think about it?
>>>
>>> Yasumasa
>>>
>>>
>>> [1]
>>>
>>> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73
>>>
>>>
>>>
>>> On 2018/06/13 21:18, Thomas Stüfe wrote:
>>>>
>>>>
>>>> Hi Yasumasa,
>>>>
>>>> your patch will change behavior of jsnap if file.encoding is set to
>>>> something different than US-ASCII.
>>>>
>>>> Old version uses, hard wired, US-ASCII.
>>>>
>>>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses
>>>>
>>>> private static String encoding = System.getProperty("file.encoding",
>>>> "US-ASCII");
>>>>
>>>> and thus may use a different encoding.
>>>>
>>>> Best Regards, Thomas
>>>>
>>>>
>>>>
>>>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <[email protected]>
>>>> wrote:
>>>>>
>>>>>
>>>>> PING: Could you review it?
>>>>> I want to merge this change before JDK 11 RDP 1.
>>>>>
>>>>>>>>       webrev:
>>>>>>>>         http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>
>>>>>>>>       JBS:
>>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Yasumasa
>>>>>
>>>>>
>>>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <[email protected]>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Yasumasa,
>>>>>>>
>>>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Please review this change:
>>>>>>>>
>>>>>>>>       webrev:
>>>>>>>>         http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>>>
>>>>>>>>       JBS:
>>>>>>>>         https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>>>
>>>>>>>>
>>>>>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>>>>>> String values in PerfData are defined as jbyte array, but JSnap
>>>>>>>> cannot
>>>>>>>> handle it well as following:
>>>>>>>>
>>>>>>>> ```
>>>>>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>>>>>
>>>>>>>> sun.gc.cause=No
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>>>>>> ```
>>>>>>>>
>>>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it
>>>>>>>> is
>>>>>>>> non-ascii character.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So your synopsis says "remove unused chars following \0". Is that
>>>>>>> really
>>>>>>> what is happening? I would have expected "new String(bytes,
>>>>>>> encoding)"
>>>>>>> to
>>>>>>> stop processing bytes when it encounters a NUL!
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> String c'tor will continue to parse byte array even if it finds '\0'.
>>>>>> You can see it on JShell as below:
>>>>>>
>>>>>> -------------
>>>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>>>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>>>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>>>>>
>>>>>> jshell> new String(bytes, "US-ASCII")
>>>>>> $2 ==> "abc\000def"
>>>>>> -------------
>>>>>>
>>>>>>
>>>>>>>> PerfDataEntry has null-terminated C string. So we should restore as
>>>>>>>> it
>>>>>>>> in
>>>>>>>> Java layer.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did
>>>>>>> we
>>>>>>> get
>>>>>>> such values? Shouldn't the byte array already be terminated at the
>>>>>>> NUL?
>>>>>>> Or
>>>>>>> is this just a raw representation of an array in the VM using the
>>>>>>> exact
>>>>>>> length of the array regardless of content?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think JSnap shows the array regardless of content.
>>>>>>
>>>>>>
>>>>>>> I see CStringUtilities.getString() both stops when it encounters a 0
>>>>>>> value,
>>>>>>> and uses the default file encoding (else ASCII).
>>>>>>>
>>>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly
>>>>>>> this
>>>>>>> "bad" data should have been stopped.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Yasumasa
>>>>>>>>
>>>>>>>
>>>>>
>>>
>

Re: PING: RFR: 8204531: Remove unused chars following '\0'

Reply via email to