Re: PING: RFR: 8204531: Remove unused chars following '\0'

Thomas Stüfe Wed, 13 Jun 2018 05:47:12 -0700

Disclaimer: I am Reviewer but not versed in this corner of the jdk,
nor have knowledge of the history, so take what I suggest with a grain
of salt.


That said, I'd probably just add a variant to
CStringUtilities::getString which takes encoding as an argument, and
then pass "US-ASCII" to remain compatible with the current version of
jsnap.

..Thomas

On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasue...@gmail.com> wrote:
> Hi Thomas,
>
> Thank you for your comment.
>
> CStringUtilities.java says the String which is constructed from byte array
> should use Charset.defaultCharset(). [1]
> I'm not convinced how should we fix - US-ASCII or default charset.
>
> IMHO we should use default charset. However I concern I need to get CSR
> approvement.
> (In addition, I want to refactor CStringUtilities. I guess we can make more
> simple code :-)
>
>
> What do you think about it?
>
> Yasumasa
>
>
> [1]
> http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73
>
>
>
> On 2018/06/13 21:18, Thomas Stüfe wrote:
>>
>> Hi Yasumasa,
>>
>> your patch will change behavior of jsnap if file.encoding is set to
>> something different than US-ASCII.
>>
>> Old version uses, hard wired, US-ASCII.
>>
>> You now use jvm/hotspot/utilities/CStringUtilities.java, which uses
>>
>> private static String encoding = System.getProperty("file.encoding",
>> "US-ASCII");
>>
>> and thus may use a different encoding.
>>
>> Best Regards, Thomas
>>
>>
>>
>> On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasue...@gmail.com>
>> wrote:
>>>
>>> PING: Could you review it?
>>> I want to merge this change before JDK 11 RDP 1.
>>>
>>>>>>      webrev:
>>>>>>        http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>
>>>>>>      JBS:
>>>>>>        https://bugs.openjdk.java.net/browse/JDK-8204531
>>>
>>>
>>>
>>>
>>> Yasumasa
>>>
>>>
>>> On 2018/06/08 15:38, Yasumasa Suenaga wrote:
>>>>
>>>>
>>>> Hi David,
>>>>
>>>> 2018-06-08 10:47 GMT+09:00 David Holmes <david.hol...@oracle.com>:
>>>>>
>>>>>
>>>>> Hi Yasumasa,
>>>>>
>>>>> On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Please review this change:
>>>>>>
>>>>>>      webrev:
>>>>>>        http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/
>>>>>>
>>>>>>      JBS:
>>>>>>        https://bugs.openjdk.java.net/browse/JDK-8204531
>>>>>>
>>>>>>
>>>>>> We can use `jhsdb jsnap` to check all PerfData.
>>>>>> String values in PerfData are defined as jbyte array, but JSnap cannot
>>>>>> handle it well as following:
>>>>>>
>>>>>> ```
>>>>>> $ jhsdb jsnap --pid 28542 --all | less
>>>>>>
>>>>>> sun.gc.cause=No
>>>>>>
>>>>>>
>>>>>> GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
>>>>>> ```
>>>>>>
>>>>>> You can see this value via `less` and `vim` on Linux. `^@` shows it is
>>>>>> non-ascii character.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> So your synopsis says "remove unused chars following \0". Is that
>>>>> really
>>>>> what is happening? I would have expected "new String(bytes, encoding)"
>>>>> to
>>>>> stop processing bytes when it encounters a NUL!
>>>>
>>>>
>>>>
>>>> String c'tor will continue to parse byte array even if it finds '\0'.
>>>> You can see it on JShell as below:
>>>>
>>>> -------------
>>>> jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
>>>> (byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
>>>> bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }
>>>>
>>>> jshell> new String(bytes, "US-ASCII")
>>>> $2 ==> "abc\000def"
>>>> -------------
>>>>
>>>>
>>>>>> PerfDataEntry has null-terminated C string. So we should restore as it
>>>>>> in
>>>>>> Java layer.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> If the issue is really "junk" beyond the \0 nul-terminator, how did we
>>>>> get
>>>>> such values? Shouldn't the byte array already be terminated at the NUL?
>>>>> Or
>>>>> is this just a raw representation of an array in the VM using the exact
>>>>> length of the array regardless of content?
>>>>
>>>>
>>>>
>>>> I think JSnap shows the array regardless of content.
>>>>
>>>>
>>>>> I see CStringUtilities.getString() both stops when it encounters a 0
>>>>> value,
>>>>> and uses the default file encoding (else ASCII).
>>>>>
>>>>> The fix seems perfectly reasonable, I'm just unclear where exactly this
>>>>> "bad" data should have been stopped.
>>>>
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> Yasumasa
>>>>
>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Yasumasa
>>>>>>
>>>>>
>>>
>

Re: PING: RFR: 8204531: Remove unused chars following '\0'

Reply via email to