Re: PING: RFR: 8204531: Remove unused chars following '\0'

Yasumasa Suenaga Wed, 13 Jun 2018 05:39:20 -0700

Hi Thomas,

Thank you for your comment.


CStringUtilities.java says the String which is constructed from byte array 
should use Charset.defaultCharset(). [1]
I'm not convinced how should we fix - US-ASCII or default charset.

IMHO we should use default charset. However I concern I need to get CSR 
approvement.
(In addition, I want to refactor CStringUtilities. I guess we can make more 
simple code :-)


What do you think about it?

Yasumasa


[1] 
http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73


On 2018/06/13 21:18, Thomas Stüfe wrote:

Hi Yasumasa,

your patch will change behavior of jsnap if file.encoding is set to
something different than US-ASCII.

Old version uses, hard wired, US-ASCII.

You now use jvm/hotspot/utilities/CStringUtilities.java, which uses

private static String encoding = System.getProperty("file.encoding",
"US-ASCII");

and thus may use a different encoding.

Best Regards, Thomas



On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <[email protected]> wrote:

PING: Could you review it?
I want to merge this change before JDK 11 RDP 1.

     webrev:
       http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/

     JBS:
       https://bugs.openjdk.java.net/browse/JDK-8204531




Yasumasa


On 2018/06/08 15:38, Yasumasa Suenaga wrote:


Hi David,

2018-06-08 10:47 GMT+09:00 David Holmes <[email protected]>:


Hi Yasumasa,

On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:



Hi all,

Please review this change:

     webrev:
       http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/

     JBS:
       https://bugs.openjdk.java.net/browse/JDK-8204531


We can use `jhsdb jsnap` to check all PerfData.
String values in PerfData are defined as jbyte array, but JSnap cannot
handle it well as following:

```
$ jhsdb jsnap --pid 28542 --all | less

sun.gc.cause=No

GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
```

You can see this value via `less` and `vim` on Linux. `^@` shows it is
non-ascii character.




So your synopsis says "remove unused chars following \0". Is that really
what is happening? I would have expected "new String(bytes, encoding)" to
stop processing bytes when it encounters a NUL!



String c'tor will continue to parse byte array even if it finds '\0'.
You can see it on JShell as below:

-------------
jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
(byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }

jshell> new String(bytes, "US-ASCII")
$2 ==> "abc\000def"
-------------

PerfDataEntry has null-terminated C string. So we should restore as it
in
Java layer.




If the issue is really "junk" beyond the \0 nul-terminator, how did we
get
such values? Shouldn't the byte array already be terminated at the NUL?
Or
is this just a raw representation of an array in the VM using the exact
length of the array regardless of content?



I think JSnap shows the array regardless of content.

I see CStringUtilities.getString() both stops when it encounters a 0
value,
and uses the default file encoding (else ASCII).

The fix seems perfectly reasonable, I'm just unclear where exactly this
"bad" data should have been stopped.



Thanks!


Yasumasa

Thanks,
David


Thanks,

Yasumasa

Re: PING: RFR: 8204531: Remove unused chars following '\0'

Reply via email to