Thanks Thomas,

That said, I'd probably just add a variant to
CStringUtilities::getString which takes encoding as an argument, and
then pass "US-ASCII" to remain compatible with the current version of
jsnap.

I agree with you. My opinion is one of the ideals.
Anyway, I uploaded new webrev which is added charset support in CStringUtils:

  http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.01/


Thanks,

Yasumasa


On 2018/06/13 21:46, Thomas Stüfe wrote:
Disclaimer: I am Reviewer but not versed in this corner of the jdk,
nor have knowledge of the history, so take what I suggest with a grain
of salt.

That said, I'd probably just add a variant to
CStringUtilities::getString which takes encoding as an argument, and
then pass "US-ASCII" to remain compatible with the current version of
jsnap.

..Thomas

On Wed, Jun 13, 2018 at 2:37 PM, Yasumasa Suenaga <yasue...@gmail.com> wrote:
Hi Thomas,

Thank you for your comment.

CStringUtilities.java says the String which is constructed from byte array
should use Charset.defaultCharset(). [1]
I'm not convinced how should we fix - US-ASCII or default charset.

IMHO we should use default charset. However I concern I need to get CSR
approvement.
(In addition, I want to refactor CStringUtilities. I guess we can make more
simple code :-)


What do you think about it?

Yasumasa


[1]
http://hg.openjdk.java.net/jdk/jdk/file/7bf4f1b5e438/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/CStringUtilities.java#l73



On 2018/06/13 21:18, Thomas Stüfe wrote:

Hi Yasumasa,

your patch will change behavior of jsnap if file.encoding is set to
something different than US-ASCII.

Old version uses, hard wired, US-ASCII.

You now use jvm/hotspot/utilities/CStringUtilities.java, which uses

private static String encoding = System.getProperty("file.encoding",
"US-ASCII");

and thus may use a different encoding.

Best Regards, Thomas



On Wed, Jun 13, 2018 at 2:09 PM, Yasumasa Suenaga <yasue...@gmail.com>
wrote:

PING: Could you review it?
I want to merge this change before JDK 11 RDP 1.

      webrev:
        http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/

      JBS:
        https://bugs.openjdk.java.net/browse/JDK-8204531




Yasumasa


On 2018/06/08 15:38, Yasumasa Suenaga wrote:


Hi David,

2018-06-08 10:47 GMT+09:00 David Holmes <david.hol...@oracle.com>:


Hi Yasumasa,

On 7/06/2018 10:43 PM, Yasumasa Suenaga wrote:



Hi all,

Please review this change:

      webrev:
        http://cr.openjdk.java.net/~ysuenaga/JDK-8204531/webrev.00/

      JBS:
        https://bugs.openjdk.java.net/browse/JDK-8204531


We can use `jhsdb jsnap` to check all PerfData.
String values in PerfData are defined as jbyte array, but JSnap cannot
handle it well as following:

```
$ jhsdb jsnap --pid 28542 --all | less

sun.gc.cause=No


GC^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
```

You can see this value via `less` and `vim` on Linux. `^@` shows it is
non-ascii character.




So your synopsis says "remove unused chars following \0". Is that
really
what is happening? I would have expected "new String(bytes, encoding)"
to
stop processing bytes when it encounters a NUL!



String c'tor will continue to parse byte array even if it finds '\0'.
You can see it on JShell as below:

-------------
jshell> var bytes = new byte[]{(byte)'a', (byte)'b', (byte)'c',
(byte)'\0', (byte)'d', (byte)'e', (byte)'f'}
bytes ==> byte[7] { 97, 98, 99, 0, 100, 101, 102 }

jshell> new String(bytes, "US-ASCII")
$2 ==> "abc\000def"
-------------


PerfDataEntry has null-terminated C string. So we should restore as it
in
Java layer.




If the issue is really "junk" beyond the \0 nul-terminator, how did we
get
such values? Shouldn't the byte array already be terminated at the NUL?
Or
is this just a raw representation of an array in the VM using the exact
length of the array regardless of content?



I think JSnap shows the array regardless of content.


I see CStringUtilities.getString() both stops when it encounters a 0
value,
and uses the default file encoding (else ASCII).

The fix seems perfectly reasonable, I'm just unclear where exactly this
"bad" data should have been stopped.



Thanks!


Yasumasa


Thanks,
David



Thanks,

Yasumasa




Reply via email to