Re: [xz-devel] xz-java and newer java

Brett Okken Thu, 21 Jan 2021 17:47:17 -0800

> Have you tested with 32-bit Java too? It's quite possible that it's
> better to use ints than longs on 32-bit system. If so, that should be
> detected at runtime too, I guess.


I have now run benchmarks using the 32-bit JRE on a 64-bit Windows
system. That introduces an additional variable: the 32-bit JRE uses the
client JVM by default, which does not use the C2 compiler. The longs
appear to be a bit faster in client mode (not as optimized) and the
ints a bit faster in server mode.

> In XZ Utils the arrays have extra room at the end so that memcmplen.h
> can always read 4/8/16 bytes at a time. Since this is easy to do, I
> think it should be done in XZ for Java too to avoid special handling of
> the last bytes.

Somewhat surprisingly, this actually appears to make things slightly
worse, because it introduces a check to see whether a detected
difference was beyond the actual length.
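To illustrate the extra check mentioned above, here is a minimal sketch
of a padded comparison. It is not the code that was benchmarked: the
class name PaddedMatch, the use of a little-endian long VarHandle, and
the 7-byte padding requirement are all assumptions made for the example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class PaddedMatch {
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Assumes the caller guarantees at least 7 readable bytes past the
    // compared range: aFrom + length + 7 <= a.length, and the same for b.
    static int matchLength(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        for (int i = 0; i < length; i += 8) {
            long x = (long) LONG_LE.get(a, aFrom + i);
            long y = (long) LONG_LE.get(b, bFrom + i);
            long diff = x ^ y;
            if (diff != 0) {
                // Little-endian: the lowest set bit belongs to the first differing byte.
                int n = i + (Long.numberOfTrailingZeros(diff) >>> 3);
                // The extra check: the difference may lie in the padding.
                return Math.min(n, length);
            }
        }
        return length;
    }
}
```

The Math.min call is the cost being discussed: without the padding, a
byte-wise tail loop handles the last bytes and no clamp is needed.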

> Since Java in general is memory safe, having bound checks with Unsafe is
> nice as long as it doesn't hurt performance too much. This
>
>        if (aFromIndex < 0 || aFromIndex + length > a.length ||
>            bFromIndex < 0 || bFromIndex + length > b.length) {
>
> is a bit relaxed though since it doesn't catch integer overflows.
> Something like this would be more strict:
>
>        if (length < 0 ||
>             aFromIndex < 0 || aFromIndex > a.length - length ||
>             bFromIndex < 0 || bFromIndex > b.length - length) {

Nice catch. Arrays approaching 2GB are not common yet, but they seem
likely in the future.
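A small sketch of the difference between the two checks quoted above,
reduced to a single array. The class and method names are made up for
the example.

```java
// Hypothetical helper, for illustration only.
final class RangeCheck {
    // Relaxed form: fromIndex + length is an int addition and can wrap
    // around to a negative value, which then passes the comparison.
    static boolean relaxedOutOfRange(byte[] a, int fromIndex, int length) {
        return fromIndex < 0 || fromIndex + length > a.length;
    }

    // Stricter form: once length >= 0 is known, a.length - length cannot
    // overflow, so the comparison is always meaningful.
    static boolean strictOutOfRange(byte[] a, int fromIndex, int length) {
        return length < 0 || fromIndex < 0 || fromIndex > a.length - length;
    }
}
```

With a 16-byte array, fromIndex = 8 and length = Integer.MAX_VALUE, the
relaxed form reports the range as valid because the sum wraps negative,
while the strict form rejects it.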

> Comparing byte arrays as ints or longs results in unaligned/misaligned
> memory access. MethodHandles.byteArrayViewVarHandle docs say that this
> is OK. A quick web search gave me an impression that it might not be
> safe with Unsafe though. Can you verify how it is with Unsafe? If it
> isn't allowed, dropping support for Unsafe may be fine. It's just the
> older Java versions that would use it anyway.

This is a bit trickier. The closest I can find to documentation is
from the getInt method[1]:

    The object referred to by o is an array, and the offset is an integer
    of the form B+N*S, where N is a valid index into the array, and B and
    S are the values obtained by arrayBaseOffset and arrayIndexScale
    (respectively) from the array's class. The value referred to is the
    Nth element of the array.
    ...
    However, the results are undefined if that variable is not in fact of
    the type returned by this method.

Taken in context with what the new
jdk.internal.misc.Unsafe.getLongUnaligned implementation looks like[2],
unaligned access is not safe with Unsafe.getLong.
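For contrast, here is a minimal sketch of the VarHandle route on JDK 9+,
where plain get access on a byte[] view is documented to tolerate
misaligned indexes. This is not the implementation under discussion; the
class name and the little-endian choice are assumptions for the example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class VarHandleMismatch {
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    /** Returns how many leading bytes of the two ranges are equal (at most length). */
    static int matchLength(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        if (length < 0 || aFrom < 0 || aFrom > a.length - length
                || bFrom < 0 || bFrom > b.length - length) {
            throw new IndexOutOfBoundsException();
        }
        int i = 0;
        // Compare 8 bytes at a time; aFrom + i need not be 8-byte aligned.
        for (; i <= length - 8; i += 8) {
            long x = (long) LONG_LE.get(a, aFrom + i);
            long y = (long) LONG_LE.get(b, bFrom + i);
            if (x != y) {
                // Little-endian: the lowest set bit marks the first differing byte.
                return i + (Long.numberOfTrailingZeros(x ^ y) >>> 3);
            }
        }
        // Remaining 0..7 bytes are compared one at a time.
        for (; i < length; i++) {
            if (a[aFrom + i] != b[bFrom + i]) {
                return i;
            }
        }
        return length;
    }
}
```

The VarHandle performs its own bounds checks, so an out-of-range index
results in an exception rather than undefined behavior.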

> Do you have a way to check how these methods behave on Android and ARM?
> (I understand that this might be too much work to check. This may be
> skipped.)

I /might/ be able to run some benchmarks on AWS Graviton2 instances.

I will send out updated code soon. Currently I have implementations
for JDK 9+ that use VarHandle on x86 (ints for 32-bit, longs for
64-bit) and the vectorized Arrays.mismatch on non-x86. For older JDKs,
x86 will use Unsafe (again, ints for 32-bit and longs for 64-bit).
There are also VarHandle implementations that first compare individual
bytes to try to bring the memory reads into alignment. The default
implementation choice can be overridden by setting a system property.
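A rough sketch of how such a selection could look for the JDK 9+ case.
The property name, the implementation labels, and the architecture
strings are all hypothetical, and sun.arch.data.model is a
HotSpot-specific property that may be absent on other JVMs. The Unsafe
branch for older JDKs is left out.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class MismatchSelector {
    // Made-up property name; the real one is not given in this message.
    static final String PROPERTY = "example.matchlen.impl";

    // Picks an implementation label from an optional override,
    // the os.arch value, and the data model ("32" or "64").
    static String choose(String override, String arch, String dataModel) {
        if (override != null && !override.isEmpty()) {
            return override;
        }
        boolean x86 = arch != null
                && (arch.equals("x86") || arch.equals("i386")
                    || arch.equals("amd64") || arch.equals("x86_64"));
        if (!x86) {
            return "arrays-mismatch";
        }
        return "32".equals(dataModel) ? "varhandle-int" : "varhandle-long";
    }

    static String chooseFromSystem() {
        return choose(System.getProperty(PROPERTY),
                      System.getProperty("os.arch"),
                      System.getProperty("sun.arch.data.model"));
    }

    // The non-x86 path: Arrays.mismatch returns -1 when the ranges are equal.
    static int matchLengthArrays(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        int m = Arrays.mismatch(a, aFrom, aFrom + length, b, bFrom, bFrom + length);
        return m < 0 ? length : m;
    }
}
```

Keeping choose free of System.getProperty calls makes the decision
logic easy to test for each platform combination.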

[1] - https://github.com/openjdk/jdk11u-dev/blob/master/src/jdk.unsupported/share/classes/sun/misc/Unsafe.java#L127-L139
[2] - https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L3398
