Re: [xz-devel] xz-java and newer java

2024-02-19 Thread Lasse Collin
On 2024-02-19 Brett Okken wrote:
> I have created a pr to the GitHub project.
> 
> https://github.com/tukaani-project/xz-java/pull/12

Thanks! I could be good to split into smaller commits to make reviewing
easier.

> It is not clear to me if that is actually seeing active dev on the
> Java project yet.

I see now that there are quite a few things on GH. I had forgotten to
turn email notifications on for the xz-java project; clearly those
aren't on by default. :-( But likely not much would have been done even
if I had noticed those issues and PRs earlier so the main problem is
that the silence has been impolite. I'm sorry.

XZ Utils 5.6.0 has to be released this month since there was a wish to
get it into the next Ubuntu LTS. I'm hoping that next month something
will finally get done around XZ for Java. We'll see.

One thing I wonder is if JNI could help. Optimizing the Java code can
help a bit but I suspect that it still won't be very fast. So far it has
been nice that the Java code is quite readable and I would like keep it
that way in the future too.

-- 
Lasse Collin



Re: [xz-devel] xz-java and newer java

2024-02-19 Thread Brett Okken
I have created a pr to the GitHub project.

https://github.com/tukaani-project/xz-java/pull/12

It is not clear to me if that is actually seeing active dev on the Java
project yet.

Thanks,
Brett

On Sat, Feb 12, 2022 at 11:45 AM Brett Okken 
wrote:

> Can this be taken up again?
>
> On Wed, Mar 24, 2021 at 6:20 AM Brett Okken 
> wrote:
>
>> I grabbed an older version in the last mail. This is the updated
>> version for aarch64.
>>
>


Re: [xz-devel] xz-java and newer java

2022-05-10 Thread Dennis Ens
> Can this be taken up again?
+1

Any updates on this?

--
Dennis Ens



Re: [xz-devel] xz-java and newer java

2022-02-12 Thread Brett Okken
Can this be taken up again?

On Wed, Mar 24, 2021 at 6:20 AM Brett Okken 
wrote:

> I grabbed an older version in the last mail. This is the updated
> version for aarch64.
>


Re: [xz-devel] xz-java and newer java

2021-03-24 Thread Brett Okken
I grabbed an older version in the last mail. This is the updated
version for aarch64.


ArrayUtil.java
Description: Binary data


Re: [xz-devel] xz-java and newer java

2021-03-23 Thread Brett Okken
I was able to test on AWS graviton2 instances (aarch64), but only with
jdk 15. The results show that the vectorized approach appears the best
option, though long comparisons are also an improvement over baseline.


Based on this, I made a small change to ArrayUtil to, by default, use
unsafe long comparisons for aarch64 for older JDKs.

I attached an image of the summarized data, but here it is raw:

BASELINE
Benchmark (file)  (preset)  Mode  Cnt
Score Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm 3  avgt3
1.192 ±   0.005  ms/op
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm 6  avgt3
6.066 ±   0.023  ms/op
XZCompressionBenchmark.compress   image1.dcm 3  avgt3
 4125.464 ± 134.683  ms/op
XZCompressionBenchmark.compress   image1.dcm 6  avgt3
 8193.866 ± 205.916  ms/op
XZCompressionBenchmark.compresslarge.xml 3  avgt3
 1438.101 ±   7.357  ms/op
XZCompressionBenchmark.compresslarge.xml 6  avgt3
11961.600 ±  38.130  ms/op

Updated
Benchmark (file)  (preset)
 Mode  Cnt  Score Error  Units
XZCompressionBenchmark.compress_legacy   ihe_ovly_pr.dcm 3
 avgt3  1.434 ±   0.007  ms/op
XZCompressionBenchmark.compress_legacy   ihe_ovly_pr.dcm 6
 avgt3  6.694 ±   0.006  ms/op
XZCompressionBenchmark.compress_legacyimage1.dcm 3
 avgt3   4236.741 ± 176.331  ms/op
XZCompressionBenchmark.compress_legacyimage1.dcm 6
 avgt3   8923.713 ± 715.574  ms/op
XZCompressionBenchmark.compress_legacy large.xml 3
 avgt3   1399.139 ±   5.955  ms/op
XZCompressionBenchmark.compress_legacy large.xml 6
 avgt3  15793.829 ± 169.169  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm 3
 avgt3  1.341 ±   0.016  ms/op
XZCompressionBenchmark.compress_unsafe_long  ihe_ovly_pr.dcm 6
 avgt3  4.441 ±   0.012  ms/op
XZCompressionBenchmark.compress_unsafe_long   image1.dcm 3
 avgt3   4172.261 ±  41.783  ms/op
XZCompressionBenchmark.compress_unsafe_long   image1.dcm 6
 avgt3   7414.503 ± 123.315  ms/op
XZCompressionBenchmark.compress_unsafe_longlarge.xml 3
 avgt3   1289.451 ±  10.420  ms/op
XZCompressionBenchmark.compress_unsafe_longlarge.xml 6
 avgt3  11355.386 ±  68.198  ms/op
XZCompressionBenchmark.compress_vh_int   ihe_ovly_pr.dcm 3
 avgt3  1.343 ±   0.008  ms/op
XZCompressionBenchmark.compress_vh_int   ihe_ovly_pr.dcm 6
 avgt3  5.137 ±   0.016  ms/op
XZCompressionBenchmark.compress_vh_intimage1.dcm 3
 avgt3   4097.739 ± 162.179  ms/op
XZCompressionBenchmark.compress_vh_intimage1.dcm 6
 avgt3   7865.803 ± 127.912  ms/op
XZCompressionBenchmark.compress_vh_int large.xml 3
 avgt3   1284.516 ±  26.837  ms/op
XZCompressionBenchmark.compress_vh_int large.xml 6
 avgt3  12066.120 ± 106.382  ms/op
XZCompressionBenchmark.compress_vh_long  ihe_ovly_pr.dcm 3
 avgt3  1.330 ±   0.009  ms/op
XZCompressionBenchmark.compress_vh_long  ihe_ovly_pr.dcm 6
 avgt3  4.569 ±   0.205  ms/op
XZCompressionBenchmark.compress_vh_long   image1.dcm 3
 avgt3   4110.154 ± 182.196  ms/op
XZCompressionBenchmark.compress_vh_long   image1.dcm 6
 avgt3   7420.054 ± 330.954  ms/op
XZCompressionBenchmark.compress_vh_longlarge.xml 3
 avgt3   1271.665 ±  10.160  ms/op
XZCompressionBenchmark.compress_vh_longlarge.xml 6
 avgt3  11077.733 ±  64.546  ms/op
XZCompressionBenchmark.compress_vh_vectorihe_ovly_pr.dcm 3
 avgt3  1.326 ±   0.016  ms/op
XZCompressionBenchmark.compress_vh_vectorihe_ovly_pr.dcm 6
 avgt3  4.551 ±   0.085  ms/op
XZCompressionBenchmark.compress_vh_vector image1.dcm 3
 avgt3   4084.482 ±  32.445  ms/op
XZCompressionBenchmark.compress_vh_vector image1.dcm 6
 avgt3   7670.077 ± 343.810  ms/op
XZCompressionBenchmark.compress_vh_vector  large.xml 3
 avgt3   1274.196 ±   2.831  ms/op
XZCompressionBenchmark.compress_vh_vector  large.xml 6
 avgt3  10485.505 ±  43.182  ms/op


ArrayUtil.java
Description: Binary data


Re: [xz-devel] xz-java and newer java

2021-02-19 Thread Brett Okken
I have attached updated patches and ArrayUtil.java.
HC4 needed changes/optimizations in both locations.
I also found a better way to handle BT4 occasionally sending -1 as the length.
diff --git a/src/org/tukaani/xz/lz/BT4.java b/src/org/tukaani/xz/lz/BT4.java
index 6c46feb..7d78aef 100644
--- a/src/org/tukaani/xz/lz/BT4.java
+++ b/src/org/tukaani/xz/lz/BT4.java
@@ -11,6 +11,7 @@
 package org.tukaani.xz.lz;

 import org.tukaani.xz.ArrayCache;
+import org.tukaani.xz.common.ArrayUtil;

 final class BT4 extends LZEncoder {
 private final Hash234 hash;
@@ -46,6 +47,7 @@ final class BT4 extends LZEncoder {
 this.depthLimit = depthLimit > 0 ? depthLimit : 16 + niceLen / 2;
 }

+@Override
 public void putArraysToCache(ArrayCache arrayCache) {
 arrayCache.putArray(tree);
 hash.putArraysToCache(arrayCache);
@@ -70,6 +72,7 @@ final class BT4 extends LZEncoder {
 return avail;
 }

+@Override
 public Matches getMatches() {
 matches.count = 0;

@@ -118,9 +121,10 @@ final class BT4 extends LZEncoder {

 // If a match was found, see how long it is.
 if (matches.count > 0) {
-while (lenBest < matchLenLimit && buf[readPos + lenBest - delta2]
-  == buf[readPos + lenBest])
-++lenBest;
+lenBest += ArrayUtil.mismatch(buf,
+  readPos + lenBest - delta2,
+  readPos + lenBest,
+  matchLenLimit - lenBest);

 matches.len[matches.count - 1] = lenBest;

@@ -160,25 +164,27 @@ final class BT4 extends LZEncoder {
 + (delta > cyclicPos ? cyclicSize : 0)) << 1;
 int len = Math.min(len0, len1);

-if (buf[readPos + len - delta] == buf[readPos + len]) {
-while (++len < matchLenLimit)
-if (buf[readPos + len - delta] != buf[readPos + len])
-break;
-
-if (len > lenBest) {
-lenBest = len;
-matches.len[matches.count] = len;
-matches.dist[matches.count] = delta - 1;
-++matches.count;
-
-if (len >= niceLenLimit) {
-tree[ptr1] = tree[pair];
-tree[ptr0] = tree[pair + 1];
-return matches;
-}
+int mismatch = ArrayUtil.mismatch(buf,
+  readPos + len - delta,
+  readPos + len,
+  matchLenLimit - len);
+
+len += mismatch;
+
+if (len > lenBest) {
+lenBest = len;
+matches.len[matches.count] = len;
+matches.dist[matches.count] = delta - 1;
+++matches.count;
+
+if (len >= niceLenLimit) {
+tree[ptr1] = tree[pair];
+tree[ptr0] = tree[pair + 1];
+return matches;
 }
 }

+
 if ((buf[readPos + len - delta] & 0xFF)
 < (buf[readPos + len] & 0xFF)) {
 tree[ptr1] = currentMatch;
@@ -215,18 +221,19 @@ final class BT4 extends LZEncoder {
 + (delta > cyclicPos ? cyclicSize : 0)) << 1;
 int len = Math.min(len0, len1);

-if (buf[readPos + len - delta] == buf[readPos + len]) {
-// No need to look for longer matches than niceLenLimit
-// because we only are updating the tree, not returning
-// matches found to the caller.
-do {
-if (++len == niceLenLimit) {
-tree[ptr1] = tree[pair];
-tree[ptr0] = tree[pair + 1];
-return;
-}
-} while (buf[readPos + len - delta] == buf[readPos + len]);
+// No need to look for longer matches than niceLenLimit
+// because we only are updating the tree, not returning
+// matches found to the caller.
+int mismatch = ArrayUtil.mismatch(buf,
+  readPos + len - delta,
+  readPos + len,
+  niceLenLimit);
+if (mismatch == niceLenLimit) {
+tree[ptr1] = tree[pair];
+tree[ptr0] = tree[pair + 1];
+return;
 }
+len += mismatch;

 if ((buf[readPos + len - delta] & 0xFF)
 < (buf[readPos + len] & 0xFF)) {
@@ -243,6 +250,7 @@ final class BT4 extends LZEncoder {
 }
 }

+@Override
 public void skip(int len) {
 while (len-- > 0) {
  

Re: [xz-devel] xz-java and newer java

2021-02-16 Thread Brett Okken
On Tue, Feb 16, 2021 at 12:48 PM Lasse Collin  wrote:
>
> I quickly tried these with "XZEncDemo 2". I used the preset 2 because
> that uses LZMAEncoderFast instead of LZMAEncoderNormal where the
> negative lengths result in a crash.

I updated the mismatch method to check for negative lengths upfront:

return length > 0 ? (int) MISMATCH.invokeExact(bytes, aFromIndex,
bFromIndex, length) : 0;

I must not have sent that out.

> The performance was about the same
> or worse than the original code. I don't know why. I didn't spend much
> time on this and it's possible that I messed up something.

I have only been testing at the default preset (6). I should add tests
for the one of the "fast" presets as well.

> One thing that may be worth checking out is how in HC4.java (and
> BT4.java too) the patch doesn't try to quickly skip too short matches
> like the original code does. I suppose the first set of patches should
> be such that they only replace the byte-by-byte loops with a function
> call to make comparison as fair as possible.

In the BT4, (what is being tested), it does not actually seem to
benefit from the "too short" matches, at least for the content I was
testing. This might be different for HC4.

> These patches won't get into XZ for Java 1.9 but might be in a later
> version if I see them being/becoming good. The only remaining patch
> that might get into 1.9 is LZDecoder.repeat improvements.

Sounds good.

> When you post a patch or other code, please make sure that word-wrapping
> is disabled in the email client or use attachments. Thanks!

I will move to attachments. I do not see how to stick with plain text
and keep gmail from wrapping.

Brett



Re: [xz-devel] xz-java and newer java

2021-02-16 Thread Lasse Collin
I quickly tried these with "XZEncDemo 2". I used the preset 2 because
that uses LZMAEncoderFast instead of LZMAEncoderNormal where the
negative lengths result in a crash. The performance was about the same
or worse than the original code. I don't know why. I didn't spend much
time on this and it's possible that I messed up something.

One thing that may be worth checking out is how in HC4.java (and
BT4.java too) the patch doesn't try to quickly skip too short matches
like the original code does. I suppose the first set of patches should
be such that they only replace the byte-by-byte loops with a function
call to make comparison as fair as possible.

These patches won't get into XZ for Java 1.9 but might be in a later
version if I see them being/becoming good. The only remaining patch
that might get into 1.9 is LZDecoder.repeat improvements.

When you post a patch or other code, please make sure that word-wrapping
is disabled in the email client or use attachments. Thanks!

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] xz-java and newer java

2021-01-31 Thread Brett Okken
Replacing while loops with switch statements for the "extra bytes"
also yields a small improvement. Pulling that common logic out into a
utility method negates most of the benefit.
Here is the updated ArrayUtil class.



package org.tukaani.xz.common;

import static java.lang.invoke.MethodType.methodType;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Utilities for optimized array interactions.
 *
 * 
 * The means of comparing arrays can be controlled by setting the sytem property
 * {@code org.tukaani.xz.ArrayComparison} to a value from {@link
ArrayComparison}.
 * 
 *
 * @author Brett Okken
 */
public final class ArrayUtil {

/**
 * Enumerated options for controlling implementation of how to
compare arrays.
 */
public static enum ArrayComparison {
/**
 * Uses {@code VarHandle} for {@code int[]} access.
 * 
 * This is default behavior on jdk9+ for 32 bit x86.
 * 
 */
VH_INT,
/**
 * Uses {@code VarHandle} for {@code int[]} access after attempting
 * to align the reads on 4 byte boundaries.
 */
VH_INT_ALIGN,
/**
 * Uses {@code VarHandle} for {@code long[]} access.
 * 
 * This is default behavior on jdk9+ for 64 bit x86.
 * 
 */
VH_LONG,
/**
 * Uses {@code VarHandle} for {@code long[]} access after attempting
 * to align the reads.
 */
VH_LONG_ALIGN,
/**
 * Uses {@code Arrays.mismatch()} to perform vectorized comparison.
 * 
 * This is default behavior on jdk9+ for non-x86.
 * 
 */
VECTOR,
/**
 * Uses {@code sun.misc.Unsafe.getInt()} for unaligned {@code int[]}
 * access.
 * 
 * This is default behavior on jdk 8 and prior for 32 bit x86.
 * 
 */
UNSAFE_GET_INT,
/**
 * Uses {@code sun.misc.Unsafe.getLong()} for unaligned {@code long[]}
 * access.
 * 
 * This is default behavior on jdk 8 and prior for 64 bit x86.
 * 
 */
UNSAFE_GET_LONG,
/**
 * Performs byte-by-byte comparison.
 */
LEGACY;

static ArrayComparison getFromProperty(String prop) {
if (prop == null || prop.isEmpty()) {
return null;
}
try {
return ArrayComparison.valueOf(prop.toUpperCase(Locale.US));
} catch (Exception e) {
final Logger logger =
Logger.getLogger(ArrayUtil.class.getName());
logger.log(Level.INFO,
   "Invalid ArrayComparison option, using
default behavior",
   e);
return null;
}
}
}

/**
 * MethodHandle to the actual mismatch method to use at runtime.
 */
private static final MethodHandle MISMATCH;

/**
 * The method this is bound to at runtime is depends on the chosen
 * implementation for {@code byte[]} comparison.
 * 
 * For {@code long} based comparisons, it will be bound to either
 * {@link Long#numberOfLeadingZeros(long)} or
 * {@link Long#numberOfTrailingZeros(long)} depending on
 * {@link ByteOrder#nativeOrder()}.
 * 
 * 
 * For {@code int} based comparisons it will be bound to either
 * {@link Integer#numberOfLeadingZeros(int)} or
 * {@link Integer#numberOfTrailingZeros(int)} depending on
 * {@link ByteOrder#nativeOrder()}.
 * 
 */
private static final MethodHandle LEADING_ZEROS;

/**
 * Populated from reflected read of
 * {@code sun.misc.Unsafe.ARRAY_BYTE_BASE_OFFSET} if one of the unsafe
 * implementations is used.
 */
private static final long ARRAY_BASE_OFFSET;

/**
 * The method this is bound to at runtime is depends on the chosen
 * implementation for {@code byte[]} comparison.
 * 
 * For {@link ArrayComparison#VECTOR} and
 * {@link ArrayComparison#LEGACY} this will be {@code null}.
 * 
 * 
 * For {@link ArrayComparison#VH_INT} and {@link
ArrayComparison#VH_INT_ALIGN}
 * this will be a jdk 9+ {@code byteArrayViewVarHandle} for {@code int[]}
 * using the {@link ByteOrder#nativeOrder()}. The method signature is
 * {@code int get(byte[], int)}.
 * 
 * 
 * For {@link ArrayComparison#VH_LONG} and {@link
ArrayComparison#VH_LONG_ALIGN}
 * this will be a jdk 9+ {@code byteArrayViewVarHandle} for {@code long[]}
 * using the {@link ByteOrder#nativeOrder()}. The method signature is
 * {@code long get(byte[], int)}.
 * 
 

Re: [xz-devel] xz-java and newer java

2021-01-24 Thread Brett Okken
Based on some playing around with unrolling loops as part of the crc64
implementation, I tried unrolling the "legacy" implementation and
found it provided some nice improvements. The improvements were most
pronounced on 32 bit jdk 11:

32 jdk 11 - LEGACY
Benchmark (file)  Mode  Cnt  Score
Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt3 17.812
±   0.588  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt3   8404.259
± 391.678  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt3  16037.416
± 467.379  ms/op

Unrolled
Benchmark (file)  Mode  Cnt  Score
Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt3 13.624
±   0.845  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt3   7833.118
±  28.132  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt3  12838.831
± 192.884  ms/op

32 jdk 11 - LEGACY (server)
Benchmark (file)  Mode  Cnt  Score
Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt3 14.105
±   0.081  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt3   8474.630
± 518.903  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt3  16009.553
± 529.315  ms/op

Unrolled
Benchmark (file)  Mode  Cnt  Score
Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt3 10.513
±   0.290  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt3   7900.578
± 309.317  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt3  12871.200
± 570.491  ms/op


/**
 * Simply loops over all of the bytes, comparing one at a time.
 */
@SuppressWarnings("unused")
private static int legacyMismatch(
byte[] a, int aFromIndex, int bFromIndex, int length) {
int i=0;
for (int j=length - 7; i

Re: [xz-devel] xz-java and newer java

2021-01-22 Thread Brett Okken
package org.tukaani.xz.common;

import static java.lang.invoke.MethodType.methodType;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Utilities for optimized array interactions.
 *
 * 
 * The means of comparing arrays can be controlled by setting the
system property
 * {@code org.tukaani.xz.ArrayComparison} to a value from {@link
ArrayComparison}.
 * 
 *
 * @author Brett Okken
 */
public final class ArrayUtil {

/**
 * Enumerated options for controlling implementation of how to
compare arrays.
 */
public static enum ArrayComparison {
/**
 * Uses {@code VarHandle} for {@code int[]} access.
 * 
 * This is default behavior on jdk9+ for 32 bit x86.
 * 
 */
VH_INT,
/**
 * Uses {@code VarHandle} for {@code int[]} access after attempting
 * to align the reads on 4 byte boundaries.
 */
VH_INT_ALIGN,
/**
 * Uses {@code VarHandle} for {@code long[]} access.
 * 
 * This is default behavior on jdk9+ for 64 bit x86.
 * 
 */
VH_LONG,
/**
 * Uses {@code VarHandle} for {@code long[]} access after attempting
 * to align the reads.
 */
VH_LONG_ALIGN,
/**
 * Uses {@code Arrays.mismatch()} to perform vectorized comparison.
 * 
 * This is default behavior on jdk9+ for non-x86.
 * 
 */
VECTOR,
/**
 * Uses {@code sun.misc.Unsafe.getInt()} for unaligned {@code int[]}
 * access.
 * 
 * This is default behavior on jdk 8 and prior for 32 bit x86.
 * 
 */
UNSAFE_GET_INT,
/**
 * Uses {@code sun.misc.Unsafe.getLong()} for unaligned {@code long[]}
 * access.
 * 
 * This is default behavior on jdk 8 and prior for 64 bit x86.
 * 
 */
UNSAFE_GET_LONG,
/**
 * Performs byte-by-byte comparison.
 */
LEGACY;

static ArrayComparison getFromProperty(String prop) {
if (prop == null || prop.isEmpty()) {
return null;
}
try {
return ArrayComparison.valueOf(prop.toUpperCase(Locale.US));
} catch (Exception e) {
final Logger logger =
Logger.getLogger(ArrayUtil.class.getName());
logger.log(Level.INFO,
   "Invalid ArrayComparison option, using
default behavior",
   e);
return null;
}
}
}

/**
 * MethodHandle to the actual mismatch method to use at runtime.
 */
private static final MethodHandle MISMATCH;

/**
 * The method this is bound to at runtime depends on the chosen
 * implementation for {@code byte[]} comparison.
 * 
 * For {@code long} based comparisons, it will be bound to either
 * {@link Long#numberOfLeadingZeros(long)} or
 * {@link Long#numberOfTrailingZeros(long)} depending on
 * {@link ByteOrder#nativeOrder()}.
 * 
 * 
 * For {@code int} based comparisons it will be bound to either
 * {@link Integer#numberOfLeadingZeros(int)} or
 * {@link Integer#numberOfTrailingZeros(int)} depending on
 * {@link ByteOrder#nativeOrder()}.
 * 
 */
private static final MethodHandle LEADING_ZEROS;

/**
 * Populated from reflected read of
 * {@code sun.misc.Unsafe.ARRAY_BYTE_BASE_OFFSET} if one of the unsafe
 * implementations is used.
 */
private static final long ARRAY_BASE_OFFSET;

/**
 * The method this is bound to at runtime is depends on the chosen
 * implementation for {@code byte[]} comparison.
 * 
 * For {@link ArrayComparison#VECTOR} and
 * {@link ArrayComparison#LEGACY} this will be {@code null}.
 * 
 * 
 * For {@link ArrayComparison#VH_INT} and {@link
ArrayComparison#VH_INT_ALIGN}
 * this will be a jdk 9+ {@code byteArrayViewVarHandle} for {@code int[]}
 * using the {@link ByteOrder#nativeOrder()}. The method signature is
 * {@code int get(byte[], int)}.
 * 
 * 
 * For {@link ArrayComparison#VH_LONG} and {@link
ArrayComparison#VH_LONG_ALIGN}
 * this will be a jdk 9+ {@code byteArrayViewVarHandle} for {@code long[]}
 * using the {@link ByteOrder#nativeOrder()}. The method signature is
 * {@code long get(byte[], int)}.
 * 
 * 
 * For {@link ArrayComparison#UNSAFE_GET_INT} this is bound to
 * {@code sun.misc.Unsafe.getInt(Object, long)}.
 * 
 * 
 * For {@link ArrayComparison#UNSAFE_GET_LONG} this is bound to
 * 

Re: [xz-devel] xz-java and newer java

2021-01-22 Thread Brett Okken
diff --git a/src/org/tukaani/xz/lz/BT4.java b/src/org/tukaani/xz/lz/BT4.java
index 6c46feb..c96c766 100644
--- a/src/org/tukaani/xz/lz/BT4.java
+++ b/src/org/tukaani/xz/lz/BT4.java
@@ -11,6 +11,7 @@
 package org.tukaani.xz.lz;

 import org.tukaani.xz.ArrayCache;
+import org.tukaani.xz.common.ArrayUtil;

 final class BT4 extends LZEncoder {
 private final Hash234 hash;
@@ -118,9 +119,10 @@ final class BT4 extends LZEncoder {

 // If a match was found, see how long it is.
 if (matches.count > 0) {
-while (lenBest < matchLenLimit && buf[readPos + lenBest - delta2]
-  == buf[readPos + lenBest])
-++lenBest;
+lenBest += ArrayUtil.mismatch(buf,
+  readPos + lenBest - delta2,
+  readPos + lenBest,
+  matchLenLimit - lenBest);

 matches.len[matches.count - 1] = lenBest;

@@ -160,10 +162,12 @@ final class BT4 extends LZEncoder {
 + (delta > cyclicPos ? cyclicSize : 0)) << 1;
 int len = Math.min(len0, len1);

-if (buf[readPos + len - delta] == buf[readPos + len]) {
-while (++len < matchLenLimit)
-if (buf[readPos + len - delta] != buf[readPos + len])
-break;
+int mismatch = ArrayUtil.mismatch(buf,
+  readPos + len - delta,
+  readPos + len,
+  matchLenLimit - len);
+if (mismatch != 0) {
+len += mismatch;

 if (len > lenBest) {
 lenBest = len;
@@ -215,18 +219,19 @@ final class BT4 extends LZEncoder {
 + (delta > cyclicPos ? cyclicSize : 0)) << 1;
 int len = Math.min(len0, len1);

-if (buf[readPos + len - delta] == buf[readPos + len]) {
-// No need to look for longer matches than niceLenLimit
-// because we only are updating the tree, not returning
-// matches found to the caller.
-do {
-if (++len == niceLenLimit) {
-tree[ptr1] = tree[pair];
-tree[ptr0] = tree[pair + 1];
-return;
-}
-} while (buf[readPos + len - delta] == buf[readPos + len]);
+// No need to look for longer matches than niceLenLimit
+// because we only are updating the tree, not returning
+// matches found to the caller.
+int mismatch = ArrayUtil.mismatch(buf,
+  readPos + len - delta,

+  readPos + len,
+  niceLenLimit);
+if (mismatch == niceLenLimit) {
+tree[ptr1] = tree[pair];
+tree[ptr0] = tree[pair + 1];
+return;
 }
+len += mismatch;

 if ((buf[readPos + len - delta] & 0xFF)
 < (buf[readPos + len] & 0xFF)) {



diff --git a/src/org/tukaani/xz/lz/HC4.java b/src/org/tukaani/xz/lz/HC4.java
index d2b4e84..623d59d 100644
--- a/src/org/tukaani/xz/lz/HC4.java
+++ b/src/org/tukaani/xz/lz/HC4.java
@@ -11,6 +11,7 @@
 package org.tukaani.xz.lz;

 import org.tukaani.xz.ArrayCache;
+import org.tukaani.xz.common.ArrayUtil;

 final class HC4 extends LZEncoder {
 private final Hash234 hash;
@@ -136,9 +137,10 @@ final class HC4 extends LZEncoder {

 // If a match was found, see how long it is.
 if (matches.count > 0) {
-while (lenBest < matchLenLimit && buf[readPos + lenBest - delta2]
-  == buf[readPos + lenBest])
-++lenBest;
+lenBest += ArrayUtil.mismatch(buf,
+  readPos + lenBest - delta2,
+  readPos + lenBest,
+  matchLenLimit - lenBest);

 matches.len[matches.count - 1] = lenBest;

@@ -167,30 +169,21 @@ final class HC4 extends LZEncoder {
 currentMatch = chain[cyclicPos - delta
  + (delta > cyclicPos ? cyclicSize : 0)];

-// Test the first byte and the first new byte that would give us
-// a match that is at least one byte longer than lenBest. This
-// too short matches get quickly skipped.
-if (buf[readPos + lenBest - delta] == buf[readPos + lenBest]
-&& buf[readPos - delta] == buf[readPos]) {
-// Calculate the length of the match.
-int len = 0;
-while (++len < matchLenLimit)
-if 

Re: [xz-devel] xz-java and newer java

2021-01-21 Thread Brett Okken
> Have you tested with 32-bit Java too? It's quite possible that it's
> better to use ints than longs on 32-bit system. If so, that should be
> detected at runtime too, I guess.

I have now run benchmarks using the 32bit jre on 64bit windows system.
That actually introduces additional interesting impacts by using the
client jvm by default, which does not use the c2 compiler. The longs
appear to be a bit faster in client mode (not as optimized) and the
ints a bit faster in server mode.

> In XZ Utils the arrays have extra room at the end so that memcmplen.h
> can always read 4/8/16 bytes at a time. Since this is easy to do, I
> think it should be done in XZ for Java too to avoid special handling of
> the last bytes.

Somewhat surprisingly, this actually appears to make things slightly
worse by having to introduce the check to see a detected difference
was beyond the actual length.

> Since Java in general is memory safe, having bound checks with Unsafe is
> nice as long as it doesn't hurt performance too much. This
>
>if (aFromIndex < 0 || aFromIndex + length > a.length ||
>bFromIndex < 0 || bFromIndex + length > b.length) {
>
> is a bit relaxed though since it doesn't catch integer overflows.
> Something like this would be more strict:
>
>if (length < 0 ||
> aFromIndex < 0 || aFromIndex > a.length - length ||
> bFromIndex < 0 || bFromIndex > b.length - length) {

Nice catch. Arrays approaching 2GB are not common yet, but seems
likely in future.

> Comparing byte arrays as ints or longs results in unaligned/misaligned
> memory access. MethodHandles.byteArrayViewVarHandle docs say that this
> is OK. A quick web search gave me an impression that it might not be
> safe with Unsafe though. Can you verify how it is with Unsafe? If it
> isn't allowed, dropping support for Unsafe may be fine. It's just the
> older Java versions that would use it anyway.

This is a bit trickier. The closest I can find to documentation is
from the getInt method[1].

The object referred to by o is an array, and the offset is an integer
of the form B+N*S, where N is a valid index into the array, and B and
S are the values obtained by arrayBaseOffset and arrayIndexScale
(respectively) from the array's class. The value referred to is the
Nth element of the array.
...
However, the results are undefined if that variable is not in fact of
the type returned by this method.

Taken in context with what the new
jdk.internal.misc.Unsafe.getLongUnaligned implementation looks like[2]
unaligned access is not safe with Unsafe.getLong.

> Do you have a way to check how these methods behave on Android and ARM?
> (I understand that this might be too much work to check. This may be
> skipped.)

I /might/ be able to run some benchmarks on aws graviton2 instances.

I will send out updated code soon, but I currently have
implementations for jdk 9+ to use VarHandle for x86 (ints for 32bit
longs for 64 bit) and non-x86 to use the vectorized Arrays mismatch.
For older jdks, x86 will use Unsafe (again, ints for 32 bit and longs
for 64 bit). There are also implementations for VarHandles which will
make an attempt to compare individual bytes to try and get memory
reads into alignment. The default implementation behavior can be
overridden by setting a system property.

[1] - 
https://github.com/openjdk/jdk11u-dev/blob/master/src/jdk.unsupported/share/classes/sun/misc/Unsafe.java#L127-L139
[2] - 
https://github.com/openjdk/jdk11u-dev/blob/master/src/java.base/share/classes/jdk/internal/misc/Unsafe.java#L3398



Re: [xz-devel] xz-java and newer java

2021-01-18 Thread Lasse Collin
On 2021-01-11 Brett Okken wrote:
> I threw together a quick jmh test, and there is no value in the
> changes to Hash234.

OK, let's forget that then.

On 2021-01-16 Brett Okken wrote:
> I have found a way to use VarHandle byte array access at runtime in
> code which is compile time compatible with jdk 7. So here is an
> updated ArrayUtil class which will use a VarHandle to read long values
> in jdk 9+. If that is not available, it will attempt to use
> sun.misc.Unsafe. If that cannot be found, it falls back to standard
> byte by byte comparison.

Sounds promising. :-) You have already done quite a bit of work in both
writing code and benchmarking. Thank you!

The method you ended up is similar to src/liblzma/common/memcmplen.h
in XZ Utils. There 8-byte version is used on 64-bit systems and 4-byte
version on 32-bit systems. In XZ Utils, SSE2 version (16-byte
comparison) is faster than 4-byte compare on 32-bit x86, but on x86-64
the 8-byte version has similar speed or is faster than the SSE2 version
(it depends on the CPU).

Have you tested with 32-bit Java too? It's quite possible that it's
better to use ints than longs on 32-bit system. If so, that should be
detected at runtime too, I guess.

In XZ Utils the arrays have extra room at the end so that memcmplen.h
can always read 4/8/16 bytes at a time. Since this is easy to do, I
think it should be done in XZ for Java too to avoid special handling of
the last bytes.

> I did add an index bounds check for the unsafe implementation and
> found it had minimal impact on over all performance.

Since Java in general is memory safe, having bound checks with Unsafe is
nice as long as it doesn't hurt performance too much. This

if (aFromIndex < 0 || aFromIndex + length > a.length ||
bFromIndex < 0 || bFromIndex + length > b.length) {

is a bit relaxed though since it doesn't catch integer overflows.
Something like this would be more strict:

if (length < 0 ||
aFromIndex < 0 || aFromIndex > a.length - length ||
bFromIndex < 0 || bFromIndex > b.length - length) {

> Using VarHandle (at least on jdk 11) offers very similar performance
> to Unsafe across all 3 files I used for benchmarking.

OK. I cannot comment the details much because I'm not familiar with
either API for now.

Comparing byte arrays as ints or longs results in unaligned/misaligned
memory access. MethodHandles.byteArrayViewVarHandle docs say that this
is OK. A quick web search gave me an impression that it might not be
safe with Unsafe though. Can you verify how it is with Unsafe? If it
isn't allowed, dropping support for Unsafe may be fine. It's just the
older Java versions that would use it anyway.

It is *essential* that the code works well also on archs that don't
have fast unaligned access. Even if the VarHandle method is safe, it's
not clear how the performance is on archs that don't support fast
unaligned access. It would be bad to add an optimization that is good
on x86-64 but counter-productive on some other archs. One may need
arch-specific code just like there is in XZ Utils, although on the
other hand it would be nice to keep the Java code less complicated.

Do you have a way to check how these methods behave on Android and ARM?
(I understand that this might be too much work to check. This may be
skipped.)

I wish to add module-info.java in the next release. Do these new
methods affect what should be in module-info.java? With the current
code this seems to be enough:

module org.tukaani.xz {
exports org.tukaani.xz;
}

> final int leadingZeros = (int)LEADING_ZEROS.invokeExact(diff);
> return i + (leadingZeros / Byte.SIZE);

Seems that Java might not optimize that division to a right shift. It
could be better to use "leadingZeros >>> 3".

> I know you said you were not going to be able to work on xz-java for
> awhile, but given these benchmark results, which really exceeded my
> expectations, could this get some priority to release?

I understood that it's 9-18 % faster. That is significant but it's
still a performance optimization only, not an important bug fix, and to
me the code doesn't feel completely ready yet (for example, the
unaligned access is important to get right).

(Compare to the threaded decompression support that is coming to XZ
Utils. It will speed things up a few hundred percent.)

Can you provide a complete patch to make testing easier (or if not
possible, complete copies of modified files)? Also, please try to wrap
the lines so that they stay within 80 columns (with some long
unbreakable strings this may not be possible, then those lines can be
overlong instead of messing up the indentation).

I think your patch will find its way into XZ for Java in some form
but once again I repeat that it will take some time. These XZ projects
are only a hobby for me and currently I don't even turn on my computer
every day.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode



Re: [xz-devel] xz-java and newer java

2021-01-16 Thread Brett Okken
Lasse,

I have found a way to use VarHandle byte array access at runtime in
code which is compile time compatible with jdk 7. So here is an
updated ArrayUtil class which will use a VarHandle to read long values
in jdk 9+. If that is not available, it will attempt to use
sun.misc.Unsafe. If that cannot be found, it falls back to standard
byte by byte comparison.
I did add an index bounds check for the unsafe implementation and
found it had minimal impact on over all performance.
Using VarHandle (at least on jdk 11) offers very similar performance
to Unsafe across all 3 files I used for benchmarking.

--Baseline 1.8
Benchmark (file)  Mode  Cnt  Score
Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt4  9.558
±   0.239  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt4   6553.304
± 112.475  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt4  10592.151
± 291.527  ms/op

--Unsafe
Benchmark (file)  Mode  Cnt Score
   Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt4 7.699
±   0.058  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt4  6001.170
± 143.814  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt4  7853.963
±  83.753  ms/op

--VarHandle
Benchmark (file)  Mode  Cnt Score
   Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt4 7.630
±   0.542  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt4  5872.098
±  71.185  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt4  8239.880
± 346.036  ms/op

I know you said you were not going to be able to work on xz-java for
awhile, but given these benchmark results, which really exceeded my
expectations, could this get some priority to release?


package org.tukaani.xz.common;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.nio.ByteOrder;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Utilities for optimized array interactions.
 *
 * @author Brett Okken
 */
public final class ArrayUtil {

/**
 * MethodHandle to the actual mismatch method to use at runtime.
 */
private static final MethodHandle MISMATCH;

/**
 * If {@code sun.misc.Unsafe} can be loaded, this is MethodHandle
bound to an instance of Unsafe for method {@code long getLong(Object,
long)}.
 */
private static final MethodHandle UNSAFE_GET_LONG;

/**
 * MethodHandle to either {@link Long#numberOfLeadingZeros(long)}
or {@link Long#numberOfTrailingZeros(long)} depending on {@link
ByteOrder#nativeOrder()}.
 */
private static final MethodHandle LEADING_ZEROS;

/**
 * Populated from reflected read of {@code
sun.misc.Unsafe.ARRAY_BYTE_BASE_OFFSET}.
 */
private static final long ARRAY_BASE_OFFSET;

/**
 * {@code MethodHandle} for a jdk 9+ {@code
byteArrayViewVarHandle} for {@code long[]} using the {@link
ByteOrder#nativeOrder()}.
 * The method signature is {@code long get(byte[], int)}.
 */
private static final MethodHandle VAR_HANDLE_GET_LONG;

static {
final Logger logger = Logger.getLogger(ArrayUtil.class.getName());
MethodHandle leadingZeros = null;
MethodHandle varHandleGetLong = null;
MethodHandle unsafeGetLong = null;
long arrayBaseOffset = 0;
MethodHandle mismatch = null;
final MethodHandles.Lookup lookup = MethodHandles.lookup();
final MethodType mismatchType =
MethodType.methodType(int.class, byte[].class, int.class,
byte[].class, int.class, int.class);
try {
//getLong interprets in platform byte order. the concept
of "leading zeros" being bytes
//in encounter order is true for big endian
//for little endian platform, the trailing zeros gives the
encounter order result
leadingZeros = lookup.findStatic(Long.class,
 ByteOrder.BIG_ENDIAN ==
ByteOrder.nativeOrder()
 ?
"numberOfLeadingZeros" : "numberOfTrailingZeros",

MethodType.methodType(int.class, long.class));

//first try to load byteArrayViewVarHandle for a long[]
try {
final Class varHandleClazz =
Class.forName("java.lang.invoke.VarHandle", true, null);
final Method byteArrayViewHandle =
MethodHandles.class.getDeclaredMethod("byteArrayViewVarHandle", new
Class[] {Class.class, ByteOrder.class});
final Object varHandle =
byteArrayViewHandle.invoke(null, long[].class,
ByteOrder.nativeOrder());
final Class accessModeEnum =
Class.forName("java.lang.invoke.VarHandle$AccessMode", true, null);
@SuppressWarnings({ 

Re: [xz-devel] xz-java and newer java

2021-01-13 Thread Brett Okken
I have continued to refine the changes around the array comparisons
and think I am pretty well done there.

I did a small benchmark measuring the time to compress 3 different
files using new XZOutputStream(cos, new LZMA2Options()). Where cos was
an OutputStream which simply calculated the crc32 of the content
written.

The ihe_ovly.pr.dcm is a ~66KB binary file.
The image1.dcm file is a ~26MB binary file with some metadata wrapping
a grayscale bmp.
The large.xml file is a ~51MB formatted utf-8 xml file.

1.8
Benchmark (file)  Mode  Cnt Score
   Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt5 9.575
±   0.322  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt5  6767.376
± 675.373  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt5  9680.569
± 207.521  ms/op

1.9-SNAPSHOT
Benchmark (file)  Mode  Cnt Score
  Error  Units
XZCompressionBenchmark.compress  ihe_ovly_pr.dcm  avgt5 7.888
±  0.443  ms/op
XZCompressionBenchmark.compress   image1.dcm  avgt5  6144.748
± 71.551  ms/op
XZCompressionBenchmark.compresslarge.xml  avgt5  7904.019
± 26.316  ms/op

These results are from 64 bit windows using open jdk 11.0.6.
The results are 9-18% faster across the 3 different file types.

I made a small change to ArrayUtil from what I sent previously. When
everything matches, it now returns length rather than -1.

The remaining changes are:

BT4:
in getMatches()

if (matches.count > 0) {
while (lenBest < matchLenLimit && buf[readPos + lenBest - delta2]
  == buf[readPos + lenBest])
++lenBest;

Changes to:
if (matches.count > 0) {
lenBest += ArrayUtil.mismatch(buf, readPos + lenBest -
delta2, buf, readPos + lenBest, matchLenLimit - lenBest);

Further down, inside a while(true) block,

if (buf[readPos + len - delta] == buf[readPos + len]) {
while (++len < matchLenLimit)
if (buf[readPos + len - delta] != buf[readPos + len])
break;

Changes to:
int mismatch = ArrayUtil.mismatch(buf, readPos + len -
delta, buf, readPos + len, matchLenLimit - len);
if (mismatch != 0) {
len += mismatch;

in skip(int, int)

if (buf[readPos + len - delta] == buf[readPos + len]) {
// No need to look for longer matches than niceLenLimit
// because we only are updating the tree, not returning
// matches found to the caller.
do {
if (++len == niceLenLimit) {
tree[ptr1] = tree[pair];
tree[ptr0] = tree[pair + 1];
return;
}
} while (buf[readPos + len - delta] == buf[readPos + len]);
}

Changes to:
// No need to look for longer matches than niceLenLimit
// because we only are updating the tree, not returning
// matches found to the caller.
int mismatch = ArrayUtil.mismatch(buf, readPos + len -
delta, buf, readPos + len, niceLenLimit);
if (mismatch == niceLenLimit) {
tree[ptr1] = tree[pair];
tree[ptr0] = tree[pair + 1];
return;
}
len += mismatch;


In HC4, both changes are in getMatches()

if (matches.count > 0) {
while (lenBest < matchLenLimit && buf[readPos + lenBest - delta2]
  == buf[readPos + lenBest])
++lenBest;

Changes to:
if (matches.count > 0) {
lenBest += ArrayUtil.mismatch(buf, readPos + lenBest -
delta2, buf, readPos + lenBest, matchLenLimit - lenBest);

And:

// Test the first byte and the first new byte that would give us
// a match that is at least one byte longer than lenBest. This
// too short matches get quickly skipped.
if (buf[readPos + lenBest - delta] == buf[readPos + lenBest]
&& buf[readPos - delta] == buf[readPos]) {
// Calculate the length of the match.
int len = 0;
while (++len < matchLenLimit)
if (buf[readPos + len - delta] != buf[readPos + len])
break;

// Use the match if and only if it is better than the longest
// match found so far.
if (len > lenBest) {
lenBest = len;
matches.len[matches.count] = len;
matches.dist[matches.count] = delta - 1;
++matches.count;

// Return if it is long enough (niceLen or reached the
// end of the dictionary).
if (len >= niceLenLimit)
  

Re: [xz-devel] xz-java and newer java

2021-01-12 Thread Brett Okken
It turns out that reading the longs in native byte order provides
noticeable improvement.
I did find that there was cost overhead of ~1 ns/op by using an
interface/implementation to flex behavior if Unsafe could not be
loaded. That cost goes away by using java.lang.invoke.MethodHandle.
So here is an updated jdk 7 compatible ArrayUtil implementation which
matches current performance if the first byte does not match and is
faster in every other scenario. The gap in performance grows as more
bytes actually match. At 2 bytes, it takes roughly half the time. At
97 bytes, it takes less than ten percent of the time.

Here are the benchmark results:

Benchmark(length)  Mode  Cnt
Score   Error  Units
ArrayMismatchBenchmark.comparerMismatch_nomatch 0  avgt5
4.487 ± 0.059  ns/op
ArrayMismatchBenchmark.comparerMismatch_nomatch 1  avgt5
4.515 ± 0.102  ns/op
ArrayMismatchBenchmark.comparerMismatch_nomatch 2  avgt5
4.523 ± 0.023  ns/op
ArrayMismatchBenchmark.comparerMismatch_nomatch 7  avgt5
5.164 ± 0.098  ns/op
ArrayMismatchBenchmark.comparerMismatch_nomatch13  avgt5
5.748 ± 0.974  ns/op
ArrayMismatchBenchmark.comparerMismatch_nomatch57  avgt5
10.060 ± 1.135  ns/op
ArrayMismatchBenchmark.comparerMismatch_nomatch97  avgt5
11.518 ± 0.418  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch   0  avgt5
3.259 ± 0.069  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch   1  avgt5
5.712 ± 0.070  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch   2  avgt5
6.017 ± 0.300  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch   7  avgt5
12.949 ± 0.163  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch  13  avgt5
18.696 ± 0.551  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch  57  avgt5
43.232 ± 1.015  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch  97  avgt5
90.599 ± 0.794  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch   0  avgt5
3.246 ± 0.138  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch   1  avgt5
3.225 ± 0.042  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch   2  avgt5
3.242 ± 0.043  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch   7  avgt5
3.244 ± 0.048  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch  13  avgt5
3.477 ± 0.028  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch  57  avgt5
5.968 ± 0.553  ns/op
ArrayMismatchBenchmark.unsafeMisMatch_nomatch  97  avgt5
7.182 ± 0.080  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch 0  avgt5
3.219 ± 0.044  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch 1  avgt5
3.217 ± 0.054  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch 2  avgt5
3.217 ± 0.069  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch 7  avgt5
3.206 ± 0.047  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch13  avgt5
3.509 ± 0.218  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch57  avgt5
5.870 ± 0.063  ns/op
ArrayMismatchBenchmark.utilMismatch_nomatch97  avgt5
7.178 ± 0.267  ns/op

The "comparer" implementation is using interface with different
implementations based on whether Unsafe could be loaded.
The "unsafe" implementation is directly using the Unsafe class.
The "util" implementation is using the ArrayUtil class below.


package org.tukaani.xz.common;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.nio.ByteOrder;

public final class ArrayUtil {

/**
 * MethodHandle to the actual mismatch method to use at runtime.
 */
private static final MethodHandle MISMATCH;

/**
 * If {@code sun.misc.Unsafe} can be loaded, this is MethodHandle
bound to an instance of Unsafe for method {@code long getLong(Object,
long)}.
 */
private static final MethodHandle UNSAFE_GET_LONG;

/**
 * MethodHandle to either {@link Long#numberOfLeadingZeros(long)}
or {@link Long#numberOfTrailingZeros(long)} depending on {@link
ByteOrder#nativeOrder()}.
 */
private static final MethodHandle LEADING_ZEROS;

/**
 * Populated from reflected read of {@code
sun.misc.Unsafe.ARRAY_BYTE_BASE_OFFSET}.
 */
private static final long ARRAY_BASE_OFFSET;

static {
//try to create an instance using Unsafe
long arrayBaseOffset = 0;
MethodHandle unsafeGetLong = null;
MethodHandle leadingZeros = null;
MethodHandle mismatch = null;
final MethodHandles.Lookup lookup = MethodHandles.lookup();
final MethodType mismatchType =
MethodType.methodType(int.class, byte[].class, int.class,
byte[].class, int.class, int.class);
try {
Class unsafeClazz = Class.forName("sun.misc.Unsafe", 

Re: [xz-devel] xz-java and newer java

2021-01-11 Thread Brett Okken
I threw together a quick jmh test, and there is no value in the
changes to Hash234.

For the array mismatch, the results are kind of interesting. My
observation, stepping through some compression uses, is that the
comparison length is typically 100-200 bytes in length, but the actual
match length is typically fairly short. This is obviously going to be
highly dependent on data, and I was using raw image data for
observation. Content like xml or json might have longer matches. So I
set up a benchmark which is always comparing 128 bytes and the
mismatch occurs after various "lengths":

Benchmark  (length)  Mode  Cnt
Score   Error  Units
ArrayMismatchBenchmark.legacyMismatch_nomatch 0  avgt5
3.198 ± 0.168  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch 1  avgt5
5.607 ± 0.048  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch 2  avgt5
5.852 ± 0.053  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch 7  avgt5
12.703 ± 0.350  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch13  avgt5
18.275 ± 0.228  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch57  avgt5
42.313 ± 0.450  ns/op
ArrayMismatchBenchmark.legacyMismatch_nomatch97  avgt5
89.410 ± 2.927  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch 0  avgt5
4.629 ± 0.035  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch 1  avgt5
9.515 ± 0.096  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch 2  avgt5
9.526 ± 0.132  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch 7  avgt5
9.581 ± 0.395  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch13  avgt5
9.781 ± 0.133  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch57  avgt5
9.846 ± 0.182  ns/op
ArrayMismatchBenchmark.arraysMismatch_nomatch97  avgt5
10.809 ± 0.307  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch0  avgt5
3.417 ± 0.018  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch1  avgt5
3.412 ± 0.011  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch2  avgt5
3.414 ± 0.032  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch7  avgt5
5.401 ± 0.207  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch   13  avgt5
8.311 ± 0.070  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch   57  avgt5
20.536 ± 0.556  ns/op
ArrayMismatchBenchmark.intMismatch_nomatch   97  avgt5
30.969 ± 0.318  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch   0  avgt5
4.399 ± 0.082  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch   1  avgt5
4.390 ± 0.068  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch   2  avgt5
4.398 ± 0.033  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch   7  avgt5
4.403 ± 0.110  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch  13  avgt5
6.564 ± 0.398  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch  57  avgt5
11.548 ± 0.331  ns/op
ArrayMismatchBenchmark.longMismatch_nomatch  97  avgt5
16.335 ± 0.119  ns/op

I labeled the current behavior as "legacy".
The Arrays.mismatch is significantly slower when the mismatch occurs
early in the array and significantly faster when the mismatch occurs
later.
Comparing an int (4 bytes) at a time is a clear winner if the mismatch
occurs in those 4 bytes, which appeared to be 90+% of the calls I
observed.
Comparing a long (8 bytes) at a time is faster than the current
behavior unless it is the first byte which does not match, but slower
than comparing ints if the mismatch occurs in the first 4 bytes.

I wrote this test using jdk 9 VarHandle to read the ints and longs
from the byte[], but the same thing can be achieved using
sun.misc.Unsafe. I will add that as a case in the benchmark, but it is
expected to be similar to VarHandle (maybe slightly faster).

Brett

On Mon, Jan 11, 2021 at 10:04 AM Lasse Collin  wrote:
>
> On 2021-01-09 Brett Okken wrote:
> > This would seem to be a potential candidate for a multi-release
> > jar[1], if you can figure out a reasonable way to get a build system
> > to generate one.
>
> I suppose it can be done. The build system uses Apache Ant. From some
> sources I've understood that there are more modern alternatives but I
> haven't had any interest or energy to learn more as Ant seems to still
> work OK.
>
> > The 4 uses I found of comparing byte[] could be refactored to call a
> > new utility class to do the comparison. The "regular" implementation
> > could be java 7 compatible, and the jdk 9 version would be in the
> > META_INF folder.
> > Even for the java 7 compatible version, it might be worth exploring
> > how much improvement would come from using Unsafe to read int or long
> > values from the byte[] and compare those.
> >
> > For Hash234, I would think the whole class could be handled for the
>
> All these sound like worth checking out.
>
> On 

Re: [xz-devel] xz-java and newer java

2021-01-09 Thread Brett Okken
This would seem to be a potential candidate for a multi-release
jar[1], if you can figure out a reasonable way to get a build system
to generate one.

The 4 uses I found of comparing byte[] could be refactored to call a
new utility class to do the comparison. The "regular" implementation
could be java 7 compatible, and the jdk 9 version would be in the
META_INF folder.
Even for the java 7 compatible version, it might be worth exploring
how much improvement would come from using Unsafe to read int or long
values from the byte[] and compare those.

For Hash234, I would think the whole class could be handled for the MR jar.

[1] - https://openjdk.java.net/jeps/238

Thanks,

Brett

On Fri, Jan 8, 2021 at 1:36 PM Lasse Collin  wrote:
>
> On 2021-01-08 Brett Okken wrote:
> > Are there any plans to update xz-java to take advantage of newer
> > features in jdk 9+?
>
> There aren't much plans at all. Adding module-info.java is likely to
> happen in the next release, whenever that will be.
>
> Apache Commons Compress 1.20 requires Java 7. It depends on XZ for
> Java. I think it wouldn't be good to make XZ for Java require a newer
> Java version than Commons Compress but it could be discussed with
> Commons Compress developers. There's a bug with .7z files that requires
> changing both XZ for Java and Commons Compress so I could ask about the
> Java version too.
>
> > For example, Arrays.mismatch[1] leverages vectorized comparisons of 2
> > byte[]. This could be leveraged in the getMatches methods of BT4 and
> > HC4 as well as the 2 getMatchLen methods in LZEncoder.
> >
> > Another example would be to use a VarHandle to read int values out of
> > a byte[][2], which would be useful for the calcHashes method in
> > Hash234.
>
> Thanks! These sound interesting. If they make big enough difference, it
> could be a good reason to require Java 9.
>
> I will need to check out the LZDecoder improvement from the other
> message too, and perhaps a few variations of it. Thanks!
>
> There are multiple things in XZ Utils that I try to look at in the near
> future so it will be a while until I will play with the Java code.
>
> --
> Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode
>



Re: [xz-devel] xz-java and newer java

2021-01-08 Thread Lasse Collin
On 2021-01-08 Brett Okken wrote:
> Are there any plans to update xz-java to take advantage of newer
> features in jdk 9+?

There aren't much plans at all. Adding module-info.java is likely to
happen in the next release, whenever that will be.

Apache Commons Compress 1.20 requires Java 7. It depends on XZ for
Java. I think it wouldn't be good to make XZ for Java require a newer
Java version than Commons Compress but it could be discussed with
Commons Compress developers. There's a bug with .7z files that requires
changing both XZ for Java and Commons Compress so I could ask about the
Java version too.

> For example, Arrays.mismatch[1] leverages vectorized comparisons of 2
> byte[]. This could be leveraged in the getMatches methods of BT4 and
> HC4 as well as the 2 getMatchLen methods in LZEncoder.
> 
> Another example would be to use a VarHandle to read int values out of
> a byte[][2], which would be useful for the calcHashes method in
> Hash234.

Thanks! These sound interesting. If they make big enough difference, it
could be a good reason to require Java 9.

I will need to check out the LZDecoder improvement from the other
message too, and perhaps a few variations of it. Thanks!

There are multiple things in XZ Utils that I try to look at in the near
future so it will be a while until I will play with the Java code.

-- 
Lasse Collin  |  IRC: Larhzu @ IRCnet & Freenode