being more modular.
2. Allow specifying the implementation to use with a system property. This
would be unlikely to be used outside of benchmarking, but would provide
options for users on unusual hardware.
Brett
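A minimal sketch of how an implementation could be selected with a system property, as suggested in point 2 above. The property name, the enum constants, and the class name are all hypothetical and are not taken from the project; unknown or missing values fall back to a default so that a typo never breaks compression.

```java
import java.util.Locale;

// Hypothetical helper: every name in this class is illustrative only.
final class ArrayComparisonChoice {
    /** Hypothetical set of selectable implementations. */
    enum Impl { PORTABLE, VECTOR, VARHANDLE_LONG, UNSAFE_LONG }

    /** Hypothetical property name (an assumption, not the project's). */
    static final String PROPERTY = "example.xz.arraycomparison";

    /** Maps a property value to an implementation, or returns the fallback. */
    static Impl parse(String value, Impl fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Impl.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // Unknown name: keep the default rather than failing.
            return fallback;
        }
    }

    /** Reads the system property once and resolves it to an implementation. */
    static Impl select(Impl fallback) {
        String value;
        try {
            value = System.getProperty(PROPERTY);
        } catch (SecurityException e) {
            value = null; // property access denied: use the default
        }
        return parse(value, fallback);
    }
}
```

With this sketch, running with -Dexample.xz.arraycomparison=vector would make select(Impl.PORTABLE) return Impl.VECTOR, which is mostly useful for benchmarking one variant against another.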
On Tue, Mar 12, 2024 at 12:55 PM Lasse Collin
wrote:
> On 2024-03-12 Bre
will get opportunity to test out arm64. That
could be a while yet.
I do have some things still on jdk 8, but only decompression. Surveys seem
to indicate quite a bit of jdk 8 still in use, but I have no personal need.
Brett
On Sun, Mar 10, 2024 at 2:49 PM Lasse Collin
wrote:
> On 2024-03-09 Br
When I tested graviton2 (arm64) previously, Arrays.mismatch was better than
comparing longs using a VarHandle.
The benefits are definitely with content that compresses more - because
there are more long matches.
I do like Unsafe as an option for jdk 8 users on x86 or arm64.
On Sat, Mar 9, 2024
I have added a comment to the PR with updated benchmark results:
https://github.com/tukaani-project/xz-java/pull/13#issuecomment-1977705691
On Fri, Mar 1, 2024 at 6:23 AM Brett Okken wrote:
>
> I found and resolved the difference:
> https://github.com/tukaani-project/xz-java/pull/1
On Thu, Feb 29, 2024 at 8:47 PM Brett Okken wrote:
>
> > Thanks! Ideally there would be one commit to add the minimal portable
> > version, then separate commits for each optimized variant.
>
> Would you like me to remove the Unsafe based impl from
> https://github.com/tukaani
or in one or the other.
Brett
On Thu, Feb 29, 2024 at 11:35 AM Lasse Collin wrote:
>
> On 2024-02-25 Brett Okken wrote:
> > I created https://github.com/tukaani-project/xz-java/pull/13 with the
> > bare bones changes to utilize a utility for array comparisons and an
> > U
>
> On 2024-02-19 Brett Okken wrote:
> > I have created a pr to the GitHub project.
> >
> > https://github.com/tukaani-project/xz-java/pull/12
>
> Thanks! It could be good to split into smaller commits to make reviewing
> easier.
>
> > It is not clear to me if
I have created a pr to the GitHub project with these changes.
https://github.com/tukaani-project/xz-java/pull/11/files
Thanks,
Brett
On Thu, Mar 31, 2022 at 4:33 PM Lasse Collin
wrote:
> > On Thu, May 6, 2021 at 4:18 PM Brett Okken
> > wrote:
> >
> > > Th
I have created a pr to the GitHub project.
https://github.com/tukaani-project/xz-java/pull/12
It is not clear to me if that is actually seeing active dev on the Java
project yet.
Thanks,
Brett
On Sat, Feb 12, 2022 at 11:45 AM Brett Okken
wrote:
> Can this be taken up again?
>
> On
> I'm not sure that this is authoritative. The Java API documentation
> says that it "aims" to provide "Full support for the .xz file format
> specification version 1.0.4"
I am not certain which statement you believe is not authoritative.
There are existing constructors (such as[1]) which allow
What version of xz are you using?
The differences between xz and lzma are a bit more involved. One such
example is that xz is a framed format which includes checksums on each
“frame”. I would not expect checksum verification to account for all of
that difference, but it can be disabled to
like multithreaded encoding / decoding and a
> > few updates that Brett Okken had submitted (but are still waiting for
> > merge). Should I add these things to only my local version, or is
> > there a plan for these things in the future?
>
> Brett Okken's patches I haven't r
Can this be reviewed?
On Thu, May 6, 2021 at 4:18 PM Brett Okken wrote:
> These changes reduce the time of DeltaEncoder by ~65% and DeltaDecoder
> by ~40%, assuming arrays that are several KB in size.
>
Can this be taken up again?
On Wed, Mar 24, 2021 at 6:20 AM Brett Okken
wrote:
> I grabbed an older version in the last mail. This is the updated
> version for aarch64.
>
These changes reduce the time of DeltaEncoder by ~65% and DeltaDecoder
by ~40%, assuming arrays that are several KB in size.
diff --git a/src/org/tukaani/xz/delta/DeltaCoder.java
b/src/org/tukaani/xz/delta/DeltaCoder.java
index d94eb66..ccb702d 100644
I grabbed an older version in the last mail. This is the updated
version for aarch64.
ArrayUtil.java
Description: Binary data
I was able to test on AWS graviton2 instances (aarch64), but only with
jdk 15. The results show that the vectorized approach appears the best
option, though long comparisons are also an improvement over baseline.
Based on this, I made a small change to ArrayUtil to, by default, use
unsafe long
> With a quick try I got a feeling that my worry about short repeats was
> wrong. It doesn't matter because decoding each LZMA symbol is much more
> expensive. What matters is avoiding multiple tiny arraycopy calls
> within a single run of the repeat method, and that problem was already
> solved.
I learned the wrong lesson from LZDecoder.
The pattern of System.arraycopy calls with doubling sizes was better than
byte-by-byte copies in a loop. There was never really a direct comparison
to Arrays.fill. The single-byte repeating case was close.
Hotspot must be doing something interesting with Arrays.fill,
I have attached updated patches and ArrayUtil.java.
HC4 needed changes/optimizations in both locations.
I also found a better way to handle BT4 occasionally sending -1 as the length.
diff --git a/src/org/tukaani/xz/lz/BT4.java b/src/org/tukaani/xz/lz/BT4.java
index 6c46feb..7d78aef 100644
---
On Tue, Feb 16, 2021 at 12:48 PM Lasse Collin wrote:
>
> I quickly tried these with "XZEncDemo 2". I used the preset 2 because
> that uses LZMAEncoderFast instead of LZMAEncoderNormal where the
> negative lengths result in a crash.
I updated the mismatch method to check for negative lengths
We found in LZDecoder that using System.arraycopy with doubling size
is faster than Arrays.fill (especially for larger arrays).
We can apply that knowledge in the BasicArrayCache, where there are
some use cases which require clearing out the array prior to returning
it.
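A minimal sketch of the doubling-copy idea applied to clearing a range of an array. The class and method names are hypothetical and this is not the code from the patch. Whether it actually beats Arrays.fill depends on the JVM and the array size, so it should be benchmarked rather than assumed.

```java
// Hypothetical helper, for illustration only.
final class DoublingFill {
    /**
     * Sets a[from..to) to value by seeding one byte and then copying the
     * already-filled prefix onto the unfilled part, doubling the filled
     * length on each pass.
     */
    static void fill(byte[] a, int from, int to, byte value) {
        int len = to - from;
        if (len <= 0) {
            return;
        }
        a[from] = value;
        int filled = 1;
        while (filled < len) {
            // Never copy more than what is already filled or still missing.
            int n = Math.min(filled, len - filled);
            System.arraycopy(a, from, a, from + filled, n);
            filled += n;
        }
    }
}
```

The number of arraycopy calls grows with the logarithm of the range length, which is why the approach is only interesting for larger arrays.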
diff --git
On Sun, Feb 14, 2021 at 9:30 AM Lasse Collin
wrote:
> On 2021-02-13 Brett Okken wrote:
> > We can make it look even more like liblzma :)
>
> It can be done but I'm not sure yet if it should be done. Your
> implementation looks very neat though. :-)
>
> > In my benc
We can make it look even more like liblzma :)
In my benchmark I observe no negative impact of using the functions.
Which is to say that this is still 5-7% faster than the byte-by-byte
approach.
public class CRC64 extends Check {
private static final VarHandle INT_HANDLE =
On Thu, Feb 11, 2021 at 12:51 PM Lasse Collin wrote:
>
> On 2021-02-05 Brett Okken wrote:
> > I worked this out last night. We need to double how much we copy each
> > time by not advancing "back". This actually works even better than
> > Arrays.
the decompression of the repeating single byte by ~1%.
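A minimal sketch of the "double how much we copy each time by not advancing back" idea. It works on a plain linear buffer, whereas the real LZDecoder uses a circular dictionary, so the names and the simplified signature here are assumptions made for illustration.

```java
// Hypothetical, simplified model of an LZ repeat on a linear buffer.
final class RepeatCopy {
    /**
     * Appends len bytes at pos, each equal to the byte dist + 1 positions
     * earlier in the output, and returns the new write position.
     */
    static int repeat(byte[] buf, int pos, int dist, int len) {
        int back = pos - dist - 1;
        if (back < 0 || len < 0 || pos + len > buf.length) {
            throw new IllegalArgumentException("invalid repeat");
        }
        while (len > 0) {
            // "back" stays fixed, so the distance pos - back doubles after
            // each full copy and the source never overlaps the destination.
            int n = Math.min(len, pos - back);
            System.arraycopy(buf, back, buf, pos, n);
            pos += n;
            len -= n;
        }
        return pos;
    }
}
```

For a one-byte distance this degenerates into copies of 1, 2, 4, 8, ... bytes, and for longer distances the first copy already moves dist + 1 bytes at once.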
/*
* CRC64
*
* Authors: Brett Okken
* Lasse Collin
*
* This file has been put into the public domain.
* You can do whatever you want with this file.
*/
package org.tukaani.xz.check;
import java.lang.invoke.MethodHandles;
import
Here is a patch for changes. The benchmark results follow.
diff --git a/src/org/tukaani/xz/lz/LZDecoder.java
b/src/org/tukaani/xz/lz/LZDecoder.java
index 85b2ca1..565209a 100644
--- a/src/org/tukaani/xz/lz/LZDecoder.java
+++ b/src/org/tukaani/xz/lz/LZDecoder.java
@@ -12,6 +12,7 @@ package
This had /way/ more impact than I expected on overall decompression performance.
Here are the baseline numbers for 1.8 (jdk 11 64bit):
Benchmark                            (file)           Mode  Cnt  Score    Error  Units
XZDecompressionBenchmark.decompress  ihe_ovly_pr.dcm  avgt    3  0.731 ±  0.010
> > Now that there is a 6 byte chunkHeader, could the 1 byte tempBuf be
> > removed?
>
> It's better to keep it. It would be confusing to use the same buffer in
> write(int) and writeChunk(). At a glance it would look like
> writeChunk() could be overwriting the input.
I assumed that
On Fri, Feb 5, 2021 at 11:07 AM Lasse Collin wrote:
>
> On 2021-02-02 Brett Okken wrote:
> > Thus far I have only tested on jdk 11 64bit windows, but the fairly
> > clear winner is:
> >
> > public void update(byte[] buf, int off, int len) {
> >
After recent changes, the LZMA2OutputStream class no longer uses
DataOutputStream, but the import statement is still present.
Now that there is a 6 byte chunkHeader, could the 1 byte tempBuf be removed?
> With a file with two-byte repeat ("ababababababab"...) it's 50 % slower
> than the baseline. Calling arraycopy in a loop, copying two bytes at a
> time, is not efficient. I didn't try to look how big the copy needs to be
> to make the overhead of arraycopy smaller than the benefit but clearly
> it
I still need to do more testing across jdk 8 and 15, but initial
returns on this are pretty positive. The repeating byte file is
meaningfully faster than baseline. One of my test files (image1.dcm)
does not improve much from baseline, but the other 2 files do.
diff --git
On Wed, Feb 3, 2021 at 2:56 PM Lasse Collin wrote:
>
> On 2021-02-01 Brett Okken wrote:
> > I have played with this quite a bit and have come up with a slightly
> > modified change which does not regress for the smallest of the sample
> > objects and shows a nice impr
I have not done any testing of xz specifically, but was motivated by
https://github.com/openjdk/jdk/pull/542, which showed pretty
noticeable slowdown when biased locking is removed. The specific
example there was writing 1 byte at a time being transitioned to
writing the 2-8 bytes to a byte[]
I tested jdk 15 64bit and jdk 11 32bit, client and server and the
above implementation is consistently quite good.
The alternative in the running does not do the leading alignment. This
version is really close in 64 bit testing and slightly faster for 32
bit. The differences are pretty small, and both
I accidentally hit reply instead of reply all.
> > Shouldn't that be (i & 3) != 0?
> > An offset of 0 should not enter this loop, but 0 & 3 does not equal 1.
>
> The idea really is that offset of 1 doesn't enter the loop, thus the
> main slicing-by-4 loop is misaligned. I don't know why it makes
I have played with this quite a bit and have come up with a slightly
modified change which does not regress for the smallest of the sample
objects and shows a nice improvement for the 2 larger files.
Here is baseline benchmark on 1.8:
jdk 11 64 bit 1.8 BASELINE
Benchmark
Comparison}.
*
*
* @author Brett Okken
*/
public final class ArrayUtil {
/**
* Enumerated options for controlling implementation of how to
* compare arrays.
*/
public static enum ArrayComparison {
/**
* Uses {@code VarHandle} for {@code int
Here are some small improvements when creating new BlockInputStream
instances. This reduces the size of the byte[] for the block header to
the actual size and replaces use of ByteArrayInputStream, which has
synchronized methods, with a ByteBuffer, which provides the same
functionality without
There are several places where single byte writes are being done
during compression. Often this is going to an OutputStream with
synchronized write methods. Historically that has not mattered much
because of biased locking. However, biased locking is being
removed[1]. These changes will batch
Based on some playing around with unrolling loops as part of the crc64
implementation, I tried unrolling the "legacy" implementation and
found it provided some nice improvements. The improvements were most
pronounced on 32 bit jdk 11:
32 jdk 11 - LEGACY
Benchmark
org.tukaani.xz.ArrayComparison} to a value from {@link
ArrayComparison}.
*
*
* @author Brett Okken
*/
public final class ArrayUtil {
/**
* Enumerated options for controlling implementation of how to
* compare arrays.
*/
public static enum ArrayComparison
diff --git a/src/org/tukaani/xz/lz/BT4.java b/src/org/tukaani/xz/lz/BT4.java
index 6c46feb..c96c766 100644
--- a/src/org/tukaani/xz/lz/BT4.java
+++ b/src/org/tukaani/xz/lz/BT4.java
@@ -11,6 +11,7 @@
package org.tukaani.xz.lz;
import org.tukaani.xz.ArrayCache;
+import
Here is a slice by 4 implementation. It goes byte by byte to easily be
compatible with older jdks. Performance-wise, it is pretty comparable
to the Java port of Adler's stackoverflow implementation:
Benchmark Mode Cnt Score Error Units
Hash64Benchmark.adler
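A minimal sketch of a slice-by-4 CRC64 that reads the input byte by byte, so it needs nothing newer than basic Java. It is not the implementation posted in this thread; the class name and method names are hypothetical, and the polynomial is the reflected ECMA-182 one used by the .xz format.

```java
// Hypothetical sketch; not the code from the thread.
final class Crc64Sketch {
    private static final long POLY = 0xC96C5795D7870F42L;
    private static final long[][] TABLE = new long[4][256];

    static {
        // TABLE[0] is the classic one-byte-at-a-time table.
        for (int n = 0; n < 256; n++) {
            long r = n;
            for (int k = 0; k < 8; k++) {
                r = (r & 1) != 0 ? (r >>> 1) ^ POLY : r >>> 1;
            }
            TABLE[0][n] = r;
        }
        // TABLE[s][n] is TABLE[s - 1][n] advanced by one more zero byte.
        for (int s = 1; s < 4; s++) {
            for (int n = 0; n < 256; n++) {
                long prev = TABLE[s - 1][n];
                TABLE[s][n] = TABLE[0][(int) (prev & 0xFF)] ^ (prev >>> 8);
            }
        }
    }

    /** Reference version: one table lookup per input byte. */
    static long bytewise(byte[] buf, int off, int len) {
        long crc = -1L;
        for (int end = off + len; off < end; off++) {
            crc = TABLE[0][(int) ((buf[off] ^ crc) & 0xFF)] ^ (crc >>> 8);
        }
        return ~crc;
    }

    /** Slice-by-4: folds four input bytes into the CRC per iteration. */
    static long sliceBy4(byte[] buf, int off, int len) {
        long crc = -1L;
        int end = off + len;
        while (end - off >= 4) {
            // XOR four bytes (little endian) into the low 32 bits of the CRC.
            int lo = (int) crc ^ ((buf[off] & 0xFF)
                    | (buf[off + 1] & 0xFF) << 8
                    | (buf[off + 2] & 0xFF) << 16
                    | (buf[off + 3] & 0xFF) << 24);
            crc = TABLE[3][lo & 0xFF]
                    ^ TABLE[2][(lo >>> 8) & 0xFF]
                    ^ TABLE[1][(lo >>> 16) & 0xFF]
                    ^ TABLE[0][lo >>> 24]
                    ^ (crc >>> 32);
            off += 4;
        }
        // Trailing 0-3 bytes use the one-byte table.
        for (; off < end; off++) {
            crc = TABLE[0][(int) ((buf[off] ^ crc) & 0xFF)] ^ (crc >>> 8);
        }
        return ~crc;
    }
}
```

Both methods should return the same value for any input; the sliced version just does four independent table lookups per iteration instead of four dependent ones.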
> Have you tested with 32-bit Java too? It's quite possible that it's
> better to use ints than longs on 32-bit system. If so, that should be
> detected at runtime too, I guess.
I have now run benchmarks using the 32bit jre on 64bit windows system.
That actually introduces additional interesting
java.lang.reflect.Method;
import java.nio.ByteOrder;
import java.util.logging.Level;
import java.util.logging.Logger;
/**
* Utilities for optimized array interactions.
*
* @author Brett Okken
*/
public final class ArrayUtil {
/**
* MethodHandle to the actual mismatch method to use at runtime
Mark Adler has posted an optimized crc64 implementation on
stackoverflow[1]. This can be reasonably easily ported to java (that
post has a link to java impl on github[2] which warrants a little
clean up, but gives a decent idea).
I did a quick benchmark calculating the crc64 over 8KB and the
public int getMatchLen(int forward, int dist, int lenLimit) {
final int curPos = readPos + forward;
final int backPos = curPos - dist - 1;
return ArrayUtil.mismatch(buf, curPos, buf, backPos, lenLimit);
}
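A minimal sketch of a portable, byte-by-byte baseline with the same call shape as the ArrayUtil.mismatch call above. The class name is hypothetical, and the return convention (the number of leading bytes that match, capped at the length argument) is an assumption inferred from how getMatchLen uses the result.

```java
// Hypothetical baseline; the real ArrayUtil is not reproduced here.
final class PortableMismatch {
    /**
     * Returns how many bytes of a[aFrom..] and b[bFrom..] are equal,
     * looking at no more than length bytes. A zero or negative length
     * yields 0.
     */
    static int mismatch(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        int i = 0;
        while (i < length && a[aFrom + i] == b[bFrom + i]) {
            i++;
        }
        return i;
    }
}
```

A version like this runs on any Java release, which makes it a convenient reference when checking that the optimized variants return identical results.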
On Tue, Jan 12, 2021 at 10:17 AM Brett Okken wrote:
>
>
lower
> than comparing ints if the mismatch occurs in the first 4 bytes.
>
> I wrote this test using jdk 9 VarHandle to read the ints and longs
> from the byte[], but the same thing can be achieved using
> sun.misc.Unsafe. I will add that as a case in the benchmark, but it is
> e
, but it is
expected to be similar to VarHandle (maybe slightly faster).
Brett
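A minimal sketch of comparing eight bytes at a time with a JDK 9+ VarHandle, as described above. The class name and return convention are assumptions, and this is not the benchmarked code. Reading little-endian longs means the lowest set bit of the XOR identifies the first differing byte.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical sketch; requires Java 9 or newer.
final class LongMismatch {
    private static final VarHandle LONG_LE =
            MethodHandles.byteArrayViewVarHandle(long[].class,
                                                 ByteOrder.LITTLE_ENDIAN);

    /**
     * Returns the number of equal leading bytes of a[aFrom..] and
     * b[bFrom..], capped at length. A negative length yields 0.
     */
    static int mismatch(byte[] a, int aFrom, byte[] b, int bFrom, int length) {
        int i = 0;
        // Plain get() is permitted at unaligned indexes for byte[] views.
        for (; i + 8 <= length; i += 8) {
            long x = (long) LONG_LE.get(a, aFrom + i);
            long y = (long) LONG_LE.get(b, bFrom + i);
            if (x != y) {
                // Little endian: the lowest differing bit belongs to the
                // first differing byte.
                return i + (Long.numberOfTrailingZeros(x ^ y) >>> 3);
            }
        }
        // Fewer than 8 bytes left: finish byte by byte.
        for (; i < length; i++) {
            if (a[aFrom + i] != b[bFrom + i]) {
                return i;
            }
        }
        return i;
    }
}
```

When the first difference falls within the first few bytes, the long read does more work than necessary, which is consistent with the observation above that comparing longs can be slower than comparing ints for early mismatches.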
On Mon, Jan 11, 2021 at 10:04 AM Lasse Collin wrote:
>
> On 2021-01-09 Brett Okken wrote:
> > This would seem to be a potential candidate for a multi-release
> > jar[1], if you can figure out
hole class could be handled for the MR jar.
[1] - https://openjdk.java.net/jeps/238
Thanks,
Brett
On Fri, Jan 8, 2021 at 1:36 PM Lasse Collin wrote:
>
> On 2021-01-08 Brett Okken wrote:
> > Are there any plans to update xz-java to take advantage of newer
> > features in jdk 9+?
The repeat method in LZDecoder[1] currently copies individual bytes in
a loop. This could be changed to do batch copies:
do {
//it is possible for the "repeat" to include content which
//is going to be generated here
//so we have to limit ourselves to how much data is
Are there any plans to update xz-java to take advantage of newer
features in jdk 9+?
For example, Arrays.mismatch[1] leverages vectorized comparisons of 2
byte[]. This could be leveraged in the getMatches methods of BT4 and
HC4 as well as the 2 getMatchLen methods in LZEncoder.
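A minimal sketch of how a match length could be computed with Arrays.mismatch on JDK 9+. The class and method names are hypothetical and simplified compared to the real LZEncoder. Arrays.mismatch returns -1 when the two ranges are identical, so that case has to be mapped back to the length limit.

```java
import java.util.Arrays;

// Hypothetical sketch; requires Java 9 or newer.
final class VectorMatchLen {
    /**
     * Returns how many bytes starting at curPos equal the bytes starting
     * at backPos within the same buffer, capped at lenLimit.
     */
    static int matchLen(byte[] buf, int curPos, int backPos, int lenLimit) {
        if (lenLimit <= 0) {
            return 0;
        }
        int m = Arrays.mismatch(buf, curPos, curPos + lenLimit,
                                buf, backPos, backPos + lenLimit);
        // -1 means no mismatch was found in the compared ranges.
        return m < 0 ? lenLimit : m;
    }
}
```

The two ranges may overlap because the method only reads from the buffer.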
Another example