Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-07-18 Thread Lasse Collin
On 2024-07-18 Brett Okken wrote:
> There were a number of "unsigned" related operations added in 1.8.
> I cannot find any official documentation, but here is a decent
> summary of what was added:
> https://www.baeldung.com/java-unsigned-arithmetic

Thanks! I suspect that RangeDecoder was the only performance-critical
place with unsigned integers.

-- 
Lasse Collin



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-07-18 Thread Brett Okken
Glad it worked.
There were a number of "unsigned" related operations added in 1.8.
I cannot find any official documentation, but here is a decent
summary of what was added:
https://www.baeldung.com/java-unsigned-arithmetic

On Thu, Jul 18, 2024 at 11:16 AM Lasse Collin wrote:
>
> On 2024-07-18 Brett Okken wrote:
> > Did you try out the Integer.compareUnsigned method as an alternative?
>
> No, I was too dumb to even look for it. :-( This code was written when
> Java 6 was the latest stable release, and back then it seemed that Java
> was quite negative about supporting unsigned integers.
>
> Integer.compareUnsigned seems to give the same performance as the
> "long" method on x86-64. So compareUnsigned is the obvious choice. I
> have committed it to master.
>
> Thanks!
>
> --
> Lasse Collin



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-07-18 Thread Lasse Collin
On 2024-07-18 Brett Okken wrote:
> Did you try out the Integer.compareUnsigned method as an alternative?

No, I was too dumb to even look for it. :-( This code was written when
Java 6 was the latest stable release, and back then it seemed that Java
was quite negative about supporting unsigned integers.

Integer.compareUnsigned seems to give the same performance as the
"long" method on x86-64. So compareUnsigned is the obvious choice. I
have committed it to master.
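
For readers who have not met the alternatives being compared, here is a
minimal sketch of three ways to do an unsigned "less than" on Java ints.
The class and method names are hypothetical and this is not the actual
RangeDecoder code; it only illustrates the techniques named above.

```java
// Sketch only: names are hypothetical, not taken from XZ for Java.
final class UnsignedCompareSketch {
    // Java 8+: library method that compares two ints as unsigned values.
    static boolean lessThanUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }

    // The "long" method: zero-extend both values to 64 bits, then
    // use an ordinary signed comparison.
    static boolean lessThanViaLong(int a, int b) {
        return (a & 0xFFFFFFFFL) < (b & 0xFFFFFFFFL);
    }

    // Older trick that stays in 32 bits: flip the sign bit of both
    // values so that signed ordering matches unsigned ordering.
    static boolean lessThanViaSignFlip(int a, int b) {
        return (a ^ 0x80000000) < (b ^ 0x80000000);
    }

    public static void main(String[] args) {
        int small = 1;
        int large = 0x80000000; // 2147483648 when read as unsigned

        // A plain signed comparison gets this pair wrong.
        System.out.println("signed:    " + (small < large));
        System.out.println("unsigned:  " + lessThanUnsigned(small, large));
        System.out.println("via long:  " + lessThanViaLong(small, large));
        System.out.println("sign flip: " + lessThanViaSignFlip(small, large));
    }
}
```

All three helpers should agree with each other; which one is fastest
depends on the JVM and the CPU, so it needs to be measured.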

Thanks!

-- 
Lasse Collin



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-07-18 Thread Brett Okken
Lasse,

On the rc_dec_speed PR, it appears to me the main change that might
speed things up is this unsigned comparison?
https://github.com/tukaani-project/xz-java/compare/master...rc_dec_speed#diff-8a1afbf1609c4b2d7813b299fce056f7ddd58a4a24ff02b01c2fdba38ff7fd0dL24-R23

Did you try out the Integer.compareUnsigned method as an alternative?
https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#compareUnsigned-int-int-

My observation historically has been that moving to 64bit operations is slower.

I will try to find some time to take a closer look at the PR you referenced.

Brett

On Thu, Jul 18, 2024 at 9:13 AM Lasse Collin wrote:
>
> On 2024-03-07 Lasse Collin wrote:
> > On 2024-03-05 Dennis Ens wrote:
> > > > I hope 1.10 could be done in a month or two but I don't want to
> > > > make any promises or serious predictions. Historically those
> > > > haven't been accurate at all.
> > >
> > > I'll hope it's on the sooner side then.
>
> 1.10 is getting closer to being ready. I'd still like to fix the
> this-escape warnings as those are a sign of design errors. More info here:
>
> https://github.com/tukaani-project/xz-java/pull/18
>
> I also created a branch rc_dec_speed but didn't create a PR for it. It
> speeds up decompression on x86-64 with OpenJDK 22 by roughly 4 %. I'm not
> sure if it should be included in 1.10 in some form. If there is
> interest, let's create a separate thread for it, or I can create a PR if
> that is preferred.
>
> So now you and other people with Java knowledge can easily help because
> especially the PR #18 doesn't need much knowledge of the project
> internals. Thanks!
>
> I currently don't know what I should post to xz-devel and what to
> GitHub. Many public communication channels make things hard to follow.
>
> --
> Lasse Collin
>



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-07-18 Thread Lasse Collin
On 2024-03-07 Lasse Collin wrote:
> On 2024-03-05 Dennis Ens wrote:
> > > I hope 1.10 could be done in a month or two but I don't want to
> > > make any promises or serious predictions. Historically those
> > > haven't been accurate at all.
> > 
> > I'll hope it's on the sooner side then.

1.10 is getting closer to being ready. I'd still like to fix the
this-escape warnings as those are a sign of design errors. More info here:

https://github.com/tukaani-project/xz-java/pull/18

I also created a branch rc_dec_speed but didn't create a PR for it. It
speeds up decompression on x86-64 with OpenJDK 22 by roughly 4 %. I'm not
sure if it should be included in 1.10 in some form. If there is
interest, let's create a separate thread for it, or I can create a PR if
that is preferred.

So now you and other people with Java knowledge can easily help because
especially the PR #18 doesn't need much knowledge of the project
internals. Thanks!

I currently don't know what I should post to xz-devel and what to
GitHub. Many public communication channels make things hard to follow.

-- 
Lasse Collin



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-03-07 Thread Lasse Collin
On 2024-03-05 Dennis Ens wrote:
> > I hope 1.10 could be done in a month or two but I don't want to
> > make any promises or serious predictions. Historically those
> > haven't been accurate at all.  
> 
> I'll hope it's on the sooner side then. Is there a reason that
> xz-java is so far behind its counterpart?

These are unpaid hobby projects and the maintainers work on things they
happen to find interesting. The focus was on XZ Utils for quite a long
time; now more attention is returning to XZ for Java.

> It seems those filters have been in that version for a while, and it
> seems strange they aren't compatible with each other. Maybe this
> should be made more clear in the README?

The README file in XZ for Java 1.9 specifies that the code implements
the .xz file format specification version 1.0.4. That doesn't include
the ARM64 or RISC-V filters.
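
To illustrate why a decoder written against an older spec version
reports an unknown Filter ID, here is a minimal sketch of a filter
lookup. The class and method names are hypothetical and this is not the
XZ for Java implementation. The numeric IDs are given as I understand
them from the .xz file format specification and should be checked
against the spec before relying on them.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: names are hypothetical, not the XZ for Java API.
final class FilterIdSketch {
    // Builds the table of Filter IDs a decoder knows about.
    // includeNewer = false models a decoder for spec 1.0.4;
    // includeNewer = true adds the ARM64 and RISC-V filters.
    static Map<Long, String> knownFilters(boolean includeNewer) {
        Map<Long, String> filters = new HashMap<>();
        filters.put(0x03L, "Delta");
        filters.put(0x04L, "x86");
        filters.put(0x05L, "PowerPC");
        filters.put(0x06L, "IA-64");
        filters.put(0x07L, "ARM");
        filters.put(0x08L, "ARM-Thumb");
        filters.put(0x09L, "SPARC");
        filters.put(0x21L, "LZMA2");
        if (includeNewer) {
            filters.put(0x0AL, "ARM64");  // assumed ID, verify in the spec
            filters.put(0x0BL, "RISC-V"); // assumed ID, verify in the spec
        }
        return filters;
    }

    // Returns the filter name or fails the way an older decoder would
    // when it meets an ID that was defined after it was written.
    static String lookup(long filterId, boolean includeNewer) {
        String name = knownFilters(includeNewer).get(filterId);
        if (name == null) {
            throw new IllegalArgumentException(
                    "Unknown Filter ID 0x" + Long.toHexString(filterId));
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(lookup(0x0AL, true));
        try {
            lookup(0x0AL, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```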

The ARM64 filter was already in the master branch. The RISC-V filter is
there now too, along with a few other changes. The README now refers to
spec version 1.2.0.

I understand it can be cryptic to refer to a spec version but obviously
one cannot list what future things are missing. One could list
supported filters but in theory something else could be extended too.

> I don't see anything about contributing on the xz-java github page.
> What are the best practices for contributing to this project?

I'm not sure if there is anything specific. Chatting on #tukaani can be
good to get ideas discussed quickly but it requires that people happen
to be online at the same time.

> > The encoder implementations have some minor differences which
> > affect both output and speed. Different releases can in theory
> > have different output. XZ Utils output might change in future
> > versions too.  
> 
> I see, that makes sense. I'm glad the difference is explainable and
> not a bug. Can you explain exactly what the differences are?

I don't remember much now. They are minor details, but even minor
differences affect the output.

> Does xz-java always do a better job compressing since it resulted in a
> smaller file?

They should be very close in practice. You need to compare to XZ Utils
in single-threaded mode: xz -T1

-- 
Lasse Collin



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-03-05 Thread Dennis Ens
> I hope 1.10 could be done in a month or two but I don't want to make any
> promises or serious predictions. Historically those haven't been
> accurate at all.

I'll hope it's on the sooner side then. Is there a reason that xz-java is so 
far behind its counterpart? It seems those filters have been in that version 
for a while, and it seems strange they aren't compatible with each other. Maybe 
this should be made more clear in the README? 

I'd be happy to help out to make them compatible. I don't see anything about 
contributing on the xz-java github page. What are the best practices for 
contributing to this project?

> However, Java in general is slower. Some compressors have a Java API but
> the performance-critical code is native code. For example, java.util.zip
> calls into native code from zlib. XZ for Java doesn't use any native
> code (for now at least).

That's good to know. It seems like if it's using native code it should be 
possible to get the speeds pretty similar. It sounds like an interesting 
problem to tackle.

> XZ for Java still lacks threading. Implementing it is among the most
> important tasks in XZ for Java. It helps with big files like your test
> file but makes the compressed file a little bigger. From your numbers I'm
> not certain if you used xz in threaded mode or not. The time difference
> looks unusually high for single-threaded mode for both compression and
> decompression. The difference for a big input file in threaded mode
> looks small though (unless it had lots of trivially-compressible
> sections).

> XZ Utils 5.6.0 also enables threaded mode by default.

That would explain a lot of the speed difference I noticed then. I was using 
the latest code from master, so I should have been running in threaded mode. 
It's great how large an improvement that can make, although I know it's always 
complicated to implement threading without any bugs.

> The encoder implementations have some minor differences which affect
> both output and speed. Different releases can in theory have different
> output. XZ Utils output might change in future versions too.

I see, that makes sense. I'm glad the difference is explainable and not a bug. 
Can you explain exactly what the differences are? Does xz-java always do a 
better job compressing since it resulted in a smaller file?



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-03-05 Thread Lasse Collin
On 2024-03-05 Dennis Ens wrote:
> > The XZ for Java development is becoming active again but it may
> > still take a while until the next stable release is out. A few
> > other things are waiting in the queue from the past three years.  
> 
> Ah, I see. Thank you for the answer. Do you have a timeline of when
> the changes are expected?

I hope 1.10 could be done in a month or two but I don't want to make any
promises or serious predictions. Historically those haven't been
accurate at all.

> First, xz-java seems much slower. I tested compressing and
> decompressing a ~1.2 gigabyte file, and xz-java took 17m32.345s
> compared to xz's 7m7.615s to compress. Decompressing was 0m21.760s to
> 0m6.223s. Is there anything that can be done to improve the speed of
> the Java version, or is C just a much more efficient programming
> language?

Brett Okken's patches (originally from early 2021) should improve
compression speed. They are currently under review. They are among the
things planned for the next stable release.

However, Java in general is slower. Some compressors have a Java API but
the performance-critical code is native code. For example, java.util.zip
calls into native code from zlib. XZ for Java doesn't use any native
code (for now at least).
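
As a small illustration of such a Java API in front of native code, the
sketch below round-trips a buffer through java.util.zip's Deflater and
Inflater. The class and method names are hypothetical; only the
java.util.zip calls are real. It does not measure speed and says
nothing about XZ for Java itself.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch only: class and method names are hypothetical.
final class ZlibRoundTripSketch {
    // Compresses the whole input into a zlib-format byte array.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // releases the native zlib state
        }
    }

    // Decompresses a complete zlib-format byte array.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end(); // releases the native zlib state
        }
    }

    public static void main(String[] args) {
        byte[] original = "hello hello hello hello hello"
                .getBytes(StandardCharsets.UTF_8);
        byte[] restored = inflate(deflate(original));
        System.out.println(Arrays.equals(original, restored));
    }
}
```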

XZ for Java still lacks threading. Implementing it is among the most
important tasks in XZ for Java. It helps with big files like your test
file but makes the compressed file a little bigger. From your numbers I'm
not certain if you used xz in threaded mode or not. The time difference
looks unusually high for single-threaded mode for both compression and
decompression. The difference for a big input file in threaded mode
looks small though (unless it had lots of trivially-compressible
sections).

In single-threaded mode, I would expect compressing with xz to take
around 30-40 % less time than XZ for Java but your numbers show 60 %
time reduction.
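
The percentages can be reproduced from the reported timings with a few
lines of arithmetic. The sketch below uses the times quoted above
(17m32.345s vs 7m7.615s to compress, 0m21.760s vs 0m6.223s to
decompress); the class and method names are hypothetical. It should
give a reduction of roughly 59 % for compression and roughly 71 % for
decompression.

```java
// Sketch only: names are hypothetical.
final class TimeReductionSketch {
    // Converts a "XmY.ZZZs" style timing into seconds.
    static double seconds(int minutes, double secs) {
        return minutes * 60.0 + secs;
    }

    // Percentage of time saved by the faster run relative to the slower one.
    static double reductionPercent(double slower, double faster) {
        return (1.0 - faster / slower) * 100.0;
    }

    public static void main(String[] args) {
        double javaCompress = seconds(17, 32.345); // xz-java
        double xzCompress = seconds(7, 7.615);     // xz
        double javaDecompress = seconds(0, 21.760);
        double xzDecompress = seconds(0, 6.223);

        System.out.println(String.format("compression:   %.1f %% less time",
                reductionPercent(javaCompress, xzCompress)));
        System.out.println(String.format("decompression: %.1f %% less time",
                reductionPercent(javaDecompress, xzDecompress)));
    }
}
```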

XZ Utils 5.6.0 added x86-64 assembly (GCC & Clang only) which reduces
per-thread decompression time by 20-40 % depending on the file and the
computer. So that increases the difference between XZ Utils and XZ for
Java too: decompression time can be roughly 50 % less with XZ Utils
5.6.0 in single-threaded mode on x86-64 compared to XZ for Java.

XZ Utils 5.6.0 also enables threaded mode by default.

> Also, I noticed that the results of compressing the files were
> different sizes. They both worked, so I don't know if it's an issue,
> but it does seem strange. The xz-java one was slightly smaller than
> the xz one.

The encoder implementations have some minor differences which affect
both output and speed. Different releases can in theory have different
output. XZ Utils output might change in future versions too.

-- 
Lasse Collin



Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-03-05 Thread Dennis Ens
> > The files specifically were good-1-arm64-lzma2-1.xz and
> > good-1-arm64-lzma2-2.xz and good-1-riscv-lzma2-1.xz and
> > good-1-riscv-lzma2-2.xz. These did seem to work fine when I tried
> > with xz, but not with xz-java. Do you think there might be a fix
> > available for this soon?
> 
> 
> XZ for Java 1.9 doesn't have the ARM64 or RISC-V filters. The master
> branch has the ARM64 filter. The RISC-V filter will likely be there
> this week.
> 
> The XZ for Java development is becoming active again but it may still
> take a while until the next stable release is out. A few other things
> are waiting in the queue from the past three years.


Ah, I see. Thank you for the answer. Do you have a timeline of when the changes 
are expected?

I started to use xz, and was able to decompress the files without issue. 
Messing around with xz and xz-java, I noticed a few other things though. 

First, xz-java seems much slower. I tested compressing and decompressing a ~1.2 
gigabyte file, and xz-java took 17m32.345s compared to xz's 7m7.615s to 
compress. Decompressing was 0m21.760s to 0m6.223s. Is there anything that can 
be done to improve the speed of the Java version, or is C just a much more 
efficient programming language?

Also, I noticed that the results of compressing the files were different sizes. 
They both worked, so I don't know if it's an issue, but it does seem strange. 
The xz-java one was slightly smaller than the xz one.

Thanks again for the help.

--
Dennis Ens




Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID

2024-03-05 Thread Lasse Collin
On 2024-03-05 Dennis Ens wrote:
> The files specifically were good-1-arm64-lzma2-1.xz and
> good-1-arm64-lzma2-2.xz and good-1-riscv-lzma2-1.xz and
> good-1-riscv-lzma2-2.xz. These did seem to work fine when I tried
> with xz, but not with xz-java. Do you think there might be a fix
> available for this soon?

XZ for Java 1.9 doesn't have the ARM64 or RISC-V filters. The master
branch has the ARM64 filter. The RISC-V filter will likely be there
this week.

The XZ for Java development is becoming active again but it may still
take a while until the next stable release is out. A few other things
are waiting in the queue from the past three years.

-- 
Lasse Collin