Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
On 2024-07-18 Brett Okken wrote: > There were a number of "unsigned" related operations added in 1.8. > I cannot find any of the official doc, but here is a decent > summarization of what was added: > https://www.baeldung.com/java-unsigned-arithmetic Thanks! I suspect that RangeDecoder was the only performance-critical place with unsigned integers. -- Lasse Collin
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
Glad it worked. There were a number of "unsigned" related operations added in 1.8. I cannot find any of the official doc, but here is a decent summarization of what was added: https://www.baeldung.com/java-unsigned-arithmetic On Thu, Jul 18, 2024 at 11:16 AM Lasse Collin wrote: > > On 2024-07-18 Brett Okken wrote: > > Did you try out the Integer.compareUnsigned method as an alternative? > > No, I was too dumb to even look for it. :-( This code was written when > Java 6 was the latest stable release, and back then it seemed that Java > was quite negative about supporting unsigned integers. > > Integer.compareUnsigned seems to give the same performance as the > "long" method on x86-64. So compareUnsigned is the obvious choice. I > have committed it to master. > > Thanks! > > -- > Lasse Collin
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
On 2024-07-18 Brett Okken wrote: > Did you try out the Integer.compareUnsigned method as an alternative? No, I was too dumb to even look for it. :-( This code was written when Java 6 was the latest stable release, and back then it seemed that Java was quite negative about supporting unsigned integers. Integer.compareUnsigned seems to give the same performance as the "long" method on x86-64. So compareUnsigned is the obvious choice. I have committed it to master. Thanks! -- Lasse Collin
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
Lasse, On the rc_dec_speed PR, it appears to me the main change that might speed things up is this unsigned comparison? https://github.com/tukaani-project/xz-java/compare/master...rc_dec_speed#diff-8a1afbf1609c4b2d7813b299fce056f7ddd58a4a24ff02b01c2fdba38ff7fd0dL24-R23 Did you try out the Integer.compareUnsigned method as an alternative? https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#compareUnsigned-int-int- My observation historically has been that moving to 64bit operations is slower. I will try to find some time to take a closer look at the PR you referenced. Brett On Thu, Jul 18, 2024 at 9:13 AM Lasse Collin wrote: > > On 2024-03-07 Lasse Collin wrote: > > On 2024-03-05 Dennis Ens wrote: > > > > I hope 1.10 could be done in a month or two but I don't want to > > > > make any promises or serious predictions. Historically those > > > > haven't been accurate at all. > > > > > > I'll hope it's on the sooner side then. > > 1.10 is getting closer to being ready. I'd like to fix this-escape > warnings still as those are a sign of design errors. More info here: > > https://github.com/tukaani-project/xz-java/pull/18 > > I also created a branch rc_dec_speed but didn't create a PR for it. It > speeds up decompression on x86-64 with OpenJDK 22 roughly 4 %. I'm not > sure if it should be included in 1.10 in some form. If there is > interest, let's create a separate thread for it, or I can create a PR if > that is preferred. > > So now you and other people with Java knowledge can easily help because > especially the PR #18 doesn't need much knowledge of the project > internals. Thanks! > > I currently don't know what I should post to xz-devel and what to > GitHub. Many public communication channels makes things hard to follow. > > -- > Lasse Collin >
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
On 2024-03-07 Lasse Collin wrote: > On 2024-03-05 Dennis Ens wrote: > > > I hope 1.10 could be done in a month or two but I don't want to > > > make any promises or serious predictions. Historically those > > > haven't been accurate at all. > > > > I'll hope it's on the sooner side then. 1.10 is getting closer to being ready. I'd like to fix this-escape warnings still as those are a sign of design errors. More info here: https://github.com/tukaani-project/xz-java/pull/18 I also created a branch rc_dec_speed but didn't create a PR for it. It speeds up decompression on x86-64 with OpenJDK 22 roughly 4 %. I'm not sure if it should be included in 1.10 in some form. If there is interest, let's create a separate thread for it, or I can create a PR if that is preferred. So now you and other people with Java knowledge can easily help because especially the PR #18 doesn't need much knowledge of the project internals. Thanks! I currently don't know what I should post to xz-devel and what to GitHub. Many public communication channels makes things hard to follow. -- Lasse Collin
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
On 2024-03-05 Dennis Ens wrote: > > I hope 1.10 could be done in a month or two but I don't want to > > make any promises or serious predictions. Historically those > > haven't been accurate at all. > > I'll hope it's on the sooner side then. Is there a reason that > xz-java is so far behind its counterpart? These are unpaid hobby projects and the maintainers work on things they happen to find interesting. The focus was on XZ Utils quite long, now more attention is returning to XZ for Java. > It seems those filters have been in that version for a while, and it > seems strange they aren't compatible with each other. Maybe this > should be made more clear in the README? The README file in XZ for Java 1.9 specifies that the code implements the .xz file format specification version 1.0.4. That doesn't include the ARM64 or RISC-V filters. ARM64 filter was in the master branch already. RISC-V filter is there now too among a few other changes. README refers to spec version 1.2.0 now. I understand it can be cryptic to refer to a spec version but obviously one cannot list what future things are missing. One could list supported filters but in theory something else could be extended too. > I don't see anything about contributing on the xz-java github page. > What are the best practices for contributing to this project? I'm not sure if there is anything specific. Chatting on #tukaani can be good to get ideas discussed quickly but it requires that people happen to be online at the same time. > > The encoder implementations have some minor differences which > > affects both output and speed. Different releases can in theory > > have different output. XZ Utils output might change in future > > versions too. > > I see, that makes sense. I'm glad the difference is explainable and > not a bug. Can you explain exactly what the differences are? I don't remember much now. It's minor details but minor differences affect output already. > Does xz-java always do a better job compressing since it resulted in a > smaller file? They should be very close in practice. You need to compare to XZ Utils in single-threaded mode: xz -T1 -- Lasse Collin
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
> I hope 1.10 could be done in a month or two but I don't want to make any > promises or serious predictions. Historically those haven't been > accurate at all. I'll hope it's on the sooner side then. Is there a reason that xz-java is so far behind its counterpart? It seems those filters have been in that version for a while, and it seems strange they aren't compatible with each other. Maybe this should be made more clear in the README? I'd be happy to help out to make them compatible. I don't see anything about contributing on the xz-java github page. What are the best practices for contributing to this project? > However, Java in general is slower. Some compressors have a Java API but > the performance-critical code is native code. For example, java.util.zip > calls into native code from zlib. XZ for Java doesn't use any native > code (for now at least). That's good to know. It seems like if it's using native code it should be possible to get the speeds pretty similar. It sounds like an interesting problem to tackle. > XZ for Java lacks threading still. Implementing it is among the most > important tasks in XZ for Java. It helps with big files like your test > file but makes compressed file a little bigger. From your numbers I'm > not certain if you used xz in threaded mode or not. The time difference > looks unusually high for single-threaded mode for both compression and > decompression. The difference for a big input file in threaded mode > looks small though (unless it had lots of trivially-compressible > sections). > XZ Utils 5.6.0 also enables threaded mode by default. That would explain a lot of the speed difference I noticed then. I was using the latest code from master, so I should have been running in threaded mode. It's great how large an improvement that can make, although I know it's always complicated to implement threading without any bugs. > The encoder implementations have some minor differences which affects > both output and speed. Different releases can in theory have different > output. XZ Utils output might change in future versions too. I see, that makes sense. I'm glad the difference is explainable and not a bug. Can you explain exactly what the differences are? Does xz-java always do a better job compressing since it resulted in a smaller file?
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
On 2024-03-05 Dennis Ens wrote: > > The XZ for Java development is becoming active again but it may > > still take a while until the next stable release is out. A few > > other things are waiting in the queue from the past three years. > > Ah, I see. Thank you for the answer. Do you have a timeline of when > the changes are expected? I hope 1.10 could be done in a month or two but I don't want to make any promises or serious predictions. Historically those haven't been accurate at all. > First, xz-java seems much slower. I tested compressing and > decompressing a ~1.2 gigabyte file, and xz-java took 17m32.345s > compared to xz's 7m7.615s to compress. Decompressing was 0m21.760s to > 0m6.223s. Is there anything that can be done to improve the speed of > the Java version, or is c just a much more efficient programming > language? Brett Okken's patches (originally from early 2021) should improve compression speed. They are currently under review. Those are one of the things to get into the next stable release. However, Java in general is slower. Some compressors have a Java API but the performance-critical code is native code. For example, java.util.zip calls into native code from zlib. XZ for Java doesn't use any native code (for now at least). XZ for Java lacks threading still. Implementing it is among the most important tasks in XZ for Java. It helps with big files like your test file but makes compressed file a little bigger. From your numbers I'm not certain if you used xz in threaded mode or not. The time difference looks unusually high for single-threaded mode for both compression and decompression. The difference for a big input file in threaded mode looks small though (unless it had lots of trivially-compressible sections). In single-threaded mode, I would expect compressing with xz to take around 30-40 % less time than XZ for Java but your numbers show 60 % time reduction. XZ Utils 5.6.0 added x86-64 assembly (GCC & Clang only) which reduces per-thread decompression time by 20-40 % depending on the file and the computer. So that increases the difference between XZ Utils and XZ for Java too: decompression time can be roughly 50 % less with XZ Utils 5.6.0 in single-threaded mode on x86-64 compared to XZ for Java. XZ Utils 5.6.0 also enables threaded mode by default. > Also, I noticed that the results of compressing the files were > different sizes. They both worked, so I don't know if it's an issue, > but it does seem strange. The xz-java one was slightly smaller than > the xz one. The encoder implementations have some minor differences which affects both output and speed. Different releases can in theory have different output. XZ Utils output might change in future versions too. -- Lasse Collin
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
> > The files specifically were good-1-arm64-lzma2-1.xz and > > good-1-arm64-lzma2-2.xz and good-1-riscv-lzma2-1.xz and > > good-1-riscv-lzma2-2.xz. These did seem to work fine when I tried > > with xz, but not with xz-java. Do you think there might be a fix > > available for this soon? > > > XZ for Java 1.9 doesn't have ARM64 or RISC-V filter. The master branch > has ARM64 filter. RISC-V filter will likely be there this week. > > The XZ for Java development is becoming active again but it may still > take a while until the next stable release is out. A few other things > are waiting in the queue from the past three years. Ah, I see. Thank you for the answer. Do you have a timeline of when the changes are expected? I started to use xz, and was able to decompress the files without issue. Messing around with xz and xz-java, I noticed a few other things though. First, xz-java seems much slower. I tested compressing and decompressing a ~1.2 gigabyte file, and xz-java took 17m32.345s compared to xz's 7m7.615s to compress. Decompressing was 0m21.760s to 0m6.223s. Is there anything that can be done to improve the speed of the Java version, or is c just a much more efficient programming language? Also, I noticed that the results of compressing the files were different sizes. They both worked, so I don't know if it's an issue, but it does seem strange. The xz-java one was slightly smaller than the xz one. Thanks again for the help. -- Dennis Ens
Re: [xz-devel] [BUG] Issue with xz-java: Unknown Filter ID
On 2024-03-05 Dennis Ens wrote: > The files specifically were good-1-arm64-lzma2-1.xz and > good-1-arm64-lzma2-2.xz and good-1-riscv-lzma2-1.xz and > good-1-riscv-lzma2-2.xz. These did seem to work fine when I tried > with xz, but not with xz-java. Do you think there might be a fix > available for this soon? XZ for Java 1.9 doesn't have ARM64 or RISC-V filter. The master branch has ARM64 filter. RISC-V filter will likely be there this week. The XZ for Java development is becoming active again but it may still take a while until the next stable release is out. A few other things are waiting in the queue from the past three years. -- Lasse Collin