On 2022-03-07 01:08:40 [+0200], Lasse Collin wrote: > Hello! Hi, > I committed something. The liblzma part shouldn't need any big changes, > I hope. There are a few FIXMEs but some of them might actually be fine > as is. The xz side is just an initial commit, there isn't even > --memlimit-threading option yet (I will add it). > > Testing is welcome. It would be nice if someone who has 12-24 hardware > threads could test if it scales well. One needs a file with like a > hundred blocks, so with the default xz -6 that means a 2.5 gigabyte > uncompressed file, smaller if one uses, for example, --block-size=8MiB > when compressing.
I made Stream Blocks CompOffset UncompOffset CompSize UncompSize Ratio Check Padding 1 777 0 0 2.386.777.028 19.540.326.400 0,122 CRC64 0 one block is 25.165.824. 32 cores: | $ time ./src/xz/xz -tv tars.tar.xz -T0 | tars.tar.xz (1/1) | 100 % 2.276,2 MiB / 18,2 GiB = 0,122 1,6 GiB/s 0:11 | | real 0m11,162s | user 5m44,108s | sys 0m1,988s 256 cores: | $ time ./src/xz/xz -tv tars.tar.xz -T0 | tars.tar.xz (1/1) | 100 % 2.276,2 MiB / 18,2 GiB = 0,122 3,4 GiB/s 0:05 | | real 0m5,403s | user 4m0,298s | sys 0m24,315s it appears to work :) If I see this right, then the file is too small or xz too fast but it does not appear that xz manages to create more than 100 threads. and decompression to disk | $ time ~bigeasy/xz/src/xz/xz -dvk tars.tar.xz -T0 | tars.tar.xz (1/1) | 100 % 2.276,2 MiB / 18,2 GiB = 0,122 746 MiB/s 0:24 | | real 0m25,064s | user 3m49,175s | sys 0m29,748s appears to block at around 10 to 14 threads or so and then it hangs at the end until disk I/O finishes. Decent. Assuming disk I/O is slow, say 10MiB/s, and we would 388 CPUs (blocks/2) then it would decompress the whole file into memory and stuck on disk I/O? In terms of scaling, xz -tv of that same file with with -T1…64: | CPUS: 1 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 85 MiB/s, 3:38 | | real 3m38,047s | user 3m37,404s | sys 0m0,626s | CPUS: 2 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 171 MiB/s, 1:49 | | real 1m49,296s | user 3m41,529s | sys 0m1,433s | CPUS: 3 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 256 MiB/s, 1:12 | | real 1m12,832s | user 3m40,929s | sys 0m1,199s | CPUS: 4 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 341 MiB/s, 0:54 | | real 0m54,616s | user 3m40,596s | sys 0m1,161s | CPUS: 5 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 425 MiB/s, 0:43 | | real 0m43,900s | user 3m41,306s | sys 0m1,038s | CPUS: 6 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 510 MiB/s, 0:36 | | real 0m36,587s | user 3m41,527s | sys 0m1,076s | CPUS: 7 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 591 MiB/s, 0:31 | | real 0m31,568s | user 3m41,559s | sys 0m1,079s | CPUS: 8 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 676 MiB/s, 0:27 | | real 0m27,579s | user 3m42,098s | sys 0m0,966s | CPUS: 9 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 758 MiB/s, 0:24 | | real 0m24,614s | user 3m42,318s | sys 0m1,119s | CPUS: 10 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 844 MiB/s, 0:22 | | real 0m22,111s | user 3m41,353s | sys 0m1,152s | CPUS: 11 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 923 MiB/s, 0:20 | | real 0m20,219s | user 3m43,327s | sys 0m1,311s | CPUS: 12 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,0 GiB/s, 0:18 | | real 0m18,442s | user 3m41,710s | sys 0m1,110s | CPUS: 13 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,1 GiB/s, 0:17 | | real 0m17,067s | user 3m42,102s | sys 0m1,176s | CPUS: 14 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,1 GiB/s, 0:15 | | real 0m15,861s | user 3m41,978s | sys 0m1,171s | CPUS: 15 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,2 GiB/s, 0:14 | | real 0m14,866s | user 3m42,247s | sys 0m1,108s | CPUS: 16 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,3 GiB/s, 0:13 | | real 0m13,936s | user 3m41,086s | sys 0m1,017s | CPUS: 17 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,4 GiB/s, 0:13 | | real 0m13,200s | user 3m42,171s | sys 0m1,137s | CPUS: 18 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,5 GiB/s, 0:12 | | real 0m12,539s | user 3m43,286s | sys 0m1,355s | CPUS: 19 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,5 GiB/s, 0:11 | | real 0m11,949s | user 3m44,354s | sys 0m1,111s | CPUS: 20 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,6 GiB/s, 0:11 | | real 0m11,216s | user 3m42,635s | sys 0m1,202s | CPUS: 21 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,7 GiB/s, 0:10 | | real 0m10,655s | user 3m41,742s | sys 0m1,123s | CPUS: 22 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,8 GiB/s, 0:10 | | real 0m10,232s | user 3m42,328s | sys 0m1,211s | CPUS: 23 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,9 GiB/s, 0:09 | | real 0m9,812s | user 3m42,091s | sys 0m0,935s | CPUS: 24 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 1,9 GiB/s, 0:09 | | real 0m9,448s | user 3m42,343s | sys 0m1,220s | CPUS: 25 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,0 GiB/s, 0:09 | | real 0m9,099s | user 3m42,985s | sys 0m1,226s | CPUS: 26 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,1 GiB/s, 0:08 | | real 0m8,750s | user 3m43,389s | sys 0m1,401s | CPUS: 27 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,2 GiB/s, 0:08 | | real 0m8,444s | user 3m43,105s | sys 0m1,245s | CPUS: 28 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,3 GiB/s, 0:08 | | real 0m8,119s | user 3m43,075s | sys 0m1,103s | CPUS: 29 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,3 GiB/s, 0:07 | | real 0m7,850s | user 3m43,279s | sys 0m1,202s | CPUS: 30 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,4 GiB/s, 0:07 | | real 0m7,601s | user 3m43,112s | sys 0m1,043s | CPUS: 31 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,5 GiB/s, 0:07 | | real 0m7,381s | user 3m43,070s | sys 0m1,354s | CPUS: 32 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,5 GiB/s, 0:07 | | real 0m7,241s | user 3m44,362s | sys 0m1,247s | CPUS: 33 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,6 GiB/s, 0:06 | | real 0m7,027s | user 3m44,586s | sys 0m1,152s | CPUS: 34 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,7 GiB/s, 0:06 | | real 0m6,822s | user 3m44,385s | sys 0m1,475s | CPUS: 35 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,8 GiB/s, 0:06 | | real 0m6,637s | user 3m44,306s | sys 0m1,263s | CPUS: 36 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,8 GiB/s, 0:06 | | real 0m6,479s | user 3m45,268s | sys 0m0,991s | CPUS: 37 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 2,9 GiB/s, 0:06 | | real 0m6,336s | user 3m45,405s | sys 0m1,175s | CPUS: 38 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,0 GiB/s, 0:06 | | real 0m6,183s | user 3m45,455s | sys 0m1,153s | CPUS: 39 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,0 GiB/s, 0:05 | | real 0m6,021s | user 3m45,547s | sys 0m1,331s | CPUS: 40 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,1 GiB/s, 0:05 | | real 0m5,902s | user 3m45,937s | sys 0m1,224s | CPUS: 41 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,2 GiB/s, 0:05 | | real 0m5,772s | user 3m46,520s | sys 0m1,261s | CPUS: 42 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,2 GiB/s, 0:05 | | real 0m5,650s | user 3m46,616s | sys 0m1,276s | CPUS: 43 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,3 GiB/s, 0:05 | | real 0m5,545s | user 3m46,671s | sys 0m1,474s | CPUS: 44 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,4 GiB/s, 0:05 | | real 0m5,429s | user 3m46,988s | sys 0m1,264s | CPUS: 45 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,4 GiB/s, 0:05 | | real 0m5,338s | user 3m46,985s | sys 0m1,598s | CPUS: 46 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,5 GiB/s, 0:05 | | real 0m5,248s | user 3m47,202s | sys 0m1,724s | CPUS: 47 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,6 GiB/s, 0:05 | | real 0m5,138s | user 3m47,641s | sys 0m1,339s | CPUS: 48 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,6 GiB/s, 0:05 | | real 0m5,054s | user 3m48,088s | sys 0m1,335s | CPUS: 49 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,7 GiB/s, 0:04 | | real 0m4,981s | user 3m48,815s | sys 0m1,397s | CPUS: 50 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,8 GiB/s, 0:04 | | real 0m4,890s | user 3m48,999s | sys 0m1,601s | CPUS: 51 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,8 GiB/s, 0:04 | | real 0m4,786s | user 3m48,623s | sys 0m1,382s | CPUS: 52 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,9 GiB/s, 0:04 | | real 0m4,720s | user 3m49,048s | sys 0m1,555s | CPUS: 53 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 3,9 GiB/s, 0:04 | | real 0m4,658s | user 3m49,990s | sys 0m1,712s | CPUS: 54 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,0 GiB/s, 0:04 | | real 0m4,603s | user 3m52,079s | sys 0m1,757s | CPUS: 55 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,1 GiB/s, 0:04 | | real 0m4,485s | user 3m50,508s | sys 0m1,509s | CPUS: 56 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,1 GiB/s, 0:04 | | real 0m4,444s | user 3m51,148s | sys 0m1,764s | CPUS: 57 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,2 GiB/s, 0:04 | | real 0m4,381s | user 3m51,783s | sys 0m1,816s | CPUS: 58 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,3 GiB/s, 0:04 | | real 0m4,306s | user 3m51,901s | sys 0m1,671s | CPUS: 59 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,3 GiB/s, 0:04 | | real 0m4,250s | user 3m51,997s | sys 0m1,809s | CPUS: 60 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,4 GiB/s, 0:04 | | real 0m4,199s | user 3m52,443s | sys 0m1,889s | CPUS: 61 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,4 GiB/s, 0:04 | | real 0m4,168s | user 3m53,326s | sys 0m1,906s | CPUS: 62 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,5 GiB/s, 0:04 | | real 0m4,114s | user 3m52,766s | sys 0m2,308s | CPUS: 63 | tars.tar.xz: 2.276,2 MiB / 18,2 GiB = 0,122, 4,5 GiB/s, 0:04 | | real 0m4,074s | user 3m53,676s | sys 0m2,001s | CPUS: 64 | tars.tar.xz: 2.272,9 MiB / 18,2 GiB = 0,122, 4,6 GiB/s, 0:03 | | real 0m4,023s | user 3m53,527s | sys 0m1,899s time of 1 CPU / 64 = (3 * 60 + 38) / 64 = 3.40625 Looks okay. > If the input is broken, it should produce as much output as the > single-threaded stable version does. That is, if one thread detects an > error, the data before that point is first flushed out before the error > is reported. This has pros and cons. It would be easy to add a flag to > allow switching to fast error reporting for applications that don't > care about partial output from broken files. I guess most of them don't care because an error is usually an abort, the sooner, the better. It is probably the exception that you want decompress it despite the error and maybe go on with the next block and see what is left. Sebastian