Bug#1067440: Compression makes searching packages very slow

2024-03-22 Thread Laurențiu Nicola
Thanks for the quick fix, I can confirm it's much faster now:

# apt 2.7.13, trixie
$ time apt search librust-
real    0m30.185s
user    0m28.286s
sys     0m1.729s

# apt 2.7.14, trixie
$ time apt search librust-
real    0m0.640s
user    0m0.490s
sys     0m0.035s

And sorry for the empty subject; it was my first time using bugs.debian.org. It 
told me to add comments by sending an email to the bug address, and I didn't 
know whether to copy-paste the original subject. It's not like I can manually 
set In-Reply-To and References from my email client, so the threading would 
have been broken anyway.



Bug#1067440:

2024-03-21 Thread Laurențiu Nicola
Correction: because of full-text search, it might actually be quadratic in the 
number of packages (I didn't check). It might also be possible to fix it by 
going through the compressed stream just once instead of restarting it for 
every result (assuming the results are returned in file order, which seems 
reasonable).
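A minimal sketch of that single-pass idea, assuming the wanted records are identified by byte offsets into the decompressed Packages file and that stanzas are separated by blank lines. gzip from Python's standard library stands in for LZ4, and `read_records_single_pass` is a hypothetical helper, not an apt-pkg API:

```python
import gzip
import io


def read_records_single_pass(compressed: bytes, offsets: list[int]) -> list[bytes]:
    """Fetch the stanza starting at each decompressed offset in one forward pass.

    Hypothetical helper: every entry in `offsets` must be the first byte of a
    stanza, otherwise a KeyError is raised.
    """
    wanted = set(offsets)
    found: dict[int, bytes] = {}
    pos = 0      # decompressed offset just past the last line read
    start = 0    # decompressed offset where the current stanza began
    lines: list[bytes] = []
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        for line in stream:  # the stream is decompressed once, front to back
            pos += len(line)
            if line.strip():
                lines.append(line)
                continue
            # Blank line: the current stanza (if any) is complete.
            if lines and start in wanted:
                found[start] = b"".join(lines)
            lines = []
            start = pos
            if len(found) == len(wanted):
                break  # every requested record seen; skip the rest of the file
        else:
            # The last stanza may not be followed by a blank line.
            if lines and start in wanted:
                found[start] = b"".join(lines)
    return [found[off] for off in offsets]
```

Because matches are collected into a dictionary, the requested offsets don't even need to be in file order; what matters is that the stream is never reopened, so the cost is one decompression of the list instead of one per result.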

Bug#1067440: Compression makes searching packages very slow

2024-03-21 Thread Laurențiu Nicola
Package: apt
Version: 2.7.12

I noticed that searching for packages is very slow if the package lists are 
compressed. To reproduce, remove `/var/lib/apt/lists`, enable

Acquire::GzipIndexes "true";
Acquire::CompressionTypes::Order:: "gz";

and run `apt update`. This enables LZ4 compression on my systems, but I don't 
think the exact compression method matters. You can then run `apt search 
librust`, which takes about 19 seconds in a Debian 12 container 
(docker.io/debian:12 has compression already set up), compared to 0.4 seconds 
without compression.

Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't 
matter too much.

I tried to look into it, and `strace -e trace=openat apt-cache search librust` 
shows it reopening and re-reading one of the package lists before each result:

openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+default-dev - Cross-platform symbolication library - feature "default"
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+object-dev - Cross-platform symbolication library - feature "object"
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - feature "rustc-demangle"
openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16
librust-addr2line+std-dev - Cross-platform symbolication library - feature "std"

(You can use `-e trace=openat,read` to confirm that it actually reads the file 
each time.)

I believe it's quadratic in the number of search results, and that this is 
related to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` 
in apt-pkg). Each lookup call has to seek to the record's position, and since a 
compressed stream can't be seeked directly, that means decompressing the file 
from the beginning up to that point.
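To illustrate why per-lookup seeking gets expensive, here is a small Python model, again using gzip as a stand-in for LZ4. `lookup_naive` and `naive_search_cost` are hypothetical: the lookup reopens the stream for every record, mirroring the repeated `openat` calls above, and relies on `GzipFile.seek`, which Python can only emulate by decompressing and discarding everything before the target offset. The returned count is a lower bound on the bytes decompressed, since the decompressor works in larger chunks:

```python
import gzip
import io


def lookup_naive(compressed: bytes, offset: int) -> tuple[bytes, int]:
    """Return (stanza at `offset`, lower bound on bytes decompressed to get it).

    Hypothetical model of a lookup that reopens the compressed list each time.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        # No random access into a compressed stream: seek() is emulated by
        # decompressing and discarding everything before `offset`.
        stream.seek(offset)
        lines = []
        for line in stream:
            if not line.strip():
                break  # end of this stanza
            lines.append(line)
    record = b"".join(lines)
    return record, offset + len(record)


def naive_search_cost(compressed: bytes, offsets: list[int]) -> int:
    """Total work when every result restarts decompression from the top."""
    return sum(lookup_naive(compressed, off)[1] for off in offsets)
```

With k matches spread through an n-byte list, the total is roughly k·n/2 bytes. When the matches are a roughly fixed fraction of the file (as with a broad query like `librust`), doubling the number of results roughly quadruples the work.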

Unfortunately, I suspect this isn't exactly an easy fix, given the current 
design.