Bug#1067440: Compression makes searching packages very slow

2024-03-22 Thread Laurențiu Nicola
Thanks for the quick fix, I can confirm it's much faster now:

# apt 2.7.13, trixie
$ time apt search librust-
real0m30.185s
user0m28.286s
sys 0m1.729s

# apt 2.7.14, trixie
$ time apt search librust-
real0m0.640s
user0m0.490s
sys 0m0.035s

And sorry for the empty subject, it was my first time using bugs.debian.org. It 
told me to add comments by sending an email to the bug address and I didn't 
know whether to copy-paste the original subject. It's not like I can manually 
set In-Reply-To and References from my email client, so it would have been 
broken anyway.



Bug#1067440: Compression makes searching packages very slow

2024-03-21 Thread Julian Andres Klode
On Thu, Mar 21, 2024 at 09:25:47PM +0100, Julian Andres Klode wrote:
> On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote:
> > Package: apt
> > Version: 2.7.12
> > 
> > I noticed that searching for packages is very slow if the package lists are 
> > compressed. To reproduce, remove `/var/lib/apt/lists`, enable
> > 
> > Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";
> > 
> > , run `apt update`. This enables LZ4 compression on my systems, but I don't 
> > think the exact method matters. You can then run `apt search librust`, 
> > which takes about 19 seconds in a Debian 12 container (docker.io/debian:12 
> > has compression already set up), compared to 0.4 seconds without 
> > compression.
> > 
> > Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't 
> > matter too much.
> > 
> > I tried to look into it, and `strace -e trace=openat apt-cache search 
> > librust` shows it reopen and re-read one of the package lists:
> > 
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+default-dev - Cross-platform symbolication library - 
> > feature "default"
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+object-dev - Cross-platform symbolication library - 
> > feature "object"
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library 
> > - feature "rustc-demangle"
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+std-dev - Cross-platform symbolication library - feature 
> > "std"
> > 
> > (you can use -e trace=openat,read to confirm that it's actually reading the 
> > file)
> > 
> > I believe it's quadratic in the number of search results, and this is 
> > related to the pseudo-indexing mechanism used by APT (see 
> > `pkgRecords::Lookup` in apt-pkg). Each lookup call will have to decompress 
> > the file in order to seek to the destination.
> > 
> > Unfortunately, I suspect this isn't exactly an easy fix, given the current 
> > design.
> > 
> 
> Going to respond to this but also including responses to your followup email
> which has a broken Subject:
> 
> 
> Searching works by ordering the packages based on file, offset
> and then iterating over them and looking them up. Seeking forward
> to a higher offset does not involve a reopen, we just skip content
> in betwene.
> 
> Full-text search is inside the description in the section parsed
> for each package.
> 
> It's not clear why this fails on bookworm - I can reproduce that -
> t certainly is fine in git main on my Ubuntu 24.04 system.

This should be fixed in

https://salsa.debian.org/apt-team/apt/-/merge_requests/336

What happened was that we got called for the same offset twice, e.g.
let's say 42.

Now our section size is let's say 7.

So what happened is that we did:

- Jump to 42  =>  offset = 42
- Parse   =>  offset = 49
- Jump to 42  => ugh gotta go back

Luckily we still have the currently used section in the Buffer; i.e.
we only looked at bytes 49... but our buffer still starts
at 42, so we can just look back to find our section again.

Sadly we can't avoid the reparse of the section here, because the
pkgTagSection passed to pkgTagFile::Jump() could be a different one
than the last one.

I have a patch to avoid the reparse by storing offset a level higher
where the pkgTagSection is fixed but I'm in favour of not shipping that
- those would only be marginal gains and we want to exercise this code
path.
-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en



Bug#1067440: Compression makes searching packages very slow

2024-03-21 Thread Julian Andres Klode
On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote:
> Package: apt
> Version: 2.7.12
> 
> I noticed that searching for packages is very slow if the package lists are 
> compressed. To reproduce, remove `/var/lib/apt/lists`, enable
> 
> Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";
> 
> , run `apt update`. This enables LZ4 compression on my systems, but I don't 
> think the exact method matters. You can then run `apt search librust`, which 
> takes about 19 seconds in a Debian 12 container (docker.io/debian:12 has 
> compression already set up), compared to 0.4 seconds without compression.
> 
> Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't 
> matter too much.
> 
> I tried to look into it, and `strace -e trace=openat apt-cache search 
> librust` shows it reopen and re-read one of the package lists:
> 
> openat(AT_FDCWD, 
> "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
>  O_RDONLY) = 16
> librust-addr2line+default-dev - Cross-platform symbolication library - 
> feature "default"
> openat(AT_FDCWD, 
> "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
>  O_RDONLY) = 16
> librust-addr2line+object-dev - Cross-platform symbolication library - feature 
> "object"
> openat(AT_FDCWD, 
> "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
>  O_RDONLY) = 16
> librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - 
> feature "rustc-demangle"
> openat(AT_FDCWD, 
> "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
>  O_RDONLY) = 16
> librust-addr2line+std-dev - Cross-platform symbolication library - feature 
> "std"
> 
> (you can use -e trace=openat,read to confirm that it's actually reading the 
> file)
> 
> I believe it's quadratic in the number of search results, and this is related 
> to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` in 
> apt-pkg). Each lookup call will have to decompress the file in order to seek 
> to the destination.
> 
> Unfortunately, I suspect this isn't exactly an easy fix, given the current 
> design.
> 

Going to respond to this but also including responses to your followup email
which has a broken Subject:


Searching works by ordering the packages based on file, offset
and then iterating over them and looking them up. Seeking forward
to a higher offset does not involve a reopen, we just skip content
in betwene.

Full-text search is inside the description in the section parsed
for each package.

It's not clear why this fails on bookworm - I can reproduce that -
t certainly is fine in git main on my Ubuntu 24.04 system.


-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer  i speak de, en



Bug#1067440: Compression makes searching packages very slow

2024-03-21 Thread Laurențiu Nicola
Package: apt
Version: 2.7.12

I noticed that searching for packages is very slow if the package lists are 
compressed. To reproduce, remove `/var/lib/apt/lists`, enable

Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";

, run `apt update`. This enables LZ4 compression on my systems, but I don't 
think the exact method matters. You can then run `apt search librust`, which 
takes about 19 seconds in a Debian 12 container (docker.io/debian:12 has 
compression already set up), compared to 0.4 seconds without compression.

Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't 
matter too much.

I tried to look into it, and `strace -e trace=openat apt-cache search librust` 
shows it reopen and re-read one of the package lists:

openat(AT_FDCWD, 
"/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
 O_RDONLY) = 16
librust-addr2line+default-dev - Cross-platform symbolication library - feature 
"default"
openat(AT_FDCWD, 
"/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
 O_RDONLY) = 16
librust-addr2line+object-dev - Cross-platform symbolication library - feature 
"object"
openat(AT_FDCWD, 
"/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
 O_RDONLY) = 16
librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - 
feature "rustc-demangle"
openat(AT_FDCWD, 
"/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
 O_RDONLY) = 16
librust-addr2line+std-dev - Cross-platform symbolication library - feature "std"

(you can use -e trace=openat,read to confirm that it's actually reading the 
file)

I believe it's quadratic in the number of search results, and this is related 
to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` in 
apt-pkg). Each lookup call will have to decompress the file in order to seek to 
the destination.

Unfortunately, I suspect this isn't exactly an easy fix, given the current 
design.