Bug#452577: dpkg-dev: dpkg-shlibdeps no longer optimizes dpkg --search usage

2007-11-23 Thread Aaron M. Ucko
Raphael Hertzog [EMAIL PROTECTED] writes:

> Somehow I was thinking I had implemented a cache but I must have mixed it up
> with something else. Thanks for the check!

No problem.

> I did my best to make it understandable compared to the old code. :)

Well done.

> Applied. But it doesn't give a huge performance boost. On a run on
> kdemultimedia, it saves 8 seconds out of 3m12. I think we could save much

Thanks!  As for the performance gain, it really depends on how many
packages you have installed, and I suspect also on how much RAM you
have.  At any rate, it shouldn't hurt as long as it doesn't introduce
any new bugs, which I'm pretty sure I've avoided doing this time
around. ;-)

> But optimizing this part probably needs somewhat more care and is a bit
> less straightforward. I also wonder how much memory it would cost on big
> packages.

Right, I was pondering that as well, but decided to stick with the
straightforward optimization for now.

> But if you have some time to spend on it, I'll happily review a patch.

Good to know, but I don't anticipate having enough time anytime soon.

-- 
Aaron M. Ucko, KB1CJC (amu at alum.mit.edu, ucko at debian.org)
http://www.mit.edu/~amu/ | http://stuff.mit.edu/cgi/finger/[EMAIL PROTECTED]


2007-11-23 Thread Raphael Hertzog
On Fri, 23 Nov 2007, Aaron M. Ucko wrote:
> dpkg-shlibdeps's latest incarnation (as of 1.14.8 and its experimental
> predecessors) introduces a performance regression: it runs dpkg
> --search once per executable or library being examined, rather than
> caching its results in any fashion.  As each call requires scanning every
> package's contents AFAICT, the resulting procedure can take a LONG
> time on systems with many packages installed.  (It would be great if
> dpkg-query could itself run faster, but that's a separate issue.)

Somehow I was thinking I had implemented a cache but I must have mixed it up
with something else. Thanks for the check!

> (At any rate, the rewrite at least resulted in much clearer and more
> readily patched logic.)

I did my best to make it understandable compared to the old code. :)

> Could you please review and apply the attached patch (against 1.14.10)
> when you get a chance?

Applied. But it doesn't give a huge performance boost. On a run on
kdemultimedia, it saves 8 seconds out of 3m12. I think we could save much
more by caching the Dpkg::Shlibs::Objdump::Object objects created by the
line:
    my $id = $dumplibs_wo_symfile->parse($lib);

But optimizing this part probably needs somewhat more care and is a bit
less straightforward. I also wonder how much memory it would cost on big
packages.
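
A minimal sketch of the kind of per-library cache meant here, not a
patch against dpkg-shlibdeps. It assumes a hypothetical expensive_parse()
standing in for the real Dpkg::Shlibs::Objdump parsing step; the names
%parsed_cache and parse_cached are likewise illustrative only:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the expensive per-library parse. The real
# script would build a Dpkg::Shlibs::Objdump::Object here instead.
my $parse_calls = 0;
sub expensive_parse {
    my ($lib) = @_;
    $parse_calls++;
    return { file => $lib };    # placeholder for the parsed object
}

my %parsed_cache;    # library path => parsed object

# Return the cached object for $lib, parsing it only on first request.
sub parse_cached {
    my ($lib) = @_;
    $parsed_cache{$lib} = expensive_parse($lib)
        unless exists $parsed_cache{$lib};
    return $parsed_cache{$lib};
}

# The same library requested twice is parsed only once.
parse_cached($_)
    for qw(/usr/lib/libfoo.so.1 /usr/lib/libbar.so.2 /usr/lib/libfoo.so.1);
print "parsed $parse_calls libraries for 3 requests\n";
```

Every parsed object stays referenced from the hash until the script
exits, which is where the memory cost on big packages would come from.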

But if you have some time to spend on it, I'll happily review a patch.

Cheers,
-- 
Raphaël Hertzog

First French book on Debian GNU/Linux:
http://www.ouaza.com/livre/admin-debian/





2007-11-23 Thread Aaron M. Ucko
Package: dpkg-dev
Version: 1.14.8
Severity: normal
Tags: patch

dpkg-shlibdeps's latest incarnation (as of 1.14.8 and its experimental
predecessors) introduces a performance regression: it runs dpkg
--search once per executable or library being examined, rather than
caching its results in any fashion.  As each call requires scanning every
package's contents AFAICT, the resulting procedure can take a LONG
time on systems with many packages installed.  (It would be great if
dpkg-query could itself run faster, but that's a separate issue.)

I previously reported the same high-level problem as #421290, but I'm
opening a new bug because the underlying code base is so different.
As before, I have put together a patch that I believe DTRT.  This
time, I've even tested it against complicated cases such as libkcal2b,
to avoid a repeat of #425641, for which I do sincerely apologize.  (At
any rate, the rewrite at least resulted in much clearer and more
readily patched logic.)

Could you please review and apply the attached patch (against 1.14.10)
when you get a chance?

Thanks!

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.22
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
--- dpkg-shlibdeps.1.14.10  2007-11-23 13:47:46.0 -0500
+++ dpkg-shlibdeps.optimized  2007-11-23 13:53:20.0 -0500
@@ -483,9 +483,22 @@
     return undef;
 }
 
+my %cached_pkgmatch = ();
+
 sub find_packages {
-    my @files = (@_);
+    my @files;
     my $pkgmatch = {};
+
+    foreach (@_) {
+        if (exists $cached_pkgmatch{$_}) {
+            $pkgmatch->{$_} = $cached_pkgmatch{$_};
+        } else {
+            push @files, $_;
+            $cached_pkgmatch{$_} = []; # placeholder to cache misses too.
+        }
+    }
+    return $pkgmatch unless @files;
+
     my $pid = open(DPKG, "-|");
     syserr(_g("cannot fork for dpkg --search")) unless defined($pid);
     if (!$pid) {
@@ -503,7 +516,7 @@
             print(STDERR " $_\n")
                 || syserr(_g("write diversion info to stderr"));
         } elsif (m/^([^:]+): (\S+)$/) {
-            $pkgmatch->{$2} = [ split(/, /, $1) ];
+            $cached_pkgmatch{$2} = $pkgmatch->{$2} = [ split(/, /, $1) ];
         } else {
             warning(_g("unknown output from dpkg --search: '%s'"), $_);
         }