Hi list,
you are probably all aware of how long it takes to extract all data
from a pp'd binary or a .par file for large applications.
For a sample .par which contains a couple of modules for testing, I did
a profiling run, and it turns out that a lot of the time (20%) is spent
in a single accessor in Archive::Zip, namely
Archive::Zip::Member::fileName. (There are over 100k calls to it in my
example package. Real-world use might end up at five times that number.)
Now, replacing the use of that accessor with a bare hash access in a
single place in PAR.pm reduces this to about three thousand calls to
the accessor. On my rather fast machine, the extraction process runs
about 0.3 seconds faster (out of 1.2 seconds total script run-time,
which includes loading all those modules). I'd expect the extraction to
make up about 0.8-0.9 seconds of the total run-time, so that's
definitely a noticeable speed-up. A simple-minded micro-benchmark shows
that direct hash access is about three times faster than calling the
method.
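For anyone who wants to reproduce that kind of comparison, a micro-benchmark along these lines should do. This is only a sketch: Toy::Member is a made-up stand-in for Archive::Zip::Member (a blessed hash with a combined getter/setter), not the real class, so the ratio you see may differ from the real thing.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for Archive::Zip::Member: a blessed hash with a
# combined getter/setter. This is NOT the real class.
{
    package Toy::Member;

    sub new {
        my ($class, $name) = @_;
        return bless { fileName => $name }, $class;
    }

    sub fileName {
        my ($self, $new_name) = @_;
        $self->{fileName} = $new_name if defined $new_name;
        return $self->{fileName};
    }
}

my $member = Toy::Member->new('lib/Foo/Bar.pm');

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    method_call => sub { my $name = $member->fileName() },
    hash_access => sub { my $name = $member->{fileName} },
});
```

The absolute numbers depend on the machine and perl version; only the relative rate in the printed table is of interest.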
I'm aware that this is breaking encapsulation. This is bad. But it's
also a *huge* gain! Would you consider it feasible to do such a hack if
it helps this much?
The code I'm referring to is in PAR.pm's _first_member function:
    my %names = map { ...->fileName... } $zip->members;
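To make the proposed change concrete, here is a rough sketch of the two variants side by side. It assumes the map builds a name-to-member lookup, which is my guess at the shape of the elided body, and it uses made-up Toy::Zip and Toy::Member classes instead of the real Archive::Zip ones. The hash key 'fileName' is likewise an assumption about the member object's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the archive and member classes.
{
    package Toy::Member;
    sub new      { my ($class, $name) = @_; return bless { fileName => $name }, $class }
    sub fileName { return $_[0]{fileName} }

    package Toy::Zip;
    sub new {
        my ($class, @names) = @_;
        return bless { members => [ map { Toy::Member->new($_) } @names ] }, $class;
    }
    sub members { return @{ $_[0]{members} } }
}

my $zip = Toy::Zip->new('lib/Foo.pm', 'lib/Bar.pm', 'script/main.pl');

# Variant using the accessor: one method call per member.
my %by_method = map { ( $_->fileName(), $_ ) } $zip->members;

# Variant using direct hash access: no method call, but it relies on the
# member object's internal layout (assumed key: 'fileName').
my %by_hash = map { ( $_->{fileName}, $_ ) } $zip->members;

print join(', ', sort keys %by_hash), "\n";
```

Both hashes end up with the same keys; the only difference is whether each name is fetched through a method call or read straight out of the object.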
An alternative would be to convince the A::Zip maintainer to provide a
$zip->member_names() function which breaks encapsulation of
A::Zip::Member from within the Archive::Zip distribution. The problems
are that a) A::Zip is currently very strict about encapsulation and I
don't want to be the one to change that, and b) the author has been away
for some time and A::Zip is currently community-maintained.
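What I have in mind for such a method is roughly the following. Again a sketch only: member_names() is the hypothetical method proposed above, and Toy::Zip and Toy::Member are invented stand-ins, so none of this reflects how Archive::Zip is actually laid out internally.

```perl
use strict;
use warnings;

{
    package Toy::Member;
    sub new { my ($class, $name) = @_; return bless { fileName => $name }, $class }

    package Toy::Zip;
    sub new {
        my ($class, @names) = @_;
        return bless { members => [ map { Toy::Member->new($_) } @names ] }, $class;
    }

    # Hypothetical method: returns all member names in archive order by
    # reading the members' hashes directly. The shortcut stays inside the
    # distribution, so callers never touch the internals themselves.
    sub member_names {
        my $self = shift;
        return map { $_->{fileName} } @{ $self->{members} };
    }
}

my $zip   = Toy::Zip->new('lib/Foo.pm', 'lib/Bar.pm');
my @names = $zip->member_names;
print "$_\n" for @names;
```

The point of putting it there is that the knowledge of the internal layout would live next to the class it belongs to, so PAR.pm would not have to depend on it.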
What do you think?
Steffen