Steffen Mueller wrote:
Hi list,
you are probably all aware of the long time it takes to extract all data
from a pp'd binary or a .par file for large applications.
For a sample .par which contains a couple of modules for testing, I did
a profiling run and it turns out that a lot (20%) of the time is spent
in an accessor in Archive::Zip, namely Archive::Zip::Member::fileName.
(There are over 100k calls to that in my example package. Real world use
might end up five times that number.)
Now, replacing the use of that accessor with a bare hash access in a
single place in PAR.pm results in a reduction to about three thousand
calls to that accessor. The extraction process runs, on my rather fast
machine, about 0.3 seconds faster (out of 1.2 seconds of script run-time, which
includes loading all those modules). I'd expect the extraction to make
up about 0.8-0.9 seconds of the total run-time. That's definitely a
noticeable speed-up. A simple-minded micro-benchmark shows that direct
hash access is about three times faster than calling the method.
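For anyone who wants to try this kind of micro-benchmark themselves, here is a minimal sketch using the core Benchmark module. My::Member is a hypothetical stand-in class and not the real Archive::Zip::Member; it only assumes the name is stored under a fileName hash key. The ratio you see will depend on your perl and your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for a zip member object (NOT Archive::Zip::Member).
# Assumption: the file name lives under the 'fileName' key of a blessed hash.
package My::Member;
sub new      { my ($class, %args) = @_; return bless { fileName => $args{fileName} }, $class }
sub fileName { return $_[0]->{fileName} }    # plain read accessor

package main;

my $member = My::Member->new(fileName => 'lib/Foo/Bar.pm');

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    method => sub { my $name = $member->fileName },      # goes through the accessor
    hash   => sub { my $name = $member->{fileName} },    # bypasses the accessor
});
```

The table printed by cmpthese shows the relative rates of the two variants; it measures call overhead only, not the real extraction.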
I'm aware that this is breaking encapsulation. This is bad. But it's
also a *huge* gain! Would you consider it feasible to do such a hack if
it helps this much?
The code I'm referring to is in PAR.pm's _first_member function:
    my %names = map { ...->fileName... } $zip->members;
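To make the two variants concrete, here is a rough sketch of building such a name lookup both ways. This is not the actual PAR.pm code (the elided parts above stay elided); My::Zip and My::Member are hypothetical stand-ins, and the fileName hash key is an assumption about the object's internal layout.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT the Archive::Zip classes.
package My::Member;
sub new      { my ($class, $name) = @_; return bless { fileName => $name }, $class }
sub fileName { return $_[0]->{fileName} }

package My::Zip;
sub new {
    my ($class, @names) = @_;
    return bless { members => [ map { My::Member->new($_) } @names ] }, $class;
}
sub members { return @{ $_[0]->{members} } }

package main;

my $zip = My::Zip->new('lib/Foo.pm', 'lib/Bar.pm', 'script/main.pl');

# Encapsulated variant: one accessor call per archive member.
my %by_method = map { ( $_->fileName => $_ ) } $zip->members;

# Direct hash access: no method call, but it depends on the
# member object's internals and breaks if they ever change.
my %by_hash = map { ( $_->{fileName} => $_ ) } $zip->members;

printf "%d names via accessor, %d via hash access\n",
    scalar keys %by_method, scalar keys %by_hash;
```

Both maps end up with the same keys; the only difference is whether each lookup pays for a method dispatch.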
An alternative would be to convince the A::Zip maintainer to provide a
$zip->member_names() function which breaks encapsulation of
A::Zip::Member from within the Archive::Zip distribution. The problems
are that a) A::Zip is currently very strict about this and I don't want
to be the one to change that and b) the author has been away for some
time and A::Zip is currently community maintained.
What do you think?
Steffen
This might sound like passing the buck, but processors are already so
fast, and are due to get faster. The extra speed would help on old
machines, but I am not sure a saved half second or so would be much
appreciated during a pp'd application's initial startup.
I suppose it could be argued that if there were a pp'd application that
an end user had to execute and exit, repeatedly, then there would be a
real difference. However, that is a contrived example. If that were the
need, I imagine the original programmer would simply design the program
to stay running, and have a "run again" button, or some sort of similar
way to keep the application alive.
The real goal of any software is to solve a problem for the end user.
In the long run, maintainability, and especially ease of adding onto
existing code (extensibility) will do more for the end user. It also
lets programmers whose skills only go so far (like me) do what we can,
such as debugging other people's (modular) code.
Just an opinion.