Steffen Mueller wrote:
Hi list,

You are probably all aware of how long it takes to extract all the data from a pp'd binary or a .par file for large applications.

For a sample .par containing a couple of modules for testing, I did a profiling run, and it turns out that a lot of the time (20%) is spent in a single accessor in Archive::Zip, namely Archive::Zip::Member::fileName. (There are over 100k calls to it for my example package; real-world use might be five times that number.)

Now, replacing the use of that accessor with a bare hash access in a single place in PAR.pm reduces the number of calls to that accessor to about three thousand. On my rather fast machine, the extraction process runs about 0.3 seconds faster, out of 1.2 seconds of total script run-time (which includes loading all those modules). I'd expect the extraction to make up about 0.8 to 0.9 seconds of the total run-time, so that's definitely a noticeable speed-up. A simple-minded micro-benchmark shows that direct hash access is about three times faster than calling the method.

I'm aware that this breaks encapsulation, which is bad. But it's also a *huge* gain! Would you consider such a hack acceptable if it helps this much?

The code I'm referring to is in PAR.pm's _first_member function:

    my %names = map { ...->fileName... } $zip->members;
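To illustrate the trade-off, here is a minimal, self-contained Perl sketch. It does not load PAR or Archive::Zip: Toy::Member is a hypothetical stand-in for Archive::Zip::Member, and it assumes the member's name is stored under a 'fileName' hash key, an internal detail of the real class. The helpers names_via_accessor and names_via_hash are likewise hypothetical; both build the same name-to-member lookup, once through the accessor and once by reading the hash slot directly, and the core Benchmark module's cmpthese compares them. Timings will vary with machine and Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for Archive::Zip::Member. ASSUMPTION: the name is
# kept under the 'fileName' key. Reading that key from outside the class is
# exactly what breaks encapsulation.
package Toy::Member;

sub new {
    my ( $class, %args ) = @_;
    return bless { fileName => $args{fileName} }, $class;
}

# Accessor in the style of fileName(): optional setter, then return the name.
sub fileName {
    my $self = shift;
    $self->{fileName} = shift if @_;
    return $self->{fileName};
}

package main;

# Build a name => member lookup through the public accessor (one method
# call per member).
sub names_via_accessor {
    return +{ map { ( $_->fileName() => $_ ) } @_ };
}

# Same lookup, reading the hash slot directly (the proposed hack).
sub names_via_hash {
    return +{ map { ( $_->{fileName} => $_ ) } @_ };
}

# An arbitrary number of fake members to make the difference measurable.
my @members =
  map { Toy::Member->new( fileName => "lib/Module$_.pm" ) } 1 .. 5000;

# -1 means "run each variant for at least 1 CPU second", then print a
# comparison table of the two rates.
cmpthese(
    -1,
    {
        accessor => sub { names_via_accessor(@members) },
        hash     => sub { names_via_hash(@members) },
    }
);
```

Both helpers return identical lookups; the only difference is whether each member is asked for its name through a method call or has its internals read directly, which is the encapsulation trade-off under discussion.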

An alternative would be to convince the A::Zip maintainer to provide a $zip->member_names() method, which would break the encapsulation of A::Zip::Member, but only from within the Archive::Zip distribution itself. The problems are that a) A::Zip is currently very strict about this, and I don't want to be the one to change that, and b) the author has been away for some time, so A::Zip is currently community-maintained.

What do you think?

Steffen

This might sound like passing the buck, but processors are very fast now and will only get faster. The extra speed would help on old machines, but I am not sure a saved half second or so would be much appreciated during a pp'd application's initial startup.

I suppose it could be argued that for a pp'd application that an end user has to start and exit repeatedly, there would be a real difference. However, that is a contrived example. If that were the need, I imagine the programmer would simply design the program to stay running, with a "run again" button or some similar way to keep the application alive.

The real goal of any software is to solve a problem for the end user. In the long run, maintainability, and especially the ease of adding to existing code (extensibility), will do more for the end user. It also lets programmers whose skills only go so far (like me) do what we can, such as debugging other people's (modular) code.

Just an opinion.