Hi list,

You are probably all aware of how long it takes to extract all the data from a pp'd binary or a .par file for large applications.

For a sample .par which contains a couple of modules for testing, I did a profiling run. It turns out that a lot of the time (20%) is spent in a single Archive::Zip accessor, namely Archive::Zip::Member::fileName. (There are over 100k calls to it in my example package. Real-world use might end up at five times that number.)

Replacing the use of that accessor with a bare hash access in a single place in PAR.pm cuts this to about three thousand calls. On my rather fast machine, the extraction then runs about 0.3 seconds faster, out of a total script run time of 1.2 seconds (which includes loading all those modules). I'd expect the extraction to make up about 0.8-0.9 seconds of that total, so that's definitely a noticeable speed-up. A simple-minded micro-benchmark shows that direct hash access is about three times faster than calling the method.
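
For anyone who wants to try this themselves, a micro-benchmark of this kind could look roughly like the sketch below. It is not the exact script I ran. My::Member is a hypothetical stand-in for Archive::Zip::Member with a plain read accessor, and the file name is made up. It only needs the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

{
    # Hypothetical stand-in for Archive::Zip::Member: a hash-based
    # object with a plain read accessor. This is NOT the real class.
    package My::Member;
    sub new      { my ($class, %args) = @_; return bless {%args}, $class }
    sub fileName { return $_[0]{fileName} }
}

my $member = My::Member->new(fileName => 'lib/Foo/Bar.pm');

# Run each variant a fixed number of times and print a comparison table.
cmpthese(1_000_000, {
    accessor => sub { my $name = $member->fileName },
    hash     => sub { my $name = $member->{fileName} },
});
```

The absolute numbers will vary by machine and perl version. The point is only the relative cost of a method call versus a hash lookup.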

I'm aware that this is breaking encapsulation. This is bad. But it's also a *huge* gain! Would you consider it feasible to do such a hack if it helps this much?

The code I'm referring to is in PAR.pm's _first_member function:

    my %names = map { ...->fileName... } $zip->members;
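
To illustrate the kind of change I mean, here is a minimal sketch with a stand-in class. It is not the actual PAR.pm code. My::Member and the file names are hypothetical, and the sketch assumes the member object keeps its name under a 'fileName' hash key. The stand-in counts accessor calls so the difference is visible.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for Archive::Zip::Member that counts how
    # often its accessor is called. This is NOT the real class.
    package My::Member;
    my $calls = 0;
    sub new      { my ($class, %args) = @_; return bless {%args}, $class }
    sub fileName { $calls++; return $_[0]{fileName} }
    sub calls    { return $calls }
}

my @members = map { My::Member->new(fileName => $_) }
    qw(lib/Foo.pm lib/Bar.pm script/main.pl);

# Encapsulated variant: one method call per member.
my %by_accessor = map { ( $_->fileName(), $_ ) } @members;

# Direct variant: reads the hash slot and skips the method call.
# Assumption: the name is stored under the 'fileName' key.
my %by_hash = map { ( $_->{fileName}, $_ ) } @members;

printf "accessor calls: %d\n", My::Member->calls;
```

Both hashes end up with the same keys. Only the first variant goes through the accessor.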

An alternative would be to convince the A::Zip maintainer to provide a $zip->member_names() function which breaks the encapsulation of A::Zip::Member from within the Archive::Zip distribution itself. The problems are that a) A::Zip is currently very strict about this and I don't want to be the one to change that, and b) the author has been away for some time and A::Zip is currently community-maintained.
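
Roughly, such a function could look like the sketch below. This is purely hypothetical: My::Zip and My::Member are stand-ins, member_names() is the name proposed above and not an existing method in the sketch's real counterparts, and the 'members' and 'fileName' hash keys are assumptions about the internal layout.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for Archive::Zip::Member.
    package My::Member;
    sub new      { my ($class, %args) = @_; return bless {%args}, $class }
    sub fileName { return $_[0]{fileName} }
}

{
    # Hypothetical stand-in for the archive object.
    package My::Zip;
    sub new     { my ($class, @members) = @_; return bless { members => [@members] }, $class }
    sub members { return @{ $_[0]{members} } }

    # Proposed bulk accessor: reads each member's hash slot directly,
    # so the encapsulation break stays inside the distribution.
    sub member_names { return map { $_->{fileName} } @{ $_[0]{members} } }
}

my $zip   = My::Zip->new(map { My::Member->new(fileName => $_) } qw(a.pm b.pm));
my @names = $zip->member_names;
print "$_\n" for @names;
```

Callers like PAR.pm would then get the speed-up without touching the member internals themselves.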

What do you think?

Steffen
