Optimization during extraction

Steffen Mueller Wed, 24 Jan 2007 02:55:59 -0800

Hi list,

you are probably all aware of how long it takes to extract all data from a pp'd binary or a .par file for large applications.

For a sample .par which contains a couple of modules for testing, I did a profiling run, and it turns out that a lot of the time (20%) is spent in a single accessor in Archive::Zip, namely Archive::Zip::Member::fileName. (There are over 100k calls to it in my example package. Real-world use might see five times that number.)

Now, replacing the use of that accessor with a bare hash access in a single place in PAR.pm reduces the number of calls to that accessor to about three thousand. On my rather fast machine, the extraction process runs about 0.3 seconds faster, out of a total script run-time of 1.2 seconds which includes loading all those modules. I'd expect extraction to make up about 0.8 to 0.9 seconds of that total, so this is definitely a noticeable speed-up. A simple-minded micro-benchmark shows that direct hash access is about three times faster than calling the method.
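A micro-benchmark along those lines might be sketched with Perl's core Benchmark module. To keep it self-contained, it uses a hypothetical stand-in class, My::Member, that mimics the accessor pattern; the assumption that the real Archive::Zip::Member keeps the name under the 'fileName' hash key should be checked against the installed version. The ratio it prints will depend on the machine and the perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for Archive::Zip::Member. Assumption: the real
# class stores the member name in $self->{'fileName'}.
package My::Member;

sub new {
    my ($class, $name) = @_;
    return bless { fileName => $name }, $class;
}

# Combined getter/setter, in the style of Archive::Zip's accessors.
sub fileName {
    my $self = shift;
    $self->{'fileName'} = shift if @_;
    return $self->{'fileName'};
}

package main;

my @members = map { My::Member->new("lib/Module$_.pm") } 1 .. 1000;

# Sanity check: both ways of reading the name must agree.
my @via_method = map { $_->fileName } @members;
my @via_hash   = map { $_->{'fileName'} } @members;
die "results differ\n" unless "@via_method" eq "@via_hash";

# Run each variant for at least one CPU second and print a rate table.
cmpthese(-1, {
    method => sub { my @n = map { $_->fileName } @members },
    hash   => sub { my @n = map { $_->{'fileName'} } @members },
});
```

The speed difference comes from skipping method resolution and the sub call frame on every member; the hash lookup itself is the same in both variants.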

I'm aware that this breaks encapsulation, which is bad. But it's also a *huge* gain! Would you consider such a hack acceptable if it helps this much?

The code I'm referring to is in PAR.pm's _first_member function:

    my %names = map { ...->fileName... } $zip->members;
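One way to limit the risk of such a hack might be to read the hash slot directly, but only after checking once per archive that the slot exists, and to fall back to the public accessor otherwise (for instance if a future Archive::Zip changes its internals). This is a sketch under assumptions: My::Member is a hypothetical stand-in, name_index() is an illustrative helper rather than PAR.pm's actual _first_member code, and the name-to-member mapping and the 'fileName' key are assumed.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical stand-in for Archive::Zip::Member (assumed to keep its
# name under the 'fileName' hash key).
package My::Member;
sub new      { my ($class, $name) = @_; return bless { fileName => $name }, $class }
sub fileName { my $self = shift; return $self->{'fileName'} }

package main;

# Hypothetical helper: build a name => member lookup table.
sub name_index {
    my @members = @_;
    return () unless @members;
    my $first = $members[0];
    # Decide once per archive whether reading the hash slot directly is safe;
    # otherwise fall back to the public accessor for every member.
    my $direct = (reftype($first) // '') eq 'HASH' && exists $first->{'fileName'};
    return $direct
        ? map { ($_->{'fileName'} => $_) } @members
        : map { ($_->fileName => $_) } @members;
}

my %names = name_index(map { My::Member->new($_) } qw(script/main.pl lib/Foo.pm));
print join(', ', sort keys %names), "\n";    # prints "lib/Foo.pm, script/main.pl"
```

Doing the check once, rather than per member, keeps the per-member cost at a plain hash lookup.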

An alternative would be to convince the A::Zip maintainer to provide a $zip->member_names() method, which would break the encapsulation of A::Zip::Member from within the Archive::Zip distribution itself. The problems are that a) A::Zip is currently very strict about this and I don't want to be the one to change that, and b) the author has been away for some time and A::Zip is currently community-maintained.
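A rough sketch of what the proposed method might look like follows, using hypothetical stand-in classes (My::Archive, My::Member). The member_names() body and the 'members' and 'fileName' hash keys are assumptions for illustration, not Archive::Zip's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Archive::Zip::Member.
package My::Member;
sub new      { my ($class, $name) = @_; return bless { fileName => $name }, $class }
sub fileName { my $self = shift; return $self->{'fileName'} }

# Hypothetical stand-in for the archive class.
package My::Archive;
sub new     { my ($class, @members) = @_; return bless { members => [@members] }, $class }
sub members { my $self = shift; return @{ $self->{'members'} } }

# Proposed method: the archive and member classes ship together, so reading
# the member's hash slot here keeps the encapsulation break inside one
# distribution instead of leaking it into callers such as PAR.pm.
sub member_names {
    my $self = shift;
    return map { $_->{'fileName'} } @{ $self->{'members'} };
}

package main;

my $zip   = My::Archive->new(map { My::Member->new($_) } qw(MANIFEST lib/Foo.pm));
my @names = $zip->member_names;
print "@names\n";    # prints "MANIFEST lib/Foo.pm"
```

Callers would then get the speed-up without depending on the member class's internals.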


What do you think?

Steffen
