Hi Scott,

First, let me say "thanks" for your research. The abysmal startup performance has been annoying me for quite some time, but I just don't have much time to devote to PAR these days, so I couldn't do much about it. That was the reason for hacking up Archive::Unzip::Burst, which just speeds up the initial extraction of the executable by doing it in C (infozip, really) instead of the dog-slow Archive::Zip.
A::U::B so far only runs on Linux and, with some hacking, apparently also under AIX. I understand that you did all your measurements on Windows? If you did them on Linux, could you run them with A::U::Burst installed for comparison? I have not managed to build A::U::B on Windows yet.

Scott Stanton wrote:
> I am trying to find a way to get self-contained par apps to approach the
> speed of PerlApp binaries on Windows. I ran a few tests on a sample
> application to compare performance on my system. Here are my results:

[...]

> First, I made a standard perlapp self-contained executable. Perlapp
> unpacks just the shared libraries to a single user-specific directory
> ($TEMP/pdk-<user>). It does not remove the files when done. The files
> are named using an md5 checksum so they are unlikely to collide if there
> are changes. The one exception I noticed was perl58.dll, which was
> placed in a checksum-named subdirectory under its original name.
> Presumably something similar must be done for any bundled run-time
> linker dependency.

That's actually a good idea: avoiding DLL file name clashes by using a directory for each external (non-perl-module) DLL. Currently, we extract the external DLLs under their original names, IIRC. However, I don't know how you could make those directories accessible to the DLL loader without appending *each* directory to $PATH. Doing that with a long path ($TEMP is long on Windows, isn't it?) might quickly hit the length limitations of environment variables on Windows.

> Next I tried making a par binary that unpacks to an application-specific
> temp directory ($TEMP/par-<user>/cache-<sum>) and does not clean up
> afterwards. In this mode, par unpacks the whole zip file as well as all
> of the bootstrap files. This results in:
>
> First call: 8.5 s
> Second call: 0.5 s

This is the default behaviour of PAR::Packer, of course.

> The next test I ran was to create a self-contained executable made with
> par and -clean.
> This case unpacks some of the files each time and
> removes the temporary directory when done:
>
> First call: 2.3 s
> Second call: 2.3 s (no reuse)
>
> These tests indicate that there is a severe penalty for unpacking the
> whole file, but once that is done, the second use is pretty fast. Taking
> a cue from the perlapp approach, I tried running a few additional
> experiments. To get a baseline for the cost of using infozip to expand
> the zip file, I tried expanding just the dlls, then everything:
>
> .dlls only: 0.5 s
> Everything: 4.9 s

This seems *very* slow for infozip. Perhaps it's because of the hardware (old hard disk?) or the filesystem, or because you have a really huge binary. But infozip was blazingly fast at extracting even zips of tens of megabytes of tiny files on my reasonably fast (2005) Linux machine using ext3.

> That's pretty interesting. We might be able to speed things up a lot by
> only unpacking the dlls. So in my next experiment I modified par a bit
> to disable the calls to _extract_inc in PAR.pm so it doesn't unpack
> everything when running with a cache directory. This improved things a
> lot:
>
> First call: 1.7 s
> Second call: 0.5 s
>
> This is pretty good, but we're still unpacking a lot of .pm files that
> could be loaded directly from memory. It turns out there are two places
> that files are unpacked. One is the bootstrap files embedded in the
> binary; the rest are inside the .par file. The .par files can be loaded
> directly from memory by using PerlIO::scalar to create streams from
> buffers. Some of the bootstrap files must be saved to disk in order to
> get far enough along to get the dynloader and PerlIO::scalar modules
> loaded, but the rest can be loaded from memory. The resulting cache
> directory only contains a small subset of the .pm files plus all of the
> .dlls. Also, par was extracting files before testing whether they
> already exist in the cache.
> Deferring the read avoids unnecessary I/O
> (it also avoids the permission-denied problems that sometimes crop up
> when the same binary is called multiple times). With these changes in
> place, I got the following numbers:
>
> First call: 1.2 s
> Second call: 0.5 s

That is certainly the most elegant way to speed up PAR startup. It's one of the longest-standing criticisms that we don't do it that way. Nicholas Clark's ex::lib::zip does this loading from memory using XS, if I remember correctly. I don't remember whether or how it deals with shared object files. It probably does.

> Rerunning the -clean case gives:
>
> First call: 1.6 s
> Second call: 1.6 s
>
> This is a significant improvement over the original cached and uncached
> cases and probably approaches the limit of what can be achieved with the
> current zip implementation and application structure. So what's
> missing?

I'm sure there are a few more improvements possible, but they probably mean moving away from Archive::Zip for as much as possible. This is already impressive.

> There are probably ways to streamline the bootstrapping to avoid the
> need to unpack any .pm files to disk. Further tweaking of the startup
> code might allow the PerlIO::scalar and dynloader modules to be loaded
> explicitly.

Probably. This would require quite some refactoring, though.

> My current deltas are a bit of a hack and don't preserve the original
> _extract_inc behavior. Ideally, this would be under the control of
> a switch. I'd like to get feedback from the list about the best way to
> integrate this. Also, if you can think of problems with this approach
> that I haven't considered, that would be helpful too. I've verified
> that the patch works on both Linux (AS Perl 5.8.7) and Windows XP (AS
> Perl 5.8.6).

I would love to see this done in PAR(::Packer). As you correctly point out, it's a pretty drastic change which certainly comes with some change in behaviour, but in my opinion, it's the way to go forward.
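For anyone on the list who hasn't played with this: the "load .pm files directly from memory" trick boils down to putting a code ref onto @INC that hands require() a filehandle opened on a scalar buffer (that's what PerlIO::scalar gives you, core since 5.8). Here's a minimal self-contained sketch of the mechanism — the module name and buffer are made up for illustration, and PAR would of course fill %virtual_inc from the zip members instead:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory "zip contents": relative module path => source.
my %virtual_inc = (
    'Acme/InMemory.pm' =>
        "package Acme::InMemory;\nsub greet { 'loaded from RAM' }\n1;\n",
);

# An @INC hook: require() calls this code ref with the hook itself and
# the relative path it is looking for. Returning a filehandle makes
# perl compile the source straight from it -- no temp file on disk.
# The 3-arg open on a scalar ref uses PerlIO::scalar under the hood.
unshift @INC, sub {
    my (undef, $path) = @_;
    return unless exists $virtual_inc{$path};
    open my $fh, '<', \$virtual_inc{$path}
        or die "can't open in-memory buffer: $!";
    return $fh;
};

require Acme::InMemory;
print Acme::InMemory::greet(), "\n";   # prints "loaded from RAM"
```

The catch, as you note, is bootstrapping: the hook itself needs PerlIO::scalar and DynaLoader available already, so some minimal set of files still has to hit the disk first.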
Perhaps we should create a branch of PAR::Packer for these changes, which will eventually lead to PAR::Packer >= 2.0. At the same time, there have been several severe bug reports against PAR::Packer, PAR, and Module::ScanDeps. Once (or if) we can fix those, I'd like to release a PAR(::Packer) 1.0 to the world to get away from those tiny 0.001 version bumps.

Scott, you have commit access to the repository, so please feel free to branch PAR::Packer and experiment! I'd be very much willing to do a developer release with your changes.

Best regards,
Steffen
