Hi Brian,

now that BitTorrent is well integrated into SystemImager, I'd like to reiterate the idea I wrote about in January. The initial email is attached below.

The idea is to use a tarball produced by "tardiff" for updating client nodes instead of deploying the entire image. Experiments showed that building a tardiff was roughly 7 to 34 times faster than building the full tarball (tarball build + compress: 130s; tardiff build from a compressed tarball: 17s; from an uncompressed tarball: 3.8s). In addition, deploying it saves a huge amount of bandwidth, and it provides a simple form of image management, since tardiffs are small and can be archived easily.

If nothing speaks against this, I imagine it would be easy to add a --diff option to si_updateclient that fetches a tardiff instead of doing the full rsync of the image. Of course, BitTorrent deployment could also take an image-diff.tar.gz file into account on top of the original image.tar.gz, with some logic recognising whether a diff is needed at all.

So, should I proceed with the integration of this? Any thoughts about the approach?

Thanks, best regards,
Erich

================== initial email: Jan 10 2006 ===========================

Here are some thoughts about updating (with a test install on a 2.4GHz Xeon with RHEL4 in mind):

- the full image directory requires 711MB
- the tarred-up image needs 640MB
- built from the image directory in 31s
- compressed image tarfile: 219MB
- compress time (gzip): 100s
- compress time (gzip -9): 270s
- uncompress time: 15s

For the initial deployment it makes full sense to work with the gzipped tarfile: this saves a lot of bandwidth and certainly speeds things up, even on small clusters.

For updates it seems a big waste to use the full tar. A tar containing only the differences between the image directory and the initial deployment tarfile would be much faster to deploy. An idea is to use two tars right from the start:

image.tar.gz      : full archive of the image directory (initial deployment image)
image-diff.tar.gz : archive containing the diffs between the image directory and image.tar.gz

We would use only one of these differential archives (not incremental, just base+diff). At initial deployment time image-diff.tar.gz would be an empty archive.

The attached script "tardiff" builds such differential tars. It mainly spends its time comparing the original image tarfile with the directory content on the filesystem; in fact, most of the time is spent decompressing the archive. Here are some numbers (with only small differences between the image directory and image.tar.gz):

- if the tarfile is compressed (image.tar.gz), building image-diff.tar.gz takes 17.6s
- if the tarfile is not compressed (image.tar), building image-diff.tar.gz takes 3.8s

So it is really feasible to build (and overwrite) image-diff.tar.gz right before upgrading a cluster. The benefit of using this instead of a full tar is a much smaller amount of data to be transferred over the network, and the fact that unpacking this small archive touches only the modified files, nothing else.

Any thoughts? If this makes sense, I'll try integrating it once there is some BT transport available (or the framework around it).

_______________________________________________
Sisuite-devel mailing list
Sisuite-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sisuite-devel