[Sisuite-devel] tardiff for updateclient (or whole deployment?)

Erich Focht Mon, 29 May 2006 02:38:15 -0700

Hi Brian,

Now that BitTorrent is well integrated into SystemImager, I'd like to revisit
the idea I wrote about in January. The initial email is attached below.


The idea is to use a "tardiff" produced tarball for updating client nodes
instead of deploying the entire image. In my experiments, building a tardiff
was roughly 7 to 34 times faster than building the full tarball (tarball build
+ compress: 130s; tardiff build from the compressed tarball: 17s; from the
uncompressed tarball: 3.8s). Additionally, deploying it saves a huge amount of
bandwidth and gives one some sort of image management, as tardiffs are small
and can be archived easily.

If there are no objections, I imagine it would be easy to add a --diff option
to si_updateclient that fetches a tardiff instead of doing the entire rsync of
the image. Of course, BitTorrent deployment could also take an
image-diff.tar.gz file into account on top of the original image.tar.gz, with
some logic recognising whether a diff is needed at all.
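
The "is a diff needed at all" logic could look roughly like the Python sketch
below. It is only an illustration: the function names are made up, and it
assumes that an image-diff.tar.gz with no members means there is nothing to
update, and that the client knows whether it already has the base image.

```python
import io
import tarfile


def diff_is_empty(diff_bytes: bytes) -> bool:
    """True if the gzipped diff archive contains no members (assumed meaning: no update)."""
    with tarfile.open(fileobj=io.BytesIO(diff_bytes), mode="r:gz") as tar:
        return tar.next() is None


def archives_to_fetch(client_has_base: bool, diff_bytes: bytes) -> list[str]:
    """Hypothetical helper: list the archives a client should download."""
    # A fresh node needs the full image; an installed node can skip it.
    wanted = [] if client_has_base else ["image.tar.gz"]
    # The diff is only worth transferring if it actually contains something.
    if not diff_is_empty(diff_bytes):
        wanted.append("image-diff.tar.gz")
    return wanted
```

With this split, an already installed node downloads nothing when the diff is
empty, and a fresh node always gets the base image first.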

So, should I proceed with the integration of this? Any thoughts about the
approach?

Thanks,
best regards,
Erich


================== initial email: Jan 10 2006 ===========================

Here are some thoughts about updating (with a test install on a 2.4GHz Xeon
with RHEL4 in mind):

- the full image directory requires   711MB
- the tarred up image needs:          640MB
  - built from the image directory in 31s
- compressed image tarfile:           219MB
  - compress time (gzip)              100s
  - compress time (gzip -9)           270s
  - uncompress time                   15s
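
For reference, the arithmetic behind these figures can be checked with a few
lines of Python. The only assumption is that tar and gzip run one after the
other, so their times simply add up; the variable names are made up.

```python
# Figures quoted in the list above (sizes in MB, times in seconds).
tar_mb, tar_gz_mb = 640, 219
tar_build_s, gzip_s, gzip9_s = 31, 100, 270

# Fraction of the plain tar size that remains after gzip.
ratio = tar_gz_mb / tar_mb

# Total time to produce the compressed image, assuming tar then gzip in sequence.
build_default_s = tar_build_s + gzip_s
build_best_s = tar_build_s + gzip9_s

print(f"compressed size: {ratio:.0%} of the plain tar")
print(f"tar + gzip:    {build_default_s}s")
print(f"tar + gzip -9: {build_best_s}s")
```

So the compressed image is about a third of the plain tar, and producing it
takes a bit over two minutes with default gzip settings.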

For the initial deployment it makes full sense to work with the gzipped
tarfile: this saves a lot of bandwidth and certainly speeds things up, even on
small clusters.

For updates it seems to be a big waste to use the full tar. A tar containing
only the differences between the image directory and the initial deployment
tarfile would be much faster to deploy. An idea is to use two tars right from
the start:

image.tar.gz      : full archive of the image directory (initial deployment
                    image)
image-diff.tar.gz : archive containing the diffs between the image directory
                    and image.tar.gz. We would use only one of these
                    differential archives (not incremental, just base+diff).

At the initial deployment time image-diff.tar.gz would be an empty archive.

The attached script "tardiff" builds such differential tars. It spends most of
its time comparing the original image tarfile with the directory content on
the filesystem; in fact, most of that time goes into decompressing the
archive. Here are some numbers (with only small differences between image
directory and image.tar.gz):

- if the tarfile is compressed (image.tar.gz) building image-diff.tar.gz takes
  17.6s
- if the tarfile is not compressed (image.tar), building image-diff.tar.gz
  takes 3.8s.
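
To make the comparison step concrete, here is a rough Python sketch of the
same idea. It is not the attached tardiff script: build_diff and _unchanged
are made-up names, the comparison looks only at file type, size and mtime, and
files deleted from the image directory are ignored entirely. As in the numbers
above, the expensive part is reading through the base archive to index it.

```python
import os
import stat
import tarfile


def _unchanged(old: tarfile.TarInfo, st: os.stat_result) -> bool:
    """Cheap metadata comparison; file contents are never read."""
    if stat.S_ISREG(st.st_mode):
        return (old.isreg() and old.size == st.st_size
                and int(old.mtime) == int(st.st_mtime))
    if stat.S_ISDIR(st.st_mode):
        return old.isdir()
    return False  # symlinks, devices, ...: always re-archived in this sketch


def build_diff(image_dir: str, base_tar: str, diff_tar: str) -> list[str]:
    """Archive entries of image_dir that are new or changed relative to base_tar."""
    # Index the base archive by normalised member name ("./etc/x" -> "etc/x").
    with tarfile.open(base_tar, "r:*") as base:
        known = {os.path.normpath(m.name): m for m in base}

    changed = []
    with tarfile.open(diff_tar, "w:gz") as diff:
        for root, dirs, files in os.walk(image_dir):
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                rel = os.path.relpath(path, image_dir)
                st = os.lstat(path)
                if stat.S_ISSOCK(st.st_mode):
                    continue  # tar cannot store sockets
                old = known.get(rel)
                if old is not None and _unchanged(old, st):
                    continue
                # New or modified entry: store it without descending into it.
                diff.add(path, arcname=rel, recursive=False)
                changed.append(rel)
    return changed
```

The function overwrites diff_tar on every run, which matches the base+diff
(not incremental) scheme: there is always exactly one differential archive.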

So it is really feasible to build (and overwrite) image-diff.tar.gz right
before upgrading a cluster. The benefit of using this instead of a full tar is
a greatly reduced amount of data to be transferred over the network, and the
fact that unpacking this small archive touches only the modified files and
nothing else.
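
On the client side, applying the diff amounts to unpacking it over the
installed tree. The Python sketch below illustrates that; apply_diff is a
made-up name, and it assumes the diff comes from a trusted image server and
that files removed from the image are handled by some other mechanism.

```python
import tarfile


def apply_diff(diff_tar: str, target_root: str) -> list[str]:
    """Unpack the differential archive over target_root; return the touched paths."""
    with tarfile.open(diff_tar, "r:*") as tar:
        names = tar.getnames()
        # Assumption: the archive is our own image, so extract it as-is.
        # Older Python versions have no extraction filters, hence the check.
        extra = {}
        if hasattr(tarfile, "fully_trusted_filter"):
            extra["filter"] = "fully_trusted"
        tar.extractall(target_root, **extra)
    return names
```

Only the paths stored in the diff are rewritten; everything else under
target_root keeps its content and timestamps.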

Any thoughts? If this makes sense, I'll try integrating once there is some BT
transport available (or the framework around it).


