Hi,

here are some thoughts about updating (based on a test install on a 2.4GHz
Xeon with RHEL4):

- the full image directory requires   711MB
- the tarred up image needs:          640MB
  - built from the image directory in 31s
- compressed image tarfile:           219MB
  - compress time (gzip)              100s
  - compress time (gzip -9)           270s
  - uncompress time                   15s

For the initial deployment it makes perfect sense to work with the gzipped
tarfile: this saves a lot of bandwidth and certainly speeds things up, even on
small clusters.
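
As a rough back-of-the-envelope check of the bandwidth argument, here is a
small Python sketch. The sizes and the uncompress time are the measurements
listed above; the link speed and the simple "one server sends the full archive
to every client over a shared link" model are assumptions made only for this
illustration, and deploy_seconds is a made-up helper name.

```python
# Measured values from the list above (MB and seconds).
TAR_MB = 640
TAR_GZ_MB = 219
UNCOMPRESS_S = 15

# Assumption, not a measurement: usable server bandwidth in MB/s
# (12.5 MB/s is the theoretical peak of a 100 Mbit/s link).
LINK_MB_PER_S = 12.5


def deploy_seconds(size_mb, clients, unpack_s=0.0):
    """Time for one server to send size_mb to each client over a shared link.

    Clients are assumed to unpack in parallel, so unpack_s is counted once.
    """
    return clients * size_mb / LINK_MB_PER_S + unpack_s


if __name__ == "__main__":
    for n in (1, 8, 32):
        plain = deploy_seconds(TAR_MB, n)
        gzipped = deploy_seconds(TAR_GZ_MB, n, UNCOMPRESS_S)
        print(f"{n:3d} clients: image.tar {plain:8.1f}s   "
              f"image.tar.gz {gzipped:8.1f}s")
```

Under these assumptions the transfer term grows with the number of clients
while the uncompress cost is paid only once per client, in parallel, so the
gzipped archive comes out ahead and the gap widens as clients are added.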

For updates it seems to be a big waste to use the full tar. A tar containing
only the differences between the image directory and the initial deployment
tarfile would be much faster to deploy. One idea is to use two tars right from
the start:

image.tar.gz      : full archive of the image directory (initial deployment
                    image)
image-diff.tar.gz : archive containing the diffs between the image directory
                    and image.tar.gz . We would use only one of these
                    differential archives (not incremental, just base+diff).

At the initial deployment time image-diff.tar.gz would be an empty archive.

The attached script "tardiff" builds such differential tars. It spends most of
its time comparing the original image tarfile with the directory content
on the filesystem; in fact, most of that time goes into decompressing the
archive. Here are some numbers (with only small differences between the image
directory and image.tar.gz):

- if the tarfile is compressed (image.tar.gz) building image-diff.tar.gz takes
  17.6s
- if the tarfile is not compressed (image.tar), building image-diff.tar.gz
  takes 3.8s.
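
To illustrate the comparison step, here is a minimal Python sketch. It is not
the attached Perl script and makes no claim about how tardiff works
internally; the names differs and build_diff_tar are made up for this example.
It assumes a file counts as changed when its type, size, mtime (in whole
seconds) or permission bits differ from the entry in the base tar. Deleted
files, ownership changes and hard links are not handled.

```python
import os
import stat
import tarfile


def differs(path, member):
    """Return True if the filesystem entry does not match the base tar member."""
    if member is None:                      # not in the base archive: new entry
        return True
    st = os.lstat(path)
    mode = stat.S_IMODE(st.st_mode)
    if stat.S_ISLNK(st.st_mode):
        return not member.issym() or member.linkname != os.readlink(path)
    if stat.S_ISDIR(st.st_mode):
        return not member.isdir() or member.mode != mode
    if stat.S_ISREG(st.st_mode):
        return (not member.isreg()
                or member.size != st.st_size
                or int(member.mtime) != int(st.st_mtime)
                or member.mode != mode)
    return True                             # devices, FIFOs, ...: always include


def build_diff_tar(image_dir, base_tar, diff_tar):
    """Archive everything under image_dir that is new or changed vs. base_tar.

    Returns the sorted list of member names written to diff_tar.
    """
    # Index the base archive by normalized name ("./etc/x" becomes "etc/x").
    with tarfile.open(base_tar, "r:*") as base:
        known = {os.path.normpath(m.name): m for m in base}

    added = []
    with tarfile.open(diff_tar, "w:gz") as diff:
        for root, dirs, files in os.walk(image_dir):
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                rel = os.path.normpath(os.path.relpath(path, image_dir))
                if differs(path, known.get(rel)):
                    # recursive=False: directories are added as bare entries.
                    diff.add(path, arcname=rel, recursive=False)
                    added.append(rel)
    return sorted(added)
```

Run against an unmodified image directory, this writes an empty archive, which
matches the "empty image-diff.tar.gz at initial deployment" idea. The diff
archive has to be written outside the image directory.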

So it is quite feasible to build (and overwrite) image-diff.tar.gz right
before upgrading a cluster. The benefit of using this instead of a full tar is
a much smaller amount of data to be transferred over the network, and the
fact that unpacking this small archive touches only the modified files,
nothing else.
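
On the client side, the base+diff scheme could be applied roughly as in the
following Python sketch. apply_archives is a hypothetical helper, not part of
SystemImager; it simply unpacks the given archives in order, so a later
archive overwrites files from an earlier one and leaves everything else
alone. It assumes the archives come from a trusted image server.

```python
import tarfile

# Newer Python versions expect an explicit extraction filter. A system image
# needs full fidelity (devices, setuid bits), so the archive is trusted here
# whenever the filter mechanism is available.
EXTRACT_KWARGS = ({"filter": "fully_trusted"}
                  if hasattr(tarfile, "fully_trusted_filter") else {})


def apply_archives(target_dir, *archives):
    """Unpack archives in order into target_dir; later ones win.

    Returns one list of member names per archive, i.e. the paths each
    archive touched.
    """
    touched = []
    for archive in archives:
        with tarfile.open(archive, "r:*") as tar:
            touched.append(tar.getnames())
            tar.extractall(target_dir, **EXTRACT_KWARGS)
    return touched
```

A fresh install would pass both image.tar.gz and image-diff.tar.gz; an update
of an already deployed client would pass only image-diff.tar.gz.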

Any thoughts? If this makes sense, I'll try integrating it once there is some
BT transport available (or the framework around it).

Regards,
Erich



On Monday 02 January 2006 23:42, Paul Greidanus wrote:
> Hi Andrea and Bernard,
> 
> I'm just wondering if there is (or is planned) a way to use BT to run 
> si_updateclient, or is this for initial deployment only? I've always 
> thought systemimager's strongest feature is the update client feature, 
> but I've had trouble if I tell a large number (>50) of machines to 
> update at the same time. It would be cool if BT would fix this one, and 
> just have the nodes sync to each other.
> 
> Andrea Righi wrote:
> > Hi all,
> > 
> > attached is some work I've done regarding bittorrent...
> > 
> > Since this seems to work and since it can also be useful for those who
> > are working on the bittorrent port (in particular for Bernard), I've 
> > decided to post this patch (based on the developer trunk).
> > 
> > I've used the standard bt client (written in python) both for clients 
> > and image server, so I've included both the python interpreter and 
> > bittorrent scripts in BOEL binaries.
> > 
> > Bernard is using libBt, which seems lighter since it doesn't require 
> > python; it could be interesting to do some tests to evaluate the 
> > difference in performance between the two products.
> > 
> > To use my bittorrent patch you have to start the tracker and the first 
> > seeder on the image server, simply using the following command:
> > 
> > # si_startbttrack --image_server <ip_image_server> --image <image_name>
> > 
> > This command also creates a single tar file for the image specified on 
> > the command line; run it with --update_image to update the tar file.
> > 
> > I've not written the si_stopbttrack command, even though it is very
> > simple... for now you have to stop it by killing the tracker and
> > the seeder manually... ;-)
> > 
> > For the client side I don't use flamethrower, the bt client works alone 
> > without further packages (except python, of course).
> > 
> > You only have to define BITTORRENT_STAGING=<dir_path> as a kernel boot 
> > parameter. This way the bittorrent protocol is used to deploy the 
> > image to the clients. First the image is downloaded into a staging 
> > directory (the path is specified by the parameter), then it is untarred 
> > in /a/ (the root directory of the installed fs), and the tar is removed 
> > when the extraction is completed.
> > 
> > If the client does not have a lot of memory, like my test machine :-(, you
> > can deploy the tar file using the client disk space; for example, I usually 
> > use BITTORRENT_STAGING=/a/tmp. Otherwise, if you have enough RAM, it is 
> > better for performance to deploy the tar file in tmpfs (for 
> > example using /tmp - we can assume this as the default option).
> > 
> > If you can do some tests, let me know your feedback... in particular
> > Bernard, who is working hard on bittorrent... ;-)
> > 
> > Best regards and happy new year!
> > -Andrea

Attachment: tardiff
Description: Perl program
