Re: We could really use some advice from SCALR staff to figure out why synchronize all on our app role is taking 47 minutes.

donnoman Thu, 18 Jun 2009 12:21:18 -0700


Thanks Alex, I've been working on fixes to our sync-all issue.


Our new image is down to 1.4G from 2.8G.

[ops] app:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              10G  1.4G  8.1G  15% /
varrun                851M   56K  851M   1% /var/run
varlock               851M     0  851M   0% /var/lock
udev                  851M   16K  851M   1% /dev
devshm                851M     0  851M   0% /dev/shm
/dev/sda2             147G  492M  139G   1% /mnt

I modified our Capistrano scripts to run against only the app role via
localhost and now deploy the scripts to the non-ephemeral volume.

We deploy the applications to the ephemeral volume, /mnt.

We created a shell script, triggered by SCALR's on_init scripting
hook, that updates and then invokes the deploy scripts.
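Our actual hook is a shell script; the following is only a rough sketch of the same flow, written in Python for illustration. The step names, the /opt/deploy path, and the svn and cap commands are hypothetical placeholders, not our real setup. The point it shows is that the hook runs the steps in order and stops at the first failure.

```python
import subprocess


def run_steps(steps, runner=subprocess.run):
    """Run each (name, command) pair in order and stop at the first failure.

    Returns the name of the step that failed, or None if every step
    exited with status 0.
    """
    for name, cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            return name
    return None


# Hypothetical steps: the path and the version control tool are assumptions.
STEPS = [
    ("update deploy scripts", ["svn", "update", "/opt/deploy"]),
    ("deploy apps to /mnt", ["cap", "deploy"]),
]


def main():
    # Intended to be called from the on_init hook.
    failed = run_steps(STEPS)
    if failed is not None:
        raise SystemExit("on_init step failed: " + failed)
```

Stopping at the first failed step and exiting non-zero at least makes a broken boot visible instead of leaving a half-deployed instance.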

The role doesn't need to be synced between code deploys because the
scripts that run on instance startup check out the latest production
code to the ephemeral volume.

It now takes 11 minutes from the rebundle trap to finish uploading to
S3.
It takes 5 minutes for our scripts to do the checkouts and deploy to
the ephemeral volume.
The whole sync-all process to replace one instance, from the rebundle
trap to the instances being notified of host_up, takes 21 minutes.
Based on this, we estimate that starting a new instance from this
image and deploy scripts will take about 10 minutes.
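As a rough cross-check, the figures above can be recomputed from the timestamps in the event log at the end of this message. This sketch assumes that the "Rebundle output" entry marks the end of the S3 upload and that the replacement instance was launched right after it; the event labels are my own.

```python
from datetime import datetime

FMT = "%b %d, %Y %H:%M:%S"

# Timestamps copied from the Scalr event log at the end of this message.
# The labels are informal names, not Scalr terminology.
EVENTS = {
    "rebundle_trap": "Jun 18, 2009 13:22:49",
    "rebundle_output": "Jun 18, 2009 13:33:19",
    "host_init": "Jun 18, 2009 13:37:24",
    "boot_finish": "Jun 18, 2009 13:42:38",
    "host_up": "Jun 18, 2009 13:43:09",
}


def elapsed_minutes(start, end):
    """Minutes between two named events in EVENTS."""
    t0 = datetime.strptime(EVENTS[start], FMT)
    t1 = datetime.strptime(EVENTS[end], FMT)
    return (t1 - t0).total_seconds() / 60


bundle = elapsed_minutes("rebundle_trap", "rebundle_output")
deploy = elapsed_minutes("host_init", "boot_finish")
total = elapsed_minutes("rebundle_trap", "host_up")
launch = elapsed_minutes("rebundle_output", "host_up")

for label, minutes in [("bundle", bundle), ("deploy", deploy),
                       ("total", total), ("launch", launch)]:
    print(f"{label}: {minutes:.1f} min")
```

This gives about 10.5 minutes for the bundle, 5.2 for the deploy scripts, 20.3 in total, and 9.8 from the end of the bundle to host_up, which is where the rounded 11, 5, 21, and 10 minute figures come from.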

Still more lengthy than we would like, but a big improvement over the
hour we were experiencing before, and we shouldn't have to sync-all
for just a code update.

We will only need to sync-all the app role again when an application
or OS dependency changes.  Our deploy scripts take care of gem
installations, so even the gem load-out can change without a sync.

We will continue to leverage our Capistrano scripts to keep all of the
running instances up to date.  The on_init scripts will cover us if an
instance fails or scales.

We think there's a great deal more risk involved with this method, as
any failure in the scripts could render an instance unusable, but the
long sync times were just untenable and had to be addressed.


Jun 18, 2009 13:22:49
Info

i-a392caca/trap-rebundle.sh
Received rebundle trap from Scalr. (New role: ops-app).

Jun 18, 2009 13:33:19
Info

i-a392caca/trap-rebundle.sh
Rebundle output: Checking prerequisites:

Jun 18, 2009 13:37:24
Info

i-0f87df66/exec-event-scripts.sh
Executing scripts on event 'hostInit' fired on host 10.248.254.175


Jun 18, 2009 13:42:38
Info

i-0f87df66/nbb_boot_finish
Completed user defined boot_finish tasks. Continuing. (10.248.254.175/app)

Jun 18, 2009 13:43:09
Info

i-4d4d1224/trap-hostup.sh
10.248.254.175 UP. Scalr notified me that 10.248.254.175 of role app
(Custom role: ops-app) is up.