This was a bug in how the instance index within a role is determined. It has been fixed.
Thanks for reporting.
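
For the curious, the failure mode was of this general shape (an illustrative sketch only, not the actual Scalr code; the server bookkeeping here is invented for the example):

    # The index a server gets within its role decides which EBS volume
    # is attached to it.

    def instance_index_buggy(servers, my_id):
        # BUG: counts every server ever seen in the role, including
        # terminated ones. A replacement instance in a 1-instance role
        # then computes index 1 instead of 0, no volume is found for
        # that index, and the volume is left "Available".
        return [s['id'] for s in servers].index(my_id)

    def instance_index_fixed(servers, my_id):
        # FIX: only running servers participate, so a replacement
        # inherits index 0 and picks up its predecessor's volume.
        running = [s['id'] for s in servers if s['state'] == 'running']
        return running.index(my_id)

    servers = [{'id': 'i-51e58138', 'state': 'terminated'},
               {'id': 'i-00056169', 'state': 'running'}]
    print instance_index_buggy(servers, 'i-00056169')  # 1 (wrong)
    print instance_index_fixed(servers, 'i-00056169')  # 0 (right)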

On Mar 30, 6:17 am, mikeytag <[email protected]> wrote:
> Farm ID: 517
>
> I just had a major catastrophe on my farm. I have 4 storage nodes that
> run a Gluster filesystem for the other machines in my farm. It is
> imperative that on HostInit each server mounts its appropriate EBS
> volume so that the glusterfs daemon can start. I was very excited to
> see the built-in Scalr feature that handles this; previously I was
> using my own script that worked reasonably well.
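>
> For context, my old script did roughly the following (a simplified
> sketch rather than the exact script; the volume IDs, device, mount
> point, env var, and init script below are placeholders):
>
> import os, time, urllib
> import boto
>
> # Which EBS volume belongs to which storage role (placeholder IDs).
> ROLE_VOLUMES = {'sto1-g2': 'vol-11111111', 'sto2-g2': 'vol-22222222',
>                 'sto3-g2': 'vol-33333333', 'sto4-g2': 'vol-44444444'}
> DEVICE, MOUNT_POINT = '/dev/sdf', '/mnt/gluster'
>
> # Instance ID from the EC2 metadata service.
> instance_id = urllib.urlopen(
>     'http://169.254.169.254/latest/meta-data/instance-id').read()
>
> role = os.environ['SCALR_ROLE_NAME']  # placeholder for however the
>                                       # script learned its role name
> conn = boto.connect_ec2()  # credentials from the environment
> conn.attach_volume(ROLE_VOLUMES[role], instance_id, DEVICE)
>
> while not os.path.exists(DEVICE):  # wait for the device node
>     time.sleep(2)
> os.system('mount %s %s' % (DEVICE, MOUNT_POINT))
> os.system('/etc/init.d/glusterfsd start')  # placeholder init script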
>
> Today Scalr noticed that 3 of these nodes were down, so it created new
> instances. However, none of my sites were working, so I logged in to
> see why. On the EBS Volumes page in Scalr I saw that the volumes that
> are set to automatically mount for sto1-g2, sto2-g2, and sto3-g2 were
> all listed as "Available". This means Scalr either was unable to mount
> or didn't try to mount the appropriate volumes when new instances of
> these roles came up after the old ones crashed. BTW, all these roles
> explicitly allow only 1 running instance at a time, because I need
> specific EBS volumes mounted to them.
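>
> (Side note: after a crash the old attachment can linger, with the
> volume shown as "in-use" on the dead instance, so I'd expect any
> reattach logic to force-detach and wait for "available" first.
> A rough sketch with boto; the volume ID, instance ID, and device
> here are placeholders:)
>
> import time
> import boto
>
> conn = boto.connect_ec2()
>
> def attach_when_available(vol_id, instance_id, device, timeout=300):
>     vol = conn.get_all_volumes([vol_id])[0]
>     if vol.status == 'in-use':
>         # the old instance is gone; force the detach
>         conn.detach_volume(vol_id, force=True)
>     deadline = time.time() + timeout
>     while time.time() < deadline:
>         if conn.get_all_volumes([vol_id])[0].status == 'available':
>             return conn.attach_volume(vol_id, instance_id, device)
>         time.sleep(5)
>     raise RuntimeError('%s never became available' % vol_id)
>
> attach_when_available('vol-11111111', 'i-00056169', '/dev/sdf')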
>
> Here is an example of what I found in my Event Log:
>
> 29-03-2009 04:43:15     INFO    Main Farm       i-00056169/trap-hostup.sh
> 10.251.199.116 UP. Scalr notified me that 10.251.199.116 of role base
> (Custom role: sto1-g2) is up.
>
> 29-03-2009 04:42:09     INFO    Main Farm       i-59197d30/trap-hostdown.sh
> 10.251.75.181 DOWN: Scalr notified me that 10.251.75.181 of role base
> (Custom role: sto1-g2, I'm first: 0) is down
>
> 29-03-2009 04:40:08     WARN    Main Farm       PollerProcess   Disaster: No
> instances running in role sto1-g2!
>
> 29-03-2009 04:38:09     ERROR   Main Farm       PollerProcess   Failed to
> retrieve LA on instance i-51e58138 for 20 minutes. terminating
> instance. Try increasing 'Terminate instance if cannot retrieve it's
> status' setting on sto1-g2 configuration tab.
>
> and in the Scripting Log I have a bunch of these:
>
> 2009-03-26 18:12:47     OnHostUp        Main Farm       i-8f42d9e6      
> Script '/usr/local/
> bin/scalr-scripting.Gx28149/EBS_Mount' execution result (Execution
> time: 7 seconds).
> stdout: MY ROLE: sto1-g2
> My INSTANCE: i-8f42d9e6
> Volume is already attached!
>
> 2009-03-26 15:42:59     OnHostUp        Main Farm       i-8f42d9e6      
> Script '/usr/local/
> bin/scalr-scripting.fn24850/EBS_Mount' execution result (Execution
> time: 8 seconds).
> stdout: MY ROLE: sto1-g2
> My INSTANCE: i-8f42d9e6
> Volume is already attached!
>
> 2009-03-26 15:42:37     OnHostUp        Main Farm       i-8f42d9e6      
> Script '/usr/local/
> bin/scalr-scripting.tl24436/EBS_Mount' execution result (Execution
> time: 9 seconds).
> stdout: MY ROLE: sto1-g2
> My INSTANCE: i-8f42d9e6
> Volume is already attached!
>
> I then thought to myself: great, I forgot to turn off the old OnHostUp
> EBS_Mount script and it is causing a conflict. Well, after visiting my
> Farm Edit page I found that this was NOT the case: the EBS_Mount
> script is not checked for any event on any role. My guess is that I
> stumbled on some kind of scripting cache bug in Scalr, and the side
> effect is that my instances are not able to reattach their EBS
> volumes using the new feature.
