It was a bug related to determining the instance index. The bug has been fixed. Thanks for reporting.
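In the meantime, if you want a stop-gap until the fix reaches your farm, an idempotent mount script along these lines can be fired on every HostUp. This is only a rough sketch, not the built-in Scalr EBS feature: the volume ID, device, and mount point are placeholders you would set per role, it assumes the EC2 API tools and their credentials are already on the instance, and the glusterfsd init script name is a guess.

#!/bin/bash
# Rough sketch of an idempotent per-role EBS mount script -- NOT the
# built-in Scalr feature. VOLUME_ID, DEVICE and MOUNT_POINT are
# placeholders to set per role (e.g. sto1-g2); the EC2 API tools and
# credentials (EC2_PRIVATE_KEY / EC2_CERT) are assumed to be installed.
VOLUME_ID="vol-xxxxxxxx"
DEVICE="/dev/sdh"
MOUNT_POINT="/mnt/storage"

# The instance can ask the metadata service for its own instance ID.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Attach only if the volume is not already attached to this instance.
if ec2-describe-volumes "$VOLUME_ID" | grep ATTACHMENT | grep -q "$INSTANCE_ID"; then
    echo "Volume is already attached!"
else
    ec2-attach-volume "$VOLUME_ID" -i "$INSTANCE_ID" -d "$DEVICE"
    # Wait for the block device to appear before mounting.
    while [ ! -e "$DEVICE" ]; do sleep 2; done
fi

# Mount only if it is not already mounted, then (re)start glusterfs.
if ! grep -qs " $MOUNT_POINT " /proc/mounts; then
    mount "$DEVICE" "$MOUNT_POINT"
fi
/etc/init.d/glusterfsd restart   # assumed init script name; adjust to your setup

Presumably your own EBS_Mount script prints its "Volume is already attached!" line from a similar check, which is why those Scripting Log entries are harmless on their own.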
On Mar 30, 6:17 am, mikeytag <[email protected]> wrote:
> Farm ID: 517
>
> I just had a major catastrophe on my farm. I have 4 storage nodes that
> run a Gluster filesystem for the other machines in my farm. It is
> imperative that on HostInit each server mounts its appropriate EBS
> volume so that the glusterfs daemon can start. I was very excited to
> see the built-in Scalr feature that handles this; previously I was
> using my own script that worked reasonably well.
>
> Today Scalr noticed that 3 of these nodes were down, so it created new
> instances. However, when I logged in to see why none of my sites were
> working, I went to the EBS Volumes page in Scalr and saw that the
> volumes that are set to automatically mount for sto1-g2, sto2-g2, and
> sto3-g2 were all listed as "Available". This means Scalr was unable to
> mount, or didn't try to mount, the appropriate volumes when new
> instances of these roles came up after the old ones crashed. BTW, all
> these roles explicitly only allow 1 running instance at a time because
> I need specific EBS volumes mounted to them.
>
> Here is an example of what I found in my Event Log:
>
> 29-03-2009 04:43:15 INFO Main Farm i-00056169/trap-hostup.sh
> 10.251.199.116 UP. Scalr notified me that 10.251.199.116 of role base
> (Custom role: sto1-g2) is up.
>
> 29-03-2009 04:42:09 INFO Main Farm i-59197d30/trap-hostdown.sh
> 10.251.75.181 DOWN: Scalr notified me that 10.251.75.181 of role base
> (Custom role: sto1-g2, I'm first: 0) is down
>
> 29-03-2009 04:40:08 WARN Main Farm PollerProcess Disaster: No
> instances running in role sto1-g2!
>
> 29-03-2009 04:38:09 ERROR Main Farm PollerProcess Failed to
> retrieve LA on instance i-51e58138 for 20 minutes. terminating
> instance. Try increasing 'Terminate instance if cannot retrieve it's
> status' setting on sto1-g2 configuration tab.
>
> and in the Scripting Log I have a bunch of these:
>
> 2009-03-26 18:12:47 OnHostUp Main Farm i-8f42d9e6
> Script '/usr/local/bin/scalr-scripting.Gx28149/EBS_Mount' execution
> result (Execution time: 7 seconds).
> stdout: MY ROLE: sto1-g2
> My INSTANCE: i-8f42d9e6
> Volume is already attached!
>
> 2009-03-26 15:42:59 OnHostUp Main Farm i-8f42d9e6
> Script '/usr/local/bin/scalr-scripting.fn24850/EBS_Mount' execution
> result (Execution time: 8 seconds).
> stdout: MY ROLE: sto1-g2
> My INSTANCE: i-8f42d9e6
> Volume is already attached!
>
> 2009-03-26 15:42:37 OnHostUp Main Farm i-8f42d9e6
> Script '/usr/local/bin/scalr-scripting.tl24436/EBS_Mount' execution
> result (Execution time: 9 seconds).
> stdout: MY ROLE: sto1-g2
> My INSTANCE: i-8f42d9e6
> Volume is already attached!
>
> I then thought to myself, great, I forgot to turn off the old OnHostUp
> EBS_Mount script and it is causing a conflict. Well, after visiting my
> Farm Edit page I found that this was NOT the case. The EBS_Mount
> script is not checked for any event for any role. I am guessing that I
> just stumbled on some type of scripting cache bug in Scalr, and the
> side effect is that my instances are not able to reattach their EBS
> volumes using the new feature.
