As mentioned previously, I'm still having a problem whereby, when I ask Scalr to launch one additional instance for a role (which has 1 instance running and min=max=1), it can spin up 2 or more additional ones (today, for example, it launched 8) before deciding, after ~45 mins, to terminate all but the one additional instance requested.
Having performed several runs with additional debug logging inserted and reviewed the logs, I have a theory as to what is happening. However, I'm not 100% certain (and the logs are a little confusing to read, as entries often don't appear in timestamp order). Here is the sequence of events I believe may occur:

1. From the farm roles_view page I request one additional instance.
   1a. The POST request handler increments the min count for the role.
   1b. Scalr::RunInstance is called.
   1c. EC2 RunInstances is called.
   1d. A new instance is added to the instances DB table with status 'pending'.

All of that seems to take 2-3 minutes to complete. So, while that is happening:

2. The Poller cron job runs (it is scheduled every 1 min, so it can run several times before the instance launches):
   2a. It notices that the number of running instances is < the min count (and sets need_new_instance=true).
   2b. It starts a new instance, *which it should not*.

Again, the call to RunInstances in 2b only adds the DB entry for the instance in 'pending' state *after* the call to EC2 RunInstances completes, which may take 2-3 mins, during which time the cron job executes again.

As I said, I'm not 100% certain of the reasoning above. Next time I find time to sit down for another debugging session, I'll update this thread if I find something different. I can't quite explain how those steps manage to cascade into 8 instances being launched.

I'm also wondering whether this is related to the DNS zone update failure I'm seeing. It claims it can't update the DNS zone because Cron has locked the zone, which might be true if several Poller cron jobs are running in parallel (they're scheduled every 1 min but take 2-3 mins to execute).

If any of the developers can shed some light on the logic here, that would be welcome. I plan to change my Poller cron to run every 4 mins to see if that affects the issue. Once I understand it fully I'll be in a position to suggest a fix.

Thanks,
-Cenji.
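To check whether the sequence above could plausibly produce more than one extra launch, here is a small discrete-time model. It is a hypothetical Python sketch, not Scalr code, and assumes that: the Poller counts only DB rows, runs every `poll_interval` minutes, and every launch (manual or Poller-initiated) takes `launch_minutes` before its 'pending' row is written. `simulate` and its parameters are illustrative names only.

```python
# Hypothetical model of the suspected race; none of these names come from Scalr.
LAUNCH_MINUTES = 3  # assumed delay between calling RunInstances and the DB insert
POLL_INTERVAL = 1   # Poller cron runs every minute


def simulate(launch_minutes=LAUNCH_MINUTES, poll_interval=POLL_INTERVAL, horizon=10):
    """Return the total number of launches triggered by one manual request."""
    min_count = 1
    db_rows = 1        # the one instance already running
    in_flight = []     # completion times of launches whose row is not yet in the DB
    launches = 0

    # Step 1 at t=0: min count is bumped at once (1a), but the 'pending'
    # row only appears when the slow RunInstances call returns (1b-1d).
    min_count += 1
    in_flight.append(launch_minutes)
    launches += 1

    for t in range(1, horizon + 1):
        # Launches that have finished by now finally get their DB row.
        db_rows += sum(1 for c in in_flight if c <= t)
        in_flight = [c for c in in_flight if c > t]

        # Steps 2a/2b: the Poller sees too few rows and starts another instance.
        if t % poll_interval == 0 and db_rows < min_count:
            in_flight.append(t + launch_minutes)
            launches += 1
    return launches
```

With a 3-minute launch and a 1-minute Poller, `simulate()` returns 3 (two unwanted extra launches), while `simulate(poll_interval=4)` returns 1, which is what the planned 4-minute cron schedule would be expected to give under these assumptions. The model runs Poller ticks strictly one after another, so on its own it does not reproduce 8 launches; overlapping Poller runs would be one candidate for the rest of the cascade.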
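One way to close the window described in 2b is to write the 'pending' row before the slow EC2 call, so a Poller run that starts in the meantime already counts it. The sketch below uses sqlite3 and a hypothetical `instances` table (columns `id`, `farm_role_id`, `instance_id`, `status`) purely for illustration; Scalr's real schema and code paths may differ, and `run_instances` is a placeholder for whatever actually calls EC2 RunInstances.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the instances table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, farm_role_id INTEGER,"
        " instance_id TEXT, status TEXT)"
    )
    return conn


def reserve_then_launch(conn, farm_role_id, run_instances):
    """Record a 'pending' row *before* the slow EC2 call, then fill in the ID."""
    with conn:  # commit the reservation immediately so other runs can see it
        cur = conn.execute(
            "INSERT INTO instances (farm_role_id, status) VALUES (?, 'pending')",
            (farm_role_id,),
        )
        row_id = cur.lastrowid
    try:
        ec2_id = run_instances()  # placeholder for the call that may take minutes
    except Exception:
        with conn:  # launch failed: drop the reservation so it can be retried
            conn.execute("DELETE FROM instances WHERE id = ?", (row_id,))
        raise
    with conn:
        conn.execute(
            "UPDATE instances SET instance_id = ? WHERE id = ?", (ec2_id, row_id)
        )
    return ec2_id


def needs_new_instance(conn, farm_role_id, min_count):
    """Poller-side check that counts reserved/pending rows as well as running ones."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM instances"
        " WHERE farm_role_id = ? AND status IN ('pending', 'running')",
        (farm_role_id,),
    ).fetchone()
    return n < min_count
```

This only removes the gap between launching and recording the instance; two Poller runs that check and reserve at the same moment could still both launch, so it is best combined with stopping Poller runs from overlapping.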
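If Poller runs are overlapping (scheduled every minute but taking 2-3 minutes), a non-blocking exclusive lock makes a new run exit immediately while the previous one is still working, which would also stop parallel runs from contending for the DNS zone lock. This is a Linux-oriented Python sketch using `fcntl.flock`; `SingleRunLock`, `run_exclusively` and the lock path are hypothetical names.

```python
import fcntl
import os


class SingleRunLock:
    """Non-blocking exclusive file lock so a second run fails fast instead of overlapping."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def acquire(self):
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # LOCK_NB: raise immediately instead of waiting if another run holds it.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        self.fd = fd
        return True

    def release(self):
        if self.fd is not None:
            fcntl.flock(self.fd, fcntl.LOCK_UN)
            os.close(self.fd)
            self.fd = None


def run_exclusively(lock_path, job):
    """Run job() only if no other run holds the lock; return whether it ran."""
    lock = SingleRunLock(lock_path)
    if not lock.acquire():
        return False  # previous Poller run still in progress: skip this tick
    try:
        job()
    finally:
        lock.release()
    return True
```

Without changing any code, a similar effect can usually be had by wrapping the existing cron command with the util-linux `flock -n <lockfile> <command>` utility, which exits instead of waiting when the lock is already held.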
You received this message because you are subscribed to the Google Groups "scalr-discuss" group. For more options, visit this group at http://groups.google.com/group/scalr-discuss?hl=en
