Re: Problems after management server reboot & workaround

Kelven Yang Wed, 08 May 2013 12:37:59 -0700

This is a known issue when you run the management server together with
a KVM host. After the KVM host is added to the running management server,
it creates a bridge that can cause the management server ID to change
after a reboot, but only once.


A similar issue can happen when you run the management server in a VM and
later clone the VM.

We have code to handle these cases; apart from some annoying messages in
the log, it will not keep the management server from functioning
normally. But it would be really nice to see a fix that makes the
management server ID acquisition process stable.

Kelven 

On 5/8/13 12:15 PM, "Chip Childers" <chip.child...@sungard.com> wrote:

>On Wed, May 08, 2013 at 09:11:43PM +0200, Jori Liesenborgs wrote:
>> 
>> Hi everyone,
>> 
>> On our CloudStack setup (4.0.2), I noticed that after a reboot of
>> the management server, I was no longer able to start new instances.
>> A secondary problem was that the management-server.log file filled
>> up extremely fast (gigabytes in a few hours), with messages like
>> these:
>> 
>> 2013-05-08 05:26:10,627 DEBUG [agent.manager.ClusteredAgentAttache]
>> (AgentManager-Handler-4:null) Seq 7-1033568320: Forwarding Seq
>> 7-1033568320:  { Cmd , MgmtId: 38424150221294, via: 7, Ver: v1,
>> Flags: 100111,
>> [{"StopCommand":{"isProxy":false,"vmName":"i-2-6-VM","wait":0}}] }
>> to 130450099353672
>> 
>> This turned out to contain an important clue: when looking at the
>> 'mshost' table in the 'cloud' database, instead of seeing one entry
>> for the management server ID, there now were two:
>> 
>> | id | msid            | runid         | name          | ...
>> |  1 | 130450099353672 | 1367919381740 | cloud-manager | ...
>> |  2 |  38424150221294 | 1367950608087 | cloud-manager | ...
>> 
>> These were the two IDs mentioned in the logfile. In fact, with
>> every reboot a new entry appeared in the 'mshost' table, and that
>> new ID was inserted into the 'host' entries for the system VMs
>> 'v-2-VM' and 's-1-VM'.
>> 
>> Browsing through the code, it appears that in the
>> ManagementServerNode.java file, the function getManagementServerId()
>> returns a static value created by the MacAddress class. On a Linux
>> platform (we are using Ubuntu), this address is obtained from the
>> first entry that the command "/sbin/ifconfig -a" shows as output.
>> This turned out to be the address of the cloud0 bridge interface,
>> which changed after a reboot (or after deleting the bridge using
>> brctl and restarting all of CloudStack).
>> 
>> To avoid having to modify and recompile CloudStack, I created a fake
>> ifconfig: a simple Python script that most of the time just runs
>> the real ifconfig (which I renamed to ifconfig-bin), but when called
>> as "/sbin/ifconfig -a", it rearranges the output so that eth0 is
>> shown first (and not cloud0). This way, the management server ID is
>> basically the MAC address of eth0, which stays the same after a
>> reboot.
>> 
>> I haven't had the time to run a long-running test yet (I only
>> figured it out this afternoon), but after several reboots, the
>> management server ID now stays the same, and I am still able to
>> start new instances.
>> 
>> Hope someone finds this useful.
>> 
>> Cheers,
>> Jori
>> 
>> 
>
>Jori,
>
>This is really interesting.  Would you mind opening a bug about it with
>your findings?  And if you're interested in submitting a patch, we'd
>love that too!
>
>-chip
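
For readers who want to try the workaround described in the quoted message, here is a minimal Python sketch of such an ifconfig wrapper. It is not Jori's actual script. The function names, the /sbin location of ifconfig-bin, and the assumption that the ID is simply the MAC address read as a 48-bit integer are illustrative guesses. That last assumption is at least consistent with the quoted 'mshost' table: 130450099353672 is 0x76a4c7d53848, i.e. 76:a4:c7:d5:38:48.

```python
import os
import re
import subprocess

# Assumed location of the renamed real binary (the message only says "ifconfig-bin").
REAL_IFCONFIG = "/sbin/ifconfig-bin"

MAC_RE = re.compile(r"(?:HWaddr|ether)\s+([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})")


def split_stanzas(output):
    """Split ifconfig output into per-interface stanzas (separated by blank lines)."""
    return [s for s in re.split(r"\n\s*\n", output.strip("\n")) if s.strip()]


def stanza_name(stanza):
    """Interface name is the first token; newer net-tools appends a colon."""
    return stanza.split()[0].rstrip(":")


def reorder(output, preferred="eth0"):
    """Return the output with the preferred interface's stanza moved to the front."""
    stanzas = split_stanzas(output)
    first = [s for s in stanzas if stanza_name(s) == preferred]
    rest = [s for s in stanzas if stanza_name(s) != preferred]
    return "\n\n".join(first + rest) + "\n"


def first_mac(output):
    """First MAC address found in the output, or None if there is none."""
    match = MAC_RE.search(output)
    return match.group(1) if match else None


def mac_to_int(mac):
    """Read a colon-separated MAC as a 48-bit integer (assumed ID derivation)."""
    return int(mac.replace(":", ""), 16)


def main(argv):
    """Wrapper entry point: reorder only for 'ifconfig -a', otherwise pass through."""
    if argv[1:] == ["-a"]:
        out = subprocess.check_output([REAL_IFCONFIG, "-a"], universal_newlines=True)
        print(reorder(out), end="")
        return 0
    # Any other invocation is handed to the real binary unchanged.
    os.execv(REAL_IFCONFIG, [REAL_IFCONFIG] + argv[1:])
```

To install it as /sbin/ifconfig, one would end the script with a call such as `sys.exit(main(sys.argv))` (after `import sys`) and make it executable. Passing every other invocation straight through keeps the rest of the system seeing the normal ifconfig behaviour.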
