Re: Question about Basic and Advanced Network
1) NetScaler provides load balancing functions rather than IPs. For both basic and advanced networking you can either assign IPs statically to your VMs or you can use DHCP on your virtual routers to provide the IPs. Public vs private IPs doesn't really make any difference. 2) You can set up your CloudStack using Advanced networking with Security Groups, which is pretty much basic networking but with multiple subnets/VLANs. However, if you use Advanced networking (without Security Groups) then no, you cannot have isolated networks using SG, but Advanced networking does support firewalling for isolated and VPC networks. Jon From: Francisco Germano Sent: 11 August 2019 22:51 To: 'users@cloudstack.apache.org' Subject: Question about Basic and Advanced Network Greetings, My team and I are working on open-source software and our next step is to implement an integration with CloudStack. We are implementing the network context and we have some doubts. Could you help us? About Basic Network: 1 - A Citrix NetScaler provides public IPs, right? Is it possible to control the public IPs using just the CloudStack API? If yes, how? About Advanced Network: 2 - Is it possible to use Security Groups in an Isolated Network? Best regards, Francisco Germano
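On the question of controlling public IPs purely through the CloudStack API: this is possible with commands such as listPublicIpAddresses and associateIpAddress. Below is a minimal sketch of how any CloudStack API request is signed with an account's API key and secret key; the key values and the endpoint shown in the comment are placeholders, not taken from the thread.

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params, api_key, secret_key):
    """Build a signed CloudStack API query string.

    CloudStack signs requests by sorting the parameters
    case-insensitively, lowercasing the resulting query string,
    HMAC-SHA1 hashing it with the secret key, then base64- and
    URL-encoding the digest as the 'signature' parameter.
    """
    params = dict(params, apiKey=api_key, response="json")
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"

# Placeholder keys; the signed query is appended to the management
# server endpoint, e.g. http://mgmt:8080/client/api?<signed query>
qs = sign_request({"command": "listPublicIpAddresses"}, "APIKEY", "SECRET")
```

The same signing works for associateIpAddress, enableStaticNat and the other IP-management commands.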
Re: Best use of server NICs.
Hi Dag Many thanks for that, option 1 it is then 🙂 Jon From: Dag Sonstebo Sent: 19 March 2019 09:29 To: users@cloudstack.apache.org Subject: Re: Best use of server NICs. Hi Jon, In short "it depends...". Going by your hardware spec (only 1Gbps NICs) I will assume (please correct me if wrong) that this is a smaller environment / lab / proof of concept? If so you won't see much of a benefit from option 2 since you simply won't have that much secondary storage traffic going through to cause noisy neighbour problems - hence my advice would be option 1) to give you redundancy. Option 2) would be at risk of no redundancy for management and storage (bad), and would only make sense if you had guest VMs with high network IO. Even if you had a lot of secondary storage traffic I would advise against this. If you absolutely wanted to run secondary storage traffic separately I would run a bond for management and primary storage and a NIC each for secondary and guest traffic - but I would still say 1) is the better option. Regards, Dag Sonstebo Cloud Architect ShapeBlue On 18/03/2019, 19:02, "Jon Marshall" wrote: I have 4 1Gbps NICs in each compute node and was considering 2 deployment options (Advanced network with Security Groups) - 1) 2 NICs bonded together and used for all storage and management and the other 2 NICs bonded together and used for guest VM traffic. 2) 1 NIC for management and primary storage, 1 NIC for secondary storage and the remaining 2 NICs bonded together for guest VM traffic. Option 1 would give more redundancy but is there any benefit to separating storage that would outweigh this? Or is there a better option I have overlooked? Any advice much appreciated dag.sonst...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> Amadeus House, Floral Street, London WC2E 9DP, UK @shapeblue
Best use of server NICs.
I have 4 1Gbps NICs in each compute node and was considering 2 deployment options (Advanced network with Security Groups) - 1) 2 NICs bonded together and used for all storage and management and the other 2 NICs bonded together and used for guest VM traffic. 2) 1 NIC for management and primary storage, 1 NIC for secondary storage and the remaining 2 NICs bonded together for guest VM traffic. Option 1 would give more redundancy but is there any benefit to separating storage that would outweigh this? Or is there a better option I have overlooked? Any advice much appreciated
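For reference, option 1 (two active-backup bonds, one feeding the management/storage bridge and one the guest bridge) might look like the following ifcfg fragments on a CentOS 7 KVM host. The interface names, bond mode and bridge name are illustrative assumptions, not taken from the thread.

```
# /etc/sysconfig/network-scripts/ifcfg-bond0  (management + storage)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
BRIDGE=cloudbr0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
```

A second bond (bond1, enslaving eth2/eth3 and bridged for guest traffic) would follow the same pattern.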
Re: KVM Host HA and power lost to host.
Hi Andrija Thanks for responding. Makes sense I guess, just wanted to make sure I wasn't missing anything obvious. Jon From: Andrija Panic Sent: 04 March 2019 13:43 To: users Subject: Re: KVM Host HA and power lost to host. Jon, not an expert on particular implementation, but obviously your host needs power, so its IPMI/BMC/iLo/iDRAC/etc. controller can be contacted and host fenced. Redundant PSU with different power sources is expected (defacto standard in production). Kind regards, Andrija On Mon, 4 Mar 2019 at 12:19, Jon Marshall wrote: > > I have KVM Host HA enabled and power is lost to one of the compute nodes. > The host has it's state marked as alert and the HA states go through > degraded to suspect to Fencing. > > The problem is that the host is never fenced because there is no power to > it so none of the OOBM commands work which means the VMs are never migrated. > > From the management server logs - > > 2019-03-04 11:02:48,288 WARN [o.a.c.h.t.BaseHATask] > (pool-6-thread-9:null) (logid:d0a19f20) Exception occurred while running > FenceTask on a resource: > org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not > configured or enabled for this host dcp-cscn2.local > org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not > configured or enabled for this host dcp-cscn2.local > at > org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99) > at > org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42) > at > org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42) > at > org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86) > at > org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748) > Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band > Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f) > failed with error: Get Auth Capabilities error > Error issuing Get Channel Authentication Capabilities request > Error: Unable to establish IPMI v2 / RMCP+ session > > at > org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423) > at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ... 21 more > > > which begs the question how is this meant to work for a host whose power > has failed. > > > If I turn off KVM Host HA and change the ping interval to 30 and ping > timeout to 2 then the VMs failover to another host within 5 mins. > > I understand what Host HA is meant for but it seems for a failed host in > terms of power it doesn't work. > > Jon > -- Andrija Panić
KVM Host HA and power lost to host.
I have KVM Host HA enabled and power is lost to one of the compute nodes. The host has its state marked as alert and the HA states go through degraded to suspect to Fencing. The problem is that the host is never fenced because there is no power to it, so none of the OOBM commands work, which means the VMs are never migrated. From the management server logs - 2019-03-04 11:02:48,288 WARN [o.a.c.h.t.BaseHATask] (pool-6-thread-9:null) (logid:d0a19f20) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99) at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42) at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42) at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86) at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423) at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ... 21 more This begs the question: how is this meant to work for a host whose power has failed? If I turn off KVM Host HA and change the ping interval to 30 and ping timeout to 2 then the VMs fail over to another host within 5 mins. I understand what Host HA is meant for, but it seems that for a host which has lost power it doesn't work. Jon
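The failure mode described in this thread can be sketched as a tiny state machine: the HA states walk from Degraded to Suspect to Fencing, but the fence step depends on the host's IPMI/BMC controller answering, which it cannot when the host has lost power entirely. This is a simplified illustration of the behaviour seen in the logs, not CloudStack's actual implementation.

```python
# Illustrative sketch of why KVM Host HA stalls when a host loses
# power completely: fencing goes over out-of-band management (IPMI),
# and the BMC needs power too. State names follow the thread.

def next_ha_state(state, host_pings, oobm_reachable):
    if host_pings:
        return "Available"
    transitions = {"Available": "Degraded", "Degraded": "Suspect",
                   "Suspect": "Fencing"}
    if state in transitions:
        return transitions[state]
    if state == "Fencing":
        # The fence succeeds only if the IPMI/BMC answers, which it
        # cannot when the host (and its PSU) has no power at all.
        return "Fenced" if oobm_reachable else "Fencing"
    return state

# Host loses power: the BMC is down too, so HA loops in Fencing and
# the VMs are never migrated, matching the "Unable to establish
# IPMI v2 / RMCP+ session" error above.
state = "Available"
for _ in range(5):
    state = next_ha_state(state, host_pings=False, oobm_reachable=False)
```

This is also why Andrija's point about redundant PSUs on separate power feeds matters: it keeps the BMC reachable so fencing can complete.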
Re: Not able to access the vm from outside network
23 - ip6tables -A i-2-40-VM -j DROP 2019-03-01 10:46:23,739 - Programmed default rules for vm i-2-40-VM 2019-03-01 10:46:24,255 - Executing command: add_network_rules 2019-03-01 10:46:24,259 - programming network rules for IP: 172.20.109.167 vmname=i-2-40-VM 2019-03-01 10:46:24,260 - iptables -F i-2-40-VM 2019-03-01 10:46:24,273 - ip6tables -F i-2-40-VM 2019-03-01 10:46:24,287 - iptables -F i-2-40-VM-eg 2019-03-01 10:46:24,298 - ip6tables -F i-2-40-VM-eg 2019-03-01 10:46:24,312 - iptables -I i-2-40-VM -p tcp -m tcp --dport 0:12000 -m state --state NEW -s 0.0.0.0/24 -j ACCEPT 2019-03-01 10:46:24,325 - iptables -I i-2-40-VM-eg -p tcp -m tcp --dport 0:12000 -m state --state NEW -d 0.0.0.0/24 -j RETURN 2019-03-01 10:46:24,339 - iptables -A i-2-40-VM-eg -j DROP 2019-03-01 10:46:24,351 - ip6tables -A i-2-40-VM-eg -j RETURN 2019-03-01 10:46:24,364 - iptables -A i-2-40-VM -j DROP 2019-03-01 10:46:24,376 - ip6tables -A i-2-40-VM -j DROP 2019-03-01 10:46:24,389 - Writing log to /var/run/cloud/i-2-40-VM.log 2019-03-01 10:46:31,575 - Executing command: get_rule_logs_for_vms 2019-03-01 10:47:31,513 - Executing command: get_rule_logs_for_vms 2019-03-01 10:48:31,515 - Executing command: get_rule_logs_for_vms 2019-03-01 10:49:31,517 - Executing command: get_rule_logs_for_vms 2019-03-01 10:50:31,520 - Executing command: get_rule_logs_for_vms 2019-03-01 10:51:31,522 - Executing command: get_rule_logs_for_vms 2019-03-01 10:52:31,527 - Executing command: get_rule_logs_for_vms 2019-03-01 10:53:31,528 - Executing command: get_rule_logs_for_vms 2019-03-01 10:54:31,529 - Executing command: get_rule_logs_for_vms 2019-03-01 10:55:31,581 - Executing command: get_rule_logs_for_vms Regards Soundar On Fri, Mar 1, 2019 at 1:12 AM Jon Marshall wrote: > Is this after you migrated the VM to another compute node ? > > It looks suspiciously like the issue I saw ie. I was using advanced > networking with security groups and the security policy for the VM was not > migrated to the new compute node. 
> > There is a bug filed for it and a workaround - > > https://github.com/apache/cloudstack/issues/3088 > > the fix is in the comments but basically you need to need to edit this > file - "/usr/share/cloudstack-common/scripts/vm/network/security_group.py" > > and change line 490 from - > > if ips[0] == "0": > > to - > > if len(ips) == 0 or ips[0] == "0": > > and that should fix it. > > The will be included in CS v4.11.3 > > Jon > > > > From: soundar rajan > Sent: 28 February 2019 13:52 > To: d...@cloudstack.apache.org; users@cloudstack.apache.org > Subject: Not able to access the vm from outside network > > Hi, > > VM outbound is working fine. Inbound is not not able to access from > outside network > > Error Log > 2019-02-28 18:12:25,112 - Failed to network rule ! > Traceback (most recent call last): > File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", > line 995, in add_network_rules > default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, > sec_ips) > File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", > line 490, in default_network_rules > if ips[0] == "0": > IndexError: list index out of range > 2019-02-28 18:13:16,635 - Executing command: cleanup_rules > 2019-02-28 18:13:16,645 - Vms on the host : ['i-2-40-VM', 'i-2-90-VM', > 'i-2-112-VM'] > 2019-02-28 18:13:16,645 - iptables-save | grep -P '^:(?!.*-(def|eg))' | awk > '{sub(/^:/, "", $1) ; print $1}' | sort | uniq > 2019-02-28 18:13:16,671 - iptables chains in the host :['BF-cloudbr0', > 'BF-cloudbr0-IN', 'BF-cloudbr0-OUT', 'FORWARD', 'i-2-112-VM', 'i-2-40-VM', > 'i-2-90-VM', 'INPUT', 'OUTPUT', 'POSTROUTING', 'PREROUTING', ''] > 2019-02-28 18:13:16,672 - grep -E '^ebtable_' /proc/modules | cut -f1 -d' ' > | sed s/ebtable_// > 2019-02-28 18:13:16,693 - ebtables -t nat -L | awk '/chain:/ { > gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq > 2019-02-28 18:13:16,716 - ebtables -t filter -L | awk '/chain:/ { > gsub(/(^.*chain: 
|-(in|out|ips).*)/, ""); print $1}' | sort | uniq > 2019-02-28 18:13:16,738 - ebtables chains in the host: ['FORWARD,', > 'INPUT,', 'OUTPUT,', ''] > 2019-02-28 18:13:16,739 - Cleaned up rules for 0 chains > 2019-02-28 18:13:23,959 - Executing command: get_rule_logs_for_vms > > It happens to particular vm > > Please help.. >
Re: Not able to access the vm from outside network
Is this after you migrated the VM to another compute node? It looks suspiciously like the issue I saw, i.e. I was using advanced networking with security groups and the security policy for the VM was not migrated to the new compute node. There is a bug filed for it and a workaround - https://github.com/apache/cloudstack/issues/3088 - the fix is in the comments, but basically you need to edit this file - "/usr/share/cloudstack-common/scripts/vm/network/security_group.py" - and change line 490 from - if ips[0] == "0": to - if len(ips) == 0 or ips[0] == "0": and that should fix it. The fix will be included in CS v4.11.3. Jon From: soundar rajan Sent: 28 February 2019 13:52 To: d...@cloudstack.apache.org; users@cloudstack.apache.org Subject: Not able to access the vm from outside network Hi, VM outbound is working fine. Inbound is not accessible from the outside network. Error Log 2019-02-28 18:12:25,112 - Failed to network rule ! Traceback (most recent call last): File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 995, in add_network_rules default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, sec_ips) File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 490, in default_network_rules if ips[0] == "0": IndexError: list index out of range 2019-02-28 18:13:16,635 - Executing command: cleanup_rules 2019-02-28 18:13:16,645 - Vms on the host : ['i-2-40-VM', 'i-2-90-VM', 'i-2-112-VM'] 2019-02-28 18:13:16,645 - iptables-save | grep -P '^:(?!.*-(def|eg))' | awk '{sub(/^:/, "", $1) ; print $1}' | sort | uniq 2019-02-28 18:13:16,671 - iptables chains in the host :['BF-cloudbr0', 'BF-cloudbr0-IN', 'BF-cloudbr0-OUT', 'FORWARD', 'i-2-112-VM', 'i-2-40-VM', 'i-2-90-VM', 'INPUT', 'OUTPUT', 'POSTROUTING', 'PREROUTING', ''] 2019-02-28 18:13:16,672 - grep -E '^ebtable_' /proc/modules | cut -f1 -d' ' | sed s/ebtable_// 2019-02-28 18:13:16,693 - ebtables -t nat -L | awk '/chain:/ { gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq 2019-02-28 18:13:16,716 - ebtables -t filter -L | awk '/chain:/ { gsub(/(^.*chain: |-(in|out|ips).*)/, ""); print $1}' | sort | uniq 2019-02-28 18:13:16,738 - ebtables chains in the host: ['FORWARD,', 'INPUT,', 'OUTPUT,', ''] 2019-02-28 18:13:16,739 - Cleaned up rules for 0 chains 2019-02-28 18:13:23,959 - Executing command: get_rule_logs_for_vms It happens to a particular VM. Please help..
Re: Possible bug fix - sanity check please
Hi Yiping It is this related bug - https://github.com/apache/cloudstack/issues/3088 Have a look at the comments but to summarise you need to replace this line (line 490) - if ips[0] == "0": with - if len(ips) == 0 or ips[0] == "0": I have tested it and it fixes the problem I was seeing. The fix will be included in 4.11.3 apparently. Jon From: Yiping Zhang Sent: 24 January 2019 23:18 To: users@cloudstack.apache.org Subject: Re: Possible bug fix - sanity check please Hi, Jon: Would you please describe this bug a little more? How do I reproduce it? Is there a Jira or Github issue number for it? It sounds like a bug in 4.11.2.0 affecting VM live migration. I am in the middle of upgrading to 4.11.2.0, and on my lab system I see that the line 488 of file /usr/share/cloudstack-common/scripts/vm/network/security_group.py does have a ";" instead of a ":". Thanks, Yiping On 1/24/19, 12:54 AM, "Jon Marshall" wrote: Please ignore, it has already been fixed but it is not included in the 4.11.2 release (due in the 4.11.3 one). ____ From: Jon Marshall Sent: 23 January 2019 15:30 To: users@cloudstack.apache.org Subject: Possible bug fix - sanity check please The following issue was seen using CS 4.11.2 in advanced mode with security group isolation. VM (internal name i-2-29-VM) - is created and works fine with default security group allowing inbound SSH and ICMP echo request. Migrate the VM to another of the compute nodes and the VM migrates and from the proxy console the VM can connect out but the default security group inbound is not copied across the compute node. 
The /var/log/cloudstack/agent/security_group.log shows on the compute node the VM has migrated to - 2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out 2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips 2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips 2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,741 - iptables -N i-2-29-VM 2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM 2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg 2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg 2019-01-18 14:54:25,758 - iptables -N i-2-29-def 2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def 2019-01-18 14:54:25,767 - Creating ipset chain i-2-29-VM 2019-01-18 14:54:25,768 - ipset -F i-2-29-VM 2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM 2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet 2019-01-18 14:54:25,777 - vm ip 172.30.6.60 2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60 2019-01-18 14:54:25,782 - Failed to network rule ! Traceback (most recent call last): File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 995, in add_network_rules default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, sec_ips) File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 490, in default_network_rules if ips[0] == "0": IndexError: list index out of range Added a few lines to debug the script security_group.py and it would appear this line (line 487) is the culprit - ips = sec_ips.split(';') - as far as I can tell the separator should be a colon (':') and not a semicolon, at least on my setup. Once changed to - ips = sec_ips.split(':') - the iptables rules were updated correctly on the host the VM was migrated to. I don't know if this is the right change to make as the script is over 1,000 lines long and imports other modules, so I would appreciate any input as this seems to be a key function of Advanced with security groups. Thanks Jon
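To make the workaround concrete, here is a small sketch of the guard from the fix in issue #3088. has_secondary_ips is a simplified stand-in for the parsing done in default_network_rules: the thread only shows lines 487 and 490 of the real script, so the explicit filter below stands in for the intermediate lines that can leave the list empty.

```python
# Hedged sketch of the one-line fix: check for an empty list before
# indexing into it, so an empty/absent secondary-IP string no longer
# raises IndexError.

def has_secondary_ips(sec_ips):
    # Stand-in for the real script's parsing (split on ':' per the
    # thread, dropping empty fragments).
    ips = [ip for ip in sec_ips.split(':') if ip]
    # The fix from the issue comments: guard the index.
    if len(ips) == 0 or ips[0] == "0":
        return False
    return True
```

With the original unguarded check, an empty sec_ips string would crash exactly as in the traceback; with the guard it simply reports no secondary IPs.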
Re: Possible bug fix - sanity check please
Please ignore, it has already been fixed but it is not included in the 4.11.2 release (due in the 4.11.3 one). From: Jon Marshall Sent: 23 January 2019 15:30 To: users@cloudstack.apache.org Subject: Possible bug fix - sanity check please The following issue was seen using CS 4.11.2 in advanced mode with security group isolation. VM (internal name i-2-29-VM) - is created and works fine with default security group allowing inbound SSH and ICMP echo request. Migrate the VM to another of the compute nodes and the VM migrates, and from the proxy console the VM can connect out, but the default security group inbound is not copied across to the compute node. The /var/log/cloudstack/agent/security_group.log shows on the compute node the VM has migrated to - 2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out 2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips 2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips 2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,741 - iptables -N i-2-29-VM 2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM 2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg 2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg 2019-01-18 14:54:25,758 - iptables -N i-2-29-def 2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def 2019-01-18 14:54:25,767 - Creating ipset chain i-2-29-VM 2019-01-18 14:54:25,768 - ipset -F i-2-29-VM 2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM 2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet 2019-01-18 14:54:25,777 - vm ip 172.30.6.60 2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60 2019-01-18 14:54:25,782 - Failed to network rule ! Traceback (most recent call last): File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 995, in add_network_rules default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, sec_ips) File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 490, in default_network_rules if ips[0] == "0": IndexError: list index out of range Added a few lines to debug the script security_group.py and it would appear this line (line 487) is the culprit - ips = sec_ips.split(';') - as far as I can tell the separator should be a colon (':') and not a semicolon, at least on my setup. Once changed to - ips = sec_ips.split(':') - the iptables rules were updated correctly on the host the VM was migrated to. I don't know if this is the right change to make as the script is over 1,000 lines long and imports other modules, so I would appreciate any input as this seems to be a key function of Advanced with security groups. Thanks Jon
Possible bug fix - sanity check please
The following issue was seen using CS 4.11.2 in advanced mode with security group isolation. VM (internal name i-2-29-VM) - is created and works fine with default security group allowing inbound SSH and ICMP echo request. Migrate the VM to another of the compute nodes and the VM migrates, and from the proxy console the VM can connect out, but the default security group inbound is not copied across to the compute node. The /var/log/cloudstack/agent/security_group.log shows on the compute node the VM has migrated to - 2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out 2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips 2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips 2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,741 - iptables -N i-2-29-VM 2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM 2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg 2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg 2019-01-18 14:54:25,758 - iptables -N i-2-29-def 2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def 2019-01-18 14:54:25,767 - Creating ipset chain i-2-29-VM 2019-01-18 14:54:25,768 - ipset -F i-2-29-VM 2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM 2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet 2019-01-18 14:54:25,777 - vm ip 172.30.6.60 2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60 2019-01-18 14:54:25,782 - Failed to network rule ! Traceback (most recent call last): File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 995, in add_network_rules default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, sec_ips) File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 490, in default_network_rules if ips[0] == "0": IndexError: list index out of range Added a few lines to debug the script security_group.py and it would appear this line (line 487) is the culprit - ips = sec_ips.split(';') - as far as I can tell the separator should be a colon (':') and not a semicolon, at least on my setup. Once changed to - ips = sec_ips.split(':') - the iptables rules were updated correctly on the host the VM was migrated to. I don't know if this is the right change to make as the script is over 1,000 lines long and imports other modules, so I would appreciate any input as this seems to be a key function of Advanced with security groups. Thanks Jon
Possible bug in migrating VMs with advanced using security groups ?
Don't know whether this is a bug or to do with my setup - CS 4.11.2, 1 x manager, 3 x compute nodes running Advanced with security groups. VM (internal name i-2-29-VM) - is created and works fine with default security group allowing inbound SSH and ICMP echo request. Migrate the VM to another of the compute nodes and the VM migrates, and from the proxy console the VM can connect out, but the default security group inbound is not copied across to the compute node. The /var/log/cloudstack/agent/security_group.log shows on the compute node the VM has migrated to - 2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out 2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips 2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips 2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm i-2-29-VM 2019-01-18 14:54:25,741 - iptables -N i-2-29-VM 2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM 2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg 2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg 2019-01-18 14:54:25,758 - iptables -N i-2-29-def 2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def 2019-01-18 14:54:25,767 - Creating ipset chain i-2-29-VM 2019-01-18 14:54:25,768 - ipset -F i-2-29-VM 2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM 2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet 2019-01-18 14:54:25,777 - vm ip 172.30.6.60 2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60 2019-01-18 14:54:25,782 - Failed to network rule ! 
Traceback (most recent call last): File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 995, in add_network_rules default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, sec_ips) File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", line 490, in default_network_rules if ips[0] == "0": IndexError: list index out of range Jon
Re: VR DHCP server does not lease secondary IPs to guests
If you allocate a secondary IP to a VM you don't want the VR to offer that IP to another VM, otherwise you could end up with two VMs trying to use the same IP. If you remove the secondary IP from the VM then the VR can allocate that IP to another VM. From: Fariborz Navidan Sent: 21 September 2018 19:03 To: users@cloudstack.apache.org Subject: VR DHCP server does not lease secondary IPs to guests Hello folks, When I add secondary IPs to a VM, the DHCP server on the virtual router does not offer those to the dhcp client. Do I need to modify OS templates to do this? If yes, how should I get secondary IPs when they are not available in the dhclient's lease files? Thanks for any advice!
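Since the VR deliberately does not lease secondary IPs, they have to be configured statically inside the guest alongside the DHCP-assigned primary address. A hedged example for a Debian-style guest follows; the address, prefix length and interface name are placeholders, not values from the thread.

```
# Illustrative /etc/network/interfaces fragment (Debian/Ubuntu guest).
auto eth0
iface eth0 inet dhcp
    # The primary IP still comes from the VR's DHCP; the secondary IP
    # assigned in CloudStack is added statically because the VR will
    # not lease it.
    up ip addr add 10.1.1.50/24 dev eth0
    down ip addr del 10.1.1.50/24 dev eth0
```

On other distributions the equivalent is a secondary address entry in the NIC's network configuration; the key point is that the guest, not DHCP, owns the secondary address.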
Re: Basic vs advanced networking
Dag Just a quick follow-up on this. I haven't tried security groups with advanced networking so I tried to set it up yesterday but had issues adding the host. For a normal advanced network (no security groups) I configure the NIC for VM traffic (and public) without an IP and set the switch port to be a trunk, and then ACS just creates the subinterfaces internally when I add networks. With advanced and security groups I assume I do the same for the guest VM traffic NIC (no public) and just configure it as a trunk as there will be multiple VLANs on it? So no IP address assigned to that NIC, correct? Jon From: Dag Sonstebo Sent: 09 August 2018 10:13 To: users@cloudstack.apache.org Subject: Re: Basic vs advanced networking Hi Jon, In short you are right – advanced networking offers a lot more features, and the only benefit of basic networking is a simpler setup (no VRs) as well as to a certain degree more scalability since you can run relatively large L3 networks (with the proviso that broadcast traffic may be a limiting factor). As security groups rely on access to underlying networking on the hypervisor they will also most likely never work on VMware due to the proprietary nature of ESXi. If you look through the user@ / dev@ mailing list you’ll see we have started discussions around deprecating basic networks for advanced zone with security groups – since the latter offers the same networking functionality as basic (security groups, no VRs) but offers the scalability of running multiple of these basic type networks (a traditional basic zone can only run one network). So all in all if you are looking at longer term strategy whilst wanting the simplicity of basic networking you should look at this option (looks like you might have played with this already). Regards, Dag Sonstebo Cloud Architect ShapeBlue On 09/08/2018, 07:54, "Jon Marshall" wrote: Having looked at both in a lab environment I am wondering what the advantages of running basic networking are. 
Obviously with basic you can use security groups (although you can with advanced if using KVM) but apart from that advanced seems to offer all the features of basic plus a whole lot more. The only downside I have found with advanced is that VRs seems to be the most "flaky" aspect of ACS and obviously you end up with a whole lot more of them. Would be interested to hear opinions either way. Thanks dag.sonst...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com ShapeBlue are the largest independent integrator of CloudStack technologies globally and are specialists in the design and implementation of IaaS cloud infrastructures for both private and public cloud implementations. 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue
Re: Basic vs advanced networking
Hi Dag Makes a lot of sense, thanks for that. Jon From: Dag Sonstebo Sent: 09 August 2018 10:13 To: users@cloudstack.apache.org Subject: Re: Basic vs advanced networking Hi Jon, In short you are right – advanced networking offers a lot more features, and the only benefit of basic networking is a simpler setup (no VRs) as well as to a certain degree more scalability since you can run relatively large L3 networks (with the proviso that broadcast traffic may be a limiting factor). As security groups rely on access to underlying networking on the hypervisor they will also most likely never work on VMware due to the proprietary nature of ESXi. If you look through the user@ / dev@ mailing list you’ll see we have started discussions around deprecating basic networks for advanced zone with security groups – since the latter offers the same networking functionality as basic (security groups, no VRs) but offers the scalability of running multiple of these basic type networks (a traditional basic zone can only run one network). So all in all if you are looking at longer term strategy whilst wanting the simplicity of basic networking you should look at this option (looks like you might have played with this already). Regards, Dag Sonstebo Cloud Architect ShapeBlue On 09/08/2018, 07:54, "Jon Marshall" wrote: Having looked at both in a lab environment I am wondering what the advantages of running basic networking are. Obviously with basic you can use security groups (although you can with advanced if using KVM) but apart from that advanced seems to offer all the features of basic plus a whole lot more. The only downside I have found with advanced is that VRs seems to be the most "flaky" aspect of ACS and obviously you end up with a whole lot more of them. Would be interested to hear opinions either way. 
Thanks

dag.sonst...@shapeblue.com
www.shapeblue.com
ShapeBlue - The CloudStack Company
ShapeBlue are the largest independent integrator of CloudStack technologies globally and are specialists in the design and implementation of IaaS cloud infrastructures for both private and public cloud implementations.
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue
Basic vs advanced networking
Having looked at both in a lab environment I am wondering what the advantages of running basic networking are. Obviously with basic you can use security groups (although you can with advanced if using KVM) but apart from that advanced seems to offer all the features of basic plus a whole lot more. The only downside I have found with advanced is that VRs seem to be the most "flaky" aspect of ACS and obviously you end up with a whole lot more of them. Would be interested to hear opinions either way. Thanks
Tips for troubleshooting
I have a test setup for CS 4.11.1 advanced networking KVM on CentOS 7: one manager node and one compute node with 2 NICs (1 for management/storage, 1 as a trunk link for VM traffic). I create a guest network, an isolated network and a VPC with its own isolated network, so 3 VRs, and each network has a VM created. Every time I reboot both servers I get exactly the same results -

1) the VR for the guest network is up and the VM is up.
2) the VR for the isolated network is up but the VM is stopped.
3) the VR for the VPC isolated network is stuck in starting and so are the VMs.

For 2) the solution is simply to start up the VMs. For 3) you cannot do anything until the VR goes into stopped mode, which takes approx 10 mins. You can then destroy it and simply start one of the stopped VMs, which recreates the VR. I asked about the VPC VR problem before and was told to check the management server log and the agent log. The agent log shows nothing. In the management server log I traced the entire ctx job(s) and I can see it has enough resource, and the start job, but then it just says it is pending and eventually reports the host as unreachable. Not asking for someone to fix this for me, but can someone tell me how to troubleshoot this, because the logs are not showing much and I can reproduce this every time I reboot. Jon
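Tracing an async job chain through management-server.log, as Jon describes, mostly means pulling out every line that mentions the same job id (ids like job-821/job-822 appear in the log excerpts quoted in this thread). A minimal, hypothetical sketch of that filter - the sample lines are modelled on the excerpts here, not real output:

```python
import re

def job_lines(log_lines, job_id):
    """Return only the log lines that mention the given async job id."""
    pattern = re.compile(r"\b" + re.escape(job_id) + r"\b")
    return [line for line in log_lines if pattern.search(line)]

# Sample lines modelled on the management-server.log excerpts in this thread.
sample = [
    "2018-07-23 08:37:09,931 ERROR [c.c.v.VmWorkJobHandlerProxy] "
    "(Work-Job-Executor-21:ctx-1c696ea5 job-821/job-822 ctx-bef3996c) Invocation exception",
    "2018-07-23 08:37:10,102 DEBUG [c.c.a.m.AgentManagerImpl] "
    "(AgentManager-Handler-5:null) unrelated line",
]
print(job_lines(sample, "job-822"))
```

The same filter run for both the parent and child job ids (job-821 and job-822) gives the full picture of where the start job stalls.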
Re: VPC virtual router will not start on reboot
Hi Dag Sorry, I am running 4.11.1 already. I just created an isolated network with a VM on the same host (ID = 1) and it works fine, so I'm not sure it's a host-specific issue. It seems to only come up with VRs for VPCs. I'll keep digging :) From: Dag Sonstebo Sent: 23 July 2018 09:19 To: users@cloudstack.apache.org Subject: Re: VPC virtual router will not start on reboot Hi Jon, First of all I would advise you to upgrade to 4.11.1; it comes with a number of bug fixes. Wrt the errors you are seeing, they tend to be fairly clear – the KVM host with ID=1 in your DB is not checking in, or taking time checking in, and the management server can therefore not communicate with it. Check that the startup of the agent works as expected, and also check the agent logs. Regards, Dag Sonstebo Cloud Architect ShapeBlue On 23/07/2018, 09:11, "Jon Marshall" wrote: Cloudstack 4.11.0 - KVM Created on VPC with 1 isolated network as test with 2 instances and it works as expected. When doing a reboot of all nodes (compute and management) when it comes back up the virtual router will not start. This happens each time I reboot. I have gone through management server logs and it is not a resource issue as it reports CPU, memory etc. as okay. It does report this - 2018-07-23 08:37:09,931 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-21:ctx-1c696ea5 job-821/job-822 ctx-bef3996c) (logid:a19d2179) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM:390a0aad-9c13-4578-bbbf-4de1323b142e due to error in finalizeStart, not retrying checking the host table in the database that same host is running the 2 system VMs so not sure how it is unreachable ? Could someone offer any tips/pointers on how to troubleshoot this ?
Jon
VPC virtual router will not start on reboot
CloudStack 4.11.0 - KVM. Created one VPC with 1 isolated network as a test, with 2 instances, and it works as expected. When doing a reboot of all nodes (compute and management), when it comes back up the virtual router will not start. This happens each time I reboot. I have gone through the management server logs and it is not a resource issue as it reports CPU, memory etc. as okay. It does report this -

2018-07-23 08:37:09,931 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-21:ctx-1c696ea5 job-821/job-822 ctx-bef3996c) (logid:a19d2179) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM:390a0aad-9c13-4578-bbbf-4de1323b142e due to error in finalizeStart, not retrying

Checking the host table in the database, that same host is running the 2 system VMs, so I am not sure how it is unreachable? Could someone offer any tips/pointers on how to troubleshoot this? Jon
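The exception text itself carries the two identifiers worth chasing: the host id (to check in the `host` table and against the agent) and the VM uuid. A small sketch of pulling them out of the exact error line quoted above - the regexes are illustrative, not anything CloudStack ships:

```python
import re

# The AgentUnavailableException message quoted in this thread.
msg = ("com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: "
       "Host 1: Unable to start instance due to Unable to start "
       "VM:390a0aad-9c13-4578-bbbf-4de1323b142e due to error in finalizeStart, not retrying")

host_id = re.search(r"\[Host:(\d+)\]", msg).group(1)   # numeric host id
vm_uuid = re.search(r"VM:([0-9a-f-]{36})", msg).group(1)  # 36-char VM uuid
print(host_id, vm_uuid)
```

With those two values you can grep the management and agent logs for the host's connection state and the VM's start attempts, rather than reading the whole log.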
Re: VPC ACLs SRC and DST
Hi Andrija Following on from that, if you are using an isolated guest network and static NAT to a VM private IP, is there any way in the IP address firewall configuration to deny certain traffic as well as permit traffic? Jon From: Andrija Panic Sent: 18 July 2018 16:17 To: users Subject: Re: VPC ACLs SRC and DST Hi Adam, unless something has changed in the most recent version (doubt that) - no, you can only define one CIDR in each ACL rule, which, if creating an egress/outbound rule, is considered as the destination IP/CIDR to which you allow/deny access from your VPC network, or if using an ingress (inbound) rule, then this CIDR represents the SOURCE from which access is allowed/denied to your VPC network (whole VPC network in both cases - i.e. it's not granular on a single IP/VM level - for this you need to use a local firewall if really needed). Hope that answers your question. Andrija On Wed, 18 Jul 2018 at 17:07, Adam Witwicki wrote: > Hello > > Is there a way we can add the DST IP to the ACL lists in a VPC as well as > the SRC IP (outbound) > > Thanks > > Adam
-- Andrija Panić
Re: VPC virtual router stuck in starting
The virtual router for the VPC finally went to stopped, and I did a restart of the VPC with clean up and the VR restarted. I could then restart the VMs. From: Nicolas Bouige Sent: 17 July 2018 13:46 To: users@cloudstack.apache.org Subject: RE: VPC virtual router stuck in starting Hi Jon, Is it possible to connect directly to the VR via the KVM console? (virsh console r-XXX-VM) If yes, please check cloud.log; state "starting" from CS doesn't mean it's not okay from KVM's side. The cloud-agent log on the KVM host could be useful as well. Best regards, N.B -----Original Message----- From: Jon Marshall [mailto:jms@hotmail.co.uk] Sent: Tuesday, 17 July 2018 12:28 To: users@cloudstack.apache.org Subject: VPC virtual router stuck in starting Testing with advanced networking v4.11 using KVM. I setup some isolated networks (2) and then a VPC, which all worked fine. I then rebooted the compute nodes (x3) and manager and when it all came back the VPC virtual router is stuck in starting, as are the VMs in the VPC. I have checked the management server logs and I see a lot of - com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying com.cloud.exception.AgentUnavailableException: Resource [Host:4] is unreachable: Host 4: Unable to start instance due to Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying It says Host 4 is not reachable, but I have another virtual router for one of the isolated networks and some guest VMs running on the same host. 1) Does anyone have any suggestions as to how to troubleshoot this beyond looking through the logs ?
2) how can I stop the virtual router, destroy it and recreate it, as while it is starting you cannot do anything with it ? thanks
Re: VPC virtual router stuck in starting
Hi Nicolas Sorry to have to ask, but using the "virsh console ..." command what is the username/password I should enter? I can never seem to find answers to these sort of questions no matter how hard I look 😞 By the way, I can reproduce this problem, i.e. I deleted the VPC, created a new one with some isolated networks and VMs, rebooted everything and same issue - the VR for the VPC stuck in starting. Jon From: Nicolas Bouige Sent: 17 July 2018 13:46 To: users@cloudstack.apache.org Subject: RE: VPC virtual router stuck in starting Hi Jon, Is it possible to connect directly to the VR via the KVM console? (virsh console r-XXX-VM) If yes, please check cloud.log; state "starting" from CS doesn't mean it's not okay from KVM's side. The cloud-agent log on the KVM host could be useful as well. Best regards, N.B -----Original Message----- From: Jon Marshall [mailto:jms@hotmail.co.uk] Sent: Tuesday, 17 July 2018 12:28 To: users@cloudstack.apache.org Subject: VPC virtual router stuck in starting Testing with advanced networking v4.11 using KVM. I setup some isolated networks (2) and then a VPC, which all worked fine. I then rebooted the compute nodes (x3) and manager and when it all came back the VPC virtual router is stuck in starting, as are the VMs in the VPC.
I have checked the management server logs and I see a lot of - com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying com.cloud.exception.AgentUnavailableException: Resource [Host:4] is unreachable: Host 4: Unable to start instance due to Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying It says Host 4 is not reachable, but I have another virtual router for one of the isolated networks and some guest VMs running on the same host. 1) Does anyone have any suggestions as to how to troubleshoot this beyond looking through the logs ? 2) how can I stop the virtual router, destroy it and recreate it, as while it is starting you cannot do anything with it ? thanks
VPC virtual router stuck in starting
Testing with advanced networking v4.11 using KVM. I setup some isolated networks (2) and then a VPC, which all worked fine. I then rebooted the compute nodes (x3) and manager and when it all came back the VPC virtual router is stuck in starting, as are the VMs in the VPC. I have checked the management server logs and I see a lot of - com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying com.cloud.exception.AgentUnavailableException: Resource [Host:4] is unreachable: Host 4: Unable to start instance due to Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying com.cloud.utils.exception.ExecutionException: Unable to start VM:ed9a140a-9cd6-47e9-a2b7-8d34ca5b6ca7 due to error in finalizeStart, not retrying It says Host 4 is not reachable, but I have another virtual router for one of the isolated networks and some guest VMs running on the same host. 1) Does anyone have any suggestions as to how to troubleshoot this beyond looking through the logs ? 2) how can I stop the virtual router, destroy it and recreate it, as while it is starting you cannot do anything with it ? thanks
Re: Adding secondary IP to VM
Did a bit of digging in the database and there is a table called "nic_secondary_ips" -

mysql> select * from nic_secondary_ips;
+----+--------------------------------------+------+-------+-------------+-------------+------------+---------------------+------------+-----------+
| id | uuid                                 | vmId | nicId | ip4_address | ip6_address | network_id | created             | account_id | domain_id |
+----+--------------------------------------+------+-------+-------------+-------------+------------+---------------------+------------+-----------+
|  2 | 57921029-893a-4400-b6ac-50d4fd006b74 |    9 |    15 | 172.30.4.80 | NULL        |        204 | 2018-07-11 09:23:06 |          2 |         1 |
+----+--------------------------------------+------+-------+-------------+-------------+------------+---------------------+------------+-----------+
1 row in set (0.00 sec)

so that's where it's stored. Jon From: Andrija Panic Sent: 11 July 2018 10:40 To: users Subject: Re: Adding secondary IP to VM ACS doesn't handle this in any way (except that it might reserve the IP, so it's not possible to add the same IP to another VM/nic in the same network). You need to manually configure the secondary IP on the VM - this is at least in the 4.8 release, and per my experience so far. Cheers. On Wed, 11 Jul 2018 at 11:23, Jon Marshall wrote: > I am trying to work out how CS handles additional IPs assigned to a VM. > > So using DHCP for the VMs, if I log onto the virtual router, in the > "dhcphosts.txt" I can see the VM mapping to its IP. > > If I then acquire a secondary IP for the VM, a couple of questions - > > 1) where does the virtual router store the information? Because it is not > in the DHCP file, which makes sense, but it must record it somewhere because > it won't hand out that same IP to another VM (I tested it). Is it in the > DBase somewhere? > > 2) How do others handle multiple IPs on a VM, i.e. do you use DHCP for the main > interface and then configure static IPs for the sub-interfaces, or do you > turn off DHCP altogether ? > > Many thanks > > Jon > -- Andrija Panić
Adding secondary IP to VM
I am trying to work out how CS handles additional IPs assigned to a VM. So using DHCP for the VMs, if I log onto the virtual router, in the "dhcphosts.txt" I can see the VM mapping to its IP. If I then acquire a secondary IP for the VM, a couple of questions - 1) where does the virtual router store the information? Because it is not in the DHCP file, which makes sense, but it must record it somewhere because it won't hand out that same IP to another VM (I tested it). Is it in the DBase somewhere? 2) How do others handle multiple IPs on a VM, i.e. do you use DHCP for the main interface and then configure static IPs for the sub-interfaces, or do you turn off DHCP altogether? Many thanks Jon
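Andrija's point in this thread is that ACS reserves the secondary IP (so it can't be handed to another NIC in the same network) but does not configure it on the guest. A hypothetical sketch of that reservation check - the reserved set stands in for the `nic_secondary_ips` row shown above, and the /25 guest network is taken from the subnets mentioned elsewhere in this archive; none of this is CloudStack's actual code:

```python
import ipaddress

# Stand-in for the nic_secondary_ips table: 172.30.4.80 is already allocated.
reserved = {"172.30.4.80"}
# Assumed guest network for illustration.
network = ipaddress.ip_network("172.30.4.0/25")

def can_allocate(ip):
    """A secondary IP must fall inside the guest network and not be reserved already."""
    return ipaddress.ip_address(ip) in network and ip not in reserved

print(can_allocate("172.30.4.80"))  # already reserved
print(can_allocate("172.30.4.81"))  # free and in-network
```

This mirrors the behaviour Jon observed: the DHCP file never changes, but the reservation in the database is enough to stop the same IP being handed out twice.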
Re: Isolated network and ingress rules
Hi Dag Many thanks Jon From: Dag Sonstebo Sent: 06 July 2018 13:01 To: users@cloudstack.apache.org Subject: Re: Isolated network and ingress rules Hi Jon, For normal isolated networks the ingress rules are on the firewall configuration option under each individual public IP address – as opposed to egress rules, which apply to the whole network. Regards, Dag Sonstebo Cloud Architect ShapeBlue On 06/07/2018, 12:17, "Jon Marshall" wrote: Quick update re question 2) - where I created a VPC and added a static NAT and it worked as expected. I think this may well be because with VPCs you can configure both ingress and egress rules whereas with a guest isolated network I don't seem to have the ingress option. From: Jon Marshall Sent: 06 July 2018 09:26 To: users@cloudstack.apache.org Subject: Isolated network and ingress rules Have setup advanced network 4.11 KVM and it seems to be a lot more intuitive than basic networking (at least to me 😊) Just a couple of quick questions - 1) when I add a new isolated network with source NAT through the UI no matter what I enter in the Guest gateway and Guest netmask boxes it just uses the initial CIDR block I specified when building the zone. And it reuses this for every new isolated network. Is this normal behaviour ? 2) I tried to add a static NAT for one of the VMs in an isolated network. I know the mapping works because a "curl icanhazip.com" returns the static IP rather than the one used by all the other VMs but I cannot connect to the statically mapped VM from outside. When I go to the Network details in the UI I have egress rules I can edit but no ingress rules tab. Again is this to be expected and if it is any pointers on how to get it working.
Thanks
Re: Isolated network and ingress rules
Quick update re question 2) - where I created a VPC and added a static NAT and it worked as expected. I think this may well be because with VPCs you can configure both ingress and egress rules, whereas with a guest isolated network I don't seem to have the ingress option. From: Jon Marshall Sent: 06 July 2018 09:26 To: users@cloudstack.apache.org Subject: Isolated network and ingress rules Have setup advanced network 4.11 KVM and it seems to be a lot more intuitive than basic networking (at least to me 😊) Just a couple of quick questions - 1) when I add a new isolated network with source NAT through the UI, no matter what I enter in the Guest gateway and Guest netmask boxes it just uses the initial CIDR block I specified when building the zone. And it reuses this for every new isolated network. Is this normal behaviour ? 2) I tried to add a static NAT for one of the VMs in an isolated network. I know the mapping works because a "curl icanhazip.com" returns the static IP rather than the one used by all the other VMs, but I cannot connect to the statically mapped VM from outside. When I go to the Network details in the UI I have egress rules I can edit but no ingress rules tab. Again, is this to be expected, and if it is, any pointers on how to get it working? Thanks
Isolated network and ingress rules
Have setup advanced network 4.11 KVM and it seems to be a lot more intuitive than basic networking (at least to me 😊) Just a couple of quick questions - 1) when I add a new isolated network with source NAT through the UI, no matter what I enter in the Guest gateway and Guest netmask boxes it just uses the initial CIDR block I specified when building the zone. And it reuses this for every new isolated network. Is this normal behaviour ? 2) I tried to add a static NAT for one of the VMs in an isolated network. I know the mapping works because a "curl icanhazip.com" returns the static IP rather than the one used by all the other VMs, but I cannot connect to the statically mapped VM from outside. When I go to the Network details in the UI I have egress rules I can edit but no ingress rules tab. Again, is this to be expected, and if it is, any pointers on how to get it working? Thanks
Re: Advanced networking - physical NICs.
Chris Many thanks for that. Jon From: Christoffer Pedersen Sent: 03 July 2018 12:21 To: users@cloudstack.apache.org Subject: Re: Advanced networking - physical NICs. Hi Jon, I would suppose that several people/providers run guest and public networks together. I was also confused at the start about the CloudStack networking. 1. I guess you can; the traffic will be separated by VLANs. 2. When defining a public range, in my experience you have to assign a VLAN to that range. Then just put in the VLAN ID where your respective public range resides. 3. You can allocate VLAN ranges for guest networks. You can for example use 500-549 as a range. Just bind that to your cloudbr. CloudStack will manage the sub-bridge for the VLAN. 4. You would have a trunk running from the switch to your network port on the server. You would add that port to your cloudbr1 like:

auto eth1
iface eth1 inet manual

auto cloudbr1
iface cloudbr1 inet manual
    bridge_ports eth1

Please correct me if I'm wrong, I am using openvswitch so my config is different. CloudStack will handle the tagging if you specify a VLAN for your public or guest networks. Chris Sent from my iPhone > On 3. Jul 2018, at 12:55, Jon Marshall wrote: > > I come from a Cisco background so I understand vlans, tagging and how to > configure switches for trunks and I also understand how to configure tagging > on CentOS. > > > The bit that is just not clicking with me is how to configure the NIC with CS > using KVM and advanced networking. > > > The management/storage NIC is easy as I just assign an IP directly in the bridge > configuration file (cloudbr0) as there is no vlan tagging here. > > > The second NIC I want to run guest and public traffic across and I am using > another bridge - cloudbr1. > > > Questions - > > > 1) Is it okay to run guest and public traffic on the same NIC ? > > > 2) do the public IPs only live on the VR ie. do I need a cloudbr1. > for the public IP range ?
> > > 3) whenever I add a new guest network once setup do I first need to setup the > cloudbr1. for that guest network or does cloudstack do this > automatically ? > > > 4) Assuming it is okay to run guest and public on same NIC what would the > initial configuration of cloudbr1 look like ? > > > > Apologies for all the questions but I am just getting completely stuck on this >
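Chris's point 3 above - allocating a guest VLAN range like 500-549 and letting CloudStack manage the sub-bridges - can be pictured by enumerating the 802.1Q sub-interfaces that range implies. The `nic.VLAN` naming below is the standard Linux convention for illustration only; the bridge names CloudStack actually creates vary by version:

```python
def vlan_subinterfaces(nic, vlan_range):
    """Enumerate 802.1Q sub-interface names for a guest VLAN range.

    Purely illustrative: CloudStack creates the tagged bridges on demand
    as guest networks are provisioned, not all up front."""
    start, end = vlan_range
    return [f"{nic}.{vid}" for vid in range(start, end + 1)]

# The 500-549 example range from Chris's reply, on the guest/public NIC eth1.
subifs = vlan_subinterfaces("eth1", (500, 549))
print(len(subifs), subifs[0], subifs[-1])
```

The trunk port on the switch simply has to carry every VLAN id in the configured range, which is why Paul's reply stresses the trunk configuration.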
Re: Advanced networking - physical NICs.
Paul Many thanks, will give it a go. Jon From: Paul Angus Sent: 03 July 2018 12:13 To: users@cloudstack.apache.org Subject: RE: Advanced networking - physical NICs. Hi Jon, 1. Yes 2. You tell CloudStack what VLAN the public IPs are on; CloudStack will add the VLAN tags 3. CloudStack will do it automatically 4. 'something' like this:

ifcfg-eth1:
DEVICE=eth1
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
BRIDGE=cloudbr1
NM_CONTROLLED=no

ifcfg-cloudbr1:
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
STP=off

NOTE the public/guest interface (eth1 in this case) will then need to be connected to a trunk interface on your switch which allows all of the VLANs that you need for public and private networks. Kind regards, Paul Angus -----Original Message----- From: Jon Marshall Sent: 03 July 2018 11:55 To: users@cloudstack.apache.org Subject: Advanced networking - physical NICs. I come from a Cisco background so I understand vlans, tagging and how to configure switches for trunks and I also understand how to configure tagging on CentOS. The bit that is just not clicking with me is how to configure the NIC with CS using KVM and advanced networking. The management/storage NIC is easy as I just assign an IP directly in the bridge configuration file (cloudbr0) as there is no vlan tagging here. The second NIC I want to run guest and public traffic across and I am using another bridge - cloudbr1. Questions - 1) Is it okay to run guest and public traffic on the same NIC ? 2) do the public IPs only live on the VR ie.
do I need a cloudbr1. for the public IP range ? 3) whenever I add a new guest network once setup do I first need to setup the cloudbr1. for that guest network or does cloudstack do this automatically ? 4) Assuming it is okay to run guest and public on same NIC what would the initial configuration of cloudbr1 look like ? Apologies for all the questions but I am just getting completely stuck on this
Advanced networking - physical NICs.
I come from a Cisco background so I understand vlans, tagging and how to configure switches for trunks, and I also understand how to configure tagging on CentOS. The bit that is just not clicking with me is how to configure the NIC with CS using KVM and advanced networking. The management/storage NIC is easy as I just assign an IP directly in the bridge configuration file (cloudbr0) as there is no vlan tagging here. The second NIC I want to run guest and public traffic across and I am using another bridge - cloudbr1. Questions - 1) Is it okay to run guest and public traffic on the same NIC ? 2) do the public IPs only live on the VR ie. do I need a cloudbr1. for the public IP range ? 3) whenever I add a new guest network once setup do I first need to setup the cloudbr1. for that guest network or does cloudstack do this automatically ? 4) Assuming it is okay to run guest and public on same NIC what would the initial configuration of cloudbr1 look like ? Apologies for all the questions but I am just getting completely stuck on this
Advanced networking adding a host
Trying to setup advanced networking using KVM, CS v4.11. When I try to add the first host in the initial setup I get this in the management-server log -

local), Ver: v1, Flags: 110, { ReadyAnswer } }
2018-07-03 10:30:37,489 DEBUG [c.c.u.s.SSHCmdHelper] (qtp788117692-16:ctx-c7a9deda ctx-9bbb3bea) (logid:2e852372) SSH command: cloudstack-setup-agent -m 172.30.3.2 -z 1 -p 1 -c 1 -g 9f2b15cb-1b75-321b-bf59-f83e7a5e8efb -a -s --pubNic=cloudbr1 --prvNic=cloudbr0 --guestNic=cloudbr1 --hypervisor=kvm SSH command output: Usage: cloudstack-setup-agent [options] cloudstack-setup-agent: error: no such option: -s
2018-07-03 10:30:37,489 INFO [c.c.h.k.d.LibvirtServerDiscoverer] (qtp788117692-16:ctx-c7a9deda ctx-9bbb3bea) (logid:2e852372) cloudstack agent setup command failed: cloudstack-setup-agent -m 172.30.3.2 -z 1 -p 1 -c 1 -g 9f2b15cb-1b75-321b-bf59-f83e7a5e8efb -a -s --pubNic=cloudbr1 --prvNic=cloudbr0 --guestNic=cloudbr1 --hypervisor=kvm

and sure enough there is no "-s" option according to the agent -

cloudstack-setup-agent -h
Usage: cloudstack-setup-agent [options]
Options:
  -h, --help            show this help message and exit
  -a                    auto mode
  -m MGT, --host=MGT    Management server hostname or IP-Address
  -z ZONE, --zone=ZONE  zone id
  -p POD, --pod=POD     pod id
  -c CLUSTER, --cluster=CLUSTER
                        cluster id
  -t HYPERVISOR, --hypervisor=HYPERVISOR
                        hypervisor type
  -g GUID, --guid=GUID  guid
  --pubNic=PUBNIC       Public traffic interface
  --prvNic=PRVNIC       Private traffic interface
  --guestNic=GUESTNIC   Guest traffic interface

anyone have an idea what the management server thinks the "-s" option is meant to be (storage ??)
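The mismatch is easy to replay: feed the command line from the log into a toy reconstruction of the agent's option set (built from the `-h` output quoted above) and the unrecognised `-s` falls out. This is an illustration only; the real `cloudstack-setup-agent` is a different parser:

```python
import argparse

# Toy reconstruction of the option set printed by `cloudstack-setup-agent -h`.
p = argparse.ArgumentParser(prog="cloudstack-setup-agent")
p.add_argument("-a", action="store_true", help="auto mode")
p.add_argument("-m", "--host", dest="mgt", help="Management server hostname or IP-Address")
p.add_argument("-z", "--zone", help="zone id")
p.add_argument("-p", "--pod", help="pod id")
p.add_argument("-c", "--cluster", help="cluster id")
p.add_argument("-t", "--hypervisor", help="hypervisor type")
p.add_argument("-g", "--guid", help="guid")
p.add_argument("--pubNic", help="Public traffic interface")
p.add_argument("--prvNic", help="Private traffic interface")
p.add_argument("--guestNic", help="Guest traffic interface")

# The exact command the management server issued, from the log above.
cmd = ("-m 172.30.3.2 -z 1 -p 1 -c 1 -g 9f2b15cb-1b75-321b-bf59-f83e7a5e8efb "
       "-a -s --pubNic=cloudbr1 --prvNic=cloudbr0 --guestNic=cloudbr1 --hypervisor=kvm")
args, unknown = p.parse_known_args(cmd.split())
print(unknown)  # everything the agent's option set does not recognise
```

This suggests a management server/agent version mismatch: the server is issuing a flag this build of the agent setup script never defined.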
Re: Adding a static route to the SSVM for remote NFS server
Hi Sateesh I was trying to edit the interfaces files on the SSVM itself; I would never have thought to use that setting. Worked perfectly, many thanks for that, much appreciated. Jon From: Sateesh Chodapuneedi Sent: 27 June 2018 10:58 To: users@cloudstack.apache.org Subject: Re: Adding a static route to the SSVM for remote NFS server Hi Jon, >> Do you know how to add it permanently across reboots ? Yes, we can update the global configuration setting "secstorage.allowed.internal.sites" to achieve that. That is a comma-separated list of internal CIDRs containing the servers hosting the templates that the SSVM tries to download. We can add the CIDR of the NFS server there; just append it to that comma-separated list. Do not overwrite whatever the current setting is, just append after a comma. Regards, Sateesh -----Original Message----- From: Jon Marshall Reply-To: "users@cloudstack.apache.org" Date: Wednesday, 27 June 2018 at 13:45 To: "users@cloudstack.apache.org" Subject: Re: Adding a static route to the SSVM for remote NFS server Hi Sateesh I can add the route manually but when the SSVM is rebooted it loses that route. I edited the /etc/network/interfaces file and added it there but it still gets overwritten on reboot. Do you know how to add it permanently across reboots ? Jon From: Sateesh Chodapuneedi Sent: 26 June 2018 16:25 To: users@cloudstack.apache.org Subject: Re: Adding a static route to the SSVM for remote NFS server Hi Jon, >> Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and would this work or is there a better way to do it ? Yes, that should work. I did it some time back in my environment where the NFS server is sitting in a separate subnet (in the LAN), and the public NIC/gateway in the SSVM was used to route the packets to/from the NFS server. We have added a static route via the private/management NIC because the NFS server sits in the LAN. Let us know how it goes.
Regards, Sateesh -Original Message- From: Jon Marshall Reply-To: "users@cloudstack.apache.org" Date: Tuesday, 26 June 2018 at 19:36 To: "users@cloudstack.apache.org" Subject: Adding a static route to the SSVM for remote NFS server I am doing basic networking with 2 NICS (one for management/storage and the other for Guest traffic). When you configure the physical NIC/bridges you can only define one default gateway so I do it for the guest traffic which means the routing table on the SSVM ends up as - root@s-1-VM:/etc# netstat -nr Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 0.0.0.0 172.30.4.1 0.0.0.0 UG0 0 0 eth2 8.8.4.4 172.30.3.1 255.255.255.255 UGH 0 0 0 eth1 8.8.8.8 172.30.3.1 255.255.255.255 UGH 0 0 0 eth1 169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 172.30.3.0 0.0.0.0 255.255.255.192 U 0 0 0 eth1 172.30.4.0 0.0.0.0 255.255.255.128 U 0 0 0 eth2 where 172.30.3.0/26 is the management network and 172.30.4.0/25 is the guest network. My NFS server has an IP of 172.30.5.2 so it is on a different subnet which means the the secondary storage would have to run over the guest NIC if I am understanding this properly. I want storage over the management NIC. Based on some advice on this mailing list I configured storage to use the same bridge (KVM label) as management but it won't build ie. it errors on the storage traffic part I suspect because the subnet details I enter are not part of the management subnet. Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and would this work or is there a better way to do it ? On a more general note with basic networking the assumption seems to be you run everything over the same NIC and if you don't it seems to cause no end of problems :) DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Accelerite, a Persistent Systems business. It is intended only for the use of the individual or entity to which it is addressed. 
If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Accelerite, a Persistent Systems business does not accept any liability for virus infected mails.
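Sateesh's append-don't-overwrite advice above is easy to get wrong when editing the setting by hand. A minimal sketch (Python; the existing value shown is hypothetical) of building the new comma-separated value before applying it through the `updateConfiguration` API call (e.g. via CloudMonkey):

```python
def append_cidr(current: str, cidr: str) -> str:
    """Append a CIDR to the comma-separated list without overwriting existing entries."""
    entries = [e.strip() for e in current.split(",") if e.strip()]
    if cidr not in entries:  # idempotent: skip if already present
        entries.append(cidr)
    return ",".join(entries)

# Hypothetical current value of secstorage.allowed.internal.sites:
current = "172.30.3.0/26"
new_value = append_cidr(current, "172.30.5.0/28")
print(new_value)  # 172.30.3.0/26,172.30.5.0/28
```

The new value would then be passed as the `value` parameter of `updateConfiguration` with `name=secstorage.allowed.internal.sites`; restarting the SSVM picks the change up.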
Re: Adding a static route to the SSVM for remote NFS server
Hi Sateesh I can add the route manually but when the SSVM is rebooted it loses that route. I edited the /etc/network/interfaces file and added it there but it still gets overwritten on reboot. Do you know how to add it permanently across reboots ? Jon From: Sateesh Chodapuneedi Sent: 26 June 2018 16:25 To: users@cloudstack.apache.org Subject: Re: Adding a static route to the SSVM for remote NFS server Hi Jon, >> Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and >> would this work or is there a better way to do it ? Yes, that should work. I did it some time back in my environment where NFS server is sitting in a separate subnet (in LAN), and public NIC/gateway in the SSVM was used to route the packets to/fro NFS server. We have added static route via the private/management NIC because the NFS server sits in LAN. Let us know how it goes. Regards, Sateesh -Original Message----- From: Jon Marshall Reply-To: "users@cloudstack.apache.org" Date: Tuesday, 26 June 2018 at 19:36 To: "users@cloudstack.apache.org" Subject: Adding a static route to the SSVM for remote NFS server I am doing basic networking with 2 NICS (one for management/storage and the other for Guest traffic). When you configure the physical NIC/bridges you can only define one default gateway so I do it for the guest traffic which means the routing table on the SSVM ends up as - root@s-1-VM:/etc# netstat -nr Kernel IP routing table Destination Gateway Genmask Flags MSS Window irtt Iface 0.0.0.0 172.30.4.1 0.0.0.0 UG0 0 0 eth2 8.8.4.4 172.30.3.1 255.255.255.255 UGH 0 0 0 eth1 8.8.8.8 172.30.3.1 255.255.255.255 UGH 0 0 0 eth1 169.254.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0 172.30.3.0 0.0.0.0 255.255.255.192 U 0 0 0 eth1 172.30.4.0 0.0.0.0 255.255.255.128 U 0 0 0 eth2 where 172.30.3.0/26 is the management network and 172.30.4.0/25 is the guest network. 
My NFS server has an IP of 172.30.5.2 so it is on a different subnet which means the secondary storage would have to run over the guest NIC if I am understanding this properly. I want storage over the management NIC. Based on some advice on this mailing list I configured storage to use the same bridge (KVM label) as management but it won't build, i.e. it errors on the storage traffic part, I suspect because the subnet details I enter are not part of the management subnet. Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet and would this work or is there a better way to do it ? On a more general note with basic networking the assumption seems to be you run everything over the same NIC and if you don't it seems to cause no end of problems :)
Adding a static route to the SSVM for remote NFS server
I am doing basic networking with 2 NICs (one for management/storage and the other for guest traffic). When you configure the physical NICs/bridges you can only define one default gateway, so I do it for the guest traffic, which means the routing table on the SSVM ends up as:

root@s-1-VM:/etc# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask          Flags  MSS Window  irtt Iface
0.0.0.0         172.30.4.1      0.0.0.0          UG       0 0          0 eth2
8.8.4.4         172.30.3.1      255.255.255.255  UGH      0 0          0 eth1
8.8.8.8         172.30.3.1      255.255.255.255  UGH      0 0          0 eth1
169.254.0.0     0.0.0.0         255.255.0.0      U        0 0          0 eth0
172.30.3.0      0.0.0.0         255.255.255.192  U        0 0          0 eth1
172.30.4.0      0.0.0.0         255.255.255.128  U        0 0          0 eth2

where 172.30.3.0/26 is the management network and 172.30.4.0/25 is the guest network. My NFS server has an IP of 172.30.5.2, so it is on a different subnet, which means the secondary storage would have to run over the guest NIC if I am understanding this properly. I want storage over the management NIC. Based on some advice on this mailing list I configured storage to use the same bridge (KVM label) as management, but it won't build, i.e. it errors on the storage traffic part, I suspect because the subnet details I enter are not part of the management subnet. Can I just add a static route to the SSVM for the 172.30.5.0/28 subnet, and would this work, or is there a better way to do it ? On a more general note, with basic networking the assumption seems to be that you run everything over the same NIC, and if you don't it seems to cause no end of problems :)
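As a sanity check on the routing table in the question, the kernel's egress choice can be reproduced with a longest-prefix match. This sketch (Python; routes transcribed from the netstat output, so purely illustrative) shows why traffic to the NFS server at 172.30.5.2 falls through to the default route on eth2, the guest NIC:

```python
import ipaddress

# The SSVM routing table from the netstat output above (destination -> iface).
routes = [
    ("0.0.0.0/0",      "eth2"),  # default via 172.30.4.1 (guest gateway)
    ("8.8.4.4/32",     "eth1"),
    ("8.8.8.8/32",     "eth1"),
    ("169.254.0.0/16", "eth0"),
    ("172.30.3.0/26",  "eth1"),  # management
    ("172.30.4.0/25",  "eth2"),  # guest
]

def egress_iface(ip: str) -> str:
    """Pick the most specific matching route, as the kernel does."""
    addr = ipaddress.ip_address(ip)
    matches = [(ipaddress.ip_network(net), iface) for net, iface in routes
               if addr in ipaddress.ip_network(net)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(egress_iface("172.30.5.2"))   # NFS server: only the default route matches -> eth2
print(egress_iface("172.30.3.10"))  # management-subnet host: /26 beats /0 -> eth1
```

With no route covering 172.30.5.0/28, the static route Jon asks about is exactly what would steer that traffic back onto eth1.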
Re: Storage traffic clarification.
Ilya Thanks for the response. So if I use cloudbr0 for management then I define that on the storage icon as well when setting up a zone. Is there something else I need to do as well though, because when I set it up I have cloudbr0 for management and cloudbr1 for guest, and in the network configuration files I only define a default gateway in the cloudbr1 file. This is what caught me out originally, i.e. I defined a default gateway in both cloudbrx files and the SSVM chose the management VLAN as its default gateway so the guest traffic did not work. If I only set the default gateway in the guest subnet everything works, but then the SSVM will have a default gateway in the guest IP subnet, and as it does not have an interface in the NFS subnet it has to use that default gateway to get to the NFS server. Perhaps I am not understanding how CloudStack is doing the routing internally ? Jon From: ilya musayev Sent: 20 June 2018 21:20 To: users@cloudstack.apache.org Subject: Re: Storage traffic clarification. Jon with Basic Network - it implies you have all in one network for everything. If you have a storage network that is L3 routable and you don't want to use the guest network - then when you create a zone - use the storage label and define what bridge will be used to get there. If it's not the guest bridge you want to use - then use the management bridge. 
Regards Ilya On Wed, Jun 20, 2018 at 12:25 AM Jon Marshall wrote: > I am probably missing something obvious but according to this article ( > https://www.shapeblue.com/understanding-cloudstacks-physical-networking-architecture/) > by default primary and secondary storage traffic travels across the > management network. > > As an example assume basic networking with 2 NICs, one for management with > an IP subnet, the other NIC for guest traffic using a different subnet. A > physical host should only have one default gateway and this would have to > be from the guest VM subnet. > > I set up two tests - > > 1) the NFS server had an IP address from the management subnet > > 2) the NFS server was on a completely different IP subnet i.e. not the > management or the guest IP subnets. > > Both worked but in test 2 I can't see how the storage traffic could be > using the management NIC because there is no default gateway on the compute > nodes for the management subnet and the NFS server is on a remote network. > > So is storage traffic in test 2 actually running across the guest NIC ? > > And as the recommendation is to have separate storage from guest traffic > does this mean the NFS server has to be in the management subnet ? > > Thanks >
Re: advanced networking with public IPs direct to VMs
Hi Rafael Just to let you know I reran the 2 NIC setup and it worked fine this time so it must have been something I did in the setup. Many thanks for all the help Jon From: Rafael Weingärtner Sent: 15 June 2018 11:40 To: users Subject: Re: advanced networking with public IPs direct to VMs Did you notice some problems in the log files when you tested with 2 NICs? When using NFS cluster wide storage, the behavior should be the same as with 3 NICs. There might be something in your configuration. The problem for zone wide storage is what we discussed before though. 1) if I want to run the management/storage traffic over the same NIC the NFS server needs to be in the management subnet No. You should be able to set up different network ranges for each one of them. 2) when I do the initial configuration I need to drag and drop the storage icon and use the same label as the management traffic If you are using only two NICs, for sure you need to configure the traffic labels accordingly. I mean, if you only have two NICs, then you need to configure the labels (cloudbr0 and cloudbr2) in that physical network tab in the zone configuration. On Thu, Jun 14, 2018 at 5:03 PM, Jon Marshall wrote: > Hi Rafael > > > I did log a bug but when rebuilding I found some slightly different > behaviour so have temporarily removed it. > > > So using cluster NFS and 3 NICs as already described VM HA works. > > > Because the recommendation for basic network setup seems to be run > storage/management over the same NIC and guest on another, so 2 NICs in > total, I set it up this way using cluster NFS and to my surprise VM HA did > not work so it is obviously a bit more complicated than it first appeared. > > > My NFS server is on a different subnet than the management server and when > I set it up in the UI because the storage traffic runs over the management > NIC by default I did not assign a label to the storage traffic, ie. I only > assigned labels to management and guest. 
> > > So two thoughts occur which I can test unless you can see the issue - > > > 1) if I want to run the management/storage traffic over the same NIC the > NFS server needs to be in the management subnet > > > or > > > 2) when I do the initial configuration I need to drag and drop the storage > icon and use the same label as the management traffic > > > Personally I can't see how 2) will help, i.e. the only time I should need to > assign a label to storage is if I use a different NIC. > > > Apologies for bringing this up again but am happy to run any tests and > would like to file an accurate bug report. > > > > > > > > From: Rafael Weingärtner > Sent: 11 June 2018 10:58 > To: users > Subject: Re: advanced networking with public IPs direct to VMs > > Well, it seems that you have found a bug. Can you fill out an issue report > on Github? > > Thanks for the hard work on debugging and testing. > > On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall > wrote: > > > So based on Erik's suggestion (thanks Erik) I rebuilt the management > > server and set up cluster wide primary storage as opposed to zone wide > which > > I have been using so far. > > > > > > Still using 3 NICs (management/Guest/storage) and basic networking. > > > > > > And VM HA now works. In addition it failed over quicker than it did when > I > > had zone wide NFS storage on a single NIC. > > > > > > Still a bit confused about this output where it is still showing the > > storage_ip_addresses as 172.30.3.x IPs which is the management subnet but > > maybe I am reading it incorrectly. > > > > > > > > mysql> select * from cloud.host; > > [very wide table header output, wrapped beyond readability in the original mail]
Storage traffic clarification.
I am probably missing something obvious but according to this article (https://www.shapeblue.com/understanding-cloudstacks-physical-networking-architecture/) by default primary and secondary storage traffic travels across the management network. As an example assume basic networking with 2 NICs, one for management with an IP subnet, the other NIC for guest traffic using a different subnet. A physical host should only have one default gateway and this would have to be from the guest VM subnet. I set up two tests - 1) the NFS server had an IP address from the management subnet 2) the NFS server was on a completely different IP subnet, i.e. not the management or the guest IP subnets. Both worked but in test 2 I can't see how the storage traffic could be using the management NIC because there is no default gateway on the compute nodes for the management subnet and the NFS server is on a remote network. So is storage traffic in test 2 actually running across the guest NIC ? And as the recommendation is to have separate storage from guest traffic does this mean the NFS server has to be in the management subnet ? Thanks
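The two tests in the question can be reasoned about mechanically: a host sends storage traffic out of whichever NIC owns a directly connected subnet containing the NFS IP, and otherwise via its only default gateway (here, on the guest NIC). A small sketch (Python; the subnets are the ones used throughout this thread, and the sample NFS addresses are illustrative):

```python
import ipaddress

# Subnets from the thread: management on one NIC, guest on the other.
mgmt = ipaddress.ip_network("172.30.3.0/26")
guest = ipaddress.ip_network("172.30.4.0/25")

def classify(nfs_ip: str) -> str:
    """Which path does storage traffic to this NFS server take?"""
    ip = ipaddress.ip_address(nfs_ip)
    if ip in mgmt:
        return "management NIC (directly connected)"
    if ip in guest:
        return "guest NIC (directly connected)"
    return "default gateway, i.e. the guest NIC"

print(classify("172.30.3.50"))  # test 1: NFS inside the management subnet
print(classify("172.30.5.2"))   # test 2: remote NFS, routed via the default gateway
```

This matches the observation in the mail: test 2 "works", but only because the traffic leaves via the guest NIC, which is exactly what the poster was trying to avoid.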
Re: advanced networking with public IPs direct to VMs
I did a quick run through and it looked like the same messages as I got with zone wide NFS when it didn't work. I am going to do some more tests and capture full management logs so I can do a comparison to see if there are any differences, and once I have done that I will redo the bug report. Just to clarify the second point about labels. When you use the manual setup with basic networking in the UI and configure the physical network, the "Management" and "Guest" icons are already under the physical network part and the "Storage" icon is under "Traffic types". For both the 2 and 3 NIC setup I configure Management as cloudbr0 and Guest as cloudbr1. For the 2 NIC setup that is all I do because by default storage runs across management so I assume I don't need to do anything else. For the 3 NIC setup I then drag and drop the Storage icon onto the physical network part and configure it as cloudbr2. Just wanted to make that clear in case I am doing it wrong. Will let you know results of tests next week. From: Rafael Weingärtner Sent: 15 June 2018 11:4 To: users Subject: Re: advanced networking with public IPs direct to VMs Did you notice some problems in the log files when you tested with 2 NICs? When using NFS cluster wide storage, the behavior should be the same as with 3 NICs. There might be something in your configuration. The problem for zone wide storage is what we discussed before though. 1) if I want to run the management/storage traffic over the same NIC the NFS server needs to be in the management subnet No. You should be able to set up different network ranges for each one of them. 2) when I do the initial configuration I need to drag and drop the storage icon and use the same label as the management traffic If you are using only two NICs, for sure you need to configure the traffic labels accordingly. I mean, if you only have two NICs, then you need to configure the labels (cloudbr0 and cloudbr2) in that physical network tab in the zone configuration. 
On Thu, Jun 14, 2018 at 5:03 PM, Jon Marshall wrote: > Hi Rafael > > > I did log a bug but when rebuilding I found some slightly different > behaviour so have temporarily removed it. > > > So using cluster NFS and 3 NICs as already described VM HA works. > > > Because the recommendation for basic network setup seems to be run > storage/management over the same NIC and guest on another, so 2 NICs in > total, I set it up this way using cluster NFS and to my surprise VM HA did > not work so it is obviously a bit more complicated than it first appeared. > > > My NFS server is on a different subnet than the management server and when > I set it up in the UI because the storage traffic runs over the management > NIC by default I did not assign a label to the storage traffic, ie. I only > assigned labels to management and guest. > > > So two thoughts occur which I can test unless you can see the issue - > > > 1) if I want to run the management/storage traffic over the same NIC the > NFS server needs to be in the management subnet > > > or > > > 2) when I do the initial configuration I need to drag and drop the storage > icon and use the same label as the management traffic > > > Personally I can't see how 2) will help ie. the only time I should need to > assign a label to storage is if I use a different NIC. > > > Apologies for bringing this up again but am happy to run any tests and > would like to file accurate bug report. > > > > > > > > From: Rafael Weingärtner > Sent: 11 June 2018 10:58 > To: users > Subject: Re: advanced networking with public IPs direct to VMs > > Well, it seems that you have found a bug. Can you fill out an issue report > on Github? > > Thanks for the hard work on debugging and testing. > > On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall > wrote: > > > So based on Erik's suggestion (thanks Erik) I rebuilt the management > > server and setup cluster wide primary storage as opposed to zone wide > which > > I have been using so far. 
> > > > > Still using 3 NICs (management/Guest/storage) and basic networking. > > > And VM HA now works. In addition it failed over quicker than it did when I had zone wide NFS storage on a single NIC.
Re: advanced networking with public IPs direct to VMs
Hi Rafael I did log a bug but when rebuilding I found some slightly different behaviour so have temporarily removed it. So using cluster NFS and 3 NICs as already described, VM HA works. Because the recommendation for basic network setup seems to be to run storage/management over the same NIC and guest on another, so 2 NICs in total, I set it up this way using cluster NFS and to my surprise VM HA did not work, so it is obviously a bit more complicated than it first appeared. My NFS server is on a different subnet than the management server, and when I set it up in the UI, because the storage traffic runs over the management NIC by default, I did not assign a label to the storage traffic, i.e. I only assigned labels to management and guest. So two thoughts occur which I can test unless you can see the issue - 1) if I want to run the management/storage traffic over the same NIC the NFS server needs to be in the management subnet or 2) when I do the initial configuration I need to drag and drop the storage icon and use the same label as the management traffic Personally I can't see how 2) will help, i.e. the only time I should need to assign a label to storage is if I use a different NIC. Apologies for bringing this up again but am happy to run any tests and would like to file an accurate bug report. From: Rafael Weingärtner Sent: 11 June 2018 10:58 To: users Subject: Re: advanced networking with public IPs direct to VMs Well, it seems that you have found a bug. Can you fill out an issue report on Github? Thanks for the hard work on debugging and testing. On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall wrote: > So based on Erik's suggestion (thanks Erik) I rebuilt the management > server and set up cluster wide primary storage as opposed to zone wide which > I have been using so far. > > > Still using 3 NICs (management/Guest/storage) and basic networking. > > > And VM HA now works. In addition it failed over quicker than it did when I > had zone wide NFS storage on a single NIC. 
> > > Still a bit confused about this output where it is still showing the > storage_ip_addresses as 172.30.3.x IPs which is the management subnet but > maybe I am reading it incorrectly. > > > > mysql> select * from cloud.host; > [very wide table output, wrapped beyond readability in the original mail; the columns include id, name, uuid, status, type, private_ip_address, private_netmask, storage_ip_address, storage_netmask, cluster_id, public_ip_address, data_center_id, pod_id, hypervisor_type, version, resource_state and engine_state]
Re: 4.11 without Host-HA framework
Hi Parth Just in case you have not seen my other thread, it turns out that all this time it has been a bug. Using multiple NICs with basic networking and zone wide NFS, VM HA just does not work. If you change to cluster wide NFS then it works fine (and quite quickly as well :)) I am now going to set up Host HA and make sure that all works as well using cluster NFS. Got there in the end :) Jon From: Parth Patel Sent: 24 May 2018 06:52 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework Hi Jon and Angus, I did not shut down the VMs as Yiping Zhang said, but I have confirmed this and discussed earlier in the users list that my HA-enabled VMs got started on another suitable available host in the cluster even when I didn't have IPMI-enabled hardware and did no configuration for OOBM and Host-HA. I simply pulled the ethernet cable connecting the host to the entire network (I did use just one NIC) and, according to the value set in the ping timeout event, the HA-enabled VMs were restarted on another available host. I tested both scenarios: the echo command as well as good old unplugging the NIC from the host. My VMs were successfully started on another available host after the CS manager confirmed they were not reachable. I too want to understand how the failover mechanism in CloudStack actually works. I used ACS 4.11 packages available here: http://cloudstack.apt-get.eu/centos/7/4.11/ Regards, Parth Patel On Thu, 24 May 2018 at 10:53 Paul Angus wrote: > I'm afraid that is not a host crash. When shutting down the guest OS, the > CloudStack agent on the host is still able to report to the management > server that the VM has stopped. > > This is my point. VM-HA relies on the management server communication with > the host agent. 
> > Kind regards, > > Paul Angus > > paul.an...@shapeblue.com > www.shapeblue.com<http://www.shapeblue.com> > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > @shapeblue > > > > > -Original Message- > From: Yiping Zhang > Sent: 24 May 2018 00:44 > To: users@cloudstack.apache.org > Subject: Re: 4.11 without Host-HA framework > > I can say for fact that VM's using a HA enabled service offering will be > restarted by CS on another host, assuming there are enough > capacity/resources in the cluster, when their original host crashes, > regardless that host comes back or not. > > The simplest way to test VM HA feature with a VM instance using HA enabled > service offering is to issue shutdown command in guest OS, and watching it > gets restarted by CS manager. > > On 5/23/18, 1:23 PM, "Paul Angus" wrote: > > Hi Jon, > > Don't worry, TBH I'm dubious about those claiming to have VM-HA > working when a host crashes (but doesn't restart). > I'll check in with the guys that set values for host-ha when testing, > to see which ones they change and what they set them to. > > paul.an...@shapeblue.com > www.shapeblue.com<http://www.shapeblue.com> > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > @shapeblue > > > > > -Original Message- > From: Jon Marshall > Sent: 23 May 2018 21:10 > To: users@cloudstack.apache.org > Subject: Re: 4.11 without Host-HA framework > > Rohit / Paul > > > Thanks again for answering. > > > I am a Cisco guy with an ex Unix background but no virtualisation > experience and I can honestly say I have never felt this stupid before 😊 > > > I have Cloudstack working but failover is killing me. > > > When you say VM HA relies on the host telling CS the VM is down how > does that work because if you crash the host how does it tell CS anything ? > And when you say tell CS do you mean the CS manager ? > > > I guess I am just not understanding all the moving parts. 
I have had > HOST HA working (to an extent) although it takes a long time to failover > even after tweaking the timers but the fact that I keep finding references > to people saying even without HOST HA it should failover (and mine doesn't) > makes me think I have configured it incorrectly somewhere along the line. > > > I have configured a compute offering with HA and I am crashing the > host with the echo command as suggested but still nothing. > > > I understand what you are saying Paul about it not being a good idea > to rely on VM HA so I will go back to Host HA and try to speed up failover > times. > > > Can I ask, from your experiences, what is a realistic fail over time > for CS ie. if a host fails for example ? > > > Jon > > > > > _
Re: advanced networking with public IPs direct to VMs
Hi Rafael I don't have a GitHub account but can set one up and do a report sometime this week if that is okay ? No problem with the testing and thanks for the help. Before I leave this, if I use NFS cluster mode, a couple of questions - 1) if I run management and storage over the same interface, can the NFS server still be on a different subnet than the management subnet, i.e. the NFS server does not have to have an IP from the management subnet ? 2) If I add another cluster can I just create a different NFS share from the same server ? Finally many thanks to you and the others for the help provided. From: Rafael Weingärtner Sent: 11 June 2018 10:58 To: users Subject: Re: advanced networking with public IPs direct to VMs Well, it seems that you have found a bug. Can you fill out an issue report on Github? Thanks for the hard work on debugging and testing. On Fri, Jun 8, 2018 at 2:17 PM, Jon Marshall wrote: > So based on Erik's suggestion (thanks Erik) I rebuilt the management > server and set up cluster wide primary storage as opposed to zone wide which > I have been using so far. > > > Still using 3 NICs (management/Guest/storage) and basic networking. > > > And VM HA now works. In addition it failed over quicker than it did when I > had zone wide NFS storage on a single NIC. > > > Still a bit confused about this output where it is still showing the > storage_ip_addresses as 172.30.3.x IPs which is the management subnet but > maybe I am reading it incorrectly. 
> > > > mysql> select * from cloud.host; > [very wide table output, wrapped beyond readability in the original mail; among the columns are private_ip_address, storage_ip_address and storage_netmask. The first row, for host dcp-cscn1.local (id 1, status Up, type Routing), shows private_ip_address 172.30.3.3 and storage_ip_address 172.30.3.3 with storage_netmask 255.255.255.192, i.e. the storage IP is the same management-subnet address, with public_ip_address 172.30.4.3]
Re: advanced networking with public IPs direct to VMs
[remainder of the wrapped table output: a console proxy row on 172.30.4.62, plus dcp-cscn2.local (id 4, status Down, private/storage IP 172.30.3.4) and dcp-cscn3.local (id 5, status Up, private/storage IP 172.30.3.5), each with storage_netmask 255.255.255.192] 5 rows in set (0.00 sec) mysql> So some sort of bug maybe ? From: Erik Weber Sent: 08 June 2018 10:15 To: users@cloudstack.apache.org Subject: Re: advanced networking with public IPs direct to VMs While someone ponders about the zone wide storage, you could try adding a cluster wide NFS storage and see if the rest works in that setup. 
Erik On Thu, Jun 7, 2018 at 11:49 AM Jon Marshall wrote: > Yes, all basic. I read a Shapeblue doc that recommended splitting traffic > across multiple NICs even in basic networking mode so that is what I am > trying to do. > > > With single NIC you do not get the NFS storage message. > > > I have the entire management server logs for both scenarios after I pulled > the power to one of the compute nodes but from the single NIC setup these > seem to be the relevant lines - > > > 2018-06-04 10:17:10,972 DEBUG [c.c.n.NetworkUsageManagerImpl] > (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Disconnected called on 4 > with status Down > 2018-06-04 10:17:10,972 DEBUG [c.c.h.Status] > (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Transition:[Resource state > = Enabled, Agent event = HostDown, Host id = 4, name = dcp-cscn2.local] > 2018-06-04 10:17:10,981 WARN [o.a.c.alerts] > (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) AlertType:: 7 | > dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Host is down, > name: dcp-cscn2.local (id:4), availability zone: dcpz1, pod: dcp1 > 2018-06-04 10:17:11,000 DEBUG [c.c.h.CheckOnAgentInvestigator] > (HA-Worker-1:ctx-f763f12f work-17) (logid:77c56778) Unable to reach the > agent for VM[User|i-2-6-VM]: Resource [Host:4] is unreachable: Host 4: Host > with specified id is not in the right state: Down > 2018-06
Re: advanced networking with public IPs direct to VMs
Yes, all basic. I read a ShapeBlue doc that recommended splitting traffic across multiple NICs even in basic networking mode, so that is what I am trying to do. With a single NIC you do not get the NFS storage message. I have the entire management server logs for both scenarios after I pulled the power to one of the compute nodes, but from the single NIC setup these seem to be the relevant lines -

2018-06-04 10:17:10,972 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Disconnected called on 4 with status Down
2018-06-04 10:17:10,972 DEBUG [c.c.h.Status] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Transition:[Resource state = Enabled, Agent event = HostDown, Host id = 4, name = dcp-cscn2.local]
2018-06-04 10:17:10,981 WARN [o.a.c.alerts] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Host is down, name: dcp-cscn2.local (id:4), availability zone: dcpz1, pod: dcp1
2018-06-04 10:17:11,000 DEBUG [c.c.h.CheckOnAgentInvestigator] (HA-Worker-1:ctx-f763f12f work-17) (logid:77c56778) Unable to reach the agent for VM[User|i-2-6-VM]: Resource [Host:4] is unreachable: Host 4: Host with specified id is not in the right state: Down
2018-06-04 10:17:11,006 DEBUG [c.c.h.KVMInvestigator] (AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) Neighbouring host:5 returned status:Down for the investigated host:4
2018-06-04 10:17:11,006 DEBUG [c.c.h.KVMInvestigator] (AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) HA: HOST is ineligible legacy state Down for host 4
2018-06-04 10:17:11,006 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) KVMInvestigator was able to determine host 4 is in Down
2018-06-04 10:17:11,006 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) The agent from host 4 state determined is Down
2018-06-04 10:17:11,006 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-2:ctx-a6f6dbd1) (logid:774553ff) Host is down: 4-dcp-cscn2.local. Starting HA on the VMs

At the moment I only need to assign public IPs direct to VMs rather than using NAT with the virtual router, but would be happy to go with advanced networking if it would make things easier :)

From: Rafael Weingärtner
Sent: 07 June 2018 10:35
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Ah so, it is not an advanced setup, even when you use multiple NICs. Can you confirm that the message "Agent investigation was requested on host, but host does not support investigation because it has no NFS storage. Skipping investigation." does not appear when you use a single NIC? Can you check other log entries that might appear when the host is marked as "down"?

On Thu, Jun 7, 2018 at 6:30 AM, Jon Marshall wrote:

> It is all basic networking at the moment for all the setups.
>
> If you want me to I can setup a single NIC solution again and run any
> commands you need me to do.
>
> FYI when I setup single NIC I use the guided installation option in the UI
> rather than manual setup which I do for the multiple NIC scenario.
>
> Happy to set it up if it helps.
>
> From: Rafael Weingärtner
> Sent: 07 June 2018 10:23
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> Ok, so that explains the log message. This is looking like a bug to me. It
> seems that in Zone wide the host state (when disconnected) is not being
> properly identified due to this NFS thing, and as a consequence it has a
> side effect in VM HA.
>
> We would need some inputs from guys that have advanced networking
> deployments and Zone wide storage.
>
> I do not see how the all in one NIC deployment scenario is working though.
> This method "com.cloud.ha.KVMInvestigator.isAgentAlive(Host)" is dead > simple, if there is no NFS in the cluster (NFS storage pools found for a > host's cluster), KVM hosts will be detected as "disconnected" and not down > with that warning message you noticed. > > When you say "all in one NIC", is it an advanced network deployment where > you put all traffic in a single network, or is it a basic networking that > you are doing? > > On Thu, Jun 7, 2018 at 6:06 AM, Jon Marshall > wrote: > > > zone wide. > > > > > > > > From: Rafael Weingärtner > > Sent: 07 June 2018 10:04 > > To: users > > Subject: Re: advanced networking with public IPs direct to VMs > > > > What type of storage are you using? Zone wide? Or c
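Rafael's description of "com.cloud.ha.KVMInvestigator.isAgentAlive(Host)" quoted above boils down to a single branch. A rough illustrative sketch of that decision in Python (not the actual CloudStack Java code; the function name and arguments here are made up for illustration):

```python
# Hypothetical sketch of the decision Rafael describes for
# com.cloud.ha.KVMInvestigator.isAgentAlive(Host). This is NOT the real
# CloudStack implementation, just the branch logic from the thread.

def investigate_kvm_host(neighbour_says_down, cluster_has_nfs_pool):
    """Return the state the investigator would report for a KVM host."""
    if not cluster_has_nfs_pool:
        # No NFS storage pool found for the host's cluster: the investigator
        # cannot check storage heartbeats, so it can only report
        # "Disconnected" -- and HA is not started on a Disconnected host.
        return "Disconnected"
    if neighbour_says_down:
        # With NFS pools available in the cluster, the host can be positively
        # determined as Down, which is what triggers VM HA.
        return "Down"
    return "Up"

# A zone-wide pool has cluster_id NULL, so "no pool in the cluster":
print(investigate_kvm_host(neighbour_says_down=True,
                           cluster_has_nfs_pool=False))  # prints Disconnected
```

This matches the two log excerpts in the thread: the multi-NIC/zone-wide setup logs "Skipping investigation" and ends Disconnected, while the single-NIC setup ends Down and starts HA.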
Re: advanced networking with public IPs direct to VMs
It is all basic networking at the moment for all the setups.

If you want me to I can setup a single NIC solution again and run any commands you need me to do.

FYI when I setup single NIC I use the guided installation option in the UI rather than manual setup which I do for the multiple NIC scenario.

Happy to set it up if it helps.

From: Rafael Weingärtner
Sent: 07 June 2018 10:23
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Ok, so that explains the log message. This is looking like a bug to me. It seems that in Zone wide the host state (when disconnected) is not being properly identified due to this NFS thing, and as a consequence it has a side effect in VM HA.

We would need some inputs from guys that have advanced networking deployments and Zone wide storage.

I do not see how the all in one NIC deployment scenario is working though. This method "com.cloud.ha.KVMInvestigator.isAgentAlive(Host)" is dead simple: if there is no NFS in the cluster (no NFS storage pools found for the host's cluster), KVM hosts will be detected as "disconnected" and not down, with that warning message you noticed.

When you say "all in one NIC", is it an advanced network deployment where you put all traffic in a single network, or is it a basic networking that you are doing?

On Thu, Jun 7, 2018 at 6:06 AM, Jon Marshall wrote:

> zone wide.
>
> From: Rafael Weingärtner
> Sent: 07 June 2018 10:04
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of storage are you using? Zone wide? Or cluster "wide" storage?
> > On Thu, Jun 7, 2018 at 4:25 AM, Jon Marshall > wrote: > > > Rafael > > > > > > Here is the output as requested - > > > > > > > > mysql> mysql> select * from cloud.storage_pool where removed is null; > > ++--+--+ > > ---+--++++-- > > --++--+---+- > > +-+-+-+- > > ---+---+---++--- > > --+---+ > > | id | name | uuid | pool_type | > > port | data_center_id | pod_id | cluster_id | used_bytes | > capacity_bytes | > > host_address | user_info | path| created | > removed > > | update_time | status | storage_provider_name | scope | hypervisor | > > managed | capacity_iops | > > ++--+--+ > > ---+--++++-- > > --++--+---+- > > +-+-+-+- > > ---+---+---++--- > > --+---+ > > | 1 | ds1 | a234224f-05fb-3f4c-9b0f-c51ebdf9a601 | NetworkFilesystem | > > 2049 | 1 | NULL | NULL | 6059720704 | > 79133933568 | > > 172.30.5.2 | NULL | /export/primary | 2018-06-05 13:45:01 | NULL > > | NULL| Up | DefaultPrimary| ZONE | KVM| > > 0 | NULL | > > ++--+--+ > > ---+--++++-- > > --++--+---+- > > +-+-+-+- > > ---+---+---++--- > > --+---+ > > 1 row in set (0.00 sec) > > > > mysql> > > > > Do you think this problem is related to my NIC/bridge configuration or > the > > way I am configuring the zone ? > > > > Jon > > > > From: Rafael Weingärtner > > Sent: 07 June 2018 06:45 > > To: users > > Subject: Re: advanced networking with public IPs direct to VMs > > > > Can you also post the result of: > > select * from cloud.storage_pool where removed is null > > > > On Wed, Jun 6, 2018 at 3:06 PM, Dag Sonstebo > > > wrote: > > > > > Hi Jon, > > > > > > Still confused where your primary storage pools are – are you sure your > > > hosts are in cluster 1? > > > > > > Quick question just to make sure - assuming management/storage is on > the > > > same NIC when I setup basic networking the physical network has the > > > management and guest icons already there and I just edit the KVM >
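The empty result from the cluster-scoped query follows directly from the quoted output above: the pool is ZONE-scoped, so its cluster_id is NULL, and NULL never matches "cluster_id = 1". A small illustrative Python sketch mimicking the two SQL filters (the dict is just the relevant fields of the storage_pool row shown in the thread):

```python
# The storage_pool row from the query output above, reduced to the fields
# that matter for the two queries discussed in this thread.
pool = {"id": 1, "name": "ds1", "scope": "ZONE",
        "cluster_id": None, "removed": None}
rows = [pool]

# "where cluster_id = 1": NULL (None) never equals 1, so a zone-wide
# pool is filtered out and the query returns an empty set.
by_cluster = [r for r in rows if r["cluster_id"] == 1]

# "where removed is null": this is the query that actually finds the pool.
active = [r for r in rows if r["removed"] is None]

print(len(by_cluster), len(active))  # prints: 0 1
```

So the pool was there all along; it was only invisible to the cluster-scoped query.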
Re: advanced networking with public IPs direct to VMs
zone wide. From: Rafael Weingärtner Sent: 07 June 2018 10:04 To: users Subject: Re: advanced networking with public IPs direct to VMs What type of storage are you using? Zone wide? Or cluster "wide" storage? On Thu, Jun 7, 2018 at 4:25 AM, Jon Marshall wrote: > Rafael > > > Here is the output as requested - > > > > mysql> mysql> select * from cloud.storage_pool where removed is null; > ++--+--+ > ---+--++++-- > --++--+---+- > +-+-+-+- > ---+---+---++--- > --+---+ > | id | name | uuid | pool_type | > port | data_center_id | pod_id | cluster_id | used_bytes | capacity_bytes | > host_address | user_info | path| created | removed > | update_time | status | storage_provider_name | scope | hypervisor | > managed | capacity_iops | > ++--+--+ > ---+--++++-- > --++--+---+- > +-+-+-+- > ---+---+---++--- > --+---+ > | 1 | ds1 | a234224f-05fb-3f4c-9b0f-c51ebdf9a601 | NetworkFilesystem | > 2049 | 1 | NULL | NULL | 6059720704 |79133933568 | > 172.30.5.2 | NULL | /export/primary | 2018-06-05 13:45:01 | NULL > | NULL| Up | DefaultPrimary| ZONE | KVM| > 0 | NULL | > ++--+--+ > ---+--++++-- > --++--+---+- > +-+-+-+- > ---+---+---++--- > --+---+ > 1 row in set (0.00 sec) > > mysql> > > Do you think this problem is related to my NIC/bridge configuration or the > way I am configuring the zone ? > > Jon > > From: Rafael Weingärtner > Sent: 07 June 2018 06:45 > To: users > Subject: Re: advanced networking with public IPs direct to VMs > > Can you also post the result of: > select * from cloud.storage_pool where removed is null > > On Wed, Jun 6, 2018 at 3:06 PM, Dag Sonstebo > wrote: > > > Hi Jon, > > > > Still confused where your primary storage pools are – are you sure your > > hosts are in cluster 1? > > > > Quick question just to make sure - assuming management/storage is on the > > same NIC when I setup basic networking the physical network has the > > management and guest icons already there and I just edit the KVM labels. 
> If > > I am running storage over management do I need to drag the storage icon > to > > the physical network and use the same KVM label (cloudbr0) as the > > management or does CS automatically just use the management NIC ie. I > would > > only need to drag the storage icon across in basic setup if I wanted it > on > > a different NIC/IP subnet ? (hope that makes sense !) > > > > >> I would do both – set up your 2/3 physical networks, name isn’t that > > important – but then drag the traffic types to the correct one and make > > sure the labels are correct. > > Regards, > > Dag Sonstebo > > Cloud Architect > > ShapeBlue > > > > On 06/06/2018, 12:39, "Jon Marshall" wrote: > > > > Dag > > > > > > Do you mean check the pools with "Infrastructure -> Primary Storage" > > and "Infrastructure -> Secondary Storage" within the UI ? > > > > > > If so Primary Storage has a state of UP, secondary storage does not > > show a state as such so not sure where else to check it ? > > > > > > Rerun of the command - > > > > mysql> select * from cloud.storage_pool where cluster_id = 1; > > Empty set (0.00 sec) > > > > mysql> > > > > I think it is something to do with my zone creation rather than the > > NIC, bridge setup although I can post those if needed. > > > > I may try to setup just the 2 NIC solution you mentioned although as > I > > say I had the same issue with that ie. host goes to "Altert" state and > same > > error messages. The only time I can get it to go to "Do
Re: advanced networking with public IPs direct to VMs
Rafael

Here is the output as requested -

mysql> select * from cloud.storage_pool where removed is null\G
*************************** 1. row ***************************
                   id: 1
                 name: ds1
                 uuid: a234224f-05fb-3f4c-9b0f-c51ebdf9a601
            pool_type: NetworkFilesystem
                 port: 2049
       data_center_id: 1
               pod_id: NULL
           cluster_id: NULL
           used_bytes: 6059720704
       capacity_bytes: 79133933568
         host_address: 172.30.5.2
            user_info: NULL
                 path: /export/primary
              created: 2018-06-05 13:45:01
              removed: NULL
          update_time: NULL
               status: Up
storage_provider_name: DefaultPrimary
                scope: ZONE
           hypervisor: KVM
              managed: 0
        capacity_iops: NULL
1 row in set (0.00 sec)

Do you think this problem is related to my NIC/bridge configuration or the way I am configuring the zone ?

Jon

From: Rafael Weingärtner
Sent: 07 June 2018 06:45
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Can you also post the result of:
select * from cloud.storage_pool where removed is null

On Wed, Jun 6, 2018 at 3:06 PM, Dag Sonstebo wrote:

> Hi Jon,
>
> Still confused where your primary storage pools are – are you sure your
> hosts are in cluster 1?
>
> Quick question just to make sure - assuming management/storage is on the
> same NIC when I setup basic networking the physical network has the
> management and guest icons already there and I just edit the KVM labels. If
> I am running storage over management do I need to drag the storage icon to
> the physical network and use the same KVM label (cloudbr0) as the
> management or does CS automatically just use the management NIC ie. I would
> only need to drag the storage icon across in basic setup if I wanted it on
> a different NIC/IP subnet ? (hope that makes sense !)
> > >> I would do both – set up your 2/3 physical networks, name isn’t that > important – but then drag the traffic types to the correct one and make > sure the labels are correct. > Regards, > Dag Sonstebo > Cloud Architect > ShapeBlue > > On 06/06/2018, 12:39, "Jon Marshall" wrote: > > Dag > > > Do you mean check the pools with "Infrastructure -> Primary Storage" > and "Infrastructure -> Secondary Storage" within the UI ? > > > If so Primary Storage has a state of UP, secondary storage does not > show a state as such so not sure where else to check it ? > > > Rerun of the command - > > mysql> select * from cloud.storage_pool where cluster_id = 1; > Empty set (0.00 sec) > > mysql> > > I think it is something to do with my zone creation rather than the > NIC, bridge setup although I can post those if needed. > > I may try to setup just the 2 NIC solution you mentioned although as I > say I had the same issue with that ie. host goes to "Altert" state and same > error messages. The only time I can get it to go to "Down" state is when > it is all on the single NIC. > > Quick question just to make sure - assuming management/storage is on > the same NIC when I setup basic networking the physical network has the > management and guest icons already there and I just edit the KVM labels. If > I am running storage over management do I need to drag the storage icon to > the physical network and use the same KVM label (cloudbr0) as the > management or does CS automatically just use the management NIC ie. I would > only need to drag the storage icon across in basic setup if I wanted it on > a different NIC/IP subnet ? (hope that makes sense !) > > On the plus side I have been at this for so long now and done so many > rebuilds I could do it in my sleep now 😊 > > >
Re: advanced networking with public IPs direct to VMs
Dag

Am not an SQL expert by any means but does this not show hosts are in cluster 1 -

mysql> select name, cluster_id from cloud.host;
+-----------------+------------+
| name            | cluster_id |
+-----------------+------------+
| dcp-cscn1.local |          1 |
| v-2-VM          |       NULL |
| s-1-VM          |       NULL |
| dcp-cscn2.local |          1 |
| dcp-cscn3.local |          1 |
+-----------------+------------+
5 rows in set (0.00 sec)

I only have one cluster and those are the hosts I am using.

Jon

From: Dag Sonstebo
Sent: 06 June 2018 19:06
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi Jon,

Still confused where your primary storage pools are – are you sure your hosts are in cluster 1?

Quick question just to make sure - assuming management/storage is on the same NIC when I setup basic networking the physical network has the management and guest icons already there and I just edit the KVM labels. If I am running storage over management do I need to drag the storage icon to the physical network and use the same KVM label (cloudbr0) as the management or does CS automatically just use the management NIC ie. I would only need to drag the storage icon across in basic setup if I wanted it on a different NIC/IP subnet ? (hope that makes sense !)

>> I would do both – set up your 2/3 physical networks, name isn’t that
>> important – but then drag the traffic types to the correct one and make sure
>> the labels are correct.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/06/2018, 12:39, "Jon Marshall" wrote:

Dag

Do you mean check the pools with "Infrastructure -> Primary Storage" and "Infrastructure -> Secondary Storage" within the UI ?

If so Primary Storage has a state of UP, secondary storage does not show a state as such so not sure where else to check it ?

Rerun of the command -

mysql> select * from cloud.storage_pool where cluster_id = 1;
Empty set (0.00 sec)

I think it is something to do with my zone creation rather than the NIC, bridge setup although I can post those if needed.
I may try to setup just the 2 NIC solution you mentioned although as I say I had the same issue with that ie. host goes to "Alert" state and same error messages. The only time I can get it to go to "Down" state is when it is all on the single NIC.

Quick question just to make sure - assuming management/storage is on the same NIC when I setup basic networking the physical network has the management and guest icons already there and I just edit the KVM labels. If I am running storage over management do I need to drag the storage icon to the physical network and use the same KVM label (cloudbr0) as the management or does CS automatically just use the management NIC ie. I would only need to drag the storage icon across in basic setup if I wanted it on a different NIC/IP subnet ? (hope that makes sense !)

On the plus side I have been at this for so long now and done so many rebuilds I could do it in my sleep now 😊

From: Dag Sonstebo
Sent: 06 June 2018 12:28
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Looks OK to me Jon. The one thing that throws me is your storage pools – can you rerun your query:

select * from cloud.storage_pool where cluster_id = 1;

Do the pools show up as online in the CloudStack GUI?
Regards, Dag Sonstebo Cloud Architect ShapeBlue On 06/06/2018, 12:08, "Jon Marshall" wrote: Don't know whether this helps or not but I logged into the SSVM and ran an ifconfig - eth0: flags=4163 mtu 1500 inet 169.254.3.35 netmask 255.255.0.0 broadcast 169.254.255.255 ether 0e:00:a9:fe:03:23 txqueuelen 1000 (Ethernet) RX packets 141 bytes 20249 (19.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 108 bytes 16287 (15.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth1: flags=4163 mtu 1500 inet 172.30.3.34 netmask 255.255.255.192 broadcast 172.30.3.63 ether 1e:00:3b:00:00:05 txqueuelen 1000 (Ethernet) RX packets 56722 bytes 4953133 (4.7 MiB) RX errors 0 dropped 44573 overruns 0 frame 0 TX packets 11224 bytes 1234932 (1.1 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth2: flags=4163 mtu 1500 inet 172.30.4.86 netmask 255.255.255.128 broadcast 172.30.4.127 ether 1e:00:d9:00:00:53 txqueuelen 1000 (Ethernet) RX packets 366191 bytes 435
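The SSVM interface addresses from the ifconfig output above can be checked against the three subnets Jon lists elsewhere in the thread (172.30.3.0/26 management, 172.30.4.0/25 guest, 172.30.5.0/28 storage). A quick sanity-check sketch with Python's ipaddress module, using the IPs and netmasks reported for the SSVM:

```python
import ipaddress

# Subnets as described in the thread; masks match the ifconfig netmasks
# (255.255.255.192 = /26, 255.255.255.128 = /25, 255.255.255.240 = /28).
subnets = {
    "management": ipaddress.ip_network("172.30.3.0/26"),
    "guest":      ipaddress.ip_network("172.30.4.0/25"),
    "storage":    ipaddress.ip_network("172.30.5.0/28"),
}

# SSVM interface IPs from the ifconfig output (eth0 is the 169.254.x link-local).
ssvm_ifaces = {"eth1": "172.30.3.34", "eth2": "172.30.4.86", "eth3": "172.30.5.14"}

# Map each interface onto the subnet that contains its address.
placement = {
    iface: next((name for name, net in subnets.items()
                 if ipaddress.ip_address(ip) in net), "unknown")
    for iface, ip in ssvm_ifaces.items()
}
print(placement)  # eth1 -> management, eth2 -> guest, eth3 -> storage
```

This confirms Jon's observation: the SSVM really does have one interface in each of the management, guest and storage subnets.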
Re: advanced networking with public IPs direct to VMs
Dag

Do you mean check the pools with "Infrastructure -> Primary Storage" and "Infrastructure -> Secondary Storage" within the UI ?

If so Primary Storage has a state of UP, secondary storage does not show a state as such so not sure where else to check it ?

Rerun of the command -

mysql> select * from cloud.storage_pool where cluster_id = 1;
Empty set (0.00 sec)

I think it is something to do with my zone creation rather than the NIC, bridge setup although I can post those if needed.

I may try to setup just the 2 NIC solution you mentioned although as I say I had the same issue with that ie. host goes to "Alert" state and same error messages. The only time I can get it to go to "Down" state is when it is all on the single NIC.

Quick question just to make sure - assuming management/storage is on the same NIC when I setup basic networking the physical network has the management and guest icons already there and I just edit the KVM labels. If I am running storage over management do I need to drag the storage icon to the physical network and use the same KVM label (cloudbr0) as the management or does CS automatically just use the management NIC ie. I would only need to drag the storage icon across in basic setup if I wanted it on a different NIC/IP subnet ? (hope that makes sense !)

On the plus side I have been at this for so long now and done so many rebuilds I could do it in my sleep now 😊

From: Dag Sonstebo
Sent: 06 June 2018 12:28
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Looks OK to me Jon. The one thing that throws me is your storage pools – can you rerun your query:

select * from cloud.storage_pool where cluster_id = 1;

Do the pools show up as online in the CloudStack GUI?
Regards, Dag Sonstebo Cloud Architect ShapeBlue On 06/06/2018, 12:08, "Jon Marshall" wrote: Don't know whether this helps or not but I logged into the SSVM and ran an ifconfig - eth0: flags=4163 mtu 1500 inet 169.254.3.35 netmask 255.255.0.0 broadcast 169.254.255.255 ether 0e:00:a9:fe:03:23 txqueuelen 1000 (Ethernet) RX packets 141 bytes 20249 (19.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 108 bytes 16287 (15.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth1: flags=4163 mtu 1500 inet 172.30.3.34 netmask 255.255.255.192 broadcast 172.30.3.63 ether 1e:00:3b:00:00:05 txqueuelen 1000 (Ethernet) RX packets 56722 bytes 4953133 (4.7 MiB) RX errors 0 dropped 44573 overruns 0 frame 0 TX packets 11224 bytes 1234932 (1.1 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth2: flags=4163 mtu 1500 inet 172.30.4.86 netmask 255.255.255.128 broadcast 172.30.4.127 ether 1e:00:d9:00:00:53 txqueuelen 1000 (Ethernet) RX packets 366191 bytes 435300557 (415.1 MiB) RX errors 0 dropped 39456 overruns 0 frame 0 TX packets 145065 bytes 7978602 (7.6 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth3: flags=4163 mtu 1500 inet 172.30.5.14 netmask 255.255.255.240 broadcast 172.30.5.15 ether 1e:00:cb:00:00:1a txqueuelen 1000 (Ethernet) RX packets 132440 bytes 426362982 (406.6 MiB) RX errors 0 dropped 39446 overruns 0 frame 0 TX packets 67443 bytes 423670834 (404.0 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 loop txqueuelen 1 (Local Loopback) RX packets 18 bytes 1440 (1.4 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 18 bytes 1440 (1.4 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 so it has interfaces in both the management and the storage subnets (as well as guest). 
From: Jon Marshall Sent: 06 June 2018 11:08 To: users@cloudstack.apache.org Subject: Re: advanced networking with public IPs direct to VMs Hi Rafael Thanks for the help, really appreciate it. So rerunning that command with all servers up - mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is null; Empty set (0.00 sec) mysql> As for the storage IP no I'm not setting it to be the management IP when I setup the zone but the output of the SQL command suggests that is what has happened. As I said to Dag I am using a different subnet for storage ie. 172.30.3.0/26 - management subnet 172.30.4.0/25 - guest VM subnet 172.30.5.0/28 - storage the NFS
Re: advanced networking with public IPs direct to VMs
Don't know whether this helps or not but I logged into the SSVM and ran an ifconfig -

eth0: flags=4163 mtu 1500
      inet 169.254.3.35 netmask 255.255.0.0 broadcast 169.254.255.255
      ether 0e:00:a9:fe:03:23 txqueuelen 1000 (Ethernet)
      RX packets 141 bytes 20249 (19.7 KiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 108 bytes 16287 (15.9 KiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163 mtu 1500
      inet 172.30.3.34 netmask 255.255.255.192 broadcast 172.30.3.63
      ether 1e:00:3b:00:00:05 txqueuelen 1000 (Ethernet)
      RX packets 56722 bytes 4953133 (4.7 MiB)
      RX errors 0 dropped 44573 overruns 0 frame 0
      TX packets 11224 bytes 1234932 (1.1 MiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth2: flags=4163 mtu 1500
      inet 172.30.4.86 netmask 255.255.255.128 broadcast 172.30.4.127
      ether 1e:00:d9:00:00:53 txqueuelen 1000 (Ethernet)
      RX packets 366191 bytes 435300557 (415.1 MiB)
      RX errors 0 dropped 39456 overruns 0 frame 0
      TX packets 145065 bytes 7978602 (7.6 MiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth3: flags=4163 mtu 1500
      inet 172.30.5.14 netmask 255.255.255.240 broadcast 172.30.5.15
      ether 1e:00:cb:00:00:1a txqueuelen 1000 (Ethernet)
      RX packets 132440 bytes 426362982 (406.6 MiB)
      RX errors 0 dropped 39446 overruns 0 frame 0
      TX packets 67443 bytes 423670834 (404.0 MiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73 mtu 65536
      inet 127.0.0.1 netmask 255.0.0.0
      loop txqueuelen 1 (Local Loopback)
      RX packets 18 bytes 1440 (1.4 KiB)
      RX errors 0 dropped 0 overruns 0 frame 0
      TX packets 18 bytes 1440 (1.4 KiB)
      TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

so it has interfaces in both the management and the storage subnets (as well as guest).

From: Jon Marshall
Sent: 06 June 2018 11:08
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi Rafael

Thanks for the help, really appreciate it.
So rerunning that command with all servers up -

mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is null;
Empty set (0.00 sec)

As for the storage IP no I'm not setting it to be the management IP when I setup the zone but the output of the SQL command suggests that is what has happened. As I said to Dag I am using a different subnet for storage ie.

172.30.3.0/26 - management subnet
172.30.4.0/25 - guest VM subnet
172.30.5.0/28 - storage

the NFS server IP is 172.30.5.2

each compute node has 3 NICs with an IP from each subnet (i am assuming the management node only needs an IP in the management network ?)

When I add the zone in the UI I have one physical network with management (cloudbr0), guest (cloudbr1) and storage (cloudbr2). When I fill in the storage traffic page I use the range 172.30.5.10 - 14 as free IPs as I exclude the ones already allocated to the compute nodes and the NFS server.

I think maybe I am doing something wrong in the UI setup but it is not obvious to me what it is. What I might try today unless you want me to keep the setup I have for more outputs is to go back to 2 NICs, one for storage/management and one for guest VMs. I think with the 2 NICs setup the mistake I made last time when adding the zone was to assume storage would just run over management so I did not drag and drop the storage icon and assign it to cloudbr0 as with the management which I think is what I should do ?

From: Rafael Weingärtner
Sent: 06 June 2018 10:54
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Jon, do not panic we are here to help you :)

So, I might have mistyped the SQL query. If you use "select * from cloud.storage_pool where cluster_id = 1 and removed is not null", you are listing the storage pools that have been removed. Therefore, the right query would be "select * from cloud.storage_pool where cluster_id = 1 and removed is null"

There is also something else I do not understand.
You are setting the storage IP in the management subnet? I am not sure if you should be doing like this. Normally, I set all my storages (primary[when working with NFS] and secondary) to IPs in the storage subnet. On Wed, Jun 6, 2018 at 6:49 AM, Dag Sonstebo wrote: > Hi John, > > I’m late to this thread and have possibly missed some things – but a > couple of observations: > > “When I add the zone and get to the storage web page I exclude the IPs > already used for the compute node NICs and the NFS server itself. …..” > “So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10 > -> 172.30.5.14
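Jon's storage range can be sanity-checked arithmetically: a /28 has usable host addresses .1 through .14, the NFS server sits on 172.30.5.2, and the range handed to CloudStack is 172.30.5.10-14. A small sketch with Python's ipaddress module (just a check of the numbers given in the thread):

```python
import ipaddress

# Storage subnet and NFS server IP as described in the thread.
storage_net = ipaddress.ip_network("172.30.5.0/28")
nfs_server = ipaddress.ip_address("172.30.5.2")

# The free-IP range entered on the storage traffic page: .10 to .14.
cs_range = [ipaddress.ip_address("172.30.5.%d" % i) for i in range(10, 15)]

usable = list(storage_net.hosts())        # .1 through .14 for a /28
in_subnet = all(ip in usable for ip in cs_range)
clashes_with_nfs = nfs_server in cs_range

print(len(usable), in_subnet, clashes_with_nfs)  # prints: 14 True False
```

So the range itself is valid: it sits inside the /28 and stays clear of the NFS server's address.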
Re: advanced networking with public IPs direct to VMs
Hi Rafael

Thanks for the help, really appreciate it.

So rerunning that command with all servers up -

mysql> select * from cloud.storage_pool where cluster_id = 1 and removed is null;
Empty set (0.00 sec)

As for the storage IP no I'm not setting it to be the management IP when I setup the zone but the output of the SQL command suggests that is what has happened. As I said to Dag I am using a different subnet for storage ie.

172.30.3.0/26 - management subnet
172.30.4.0/25 - guest VM subnet
172.30.5.0/28 - storage

the NFS server IP is 172.30.5.2

each compute node has 3 NICs with an IP from each subnet (i am assuming the management node only needs an IP in the management network ?)

When I add the zone in the UI I have one physical network with management (cloudbr0), guest (cloudbr1) and storage (cloudbr2). When I fill in the storage traffic page I use the range 172.30.5.10 - 14 as free IPs as I exclude the ones already allocated to the compute nodes and the NFS server.

I think maybe I am doing something wrong in the UI setup but it is not obvious to me what it is. What I might try today unless you want me to keep the setup I have for more outputs is to go back to 2 NICs, one for storage/management and one for guest VMs. I think with the 2 NICs setup the mistake I made last time when adding the zone was to assume storage would just run over management so I did not drag and drop the storage icon and assign it to cloudbr0 as with the management which I think is what I should do ?

From: Rafael Weingärtner
Sent: 06 June 2018 10:54
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Jon, do not panic we are here to help you :)

So, I might have mistyped the SQL query. If you use "select * from cloud.storage_pool where cluster_id = 1 and removed is not null", you are listing the storage pools that have been removed.
Therefore, the right query would be " select * from cloud.storage_pool where cluster_id = 1 and removed is null " There is also something else I do not understand. You are setting the storage IP in the management subnet? I am not sure if you should be doing like this. Normally, I set all my storages (primary[when working with NFS] and secondary) to IPs in the storage subnet. On Wed, Jun 6, 2018 at 6:49 AM, Dag Sonstebo wrote: > Hi John, > > I’m late to this thread and have possibly missed some things – but a > couple of observations: > > “When I add the zone and get to the storage web page I exclude the IPs > already used for the compute node NICs and the NFS server itself. …..” > “So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10 > -> 172.30.5.14.” > > I think you may have some confusion around the use of the storage network. > The important part here is to understand this is for *secondary storage* > use only – it has nothing to do with primary storage. This means this > storage network needs to be accessible to the SSVM, to the hypervisors, and > secondary storage NFS pools needs to be accessible on this network. > > The important part – this also means you *can not use the same IP ranges > for management and storage networks* - doing so means you will have issues > where effectively both hypervisors and SSVM can see the same subnet on two > NICs – and you end up in a routing black hole. > > So – you need to either: > > 1) Use different IP subnets on management and storage, or > 2) preferably just simplify your setup – stop using a secondary storage > network altogether and just allow secondary storage to use the management > network (which is default). Unless you have a very high I/O environment in > production you are just adding complexity by running separate management > and storage. 
> > Regards,
> > Dag Sonstebo
> > Cloud Architect
> > ShapeBlue
> >
> > On 06/06/2018, 10:18, "Jon Marshall" wrote:
> >
> > I will disconnect the host this morning and test but before I do that
> > I ran this command when all hosts are up -
> >
> > select * from cloud.host;
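Dag's warning about not reusing the same IP range for management and storage can be expressed as a one-line overlap check. A sketch with Python's ipaddress module, using Jon's actual subnets from the thread (the "bad_storage" subnet is a made-up counterexample to show the failure case Dag describes):

```python
import ipaddress

# Jon's layout from the thread: management and storage on distinct ranges.
mgmt = ipaddress.ip_network("172.30.3.0/26")
storage = ipaddress.ip_network("172.30.5.0/28")
print(mgmt.overlaps(storage))  # prints: False -> safe, distinct subnets

# The broken case Dag warns about: a storage range carved out of the
# management subnet, so hosts and the SSVM see the same subnet on two
# NICs and traffic falls into a routing black hole.
bad_storage = ipaddress.ip_network("172.30.3.0/28")  # hypothetical
print(mgmt.overlaps(bad_storage))  # prints: True -> overlapping ranges
```

Running this check before filling in the zone wizard's traffic ranges is a cheap way to catch the overlap mistake early.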
Re: advanced networking with public IPs direct to VMs
Hi Dag Thanks for joining in. I did use a separate network for management (172.30.3.0/27) and storage (172.30.5.0/28) when I configured the zone it is just for some reason it is not referencing the 172.30.5.x subnet anywhere in the SQL output. My compute nodes have 3 NICs, one for management, one for guest VM traffic and one for storage, all different subnets and in different vlans on the switch. I also set it up with two NICs just as you suggested with storage/management on one NIC and guest traffic on the other NIC and I got exactly the same result ie. host in "Alert" state and this from logs - 2018-06-04 12:53:45,853 WARN [c.c.h.KVMInvestigator] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent investigation was requested on host Host[-2-Routing], but host does not support investigation because it has no NFS storage. Skipping investigation. 2018-06-04 12:53:45,854 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) KVMInvestigator was able to determine host 2 is in Disconnected 2018-06-04 12:53:45,854 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) The agent from host 2 state determined is Disconnected 2018-06-04 12:53:45,854 WARN [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent is disconnected but the host is still up: 2-dcp-cscn2.local 2018-06-04 12:53:45,854 WARN [o.a.c.alerts] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Host disconnected, name: dcp-cscn2.local (id:2), availability zone: dcp1, pod: dcpp1 the only difference was when I configured the zone I did not have to configure cloudbr2 (for storage) and did not enter any storage traffic IP subnet range. 
I know it is something stupid I am doing 😊

From: Dag Sonstebo
Sent: 06 June 2018 10:49
To: users@cloudstack.apache.org
Subject: Re: advanced networking with public IPs direct to VMs

Hi Jon,

I'm late to this thread and have possibly missed some things – but a couple of observations:

"When I add the zone and get to the storage web page I exclude the IPs already used for the compute node NICs and the NFS server itself. ….."

"So the range is 172.30.5.1 -> 15 and the range I fill in is 172.30.5.10 -> 172.30.5.14."

I think you may have some confusion around the use of the storage network. The important part here is to understand this is for *secondary storage* use only – it has nothing to do with primary storage. This means this storage network needs to be accessible to the SSVM and to the hypervisors, and the secondary storage NFS pools need to be accessible on this network.

The other important part – this also means you *cannot use the same IP ranges for the management and storage networks*. Doing so means you will have issues where effectively both the hypervisors and the SSVM can see the same subnet on two NICs – and you end up in a routing black hole.

So you need to either: 1) use different IP subnets for management and storage, or 2) preferably just simplify your setup – stop using a secondary storage network altogether and just allow secondary storage to use the management network (which is the default). Unless you have a very high I/O environment in production you are just adding complexity by running separate management and storage networks.
Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 06/06/2018, 10:18, "Jon Marshall" wrote:

I will disconnect the host this morning and test but before I do that I ran this command when all hosts are up -

select * from cloud.host;

| id | name | uuid | status | type | private_ip_address | private_netmask | private_mac_address | storage_ip_address | storage_netmask | storage_mac_address | storage_ip_address_2 | storage_mac_address_2 | storage_netmask_2 | cluster_id | public_ip_address | public_netmask | public_mac_address | proxy_port | data_center_id | pod_id | cpu_sockets | cpus | speed | url |
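[Editorial aside: Dag's warning about reusing one IP range for the management and storage networks can be checked mechanically before zone creation. A small sketch using Python's standard ipaddress module; the helper function is illustrative, not part of CloudStack.]

```python
import ipaddress

def subnets_overlap(a: str, b: str) -> bool:
    """Return True if two CIDR subnets share any addresses."""
    net_a = ipaddress.ip_network(a, strict=False)
    net_b = ipaddress.ip_network(b, strict=False)
    return net_a.overlaps(net_b)

# Jon's zone uses distinct subnets, so these should not overlap:
print(subnets_overlap("172.30.3.0/27", "172.30.5.0/28"))   # False

# A misconfigured zone reusing part of one range would trip the check:
print(subnets_overlap("172.30.3.0/26", "172.30.3.32/27"))  # True
```

Running a check like this against every pair of planned traffic subnets catches the "same subnet on two NICs" routing black hole before the SSVM ever comes up.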
Re: advanced networking with public IPs direct to VMs
Select * from storage_pool where cluster_id = and removed is not null

Can you run that SQL to see what it returns when your hosts are marked as disconnected?

On Tue, Jun 5, 2018 at 11:32 AM, Jon Marshall wrote:

> I reran the tests with the 3 NIC setup. When I configured the zone through the UI I used the labels cloudbr0 for management, cloudbr1 for guest traffic and cloudbr2 for NFS as per my original response to you.
>
> When I pull the power to the node (dcp-cscn2.local) after about 5 mins the host status goes to "Alert" but never to "Down"
>
> I get this in the logs -
>
> 2018-06-05 15:17:14,382 WARN [c.c.h.KVMInvestigator] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was requested on host Host[-4-Routing], but host does not support investigation because it has no NFS storage. Skipping investigation.
> 2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to determine host 4 is in Disconnected
> 2018-06-05 15:17:14,382 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state determined is Disconnected
> 2018-06-05 15:17:14,382 WARN [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but the host is still up: 4-dcp-cscn2.local
>
> I don't understand why it thinks there is no NFS storage as each compute node has a dedicated storage NIC.
>
> I also don't understand why it thinks the host is still up ie. what test is it doing to determine that ?
>
> Am I just trying to get something working that is not supported ?
>
> From: Rafael Weingärtner
> Sent: 04 June 2018 15:31
> To: users
> Subject: Re: advanced networking with public IPs direct to VMs
>
> What type of failover are you talking about?
> What ACS version are you using?
> What hypervisor are you using?
> How are you configuring your NICs in the hypervisor?
> How are you configuring the traffic labels in ACS? > > On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall > wrote: > > > Hi all > > > > > > I am close to giving up on basic networking as I just cannot get failover > > working with multiple NICs (I am not even sure it is supported). > > > > > > What I would like is to use 3 NICs for management, storage and guest > > traffic. I would like to assign public IPs direct to the VMs which is > why I > > originally chose basic. > > > > > > If I switch to advanced networking do I just configure a guest VM with > > public IPs on one NIC and not both with the public traffic - > > > > > > would this work ? > > > > > > -- > Rafael Weingärtner > -- Rafael Weingärtner
Re: advanced networking with public IPs direct to VMs
No problem. I am leaving work now but will test first thing tomorrow and get back to you. I definitely have NFS storage as far as I can tell !

From: Rafael Weingärtner
Sent: 05 June 2018 16:13
To: users
Subject: Re: advanced networking with public IPs direct to VMs

That is interesting. Let's see the source of all truth... This is the code that is generating that odd message.

> List<StoragePoolVO> clusterPools = _storagePoolDao.listPoolsByCluster(agent.getClusterId());
> boolean hasNfs = false;
> for (StoragePoolVO pool : clusterPools) {
>     if (pool.getPoolType() == StoragePoolType.NetworkFilesystem) {
>         hasNfs = true;
>         break;
>     }
> }
> if (!hasNfs) {
>     s_logger.warn("Agent investigation was requested on host " + agent + ", but host does not support investigation because it has no NFS storage. Skipping investigation.");
>     return Status.Disconnected;
> }

There are two possibilities here. You do not have any NFS storage? Is that the case? Or maybe, for some reason, the call "_storagePoolDao.listPoolsByCluster(agent.getClusterId())" is not returning any NFS storage pools. Looking at "listPoolsByCluster" we will see that the following SQL is used:

Select * from storage_pool where cluster_id = and removed is not null

Can you run that SQL to see what it returns when your hosts are marked as disconnected?

On Tue, Jun 5, 2018 at 11:32 AM, Jon Marshall wrote:

> I reran the tests with the 3 NIC setup. When I configured the zone through the UI I used the labels cloudbr0 for management, cloudbr1 for guest traffic and cloudbr2 for NFS as per my original response to you.
>
> When I pull the power to the node (dcp-cscn2.local) after about 5 mins the host status goes to "Alert" but never to "Down"
>
> I get this in the logs -
>
> 2018-06-05 15:17:14,382 WARN [c.c.h.KVMInvestigator] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was requested on host Host[-4-Routing], but host does not support investigation because it has no NFS storage.
Skipping investigation. > 2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl] > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to > determine host 4 is in Disconnected > 2018-06-05 15:17:14,382 INFO [c.c.a.m.AgentManagerImpl] > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state > determined is Disconnected > 2018-06-05 15:17:14,382 WARN [c.c.a.m.AgentManagerImpl] > (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but > the host is still up: 4-dcp-cscn2.local > > I don't understand why it thinks there is no NFS storage as each compute > node has a dedicated storage NIC. > > > I also don't understand why it thinks the host is still up ie. what test > is it doing to determine that ? > > > Am I just trying to get something working that is not supported ? > > > > From: Rafael Weingärtner > Sent: 04 June 2018 15:31 > To: users > Subject: Re: advanced networking with public IPs direct to VMs > > What type of failover are you talking about? > What ACS version are you using? > What hypervisor are you using? > How are you configuring your NICs in the hypervisor? > How are you configuring the traffic labels in ACS? > > On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall > wrote: > > > Hi all > > > > > > I am close to giving up on basic networking as I just cannot get failover > > working with multiple NICs (I am not even sure it is supported). > > > > > > What I would like is to use 3 NICs for management, storage and guest > > traffic. I would like to assign public IPs direct to the VMs which is > why I > > originally chose basic. > > > > > > If I switch to advanced networking do I just configure a guest VM with > > public IPs on one NIC and not both with the public traffic - > > > > > > would this work ? > > > > > > -- > Rafael Weingärtner > -- Rafael Weingärtner
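[Editorial aside: Rafael's quoted Java boils down to a single predicate over the cluster's primary storage pools - investigation only proceeds if at least one pool is NFS. A minimal Python rendering of that decision; the dict-based pool records are an assumption for illustration, not CloudStack's actual data model.]

```python
def can_investigate(cluster_pools):
    """Mirror of KVMInvestigator's guard: a KVM host can only be
    investigated when its cluster has at least one NFS primary pool."""
    return any(p.get("pool_type") == "NetworkFilesystem" for p in cluster_pools)

# A cluster with an NFS primary pool is investigable:
print(can_investigate([{"id": 1, "pool_type": "NetworkFilesystem"}]))  # True

# No pools returned -> investigator falls through to "Disconnected":
print(can_investigate([]))  # False
```

This is why the dedicated storage NIC is irrelevant to the warning: the check is against the storage_pool rows for the cluster, not against the host's network configuration.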
Re: advanced networking with public IPs direct to VMs
I reran the tests with the 3 NIC setup. When I configured the zone through the UI I used the labels cloudbr0 for management, cloudbr1 for guest traffic and cloudbr2 for NFS as per my original response to you. When I pull the power to the node (dcp-cscn2.local) after about 5 mins the host status goes to "Alert" but never to "Down" I get this in the logs - 2018-06-05 15:17:14,382 WARN [c.c.h.KVMInvestigator] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent investigation was requested on host Host[-4-Routing], but host does not support investigation because it has no NFS storage. Skipping investigation. 2018-06-05 15:17:14,382 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) KVMInvestigator was able to determine host 4 is in Disconnected 2018-06-05 15:17:14,382 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) The agent from host 4 state determined is Disconnected 2018-06-05 15:17:14,382 WARN [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-f4da4dc9) (logid:138e9a93) Agent is disconnected but the host is still up: 4-dcp-cscn2.local I don't understand why it thinks there is no NFS storage as each compute node has a dedicated storage NIC. I also don't understand why it thinks the host is still up ie. what test is it doing to determine that ? Am I just trying to get something working that is not supported ? From: Rafael Weingärtner Sent: 04 June 2018 15:31 To: users Subject: Re: advanced networking with public IPs direct to VMs What type of failover are you talking about? What ACS version are you using? What hypervisor are you using? How are you configuring your NICs in the hypervisor? How are you configuring the traffic labels in ACS? On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall wrote: > Hi all > > > I am close to giving up on basic networking as I just cannot get failover > working with multiple NICs (I am not even sure it is supported). 
> > > What I would like is to use 3 NICs for management, storage and guest > traffic. I would like to assign public IPs direct to the VMs which is why I > originally chose basic. > > > If I switch to advanced networking do I just configure a guest VM with > public IPs on one NIC and not both with the public traffic - > > > would this work ? > -- Rafael Weingärtner
Re: advanced networking with public IPs direct to VMs
I think I do know what it means. Let me build it with 3 separate NICs again and rerun.

From: Rafael Weingärtner
Sent: 04 June 2018 15:31
To: users
Subject: Re: advanced networking with public IPs direct to VMs

What type of failover are you talking about?
What ACS version are you using?
What hypervisor are you using?
How are you configuring your NICs in the hypervisor?
How are you configuring the traffic labels in ACS?

On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall wrote:

> Hi all
>
> I am close to giving up on basic networking as I just cannot get failover working with multiple NICs (I am not even sure it is supported).
>
> What I would like is to use 3 NICs for management, storage and guest traffic. I would like to assign public IPs direct to the VMs which is why I originally chose basic.
>
> If I switch to advanced networking do I just configure a guest VM with public IPs on one NIC and not both with the public traffic -
>
> would this work ?

-- Rafael Weingärtner
Re: advanced networking with public IPs direct to VMs
Update to this. I ran the all-on-one-NIC test again and it does report as "Down" in the UI, as opposed to "Alert" when using multiple NICs. Looking at the management server log this seems to be the key part -

1) from the single NIC logs -

2018-06-04 10:17:10,967 DEBUG [c.c.h.KVMInvestigator] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Neighbouring host:5 returned status:Down for the investigated host:4
2018-06-04 10:17:10,967 DEBUG [c.c.h.KVMInvestigator] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) HA: HOST is ineligible legacy state Down for host 4
2018-06-04 10:17:10,967 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) KVMInvestigator was able to determine host 4 is in Down
2018-06-04 10:17:10,967 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) The agent from host 4 state determined is Down
2018-06-04 10:17:10,967 ERROR [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-8627b348) (logid:ef7b8230) Host is down: 4-dcp-cscn2.local. Starting HA on the VMs

2) from the setup with 2 NICs (management/storage on one NIC, guest traffic on the other) -

2018-06-04 12:53:45,853 WARN [c.c.h.KVMInvestigator] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent investigation was requested on host Host[-2-Routing], but host does not support investigation because it has no NFS storage. Skipping investigation.
2018-06-04 12:53:45,854 DEBUG [c.c.h.HighAvailabilityManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) KVMInvestigator was able to determine host 2 is in Disconnected
2018-06-04 12:53:45,854 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) The agent from host 2 state determined is Disconnected
2018-06-04 12:53:45,854 WARN [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Agent is disconnected but the host is still up: 2-dcp-cscn2.local
2018-06-04 12:53:45,854 WARN [o.a.c.alerts] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) AlertType:: 7 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: Host disconnected, name: dcp-cscn2.local (id:2), availability zone: dcp1, pod: dcpp1
2018-06-04 12:53:45,858 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Host 2 is disconnecting with event AgentDisconnected
2018-06-04 12:53:45,858 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) The next status of agent 2is Alert, current status is Up
2018-06-04 12:53:45,858 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Deregistering link for 2 with state Alert
2018-06-04 12:53:45,859 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-3:ctx-0aed2673) (logid:32aaef2a) Remove Agent : 2

I don't know what it means by "host has no NFS storage" but you can see it never marks the failed node as down. Any ideas ?

From: Rafael Weingärtner
Sent: 04 June 2018 21:15
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Everything seems to be normal at a first glance. Do you see some sort of error in the log files?

On Mon, Jun 4, 2018 at 11:39 AM, Jon Marshall wrote:

> CS version 4.11
>
> VM HA at the moment (not Host HA as yet)
>
> KVM
>
> For the management node just one NIC - 172.30.3.2/26 assigned to the physical NIC.
> > > For the compute nodes - > > > 3 NICs so as an example from one compute node - > > > ifcfg-eth0 > > BRIDGE=cloudbr0 > > > ifcfg-eth1 > > BRIDGE=cloudbr1 > > > ifcfg-eth2 > > BRIDGE=cloudbr2 > > > then the 3 bridges - > > > ifcfg-cloudbr0 > > ip address 172.30.3.3/26<--- management network > > > if-cloudbr1 > > ip address 172.30.4.3/25 <-- guest traffic > > gateway 172.30.4.1 > > > > ifcfg-cloubr2 > > ip address 172.30.5.3 /28 <-- storage traffic > > > traffic labels would be cloudbr0, cloudbr1, cloudbr2 > > > Can only get failover working when I put all traffic on same NIC. > > > > > From: Rafael Weingärtner > Sent: 04 June 2018 15:31 > To: users > Subject: Re: advanced networking with public IPs direct to VMs > > What type of failover are you talking about? > What version are you using? > What hypervisor are you using? > How are you configuring your NICs in the hypervisor? > How are you configuring the traffic labels in ACS? > > On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall > wrote: > > > Hi all > > > > > > I am close to giving up on basic networking as I just cannot get failover > > working with multiple NICs (I am not even sure it is supported). > > > > > > What I would like is to use 3 NICs for management, storage and guest >
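[Editorial aside: comparing the two log excerpts above comes down to the "The agent from host N state determined is X" lines. A throwaway parser along these lines pulls them out for a side-by-side diff; the regex assumes the exact log format quoted in this thread.]

```python
import re

STATE_RE = re.compile(r"The agent from host (\d+) state determined is (\w+)")

def host_states(log_lines):
    """Extract (host_id, state) pairs from management-server log lines."""
    return [(int(m.group(1)), m.group(2))
            for line in log_lines
            if (m := STATE_RE.search(line))]

lines = [
    "2018-06-04 10:17:10,967 INFO [c.c.a.m.AgentManagerImpl] The agent from host 4 state determined is Down",
    "2018-06-04 12:53:45,854 INFO [c.c.a.m.AgentManagerImpl] The agent from host 2 state determined is Disconnected",
]
print(host_states(lines))  # [(4, 'Down'), (2, 'Disconnected')]
```

"Down" triggers "Starting HA on the VMs"; "Disconnected" only raises an alert, which matches the behaviour Jon is seeing with multiple NICs.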
Re: advanced networking with public IPs direct to VMs
No, watching the management server logs when I pull the power on one of the compute nodes it recognises the host is not responding to a ping and eventually marks the host status as "Alert" in the UI, but it never tries to migrate the VMs that were running on the node.

From memory, when I put everything on one NIC (management, storage and guest traffic) the host status is marked as "Down", not "Alert", which makes me think there is something not supported with multiple NICs and failover. It is almost as though with multiple NICs the manager knows that there is a problem with the node but cannot definitely say it is down, and so it cannot migrate the VM in case it is still running on that node.

I have been at this for well over a month now (off and on) and apart from when I used a single NIC, VM HA has never worked. If the configuration I have posted looks okay then maybe it is just not supported, unless of course you know differently ? I did think it may be the default gateway being set to the guest VM subnet, but if I don't do this then the SSVM has issues with communication.

I am going to do a side-by-side comparison of the management server logs for single NIC vs dual NICs (management/storage on one NIC, the other NIC for guest VMs) and see if there is anything obvious that stands out.

That aside, if I can't get this working then can I just assign a public IP subnet to the guest VMs when setting up advanced networking, and if so how does it then in effect bypass the virtual router (in terms of NAT), or do I not need to worry about this ?

Thanks

From: Rafael Weingärtner
Sent: 04 June 2018 21:15
To: users
Subject: Re: advanced networking with public IPs direct to VMs

Everything seems to be normal at a first glance. Do you see some sort of error in the log files?

On Mon, Jun 4, 2018 at 11:39 AM, Jon Marshall wrote:

> CS version 4.11
>
> VM HA at the moment (not Host HA as yet)
>
> KVM
>
> For the management node just one NIC - 172.30.3.2/26 assigned to the physical NIC.
> > > For the compute nodes - > > > 3 NICs so as an example from one compute node - > > > ifcfg-eth0 > > BRIDGE=cloudbr0 > > > ifcfg-eth1 > > BRIDGE=cloudbr1 > > > ifcfg-eth2 > > BRIDGE=cloudbr2 > > > then the 3 bridges - > > > ifcfg-cloudbr0 > > ip address 172.30.3.3/26<--- management network > > > if-cloudbr1 > > ip address 172.30.4.3/25 <-- guest traffic > > gateway 172.30.4.1 > > > > ifcfg-cloubr2 > > ip address 172.30.5.3 /28 <-- storage traffic > > > traffic labels would be cloudbr0, cloudbr1, cloudbr2 > > > Can only get failover working when I put all traffic on same NIC. > > > > > From: Rafael Weingärtner > Sent: 04 June 2018 15:31 > To: users > Subject: Re: advanced networking with public IPs direct to VMs > > What type of failover are you talking about? > What version are you using? > What hypervisor are you using? > How are you configuring your NICs in the hypervisor? > How are you configuring the traffic labels in ACS? > > On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall > wrote: > > > Hi all > > > > > > I am close to giving up on basic networking as I just cannot get failover > > working with multiple NICs (I am not even sure it is supported). > > > > > > What I would like is to use 3 NICs for management, storage and guest > > traffic. I would like to assign public IPs direct to the VMs which is > why I > > originally chose basic. > > > > > > If I switch to advanced networking do I just configure a guest VM with > > public IPs on one NIC and not both with the public traffic - > > > > > > would this work ? > > > > > > -- > Rafael Weingärtner > -- Rafael Weingärtner
Re: advanced networking with public IPs direct to VMs
CS version 4.11

VM HA at the moment (not Host HA as yet)

KVM

For the management node just one NIC - 172.30.3.2/26 assigned to the physical NIC.

For the compute nodes - 3 NICs, so as an example from one compute node -

ifcfg-eth0
BRIDGE=cloudbr0

ifcfg-eth1
BRIDGE=cloudbr1

ifcfg-eth2
BRIDGE=cloudbr2

then the 3 bridges -

ifcfg-cloudbr0
ip address 172.30.3.3/26 <-- management network

ifcfg-cloudbr1
ip address 172.30.4.3/25 <-- guest traffic
gateway 172.30.4.1

ifcfg-cloudbr2
ip address 172.30.5.3/28 <-- storage traffic

traffic labels would be cloudbr0, cloudbr1, cloudbr2

Can only get failover working when I put all traffic on the same NIC.

From: Rafael Weingärtner
Sent: 04 June 2018 15:31
To: users
Subject: Re: advanced networking with public IPs direct to VMs

What type of failover are you talking about?
What version are you using?
What hypervisor are you using?
How are you configuring your NICs in the hypervisor?
How are you configuring the traffic labels in ACS?

On Mon, Jun 4, 2018 at 11:29 AM, Jon Marshall wrote:

> Hi all
>
> I am close to giving up on basic networking as I just cannot get failover working with multiple NICs (I am not even sure it is supported).
>
> What I would like is to use 3 NICs for management, storage and guest traffic. I would like to assign public IPs direct to the VMs which is why I originally chose basic.
>
> If I switch to advanced networking do I just configure a guest VM with public IPs on one NIC and not both with the public traffic -
>
> would this work ?

-- Rafael Weingärtner
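[Editorial aside: one mismatch that produces exactly this kind of "Alert" limbo is a traffic label that doesn't correspond to an actual bridge on a host. A quick sanity check, written as a pure function so it can be fed the output of os.listdir("/sys/class/net") from a real node; the function name and shape are my own, not a CloudStack tool.]

```python
def missing_bridges(traffic_labels, host_interfaces):
    """Return the traffic labels that have no matching bridge/interface
    among the host's network devices."""
    present = set(host_interfaces)
    return [label for label in traffic_labels if label not in present]

# On a real node, host_interfaces could come from os.listdir("/sys/class/net").
labels = ["cloudbr0", "cloudbr1", "cloudbr2"]
print(missing_bridges(labels, ["eth0", "eth1", "eth2", "cloudbr0", "cloudbr1"]))
# ['cloudbr2'] -> the storage label points at a bridge that was never created
```

Running this on each compute node against the labels entered in the zone wizard confirms every label resolves to a real bridge before blaming the HA logic.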
Re: advanced networking with public IPs direct to VMs
Sorry that should say "not bother with the public traffic" ____ From: Jon Marshall Sent: 04 June 2018 15:29 To: users@cloudstack.apache.org Subject: advanced networking with public IPs direct to VMs Hi all I am close to giving up on basic networking as I just cannot get failover working with multiple NICs (I am not even sure it is supported). What I would like is to use 3 NICs for management, storage and guest traffic. I would like to assign public IPs direct to the VMs which is why I originally chose basic. If I switch to advanced networking do I just configure a guest VM with public IPs on one NIC and not both with the public traffic - would this work ?
advanced networking with public IPs direct to VMs
Hi all I am close to giving up on basic networking as I just cannot get failover working with multiple NICs (I am not even sure it is supported). What I would like is to use 3 NICs for management, storage and guest traffic. I would like to assign public IPs direct to the VMs which is why I originally chose basic. If I switch to advanced networking do I just configure a guest VM with public IPs on one NIC and not both with the public traffic - would this work ?
Re: 4.11 without Host-HA framework
As mentioned, if I use just the one NIC for all traffic then VM HA works. I have been using this document to understand CS network concepts -

https://www.shapeblue.com/understanding-cloudstacks-physical-networking-architecture/

I have been assuming that the manager node only needs an interface in the management network, and it is on the compute nodes that I have split the traffic across 3 NICs as per the above doc. Does the manager need NICs in the other networks as well ?

Jon

From: Paul Angus
Sent: 25 May 2018 07:37
To: users@cloudstack.apache.org
Subject: RE: 4.11 without Host-HA framework

I'm on leave next week, but I'll pick this up again when I'm back ...

paul.an...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue

-----Original Message-----
From: Jon Marshall
Sent: 24 May 2018 11:20
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Parth

I remember you saying this worked for you in a previous thread. I am beginning to wonder if it is the fact I have used 3 separate NICs, one for management, one for the VM traffic and the third for storage, that means I am not seeing the behaviour you saw. That is why I too would like to understand exactly what is talking to what and doing checks, for both non Host-HA and Host-HA.

I did get failover working in some scenarios with Host-HA and OOBM using IPMI, but it was slow even after tweaking the timers, e.g. for a crashed host the best time I got was around 8 minutes, which seems a long time but perhaps that is an acceptable time for CS, I just don't know. Not expecting it to be instantaneous as it needs to do checks etc.
Jon From: Parth Patel Sent: 24 May 2018 06:52 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework Hi Jon and Angus, I did not shutdown the VMs as Yiping Zhang said, but I have confirmed this and discussed earlier in the users list that my HA-enabled VMs got started on another suitable available host in the cluster even when I didn't have IPMI-enabled hardware and did no configuration for OOBM and Host-HA. I simply pulled the ethernet cable connecting the host to entire network (I did use just one NIC) and according to the value set in ping timeout event, the HA-enabled VMs were restarted on another available host. I tested the scenario using both the scenarios: the echo command as well as good old plugging out the NIC from the host. My VMs were successfully started on another available host after CS manager confirmed they were not reachable. I too want to understand how the failover mechanism in CloudStack actually works. I used ACS 4.11 packages available here: http://cloudstack.apt-get.eu/centos/7/4.11/ Regards, Parth Patel On Thu, 24 May 2018 at 10:53 Paul Angus wrote: > I'm afraid that is not a host crash. When shutting down the guest OS, > the CloudStack agent on the host is still able to report to the > management server that the VM has stopped. > > This is my point. VM-HA relies on the management sever communication > with the host agent. > > Kind regards, > > Paul Angus > > paul.an...@shapeblue.com > www.shapeblue.com<http://www.shapeblue.com> > 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue > > > > > -Original Message- > From: Yiping Zhang > Sent: 24 May 2018 00:44 > To: users@cloudstack.apache.org > Subject: Re: 4.11 without Host-HA framework > > I can say for fact that VM's using a HA enabled service offering will be > restarted by CS on another host, assuming there are enough > capacity/resources in the cluster, when their original host crashes, > regardless that host comes back or not. 
> > The simplest way to test VM HA feature with a VM instance using HA enabled > service offering is to issue shutdown command in guest OS, and watching it > gets restarted by CS manager. > > On 5/23/18, 1:23 PM, "Paul Angus" wrote: > > Hi Jon, > > Don't worry, TBH I'm dubious about those claiming to have VM-HA > working when a host crashes (but doesn't restart). > I'll check in with the guys that set values for host-ha when testing, > to see which ones they change and what they set them to. > > paul.an...@shapeblue.com > www.shapeblue.com<http://www.shapeblue.com> > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > @shapeblue > > > > > -Original Message- > From: Jon Marshall > Sent: 23 May 2018 21:10 > To: users@cloudstack.apache.org > Subject: Re: 4.11 without Host-HA framework > > Rohit / Paul > > > Thanks again for answering. > > > I am a Cisco guy w
Re: 4.11 without Host-HA framework
Update on this. I put everything (management, storage and guest VMs) on a single NIC, so all in the same subnet, and VM HA failover worked. It took about 6 1/2 minutes with default timers before the VM was responding to a ping after being migrated. So it looks like it is something with the network setup I am doing.

The manager node has just a single NIC in the management subnet - 172.30.3.0/27 - and the IP is assigned directly to the NIC.

Each compute node has -

1) a NIC in the management subnet - 172.30.3.0/27
2) a NIC in the guest VM subnet - 172.30.4.0/25
3) a NIC in the storage subnet - 172.30.5.0/28 (the NFS server is also in this subnet)

None of the NICs are vlan aware but the ports they connect to on the switch are in different vlans. 3 bridges are used on each node - cloudbr0 for management, cloudbr1 for guest VMs and cloudbr2 for storage. Only the ifcfg-cloudbr1 configuration references a default gateway, because I read somewhere that is what should be used and I seem to remember I had trouble with the SSVM until I did this.

When setting up the cloud I exclude the already-used management IPs on the nodes from the range you enter, as I had issues with the system VMs picking up IPs already in use. Same reasoning behind storage, i.e. I exclude the IPs already used for the NFS server and compute nodes.

Can anyone see where any of the above could be causing an issue ?

Many thanks for any help given

From: Rohit Yadav
Sent: 23 May 2018 10:45
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Jon,

In the VM's compute offering, make sure that HA is ticked/enabled. Then use that HA-enabled VM offering while deploying a VM.

Around testing - it depends how you're crashing. In case of KVM, you can try to cause a host crash (example: echo c > /proc/sysrq-trigger) and see if HA-enabled VMs get started on a different host.
- Rohit <https://cloudstack.apache.org> [https://cloudstack.apache.org/images/monkey-144.png]<https://cloudstack.apache.org/> Apache CloudStack: Open Source Cloud Computing<https://cloudstack.apache.org/> cloudstack.apache.org CloudStack is open source cloud computing software for creating, managing, and deploying infrastructure cloud services ____ From: Jon Marshall Sent: Tuesday, May 22, 2018 8:28:06 PM To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework Hi Rohit Thanks for responding. I have not had much luck with HA at all. I crash a server and nothing happens in terms of VMs migrating to another host. Monitoring the management log file it seems the management server recognises the host has stopped responding to pings but doesn't think it has to do anything. I am currently running v4.11 with basic network but 3 separate NICs, one for management, one for storage and one for VMs themselves. Should it make it any difference ie. would it be worth trying to run management and storage over the same NIC ? I am just lost as to why I see no failover at all whereas others are reporting it works fine. Jon From: Rohit Yadav Sent: 22 May 2018 12:12 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework Hi Jon, Yes, Host-HA is different from VM-HA and without Host HA enabled a HA enabled VM should be recovered/run on a different host when it crashes. Historically the term 'HA' in CloudStack is used around high availability of a VM. Host HA as the name tries to imply is around HA of a physical hypervisor host by means of out-of-band management technologies such as ipmi and currently supporting ipmi as OOBM and KVM hosts with NFS storage. 
- Rohit

From: Jon Marshall
Sent: Monday, May 21, 2018 8:36:04 PM
To: users@cloudstack.apache.org
Subject: 4.11 without Host-HA framework

I keep seeing conflicting information about this in the mailing lists and in blogs etc.

If I run 4.11 without enabling the Host HA framework, should HA still work if I crash a compute node? My understanding was that the new framework was added for certain cases only. It doesn't work for me, but I can find a number of people saying you don't need to enable the new framework for it to work.

Thanks

Jon

rohit.ya...@shapeblue.com
www.shapeblue.com<http://www.shapeblue.com>
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue
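[Editorial aside: the IP-range bookkeeping described in this thread - excluding the NFS server's and compute nodes' addresses from the range entered in the zone wizard - can be sketched with Python's standard ipaddress module. The helper name is hypothetical.]

```python
import ipaddress

def free_range(subnet, used):
    """List the host addresses in `subnet` that are not already taken."""
    used_ips = {ipaddress.ip_address(u) for u in used}
    return [str(h) for h in ipaddress.ip_network(subnet).hosts()
            if h not in used_ips]

# Storage subnet with the NFS server and three compute-node NICs already used:
available = free_range("172.30.5.0/28",
                       ["172.30.5.1", "172.30.5.2",
                        "172.30.5.3", "172.30.5.4"])
print(available)  # 172.30.5.5 through 172.30.5.14
```

Entering only addresses from the computed free list avoids the system VMs picking up IPs that are already in use.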
Re: 4.11 without Host-HA framework
Hi Rohit In an attempt to make things simpler I am now running management and storage (NFS) across the same NIC, with a separate NIC for the guest VMs. So basic networking: one subnet for management/storage and a different one for guest VMs, which means two bridges. I am also just testing VM HA (not Host HA at present) with 1 manager and 3 compute nodes. I crash a compute node or pull the power on the node and monitor the management server log. It reports the ping timeouts, and after ping interval * ping timeout it marks the host as state Alert in the UI. So far so good. But it never tries to migrate the VM running on the crashed node. Not a single message about attempting to restart, nothing. The VM has been set up with a compute offering with HA enabled. Any thoughts as to why it is not trying to restart the VM on another of the nodes (there is capacity, as one of the nodes has no VMs on it)? The only other thing I can try is to use just one NIC for everything and see if I get anywhere with that. Jon

From: Rohit Yadav Sent: 23 May 2018 10:45 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Jon, In the VM's compute offering, make sure that HA is ticked/enabled. Then use that HA-enabled VM offering while deploying a VM. Around testing - it depends how you're crashing. In the case of KVM, you can try to cause a host crash (example: echo c > /proc/sysrq-trigger) and see if HA-enabled VMs get started on a different host. - Rohit

From: Jon Marshall Sent: Tuesday, May 22, 2018 8:28:06 PM To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Hi Rohit Thanks for responding. I have not had much luck with HA at all. I crash a server and nothing happens in terms of VMs migrating to another host. Monitoring the management log file, it seems the management server recognises that the host has stopped responding to pings but doesn't think it has to do anything. I am currently running v4.11 with basic networking but 3 separate NICs: one for management, one for storage and one for the VMs themselves. Should it make any difference, i.e. would it be worth trying to run management and storage over the same NIC? I am just lost as to why I see no failover at all whereas others are reporting it works fine. Jon

From: Rohit Yadav Sent: 22 May 2018 12:12 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Hi Jon, Yes, Host-HA is different from VM-HA, and without Host HA enabled an HA-enabled VM should be recovered/run on a different host when its host crashes. Historically the term 'HA' in CloudStack is used around high availability of a VM. Host HA, as the name tries to imply, is around HA of a physical hypervisor host by means of out-of-band management technologies such as IPMI; it currently supports IPMI as OOBM and KVM hosts with NFS storage.

From: Jon Marshall Sent: Monday, May 21, 2018 8:36:04 PM To: users@cloudstack.apache.org Subject: 4.11 without Host-HA framework

I keep seeing conflicting information about this in the mailing lists and in blogs etc. If I run 4.11 without enabling the Host HA framework, should HA still work if I crash a compute node? My understanding was that the new framework was added for certain cases only. It doesn't work for me, but I can find a number of people saying you don't need to enable the new framework for it to work. Thanks Jon

rohit.ya...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue
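Jon's "ping interval * ping timeout" observation above can be sketched as a quick calculation. The setting names and defaults used here (ping.interval of 60 seconds, ping.timeout as a 2.5x multiplier) are assumptions based on a typical 4.11 install, not values quoted in this thread - check Global Settings in your own deployment:

```python
# Sketch of the "ping interval * ping timeout" rule for when the
# management server marks a host as Alert. Defaults assumed, not taken
# from the thread: ping.interval = 60 s, ping.timeout = 2.5 (multiplier).

def host_alert_delay(ping_interval_s: float = 60.0,
                     ping_timeout_multiplier: float = 2.5) -> float:
    """Seconds of agent silence before a host is flagged as unresponsive."""
    return ping_interval_s * ping_timeout_multiplier

print(f"Host flagged after ~{host_alert_delay():.0f} seconds")
# prints: Host flagged after ~150 seconds
```

This only covers detection; as the thread shows, a flagged host going to Alert does not by itself restart HA-enabled VMs elsewhere.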
Re: Basic networking setup
So everything on one subnet/vlan except guest traffic, which has its own. Many thanks for that.

From: Ivan Kudryavtsev Sent: 29 May 2018 10:49 To: users Subject: Re: Basic networking setup

Hello, Jon, Basically the following schema is used for a basic zone: 1. system VMs and hardware servers (heads, secondary storages, hypervisors) use a fake net like 10.0.0.0/16 (I also NAT all those nodes through the heads to avoid public IPs, or a separate security appliance can be used); 2. guest network - a separate CIDR is used. I still think that the sentence you cite is correct. Every pod has a dedicated CIDR (pt 2) which is assigned to guest VMs; the same (actually) is true for management, but this is another CIDR (pt 1). Some people also suggest using a separate network for storage, but I don't see advantages for small and medium deployments. Cheers.

2018-05-29 16:12 GMT+07:00 Jon Marshall : > From the 4.11 documentation - > "When basic networking is used, CloudStack will assign IP addresses in the CIDR of the pod to the guests in that pod. The administrator must add a Direct IP range on the pod for this purpose. These IPs are in the same VLAN as the hosts." > It may be the way it is written, but the above suggests that the IP subnet used for guest VM traffic is the same IP subnet used for the actual hosts themselves. > But in the same documentation it says it recommends the use of separate NICs for management and guest traffic. > I have set up CS using separate subnets for management, guest VMs and storage, so 3 separate NICs, each in a different VLAN using a different IP subnet (the NICs are not VLAN aware, just connecting to ports in different VLANs on the switch). > Should I be using just the one IP subnet for all NICs and simply connecting them all to the same bridge instead? > Jon

-- With best regards, Ivan Kudryavtsev Bitworks Software, Ltd. Cell: +7-923-414-1515 WWW: http://bitworks.software/
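Ivan's two-CIDR layout (management/hardware on one fake net, guests on another) can be sanity-checked with the standard library. Only 10.0.0.0/16 comes from his mail; the guest CIDR below is a hypothetical illustration of "separate CIDR used":

```python
import ipaddress

# Management/hardware fake net from Ivan's example; the guest CIDR is a
# made-up illustration of a separate guest network CIDR.
management = ipaddress.ip_network("10.0.0.0/16")
guest = ipaddress.ip_network("10.1.0.0/16")  # hypothetical

# The pod's Direct IP range for guests must not collide with host addresses.
assert not management.overlaps(guest), "management and guest CIDRs overlap"
print(f"{management} and {guest} do not overlap")
```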
Basic networking setup
From the 4.11 documentation - "When basic networking is used, CloudStack will assign IP addresses in the CIDR of the pod to the guests in that pod. The administrator must add a Direct IP range on the pod for this purpose. These IPs are in the same VLAN as the hosts." It may be the way it is written, but the above suggests that the IP subnet used for guest VM traffic is the same IP subnet used for the actual hosts themselves. But in the same documentation it says it recommends the use of separate NICs for management and guest traffic. I have set up CS using separate subnets for management, guest VMs and storage, so 3 separate NICs, each in a different VLAN using a different IP subnet (the NICs are not VLAN aware, just connecting to ports in different VLANs on the switch). Should I be using just the one IP subnet for all NICs and simply connecting them all to the same bridge instead? Jon
Re: 4.11 without Host-HA framework
Hi Parth I remember you saying this worked for you in a previous thread. I am beginning to wonder if it is the fact I have used 3 separate NICs - one for management, one for the VM traffic and the third for storage - that means I am not seeing the behaviour you saw. That is why I too would like to understand exactly what is talking to what and doing checks, for both non Host-HA and Host-HA. I did get failover working in some scenarios with Host-HA and OOBM using IPMI, but it was slow even after tweaking the timers; e.g. for a crashed host the best time I got was around 8 minutes, which seems a long time, but perhaps that is an acceptable time for CS, I just don't know. Not expecting it to be instantaneous as it needs to do checks etc. Jon

From: Parth Patel Sent: 24 May 2018 06:52 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Hi Jon and Angus, I did not shut down the VMs as Yiping Zhang said, but I have confirmed this and discussed earlier in the users list that my HA-enabled VMs got started on another suitable available host in the cluster even when I didn't have IPMI-enabled hardware and did no configuration for OOBM and Host-HA. I simply pulled the ethernet cable connecting the host to the entire network (I did use just one NIC) and, according to the value set for the ping timeout, the HA-enabled VMs were restarted on another available host. I tested both scenarios: the echo command as well as good old pulling the NIC cable from the host. My VMs were successfully started on another available host after the CS manager confirmed they were not reachable. I too want to understand how the failover mechanism in CloudStack actually works. I used the ACS 4.11 packages available here: http://cloudstack.apt-get.eu/centos/7/4.11/ Regards, Parth Patel

On Thu, 24 May 2018 at 10:53 Paul Angus wrote: > I'm afraid that is not a host crash. When shutting down the guest OS, the > CloudStack agent on the host is still able to report to the management > server that the VM has stopped. > > This is my point. VM-HA relies on the management server's communication with > the host agent. > > Kind regards, > > Paul Angus
>
> -Original Message- > From: Yiping Zhang > Sent: 24 May 2018 00:44 > To: users@cloudstack.apache.org > Subject: Re: 4.11 without Host-HA framework > > I can say for a fact that VMs using an HA-enabled service offering will be > restarted by CS on another host, assuming there is enough > capacity/resources in the cluster, when their original host crashes, > regardless of whether that host comes back or not. > > The simplest way to test the VM HA feature with a VM instance using an HA-enabled > service offering is to issue a shutdown command in the guest OS and watch it > get restarted by the CS manager. > > On 5/23/18, 1:23 PM, "Paul Angus" wrote: > > Hi Jon, > > Don't worry, TBH I'm dubious about those claiming to have VM-HA > working when a host crashes (but doesn't restart). > I'll check in with the guys that set values for Host-HA when testing, > to see which ones they change and what they set them to.
>
> -Original Message- > From: Jon Marshall > Sent: 23 May 2018 21:10 > To: users@cloudstack.apache.org > Subject: Re: 4.11 without Host-HA framework > > Rohit / Paul > > Thanks again for answering. > > I am a Cisco guy with an ex-Unix background but no virtualisation > experience and I can honestly say I have never felt this stupid before 😊 > > I have Cloudstack working but failover is killing me. > > When you say VM HA relies on the host telling CS the VM is down, how > does that work? Because if you crash the host, how does it tell CS anything? > And when you say tell CS, do you mean the CS manager? > > I guess I am just not understanding all the moving parts. I have had > Host HA working (to an extent) although it takes a long time to fail over > even after tweaking the timers, but the fact that I keep finding references > to people saying even without Host HA it should fail over (and mine doesn't) > makes me think I have configured it incorrectly somewhere along the line. > > I have configured a compute offering with HA and I am crashing the > host with the echo command as suggested
Re: 4.11 without Host-HA framework
Rohit / Paul Thanks again for answering. I am a Cisco guy with an ex-Unix background but no virtualisation experience and I can honestly say I have never felt this stupid before 😊 I have Cloudstack working but failover is killing me. When you say VM HA relies on the host telling CS the VM is down, how does that work? Because if you crash the host, how does it tell CS anything? And when you say tell CS, do you mean the CS manager? I guess I am just not understanding all the moving parts. I have had Host HA working (to an extent) although it takes a long time to fail over even after tweaking the timers, but the fact that I keep finding references to people saying even without Host HA it should fail over (and mine doesn't) makes me think I have configured it incorrectly somewhere along the line. I have configured a compute offering with HA and I am crashing the host with the echo command as suggested but still nothing. I understand what you are saying, Paul, about it not being a good idea to rely on VM HA, so I will go back to Host HA and try to speed up failover times. Can I ask, from your experience, what is a realistic failover time for CS if a host fails? Jon

From: Paul Angus Sent: 23 May 2018 19:55 To: users@cloudstack.apache.org Subject: RE: 4.11 without Host-HA framework

Jon, As Rohit says, it is very important to understand the difference between VM HA and Host HA. VM HA relies on the HOST telling CloudStack that the VM is down in order for CloudStack to start it again (wherever that ends up being). Any sequence of events that ends up with VM HA restarting the VM when CloudStack can't contact the host is luck/fluke/unreliable/bad(tm). The purpose of Host HA was to create a reliable mechanism to determine that a host has 'crashed' and that the VMs within it are inoperative, then take appropriate action, including ultimately telling VM HA to restart the VM elsewhere.

paul.an...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue

-Original Message- From: Rohit Yadav Sent: 23 May 2018 10:45 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Jon, In the VM's compute offering, make sure that HA is ticked/enabled. Then use that HA-enabled VM offering while deploying a VM. Around testing - it depends how you're crashing. In the case of KVM, you can try to cause a host crash (example: echo c > /proc/sysrq-trigger) and see if HA-enabled VMs get started on a different host. - Rohit

From: Jon Marshall Sent: Tuesday, May 22, 2018 8:28:06 PM To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Hi Rohit Thanks for responding. I have not had much luck with HA at all. I crash a server and nothing happens in terms of VMs migrating to another host. Monitoring the management log file, it seems the management server recognises that the host has stopped responding to pings but doesn't think it has to do anything. I am currently running v4.11 with basic networking but 3 separate NICs: one for management, one for storage and one for the VMs themselves. Should it make any difference, i.e. would it be worth trying to run management and storage over the same NIC? I am just lost as to why I see no failover at all whereas others are reporting it works fine. Jon

From: Rohit Yadav Sent: 22 May 2018 12:12 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Hi Jon, Yes, Host-HA is different from VM-HA, and without Host HA enabled an HA-enabled VM should be recovered/run on a different host when its host crashes. Historically the term 'HA' in CloudStack is used around high availability of a VM. Host HA, as the name tries to imply, is around HA of a physical hypervisor host by means of out-of-band management technologies such as IPMI; it currently supports IPMI as OOBM and KVM hosts with NFS storage.

From: Jon Marshall Sent: Monday, May 21, 2018 8:36:04 PM To: users@cloudstack.apache.org Subject: 4.11 without Host
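Rohit's advice - make sure HA is enabled on the compute offering, then deploy with that offering - can also be done purely through the CloudStack API. Below is a sketch of the documented request-signing scheme (sort parameters, URL-encode, lowercase the query, HMAC-SHA1 with the secret key, base64); the endpoint, keys and UUIDs are placeholders, not values from this thread:

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign(params: dict, secret_key: str) -> str:
    # CloudStack API signature: sort parameters by name, URL-encode the
    # values, lowercase the whole query string, HMAC-SHA1 it with the
    # account's secret key, then base64-encode the digest.
    query = "&".join(f"{k}={urllib.parse.quote(str(v), safe='*')}"
                     for k, v in sorted(params.items()))
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical deployVirtualMachine call using an HA-enabled offering.
params = {
    "command": "deployVirtualMachine",
    "serviceofferingid": "HA-OFFERING-UUID",  # placeholder
    "templateid": "TEMPLATE-UUID",            # placeholder
    "zoneid": "ZONE-UUID",                    # placeholder
    "apikey": "YOUR-API-KEY",                 # placeholder
    "response": "json",
}
params["signature"] = sign(params, "YOUR-SECRET-KEY")
url = "http://mgmt-server:8080/client/api?" + urllib.parse.urlencode(params)
print(url[:60])
```

The signature is computed over the parameters before it is appended, so any client that sorts and lowercases the same way produces the same value.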
Re: 4.11 without Host-HA framework
Hi Rohit Thanks for responding. I have not had much luck with HA at all. I crash a server and nothing happens in terms of VMs migrating to another host. Monitoring the management log file, it seems the management server recognises that the host has stopped responding to pings but doesn't think it has to do anything. I am currently running v4.11 with basic networking but 3 separate NICs: one for management, one for storage and one for the VMs themselves. Should it make any difference, i.e. would it be worth trying to run management and storage over the same NIC? I am just lost as to why I see no failover at all whereas others are reporting it works fine. Jon

From: Rohit Yadav Sent: 22 May 2018 12:12 To: users@cloudstack.apache.org Subject: Re: 4.11 without Host-HA framework

Hi Jon, Yes, Host-HA is different from VM-HA, and without Host HA enabled an HA-enabled VM should be recovered/run on a different host when its host crashes. Historically the term 'HA' in CloudStack is used around high availability of a VM. Host HA, as the name tries to imply, is around HA of a physical hypervisor host by means of out-of-band management technologies such as IPMI; it currently supports IPMI as OOBM and KVM hosts with NFS storage.

From: Jon Marshall Sent: Monday, May 21, 2018 8:36:04 PM To: users@cloudstack.apache.org Subject: 4.11 without Host-HA framework

I keep seeing conflicting information about this in the mailing lists and in blogs etc. If I run 4.11 without enabling the Host HA framework, should HA still work if I crash a compute node? My understanding was that the new framework was added for certain cases only. It doesn't work for me, but I can find a number of people saying you don't need to enable the new framework for it to work. Thanks Jon

rohit.ya...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue
4.11 without Host-HA framework
I keep seeing conflicting information about this in the mailing lists and in blogs etc. If I run 4.11 without enabling the Host HA framework, should HA still work if I crash a compute node? My understanding was that the new framework was added for certain cases only. It doesn't work for me, but I can find a number of people saying you don't need to enable the new framework for it to work. Thanks Jon
Re: Failover for VMs
Paul I did some more testing today and am not sure what some of the states mean. The first test was the easiest, i.e. "echo c > /proc/sysrq-trigger", which crashes the server. In my setup the VMs on the crashed node never migrate because the server is rebooted and comes back up before CS tries to migrate any VMs; it takes approx 4 mins for the server to recover. The next tests were done with a hard reset on the server and then modifying timers - I did 4 tests, and the quickest I got the VMs to fail over was approx 5 and a half minutes (see below for test details). So I have two questions from all this - 1) why does it go from Suspect to Degraded and back to Suspect once I started changing timers? According to the docs, Degraded means a successful activity check, but the server was down, so it can't have passed. And noticeably, without modifying any timers it never goes to Degraded at all. 2) what in your experience is a reasonable failover time? Thanks for any help you can give.

Tests -

1) default timers: 0:00 Suspect, 9:00 Recovering/Fenced, 10:15 VM migrated

2) kvm.ha.activity.check.max.attempts 3 (default 10): 0:00 Suspect, 2:00 Degraded, 7:00 Suspect, 9:00 Recovering/Fenced, 10:20 VM migrated

3) kvm.ha.activity.check.max.attempts 3 (default 10), kvm.ha.degraded.max.period 120 seconds (default 300): 0:00 Suspect, 2:00 Degraded, 4:00 Suspect, 6:00 Checking/Fenced, 7:21 VM migrated

4) kvm.ha.activity.check.max.attempts 3 (default 10), kvm.ha.degraded.max.period 120 seconds (default 300), kvm.ha.activity.check.interval 30 seconds (default 60): 0:00 Suspect, 1:10 Degraded, 3:10 Suspect, 4:20 Recovering/Fenced, 5:30 VM migrated

From: Jon Marshall Sent: 29 March 2018 09:40 To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Hi Paul I did make some progress with this and seem to remember that after it said Recovered it then went back to Suspect and finally Fenced. I am going to rerun a lot of the tests after changing some of the kvm.ha.* timers to try and speed things up a bit. Will update here after I have run the tests to check if that is what I should be seeing. Many thanks Jon

From: Paul Angus Sent: 28 March 2018 20:01 To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Ah. Did you wait after the node said Recovered? That message is spurious - I've seen it also. It should say 'Recovering' at that time.

From: Jon Marshall Sent: Tuesday, 27 March 2018 10:42 am To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Just as an update to this before I forget what I did :) - I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was no VM failover. Instead HA reported Suspect and then IPMI rebooted the machine; it came back online and the VM started responding to pings again. IPMI is out of band, so that seems to be reasonable behaviour, but no use in testing HA. Next I pulled all 3 NIC cables from the same compute node and again HA reported Suspect. Again IPMI rebooted the host, but then the HA state changed to "Recovered", which I don't understand as the NIC cables were still disconnected, so the VM was not reachable and there was no failover. I don't understand how it can think the node is recovered, as apart from the IPMI out-of-band connection there are no network connections to this server. Finally I pulled the power lead and this time HA went from Suspect to Fencing and then stayed that way. Again no VM failover. This makes sense, as no power means IPMI cannot reboot the server, so it never moves to Fenced, I assume. Again no failover. I am wondering if it is to do with out-of-band IPMI or the way I have the NICs set up. The management node only has one NIC in the management network, but I assume this is okay. I may try reloading with CS v4.9 and just try failover without the new HA KVM to see if I see anything different. Jon

From: Jon Marshall Sent: 27 March 2018 10:10 To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Thanks Paul, will pick up after Easter break. Doing some more testing with Host HA KVM at the moment so any progress will update this thread.

From: Paul Angus
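The four timing runs Jon reports can be expressed as data, which makes the effect of each kvm.ha.* tweak on the total Suspect-to-migration time easier to compare (times copied from his mail; nothing here is measured independently):

```python
# Jon's measured Suspect -> "VM migrated" times for the four test runs.
runs = [
    ("defaults", "10:15"),
    ("check.max.attempts=3", "10:20"),
    ("+ degraded.max.period=120", "7:21"),
    ("+ activity.check.interval=30", "5:30"),
]

def to_seconds(mmss: str) -> int:
    """Convert a mm:ss string to total seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

for name, t in runs:
    print(f"{name:30s} {to_seconds(t):4d} s")
```

The comparison suggests most of the improvement came from shortening the degraded period and the activity-check interval, while lowering max.attempts alone made little difference.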
Re: Failover for VMs
Hi Paul I did make some progress with this and seem to remember that after it said Recovered it then went back to Suspect and finally Fenced. I am going to rerun a lot of the tests after changing some of the kvm.ha.* timers to try and speed things up a bit. Will update here after I have run the tests to check if that is what I should be seeing. Many thanks Jon

From: Paul Angus Sent: 28 March 2018 20:01 To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Ah. Did you wait after the node said Recovered? That message is spurious - I've seen it also. It should say 'Recovering' at that time.

From: Jon Marshall Sent: Tuesday, 27 March 2018 10:42 am To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Just as an update to this before I forget what I did :) - I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was no VM failover. Instead HA reported Suspect and then IPMI rebooted the machine; it came back online and the VM started responding to pings again. IPMI is out of band, so that seems to be reasonable behaviour, but no use in testing HA. Next I pulled all 3 NIC cables from the same compute node and again HA reported Suspect. Again IPMI rebooted the host, but then the HA state changed to "Recovered", which I don't understand as the NIC cables were still disconnected, so the VM was not reachable and there was no failover. I don't understand how it can think the node is recovered, as apart from the IPMI out-of-band connection there are no network connections to this server. Finally I pulled the power lead and this time HA went from Suspect to Fencing and then stayed that way. Again no VM failover. This makes sense, as no power means IPMI cannot reboot the server, so it never moves to Fenced, I assume. Again no failover. I am wondering if it is to do with out-of-band IPMI or the way I have the NICs set up. The management node only has one NIC in the management network, but I assume this is okay. I may try reloading with CS v4.9 and just try failover without the new HA KVM to see if I see anything different. Jon

From: Jon Marshall Sent: 27 March 2018 10:10 To: users@cloudstack.apache.org Subject: Re: Failover for VMs

Thanks Paul, will pick up after Easter break. Doing some more testing with Host HA KVM at the moment so any progress will update this thread.

From: Paul Angus

-Original Message- From: Jon Marshall Sent: 27 March 2018 09:19 To: users@cloudstack.apache.org Subject: Failover for VMs

After 3 weeks of trying multiple different setups I still have not managed to get a VM to fail over between compute nodes and am just running out of ideas. I have 3 compute nodes, each with 3 NICs (management, VM traffic, storage), one management node with just a single NIC connection in the management network, and a separate NFS server. I have tried with and without the new Host HA KVM in CS v4.11 as, from what I have read, even without enabling the new Host HA KVM, when you power off or reboot a compute node your VMs should still migrate. I have tried powering off a compute node, pulling the power lead, and removing the management and NFS network cables, and the management server just seems to carry on as if nothing has happened. Could someone explain exactly how HA is meant to work so I can look at where it is going wrong?
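For reference, here are the Host HA states that come up across this thread, with the two sequences Jon reports. This is a reading aid reconstructed from the emails, not the authoritative CloudStack state machine - there are other states and transitions not mentioned here:

```python
from enum import Enum

class HAState(Enum):
    # States named in the thread; the real state machine has more.
    AVAILABLE = "Available"
    SUSPECT = "Suspect"
    DEGRADED = "Degraded"
    CHECKING = "Checking"
    RECOVERING = "Recovering"
    FENCING = "Fencing"
    FENCED = "Fenced"

# Power pulled: IPMI cannot reboot the host, so it sticks at Fencing.
power_pull = [HAState.SUSPECT, HAState.FENCING]
# Hard reset with default timers: the host is eventually fenced and the VM moves.
hard_reset = [HAState.SUSPECT, HAState.RECOVERING, HAState.FENCED]

print([s.value for s in hard_reset])
# prints: ['Suspect', 'Recovering', 'Fenced']
```

This matches Jon's observation that fencing (and hence VM migration) needs working OOBM: with no power, the fence action can never succeed.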
Re: Failover for VMs
Ok, significant progress made with this and have got Host HA KVM failover working for a number of different scenarios. Will update this thread with tests run etc. and pick up after Easter as suggested by Paul. From: Jon Marshall Sent: 27 March 2018 11:24 To: users@cloudstack.apache.org Subject: Re: Failover for VMs I am just updating as I continue testing - When i pulled the power lead as discussed below it goes from Suspect to Fencing but never gets to Fenced. But when I put the power lead back in to the server CS almost immediately puts that server into maintenance mode and then does migrate t ot sure of the logic but at least I got to see a VM failover ___ From: Jon Marshall Sent: 27 March 2018 10:42 To: users@cloudstack.apache.org Subject: Re: Failover for VMs Just as an update to this before I forget what I did :) - I used "echo c /proc/sysrq-trigger" on one of the compute nodes and there was no VM failover. Instead HA reported suspect and then IPMI rebooted the machine, it came bacVM started responding to pings again. IPMI is out of band so that seems to be reasonable behaviour but no use in testing HA. Next I just pulled all 3 NIC cables from the same compute node and again HA reported suspect. Again IPMI rebooted but then HA state changed to "Recovered" which I don't understand as the NIC cables were still disconnected so VM was not reachable and no failover. I don't understand how it can think the node is recovered as apart from the IPMI out of band connection there are no network connections to this server. Finally pulled power lead and this time HA went from suspect to Fencing and then stayed that way. Again no VM failover. This makes sense as no power means IPMI cannot reboot server so it never moves to Fenced I assume. Again no failover. I am wondering if it is to do with out of band IPMI or the way I have the NICs setup. The management node only has one NIC in the management network but I assume this is okay. 
I may try reloading with CS v4.9 and just try failover without the new HA KVM to see if I see anything different. Jon ________ From: Jon Marshall Sent: 27 March 2018 10:10 To: users@cloudstack.apache.org Subject: Re: Failover for VMs Thanks Paul, will pick up after Easter break. Doing some more testing with HA KVM at the moment so any progress will update this thread i From: Paul Angus http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com<http://www.shapeblue.com> Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com<http://www.shapeblue.com> Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com<http://www.shapeblue.com> Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 
53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Jon Marshall Sent: 27 March 2018 09:19 To: users@cloudstack.apache.org Subject: Failover for VMs After 3 weeks of trying multiple different setups I still have not managed to get a VM to failover between compute nodes and am just running out of ideas. I have 3 compute nodes each with 3 NICs (management, VM traffic, storage), one management node with just a single NIC connection in the management network and a separate NFS server. I have tried with and without the new Host HA KVM in CS v4.11 as from what I have read even without enabling the new Host HA KVM when you power off or reboot a compute node your VMs should still migrate. I have tried powering off a compute node, pulling the power lead, removing the management and NFS network cables and the management server just seems to carry on as if nothing has happened.
Re: Failover for VMs
I am just updating as I continue testing - When I pulled the power lead as discussed below it goes from Suspect to Fencing but never gets to Fenced. But when I put the power lead back in to the server CS almost immediately puts that server into maintenance mode and then does migrate the VM. Not sure of the logic but at least I got to see a VM failover :) From: Jon Marshall Sent: 27 March 2018 10:42 To: users@cloudstack.apache.org Subject: Re: Failover for VMs Just as an update to this before I forget what I did :) - I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was no VM failover. Instead HA reported suspect and then IPMI rebooted the machine, it came back online and the VM started responding to pings again. IPMI is out of band so that seems to be reasonable behaviour but no use in testing HA. Next I just pulled all 3 NIC cables from the same compute node and again HA reported suspect. Again IPMI rebooted but then HA state changed to "Recovered" which I don't understand as the NIC cables were still disconnected so the VM was not reachable and there was no failover. I don't understand how it can think the node is recovered as apart from the IPMI out of band connection there are no network connections to this server. Finally I pulled the power lead and this time HA went from suspect to Fencing and then stayed that way. Again no VM failover. This makes sense as no power means IPMI cannot reboot the server so it never moves to Fenced, I assume. Again no failover. I am wondering if it is to do with out of band IPMI or the way I have the NICs set up. The management node only has one NIC in the management network but I assume this is okay. I may try reloading with CS v4.9 and just try failover without the new HA KVM to see if I see anything different. Jon ________ From: Jon Marshall Sent: 27 March 2018 10:10 To: users@cloudstack.apache.org Subject: Re: Failover for VMs Thanks Paul, will pick up after Easter break.
Doing some more testing with HA KVM at the moment so any progress will update this thread. From: Paul Angus [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Jon Marshall Sent: 27 March 2018 09:19 To: users@cloudstack.apache.org Subject: Failover for VMs After 3 weeks of trying multiple different setups I still have not managed to get a VM to failover between compute nodes and am just running out of ideas. I have 3 compute nodes each with 3 NICs (management, VM traffic, storage), one management node with just a single NIC connection in the management network and a separate NFS server. I have tried with and without the new Host HA KVM in CS v4.11 as from what I have read even without enabling the new Host HA KVM when you power off or reboot a compute node your VMs should still migrate.
I have tried powering off a compute node, pulling the power lead, removing the management and NFS network cables and the management server just seems to carry on as if nothing has happened. Could someone explain exactly how HA is meant to work so I can look at where it is going wrong.
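The failure scenarios tried above (kernel crash, cable pulls, power pull) can be scripted so each test run is repeatable. This is only a sketch, not anything from CloudStack: the interface names are placeholders, the IPMI address and credentials are the example values quoted later in this thread, and DRY_RUN defaults to printing the commands because two of them are destructive.

```shell
#!/bin/sh
# DRY_RUN=1 prints each command instead of executing it; unset it only on
# a disposable lab host. WARNING: the sysrq write crashes the kernel at once.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

# 1) Simulate a kernel crash: the agent dies but the host stays powered on.
run sh -c 'echo c > /proc/sysrq-trigger'

# 2) Simulate total network loss (interface names are placeholders).
for nic in eth0 eth1 eth2; do
    run ip link set "$nic" down
done

# 3) A power pull has no local software equivalent; from another machine an
#    IPMI power-off is the closest match (example BMC address/credentials).
run ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis power off
```

Running it with DRY_RUN=1 first confirms exactly which commands would fire before repeating a test for real.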
Re: Failover for VMs
Just as an update to this before I forget what I did :) - I used "echo c > /proc/sysrq-trigger" on one of the compute nodes and there was no VM failover. Instead HA reported suspect and then IPMI rebooted the machine, it came back online and the VM started responding to pings again. IPMI is out of band so that seems to be reasonable behaviour but no use in testing HA. Next I just pulled all 3 NIC cables from the same compute node and again HA reported suspect. Again IPMI rebooted but then HA state changed to "Recovered" which I don't understand as the NIC cables were still disconnected so the VM was not reachable and there was no failover. I don't understand how it can think the node is recovered as apart from the IPMI out of band connection there are no network connections to this server. Finally I pulled the power lead and this time HA went from suspect to Fencing and then stayed that way. Again no VM failover. This makes sense as no power means IPMI cannot reboot the server so it never moves to Fenced, I assume. Again no failover. I am wondering if it is to do with out of band IPMI or the way I have the NICs set up. The management node only has one NIC in the management network but I assume this is okay. I may try reloading with CS v4.9 and just try failover without the new HA KVM to see if I see anything different. Jon ________ From: Jon Marshall Sent: 27 March 2018 10:10 To: users@cloudstack.apache.org Subject: Re: Failover for VMs Thanks Paul, will pick up after Easter break. Doing some more testing with HA KVM at the moment so any progress will update this thread. From: Paul Angus [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ...
53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Jon Marshall Sent: 27 March 2018 09:19 To: users@cloudstack.apache.org Subject: Failover for VMs After 3 weeks of trying multiple different setups I still have not managed to get a VM to failover between compute nodes and am just running out of ideas. I have 3 compute nodes each with 3 NICS (management, VMs traffic, storage), one management node with just a single NIC connection in the management network and a separate NFS server. I have tried with and without the new Host HA KVM in CS v4.11 as from what I have read even without enabling the new Host HA KVM when you power off or reboot a compute node your VMs should still migrate. I have tried powering off a compute node, pulling the power lead, removing the management and NFS network cables and the management server just seems to carry on as if nothing has happened. Could someone explain exactly how HA is meant to work so I can look at where it is going wrong.
Re: Failover for VMs
Thanks Paul, will pick up after Easter break. Doing some more testing with HA KVM at the moment so any progress will update this thread. From: Paul Angus [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Jon Marshall Sent: 27 March 2018 09:19 To: users@cloudstack.apache.org Subject: Failover for VMs After 3 weeks of trying multiple different setups I still have not managed to get a VM to failover between compute nodes and am just running out of ideas. I have 3 compute nodes each with 3 NICs (management, VM traffic, storage), one management node with just a single NIC connection in the management network and a separate NFS server. I have tried with and without the new Host HA KVM in CS v4.11 as from what I have read even without enabling the new Host HA KVM when you power off or reboot a compute node your VMs should still migrate. I have tried powering off a compute node, pulling the power lead, removing the management and NFS network cables and the management server just seems to carry on as if nothing has happened. Could someone explain exactly how HA is meant to work so I can look at where it is going wrong.
Failover for VMs
After 3 weeks of trying multiple different setups I still have not managed to get a VM to failover between compute nodes and am just running out of ideas. I have 3 compute nodes each with 3 NICs (management, VM traffic, storage), one management node with just a single NIC connection in the management network and a separate NFS server. I have tried with and without the new Host HA KVM in CS v4.11 as from what I have read even without enabling the new Host HA KVM when you power off or reboot a compute node your VMs should still migrate. I have tried powering off a compute node, pulling the power lead, removing the management and NFS network cables and the management server just seems to carry on as if nothing has happened. Could someone explain exactly how HA is meant to work so I can look at where it is going wrong.
Re: KVM HostHA
.com<http://www.shapeblue.com> > Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a > framework developed by ShapeBlue to deliver the rapid deployment of a > standardised ... > > > > > 53 Chandos Place, Covent Garden, London WC2N 4HSUK > > @shapeblue > > > > > > > > > On 14 Mar 2018, at 14:51, Andrija Panic > wrote: > > > > > > Hi Boris, > > > > > > ok thanks for the explanation - that makes sense, and covers my > > "exception > > > case" that I have. > > > > > > This is atm only available for NFS as I could read (KVM on NFS) ? > > > > > > Cheers > > > > > > On 14 March 2018 at 13:02, Boris Stoyanov < > boris.stoya...@shapeblue.com> > > > wrote: > > > > > >> Hi Andrija, > > >> > > >> There’s two types of checks Host-HA is doing to determine if host if > > >> healthy. > > >> > > >> 1. Health checks - pings the host as soon as there’s connection issues > > >> with the agent > > >> > > >> If that fails, > > >> > > >> 2. Activity checks - checks if there are any writing operations on the > > >> Disks of the VMs that are running on the hosts. This is to determine > if > > the > > >> VMs are actually alive and executing processes. Only if no disk > > operations > > >> are executed on the shared storage, only then it’s trying to Recover > the > > >> host with IPMI call, if that eventually fails, it migrates the VMs to > a > > >> healthy host and Fences the faulty one. > > >> > > >> Hope that explains your case. > > >> > > >> Boris. 
> > >> > > >> > > >> boris.stoya...@shapeblue.com > > >> www.shapeblue.com<http://www.shapeblue.com> > > >> 53 Chandos Place, Covent Garden, London WC2N 4HSUK > > >> @shapeblue > > >> > > >> > > >> > > >>> On 14 Mar 2018, at 13:53, Andrija Panic > > wrote: > > >>> > > >>> Hi Paul, > > >>> > > >>> sorry to bump in the middle of the thread, but just curious about the > > >> idea > > >>> behing host-HA and why it behaves the way you exlained above: > > >>> > > >>> > > >>> Would it be more sense (or not?), that when MGMT detects agents is > > >>> unreachable or host unreachable (or after unsuccessful i.e. agent > > >> restart, > > >>> etc...,to be defined), to actually use IPMI to STONITH the node, thus > > >>> making sure no VMS running and then to really start all HA-enabled > VMs > > on > > >>> other hosts ? > > >>> > > >>> I'm just trying to make parallel to the corosync/pacemaker as > > clustering > > >>> suite/services in Linux (RHEL and others), where when majority of > nodes > > >>> detect that one node is down, a common thing (especially for shared > > >>> storage) is to STONITH that node, make sure it;s down, then move > > >> "resource" > > >>> (in our case VMs) to other cluster nodes ? > > >>> > > >>> I see it's actually much broader setup per > > >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA but > > >> again - > > >>> whole idea (in my head at least...) is when host get's down, we make > > sure > > >>> it's down (avoid VM corruption, by doint STONITH to that node) and > then > > >>> start HA VMs on ohter hosts. > > >>> > > >>> I understand there might be exceptions as I have right now (4.8) - > > >> libvirt > > >>> get stuck (librbd exception or similar) so agent get's disconnected, > > but > > >>> VMs are still running fine... (except DB get messed up, all NICs > loose > > >>> isolation_uri, VR's loose MAC addresses and other IP addresses > etc...) 
> > >>> > > >>> > > >>> Thanks > > >>> Andrija > > >>> > > >>> > > >>> > > >>> > > >>> On 14 March 2018 at 10:57, Jon Marshall > wrote: > > >>> > > >>>> That would make sense. > > >>>> > > >>>> > > >>>> I have another server being used for something else at the moment so I > > >>>> will add that in and update this thread when I have tested
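Boris's two-stage check described above (health check, then activity check, then IPMI recovery, then fence and migrate) can be summarised as a small decision function. This is a minimal illustrative sketch under stated assumptions: the state names, argument encoding, and function name are invented here and are not CloudStack code.

```shell
#!/bin/sh
# Sketch of the Host-HA decision chain described in this thread.
# Inputs are 1/0 flags; outputs are hypothetical state labels.
host_ha_decide() {
    health_ok=$1      # 1 if the host still answers health-check pings
    disk_activity=$2  # 1 if VM disks on shared storage are being written
    ipmi_reset_ok=$3  # 1 if an IPMI power-reset brought the agent back

    if [ "$health_ok" = "1" ]; then
        echo "healthy"; return
    fi
    if [ "$disk_activity" = "1" ]; then
        # VMs are still writing: restarting them elsewhere risks corruption,
        # so the only safe action is to flag the host and alert the operator.
        echo "degraded-no-action"; return
    fi
    if [ "$ipmi_reset_ok" = "1" ]; then
        echo "recovered"; return
    fi
    # Last resort: power the host off so a VM cannot run twice, then migrate.
    echo "fence-and-migrate"
}
```

The key subtlety the thread keeps circling is the second branch: observed disk activity vetoes any failover, which is why pulling only network cables often produces no migration.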
Re: KVM HostHA
"resource" > >>> (in our case VMs) to other cluster nodes ? > >>> > >>> I see it's actually much broader setup per > >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA but > >> again - > >>> whole idea (in my head at least...) is when host get's down, we make > sure > >>> it's down (avoid VM corruption, by doint STONITH to that node) and then > >>> start HA VMs on ohter hosts. > >>> > >>> I understand there might be exceptions as I have right now (4.8) - > >> libvirt > >>> get stuck (librbd exception or similar) so agent get's disconnected, > but > >>> VMs are still running fine... (except DB get messed up, all NICs loose > >>> isolation_uri, VR's loose MAC addresses and other IP addresses etc...) > >>> > >>> > >>> Thanks > >>> Andrija > >>> > >>> > >>> > >>> > >>> On 14 March 2018 at 10:57, Jon Marshall wrote: > >>> > >>>> That would make sense. > >>>> > >>>> > >>>> I have another server being used for something else at the moment so I > >>>> will add that in and update this thread when I have tested > >>>> > >>>> > >>>> Jon > >>>> > >>>> > >>>> > >>>> From: Paul Angus > >>>> Sent: 14 March 2018 09:16 > >>>> To: users@cloudstack.apache.org > >>>> Subject: RE: KVM HostHA > >>>> > >>>> I'd need to do some testing, but I suspect that your problem is that > you > >>>> only have two hosts. At the point that one host is deemed out of > >> service, > >>>> you only have one host left. With only one host, CloudStack will show > >> the > >>>> cluster as ineligible. > >>>> > >>>> It is extremely common for any system working as a cluster to require > a > >>>> minimum starting point of 3 nodes to be able to function.
> >>>> > >>>> > >>>> Kind regards, > >>>> > >>>> Paul Angus > >>>> > >>>> paul.an...@shapeblue.com > >>>> www.shapeblue.com<http://www.shapeblue.com> > >>>> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]< > >>>> http://www.shapeblue.com/> > >>>> > >>>> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> > >>>> www.shapeblue.com<http://www.shapeblue.com> > >>>> Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge > >> is a > >>>> framework developed by ShapeBlue to deliver the rapid deployment of a > >>>> standardised ... > >>>> > >>>> > >>>> > >>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK > >>>> @shapeblue > >>>> > >>>> > >>>> > >>>> > >>>> -Original Message- > >>>> From: Jon Marshall > >>>> Sent: 14 March 2018 08:36 > >>>> To: users@cloudstack.apache.org > >>>> Subject: Re: KVM HostHA > >>>> > >>>> Hi Paul > >>>> > >>>> > >>>> My testing does indeed end up with the failed host in maintenance mode > >> but > >>>> the VMs are never migrated. As I posted earlier the management server > >> seems > >>>> to be saying there is no other host that the VM can be migrated to. > >>>> > >>>> > >>>> Couple of questions if you have the time to respond - > >>>> > >>>> > >>>> 1) this article seems to suggest a reboot or powering off a host will > >> end > >>>> result in the VMs being migrated and this was on CS v 4.2.1 back in > >> 2013 so > >>>> does Host HA do something different > >>>> > >>>> > >>>> 2) Whenever one of my two nodes is taken down in testing the active > >>>> compute nodes HA status goes from Available to Ineligible. Should this > >>>> happen ie. is it going to Ineligible stopping the manager from > migrating > >>>> the VMs. > >>>> > >>>> > >>>&g
Re: KVM HostHA
That would make sense. I have another server being used for something else at the moment so I will add that in and update this thread when I have tested. Jon From: Paul Angus Sent: 14 March 2018 09:16 To: users@cloudstack.apache.org Subject: RE: KVM HostHA I'd need to do some testing, but I suspect that your problem is that you only have two hosts. At the point that one host is deemed out of service, you only have one host left. With only one host, CloudStack will show the cluster as ineligible. It is extremely common for any system working as a cluster to require a minimum starting point of 3 nodes to be able to function. Kind regards, Paul Angus paul.an...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Jon Marshall Sent: 14 March 2018 08:36 To: users@cloudstack.apache.org Subject: Re: KVM HostHA Hi Paul My testing does indeed end up with the failed host in maintenance mode but the VMs are never migrated. As I posted earlier the management server seems to be saying there is no other host that the VM can be migrated to. Couple of questions if you have the time to respond - 1) This article seems to suggest a reboot or powering off a host will result in the VMs being migrated, and this was on CS v4.2.1 back in 2013, so does Host HA do something different? 2) Whenever one of my two nodes is taken down in testing, the active compute node's HA status goes from Available to Ineligible. Should this happen, i.e. is going to Ineligible stopping the manager from migrating the VMs?
Apologies for all the questions but I just can't get this to work at the moment. If I do eventually get it working I will do a write up for others with the same issue :) From: Paul Angus Sent: 14 March 2018 07:45 To: users@cloudstack.apache.org Subject: RE: KVM HostHA Hi Parth, To answer your questions, VM-HA does not restart VMs on an alternate host if the original host goes down. The management server (without Host-HA) cannot tell what happened to the host. It cannot tell if there was a failure in the agent, loss of connectivity to the management NIC or if the host is truly down. In the first two scenarios, the guest VMs can still be running perfectly well, and to restart them elsewhere would be very dangerous. Therefore, the correct thing to do is nothing but alert the operator. These scenarios are what Host-HA was introduced for. With regard to STONITH, if no disk activity is detected on the host, Host-HA will try to restart (via IPMI) the host. If, after a configurable number of attempts, the host agent still does not check in, then Host-HA will shut down the host (via IPMI), trigger VM-HA and mark the host as in-maintenance. paul.an...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com<http://www.shapeblue.com> Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Parth Patel Sent: 14 March 2018 05:05 To: users@cloudstack.apache.org Subject: Re: KVM HostHA Hi Paul, Thanks for the clarification.
I currently don't have IPMI-enabled hardware (in the test environment), but it will be beneficial if you can help me clear out some basic concepts of it: - If HA-enabled VMs are autostarted on another host when the current host goes down, what is the need or purpose of HA-host? (other than the management server being able to remotely control its power interfaces) - I understood the "Shoot-the-other-node-in-the-head" (STONITH) approach ACS uses to fence the host, but I couldn't find what mechanism or events trigger this? Thanks and regards, Parth Patel On Wed, 14 Mar 2018 at 02:22 Paul Angus wrote: > The management server doesn't ping the host through IPMI. However if > IPMI is not available, you will not be able to use Host HA, as there > is no way for CloudStack to 'fence' the host - that is shut it down to > be sure that a VM cannot start again on that host. > > I can explain why that is necessary if you wish. > > > Kind regards, > > Paul Angus > > paul.an...@shapeblue.com
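Paul's point above about two-host clusters can be expressed as a simple eligibility rule. This is only a sketch: the threshold values and the function name are illustrative assumptions, not CloudStack's actual implementation.

```shell
#!/bin/sh
# Sketch of a cluster-quorum eligibility rule: a cluster needs at least
# 3 hosts to start with, and at least 2 survivors, before failover
# decisions can safely be made. Thresholds here are assumptions.
cluster_eligible() {
    total_hosts=$1
    failed_hosts=$2
    remaining=$((total_hosts - failed_hosts))
    if [ "$total_hosts" -ge 3 ] && [ "$remaining" -ge 2 ]; then
        echo "eligible"
    else
        echo "ineligible"
    fi
}
```

This matches what Jon observed: with 2 hosts, losing one leaves a single survivor, so the remaining node reports Ineligible and no migration target is offered.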
Re: KVM HostHA
Hi Paul My testing does indeed end up with the failed host in maintenance mode but the VMs are never migrated. As I posted earlier the management server seems to be saying there is no other host that the VM can be migrated to. Couple of questions if you have the time to respond - 1) This article seems to suggest a reboot or powering off a host will result in the VMs being migrated, and this was on CS v4.2.1 back in 2013, so does Host HA do something different? 2) Whenever one of my two nodes is taken down in testing, the active compute node's HA status goes from Available to Ineligible. Should this happen, i.e. is going to Ineligible stopping the manager from migrating the VMs? Apologies for all the questions but I just can't get this to work at the moment. If I do eventually get it working I will do a write up for others with the same issue :) From: Paul Angus Sent: 14 March 2018 07:45 To: users@cloudstack.apache.org Subject: RE: KVM HostHA Hi Parth, To answer your questions, VM-HA does not restart VMs on an alternate host if the original host goes down. The management server (without Host-HA) cannot tell what happened to the host. It cannot tell if there was a failure in the agent, loss of connectivity to the management NIC or if the host is truly down. In the first two scenarios, the guest VMs can still be running perfectly well, and to restart them elsewhere would be very dangerous. Therefore, the correct thing to do is nothing but alert the operator. These scenarios are what Host-HA was introduced for. With regard to STONITH, if no disk activity is detected on the host, Host-HA will try to restart (via IPMI) the host. If, after a configurable number of attempts, the host agent still does not check in, then Host-HA will shut down the host (via IPMI), trigger VM-HA and mark the host as in-maintenance.
paul.an...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Parth Patel Sent: 14 March 2018 05:05 To: users@cloudstack.apache.org Subject: Re: KVM HostHA Hi Paul, Thanks for the clarification. I currently don't have an ipmi enabled hardware (in test environment), but it will be beneficial if you can help me clear out some basic concepts of it: - If HA-enabled VMs are autostarted on another host when current host goes down, what is the need or purpose of HA-host? (other than management server able to remotely control it's power interfaces) - I understood the "Shoot-the-other-node-in-the-head" (STONITH) approach ACS uses to fence the host, but I couldn't find what mechanism or events trigger this? Thanks and regards, Parth Patel On Wed, 14 Mar 2018 at 02:22 Paul Angus wrote: > The management server doesn't ping the host through IPMI. However if > IPMI is not available, you will not be able to use Host HA, as there > is no way for CloudStack to 'fence' the host - that is shut it down to > be sure that a VM cannot start again on that host. > > I can explain why that is necessary if you wish. > > > Kind regards, > > Paul Angus > > paul.an...@shapeblue.com > www.shapeblue.com<http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 
> 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue > > > > > -Original Message- > From: Parth Patel > Sent: 13 March 2018 16:57 > To: users@cloudstack.apache.org > Cc: Jon Marshall > Subject: Re: KVM HostHA > > Hi Jon and Victor, > > I think the management server pings your host using ipmi (I really don't > hope this is the case). > In my case, I did not have OOBM enabled at all (my hardware didn't support > it) > I think you could disable OOBM and/or HA-Host and give that a try :) > > On Tue, 13 Mar 2018 at 20:40 victor wrote: > > > Hello Guys, > > > > I have tried the following two cases. > > > > 1, "echo c > /proc/sysrq-trigger" > > > > 2, Pulled the network cable of one of the host > > > > In both cases, the following happened. > > > > = > >
Re: KVM HostHA
Update on below. I pulled the NICs for both management and storage from cnode1. 1) The UI immediately showed the power state as Unknown but the state was Up. 2) The HA state on cnode1 showed as suspect. The HA state on cnode2 showed as available. 3) After about 4 mins the state on cnode1 went from Up to Alert. 4) The HA state on cnode1 showed as Fencing and the HA state on cnode2 showed as Ineligible. The HA-enabled VMs on cnode1 never switched over to the working node cnode2. Any ideas? From: Jon Marshall Sent: 13 March 2018 10:50 To: users@cloudstack.apache.org Subject: Re: KVM HostHA I tried "echo c > /proc/sysrq-trigger" which stopped me getting into the server but it did not stop the server responding to an ipmitool request on the manager eg - "ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status" from the management server got an answer saying the chassis power was on so CS never registered the compute node as down. I am obviously doing something wrong but cannot work it out. The management server has one NIC - 172.16.7.4. Each compute node has 3 NICs -

                      cnode1        cnode2
management NIC        172.16.7.5    172.16.7.6
VM NIC                172.16.6.130  172.16.6.131
storage NIC           172.16.250.4  172.16.250.5
Dell LOM (for iDRAC)  172.16.7.29   172.16.7.30

The Dell LOM IPs are the ones used to configure OOBM in the UI. If I pull the storage NIC presumably nothing will happen as the ipmitool check is running across the management NIC, so I need to pull both? My understanding of host HA was that the management server monitored the compute nodes using ipmitool and if it did not get a response because the host was down it would fence off that host and move the VMs to an active compute node. This is obviously too simplistic so could someone explain how it is meant to work and what it is protecting against? From: Paul Angus Sent: 13 March 2018 07:01 To: users@cloudstack.apache.org Subject: RE: KVM HostHA Hi all, One small note, unplugging the management NIC will only cause an HA event if the storage is running over that NIC also.
If the storage is over a separate NIC then the guest VMs will continue to run when the mgmt. NIC is unplugged; Host HA will detect the disk activity and conclude that there is nothing it can do, as the VMs are still running, other than mark the host as degraded. Kind regards, Paul Angus paul.an...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ...
53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Parth Patel Sent: 12 March 2018 17:35 To: users@cloudstack.apache.org Subject: Re: KVM HostHA > > Hi Jon, > > As I said, in my case, making the host HA didn't work but by just > having a HA VM running on host and executing - (WARNING) "echo c > > /proc/sysrq-trigger" to simulate a kernel crash on host, the > management server registered it as down and started the VM on another > host. I know I've suggested this before but I insist you give this a > try. Also, you don't need to completely power off the machine manually > but just plugging out the network cable works fine. The cloudstack > agent after losing connection to management server auto reboots > because of KVM heartbeat check shell script mentioned by Rohit Yadav > to one of my earlier queries in other thread.
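The out-of-band actions discussed in this exchange map onto plain ipmitool invocations. A sketch only, reusing the example BMC address and credentials quoted in this thread; PRINT_ONLY defaults to showing the command line rather than executing it, since these calls need a reachable BMC.

```shell
#!/bin/sh
# Example BMC (Dell iDRAC LOM) values from this thread.
BMC_HOST=172.16.7.29
BMC_USER=admin3
BMC_PASS=letmein

# Build (and optionally run) an ipmitool command against the node's BMC.
ipmi() {
    cmd="ipmitool -I lanplus -H $BMC_HOST -U $BMC_USER -P $BMC_PASS $*"
    if [ "${PRINT_ONLY:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

ipmi chassis status        # the liveness poll Jon ran by hand
ipmi chassis power cycle   # a "recover" style reset attempt
ipmi chassis power off     # the fence (STONITH) action of last resort
```

Because the wrapper prints the full command by default, it doubles as a quick way to verify the BMC address and credentials being fed to OOBM before trusting fencing to them.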
Re: KVM HostHA
I tried "echo c > /proc/sysrq-trigger" which stopped me getting into the server but it did not stop the server responding to an ipmitool request on the manager eg - "ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status" from the management server got an answer saying the chassis power was on so CS never registered the compute node as down. I am obviously doing something wrong but cannot work it out. The management server has one NIC - 172.16.7.4. Each compute node has 3 NICs -

                      cnode1        cnode2
management NIC        172.16.7.5    172.16.7.6
VM NIC                172.16.6.130  172.16.6.131
storage NIC           172.16.250.4  172.16.250.5
Dell LOM (for iDRAC)  172.16.7.29   172.16.7.30

The Dell LOM IPs are the ones used to configure OOBM in the UI. If I pull the storage NIC presumably nothing will happen as the ipmitool check is running across the management NIC, so I need to pull both? My understanding of host HA was that the management server monitored the compute nodes using ipmitool and if it did not get a response because the host was down it would fence off that host and move the VMs to an active compute node. This is obviously too simplistic so could someone explain how it is meant to work and what it is protecting against? From: Paul Angus Sent: 13 March 2018 07:01 To: users@cloudstack.apache.org Subject: RE: KVM HostHA Hi all, One small note, unplugging the management NIC will only cause an HA event if the storage is running over that NIC also. If the storage is over a separate NIC then the guest VMs will continue to run when the mgmt. NIC is unplugged; Host HA will detect the disk activity and conclude that there is nothing it can do, as the VMs are still running, other than mark the host as degraded.
Kind regards, Paul Angus paul.an...@shapeblue.com www.shapeblue.com<http://www.shapeblue.com> [http://www.shapeblue.com/wp-content/uploads/2017/06/logo.png]<http://www.shapeblue.com/> Shapeblue - The CloudStack Company<http://www.shapeblue.com/> www.shapeblue.com Rapid deployment framework for Apache CloudStack IaaS Clouds. CSForge is a framework developed by ShapeBlue to deliver the rapid deployment of a standardised ... 53 Chandos Place, Covent Garden, London WC2N 4HSUK @shapeblue -Original Message- From: Parth Patel Sent: 12 March 2018 17:35 To: users@cloudstack.apache.org Subject: Re: KVM HostHA > > Hi Jon, > > As I said, in my case, making the host HA didn't work but by just > having a HA VM running on host and executing - (WARNING) "echo c > > /proc/sysrq-trigger" to simulate a kernel crash on host, the > management server registered it as down and started the VM on another > host. I know I've suggested this before but I insist you give this a > try. Also, you don't need to completely power off the machine manually > but just plugging out the network cable works fine. The cloudstack > agent after losing connection to management server auto reboots > because of KVM heartbeat check shell script mentioned by Rohit Yadav > to one of my earlier queries in other thread. > > On Mon 12 Mar, 2018, 21:23 Jon Marshall, wrote: > Hi Paul > > > Thanks for the response. > > > I think I am not understanding how it was meant to work then. My > understanding was that the manager used ipmitool to just keep querying > the compute nodes as to their status so I assumed it didn't matter how > you shut the node down, once it was down the manager would get no > response and mark it as down (which it does). 
> I am in testing mode so I think I will just go and pull the power and
> see what happens :)
>
> Thanks
>
> Jon
>
> From: Paul Angus
> Sent: 12 March 2018 15:31
> To: users@cloudstack.apache.org
> Subject: RE: KVM HostHA
>
> Hi Jon,
>
> I think that what you guys are finding is that a controlled host
> shutdown, which will cause the agent to shut down cleanly, is not
> considered an HA event. I wouldn't expect CloudStack to take any
> action if you shut down a host, only if the host (agent) stops responding.
>
> Kind regards,
>
> Paul Angus
> paul.an...@shapeblue.com
> www.shapeblue.com
Re: KVM HostHA
Hi Paul

Thanks for the response.

I think I am not understanding how it was meant to work then. My understanding was that the manager used ipmitool to just keep querying the compute nodes as to their status, so I assumed it didn't matter how you shut the node down; once it was down the manager would get no response and mark it as down (which it does).

I am in testing mode so I think I will just go and pull the power and see what happens :)

Thanks

Jon

From: Paul Angus
Sent: 12 March 2018 15:31
To: users@cloudstack.apache.org
Subject: RE: KVM HostHA

Hi Jon,

I think that what you guys are finding is that a controlled host shutdown, which will cause the agent to shut down cleanly, is not considered an HA event. I wouldn't expect CloudStack to take any action if you shut down a host, only if the host (agent) stops responding.

Kind regards,

Paul Angus
paul.an...@shapeblue.com
www.shapeblue.com

-Original Message-
From: Jon Marshall
Sent: 12 March 2018 15:15
To: users@cloudstack.apache.org
Subject: Re: KVM HostHA

I have the same issue here and am not entirely sure what the behaviour should be. I have one manager node and 2 compute nodes running 4.11 with ipmi working correctly.

From the UI under HA:

HA Enabled     Yes
HA State       Available
HA Provider    kvmhaprovider

although interestingly the "Details" tab shows "HA enabled: No", which I assume is a cosmetic issue? On each compute node I have one HA-enabled VM and one non-HA-enabled VM.
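Paul's point can be summed up as a small decision table. The sketch below is purely illustrative (the function and state names are made up; this is not CloudStack's actual implementation); it just encodes the behaviour described in this thread: a clean shutdown is not an HA event, and a host that still shows disk activity is only marked degraded.

```python
# Illustrative sketch of the Host HA behaviour described in this thread.
# Function and state names are invented for clarity -- this is not
# CloudStack's actual code.

def ha_decision(agent_responding: bool, clean_shutdown: bool,
                power_on: bool, disk_activity: bool) -> str:
    """Decide what the management server does for a single KVM host."""
    if agent_responding:
        return "none"                 # healthy host, nothing to do
    if clean_shutdown:
        return "none"                 # controlled shutdown: not an HA event
    if power_on and disk_activity:
        return "mark-degraded"        # VMs still running over another NIC
    return "fence-and-restart-vms"    # host looks dead: recover HA VMs

# e.g. unplugging only the mgmt NIC while storage runs on a separate NIC:
print(ha_decision(False, False, True, True))   # mark-degraded
```

This also matches Jon's ipmitool observation above: after "echo c" the BMC still reports power on, so the host never reaches the fence-and-restart branch.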
I power off a compute node and the UI updates the host status, and the VMs on that node stop responding, but they never fail over to the other node. Couple of things I noticed:

1) as soon as I power off the compute node, the HA state on the other node shows "Ineligible"
2) in the UI the instances all still show as green even though two of them are not available

Any help much appreciated

From: victor
Sent: 07 March 2018 17:01
To: users@cloudstack.apache.org
Subject: KVM HostHA

Hello Guys,

I have installed CloudStack 4.11. I have enabled HA for each host I have added. I have also added IPMI successfully (using the ipmi driver). The hosts are showing the following:

HA Enabled     Yes
HA State       Available
HA Provider    kvmhaprovider

The host is also showing the following correctly:

Resource state --> Enabled
State --> UP
Power state --> On

So I have shut down one of the hosts to see how KVM Host HA is working. I have waited for half an hour, but nothing has happened. What will happen to the VMs on that host if the host fails to come back up? There isn't much in the logs.

Regards
Victor
Re: KVM HostHA
I have the same issue here and am not entirely sure what the behaviour should be. I have one manager node and 2 compute nodes running 4.11 with ipmi working correctly.

From the UI under HA:

HA Enabled     Yes
HA State       Available
HA Provider    kvmhaprovider

although interestingly the "Details" tab shows "HA enabled: No", which I assume is a cosmetic issue? On each compute node I have one HA-enabled VM and one non-HA-enabled VM.

I power off a compute node and the UI updates the host status, and the VMs on that node stop responding, but they never fail over to the other node. Couple of things I noticed:

1) as soon as I power off the compute node, the HA state on the other node shows "Ineligible"
2) in the UI the instances all still show as green even though two of them are not available

Any help much appreciated

From: victor
Sent: 07 March 2018 17:01
To: users@cloudstack.apache.org
Subject: KVM HostHA

Hello Guys,

I have installed CloudStack 4.11. I have enabled HA for each host I have added. I have also added IPMI successfully (using the ipmi driver). The hosts are showing the following:

HA Enabled     Yes
HA State       Available
HA Provider    kvmhaprovider

The host is also showing the following correctly:

Resource state --> Enabled
State --> UP
Power state --> On

So I have shut down one of the hosts to see how KVM Host HA is working. I have waited for half an hour, but nothing has happened. What will happen to the VMs on that host if the host fails to come back up? There isn't much in the logs.

Regards
Victor
System VMs and bridge connections
Can someone tell me where I am going wrong, or if this is possible (apologies for the long post).

I have configured the management server as per the installation instructions, with just an interface in the management network using subnet 172.16.7.0/27.

I then configured a host with 3 separate NICs:

1. Management interface with an IP from the same subnet as the management server IP.
2. Second NIC using a subnet of 172.16.6.128/25. This is meant to be the subnet for the VMs.
3. Third NIC with an IP from the 172.16.232.0/28 subnet, which is where the NFS server is.

I am using KVM, so I configured Linux bridges, eg. cloudbr0 for 1), cloudbr1 for 2) and cloudbr2 for 3).

I then connected to the UI and did the basic setup. It worked in that the host showed as up and the system VMs came up, but neither system VM was working properly, so I logged into both and saw the same problem. The VMs had picked up an IP from both the management network and the VM subnet, eg.

eth1 - 172.16.7.10
eth2 - 172.16.6.177

The default gateway was 172.16.6.129, ie. from the VM subnet, but neither VM could ping that default gateway.

When I looked at the bridges on the host, the MAC address of eth2 was seen on cloudbr0, which is the management subnet. When I then logged into the physical L3 switch I could see eth2's MAC address in the management VLAN and not the VM subnet VLAN. So it seems like the bridging between the VMs and the physical NICs is not working properly, or more likely there is something basic I am not understanding.

Should I be looking to use advanced networking, or is the above setup possible with just basic networking? I am using CloudStack v4.10 and feel a bit of an idiot, as all the docs say setting up basic networking is really easy 😊 (I did do an install where it is all on the same server and that worked fine). Any pointers much appreciated.

PS. I cannot console to the system VMs because of the above, and the SSVM does not have an interface in the NFS network even though there is a physical NIC on the host.
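To diagnose which bridge has learned a given MAC (the eth2 symptom above), "brctl showmacs" can be grepped per bridge. A minimal sketch, assuming bridge-utils is installed and using the cloudbr0-cloudbr2 names from the post; the MAC address shown is a made-up example.

```shell
#!/bin/sh
# Sketch: find which Linux bridge has learned a given MAC address.
# Assumes bridge-utils; bridge names are the ones from the post above.

# Succeeds if the MAC given as $1 appears in `brctl showmacs` output on stdin.
mac_in_showmacs() {
    grep -qi "$1"
}

find_bridge_for_mac() {
    mac="$1"
    for br in cloudbr0 cloudbr1 cloudbr2; do
        # Run on the KVM host as root; skip bridges that don't exist.
        if brctl showmacs "$br" 2>/dev/null | mac_in_showmacs "$mac"; then
            echo "$br"
            return 0
        fi
    done
    return 1
}

# Usage on the host (hypothetical system-VM MAC):
#   find_bridge_for_mac 1e:00:3c:00:00:05
```

If the VM-traffic MAC turns up on cloudbr0 instead of cloudbr1, that confirms the system VM's vNIC was plugged into the wrong bridge, which matches the L3 switch observation in the post.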