Re: [Openstack] How to deploy OpenStack on thousands of nodes?
Hi Brent,

Thanks very much for sharing your experience. To clarify, we are indeed in the process of deploying OpenStack on thousands of nodes. The first obstacle we hit was keepalived-based HA, which needs multicast, so we are reviewing our network topology design again. At the same time, we understood from the RabbitMQ docs that it needs broadcast. In our first attempt we also misconfigured RabbitMQ, so it did not work. We will test both in detail.

Best,
Kylin CG

2013/6/26 Brent Roskos brent.ros...@solinea.com

Kylin,

I think there is some confusion about the term "broadcast". Many of the Rabbit docs describe the delivery of a message from one publisher to multiple subscribers as a "broadcast". This is not to be confused with a network broadcast, where traffic is sent to the network broadcast address. Rabbit uses TCP and a publisher/subscriber model, even in more complex configurations where there are multiple publishers (think cluster).

I have personally implemented large OpenStack compute clouds that had many hypervisors, each on its own subnet, with a Rabbit server on yet another subnet, and all message traffic worked as expected. There were no actual network broadcasts to worry about.

In my previous message I had assumed that you were actually in the process of implementation and were running into problems. It now seems that is not the case and you are in a review or planning period. However, as I noted above, the OpenStack queues on Rabbit will work in a distributed network configuration as long as all of the subscribers can reach the Rabbit server on tcp/5672. I've personally done it and not had an issue.

Brent

On Tue, Jun 25, 2013 at 9:40 PM, Sg Kylin kylin7...@gmail.com wrote:

Hi Brent,

Thanks for your reply! But we are afraid that RabbitMQ needs broadcast to work correctly, and broadcast is usually not available in cross-subnet deployments. That is what we are worried about...
Best,
Kylin CG

2013/6/26 Brent Roskos brent.ros...@solinea.com

By default Rabbit uses TCP port 5672 for communication. TCP can certainly cross subnet boundaries and be routed without issue.

I suggest you do some network troubleshooting: ping your Rabbit server, then telnet to port 5672 on the Rabbit server from hosts on the other subnets. Check your router ACLs and local host firewalls. Check that your Rabbit server has a route back to the other subnets for the reply.

Dual-homed hosts with one local connection and one Internet connection will need specific routes added to allow them to reach other local subnets, since you wouldn't want that traffic to try to traverse the default route, which points out to the Internet. This is true even if you are using virtual interfaces with VLANs instead of separate physical interfaces.

Regards,
Brent

On Tue, Jun 25, 2013 at 6:10 AM, Sg Kylin kylin7...@gmail.com wrote:

Hi All,

We are currently trying to deploy OpenStack on thousands of nodes. We are using the Grizzly stable version and Ubuntu 12.04.2. However, the big problem we face now is the network topology.

If we use HA (haproxy + keepalived) for the controller nodes on which the *-api services run, as well as for the network nodes, and these nodes are deployed across different VLANs (the VLANs can reach each other through gateways), e.g. 10.1.0.0/16 and 10.2.0.0/16, HA does not work correctly. We also found that RabbitMQ did not work when the nova-* services were deployed across different subnets.

So we want to know whether HA and RabbitMQ can be used across subnets. If not, we can only deploy them in a single flat layer 2 network, which seems infeasible in the real world because of broadcast storms...
Best,
Kylin CG

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
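Brent's point about dual-homed hosts can be illustrated with a small route-selection sketch. It is not taken from the thread: the function name `select_route`, the route descriptions and the host address 10.2.0.5 are made up for illustration, and the only behaviour modelled is that the most specific matching prefix wins over the default route.

```python
import ipaddress


def select_route(destination, routes):
    """Return the description of the route used for destination.

    routes is a list of (network, description) pairs. The most specific
    (longest) matching prefix wins, as in an ordinary IP routing table.
    """
    dest = ipaddress.ip_address(destination)
    candidates = []
    for net, via in routes:
        network = ipaddress.ip_network(net)
        if dest in network:
            candidates.append((network, via))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0].prefixlen)[1]


# Hypothetical dual-homed host: one leg on 10.1.0.0/16, default route to the Internet.
routes = [
    ("10.1.0.0/16", "local interface"),
    ("0.0.0.0/0", "Internet gateway"),
]
# Without a specific route, traffic for the other subnet follows the default route.
print(select_route("10.2.0.5", routes))  # Internet gateway

# Adding a specific route keeps that traffic on the internal network.
routes.append(("10.2.0.0/16", "internal gateway"))
print(select_route("10.2.0.5", routes))  # internal gateway
```

On a real host the equivalent check would be reading the routing table and confirming that each remote subnet has an entry more specific than the default route.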
Re: [Openstack] [openstack-dev] How to deploy OpenStack on thousands of nodes?
> We minimized the impact of this by creating a small subnet that just had the switch address, host addresses and vrrp address in it.

It seems a feasible solution for us.

> We avoided pacemaker in this particular instance because the keepalived setup and configuration was so very simple - only a couple of lines in a config file, and because we didn't need any of the other available HA features.

Actually, this is also the reason we chose keepalived.

2013/6/26 Brent Roskos brent.ros...@solinea.com

I stand corrected. I was mostly confused because keepalived didn't actually need addresses in the multicast IP range. It does use multicast, as I can see with ifconfig.

We minimized the impact of this by creating a small subnet that just had the switch address, host addresses and vrrp address in it. All the chatter was contained within that block.

We avoided pacemaker in this particular instance because the keepalived setup and configuration was so very simple - only a couple of lines in a config file, and because we didn't need any of the other available HA features.

Brent

On Wed, Jun 26, 2013 at 10:03 AM, Jesse Pretorius jesse.pretor...@gmail.com wrote:

On 26 June 2013 15:42, Brent Roskos brent.ros...@solinea.com wrote:

> I've also used keepalived for services that did not scale laterally. In this case I put two horizon servers behind an active/passive virtual IP. This was also pretty simple, as there was no need to maintain state information for active/passive. That wouldn't work quite as well when capacity thresholds started to become a concern.
>
> Neither of the above required multicast support, which really helps with deployment options.

*ahem* keepalived most definitely requires multicast support for its VRRP... and it's quite noisy. If there's a way to make it use unicast instead, I'd definitely like to know.

corosync/pacemaker can do a virtual IP failover between as many nodes as you like using unicast instead of multicast.
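The multicast confusion in this exchange comes down to which address is meant. The virtual IP that keepalived manages is an ordinary unicast address, while VRRP advertisements are sent to the multicast group 224.0.0.18 (as defined in the VRRP RFCs), which lies in the 224.0.0.0/24 block that routers do not forward. The sketch below shows the distinction with Python's `ipaddress` module; the virtual IP 10.1.0.100 and the helper name `is_link_local_multicast` are assumptions made for illustration.

```python
import ipaddress

# Destination address of VRRP advertisements, as defined by the VRRP RFCs.
VRRP_GROUP = "224.0.0.18"

# Local Network Control Block: multicast that routers do not forward.
LINK_LOCAL_MULTICAST = ipaddress.ip_network("224.0.0.0/24")


def is_link_local_multicast(address):
    """Return True if address lies in the non-routed 224.0.0.0/24 block."""
    return ipaddress.ip_address(address) in LINK_LOCAL_MULTICAST


# Hypothetical virtual IP inside one of the subnets mentioned in the thread.
vip = "10.1.0.100"

print(ipaddress.ip_address(vip).is_multicast)         # False: the VIP is unicast
print(ipaddress.ip_address(VRRP_GROUP).is_multicast)  # True: advertisements are multicast
print(is_link_local_multicast(VRRP_GROUP))            # True: they stay on one segment

# The two example subnets from the thread are separate networks.
net_a = ipaddress.ip_network("10.1.0.0/16")
net_b = ipaddress.ip_network("10.2.0.0/16")
print(net_a.overlaps(net_b))                          # False
```

This is consistent with Brent's workaround: the keepalived peers shared one small subnet, so the advertisements never had to cross a router.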
[Openstack] How to deploy OpenStack on thousands of nodes?
Hi All,

We are currently trying to deploy OpenStack on thousands of nodes. We are using the Grizzly stable version and Ubuntu 12.04.2. However, the big problem we face now is the network topology.

If we use HA (haproxy + keepalived) for the controller nodes on which the *-api services run, as well as for the network nodes, and these nodes are deployed across different VLANs (the VLANs can reach each other through gateways), e.g. 10.1.0.0/16 and 10.2.0.0/16, HA does not work correctly. We also found that RabbitMQ did not work when the nova-* services were deployed across different subnets.

So we want to know whether HA and RabbitMQ can be used across subnets. If not, we can only deploy them in a single flat layer 2 network, which seems infeasible in the real world because of broadcast storms...

Best,
Kylin CG
Re: [Openstack] How to deploy OpenStack on thousands of nodes?
Hi Brent,

Thanks for your reply! But we are afraid that RabbitMQ needs broadcast to work correctly, and broadcast is usually not available in cross-subnet deployments. That is what we are worried about...

Best,
Kylin CG

2013/6/26 Brent Roskos brent.ros...@solinea.com

By default Rabbit uses TCP port 5672 for communication. TCP can certainly cross subnet boundaries and be routed without issue.

I suggest you do some network troubleshooting: ping your Rabbit server, then telnet to port 5672 on the Rabbit server from hosts on the other subnets. Check your router ACLs and local host firewalls. Check that your Rabbit server has a route back to the other subnets for the reply.

Dual-homed hosts with one local connection and one Internet connection will need specific routes added to allow them to reach other local subnets, since you wouldn't want that traffic to try to traverse the default route, which points out to the Internet. This is true even if you are using virtual interfaces with VLANs instead of separate physical interfaces.

Regards,
Brent

On Tue, Jun 25, 2013 at 6:10 AM, Sg Kylin kylin7...@gmail.com wrote:

Hi All,

We are currently trying to deploy OpenStack on thousands of nodes. We are using the Grizzly stable version and Ubuntu 12.04.2. However, the big problem we face now is the network topology.

If we use HA (haproxy + keepalived) for the controller nodes on which the *-api services run, as well as for the network nodes, and these nodes are deployed across different VLANs (the VLANs can reach each other through gateways), e.g. 10.1.0.0/16 and 10.2.0.0/16, HA does not work correctly. We also found that RabbitMQ did not work when the nova-* services were deployed across different subnets.

So we want to know whether HA and RabbitMQ can be used across subnets. If not, we can only deploy them in a single flat layer 2 network, which seems infeasible in the real world because of broadcast storms...
Best,
Kylin CG
Re: [Openstack] Re: vm can't connect to remote host (169.254.169.254)
Hi All,

This problem also cost me a day. I currently get this error in /var/log/quantum/metadata-agent.log:

    2013-05-08 17:55:53 ERROR [quantum.agent.metadata.agent] Unexpected error.
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/quantum/agent/metadata/agent.py", line 88, in __call__
        return self._proxy_request(instance_id, req)
      File "/usr/lib/python2.7/dist-packages/quantum/agent/metadata/agent.py", line 137, in _proxy_request
        resp, content = h.request(url, headers=headers)
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1444, in request
        (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1196, in _request
        (response, content) = self._conn_request(conn, request_uri, method, body, headers)
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1132, in _conn_request
        conn.connect()
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 798, in connect
        raise socket.error, msg
    error: [Errno 111] ECONNREFUSED

Any advice is appreciated!

Best,

2013/5/8 zengshan2008 zengshan2...@gmail.com

Hi all,

I found the following:

    root@networknode:/etc# ovs-vsctl show
    690ad327-14ad-410e-b310-2d23e4c78223
        Bridge br-int
            Port br-int
                Interface br-int
                    type: internal
            Port int-br-em3
                Interface int-br-em3
            Port qr-d9cb6d6d-5e
                tag: 1
                Interface qr-d9cb6d6d-5e
                    type: internal
            Port tape10a4f07-60
                tag: 1
                Interface tape10a4f07-60
                    type: internal
        Bridge br-em3
            Port em3
                Interface em3
            Port br-em3
                Interface br-em3
                    type: internal
            Port phy-br-em3
                Interface phy-br-em3
        Bridge br-em1
            Port em1
                Interface em1
            Port qg-daf2c037-cc
                Interface qg-daf2c037-cc
                    type: internal
            Port br-em1
                Interface br-em1
                    type: internal
        ovs_version: 1.4.3

    root@networknode:/etc# ifconfig tape10a4f07-60
    tape10a4f07-60: error fetching interface information: Device not found

Is this the reason I can't ping the VM? And how can I solve it?
2013-05-08
zengshan2008

From: zengshan2008
Sent: 2013-05-08 14:59
Subject: vm can't connect to remote host (169.254.169.254)
To: Stephen Kramer celticrem...@gmail.com, Anil Vishnoi vishnoia...@gmail.com, gong yong sheng gong...@linux.vnet.ibm.com
Cc: openstack openstack@lists.launchpad.net

Hi all,

I've installed OpenStack with Quantum by following the guide at https://github.com/mseknibilel/OpenStack-Folsom-Install-guide/blob/master/OpenStack_Folsom_Install_Guide_WebVersion.rst

In my environment I have a controller node, a network node and a compute node. I am using the openvswitch plugin and everything looks fine, but the following logs in the VM console are driving me crazy. Can anybody help me out?

    May 8 00:45:57 cirros kern.info kernel: [2.468239] eth0: IPv6 duplicate address fe80::f816:3eff:fe80:636f detected!
    debug end ##
    cloud-setup: failed to read iid from metadata. tried 30
    WARN: /etc/rc3.d/S45-cloud-setup failed
    Starting dropbear sshd: generating rsa key... generating dsa key... OK
    = cloud-final: system completely up in 41.24 seconds
    wget: can't connect to remote host (169.254.169.254): Network is unreachable
    wget: can't connect to remote host (169.254.169.254): Network is unreachable
    wget: can't connect to remote host (169.254.169.254): Network is unreachable
    instance-id:
    public-ipv4:
    local-ipv4 :
    wget: can't connect to remote host (169.254.169.254): Network is unreachable
    cloud-userdata: failed to read instance id
    WARN: /etc/rc3.d/S99-cloud-userdata failed

2013-05-08
zengshan2008
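The manual comparison in this message, reading interface names out of `ovs-vsctl show` and then asking `ifconfig` about one of them, can be automated. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the parser relies on the `Interface <name>` line layout seen in this thread, and the visibility check relies on Linux listing devices under /sys/class/net. It only reports which interfaces are not visible from where the script runs; it does not say why.

```python
import os
import re

# Matches lines such as `Interface tape10a4f07-60` or `Interface "em3"`.
_INTERFACE_RE = re.compile(r'^\s*Interface\s+"?([^"\s]+)"?\s*$')


def interfaces_from_ovs_show(text):
    """Extract interface names from the text printed by `ovs-vsctl show`."""
    names = []
    for line in text.splitlines():
        match = _INTERFACE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_interfaces(names, sysfs_root="/sys/class/net"):
    """Return the names that have no entry under sysfs_root."""
    return [n for n in names if not os.path.exists(os.path.join(sysfs_root, n))]


# Shortened sample in the same layout as the output quoted in the thread.
SAMPLE = """\
    Bridge br-int
        Port tape10a4f07-60
            tag: 1
            Interface tape10a4f07-60
                type: internal
    Bridge br-em1
        Port em1
            Interface em1
"""

print(interfaces_from_ovs_show(SAMPLE))  # ['tape10a4f07-60', 'em1']
```

On a network node the text would come from running `ovs-vsctl show`, and `missing_interfaces` would then list ports that OVS knows about but that are not visible to the script. One possible explanation, which the thread does not confirm, is that a device has been moved into a separate network namespace, in which case it has to be inspected from inside that namespace.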
Re: [Openstack] Re: vm can't connect to remote host (169.254.169.254)
Sorry, I forgot the important information. I used Grizzly (2013.1) and followed the guide at https://github.com/mseknibilel/OpenStack-Grizzly-Install-Guide/blob/master/OpenStack_Grizzly_Install_Guide.rst .

2013/5/8 zengshan2008 zengshan2...@gmail.com

Which version are you using? And which mode are you using? There isn't a metadata-agent.log file in my environment.

2013-05-08
zengshan2008

From: Sg Kylin
Sent: 2013-05-08 18:02
Subject: Re: [Openstack] Re: vm can't connect to remote host (169.254.169.254)
To: zengshan2008 zengshan2...@gmail.com
Cc: Stephen Kramer celticrem...@gmail.com, Anil Vishnoi vishnoia...@gmail.com, gong yong sheng gong...@linux.vnet.ibm.com, openstack openstack@lists.launchpad.net

Hi All,

This problem also cost me a day. I currently get this error in /var/log/quantum/metadata-agent.log:

    2013-05-08 17:55:53 ERROR [quantum.agent.metadata.agent] Unexpected error.
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/quantum/agent/metadata/agent.py", line 88, in __call__
        return self._proxy_request(instance_id, req)
      File "/usr/lib/python2.7/dist-packages/quantum/agent/metadata/agent.py", line 137, in _proxy_request
        resp, content = h.request(url, headers=headers)
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1444, in request
        (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1196, in _request
        (response, content) = self._conn_request(conn, request_uri, method, body, headers)
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 1132, in _conn_request
        conn.connect()
      File "/usr/lib/python2.7/dist-packages/httplib2/__init__.py", line 798, in connect
        raise socket.error, msg
    error: [Errno 111] ECONNREFUSED

Any advice is appreciated!
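Two different failures appear in this thread: the guest's wget reports "Network is unreachable", while the metadata agent's traceback ends in [Errno 111] ECONNREFUSED. A refused connection means the target host answered but nothing was listening on the port, whereas an unreachable network means the caller had no route to the destination at all. The sketch below tries a TCP connection and names the outcome; the function name `diagnose_connect` and the wording of the labels are illustrative, not part of any OpenStack tool.

```python
import errno
import socket


def diagnose_connect(host, port, timeout=3.0):
    """Try a TCP connection and return a short label for the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer at all within the timeout, e.g. a firewall dropping packets.
        return "timed out: no answer within the timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # The host answered, but nothing is listening on that port.
            return "refused: host reachable, no service on this port"
        if exc.errno == errno.ENETUNREACH:
            # The local routing table has no route to the destination.
            return "unreachable: no route from this machine"
        return "failed: %s" % exc
```

In the situation described here the refused connection was raised while the metadata agent was proxying a request, so such a check would be run from the network node against whatever host and port the agent is configured to forward to; the thread does not state those values.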