[Openstack] adding/removing security groups at instance runtime
Hey all,

I was googling around trying to find a way to add a new security group to a running instance. I only found statements that this is said to work with trunk, but not how exactly, nor which components I would have to update to trunk to actually make it happen. I am running a multi-node OpenStack Essex cluster and would like to update only the parts that I actually need to get this feature in. As far as I could tell, at least python-novaclient is responsible for the client side. But do I also need to update nova-api or something similar?

Many thanks in advance,
Christian Parpart.
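For reference, once a client and API with this feature are in place, the client side is a single call through python-novaclient. The snippet below is only a hedged sketch of that call; the auth URL, credentials, server name and group name are placeholders, and it assumes a novaclient version whose server manager exposes add_security_group/remove_security_group (the very feature this thread asks about). Newer python-novaclient releases expose the same thing on the command line as `nova add-secgroup <server> <group>`.

# Minimal sketch: attach/detach a security group on a running instance
# via python-novaclient. Credentials and names below are placeholders.
from novaclient.v1_1 import client

nova = client.Client("admin",            # username (placeholder)
                     "secret",           # password (placeholder)
                     "production",       # tenant/project name (placeholder)
                     "http://keystone:5000/v2.0/")  # auth URL (placeholder)

server = nova.servers.find(name="web-01")            # running instance
nova.servers.add_security_group(server, "haproxy")   # add a group at runtime
# nova.servers.remove_security_group(server, "haproxy")  # and the reverse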
[Openstack] nova-network sometimes stops routing Floating IPs
Hey all,

I am having a rather serious issue with the central (OpenStack Essex) nova-network gateway we have set up. We have quite a few floating IPs assigned to a few virtual machines, and it just works. But for a few days (or weeks) now I have noticed that some VM does not get inbound traffic from external IPs, such as the internet, through the floating IP that is supposed to be DNAT'ed to the application VM. We tried to debug it, and it is definitely something going wrong with the nova-network service here. Now, a `pkill dnsmasq; sleep 2; initctl restart nova-network` actually fixes it. The question is *why* it fixes it. The routing tables do not really seem to have changed, unless I was missing something while checking :). Is there anything nova-network is doing besides setting up IPs and iptables? Or, what is nova-network actually doing in general, and what could be the reason it runs into such a situation?

Many thanks in advance,
Christian.
[Openstack] Essex Dashboard: KeyError at /nova/instances_and_volumes/
Hey all,

for quite some weeks now I have been getting an error page instead of the Instances and Volumes page in the Essex Horizon dashboard, with the above title and the detailed error output below:

Environment:
Request Method: GET
Request URL: http://controller.rz.dawanda.com/nova/instances_and_volumes/
Django Version: 1.3.1
Python Version: 2.7.3
Installed Applications: ['openstack_dashboard', 'django.contrib.sessions', 'django.contrib.messages', 'django.contrib.staticfiles', 'django_nose', 'horizon', 'horizon.dashboards.nova', 'horizon.dashboards.syspanel', 'horizon.dashboards.settings']
Installed Middleware: ('django.middleware.common.CommonMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', 'openstack_dashboard.middleware.DashboardLogUnhandledExceptionsMiddleware', 'horizon.middleware.HorizonMiddleware', 'django.middleware.doc.XViewMiddleware', 'django.middleware.locale.LocaleMiddleware')

Traceback:
File /usr/lib/python2.7/dist-packages/django/core/handlers/base.py in get_response
  111. response = callback(request, *callback_args, **callback_kwargs)
File /usr/lib/python2.7/dist-packages/horizon/decorators.py in dec
  40. return view_func(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/horizon/decorators.py in dec
  55. return view_func(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/horizon/decorators.py in dec
  40. return view_func(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/django/views/generic/base.py in view
  47. return self.dispatch(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/django/views/generic/base.py in dispatch
  68. return handler(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in get
  105. handled = self.construct_tables()
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in construct_tables
  96. handled = self.handle_table(table)
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in handle_table
  68. data = self._get_data_dict()
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in _get_data_dict
  37. self._data[table._meta.name] = data_func()
File /usr/lib/python2.7/dist-packages/horizon/dashboards/nova/instances_and_volumes/views.py in get_volumes_data
  74. att['instance'] = instances[att['server_id']]

Exception Type: KeyError at /nova/instances_and_volumes/
Exception Value: u'8aa2989e-85ea-4975-b81b-04d06dbf8013'

Now I wonder to what extent this is a bug in the software and/or whether I have an invalid entry in my nova database that I can fix by hand. If so, does anyone know how to actually work around this? I really do need this (currently broken) page :-)

Regards,
Christian Parpart.
[Openstack] floating IPs not routed from inside
Hey all,

we're running quite a few compute nodes with Essex installed and one central nova-network gateway. We now have a few floating IPs set up to route from the world through the gateway to these VMs. However, accessing these floating (public) IPs from inside a *tenant's VM* results in timeouts, while accessing the very same IP from a compute node (hypervisor) hosting those VMs actually does work. Now I'm a bit confused; it looks like a routing issue or an iptables NAT thing, and I would be really grateful if anyone could help me out with a hint. :) Is this known to not work, or what do you need from me to understand my issue a bit better?

Many thanks in advance,
Christian Parpart.
Re: [Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts
Hey all,

many thanks for your replies so far. In general, I must say that there really is an absolute need for explicit provisioning, that is, letting the admin decide which single host to prefer (but still rejecting the request when there are just no resources left, of course).

- A filter like the SameHost filter only works when there already is a host, and then you have to look up and build up the correlation first (not a big problem, but it doesn't feel comfortable).
- The IsolatedHosts filter doesn't make that much sense, as we use one generic template-%{TIMESTAMP} image to bootstrap a node and then set up everything else inside, and we usually still may have more than one VM on that compute node (e.g. a memcached VM and a postfix VM).
- Availability zones, so I was told, are already deprecated (dunno), and I can't give every compute node a different availability zone; to be honest, that's what I have hostnames for :-)

Philip, I'd really like to dive into developing such a plugin, let's call it HostnameFilter, where the operator can pass one (or a set of) hostname(s) that are allowed to spawn the VM. However, I have only written Python once, and even dislike the syntax a bit (not saying I hate it, but still :-)). Is there any guide/tutorial for reference (or a hello_world nova scheduler plugin) I can look at to learn how to write such a plugin?

Many thanks for your replies so far,
Christian Parpart.

On Tue, Oct 16, 2012 at 4:22 PM, Day, Phil philip@hp.com wrote:

Hi Christian,

For a more general solution you might want to look at the code that supports passing in "--availability_zone=az:host" (look for forced_host in compute/api.py). Currently this is limited to admin, but I think that should be changed to be a specific action that can be controlled by policy (we have a change in preparation for this).

Cheers,
Phil

From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of GMI M
Sent: 16 October 2012 15:13
To: Christian Parpart
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts

Hi Christian,

I think you might be able to use the existing filters in Essex. For example, you can add the following lines to the nova.conf of the controller host (or wherever nova-scheduler runs) and restart nova-scheduler:

isolated_hosts=nova7
isolated_images=sadsd1e35dfe63

This will allow you to run the image with ID sadsd1e35dfe63 only on the compute host nova7. You can also pass a list of compute servers in isolated_hosts, if you have the need. I certainly see the use case for this feature, for example when you want to run Windows-based instances and you don't want to buy a Windows datacenter license for each nova-compute host, but only for a few that will run Windows instances. I hope this helps you.

On Mon, Oct 15, 2012 at 7:45 PM, Christian Parpart tra...@gmail.com wrote:

Hi all,

I am looking for an (Essex) Nova scheduler plugin that parses the scheduler_hints to get a hostname of the hypervisor to spawn the actual VM on, rejecting any other node. This allows us to explicitly spawn a VM on a certain host (yes, there really are use cases where you want that). :-) I was trying to build my own, and searching around since I couldn't believe I was the only one, but I didn't find one yet. Does anyone of you maybe have the skills to actually write that simple plugin, or maybe even know where such a plugin has already been developed?
Many thanks in advance,
Christian Parpart.
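Since the thread asks for a hello_world-style scheduler filter, here is a minimal, hedged sketch of what such a HostnameFilter could look like. It is modeled on the Essex-era filter interface seen in nova/scheduler/filters (compare affinity_filter.py and its host_passes() method), but the class name and the target_hosts hint name are my own choice and untested; treat it as a starting point, not an existing plugin. If it lives on the scheduler's Python path, it would presumably be enabled by listing it in scheduler_default_filters in nova.conf.

# Hypothetical HostnameFilter: only lets an instance land on hosts whose
# hostname was passed via --hint target_hosts=<name>[,<name>...].
# Sketch only; modeled on the Essex filter interface (host_passes()).
from nova.scheduler import filters


class HostnameFilter(filters.BaseHostFilter):
    """Schedule instances only onto explicitly named compute hosts."""

    def host_passes(self, host_state, filter_properties):
        hints = filter_properties.get('scheduler_hints') or {}
        wanted = hints.get('target_hosts')
        if not wanted:
            # No hint given: behave as a no-op and accept every host.
            return True
        if isinstance(wanted, basestring):
            wanted = wanted.split(',')
        return host_state.host in wanted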
[Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts
Hi all,

I am looking for an (Essex) Nova scheduler plugin that parses the scheduler_hints to get a hostname of the hypervisor to spawn the actual VM on, rejecting any other node. This allows us to explicitly spawn a VM on a certain host (yes, there really are use cases where you want that). :-) I was trying to build my own, and searching around since I couldn't believe I was the only one, but I didn't find one yet. Does anyone of you maybe have the skills to actually write that simple plugin, or maybe even know where such a plugin has already been developed?

Many thanks in advance,
Christian Parpart.
Re: [Openstack] irregular but frequent networking issues (Essex on Ubuntu 12.04)
I ran into this bug quite a few months ago, too, but worked around it by loading the vhost_net kernel driver. Currently I get network outages for just a few seconds, freezes of roughly 10-15 seconds, and then everything works as if nothing had ever happened. Unfortunately I can't find anything in the logs inside the VM, on the hypervisor, or on the nova-network node.

Regards,
Christian.

On Fri, Oct 5, 2012 at 4:30 PM, Alejandro Comisario alejandro.comisa...@mercadolibre.com wrote:

Hi Chris, maybe your problem is related to this bug? https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978

Regards.
Ale

On Fri, Oct 5, 2012 at 8:44 AM, Christian Parpart tra...@gmail.com wrote:

Hey all,

we're pretty happy with our new OpenStack Essex installation on top of Ubuntu 12.04 (hypervisors and guests). We use KVM as the virtualization technology (each tenant is in a VLAN), have about 15 compute nodes, and a central nova-network node acting as the gateway (yet to be fully HA'd, however). On that gateway we're running a PPTP VPN so we can log in to any host or guest through this VPN.

Our problem, however, is that from time to time (I think it's daily, and even multiple times per day) we encounter what look like networking freezes. I first noticed it when my SSH session froze for a few seconds going from my desktop -> VPN (nova-network node) -> KVM guest. I quickly checked others, and they hung, too. I checked a hypervisor, which didn't hang, so it is not a general networking issue (like PPTP or SSH or alike).

Now, it feels like there is some problem with networking from and/or to KVM guests, and I have absolutely no clue how to trace this down. It really feels like a bug, but that's really out of my scope, and that's why I'm seeking advice here, since we just can't stay in this situation :-) It is confirmed that we're seeing this from desktop -> VPN -> KVM and from a physical node (in the data center) to KVM. However, I do not yet know whether all KVMs are affected at once (which would indicate that the issue MAY be caused by the central nova-network node), whether it is hypervisor-based (so it may be due to some hypervisor's state), or whether it is just plain random across all 15 compute nodes we have. I now think that this issue may indeed be KVM networking related.

I'll be happy about any hints and proposals you can provide to track this issue down. Please tell me about any further information you need.

Many thanks in advance,
Christian Parpart.
Re: [Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.
On Wed, Aug 15, 2012 at 4:16 AM, Lorin Hochstein lo...@nimbisservices.com wrote:

On Jul 5, 2012, at 11:47 AM, Christian Parpart tra...@gmail.com wrote:

Hi all, I am running multiple compute nodes and a single nova-network node, which is to act as a central gateway for the tenants' VMs. However, since this nova-network node (of course) knows all routes, every VM of any tenant can talk to every other, including to the physical nodes, which I highly disagree with and would like to restrict. :-)

If you add this to nova.conf:

allow_same_net_traffic=false

It should prevent the VMs from communicating with each other. From http://docs.openstack.org/essex/openstack-compute/admin/content/compute-options-reference.html#d6e3133

Hey Lorin,

according to this rather short documentation for that flag, it is unfortunately very unclear what is meant by "from same network" - I hope I am misreading that line :-) That is, it sounds like it prevents communication with ANY of the other VMs, but I just want to disallow communication from one tenant to another. For example, having a production tenant and a staging tenant, they should not be able to talk to each other, but a VM from the production tenant should still be able to talk to another VM within the same tenant. It might be helpful if someone could find clearer words for this flag in the flag reference :-)

I would also like to know on which physical hosts this flag needs to be applied. I mean, is it just the nova-network node(s), or is it all compute nodes where this flag takes effect?

Many thanks in advance,
Christian Parpart.
Re: [Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.
On Fri, Jul 6, 2012 at 6:39 AM, romi zhang romizhang1...@163.com wrote:

I am also very interested in this and am also trying to find a way to forbid talking between VMs on the same compute+network node. :)

Romi

From: openstack-bounces+romizhang1968=163@lists.launchpad.net [mailto:openstack-bounces+romizhang1968=163@lists.launchpad.net] On Behalf Of Christian Parpart
Sent: Thursday, July 5, 2012, 23:48
To: openstack@lists.launchpad.net
Subject: [Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.

Hi all,

I am running multiple compute nodes and a single nova-network node, which is to act as a central gateway for the tenants' VMs. However, since this nova-network node (of course) knows all routes, every VM of any tenant can talk to every other, including to the physical nodes, which I highly disagree with and would like to restrict. :-)

root@gw1:~# ip route show
default via $UPLINK_IP dev eth1 metric 100
10.10.0.0/19 dev eth0 proto kernel scope link src 10.10.30.5
10.10.40.0/21 dev br100 proto kernel scope link src 10.10.40.1
10.10.48.0/24 dev br101 proto kernel scope link src 10.10.48.1
10.10.49.0/24 dev br102 proto kernel scope link src 10.10.49.1
$PUBLIC_NET/28 dev eth1 proto kernel scope link src $PUBLIC_IP
192.168.0.0/16 dev eth0 proto kernel scope link src 192.168.2.1

- 10.10.0.0/19 is the network for bare-metal nodes, switches, PDUs, etc.
- 10.10.40.0/21 (br100) is the production tenant
- 10.10.48.0/24 (br101) is the staging tenant
- 10.10.49.0/24 (br102) is the playground tenant
- 192.168.0.0/16 is the legacy network (management and VM nodes)

No tenant's VM shall be able to talk to a VM of another tenant. And ideally no tenant's VM should be able to talk to the management network either. Unfortunately, since we're migrating a live system, and we also have production services on the bare-metal nodes, I had to add special routes to allow the legacy installations to communicate with the new production VMs for the transition phase. I hope I can remove that ASAP.

Now, checking iptables on the nova-network node:

root@gw1:~# iptables -t filter -vn -L FORWARD
Chain FORWARD (policy ACCEPT 64715 packets, 13M bytes)
pkts bytes target prot opt in out source destination
36M 29G nova-filter-top all -- * * 0.0.0.0/0 0.0.0.0/0
36M 29G nova-network-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0

root@gw1:~# iptables -t filter -vn -L nova-filter-top
Chain nova-filter-top (2 references)
pkts bytes target prot opt in out source destination
36M 29G nova-network-local all -- * * 0.0.0.0/0 0.0.0.0/0

root@gw1:~# iptables -t filter -vn -L nova-network-local
Chain nova-network-local (1 references)
pkts bytes target prot opt in out source destination

root@gw1:~# iptables -t filter -vn -L nova-network-FORWARD
Chain nova-network-FORWARD (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- br102 * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * br102 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT udp -- * * 0.0.0.0/0 10.10.49.2 udp dpt:1194
18M 11G ACCEPT all -- br100 * 0.0.0.0/0 0.0.0.0/0
18M 18G ACCEPT all -- * br100 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT udp -- * * 0.0.0.0/0 10.10.40.2 udp dpt:1194
106K 14M ACCEPT all -- br101 * 0.0.0.0/0 0.0.0.0/0
79895 23M ACCEPT all -- * br101 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT udp -- * * 0.0.0.0/0 10.10.48.2 udp dpt:1194

Now I see that all traffic from, for example, the staging tenant (br101) is accepted from/to any destination (-j ACCEPT).
I'd propose to restrict this to the public gateway interface (eth1 in my case), and that this value should be configurable in the nova.conf file.

Is there anything else I might have overlooked to disallow inter-tenant communication and to disallow tenant-VM-to-bare-metal communication?

Many thanks in advance,
Christian Parpart.

Am I (almost) the only one interested in disallowing inter-tenant communication, or am I overlooking something in the docs? :-(

Christian.
Re: [Openstack] About images list in dashboard
On Fri, Jul 13, 2012 at 10:56 PM, John Postlethwait john.postlethw...@nebula.com wrote:

Well, it sounds like this issue only happens in Essex and is no longer an issue in Folsom, so the bug will just be closed as invalid, as it is now fixed in the newer code...

Please backport the fix then. That is, the bug report indeed makes absolute sense to me. :-)

John Postlethwait
Nebula, Inc.
206-999-4492

On Friday, July 13, 2012 at 1:36 PM, Sam Su wrote:

Thank you for your suggestions. Even so, I'd like to file a bug to track this issue; if someone else has the same problem, they would know what happened and what progressed from the bug trail.

Sam

On Fri, Jul 13, 2012 at 12:43 PM, Gabriel Hurley gabriel.hur...@nebula.com wrote:

Glance pagination was added in Folsom. Adding a bug for this won't help since it's already been added in the current code.

- Gabriel

From: openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net [mailto:openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net] On Behalf Of John Postlethwait
Sent: Friday, July 13, 2012 12:05 PM
To: Sam Su
Cc: openstack
Subject: Re: [Openstack] About images list in dashboard

Hi Sam,

Would you mind filing a bug against Horizon with the details so that we can get it fixed? You can do so here: https://bugs.launchpad.net/horizon/+filebug

John Postlethwait
Nebula, Inc.
206-999-4492

On Thursday, July 12, 2012 at 3:55 PM, Sam Su wrote:

I finally found out why this happened. If one tenant has more than 30 images and snapshots, so that Glance cannot return the image list in one response, some images and snapshots will not be shown on the Images & Snapshots page of Horizon.

Sam

On Thu, Jul 5, 2012 at 1:19 PM, Sam Su susltd...@gmail.com wrote:

Thank you for your suggestion. I can see all images in other tenants from the dashboard, so I think the image types should be OK.

On Thu, Jul 5, 2012 at 11:54 AM, Gabriel Hurley gabriel.hur...@nebula.com wrote:

The "Project Dashboard" hides images with an AKI or AMI image type (as they're not launchable and generally shouldn't be edited by "normal" users). You can see those in the "Admin Dashboard" if you want to edit them. So my guess is that the kernel and ramdisk images are being hidden correctly, and your "ubuntu-11.10-server-amd64" and "ubuntu-12.04-server-amd64" have the wrong image type set.

All the best,
- Gabriel

From: openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net [mailto:openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net] On Behalf Of Sam Su
Sent: Thursday, July 05, 2012 11:20 AM
To: openstack
Subject: [Openstack] About images list in dashboard

Hi,

I have an OpenStack Essex environment. The nova control services, glance, keystone and the dashboard are all deployed on one server. Now I encounter a strange problem.
I can only see two images (all images are set is_public=true) in the tenant 'demo' from the dashboard, i.e. Horizon, as below:

Image Name         Type   Status  Public  Container Format  Actions
CentOS-6.2-x86_64  Image  Active  Yes     OVF               Launch
CentOS-5.8-x86_64  Image  Active  Yes     OVF               Launch

However, when I use 'nova image-list' with the same credential for the same tenant 'demo', I can see many more images (see the following result):

# nova image-list
+--------------------------------------+----------------------------------+--------+--------+
| ID                                   | Name                             | Status | Server |
+--------------------------------------+----------------------------------+--------+--------+
| 18b130ce-a815-4671-80e8-9308a7b6fc6d | ubuntu-12.04-server-amd64        | ACTIVE |        |
| 388d16ce-b80b-4e9e-b8db-db6dce6f4a83 | ubuntu-12.04-server-amd64-kernel | ACTIVE |        |
| 8d9505ce-0974-431d-a53d-e9ed6dc89033 | CentOS-6.2-x86_64                | ACTIVE |        |
| 99be14c0-3b15-470b-9e2d-a9d7e2242c7a | CentOS-5.8-x86_64                | ACTIVE |        |
| a486733f-c011-4fa1-8ce2-553084f9bc0e | ubuntu-11.10-server-amd64        |
[Openstack] how to properly get rid of some `nova-manage service list` entries
Hey all,

I have some old entries in the output of `nova-manage service list` which I would like to get rid of: one nova-compute and two nova-network items. Is it safe to just DELETE them from the MySQL table, or is there more involved?

Best regards,
Christian Parpart.
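For what it's worth, a soft delete is the more cautious route than a hard DELETE, since other tables may still reference the service rows. The snippet below is only a hedged sketch of that idea; it assumes the Essex `nova.services` table has `host`, `binary`, `deleted` and `deleted_at` columns (check your schema first), and the connection details and host/service names are placeholders.

# Hedged sketch: mark stale service rows as deleted instead of removing them.
# Assumes an Essex-era nova database with a `services` table that has
# host/binary/deleted/deleted_at columns; connection details are placeholders.
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="nova",
                       passwd="secret", db="nova")
cur = conn.cursor()
cur.execute(
    "UPDATE services SET deleted = 1, deleted_at = NOW() "
    "WHERE host = %s AND `binary` = %s",
    ("old-compute-01", "nova-compute"),  # placeholder host/service pair
)
conn.commit()
conn.close()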
[Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.
Hi all,

I am running multiple compute nodes and a single nova-network node, which is to act as a central gateway for the tenants' VMs. However, since this nova-network node (of course) knows all routes, every VM of any tenant can talk to every other, including to the physical nodes, which I highly disagree with and would like to restrict. :-)

root@gw1:~# ip route show
default via $UPLINK_IP dev eth1 metric 100
10.10.0.0/19 dev eth0 proto kernel scope link src 10.10.30.5
10.10.40.0/21 dev br100 proto kernel scope link src 10.10.40.1
10.10.48.0/24 dev br101 proto kernel scope link src 10.10.48.1
10.10.49.0/24 dev br102 proto kernel scope link src 10.10.49.1
$PUBLIC_NET/28 dev eth1 proto kernel scope link src $PUBLIC_IP
192.168.0.0/16 dev eth0 proto kernel scope link src 192.168.2.1

- 10.10.0.0/19 is the network for bare-metal nodes, switches, PDUs, etc.
- 10.10.40.0/21 (br100) is the production tenant
- 10.10.48.0/24 (br101) is the staging tenant
- 10.10.49.0/24 (br102) is the playground tenant
- 192.168.0.0/16 is the legacy network (management and VM nodes)

No tenant's VM shall be able to talk to a VM of another tenant. And ideally no tenant's VM should be able to talk to the management network either. Unfortunately, since we're migrating a live system, and we also have production services on the bare-metal nodes, I had to add special routes to allow the legacy installations to communicate with the new production VMs for the transition phase. I hope I can remove that ASAP.

Now, checking iptables on the nova-network node:

root@gw1:~# iptables -t filter -vn -L FORWARD
Chain FORWARD (policy ACCEPT 64715 packets, 13M bytes)
pkts bytes target prot opt in out source destination
36M 29G nova-filter-top all -- * * 0.0.0.0/0 0.0.0.0/0
36M 29G nova-network-FORWARD all -- * * 0.0.0.0/0 0.0.0.0/0

root@gw1:~# iptables -t filter -vn -L nova-filter-top
Chain nova-filter-top (2 references)
pkts bytes target prot opt in out source destination
36M 29G nova-network-local all -- * * 0.0.0.0/0 0.0.0.0/0

root@gw1:~# iptables -t filter -vn -L nova-network-local
Chain nova-network-local (1 references)
pkts bytes target prot opt in out source destination

root@gw1:~# iptables -t filter -vn -L nova-network-FORWARD
Chain nova-network-FORWARD (1 references)
pkts bytes target prot opt in out source destination
0 0 ACCEPT all -- br102 * 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- * br102 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT udp -- * * 0.0.0.0/0 10.10.49.2 udp dpt:1194
18M 11G ACCEPT all -- br100 * 0.0.0.0/0 0.0.0.0/0
18M 18G ACCEPT all -- * br100 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT udp -- * * 0.0.0.0/0 10.10.40.2 udp dpt:1194
106K 14M ACCEPT all -- br101 * 0.0.0.0/0 0.0.0.0/0
79895 23M ACCEPT all -- * br101 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT udp -- * * 0.0.0.0/0 10.10.48.2 udp dpt:1194

Now I see that all traffic from, for example, the staging tenant (br101) is accepted from/to any destination (-j ACCEPT). I'd propose to restrict this to the public gateway interface (eth1 in my case), and that this value should be configurable in the nova.conf file.

Is there anything else I might have overlooked to disallow inter-tenant communication and to disallow tenant-VM-to-bare-metal communication?

Many thanks in advance,
Christian Parpart.
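To make the proposal above a bit more concrete, here is a hedged illustration of what a hand-applied tightening could look like: DROP rules for tenant-bridge-to-tenant-bridge forwarding, inserted ahead of nova-network's broad per-bridge ACCEPT rules in the nova-network-FORWARD chain (the chain name is taken from the listing above). This is not something nova-network generates itself, the bridge names are the ones from my setup, and nova-network may rebuild its chains on restart, so rules added this way are not persistent.

# Hedged illustration: insert DROP rules for tenant-bridge-to-tenant-bridge
# forwarding ahead of nova-network's broad ACCEPT rules. Bridge names are
# the ones from the setup described above; run on the nova-network node.
import itertools
import subprocess

TENANT_BRIDGES = ["br100", "br101", "br102"]

for src, dst in itertools.permutations(TENANT_BRIDGES, 2):
    subprocess.check_call([
        "iptables", "-I", "nova-network-FORWARD", "1",
        "-i", src, "-o", dst, "-j", "DROP",
    ])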
Re: [Openstack] Nova Pacemaker Resource Agents
Hey,

that's great, but how do you handle RabbitMQ in between? I kind of achieved it without OCF agents by using Pacemaker's native upstart support; however, OCF agents are much nicer, and still, I'd be interested in how you solved the RabbitMQ issue.

Best regards,
Christian Parpart.

On Mon, Jul 2, 2012 at 7:38 PM, Sébastien Han han.sebast...@gmail.com wrote:

Hi everyone,

For those of you who want to achieve HA in nova, I wrote some resource agents according to the OCF specification. The RAs available are:

- nova-scheduler
- nova-api
- novnc
- nova-consoleauth
- nova-cert

The how-to is available here: http://www.sebastien-han.fr/blog/2012/07/02/openstack-nova-components-ha/ and the RAs on my GitHub: https://github.com/leseb/OpenStack-ra

Those RAs mainly re-use the structure of the resource agent written by Martin Gerhard Loschwitz from Hastexo. Hope it helps!

Cheers.
~Seb
Re: [Openstack] HA inside VMs (via Corosync/Pacemaker)
Oh, no. I use floating IPs for actual real public IPs. But now that you mention the pools: well, I would have to assign one floating IP to at least TWO KVM instances. Hm, Pacemaker/Corosync *inside* the VM will add the service IP to the local ethernet interface, and thus the outside OpenStack components do not know about it. Using a dedicated floating IP pool for service IPs might feel like a great solution, but then OpenStack is not the one managing who gets what IP - Corosync/Pacemaker inside the KVM instances is. :-)

Anyone have an idea how to solve this?

Many thanks in advance,
Christian.

On Sat, Jun 30, 2012 at 5:00 AM, Vishvananda Ishaya vishvana...@gmail.com wrote:

Seems like you could use a floating IP for this. You can define a range of internal floating IPs by using a separate floating IP pool.

On Jun 29, 2012 7:06 PM, Christian Parpart tra...@gmail.com wrote:

Hey all,

I would like to set up a highly available service *inside* two KVM instances, so I have created a security group to contain all required service ports, so clients can connect to either VM, and that works. Both instances have their own designated IP address, provided by nova itself. Now I want to allocate a custom private IP address (I just chose one from the higher address range, since I have quite a big one (/21), and it was planned to use the higher numbers for HA service IPs). But how do I teach OpenStack to route traffic to these KVMs via this designated service IP? I took a look at the iptables rules; however, they are created automatically, and I have not really worked out yet what they all want to tell me and which rule is there for what (not every rule uses -m comment --comment $hint). :-)

So how do I teach OpenStack custom provided IP addresses?

Best regards,
Christian.
Re: [Openstack] HA inside VMs (via Corosync/Pacemaker)
On Sat, Jun 30, 2012 at 1:51 PM, Narayan Desai narayan.de...@gmail.com wrote:

On Sat, Jun 30, 2012 at 3:06 AM, Christian Parpart tra...@gmail.com wrote:

Hm, Pacemaker/Corosync *inside* the VM will add the service IP to the local ethernet interface, and thus the outside OpenStack components do not know about it. Using a dedicated floating IP pool for service IPs might feel like a great solution, but then OpenStack is not the one managing who gets what IP - Corosync/Pacemaker inside the KVM instances is. :-) Anyone have an idea how to solve this?

It sounds like you want to add explicit support to Pacemaker to deal with OpenStack fixed addresses. Then you could run with rfc1918 floating addresses, and have Pacemaker/Corosync reassign the (external) fixed address when consensus changes. Think of the OpenStack fixed address control plane in a similar way to ifconfig. You should even be able to script it up yourself; you'd need to add your OpenStack creds to the HA images, though.

Hey, that's a really great idea, and IMHO apparently the only way not to interfere too much with OpenStack internals. So I need to create a new resource agent that represents a floating IP. If I succeed, I'll share that script. :)

Cheers,
Christian Parpart.
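For reference, the core of such a resource agent boils down to a handful of API calls: on "start", point the floating IP at the local instance; on "stop", release it. Below is only a hedged Python sketch of that idea using python-novaclient's add_floating_ip/remove_floating_ip calls; the credentials, the floating IP, and the instance names are placeholders, and a real OCF agent would wrap this in the usual start/stop/monitor actions and error handling.

# Hedged sketch of the "floating IP as cluster resource" idea:
# move a floating IP to whichever instance currently holds the Pacemaker
# resource. All names and credentials below are placeholders.
from novaclient.v1_1 import client

FLOATING_IP = "203.0.113.10"   # placeholder service IP

nova = client.Client("svc-user", "secret", "production",
                     "http://keystone:5000/v2.0/")

def take_over(new_owner_name, old_owner_name=None):
    """Detach the floating IP from the old owner (if any) and attach it here."""
    if old_owner_name:
        old = nova.servers.find(name=old_owner_name)
        nova.servers.remove_floating_ip(old, FLOATING_IP)
    new = nova.servers.find(name=new_owner_name)
    nova.servers.add_floating_ip(new, FLOATING_IP)

take_over("ha-node-b", old_owner_name="ha-node-a")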
[Openstack] HA inside VMs (via Corosync/Pacemaker)
Hey all,

I would like to set up a highly available service *inside* two KVM instances, so I have created a security group to contain all required service ports, so clients can connect to either VM, and that works. Both instances have their own designated IP address, provided by nova itself. Now I want to allocate a custom private IP address (I just chose one from the higher address range, since I have quite a big one (/21), and it was planned to use the higher numbers for HA service IPs). But how do I teach OpenStack to route traffic to these KVMs via this designated service IP? I took a look at the iptables rules; however, they are created automatically, and I have not really worked out yet what they all want to tell me and which rule is there for what (not every rule uses -m comment --comment $hint). :-)

So how do I teach OpenStack custom provided IP addresses?

Best regards,
Christian.
Re: [Openstack] big problem with boot from iso
On Tue, Jun 26, 2012 at 2:30 AM, William Herry william.herry.ch...@gmail.com wrote:

Hi,

I use boot-from-ISO to install a CentOS instance, but it can't recognize the disk. I created a flavor with 300G ephemeral and 300G disk, and it says no valid disk found; but when I create a flavor with 30 swap, it finds the disk vdb and can install the system, but of course can't boot.

Hey,

maybe your VM disk space is exported as a VIRTIO block device (/dev/vda, ...) and your ISO image doesn't support these block devices? Try loading its underlying kernel module. :)

Cheers,
Christian.
[Openstack] nova boot --hint same_host=[UUID] fails with InstanceNotFound: Instance [ could not be found. ?
Hey all,

while strictly following the guidelines [1] on how to spawn an instance on the same host as another instance, I run into an error: it cannot find an instance called '[', which - of course - is not the UUID I specified. I tried dropping the [ ] and just passed the UUID right away, but then it just takes the first character of the UUID and says it can't find that one.

My exact command line looked like this:

nova boot --image 6c73d25e-df93-4c96-a803-9d419a367267 --flavor 16 --hint same_host=[df5fd16b-271d-45ac-9e9a-5d3ad33920e5] $instance_name

2012-06-26 08:26:28 ERROR nova.rpc.amqp [req-67363fee-1046-4ff8-90a2-05aacb1cbe10 fe655fd0ad49474db5882931685c77fe 8f956f17ce9d4c4d9957c230aab4f720] Exception during message handling
2012-06-26 08:26:28 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/rpc/amqp.py, line 252, in _process_data
2012-06-26 08:26:28 TRACE nova.rpc.amqp     rval = node_func(context=ctxt, **node_args)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/manager.py, line 115, in run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp     context, ex, *args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/contextlib.py, line 24, in __exit__
2012-06-26 08:26:28 TRACE nova.rpc.amqp     self.gen.next()
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/manager.py, line 105, in run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp     return self.driver.schedule_run_instance(*args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/multi.py, line 78, in schedule_run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp     return self.drivers['compute'].schedule_run_instance(*args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py, line 72, in schedule_run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp     *args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py, line 194, in _schedule
2012-06-26 08:26:28 TRACE nova.rpc.amqp     filter_properties)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py, line 218, in filter_hosts
2012-06-26 08:26:28 TRACE nova.rpc.amqp     if host.passes_filters(filter_fns, filter_properties):
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py, line 156, in passes_filters
2012-06-26 08:26:28 TRACE nova.rpc.amqp     if not filter_fn(self, filter_properties):
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/filters/affinity_filter.py, line 64, in host_passes
2012-06-26 08:26:28 TRACE nova.rpc.amqp     if self._affinity_host(context, i) == me])
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/scheduler/filters/affinity_filter.py, line 30, in _affinity_host
2012-06-26 08:26:28 TRACE nova.rpc.amqp     return self.compute_api.get(context, instance_id)['host']
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/compute/api.py, line 1022, in get
2012-06-26 08:26:28 TRACE nova.rpc.amqp     instance = self.db.instance_get(context, instance_id)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/db/api.py, line 540, in instance_get
2012-06-26 08:26:28 TRACE nova.rpc.amqp     return IMPL.instance_get(context, instance_id)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py, line 120, in wrapper
2012-06-26 08:26:28 TRACE nova.rpc.amqp     return f(*args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File /usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py, line 1339, in instance_get
2012-06-26 08:26:28 TRACE nova.rpc.amqp     raise exception.InstanceNotFound(instance_id=instance_id)
2012-06-26 08:26:28 TRACE nova.rpc.amqp InstanceNotFound: Instance [ could not be found.

And that's the error in the log. Any ideas whether it's my fault, or how to work around this?

Many thanks in advance,
Christian Parpart.
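The traceback suggests the hint value reaches the affinity filter as a plain string, so iterating over it yields single characters ('[' first), each of which is then looked up as an instance ID. One way around the CLI quoting problem is to pass the hint programmatically; the sketch below uses python-novaclient's scheduler_hints argument with a real Python list. Credentials are placeholders (the image and instance UUIDs are the ones from the post), and whether this actually sidesteps the Essex behaviour depends on how the API layer serializes the hint, so treat it as an experiment rather than a confirmed fix.

# Hedged sketch: pass same_host as a real list via the Python API instead of
# the CLI, avoiding the "[uuid]" string being iterated character by character.
# Credentials are placeholders.
from novaclient.v1_1 import client

nova = client.Client("admin", "secret", "production",
                     "http://keystone:5000/v2.0/")

server = nova.servers.create(
    name="my-instance",
    image="6c73d25e-df93-4c96-a803-9d419a367267",   # image UUID from the post
    flavor=16,
    scheduler_hints={"same_host": ["df5fd16b-271d-45ac-9e9a-5d3ad33920e5"]},
)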
[Openstack] maybe a bug, but where? (dnsmasq-dhcp versus Redis inside KVM)
Hey all,

after having upgraded to dnsmasq 2.62 (current) and increasing the lease times up to 7 days, I now have a very quiet syslog on my gateway host. However, there is one KVM instance (running redis inside, with a 16GB RAM flavor) that still loses its IP very, very frequently. It now seems that even after just 4 hours of KVM instance uptime, the nova-network node receives the following and logs it with:

2260 Jun 15 16:51:37 cesar1 dnsmasq-dhcp[8707]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
2261 Jun 15 16:51:37 cesar1 dnsmasq-dhcp[8707]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1
[...]
3381 Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
3382 Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1

And 21:59 was exactly the time our redis server went down, although I cannot find anything except cron logs in the KVM instance's syslog. My question now is: why would such a request cause network unreachability for that node?

Many thanks in advance,
Christian Parpart.
Re: [Openstack] [Openstack-operators] Nova Controller HA issues
Hey,

well, I said I might be wrong because I have no clear vision of how OpenStack works in its deepest details. However, I would not like to depend on a controller node that lives inside a virtual machine, controlled by compute nodes that are in turn controlled by that controller node. This sounds quite like a chicken-and-egg problem. At the time of this writing, I think you have to have a working nova-scheduler process, which is responsible for deciding on which compute node to spawn your VM (what else?), and you should think about what you do when this (or all of your controller) VMs terribly die and you want to rebuild them: how do you plan to do that when your controller node is out of service?

In my case I have put the controller services onto two compute nodes and use Pacemaker to switch between them; in case one node goes down, the other can take over (via a shared service IP). Again, these are just my thoughts, and I have been using OpenStack for only about a month now :-) But I hope this helps a bit...

Best regards,
Christian Parpart.

On Fri, Jun 15, 2012 at 8:16 AM, Igor Laskovy igor.lask...@gmail.com wrote:

Why? Can you please clarify?

Igor Laskovy
facebook.com/igor.laskovy
Kiev, Ukraine

On Jun 15, 2012 1:55 AM, Christian Parpart tra...@gmail.com wrote:

I don't think putting the controller node completely into a VM is good advice, at least when speaking of nova-scheduler and nova-api (if central). I may be wrong, and if so, please correct me.

Christian.

On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy igor.lask...@gmail.com wrote:

Hi, are there any updates here? Can anybody clarify what happens if the controller node just goes down hard? I am thinking about a solution with two hypervisors and putting the controller node in a VM on shared storage, which can be relaunched when the active hypervisor dies. Any ideas, advice?

On Tue, Jun 12, 2012 at 3:52 PM, John Garbutt john.garb...@citrix.com wrote:

Sure, I get your point. I think Florian is working on some docs to help with that. Not sure how much has been done already.

Cheers,
John

From: Christian Parpart [mailto:tra...@gmail.com]
Sent: 12 June 2012 13:47
To: John Garbutt
Cc: openstack-operat...@lists.openstack.org
Subject: Re: [Openstack-operators] Nova Controller HA issues

Hey,

yeah, I also found this page, but didn't find it that helpful; it reads more like a theoretical paper on how they implemented it than a guide on how to actually make it happen (from the sysop point of view :-)). I hoped that someone had faced this already, since I really find it very unintuitive to realize, or I need to wait until I get more time to investigate properly. :-)

Regards,
Christian.

On Tue, Jun 12, 2012 at 12:52 PM, John Garbutt john.garb...@citrix.com wrote:

I thought Rabbit had a built-in HA solution these days: http://www.rabbitmq.com/ha.html

From: openstack-operators-boun...@lists.openstack.org [mailto:openstack-operators-boun...@lists.openstack.org] On Behalf Of Christian Parpart
Sent: 12 June 2012 09:59
To: openstack-operat...@lists.openstack.org
Subject: [Openstack-operators] Nova Controller HA issues

Hi all,

after spending the whole evening making our cloud controller node highly available using Corosync/Pacemaker, which I am really proud of, I have just a few problems left, and the one that freaks me out the most is rabbitmq-server. For that beast I just can't seem to find good documentation on how to set rabbitmq-server up properly for HA. Has anyone ever tried to set up a nova controller (including the RabbitMQ dependency) for HA?
If so, I'd be pleased to share experiences, especially about the latter part. :-)

Best regards,
Christian Parpart

--
Igor Laskovy
Kiev, Ukraine
Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER
Hey all,

it just happened twice again, both times today, the last at 22:00 UTC, with the following in the nova-network node's syslog:

root@gw1:/var/log# grep 'dnsmasq.*10889' daemon.log
Jun 15 17:39:32 cesar1 dnsmasq[10889]: started, version v2.62-7-g4ce4f37 cachesize 150
Jun 15 17:39:32 cesar1 dnsmasq[10889]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: DHCP, static leases only on 10.10.40.3, lease time 3d
Jun 15 17:39:32 cesar1 dnsmasq[10889]: reading /etc/resolv.conf
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 4.2.2.1#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 178.63.26.173#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.122#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.121#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: read /etc/hosts - 519 addresses
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read /var/lib/nova/networks/nova-br100.conf
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16 fa:16:3e:3d:ff:f3
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16 fa:16:3e:3d:ff:f3 redis-appdata1

It seems that this one VM was the only one that sent a DHCP request over the past 5 hours; that first request was answered with a DHCPACK, and that's it. That's exactly the time the host behind that IP (redis-appdata1) stopped functioning. However, I actually did update dnsmasq on our gateway node to the latest trunk of the dnsmasq git repository, killed dnsmasq, and restarted nova-network (which auto-starts dnsmasq per device). I really hoped that this one particular bug fix was the cause of the downtime, but apparently there MIGHT be another factor. There is unfortunately nothing to read in the VM's syslog. What else could cause the VM to forget its IP? Can this also be caused by send_arp_for_ha=True?

Regards,
Christian.

On Fri, Jun 15, 2012 at 2:50 AM, Nathanael Burton nathanael.i.bur...@gmail.com wrote:

FWIW, I haven't run across the dnsmasq bug in our environment using EPEL packages.

Nate

On Jun 14, 2012 7:20 PM, Vishvananda Ishaya vishvana...@gmail.com wrote:

Are you running in VLAN mode? If so, you probably need to update to a new version of dnsmasq. See this message for reference: http://osdir.com/ml/openstack-cloud-computing/2012-05/msg00785.html

Vish

On Jun 14, 2012, at 1:41 PM, Christian Parpart wrote:

Hey all,

I feel really sad saying this, but now that we have had quite a few instances in production for at least about 5 days, I have encountered the second instance losing its IP address due to "No DHCPOFFER" (according to syslog in the instance). I checked the logs on the central nova-network and gateway node and found dnsmasq still replying to requests from all the other instances; it even got the request from the instance in question and even sent an OFFER, as far as I can tell by now (I'm investigating / posting logs ASAP). But while it seems that dnsmasq sends an offer, the instance says it didn't receive one - wtf?

Please tell me what I can do to actually *fix* this issue, since this is by far very fatal. One chance I'd see (as a workaround) is to let a freshly created instance retrieve its IP via DHCP, but then reconfigure /etc/network/interfaces to continue with a static networking setup. However, I'd just like the DHCP thingy to get fixed. I'm very open to any kind of helpful comments. :)

So long,
Christian.
[Openstack] instances losing IP address while running, due to No DHCPOFFER
Hey all,

I feel really sad saying this, but now that we have had quite a few instances in production for at least about 5 days, I have encountered the second instance losing its IP address due to "No DHCPOFFER" (according to syslog in the instance). I checked the logs on the central nova-network and gateway node and found dnsmasq still replying to requests from all the other instances; it even got the request from the instance in question and even sent an OFFER, as far as I can tell by now (I'm investigating / posting logs ASAP). But while it seems that dnsmasq sends an offer, the instance says it didn't receive one - wtf?

Please tell me what I can do to actually *fix* this issue, since this is by far very fatal. One chance I'd see (as a workaround) is to let a freshly created instance retrieve its IP via DHCP, but then reconfigure /etc/network/interfaces to continue with a static networking setup. However, I'd just like the DHCP thingy to get fixed. I'm very open to any kind of helpful comments. :)

So long,
Christian.
Re: [Openstack] [Openstack-operators] Nova Controller HA issues
I don't think putting the controller node completely into a VM is good advice, at least when speaking of nova-scheduler and nova-api (if central). I may be wrong, and if so, please correct me.

Christian.

On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy igor.lask...@gmail.com wrote:

Hi, are there any updates here? Can anybody clarify what happens if the controller node just goes down hard? I am thinking about a solution with two hypervisors and putting the controller node in a VM on shared storage, which can be relaunched when the active hypervisor dies. Any ideas, advice?

On Tue, Jun 12, 2012 at 3:52 PM, John Garbutt john.garb...@citrix.com wrote:

Sure, I get your point. I think Florian is working on some docs to help with that. Not sure how much has been done already.

Cheers,
John

From: Christian Parpart [mailto:tra...@gmail.com]
Sent: 12 June 2012 13:47
To: John Garbutt
Cc: openstack-operat...@lists.openstack.org
Subject: Re: [Openstack-operators] Nova Controller HA issues

Hey,

yeah, I also found this page, but didn't find it that helpful; it reads more like a theoretical paper on how they implemented it than a guide on how to actually make it happen (from the sysop point of view :-)). I hoped that someone had faced this already, since I really find it very unintuitive to realize, or I need to wait until I get more time to investigate properly. :-)

Regards,
Christian.

On Tue, Jun 12, 2012 at 12:52 PM, John Garbutt john.garb...@citrix.com wrote:

I thought Rabbit had a built-in HA solution these days: http://www.rabbitmq.com/ha.html

From: openstack-operators-boun...@lists.openstack.org [mailto:openstack-operators-boun...@lists.openstack.org] On Behalf Of Christian Parpart
Sent: 12 June 2012 09:59
To: openstack-operat...@lists.openstack.org
Subject: [Openstack-operators] Nova Controller HA issues

Hi all,

after spending the whole evening making our cloud controller node highly available using Corosync/Pacemaker, which I am really proud of, I have just a few problems left, and the one that freaks me out the most is rabbitmq-server. For that beast I just can't seem to find good documentation on how to set rabbitmq-server up properly for HA. Has anyone ever tried to set up a nova controller (including the RabbitMQ dependency) for HA? If so, I'd be pleased to share experiences, especially about the latter part. :-)

Best regards,
Christian Parpart

--
Igor Laskovy
Kiev, Ukraine
Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER
Hey,

thanks for your reply. Unfortunately there was no process restart in nova-network nor in dnsmasq; both processes seem to have been up for about 2 and 3 days respectively. However, why is the default dhcp_lease_time value 120s? Not overriding this causes the clients to re-acquire a new DHCP lease every 42 seconds (at least on my nodes), which is completely ridiculous. On the other hand, I took a look at the sources (linux_net.py) and found out why the max lease time is set to 2048: because that is the size of my network. So why is the max lease time the size of my network? I've written a tiny patch to allow overriding this value in nova.conf and will submit it to Launchpad soon - and I hope it'll be accepted and then also applied to Essex, since this is a very straightforward, few-line, helpful change.

Nevertheless, that does not clarify why I now had 2 (well, 3 actually) instances not getting any DHCP replies/offers anymore after some hours/days. The one host that caused issues today (a few hours ago) I fixed by hard-rebooting the instance; however, just about 40 minutes later it again forgot its IP, so one might say that it maybe did not get any reply from the DHCP server (dnsmasq) almost right after it got a lease on instance boot.

So long,
Christian.

On Thu, Jun 14, 2012 at 10:55 PM, Nathanael Burton nathanael.i.bur...@gmail.com wrote:

Has nova-network been restarted? There was an issue where nova-network was signalling dnsmasq, which would cause dnsmasq to stop responding to requests yet appear to be running fine. You can see if killing dnsmasq, restarting nova-network, and rebooting an instance allows it to get a DHCP address again...

Nate

On Jun 14, 2012 4:46 PM, Christian Parpart tra...@gmail.com wrote:

Hey all,

I feel really sad saying this, but now that we have had quite a few instances in production for at least about 5 days, I have encountered the second instance losing its IP address due to "No DHCPOFFER" (according to syslog in the instance). I checked the logs on the central nova-network and gateway node and found dnsmasq still replying to requests from all the other instances; it even got the request from the instance in question and even sent an OFFER, as far as I can tell by now (I'm investigating / posting logs ASAP). But while it seems that dnsmasq sends an offer, the instance says it didn't receive one - wtf?

Please tell me what I can do to actually *fix* this issue, since this is by far very fatal. One chance I'd see (as a workaround) is to let a freshly created instance retrieve its IP via DHCP, but then reconfigure /etc/network/interfaces to continue with a static networking setup. However, I'd just like the DHCP thingy to get fixed. I'm very open to any kind of helpful comments. :)

So long,
Christian.
Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER
Hey all,

many, many thanks for all your replies. Having already raised the DHCP timeouts just now, I will have enough time to sleep and can actually apply the dnsmasq fix tomorrow. Yes, I am running in VLAN mode, since this is also the propagated way. Maybe OpenStack (nova-network) should check the version number of dnsmasq and, if running in VLAN mode, issue a (critical) warning to the logs, especially since this kind of error can lead to disasters in data centers. :) I also hope that Ubuntu 12.04 will pick up this patch soon enough, so we won't end up with a patch-dominated distribution :-)

Good night all,
Christian.

On Fri, Jun 15, 2012 at 1:16 AM, Narayan Desai narayan.de...@gmail.com wrote:

I vaguely recall Vish mentioning a bug in dnsmasq that had a somewhat similar problem (it had to do with lease renewal problems on IP aliases or something like that). The issue was particularly pronounced with Windows VMs, apparently.
-nld

On Thu, Jun 14, 2012 at 6:02 PM, Christian Parpart tra...@gmail.com wrote:

Hey,

thanks for your reply. Unfortunately there was no process restart in nova-network nor in dnsmasq; both processes seem to have been up for about 2 and 3 days respectively. However, why is the default dhcp_lease_time value 120s? Not overriding this causes the clients to re-acquire a new DHCP lease every 42 seconds (at least on my nodes), which is completely ridiculous. On the other hand, I took a look at the sources (linux_net.py) and found out why the max lease time is set to 2048: because that is the size of my network. So why is the max lease time the size of my network? I've written a tiny patch to allow overriding this value in nova.conf and will submit it to Launchpad soon - and I hope it'll be accepted and then also applied to Essex, since this is a very straightforward, few-line, helpful change.

Nevertheless, that does not clarify why I now had 2 (well, 3 actually) instances not getting any DHCP replies/offers anymore after some hours/days. The one host that caused issues today (a few hours ago) I fixed by hard-rebooting the instance; however, just about 40 minutes later it again forgot its IP, so one might say that it maybe did not get any reply from the DHCP server (dnsmasq) almost right after it got a lease on instance boot.

So long,
Christian.

On Thu, Jun 14, 2012 at 10:55 PM, Nathanael Burton nathanael.i.bur...@gmail.com wrote:

Has nova-network been restarted? There was an issue where nova-network was signalling dnsmasq, which would cause dnsmasq to stop responding to requests yet appear to be running fine. You can see if killing dnsmasq, restarting nova-network, and rebooting an instance allows it to get a DHCP address again...

Nate

On Jun 14, 2012 4:46 PM, Christian Parpart tra...@gmail.com wrote:

Hey all,

I feel really sad saying this, but now that we have had quite a few instances in production for at least about 5 days, I have encountered the second instance losing its IP address due to "No DHCPOFFER" (according to syslog in the instance). I checked the logs on the central nova-network and gateway node and found dnsmasq still replying to requests from all the other instances; it even got the request from the instance in question and even sent an OFFER, as far as I can tell by now (I'm investigating / posting logs ASAP). But while it seems that dnsmasq sends an offer, the instance says it didn't receive one - wtf? Please tell me what I can do to actually *fix* this issue, since this is by far very fatal.
One chance I'd see (as a workaround) is to let a freshly created instance retrieve its IP via DHCP, but then reconfigure /etc/network/interfaces to continue with a static networking setup. However, I'd just like the DHCP thingy to get fixed. I'm very open to any kind of helpful comments. :)

So long,
Christian.
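The version-check idea from the top of this message is easy to sketch. The snippet below is only an illustration of it, not code that exists in nova-network: it shells out to `dnsmasq --version`, parses the version string, and logs a warning when an older release is found. The chosen threshold (2.62, the version discussed in this thread) and the logging setup are my own assumptions.

# Hedged illustration of the suggestion above: warn when the installed
# dnsmasq is older than a known-good release (2.62 is assumed here).
import logging
import re
import subprocess

LOG = logging.getLogger(__name__)
KNOWN_GOOD = (2, 62)

def check_dnsmasq_version():
    out = subprocess.check_output(["dnsmasq", "--version"])
    match = re.search(r"version\s+v?(\d+)\.(\d+)", out.decode())
    if not match:
        LOG.warning("could not determine dnsmasq version")
        return
    version = (int(match.group(1)), int(match.group(2)))
    if version < KNOWN_GOOD:
        LOG.warning("dnsmasq %s is older than %s; DHCP leases may break "
                    "in VLAN mode", version, KNOWN_GOOD)

if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    check_dnsmasq_version()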
[Openstack] instance snapshotting failed. now in permanent Image_Snapshot state.
Hey all,

I feel really sorry to bother you with this, but it has been annoying me for quite a while now. I have used snapshotting quite a few times already, successfully, but this time (maybe due to my HA experiments on the cloud controller node) the snapshotting failed. I clicked snapshot via the dashboard, the instance went into Task = Image_Snapshot state, and the image snapshot is in Status = Queued. This was about 12 hours ago, and I quite don't think that anything will happen without human intervention. Unfortunately, the nova-compute.log on the compute node in question just says it is about to snapshot; no errors above nor below. A few hours later I tried to hard-reboot the instance, but that failed without telling me why either. I searched the nova-scheduler.log on the controller node, but I couldn't find anything related there (at least I did not find it).

What can I do now? I do not want to terminate the instance just because of the snapshotting error, and I'd also like to get snapshotting working again, but how?

Many thanks in advance,
Christian.