[Openstack] adding/removing security groups at instance runtime

2012-11-26 Thread Christian Parpart
Hey all,

I was googling around, trying to find a way to add a new security group to
a running instance.
However, I only found the claim that this is said to work with trunk, but
neither how exactly nor which components I should update to trunk to make
it actually happen.

I am running a multi-node OpenStack Essex cluster and would like to update
only the parts that I actually need to get this feature in.
As far as I could tell from searching, it seems that at least
python-novaclient is responsible for the client side. But do I also need to
update nova-api or something similar?
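
For reference, the client side of that feature looks roughly like this (a
sketch against trunk-era python-novaclient, assuming the
addSecurityGroup/removeSecurityGroup server actions are available in your
nova-api; credentials and names are placeholders):

    # hot-add/remove a security group on a running server
    from novaclient.v1_1 import client

    nova = client.Client("admin", "secret", "mytenant",
                         "http://keystone:5000/v2.0/")
    server = nova.servers.find(name="my-instance")
    nova.servers.add_security_group(server, "webservers")
    nova.servers.remove_security_group(server, "webservers")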

Many thanks in advance,
Christian Parpart.


[Openstack] nova-network sometimes stops routing Floating IPs

2012-11-21 Thread Christian Parpart
Hey all,

I am having a rather serious issue with the central (OpenStack Essex)
nova-network gateway we have set up.
We have quite some floating IPs assigned to a few virtual machines, and it
just works. But for a few days (or weeks) now, I have noticed that some VM
stops getting inbound traffic from external IPs, such as the internet,
through the floating IP that is DNAT'ed to the application VM.

We tried to debug it, and it is definitely something going wrong with the
nova-network service here.

Now, a `pkill dnsmasq && sleep 2 && initctl restart nova-network` actually
fixes it.

The question now is *why* it fixes it. The routing tables do not really
seem to have changed, unless I was missing something while checking :).

Is there anything else nova-network does besides setting up IPs and
iptables? Or, what is nova-network actually doing in general, and what
could be the reason it runs into such a situation?
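
Before restarting, one quick check is whether the DNAT rule and the address
binding for an affected floating IP are still in place (203.0.113.10 is a
placeholder):

    iptables -t nat -S | grep 203.0.113.10
    ip addr show dev eth1 | grep 203.0.113.10

If both still look intact while inbound traffic stalls, the restart is
probably fixing something else, e.g. stale ARP state.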

Many thanks in advance,
Christian.


[Openstack] Essex Dashboard: KeyError at /nova/instances_and_volumes/

2012-11-12 Thread Christian Parpart
Hey all,

for quite some weeks now I have been getting an error page instead of the
Instances and Volumes page in the Essex Horizon dashboard, with the above
title and the detailed error output below:

Environment:

Request Method: GET
Request URL: http://controller.rz.dawanda.com/nova/instances_and_volumes/
Django Version: 1.3.1
Python Version: 2.7.3
Installed Applications:
['openstack_dashboard', 'django.contrib.sessions',
 'django.contrib.messages', 'django.contrib.staticfiles', 'django_nose',
 'horizon', 'horizon.dashboards.nova', 'horizon.dashboards.syspanel',
 'horizon.dashboards.settings']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'openstack_dashboard.middleware.DashboardLogUnhandledExceptionsMiddleware',
 'horizon.middleware.HorizonMiddleware',
 'django.middleware.doc.XViewMiddleware',
 'django.middleware.locale.LocaleMiddleware')

Traceback:
File /usr/lib/python2.7/dist-packages/django/core/handlers/base.py in get_response
  111. response = callback(request, *callback_args, **callback_kwargs)
File /usr/lib/python2.7/dist-packages/horizon/decorators.py in dec
  40. return view_func(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/horizon/decorators.py in dec
  55. return view_func(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/horizon/decorators.py in dec
  40. return view_func(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/django/views/generic/base.py in view
  47. return self.dispatch(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/django/views/generic/base.py in dispatch
  68. return handler(request, *args, **kwargs)
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in get
  105. handled = self.construct_tables()
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in construct_tables
  96. handled = self.handle_table(table)
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in handle_table
  68. data = self._get_data_dict()
File /usr/lib/python2.7/dist-packages/horizon/tables/views.py in _get_data_dict
  37. self._data[table._meta.name] = data_func()
File /usr/lib/python2.7/dist-packages/horizon/dashboards/nova/instances_and_volumes/views.py in get_volumes_data
  74. att['instance'] = instances[att['server_id']]

Exception Type: KeyError at /nova/instances_and_volumes/
Exception Value: u'8aa2989e-85ea-4975-b81b-04d06dbf8013'

Now I wonder to what extent this is a bug in the software, and/or whether I
have an invalid entry in my nova database that I can fix by hand.

If so, does anyone know how to actually work around this? I really do need
this (now not working) page :-)
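
The failing line joins volume attachments to the instance listing, so a
volume still marked as attached to a vanished instance would trigger
exactly this KeyError. A sketch of a query to spot such rows (Essex-era
schema assumed; verify with DESCRIBE first and back up before changing
anything):

    mysql nova -e "SELECT v.id, v.instance_id, v.attach_status
                   FROM volumes v
                   LEFT JOIN instances i ON i.id = v.instance_id
                   WHERE v.attach_status = 'attached'
                     AND (i.id IS NULL OR i.deleted = 1);"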

Regards,
Christian Parpart.


[Openstack] floating IPs not routed from inside

2012-10-25 Thread Christian Parpart
Hey all,

we're having quite a few compute nodes with Essex installed and one central
nova-network gateway.

We now have a few floating IPs set up to route from the world through the
gateway to these VMs.

However, accessing these floating (public) IPs from inside a *tenant's VM*
results in timeouts, but accessing the very same IP from a compute node
(hypervisor) hosting those VMs actually does work.

Now I'm a bit confused; it seems like a routing issue or an iptables NAT
thing, and I would be really grateful if anyone could help me out with a
hint. :)

Is this known to not work, or what do you need from me so you can
understand my issue a bit more?

Many thanks in advance,
Christian Parpart.


Re: [Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts

2012-10-16 Thread Christian Parpart
Hey all,

many thanks for your replies so far.

In general, I must say that there really is an absolute need for explicit
provisioning, that is, letting the admin decide which single host to prefer
(but still rejecting it when there are just no resources left, of course).

- A filter like the SameHost filter only works when there already is a
host, and then you have to look up and build up the correlation first (not
a big problem, but it doesn't feel comfortable).
- The IsolatedHosts filter doesn't make that much sense, as we are using
one general template-%{TIMESTAMP} to bootstrap some node and then set up
everything else inside, and we usually still may have more than one VM on
that compute node (e.g. a memcached VM and a postfix VM).
- Availability zones, so I was told, are already deprecated (dunno), and I
can't give every compute node a different availability zone, as - tbh -
that's what I have hostnames for :-)

Philip, I'd really like to dive into developing such a plugin, let's call
it the HostnameFilter plugin, to which the operator can pass one (or a set
of) hostname(s) on which the VM is allowed to spawn.
However, I have only written Python once, and even dislike the syntax a
bit; not saying I hate it, but still :-)

Is there any guide (/tutorial) for reference (or a hello_world nova
scheduler plugin) I can look at, to learn how to write such a plugin?
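
For what it's worth, such a filter is small. Here is a minimal sketch
modeled on the Essex-era filter API in nova/scheduler/filters/ (the base
class name, hint key, and hostnames are illustrative; verify against your
tree):

    # hostname_filter.py - a minimal scheduler filter sketch
    from nova.scheduler import filters


    class HostnameFilter(filters.BaseHostFilter):
        """Pass only hosts named in the 'allowed_hosts' scheduler hint."""

        def host_passes(self, host_state, filter_properties):
            hints = filter_properties.get('scheduler_hints') or {}
            allowed = hints.get('allowed_hosts')
            if not allowed:
                # no hint given: do not restrict placement
                return True
            if isinstance(allowed, basestring):
                # a single hint value arrives as a plain string
                allowed = [allowed]
            return host_state.host in allowed

Enabled via scheduler_default_filters in nova.conf, it could then be driven
with something like `nova boot ... --hint allowed_hosts=nova7`.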

Many thanks for your replies so far,
Christian Parpart.

On Tue, Oct 16, 2012 at 4:22 PM, Day, Phil philip@hp.com wrote:

  Hi Christian,

  For a more general solution you might want to look at the code that
  supports passing in "--availability_zone=az:host" (look for forced_host
  in compute/api.py). Currently this is limited to admin, but I think that
  should be changed to be a specific action that can be controlled by
  policy (we have a change in preparation for this).

  Cheers,

  Phil

  From: openstack-bounces+philip.day=hp@lists.launchpad.net [mailto:
  openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of GMI M
  Sent: 16 October 2012 15:13
  To: Christian Parpart
  Cc: openstack@lists.launchpad.net
  Subject: Re: [Openstack] looking for a Nova scheduler filter plugin to
  boot nodes on named hosts

 Hi Christian,

 I think you might be able to use the existing filters in Essex.
 For example, you can add the following lines to the nova.conf of the
 controller host (or wherever nova-scheduler runs) and restart
 nova-scheduler:

 isolated_hosts=nova7
 isolated_images=sadsd1e35dfe63

 This will allow you to run the image with ID sadsd1e35dfe63 only on the
 compute host nova7.
 You can also pass a list of compute servers in isolated_hosts, if you
 have the need.

 I certainly see the use-case for this feature, for example when you want
 to run Windows-based instances and you don't want to buy a Windows
 datacenter license for each nova-compute host, but only for a few that
 will run Windows instances.

 I hope this helps you.




 

 On Mon, Oct 15, 2012 at 7:45 PM, Christian Parpart tra...@gmail.com wrote:

 Hi all,

 I am looking for an (Essex) Nova scheduler plugin that parses the
 scheduler_hints to get a hostname of the hypervisor to spawn the actual
 VM on, rejecting any other node.

 This allows us to explicitly spawn a VM on a certain host (yes, there
 really are use cases where you want that). :-)

 I was trying to build my own, and searching around since I couldn't
 believe I was the only one, but didn't find one yet.

 Does anyone of you maybe have the skills to actually write that simple
 plugin, or even maybe know where such a plugin has already been developed?

 Many thanks in advance,
 Christian Parpart.



[Openstack] looking for a Nova scheduler filter plugin to boot nodes on named hosts

2012-10-15 Thread Christian Parpart
Hi all,

I am looking for an (Essex) Nova scheduler plugin that parses the
scheduler_hints to get a hostname of the hypervisor to spawn the actual VM
on, rejecting any other node.

This allows us to explicitly spawn a VM on a certain host (yes, there
really are use cases where you want that). :-)

I was trying to build my own, and searching around since I couldn't believe
I was the only one, but didn't find one yet.

Does anyone of you maybe have the skills to actually write that simple
plugin, or even maybe know where such a plugin has already been developed?

Many thanks in advance,
Christian Parpart.


Re: [Openstack] irregular but frequent networking issues (Essex on Ubuntu 12.04)

2012-10-06 Thread Christian Parpart
I ran into this bug quite a few months ago, too, but worked around it by
loading the vhost_net kernel module.

Currently I get network outages of just a few seconds, like freezes for
~10-15 seconds, and then everything works as if nothing had ever happened.
I unfortunately can't find anything in the logs inside the VM, on the
hypervisor, or on the nova-network node.

Regards,
Christian.

On Fri, Oct 5, 2012 at 4:30 PM, Alejandro Comisario 
alejandro.comisa...@mercadolibre.com wrote:

 Hi Chris, maybe your problem is related to this bug?

 https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978

 Regards.
 Ale


 On Fri, Oct 5, 2012 at 8:44 AM, Christian Parpart tra...@gmail.com wrote:

 Hey all,

 we're pretty happy about our new OpenStack Essex installation atop of
 Ubuntu 12.04 (hypervisor and guests).
 We use KVM (each tenant is in a VLAN) as the virtualization technology,
 having about 15 compute nodes and a central nova-network node that acts
 as the gateway (yet to be fully HA'd, however). On that gateway we're
 running a PPTP VPN so we can log in to any host or guest through this VPN.

 Our problem, however, is that from time to time (I think it's daily, and
 even multiple times per day) we're encountering a kind of networking
 freeze.

 I first noticed it as my SSH session froze for a few seconds, going from
 my desktop -> VPN (nova-network node) -> KVM guest.
 I quickly checked others, and they hung, too.
 I checked a hypervisor, which didn't hang, so it is not a general
 networking issue (like PPTP or SSH or alike).

 Now, it feels like there is some problem with networking from and/or to
 KVM guests, and I absolutely have no clue how to trace this down. It
 really feels like a bug, but that's really out of my scope, and that's
 why I'm seeking advice here, since we just can't stay in this situation
 :-)

 It is confirmed that we're having this from desktop -> VPN -> KVM and
 from a physical node (in the data center) to KVM.
 However, I do not yet know whether all KVMs are affected at once (which
 would indicate that the issue MAY be caused by the central nova-network
 node), whether it is just hypervisor-based (so it may be due to some
 hypervisor's state), or whether it is just plain random across all 15
 compute nodes we have.

 I now think that this issue may indeed be KVM networking related.

 I'll be happy about any hints and proposals you can provide me in order
 to track this issue down.

 Please tell me about any further information you need.

 Many thanks in advance,
 Christian Parpart.






Re: [Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.

2012-08-23 Thread Christian Parpart
On Wed, Aug 15, 2012 at 4:16 AM, Lorin Hochstein
lo...@nimbisservices.com wrote:

 On Jul 5, 2012, at 11:47 AM, Christian Parpart tra...@gmail.com wrote:

 Hi all,

 I am running multiple compute nodes and a single nova-network node that
 acts as a central gateway for the tenants' VMs.

 However, since this nova-network node (of course) knows all routes, every
 VM of any tenant can talk to every other one, including the physical
 nodes, which I strongly dislike and would like to restrict. :-)


 If you add this to nova.conf:

 allow_same_net_traffic=false

 It should prevent the VMs from communicating with each other. From


 http://docs.openstack.org/essex/openstack-compute/admin/content/compute-options-reference.html#d6e3133


Hey Lorin,

according to the rather short documentation for that flag, it is
unfortunately very unclear what is meant by traffic from the same
network - I hope I am misreading that line :-)

That is, it sounds like it prevents communication with ANY of the other
VMs, but I just want to disallow communication from one tenant to another.
Like, having a production tenant and a staging tenant, they should not be
able to talk to each other, but a VM from the production tenant should be
able to talk to another VM within the same tenant.

It might be helpful if someone could find some clearer words for this flag
in the flag reference :-)

I would also like to know on which physical hosts I need this flag to be
applied, too. I mean, is it just the nova-network node(s) or all compute
nodes where this flag takes effect?

Many thanks in advance,
Christian Parpart.


Re: [Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.

2012-07-23 Thread Christian Parpart
On Fri, Jul 6, 2012 at 6:39 AM, romi zhang romizhang1...@163.com wrote:

  I am also very interested in this and am also trying to find a way to
 forbid the talking between VMs on the same compute+network node. :)

 Romi

 From: openstack-bounces+romizhang1968=163@lists.launchpad.net [mailto:
 openstack-bounces+romizhang1968=163@lists.launchpad.net] On Behalf Of
 Christian Parpart
 Sent: Thursday, July 5, 2012, 23:48
 To: openstack@lists.launchpad.net
 Subject: [Openstack] inter-tenant and VM-to-bare-metal communication
 policies/restrictions.

 [...]


Am I (almost) the only one interested in disallowing inter-tenant
communication, or am I overlooking something in the docs? :-(

Christian.

Re: [Openstack] About images list in dashboard

2012-07-13 Thread Christian Parpart
On Fri, Jul 13, 2012 at 10:56 PM, John Postlethwait 
john.postlethw...@nebula.com wrote:

 Well, it sounds like this issue only happens in Essex, and is no longer an
 issue in Folsom, so the bug will just be closed as invalid, as it is now
 fixed in the newer code...


Please backport the fix then. That said, the bug report indeed makes
absolute sense to me. :-)



 John Postlethwait
 Nebula, Inc.
 206-999-4492

 On Friday, July 13, 2012 at 1:36 PM, Sam Su wrote:

 Thank you for your suggestions, guys.

 Even so, I'd like to file a bug to track this issue; if someone else has
 the same problem, they would know what happened and what progressed from
 the bug trace.

 Sam


 On Fri, Jul 13, 2012 at 12:43 PM, Gabriel Hurley
 gabriel.hur...@nebula.com wrote:

 Glance pagination was added in Folsom. Adding a bug for this won’t help
 since it’s already been added in the current code.

 - Gabriel

 From: openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net
 [mailto:openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net]
 On Behalf Of John Postlethwait
 Sent: Friday, July 13, 2012 12:05 PM
 To: Sam Su
 Cc: openstack
 Subject: Re: [Openstack] About images list in dashboard

 Hi Sam,

 Would you mind filing a bug against Horizon with the details so that we
 can get it fixed? You can do so here:
 https://bugs.launchpad.net/horizon/+filebug

 John Postlethwait
 Nebula, Inc.
 206-999-4492

 On Thursday, July 12, 2012 at 3:55 PM, Sam Su wrote:

  I finally found out why this happened.

 If, in one tenant, there are more than 30 images and snapshots, so that
 glance cannot return the images list in one response, some images and
 snapshots will not be seen on the Images & Snapshots page of Horizon.

 Sam


 On Thu, Jul 5, 2012 at 1:19 PM, Sam Su susltd...@gmail.com wrote:

 Thank you for your suggestion.

 I can see all images in other tenants from the dashboard, so I think the
 image types should be OK.

 On Thu, Jul 5, 2012 at 11:54 AM, Gabriel Hurley gabriel.hur...@nebula.com
 wrote:

 

 The “Project Dashboard” hides images with an AKI or AMI image type (as
 they’re not launchable and generally shouldn’t be edited by “normal”
 users). You can see those in the “Admin Dashboard” if you want to edit
 them.

 So my guess is that the kernel and ramdisk images are being hidden
 correctly and your “ubuntu-11.10-server-amd64” and
 “ubuntu-12.04-server-amd64” have the wrong image type set.

 All the best,

 - Gabriel

 From: openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net
 [mailto:openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net]
 On Behalf Of Sam Su
 Sent: Thursday, July 05, 2012 11:20 AM
 To: openstack
 Subject: [Openstack] About images list in dashboard

 Hi,

 I have an OpenStack Essex environment. The nova control services, glance,
 keystone and the dashboard are all deployed on one server.

 Now I encounter a strange problem. I can only see two images (all images
 are set is_public=true) in the tenant 'demo' from the dashboard, i.e.
 Horizon, as below:

 Image Name         Type   Status  Public  Container Format  Actions
 CentOS-6.2-x86_64  Image  Active  Yes     OVF               Launch
 CentOS-5.8-x86_64  Image  Active  Yes     OVF               Launch

 However, when I use 'nova image-list' with the same credentials for the
 same tenant 'demo', I can see many more images (see the following result):

 # nova image-list
 +--------------------------------------+----------------------------------+--------+--------+
 |                  ID                  |               Name               | Status | Server |
 +--------------------------------------+----------------------------------+--------+--------+
 | 18b130ce-a815-4671-80e8-9308a7b6fc6d | ubuntu-12.04-server-amd64        | ACTIVE |        |
 | 388d16ce-b80b-4e9e-b8db-db6dce6f4a83 | ubuntu-12.04-server-amd64-kernel | ACTIVE |        |
 | 8d9505ce-0974-431d-a53d-e9ed6dc89033 | CentOS-6.2-x86_64                | ACTIVE |        |
 | 99be14c0-3b15-470b-9e2d-a9d7e2242c7a | CentOS-5.8-x86_64                | ACTIVE |        |
 | a486733f-c011-4fa1-8ce2-553084f9bc0e | ubuntu-11.10-server-amd64        | ACTIVE |

[Openstack] how to properly get rid of some `nova-manage service list` entries

2012-07-09 Thread Christian Parpart
Hey all,

I have some old entries in the output of `nova-manage service list` that I
would like to get rid of: one nova-compute and two nova-network entries.
Is it safe to just DELETE them from the MySQL table, or is there more
involved?
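
For what it's worth, hand-deleting the rows is the usual Essex-era
approach. A sketch (the services table layout is assumed from an Essex
install and 'oldnode' is a placeholder; back up and DESCRIBE first):

    mysqldump nova services > services-backup.sql
    mysql nova -e "SELECT id, host, \`binary\`, deleted FROM services;"
    mysql nova -e "DELETE FROM services WHERE host='oldnode' AND \`binary\`='nova-network';"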

Best regards,
Christian Parpart.


[Openstack] inter-tenant and VM-to-bare-metal communication policies/restrictions.

2012-07-05 Thread Christian Parpart
Hi all,

I am running multiple compute nodes and a single nova-network node that
acts as a central gateway for the tenants' VMs.

However, since this nova-network node (of course) knows all routes, every
VM of any tenant can talk to every other one, including the physical nodes,
which I strongly dislike and would like to restrict. :-)

root@gw1:~# ip route show
default via $UPLINK_IP dev eth1  metric 100
10.10.0.0/19 dev eth0  proto kernel  scope link  src 10.10.30.5
10.10.40.0/21 dev br100  proto kernel  scope link  src 10.10.40.1
10.10.48.0/24 dev br101  proto kernel  scope link  src 10.10.48.1
10.10.49.0/24 dev br102  proto kernel  scope link  src 10.10.49.1
$PUBLIC_NET/28 dev eth1  proto kernel  scope link  src $PUBLIC_IP
192.168.0.0/16 dev eth0  proto kernel  scope link  src 192.168.2.1

- 10.10.0.0/19 is the network for bare metal nodes, switches, PDUs, etc.
- 10.10.40.0/21(br100) is the production tenant
- 10.10.48.0/24 (br101) is the staging tenant
- 10.10.49.0/24 (br102) is the playground tenant.
- 192.168.0.0/16 is the legacy network (management and VM nodes)

No tenant's VM shall be able to talk to a VM of another tenant.
And ideally no tenant's VM should be able to talk to the management
network either.

Unfortunately, since we're migrating a live system, and we also have
production services on the bare-metal nodes, I had to add special routes
to allow the legacy installations to communicate to the new production
VMs for the transition phase. I hope I can remove that ASAP.

Now, checking iptables on the nova-network node:

root@gw1:~# iptables -t filter -vn -L FORWARD
Chain FORWARD (policy ACCEPT 64715 packets, 13M bytes)
 pkts bytes target                prot opt in     out    source     destination
  36M   29G nova-filter-top       all  --  *      *      0.0.0.0/0  0.0.0.0/0
  36M   29G nova-network-FORWARD  all  --  *      *      0.0.0.0/0  0.0.0.0/0

root@gw1:~# iptables -t filter -vn -L nova-filter-top
Chain nova-filter-top (2 references)
 pkts bytes target                prot opt in     out    source     destination
  36M   29G nova-network-local    all  --  *      *      0.0.0.0/0  0.0.0.0/0

root@gw1:~# iptables -t filter -vn -L nova-network-local
Chain nova-network-local (1 references)
 pkts bytes target                prot opt in     out    source     destination

root@gw1:~# iptables -t filter -vn -L nova-network-FORWARD
Chain nova-network-FORWARD (1 references)
 pkts bytes target  prot opt in     out    source     destination
    0     0 ACCEPT  all  --  br102  *      0.0.0.0/0  0.0.0.0/0
    0     0 ACCEPT  all  --  *      br102  0.0.0.0/0  0.0.0.0/0
    0     0 ACCEPT  udp  --  *      *      0.0.0.0/0  10.10.49.2  udp dpt:1194
  18M   11G ACCEPT  all  --  br100  *      0.0.0.0/0  0.0.0.0/0
  18M   18G ACCEPT  all  --  *      br100  0.0.0.0/0  0.0.0.0/0
    0     0 ACCEPT  udp  --  *      *      0.0.0.0/0  10.10.40.2  udp dpt:1194
 106K   14M ACCEPT  all  --  br101  *      0.0.0.0/0  0.0.0.0/0
79895   23M ACCEPT  all  --  *      br101  0.0.0.0/0  0.0.0.0/0
    0     0 ACCEPT  udp  --  *      *      0.0.0.0/0  10.10.48.2  udp dpt:1194

Now I see that, for example, all traffic from the staging tenant (br101)
is accepted from/to any destination (-j ACCEPT).
I'd propose to reduce this to the public gateway interface (eth1 in my
case), and to make this value configurable in the nova.conf file.

Is there anything else I might have overlooked to disallow inter-tenant
communication and to disallow tenant-VM-to-bare-metal communication?
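
Until something like that is configurable, a manual stopgap along these
lines might help (a sketch; nova-network manages its own chains, so insert
the rules ahead of the nova jumps and test carefully):

    # drop bridge-to-bridge forwarding before the nova chains accept it
    iptables -I FORWARD 1 -i br101 -o br100 -j DROP
    iptables -I FORWARD 1 -i br100 -o br101 -j DROP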

Many thanks in advance,
Christian Parpart.


Re: [Openstack] Nova Pacemaker Resource Agents

2012-07-03 Thread Christian Parpart
Hey,

that's great, but how do you handle RabbitMQ in between?

I kind of achieved it w/o OCF agents, using Pacemaker's native upstart
support instead; however, OCF agents are much nicer, and still, I'd be
interested in how you solved the RabbitMQ issue.

Best regards,
Christian Parpart.

On Mon, Jul 2, 2012 at 7:38 PM, Sébastien Han han.sebast...@gmail.com wrote:

 Hi everyone,

 For those of you who want to achieve HA in nova. I wrote some resource
 agents according to the OCF specification. The RAs available are:

- nova-scheduler
- nova-api
- novnc
- nova-consoleauth
- nova-cert

 The how-to is available here:
 http://www.sebastien-han.fr/blog/2012/07/02/openstack-nova-components-ha/ and
 the RAs on my Github https://github.com/leseb/OpenStack-ra

 Those RAs mainly re-use the structure of the resource agent written by
 Martin Gerhard Loschwitz from Hastexo.

 Hope it helps!

 Cheers.

 ~Seb






Re: [Openstack] HA inside VMs (via Corosync/Pacemaker)

2012-06-30 Thread Christian Parpart
Oh, no. I use floating IPs for actual, real public IPs.
But now that you mention the pools: well, I would have to assign one
floating IP to at least TWO KVM instances.

Hm, Pacemaker/Corosync *inside* the VM will add the service IP to the
local ethernet interface, and thus the OpenStack components outside do not
know about it.

Using a dedicated floating IP pool for service IPs might feel like a great
solution, but then OpenStack is not the one managing who gets what IP -
Corosync/Pacemaker inside the KVM instances is. :-)

Anyone have an idea how to solve this?

Many thanks in advance,
Christian.

On Sat, Jun 30, 2012 at 5:00 AM, Vishvananda Ishaya
vishvana...@gmail.com wrote:

 Seems like you could use a floating ip for this. You can define a range
 for internal floating ips by using a separate floating ip pool.
 On Jun 29, 2012 7:06 PM, Christian Parpart tra...@gmail.com wrote:

 Hey all,

 I would like to set up a highly available service *inside* two KVM
 instances, so I have created a security group to contain all required
 service ports, so clients can connect to either VM, and that works.

 And both instances have their own designated IP address, provided by
 nova itself.

 And now I want to allocate a custom private IP address (I just chose one
 from the higher address range, since I have quite a big one (/21), and it
 was planned to use the higher numbers for HA service IPs).

 But how do I teach OpenStack to let traffic reach these KVMs via their
 designated service IP?

 I took a look at the iptables rules; however, they are created
 automatically, and I did not really get what it all wants to tell me yet,
 and what is there for what (not every rule uses -m comment --comment
 $hint). :-)

 So how do I teach OpenStack custom provided IP addresses?

 Best regards,
 Christian.





Re: [Openstack] HA inside VMs (via Corosync/Pacemaker)

2012-06-30 Thread Christian Parpart
On Sat, Jun 30, 2012 at 1:51 PM, Narayan Desai narayan.de...@gmail.com wrote:

 On Sat, Jun 30, 2012 at 3:06 AM, Christian Parpart tra...@gmail.com
 wrote:
  Hm, Pacemaker/Corosync *inside* the VM will add the Service-IP to the
 local
  ethernet
  interface, and thus, the outside OpenStack components do not know about.
 
  Using a dedicated floating IP pool for service IPs might feel like a
 great
  solution, but
  OpenStack is not the one to manage who gets what IP - but
 Corosync/Pacemaker
  inside
  the KVM instances. :-)
 
  Anyone an idea how to solve this?

 It sounds like you want to add explicit support to pacemaker to deal
 with openstack fixed addresses. Then you could run with rfc1918
 floating addresses, and then have pacemaker/corosync reassign the
 (external) fixed address when consensus changes.

 Think of the openstack fixed address control plane in a similar way to
 ifconfig. You should even be able to script it up yourself; you'd need
 to add your openstack creds to the HA images though.


Hey,

that's a really great idea, and IMHO apparently the only way to not
interfere with OpenStack internals too much.

So I need to create a new resource agent that represents a floating IP.
If I succeed, I'll share that script then. :)
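
The core of such an agent would presumably just shuffle the floating IP
around with the nova CLI (a sketch; OCF plumbing, credentials, and error
handling omitted, and 203.0.113.10 is a placeholder):

    # on failover: move the floating IP to the new active instance
    nova remove-floating-ip <passive-instance> 203.0.113.10
    nova add-floating-ip <active-instance> 203.0.113.10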

Cheers,
Christian Parpart.


[Openstack] HA inside VMs (via Corosync/Pacemaker)

2012-06-29 Thread Christian Parpart
Hey all,

I would like to set up a highly available service *inside* two KVM
instances, so I have created a security group to contain all required
service ports, so clients can connect to either VM, and that works.

And both instances have their own designated IP address, provided by nova
itself.

And now I want to allocate a custom private IP address (I just chose one
from the higher address range, since I have quite a big one (/21), and it
was planned to use the higher numbers for HA service IPs).

But how do I teach OpenStack to let traffic reach these KVMs via their
designated service IP?

I took a look at the iptables rules; however, they are created
automatically, and I did not really get what it all wants to tell me yet,
and what is there for what (not every rule uses -m comment --comment
$hint). :-)

So how do I teach OpenStack custom provided IP addresses?

Best regards,
Christian.


Re: [Openstack] big problem with boot from iso

2012-06-26 Thread Christian Parpart
On Tue, Jun 26, 2012 at 2:30 AM, William Herry 
william.herry.ch...@gmail.com wrote:

 Hi,
 I use boot-from-ISO to install a CentOS instance; it can't recognize the
 disk. I created a flavor with 300G ephemeral and 300G disk, and it says
 no valid disk found, but when I create a flavor with 30 swap, it finds
 the disk vdb and can install the system, but of course it can't boot.

Hey,

maybe your VM disk space is exported as a VIRTIO block device (/dev/vda,
...) and your ISO image doesn't support these block devices? Try loading
the underlying kernel module :)

Cheers,
Christian.


[Openstack] nova boot --hint same_host=[UUID] fails with InstanceNotFound: Instance [ could not be found. ?

2012-06-26 Thread Christian Parpart
Hey all,

while strictly following the guidelines [1] on how to spawn an instance on
the same host as another instance, I ran into the error that it cannot find
some instance called '[', which - of course - is not the UUID I specified.
I tried dropping the [ ] and just passed the UUID right away, but then it
just takes the first character of the UUID and says that it can't find that
one.

My exact command line looked like this:

nova boot --image 6c73d25e-df93-4c96-a803-9d419a367267 --flavor 16 --hint
same_host=[df5fd16b-271d-45ac-9e9a-5d3ad33920e5] $instance_name

2012-06-26 08:26:28 ERROR nova.rpc.amqp
[req-67363fee-1046-4ff8-90a2-05aacb1cbe10 fe655fd0ad49474db5882931685c77fe
8f956f17ce9d4c4d9957c230aab4f720] Exception during message handling
2012-06-26 08:26:28 TRACE nova.rpc.amqp Traceback (most recent call last):
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py, line 252, in
_process_data
2012-06-26 08:26:28 TRACE nova.rpc.amqp rval = node_func(context=ctxt,
**node_args)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py, line 115, in
run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp context, ex, *args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/contextlib.py, line 24, in __exit__
2012-06-26 08:26:28 TRACE nova.rpc.amqp self.gen.next()
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/manager.py, line 105, in
run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp return
self.driver.schedule_run_instance(*args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/multi.py, line 78, in
schedule_run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp return
self.drivers['compute'].schedule_run_instance(*args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py, line
72, in schedule_run_instance
2012-06-26 08:26:28 TRACE nova.rpc.amqp *args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py, line
194, in _schedule
2012-06-26 08:26:28 TRACE nova.rpc.amqp filter_properties)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py, line
218, in filter_hosts
2012-06-26 08:26:28 TRACE nova.rpc.amqp if
host.passes_filters(filter_fns, filter_properties):
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/host_manager.py, line
156, in passes_filters
2012-06-26 08:26:28 TRACE nova.rpc.amqp if not filter_fn(self,
filter_properties):
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/filters/affinity_filter.py,
line 64, in host_passes
2012-06-26 08:26:28 TRACE nova.rpc.amqp if self._affinity_host(context,
i) == me])
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/scheduler/filters/affinity_filter.py,
line 30, in _affinity_host
2012-06-26 08:26:28 TRACE nova.rpc.amqp return
self.compute_api.get(context, instance_id)['host']
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/compute/api.py, line 1022, in get
2012-06-26 08:26:28 TRACE nova.rpc.amqp instance =
self.db.instance_get(context, instance_id)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/db/api.py, line 540, in instance_get
2012-06-26 08:26:28 TRACE nova.rpc.amqp return
IMPL.instance_get(context, instance_id)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py, line 120, in
wrapper
2012-06-26 08:26:28 TRACE nova.rpc.amqp return f(*args, **kwargs)
2012-06-26 08:26:28 TRACE nova.rpc.amqp   File
/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py, line 1339, in
instance_get
2012-06-26 08:26:28 TRACE nova.rpc.amqp raise
exception.InstanceNotFound(instance_id=instance_id)
2012-06-26 08:26:28 TRACE nova.rpc.amqp InstanceNotFound: Instance [ could
not be found.

And that's the error in the log.

Any ideas whether it's my fault, or how to work around this?
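
The trace itself suggests the cause: host_passes() iterates over the hint
value, and when that value arrives as a plain string instead of a list,
Python iterates over its characters, so the first instance lookup is for
'['. A sketch of the effect (lookup_instance is a stand-in for the
compute_api.get() call in the trace):

    same_host = "[df5fd16b-271d-45ac-9e9a-5d3ad33920e5]"
    for i in same_host:     # iterating a str yields single characters
        lookup_instance(i)  # first lookup: '[' -> InstanceNotFound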

Many thanks in advance,
Christian Parpart.


[Openstack] maybe a bug, but where? (dnsmasq-dhcp versus Redis inside KVM)

2012-06-18 Thread Christian Parpart
Hey all,

after having upgraded to dnsmasq 2.62 (current) and increasing the lease
times up to 7 days, I now have a very silent syslog on my gateway host.
However, there is one KVM instance (running redis inside, w/ a 16GB RAM
flavor) that still loses its IP very, very frequently.

It now seems that, even after just 4 hours of KVM instance uptime, the
nova-network node receives the following and logs it:

2260 Jun 15 16:51:37 cesar1 dnsmasq-dhcp[8707]: DHCPREQUEST(br100)
10.10.40.16 fa:16:3e:3d:ff:f3
2261 Jun 15 16:51:37 cesar1 dnsmasq-dhcp[8707]: DHCPACK(br100) 10.10.40.16
fa:16:3e:3d:ff:f3 redis-appdata1
[]
3381 Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100)
10.10.40.16 fa:16:3e:3d:ff:f3
3382 Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16
fa:16:3e:3d:ff:f3 redis-appdata1
[]

And 21:59 was exactly the time our redis server went down, although I
cannot find anything except cron logs in the KVM instance's syslog.

My question now is: why does such a request cause network unreachability
for that node?

Many thanks in advance,
Christian Parpart.


Re: [Openstack] [Openstack-operators] Nova Controller HA issues

2012-06-15 Thread Christian Parpart
Hey,

well, I said I might be wrong because I have no clear vision of how
OpenStack works in its deepest detail; however, I would not like to depend
on a controller node that lives inside a virtual machine, controlled by
compute nodes, which are in turn controlled by the controller node. This
sounds quite like a chicken-and-egg problem.

However, at the time of this writing, I think you'll have to have a working
nova-scheduler process, which is responsible for deciding on which compute
node to spawn your VM (what else?). And think about what you do when this
(or all) of your controller VMs terribly die and you want to rebuild them:
how do you plan to do this when your controller node is out of service?

In my case, I have put the controller services onto two compute nodes and
use Pacemaker to switch between them; in case one node goes down, the other
can take over (via a shared service IP).

Again, these are my thoughts, and I have been using OpenStack for just
about a month now :-)
But I hope this helps a bit...

Best regards,
Christian Parpart.

On Fri, Jun 15, 2012 at 8:16 AM, Igor Laskovy igor.lask...@gmail.com wrote:

 Why? Can you please clarify.

 Igor Laskovy
 facebook.com/igor.laskovy
 Kiev, Ukraine
 On Jun 15, 2012 1:55 AM, Christian Parpart tra...@gmail.com wrote:

  I don't think putting the controller node completely into a VM is good
  advice, at least when speaking of nova-scheduler and nova-api (if
  central).

 I may be wrong, and if so, please correct me.

 Christian.

 On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy igor.lask...@gmail.com wrote:

 Hi, are there any updates here?
 Can anybody clarify what happens if a controller node just goes through a
 hard shutdown?

 I am thinking about a solution with two hypervisors and putting the
 controller node in a VM on shared storage, which can be relaunched when
 the active hypervisor dies.
 Any ideas, advice?


 [...]



 --
 Igor Laskovy
 Kiev, Ukraine





Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-15 Thread Christian Parpart
Hey all,

it has now just happened twice again, both times today, the last at 22:00
UTC, with the following in the nova-network node's syslog:

root@gw1:/var/log# grep 'dnsmasq.*10889' daemon.log
Jun 15 17:39:32 cesar1 dnsmasq[10889]: started, version v2.62-7-g4ce4f37
cachesize 150
Jun 15 17:39:32 cesar1 dnsmasq[10889]: compile time options: IPv6
GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: DHCP, static leases only on
10.10.40.3, lease time 3d
Jun 15 17:39:32 cesar1 dnsmasq[10889]: reading /etc/resolv.conf
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 4.2.2.1#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 178.63.26.173#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.122#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: using nameserver 192.168.2.121#53
Jun 15 17:39:32 cesar1 dnsmasq[10889]: read /etc/hosts - 519 addresses
Jun 15 17:39:32 cesar1 dnsmasq-dhcp[10889]: read
/var/lib/nova/networks/nova-br100.conf
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPREQUEST(br100) 10.10.40.16
fa:16:3e:3d:ff:f3
Jun 15 21:59:41 cesar1 dnsmasq-dhcp[10889]: DHCPACK(br100) 10.10.40.16
fa:16:3e:3d:ff:f3 redis-appdata1

It seems that this one VM was the only one that sent a DHCP request over
the past 5 hours; that request was answered with a DHCPACK, and that is it.
That was the time the host behind that IP (redis-appdata1) stopped
functioning.

However, I have now actually updated dnsmasq on our gateway node to the
latest trunk of the dnsmasq git repository, killed dnsmasq, and restarted
nova-network (which auto-starts dnsmasq per device).

Now, I really hoped that this one particular bug fix was the cause of the
downtime, but apparently there MIGHT be another factor.

There is unfortunately nothing to read in the VM's syslog.
What else could cause the VM to forget its IP?
Can this also be caused by send_arp_for_ha=True?
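
One way to narrow this down might be to watch the DHCP exchange on the
bridge around renewal time, to see whether requests arrive and replies
actually leave (a sketch):

    tcpdump -ni br100 -e 'port 67 or port 68'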

Regards,
Christian.

On Fri, Jun 15, 2012 at 2:50 AM, Nathanael Burton 
nathanael.i.bur...@gmail.com wrote:

 FWIW I haven't run across the dnsmasq bug in our environment using EPEL
 packages.

 Nate
 On Jun 14, 2012 7:20 PM, Vishvananda Ishaya vishvana...@gmail.com
 wrote:

 Are you running in VLAN mode? If so, you probably need to update to a new
 version of dnsmasq.  See this message for reference:

 http://osdir.com/ml/openstack-cloud-computing/2012-05/msg00785.html

 Vish

 On Jun 14, 2012, at 1:41 PM, Christian Parpart wrote:

 Hey all,

 I feel really sad saying this: now that we have quite a few instances in
 production since about 5 days at least, I have now encountered the second
 instance losing its IP address due to No DHCPOFFER (as per syslog in the
 instance).

 I checked the logs on the central nova-network and gateway node, and
 found dnsmasq still replying to requests from all the other instances. It
 even got the request from the instance in question and even sent an
 OFFER, as far as I can tell by now (I'm investigating / posting logs
 asap), but while it seems that dnsmasq sends an offer, the instance says
 it didn't receive one - wtf?

 Please tell me what I can do to actually *fix* this issue, since this is
 by far very fatal.

 One chance I'd see (as a workaround) is to let newly created instances
 retrieve their IP via DHCP, but then reconfigure /etc/network/interfaces
 to continue with a static networking setup. However, I'd just like the
 DHCP thingy to get fixed.

 I'm very open to any kind of helping comments :)

 So long,
 Christian.








[Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Christian Parpart
Hey all,

I feel really sad saying this: now that we have quite a few instances in
production since about 5 days at least, I have now encountered the second
instance losing its IP address due to No DHCPOFFER (as per syslog in the
instance).

I checked the logs on the central nova-network and gateway node, and found
dnsmasq still replying to requests from all the other instances. It even
got the request from the instance in question and even sent an OFFER, as
far as I can tell by now (I'm investigating / posting logs asap), but while
it seems that dnsmasq sends an offer, the instance says it didn't receive
one - wtf?

Please tell me what I can do to actually *fix* this issue, since this is by
far very fatal.

One chance I'd see (as a workaround) is to let newly created instances
retrieve their IP via DHCP, but then reconfigure /etc/network/interfaces to
continue with a static networking setup. However, I'd just like the DHCP
thingy to get fixed.

I'm very open to any kind of helping comments :)

So long,
Christian.


Re: [Openstack] [Openstack-operators] Nova Controller HA issues

2012-06-14 Thread Christian Parpart
I don't think putting the controller node completely into a VM is good
advice, at least when speaking of nova-scheduler and nova-api (if central).

I may be wrong, and if so, please correct me.

Christian.

On Thu, Jun 14, 2012 at 7:20 PM, Igor Laskovy igor.lask...@gmail.com wrote:

 Hi, are there any updates here?
 Can anybody clarify what happens if a controller node just goes through a
 hard shutdown?

 I am thinking about a solution with two hypervisors and putting the
 controller node in a VM on shared storage, which can be relaunched when
 the active hypervisor dies.
 Any ideas, advice?


 On Tue, Jun 12, 2012 at 3:52 PM, John Garbutt john.garb...@citrix.com
 wrote:
  Sure, I get your point.
 
  I think Florian is working on some docs to help on that.
 
  Not sure how much has been done already.
 
 
 
  Cheers,
 
  John
 
 
 
  From: Christian Parpart [mailto:tra...@gmail.com]
  Sent: 12 June 2012 13:47
  To: John Garbutt
  Cc: openstack-operat...@lists.openstack.org
  Subject: Re: [Openstack-operators] Nova Controller HA issues
 
 
 
  Hey, ya I also found this page, but didn't find it that helpful yet; it
  rather sounds like a theoretical paper on how they implemented it,
  rather than telling me how to actually make it happen (from the sysop
  point of view :-)

  I hoped that someone had faced this already, since I really find it very
  unintuitive to realize, or I need to wait until I get more time to
  investigate dedicatedly. :-)
 
 
 
  Regards,
 
  Christian.
 
  On Tue, Jun 12, 2012 at 12:52 PM, John Garbutt john.garb...@citrix.com
  wrote:
 
  I thought Rabbit had a built in HA solution these days:
 
  http://www.rabbitmq.com/ha.html
 
 
 
  From: openstack-operators-boun...@lists.openstack.org
  [mailto:openstack-operators-boun...@lists.openstack.org] On Behalf Of
  Christian Parpart
  Sent: 12 June 2012 09:59
  To: openstack-operat...@lists.openstack.org
  Subject: [Openstack-operators] Nova Controller HA issues
 
 
 
  Hi all,
 
 
 
  after spending the whole evening making our cloud controller node highly
  available using Corosync/Pacemaker, of which I am really proud, I am
  having just a few problems left, and the one that freaks me out the most
  is rabbitmq-server.

  That beast - I just seem to find no good documentation on how to set
  rabbitmq-server up properly for HA'ing.

  Has anyone ever tried to set up a nova controller (including the
  rabbitmq dependency) for HA'ing?
  If so, I'd be pleased to share experiences, especially on the latter
  part. :-)
 
 
 
  Best regards,
 
  Christian Parpart
 
 
 
 
  ___
  Openstack-operators mailing list
  openstack-operat...@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
 



 --
 Igor Laskovy
 Kiev, Ukraine



Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Christian Parpart
Hey,

thanks for your reply. Unfortunately there was no process restart of
nova-network or dnsmasq; both processes seem to have been up for about 2
and 3 days, respectively.

However, why is the default dhcp_lease_time value equal to 120s? Not
overriding it causes the clients to re-acquire a new DHCP lease every 42
seconds (at least on my nodes), which is completely ridiculous.
OTOH, I took a look at the sources (linux_net.py) and found out why
max_lease_time is set to 2048: because that is the size of my network.
So why is the max lease time the size of my network?
I've written a tiny patch to allow overriding this value in nova.conf and
will submit it to Launchpad soon - and I hope it'll be accepted and then
also applied to Essex, since this is a very straightforward and helpful
few-liner.
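
The shape of such a few-liner could look roughly like this (a sketch in
the Essex-era option style; the option name is illustrative, not an
upstream flag, and import paths vary by release):

    from nova import flags
    from nova.openstack.common import cfg

    FLAGS = flags.FLAGS
    # let nova.conf override the dnsmasq max-lease value that
    # linux_net.py otherwise derives from the network size
    FLAGS.register_opt(cfg.IntOpt('dhcp_max_lease_override',
                                  default=None,
                                  help='overrides the computed dnsmasq '
                                       'max lease value'))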

Nevertheless, that does not clarify why I now had 2 (well, 3 actually)
instances no longer getting DHCP replies/offers after some hours/days.

The one host that caused issues today (a few hours ago) I fixed by hard
rebooting the instance; however, just about 40 minutes later it again
forgot its IP, so one might say that it maybe did not get any reply from
the DHCP server (dnsmasq) almost right after it got a lease on instance
boot.

So long,
Christian.

On Thu, Jun 14, 2012 at 10:55 PM, Nathanael Burton 
nathanael.i.bur...@gmail.com wrote:

 Has nova-network been restarted? There was an issue where nova-network was
 signalling dnsmasq which would cause dnsmasq to stop responding to requests
 yet appear to be running fine.

 You can see if killing dnsmasq, restarting nova-network, and rebooting an
 instance allows it to get a dhcp address again ...

 Nate
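For convenience, the recovery sequence Nate suggests above as a small Python helper, assuming an Ubuntu node where nova-network is managed by Upstart (service name and init system are assumptions; run as root on the network node):

    import subprocess
    import time

    def bounce_nova_network():
        # Kill any wedged dnsmasq processes; nova-network will respawn them.
        # pkill returns non-zero when nothing matched, so plain call() is fine.
        subprocess.call(['pkill', 'dnsmasq'])
        time.sleep(2)
        # Restart nova-network via Upstart; adjust for your init system.
        subprocess.check_call(['initctl', 'restart', 'nova-network'])

    if __name__ == '__main__':
        bounce_nova_network()

Then hard-reboot the affected instance and watch whether it obtains a lease again.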
 On Jun 14, 2012 4:46 PM, Christian Parpart tra...@gmail.com wrote:

 Hey all,

 I feel really sad to say this: now that we have had quite a few
 instances in production
 for at least about 5 days, I have encountered the second instance
 losing its
 IP address due to No DHCPOFFER (according to syslog in the instance).

 I checked the logs on the central nova-network and gateway node and found
 dnsmasq still replying to requests from all the other instances; it even
 got the request from the instance in question and even sent an OFFER, as
 far as
 I can tell by now (I'm investigating / posting logs asap). But while it
 seemed
 that dnsmasq sent an offer, the instance says it didn't receive one
 - wtf?

 Please tell me what I can do to actually *fix* this issue, since it is
 quite fatal.

 One option I'd see (as a workaround) is to let created instances retrieve
 their IP via DHCP, but then reconfigure /etc/network/interfaces to continue
 with
 a static networking setup. However, I'd just like the DHCP thing to get
 fixed.
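For completeness, a minimal static stanza for /etc/network/interfaces inside such a guest (all addresses are purely illustrative; substitute the instance's assigned fixed IP data):

    auto eth0
    # Illustrative addresses only; use the instance's assigned fixed IP.
    iface eth0 inet static
        address 10.0.0.5
        netmask 255.255.255.0
        gateway 10.0.0.1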

 I'm very open to any kind of helpful comments. :)

 So long,
 Christian.


 ___
 Mailing list: https://launchpad.net/~openstack
 Post to : openstack@lists.launchpad.net
 Unsubscribe : https://launchpad.net/~openstack
 More help   : https://help.launchpad.net/ListHelp


___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] instances losing IP address while running, due to No DHCPOFFER

2012-06-14 Thread Christian Parpart
Hey all,

many, many thanks for all your replies. Having just raised the
DHCP timeouts,
I now have enough breathing room to sleep and to actually apply the
dnsmasq fix
tomorrow.

Yes, I am running in VLAN mode, since this is also the recommended way.

Maybe OpenStack (nova-network) should check the version number of dnsmasq
and,
if running in VLAN mode with an affected version, issue a (critical)
warning in the logs,
especially since this kind of error can lead to disasters in datacenters. :)
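Such a check could be quite small. A sketch of what it might look like, assuming a minimum version of 2.59 (the release number here is my assumption) and standard Python logging rather than nova's real log wiring:

    import logging
    import re
    import subprocess

    LOG = logging.getLogger(__name__)

    MIN_DNSMASQ = (2, 59)  # assumed minimum; older versions mishandle SIGHUP

    def check_dnsmasq_version():
        # "dnsmasq --version" prints e.g. "Dnsmasq version 2.59 ..." on stdout.
        out = subprocess.check_output(['dnsmasq', '--version'])
        match = re.search(r'version (\d+)\.(\d+)', out)
        if not match:
            LOG.warning('could not determine dnsmasq version')
            return
        version = (int(match.group(1)), int(match.group(2)))
        if version < MIN_DNSMASQ:
            LOG.critical('dnsmasq %d.%d is known to stop answering DHCP '
                         'after SIGHUP in VLAN setups; upgrade to %d.%d '
                         'or later', version[0], version[1],
                         MIN_DNSMASQ[0], MIN_DNSMASQ[1])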

I also hope that Ubuntu 12.04 will pick up this patch soon enough, so we
won't
end up with a patch-dominated distribution :-)

Good night all,
Christian.

On Fri, Jun 15, 2012 at 1:16 AM, Narayan Desai narayan.de...@gmail.com wrote:

 I vaguely recall Vish mentioning a bug in dnsmasq that had a somewhat
 similar problem (it had to do with lease renewal problems on IP
 aliases or something like that).

 This issue was particularly pronounced with Windows VMs, apparently.
  -nld

 On Thu, Jun 14, 2012 at 6:02 PM, Christian Parpart tra...@gmail.com
 wrote:
  Hey,
 
  thanks for your reply. Unfortunately there was no process restart in either
  nova-network or dnsmasq;
  both processes seem to have been up for about 2 and 3 days, respectively.
 
  However, why is the default dhcp_lease_time value equal to 120s? Not
  overriding it
  causes the clients to re-acquire a new DHCP lease every 42 seconds
  (at least on my nodes),
  which is completely ridiculous.
  OTOH, I took a look at the sources (linux_net.py) and found out why
  max_lease_time
  is set to 2048: because that is the size of my network.
  So why is the max lease time tied to the size of my network?
  I've written a tiny patch to allow overriding this value in nova.conf, and
  will submit it to launchpad
  soon - and I hope it'll be accepted and then also applied to essex, since
  it is a very straightforward, helpful few-liner.
 
  Nevertheless, that does not explain why I have now had 2 (well, 3 actually)
  instances stop getting
  DHCP replies/offers after some hours/days.
 
  The one host that caused issues today (a few hours ago) I fixed by hard
  rebooting the instance;
  however, just about 40 minutes later it forgot its IP again, so one might
  say that it
  got no reply from the DHCP server (dnsmasq) almost right after obtaining
  a lease on instance boot.
 
  So long,
  Christian.
 
  On Thu, Jun 14, 2012 at 10:55 PM, Nathanael Burton
  nathanael.i.bur...@gmail.com wrote:
 
  Has nova-network been restarted? There was an issue where nova-network
 was
  signalling dnsmasq which would cause dnsmasq to stop responding to
 requests
  yet appear to be running fine.
 
  You can see if killing dnsmasq, restarting nova-network, and rebooting
 an
  instance allows it to get a dhcp address again ...
 
  Nate
 
  On Jun 14, 2012 4:46 PM, Christian Parpart tra...@gmail.com wrote:
 
  Hey all,
 
  I feel really sad to say this: now that we have had quite a few
  instances in production
  for at least about 5 days, I have encountered the second instance
  losing its
  IP address due to No DHCPOFFER (according to syslog in the instance).
 
  I checked the logs on the central nova-network and gateway node and found
  dnsmasq still replying to requests from all the other instances; it even
  got the request from the instance in question and even sent an OFFER, as
  far as
  I can tell by now (I'm investigating / posting logs asap). But while it
  seemed
  that dnsmasq sent an offer, the instance says it didn't receive one
  - wtf?
 
  Please tell me what I can do to actually *fix* this issue, since it is
  quite fatal.
 
  One option I'd see (as a workaround) is to let created instances retrieve
  their IP via DHCP, but then reconfigure /etc/network/interfaces to continue
  with
  a static networking setup. However, I'd just like the DHCP thing to get
  fixed.
 
  I'm very open to any kind of helpful comments. :)
 
  So long,
  Christian.
 
 
  ___
  Mailing list: https://launchpad.net/~openstack
  Post to : openstack@lists.launchpad.net
  Unsubscribe : https://launchpad.net/~openstack
  More help   : https://help.launchpad.net/ListHelp
 
 
 
  ___
  Mailing list: https://launchpad.net/~openstack
  Post to : openstack@lists.launchpad.net
  Unsubscribe : https://launchpad.net/~openstack
  More help   : https://help.launchpad.net/ListHelp
 

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


[Openstack] instance snapshotting failed. now in permanent Image_Snapshot state.

2012-06-13 Thread Christian Parpart
Hey all,

I feel really sorry to bother you about this, but it has been annoying me
for quite a while now.
I have used snapshotting quite a few times already - successfully - but this
time (maybe due to
my HA attempts on the cloud controller node) the snapshotting failed.

I clicked snapshot via the dashboard, and the instance went into Task =
Image_Snapshot state,
and the image snapshot is in Status = Queued.
That was about 12 hours ago, and I don't think anything will
happen without human intervention.

Unfortunately, the nova-compute.log on the compute node in question just
says it is about
to snapshot, with no errors above or below. A few hours later I tried to
hard reboot the instance,
but that failed without telling me why either.

I searched the nova-scheduler.log on the controller node, but
couldn't find anything
related there either.

What can I do now? I do not want to terminate the instance just because of
the snapshotting
error, and I'd also like to get snapshotting working again - but how?
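One workaround operators commonly reach for in this situation (not suggested in this thread, and use-at-your-own-risk: it bypasses nova's state machine) is clearing the stuck task_state directly in the nova database. A minimal sketch with MySQLdb; every connection detail and the UUID below are hypothetical:

    import MySQLdb

    # Hypothetical connection details for the nova database.
    conn = MySQLdb.connect(host='controller', user='nova',
                           passwd='secret', db='nova')
    cur = conn.cursor()

    # Clear the stuck task_state so the instance becomes actionable again;
    # replace the placeholder UUID with the affected instance's UUID.
    cur.execute("UPDATE instances SET task_state = NULL WHERE uuid = %s",
                ('00000000-0000-0000-0000-000000000000',))
    conn.commit()
    conn.close()

Afterwards the queued snapshot image can be deleted via glance and the snapshot retried.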

Many thanks in advance,
Christian.
___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp