Re: [Openstack] Floating ip addresses take forever to display
On Tue, Nov 20, 2012 at 03:03:37PM -0500, Lars Kellogg-Stedman wrote:
> automatically assigned ip address for several minutes (possibly more
> than 10 or 15) after the system boots.

In fact, 30 minutes. I spent some time staring at the clock yesterday.

I'm assuming that the calls to self.network_api.invalidate_instance_cache(...) are supposed to take care of this... for example, I see one in compute.api.associate_floating_ip.

Do automatically assigned addresses follow the same process as manually assigned ones? Is there a place where I can insert an explicit call to invalidate the cache?

--
Lars Kellogg-Stedman <l...@seas.harvard.edu> | Senior Technologist
Academic Computing                           | http://ac.seas.harvard.edu/
Harvard School of Engineering                | http://code.seas.harvard.edu/
  and Applied Sciences                       |

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
[Openstack] FIXED IT! Re: Floating ip addresses take forever to display
> compute.api.associate_floating_ip. Do automatically assigned addresses
> follow the same process as manually assigned ones?

The answer is NO!

compute.manager._allocate_network calls:

    network_info = self.network_api.allocate_for_instance(
        context, instance, vpn=is_vpn,
        requested_networks=requested_networks)

...but as far as I can tell, this code path never calls network_api.invalidate_instance_cache. Adding a call to self.network_api.invalidate_instance_cache immediately after the above call completely resolves this problem.

The actual ip assignment happens in network.manager.FloatingIP.allocate_for_instance, which does this:

    if FLAGS.auto_assign_floating_ip:
        # allocate a floating ip
        floating_address = self.allocate_floating_ip(context, project_id)
        # set auto_assigned column to true for the floating ip
        self.db.floating_ip_set_auto_assigned(context, floating_address)
        # get the first fixed address belonging to the instance
        fixed_ips = nw_info.fixed_ips()
        fixed_address = fixed_ips[0]['address']
        # associate the floating ip to fixed_ip
        self.associate_floating_ip(context,
                                   floating_address,
                                   fixed_address,
                                   affect_auto_assigned=True)

Nothing in network/manager.py ever calls invalidate_instance_cache.
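For anyone following along at home, here is a toy illustration of why the missing call matters. This is standalone Python, NOT nova code; all names are invented for the example. Readers (the CLI tools, Horizon) consult a per-instance cache, while the allocate path updates the database without ever touching that cache:

```python
# Toy model (not nova code; names invented for illustration) of the
# stale-cache behavior described above.

class ToyNetworkAPI:
    def __init__(self):
        self.db = {}      # authoritative store: instance -> [addresses]
        self.cache = {}   # cached view, like nova's instance info cache

    def get_instance_nw_info(self, instance):
        # Readers (CLI tools, Horizon) go through the cache.
        if instance not in self.cache:
            self.cache[instance] = list(self.db.get(instance, []))
        return self.cache[instance]

    def allocate_for_instance(self, instance, address):
        # Updates the database but -- as in Essex -- never invalidates
        # the cache. This is the bug.
        self.db.setdefault(instance, []).append(address)

    def invalidate_instance_cache(self, instance):
        # The missing call: invoking this right after
        # allocate_for_instance() is the fix described above.
        self.cache.pop(instance, None)

api = ToyNetworkAPI()
api.get_instance_nw_info('vm1')            # something primes the cache
api.allocate_for_instance('vm1', '10.243.28.46')
stale = api.get_instance_nw_info('vm1')    # cache still empty
api.invalidate_instance_cache('vm1')
fresh = api.get_instance_nw_info('vm1')    # now shows the floating ip
```

In the real code the cache only expires when something else refreshes it, which is why the address eventually shows up after ~30 minutes.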
[Openstack] Python API: Getting a list of floating ips?
Using the Python API, what's the best way of getting a list of floating ips assigned to an instance? The Server.addresses dictionary contains *both* fixed and floating ips, and doesn't appear to differentiate between them. E.g.:

    >>> srvr = client.servers.find(name='myinstance')
    >>> print srvr.addresses
    {u'fixed_0': [{u'addr': u'172.16.10.31', u'version': 4},
                  {u'addr': u'10.243.28.46', u'version': 4}]}

Do I just assume that the first address in the list is the fixed address?
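One possible workaround is to cross-reference Server.addresses against the project's floating ip list, since (as I understand the Essex-era novaclient -- an assumption to verify) client.floating_ips.list() returns objects with an .ip attribute. The helper below is pure Python, so it can be exercised without a cloud; only the commented lines at the bottom assume the novaclient API:

```python
# Workaround sketch: classify the addresses in Server.addresses by
# cross-referencing the project's floating ip list.

def split_addresses(addresses, floating_set):
    """Return (fixed, floating) address lists from a Server.addresses
    dict, given the set of known floating addresses."""
    fixed, floating = [], []
    for entries in addresses.values():
        for entry in entries:
            if entry['addr'] in floating_set:
                floating.append(entry['addr'])
            else:
                fixed.append(entry['addr'])
    return fixed, floating

# The example from the message above:
addresses = {u'fixed_0': [{u'addr': u'172.16.10.31', u'version': 4},
                          {u'addr': u'10.243.28.46', u'version': 4}]}
fixed, floating = split_addresses(addresses, {u'10.243.28.46'})

# Against a live cloud, the floating set would come from something like
# (unverified against your client version):
#   floating_set = set(f.ip for f in client.floating_ips.list())
#   fixed, floating = split_addresses(srvr.addresses, floating_set)
```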
Re: [Openstack] FIXED IT! Re: Floating ip addresses take forever to display
On Wed, Nov 21, 2012 at 09:12:36AM -0800, Vishvananda Ishaya wrote:
> This appears to be essex.

That's correct.

> be called on the network_api side before returning from
> allocate_for_instance.

I agree.

> If you look at folsom, you'll see there is a decorator for this
> purpose called @refresh_cache

Any chance we can get it fixed in Essex, too? Or has this release been abandoned? I'm not clear on what the maintenance schedule looks like as the steamroller of progress moves forward.
[Openstack] Floating ip addresses take forever to display
We've been having a persistent problem with our OpenStack (Essex) cluster. We are automatically assigning floating ips when systems are created (auto_assign_floating_ip = True).

When a system boots, neither the command line tools nor Horizon seem to know about the automatically assigned ip address for several minutes (possibly more than 10 or 15) after the system boots. The system demonstrably has a floating ip address assigned: if you initiate an outbound connection from the system, or inspect the iptables nat rules, you can determine that address and use it to connect to the system.

Manually assigning a floating ip address will force things to update (so after manually assigning a floating address you'll see the fixed address, the automatically assigned address, and the manually assigned address).

We're running the 2012.1.3 release; I've read at least one bug report that seems to describe this issue and implies the fix should already be in this release... but we're still having this problem.

Has anyone else encountered this problem? Were you able to solve it? A fix would be great, because right now our documentation is basically "start an instance... then go do something else for 30 minutes."
Re: [Openstack] Handling of adminPass is arguably broken (essex)
On Thu, Nov 01, 2012 at 11:03:14AM -0700, Vishvananda Ishaya wrote:
> The new config drive code defaults to iso-9660, so that should work.
> The vfat version should probably create a partition table.

Is that what Folsom is using? Or is it newer than that?
Re: [Openstack] Is there any method to Activate Windows during Launch a new Instance?
On Sat, Nov 03, 2012 at 10:00:40AM +0800, Ray Sun wrote:
> I create a windows 7 image (without activation) and upload it to
> glance, and I can successfully start it up. But how can I
> automatically activate it after a user launches it? Or how can I
> inject the SN into windows during startup? Or any other better idea?

If you make the product key available in the user-data attribute, you can extract it via the scripting language of your choice from http://169.254.169.254/latest/user-data and then install the key:

    slmgr /ipk <product_key>

And then activate windows:

    slmgr /ato

If the product key is the *only* thing in your user-data attribute, you can do something like this with PowerShell:

    $web = new-object system.net.webclient
    $data = $web.DownloadString("http://169.254.169.254/latest/user-data")
    slmgr /ipk $data
    slmgr /ato
Re: [Openstack] Handling of adminPass is arguably broken (essex)
> Honestly I think the entire idea of passing a password in to the
> instance at boot time is insecure and flawed.

I think that the use of a configuration drive is a reasonable way to provide configuration information to an instance, and it's more secure than the metadata server. In any case, the problem extends beyond passwords: the way injected network configuration and ssh keys are handled also makes unreasonable assumptions about the target operating system and suffers from the same problems as password provisioning.

I've put together a patch that solves my needs, available here:

    https://github.com/seas-computing/nova/commits/lars/admin_pass

That branch also incorporates changes from the EPEL packages for 2012.1.3 (since this is what we're running).

It seems to work so far, although now we're facing a new problem: the adminPass generated by OpenStack is provided to people running the nova boot ... command line client, but (a) it isn't exposed in the web ui and (b) it doesn't appear to be otherwise accessible (e.g., via euca-describe-password).
Re: [Openstack] Handling of adminPass is arguably broken (essex)
On Thu, Nov 01, 2012 at 02:07:08PM +0000, Gabe Westmaas wrote:
> (a) sounds like a bug in Horizon if that's not viewable immediately
> after creating the instance.

Horizon doesn't display any information after booting an instance... it takes you directly to the Instances screen. So not so much a bug as a design decision, I think.

> (b) is definitely not going to work - we don't store the password at
> all, an intentional decision.

I figured that, although it appears that Amazon has made a different decision. I'm just looking for a way to make this work :).
Re: [Openstack] Handling of adminPass is arguably broken (essex)
On Wed, Oct 31, 2012 at 09:09:14PM -0400, Lars Kellogg-Stedman wrote:
> TL;DR: The way OpenStack handles the adminPass attribute during
> metadata injection is not useful on operating systems without an
> /etc/passwd and /etc/shadow. I would like to make the adminPass value
> available on a Windows instance, and this is my proposal for how to
> do it.

Oh geez, it gets worse. The configuration disk created by OpenStack is a whole-disk filesystem (no partition map), so Windows thinks it's all unallocated space... so even with my patches in place I still can't get at the data.

I can see I'm traveling through largely unexplored territory here.
[Openstack] Handling of adminPass is arguably broken (essex)
TL;DR: The way OpenStack handles the adminPass attribute during metadata injection is not useful on operating systems without an /etc/passwd and /etc/shadow. I would like to make the adminPass value available on a Windows instance, and this is my proposal for how to do it.

I've been putting together a Windows 2008 server image for deployment in our OpenStack environment. I started by setting it up to act just like our Linux images:

- It has sshd running as a service via Cygwin.
- It runs a script at startup to pull an ssh key from the metadata
  server for the Administrator account.

This works great! But I've had some push-back from folks who argue that this process won't be familiar to a typical Windows administrator, so I started trying to figure out how to get an administrator password either (a) into the instance from the person creating it, or (b) back to the person creating the instance after generating the password.

(a) is relatively easy to do via the user-data attribute, and I have a prototype of that working. However...

One of my colleagues mentioned that there was some mechanism for injecting passwords into instances -- which sounds perfect. Based on my inquiries in #openstack, it appears that very few people take advantage of this feature or even know how it operates, so I went diving through the code and eventually found myself in nova/virt/disk/api.py, where I discovered that even with config_drive=True, nova will attempt to copy /etc/passwd and /etc/shadow (which don't exist) off the config drive to modify them locally. This obviously fails, leaving the admin password inaccessible.

I would like to propose that if config_drive=True, the admin password simply get written into a file, where it could be used by operating systems without an /etc/passwd or /etc/shadow file.

If this sounds like a good idea, I'll work up a patch.
It seems that for this to work, inject_data needs to know whether or not it's targeting a config_drive or an actual partition... and so does inject_data_into_fs. Maybe something like this.

In virt/libvirt/connection.py:

    disk.inject_data(injection_path,
                     key, net, metadata, admin_pass, files,
                     partition=target_partition,
                     use_cow=FLAGS.use_cow_images,
                     config_drive=config_drive)

And in virt/disk/api.py:

    def inject_data(image, key=None, net=None, metadata=None,
                    admin_password=None, files=None, partition=None,
                    use_cow=False, config_drive=False):
        ...
        inject_data_into_fs(img.mount_dir,
                            key, net, metadata, admin_password, files,
                            config_drive=config_drive)
        ...

    def inject_data_into_fs(fs, key, net, metadata, admin_password,
                            files, config_drive=False):
        ...
        if admin_password:
            if config_drive:
                _inject_admin_password_as_file_into_fs(admin_password, fs)
            else:
                _inject_admin_password_into_fs(admin_password, fs)

Thoughts?
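To make the proposal concrete, the new helper could be as simple as the sketch below. The filename 'admin_password.txt' is my own invention -- nothing in nova defines it yet; choosing the well-known name is part of what the patch would decide:

```python
# Sketch of the proposed _inject_admin_password_as_file_into_fs helper
# (the filename is a placeholder, not an established convention).
import os
import tempfile

def _inject_admin_password_as_file_into_fs(admin_password, fs):
    # fs is the root of the mounted config-drive filesystem; write the
    # password to a well-known file that any guest OS can read at first
    # boot, instead of trying to edit /etc/passwd and /etc/shadow.
    path = os.path.join(fs, 'admin_password.txt')
    with open(path, 'w') as f:
        f.write(admin_password + '\n')

# Quick demonstration against a scratch directory standing in for the
# mounted config drive:
mountpoint = tempfile.mkdtemp()
_inject_admin_password_as_file_into_fs('s3cret', mountpoint)
recovered = open(os.path.join(mountpoint, 'admin_password.txt')).read()
```

A Windows first-boot script could then read that file, set the Administrator password, and delete it.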
Re: [Openstack] dnsmasq stops talking to instances?
On Fri, Oct 19, 2012 at 10:24:20AM -0400, Lars Kellogg-Stedman wrote:
> It happened again last night -- which means we were without networking
> on our instances for about seven hours -- and restarting nova-network
> doesn't resolve the problem. It is necessary to first kill dnsmasq
> (and allow nova-network to restart it).

In case folks were curious: I'm pretty sure this was a bad interaction between dhclient on the host and the interface being used for instance networking. We've been running stably now for a week.
Re: [Openstack] dnsmasq stops talking to instances?
On Mon, Oct 22, 2012 at 01:54:11PM +0200, Gary Kotton wrote:
> Can you please explain the problems that you had with qpid?

OpenStack components were periodically losing touch with each other. Requests to boot/delete an instance, for example, would never make it as far as the compute hosts. They would get stuck "scheduling".

Initially we thought this was exclusively a problem with the network firewall infrastructure (there was a default 1 hour idle connection timeout), but reconfiguring our OpenStack environment to remove the firewalls from the picture did not resolve the problem.

Since replacing qpid with rabbitmq, we have not had a single recurrence of this behavior.
[Openstack] Default default security rules?
So there's a blueprint for this:

    https://blueprints.launchpad.net/nova/+spec/default-rules-for-default-security-group

This is one of the biggest usability problems we've run into: when we create a new tenant we often forget to open up ssh access, and everyone wonders why they can't access their instances.

Since it looks like there's no way to set up default rules that will be applied automatically to new tenants, I'm trying to automate the process of creating a new tenant and its security groups all in one fell swoop. I'm not entirely sure how to handle the security groups.

Creating users and tenants is easy; I'm authenticating with the SERVICE_ENDPOINT and SERVICE_TOKEN values for keystone administrative access. That is:

    client = keystone.Client(
        endpoint=request.environ['SERVICE_ENDPOINT'],
        token=request.environ['SERVICE_TOKEN'],
    )

Is there a way -- using either these credentials or the OpenStack admin user credentials -- for me to modify the default security group for a particular tenant? Or do I have to authenticate as a user that is a member of the target tenant in order to set up the rules?

Thanks,
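For what it's worth, the rule-application step of the automation would look something like the sketch below: a canned set of rules (ssh and ping from anywhere) pushed through novaclient's security_group_rules manager. The create() parameter names are my reading of the novaclient API and should be verified against your version; the fake client in the demo just records the calls.

```python
# Sketch of the rule-automation step. The parameter names passed to
# security_group_rules.create() are assumptions to check against your
# novaclient version.

DEFAULT_RULES = [
    dict(ip_protocol='tcp', from_port=22, to_port=22,
         cidr='0.0.0.0/0'),     # ssh from anywhere
    dict(ip_protocol='icmp', from_port=-1, to_port=-1,
         cidr='0.0.0.0/0'),     # ping from anywhere
]

def open_default_group(nova, group_id, rules=DEFAULT_RULES):
    """Apply our standard rules to a tenant's 'default' group."""
    for rule in rules:
        nova.security_group_rules.create(group_id, **rule)

# Demonstration with a stand-in client that records the calls:
class FakeRules:
    def __init__(self):
        self.created = []
    def create(self, group_id, **kw):
        self.created.append((group_id, kw))

class FakeNova:
    def __init__(self):
        self.security_group_rules = FakeRules()

nova = FakeNova()
open_default_group(nova, group_id=42)
```

The open question above still stands: whose credentials the real `nova` client needs in order to see the target tenant's default group.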
Re: [Openstack] dnsmasq stops talking to instances?
On Thu, Oct 18, 2012 at 06:16:07PM +0100, Ronivon Costa wrote:
> I have noticed a similar behaviour, for example when the switch/router
> is rebooted. I am able to recover the communications with the VMs by
> restarting nova-network (no need to kill dnsmasq).

There are no network devices being rebooted here... and since we're running in multi_host mode, both dnsmasq and the affected instances are running *on the same physical system*.

It happened again last night -- which means we were without networking on our instances for about seven hours -- and restarting nova-network doesn't resolve the problem. It is necessary to first kill dnsmasq (and allow nova-network to restart it).

There are no errors being logged by dnsmasq; starting just after 2AM, all of the DHCPREQUEST ... traffic just stops, and the logs after that point look like this:

    Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
    Oct 19 02:02:34 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
    Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
    Oct 19 02:02:35 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf
    Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /etc/hosts - 2 addresses
    Oct 19 02:03:12 stack-1 dnsmasq[32013]: read /var/lib/nova/networks/nova-br662.conf

...until I restart things.
[Openstack] dnsmasq stops talking to instances?
The good news is that since replacing qpid with rabbitmq, our environment seems to have stabilized to the point that it's *almost* useful.

The last remaining issue is that dnsmasq will occasionally stop responding to instances. Killing dnsmasq and restarting openstack-nova-network makes things work again, but I haven't been able to figure out why dnsmasq stops responding in the first place.

Has anyone seen this behavior before? Any pointers would be greatly appreciated.
Re: [Openstack] Snapshotting ephemeral disks?
> This is not possible. Sounds like you really want a fix for this bug:
> https://bugs.launchpad.net/nova/+bug/914484
> The fix was reverted but hopefully it can be cleaned up and come back in.

That looks like it would get us where we want to be. My main goal is to have a relatively easy process for someone without administrative access to the host to:

- Upload an ISO image.
- Boot an instance from that image.
- Install an OS onto something.
- Transform the something from the previous step into a deployable image.

Allowing images booted from an ISO to have an attached, snapshottable root drive would certainly work.
[Openstack] Snapshotting ephemeral disks?
I have a system that I've booted using a live CD image (so .../instance-ID/disk points at the ISO file). I've installed an OS onto /dev/vda (which is .../instance-ID/disk.local).

Running `nova image-create INSTANCE NAME` results in a traceback in the compute log:

    2012-08-16 09:48:56 TRACE nova.rpc.amqp ProcessExecutionError: Unexpected error while running command.
    2012-08-16 09:48:56 TRACE nova.rpc.amqp Command: qemu-img convert -f qcow2 -O iso -s 6966cceec946407eb12531ddbe7bb7ac /virt/pools/openstack_2/instance-004c/disk /tmp/tmpMJ9GuL/6966cceec946407eb12531ddbe7bb7ac
    2012-08-16 09:48:56 TRACE nova.rpc.amqp Exit code: 1
    2012-08-16 09:48:56 TRACE nova.rpc.amqp Stdout: ''
    2012-08-16 09:48:56 TRACE nova.rpc.amqp Stderr: "qemu-img: Unknown file format 'iso'\n"

And the image gets stuck in the SAVING state:

    # nova image-list
    ...
    | f269eebc-d86e-4cd0-aacc-8b0c53fb3bb2 | cloud-f17-x86_64 | SAVING | 4b3fccf9-28ab-4c29-acdb-cc5c18c862af |

Is there a way to do this properly? Obviously I can just munge around in the filesystem to do what I want, but I was hoping for something more convenient. I guess another option would be:

- Boot from the live CD
- Create a new volume
- Attach the volume
- Install onto the volume

Is it possible to snapshot an ephemeral disk?
[Openstack] Running multiple Glance instances?
Assuming some sort of shared filesystem, can I run multiple glance instances in order to distribute the i/o load across multiple systems? Do I need to run both the registry and the API service in each location?

We're running with an NFS-backed data store, and it seems that we could eliminate some network i/o if we were to have each compute node run the glance service locally (but all managing the same directory).

Does this make any sense?
Re: [Openstack] Inbound connectivity and FlatDHCP networking
> Traffic from vm to vm on different hosts should be able to go across
> flat_interface

Okay, that makes sense.

> Getting inbound connectivity over fixed_ips can be tricky. It looks
> like you want to set up a specific range for vms that is not snatted.
> There is a config option for this called dmz_cidr. Anything in the
> dmz_cidr range will not be snatted.

With a multi_host, FlatDHCP model, is the general idea that fixed_ips are -- generally -- internal to the compute host, and all external access is supposed to be via floating ips? That's sort of how it looks, but I hadn't seen that stated explicitly anywhere.

>     fixed_range=10.0.0.0/16
>     dmz_cidr=10.1.0.0/16

How does fixed_range interact with networks created via 'nova-manage network create ...'? There are a few bugs (e.g., https://bugs.launchpad.net/nova/+bug/741626) that suggest things need to be specified in both places. Is that correct?
Re: [Openstack] qpid_heartbeat...doesn't?
On Thu, Aug 02, 2012 at 12:33:13PM -0400, Lars Kellogg-Stedman wrote:
> > Looks like a typo. Could you try this.
>
> FYI: The same typo appears to exist in notify_qpid.py.

Err, that is, glance/notifier/notify_qpid.py, in case it wasn't obvious...
Re: [Openstack] Inbound connectivity and FlatDHCP networking
On Thu, Aug 02, 2012 at 09:24:56AM -0700, Vishvananda Ishaya wrote:
> It isn't explicitly that way, but it is the easiest setup. It is
> possible to set up fixed ips that are accessible/routable from outside
> but there are a lot of gotchas

Got it.

> The snatting rule is created exclusively from fixed_range, so right
> now fixed_range must contain all created fixed networks.

Thanks, that clears up a mystery!

We've now got inbound networking operating correctly, although it did require us to fiddle around with some policy routing rules to get traffic going to the right gateway. I'm going to write up some details and post them here later.
[Openstack] Preventing OpenStack from allocating some floating ips?
If I create a floating address range like this:

    nova-manage floating create --ip_range=10.243.30.0/24

Is there any way to block out specific addresses in that range? For example, the .1 address is the network gateway, and everything will fall apart if that address is accidentally allocated to an instance. Similarly, our host needs an address in that range in order to route traffic to the gateway.

Is there any way to exempt specific addresses? I realize that instead of allocating a /24 I could allocate a series of, say, /28 networks, but that seems a little clumsy.

Thanks,
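The "series of smaller networks" idea can at least be automated: compute the minimal set of CIDR blocks that cover the /24 minus the reserved addresses, then run `nova-manage floating create` once per block. The sketch below uses Python's ipaddress module (so it needs a reasonably modern Python; subnet_of in particular is newer than address_exclude). It doesn't answer whether nova itself supports exclusions -- it just takes the clumsiness out of the workaround.

```python
# Compute CIDR blocks covering a range minus a few reserved addresses.
import ipaddress

def usable_blocks(cidr, reserved):
    """Return the networks covering `cidr` with each address in
    `reserved` carved out."""
    nets = [ipaddress.ip_network(cidr)]
    for addr in reserved:
        hole = ipaddress.ip_network(addr + '/32')
        remaining = []
        for net in nets:
            if hole.subnet_of(net):
                # Split this network into pieces that avoid the hole.
                remaining.extend(net.address_exclude(hole))
            else:
                remaining.append(net)
        nets = remaining
    return sorted(nets)

# Exclude the gateway (.1) and the host's own address (say, .2):
blocks = usable_blocks('10.243.30.0/24', ['10.243.30.1', '10.243.30.2'])
```

Each resulting block can then be fed to `nova-manage floating create --ip_range=...`.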
Re: [Openstack] Preventing OpenStack from allocating some floating ips?
> (The following is assuming you're using Essex - I don't really know
> anything about Quantum)

Yeah, we're using Essex with FlatDHCP networking for now.

> An interesting thing about how floating IPs work is that internally
> nova-network just has a big table of ip addresses in the database.

That's good to know. We try as much as possible to avoid solutions that involve poking at the database, but we can probably live with this. Especially since MySQL knows about IP addresses (so we can select all addresses below x.x.x.10 or something).
[Openstack] Inbound connectivity and FlatDHCP networking
We are trying to use the FlatDHCP network model in multi_host mode. Getting things to boot and establishing *outbound* connectivity has been relatively simple. Systems come up, pull an address from the local dnsmasq process running on the compute host, and all traffic is routed out public_interface via an iptables SNAT rule. E.g.:

    http://www.referencearchitecture.org/network-design/

For outbound access, it's not clear why flat_network_bridge needs to be connected to an actual physical interface... since everything goes out public_interface, I'm not sure what flat_interface is for.

It's also not clear how inbound access is supposed to work. Guest interfaces get addresses, but due to the NAT rule these are mostly inaccessible to external systems. The guests are on a locally routeable 10.x.x.x network, but the routing established by OpenStack means that any inbound connections from outside the network will result in replies going out via the SNAT rule, which means connections are never established.

I've had a hard time finding documentation that shows a complete example of this configuration, and what I have found (like the picture above) only seems to answer the outbound half of the question.

Any pointers would be greatly appreciated.

Thanks,
Re: [Openstack] qpid_heartbeat...doesn't?
> Looks like a typo. Could you try this.

That seems better... although while the documentation says that qpid_heartbeat is "Seconds between heartbeat messages" [1], observed behavior suggests that it is actually *minutes* between messages.

[1]: http://docs.openstack.org/essex/openstack-compute/admin/content/configuration-qpid.html
Re: [Openstack] qpid_heartbeat...doesn't?
On Mon, Jul 30, 2012 at 01:41:20AM +0100, Pádraig Brady wrote: Perhaps there is another issue with the scheduling of this? That's likely. While I verified that the patch successfully fixed our connection timeout issue, I didn't look closely to see exactly where the behavior changed...and the connection that is standing out now belongs to nova-volume, whereas the timeouts were happening with nova-compute. How are you monitoring the connection? Our firewall is a Cisco ASDM (6.1). I'm monitoring the connection by running:

show conn lport 5672

Which gets me:

TCP compute-hosts:630 10.243.16.151:39756 controllers:621 openstack-controller:5672 idle 0:00:00 Bytes 6410148 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39881 controllers:621 openstack-controller:5672 idle 0:00:04 Bytes 10470 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39755 controllers:621 openstack-controller:5672 idle 0:00:02 Bytes 9717108 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39736 controllers:621 openstack-controller:5672 idle 0:03:59 Bytes 36206 FLAGS - UBOI
TCP compute-hosts:630 10.243.16.151:39752 controllers:621 openstack-controller:5672 idle 0:00:03 Bytes 4313246 FLAGS - UBOI

Where the fields are: protocol, source interface, source ip/port, dest. interface, dest ip/port, the literal word "idle", idle time ... The connection from port 39736 on the compute host (which is the nova-volume process) regularly cycles up to 5 minutes of idle time before resetting to 0 (and the firewall sets the idle time to zero whenever any traffic passes across the connection). And indeed, if I run a packet trace on this connection, I can verify that packets are only showing up at five-minute intervals.
Re: [Openstack] How do I stop image-create from using /tmp?
So, maybe setting any of these environment variables for nova-compute to the desired value should help. Yeah, I was expecting that. Given that this could easily take out a compute host I'd like to see it get an explicit configuration value (or default to instance_dir, I guess). If I can sort out the corporate contributor agreement stuff I may try to submit a patch...
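If the conversion really does go through the standard temporary-directory machinery (an assumption; I haven't traced the code, though Python's tempfile module does honor TMPDIR), a stopgap along these lines in the nova-compute environment might redirect the scratch space until an explicit option exists. The /virt/pools path is purely illustrative:

```shell
# Hypothetical stopgap: point temporary files at a filesystem with
# enough room for full-size disk images. ASSUMPTION: the image-create
# path honors the standard TMPDIR variable (Python's tempfile does).
scratch=/virt/pools/tmp                                 # illustrative path
mkdir -p "$scratch" 2>/dev/null || scratch=$(mktemp -d) # fall back if unwritable
export TMPDIR="$scratch"

# Sanity check: scratch files should now land under $TMPDIR.
mktemp -u
```

Dropping this into the init script (or /etc/sysconfig equivalent) for nova-compute would only help if the service inherits the variable, which is worth verifying on your distribution.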
[Openstack] How do I stop image-create from using /tmp?
I'm running nova image-create for the first time, and on the compute node I see:

qemu-img convert -f qcow2 -O qcow2 -s 98a8efe9ec114b489eb163c64661441a /virt/pools/openstack_2/instance-0011/disk /tmp/tmpH9KkUZ/98a8efe9ec114b489eb163c64661441a

I am concerned to see this operation dropping a disk image into /tmp. What if the disk image is larger than the root filesystem? Is there a way to have OpenStack put these temporary images somewhere else? I would prefer instance_dir by default, instead of /tmp, because instance_dir has already been provisioned with enough space to meet the needs of disk images, but an explicit parameter would probably be a better option.
Re: [Openstack] When are hostnames okay and when are ip addresses required?
Maybe I sent this out too late at night; I think it slipped below everyone's radar. I'm interested in whether or not people think this behavior is a functional bug, or maybe just a documentation bug: I ran into an issue earlier today where I had metadata_host set to the *hostname* of our controller. This got stuffed into an iptables rule as... -d os-controller.int.seas.harvard.edu/32 ...which promptly failed. Setting this to an ip address fixed this particular error, leading me to wonder: - Is this expected behavior? - Should I always use ip addresses for *_host values? - Is this a bug? - Should linux_net.py resolve hostnames?
Re: [Openstack] Nova doesn't release ips when terminating instances
force_dhcp_release=true should cause the ip to be released immediately, assuming the relevant optional binary from dnsmasq is installed (it is in the package dnsmasq-utils in Ubuntu). The dhcp_release command does not appear to be packaged with Fedora. If it is set to false then the ips should be reclaimed after a set timeout period (ten minutes by default) via a periodic task in the network worker. If they are not being reclaimed properly then there is definitely a bug somewhere... It does not appear that ips are ever properly reclaimed. They will hang around with allocated=0 and instance_id != NULL forever, until I manually correct the database.
[Openstack] A collection of utilities for cleaning up the database
Since I've been regularly breaking things over the past few weeks I've gotten tired of manually clearing things out of the Nova database. I've written a small collection of tools to make this task easier: https://code.seas.harvard.edu/openstack/stackutil This is Python code which you should be able to easy_install. It requires (and easy_install will install) cliff, a framework for command-line applications. At the moment, it can:

- Free ip addresses that have not been correctly released.
- Delete instances stuck in the BUILD or ERROR states that won't go away when you use nova delete.
- Delete volumes stuck in the attaching or creating state.
- Delete disabled services.

The last one is there because I couldn't figure out how to disable services (for example, when I remove a compute node). The code uses nova.flags.FLAGS to read your nova configuration file, extract sql_connection, and then uses sqlalchemy.engine.create_engine() to create a database connection from that URL. WARNING: This code is a terrible hack and using it could possibly corrupt your database and make your OpenStack environment fall over hard. On the other hand, if you're in the early stages of testing this may save you some grief.
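The ip-freeing operation can be sketched roughly like this. This is a simplification, not the actual stackutil code: the function name is hypothetical, the fixed_ips columns follow the schema discussed elsewhere in this thread, and the database URL is passed in directly rather than read from nova.conf:

```python
# Rough sketch of freeing leaked fixed ips via SQLAlchemy -- NOT the
# actual stackutil code. ASSUMES the Essex-era fixed_ips schema
# (allocated and instance_id columns).
from sqlalchemy import create_engine, text

def free_leaked_fixed_ips(sql_connection):
    """Clear instance_id on fixed_ips rows that are no longer allocated.

    Returns the number of rows updated.
    """
    engine = create_engine(sql_connection)
    with engine.begin() as conn:  # commits on success, rolls back on error
        result = conn.execute(text(
            "UPDATE fixed_ips SET instance_id = NULL "
            "WHERE allocated = 0 AND instance_id IS NOT NULL"))
        return result.rowcount
```

In stackutil itself the URL comes from the sql_connection setting in the nova configuration rather than being passed as an argument.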
Re: [Openstack] Nova doesn't release ips when terminating instances
Do you see something like the following every sixty seconds in your network.log?

2012-06-26 17:52:38 DEBUG nova.manager [-] Running periodic task FlatDHCPManager._disassociate_stale_fixed_ips from (pid=20993) periodic_tasks /opt/stack/nova/nova/manager.py:164

I do see these messages in the log (approximately once/minute, it looks like). Here's a test:

- I booted and then deleted ('nova delete ...') an instance. This leaves the database looking like this:

mysql> select created_at,updated_at,allocated,instance_id from fixed_ips where allocated=0 and instance_id is not null;
+---------------------+---------------------+-----------+-------------+
| created_at          | updated_at          | allocated | instance_id |
+---------------------+---------------------+-----------+-------------+
| 2012-06-21 20:26:45 | 2012-06-26 18:56:32 |         0 |        2247 |
+---------------------+---------------------+-----------+-------------+
1 row in set (0.00 sec)

- I wait a while...

mysql> select utc_time();
+------------+
| utc_time() |
+------------+
| 19:08:24   |
+------------+
1 row in set (0.00 sec)

- But the ip still has an instance_id:

mysql> select created_at,updated_at,allocated,instance_id from fixed_ips where allocated=0 and instance_id is not null;
+---------------------+---------------------+-----------+-------------+
| created_at          | updated_at          | allocated | instance_id |
+---------------------+---------------------+-----------+-------------+
| 2012-06-21 20:26:45 | 2012-06-26 18:56:32 |         0 |        2247 |
+---------------------+---------------------+-----------+-------------+
1 row in set (0.00 sec)

fixed_ip_disassociate_timeout defaults to 600 so ips should be reclaimed after 10 minutes unless you have changed the value of that option. That option appears to be set to the default of 600 seconds.
Re: [Openstack] Nova doesn't release ips when terminating instances
A rebuild of this would probably work: http://ftp.redhat.com/pub/redhat/linux/enterprise/6Workstation/en/os/SRPMS/dnsmasq-2.48-6.el6.src.rpm Thanks for the pointer! I'll drop that into our build system and see what comes out.
Re: [Openstack] Nova doesn't release ips when terminating instances
Fix here: https://review.openstack.org/9026 That change appears to be against nova/network/quantum/nova_ipam_lib.py. Is that also in the code path for non-Quantum users (specifically, people using the FlatDHCP model)?
Re: [Openstack] Nova doesn't release ips when terminating instances
The issue (which was very annoying to track down) is that nova_ipam_lib is loaded by default, and it was trickily unsetting the timeout_fixed_ips setting of FlatDHCPManager with the (seemingly innocuous): self.net_manager.timeout_fixed_ips = not self.net_manager.DHCP Got it. I can confirm that it has fixed our problem with addresses not being released. Thanks!
Re: [Openstack] Nova doesn't release ips when terminating instances
can you try with the flag: force_dhcp_release=false I've been looking unsuccessfully for documentation on this option...so I'm not sure exactly what it does. However, if I understand https://bugs.launchpad.net/nova/+bug/953712 correctly, it requires the dhcp_release command, which is sometimes part of the dnsmasq package. I'm running on CentOS 6.2 and dhcp_release does not appear to exist. On the other hand, simple database manipulation seems to resolve the problem, so would everything work just fine if I added a no-op dhcp_release command? I'll give this a shot later tonight.
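If the only requirement really is that the binary exists and exits successfully (an assumption; nothing in the bug report confirms it), a no-op stand-in could be as small as this:

```shell
# Hypothetical no-op stand-in for dnsmasq's dhcp_release, for systems
# (like CentOS 6.2) where the real utility isn't packaged.
# ASSUMPTION: nova only cares that the command runs and exits 0; the
# lease itself would simply expire on its own instead of being released.
cat > dhcp_release <<'EOF'
#!/bin/sh
# dhcp_release <interface> <address> <mac> -- intentionally does nothing
exit 0
EOF
chmod +x dhcp_release

# Invoke it the way dnsmasq's real utility is invoked:
./dhcp_release br100 10.0.0.5 fa:16:3e:00:00:01 && echo "ok"
```

The obvious downside is that the DHCP lease is never actually released, so the address stays "in use" from dnsmasq's point of view until the lease expires.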
Re: [Openstack] Nova doesn't release ips when terminating instances
On Sat, Jun 23, 2012 at 11:55:33AM -0700, John Postlethwait wrote: Seems like this might be a pretty valid issue to report/fix though, no? Is that a question for me or the list? I certainly think it should be fixed, but I wanted to check here before posting a bug report (in case someone said, oh, it's supposed to work that way, you need to set the nova_act_sane_please configuration option to change the behavior). Given the two responses here I will open a bug report later this evening.
Re: [Openstack] Nova doesn't release ips when terminating instances
can you try with the flag: force_dhcp_release=false It turns out I already had force_dhcp_release set to False. I've opened https://bugs.launchpad.net/nova/+bug/1017013 on this issue.
Re: [Openstack] Diagnosing RPC timeouts when attaching volumes
The timeout occurs when nova-compute is trying to do an rpc call to nova-volume. It looks like this is just the compute log. Do you have an error in the volume log? There were no errors in the volume log. It may have been a networking problem caused by the local iptables firewall on the volume server getting reset...but it's part of a larger issue we're struggling with, which is that in general OpenStack makes it very hard to track down errors along the RPC chain. Thanks!
[Openstack] Nova doesn't release ips when terminating instances
When an instance terminates, the allocated field in the fixed_ips table is set to 0, but the instance_id field remains set. Once all the addresses are in this state, new instances fail to start, and the following error is logged by nova-network:

2012-06-22 23:09:34 ERROR nova.rpc.amqp [req-1fea207d-cd65-4375-9a04-17ba1ab92e3e 22bb8e502d3944ad953e72fc77879c2f 76e2726cacca4be0bde6d8840f88c136] Returning exception Zero fixed ips available. to caller

Which shows up in compute.log as:

2012-06-22 23:08:35 TRACE nova.rpc.amqp RemoteError: Remote error: NoMoreFixedIps Zero fixed ips available.

Manually setting instance_id=NULL in the fixed_ips table allows things to work again. We're running the 2012.1.1 release and we're using the FlatDHCP model. Is this a known bug? Thanks,
[Openstack] Diagnosing RPC timeouts when attaching volumes
Hello all, I've run into a series of frustrating problems trying to get volumes to attach correctly to running instances. The current issue is that after running a nova volume-attach ... command, I get the following in compute.log on the compute host:

2012-06-21 12:32:03 ERROR nova.rpc.impl_qpid [req-a4720bff-afa5-48a3-a01c-d6697d53e835 22bb8e502d3944ad953e72fc77879c2f 76e2726cacca4be0bde6d8840f88c136] Timed out waiting for RPC response: None

...followed by a page or two of tracebacks from nova.rpc.impl_qpid and nova.compute.manager, which seem to be buried so far inside decorators and RPC calls that I have a hard time figuring out what is actually happening. It *looks* like an Exception inside of attach_volume(), which is a good sign, I guess. I've posted the complete traceback here: https://gist.github.com/2966898

There is a nova-volume service running (on the compute host, because this is where the disk space was available):

Binary            Host                                Zone  Status    State  Updated_At
nova-network      os-controller.int.seas.harvard.edu  nova  disabled  XXX    2012-06-21 14:01:41
nova-cert         os-controller.int.seas.harvard.edu  nova  enabled   :-)    2012-06-21 16:35:22
nova-scheduler    os-controller.int.seas.harvard.edu  nova  enabled   :-)    2012-06-21 16:35:22
nova-consoleauth  os-controller.int.seas.harvard.edu  nova  enabled   :-)    2012-06-21 16:35:22
nova-compute      os-host.int.seas.harvard.edu        nova  enabled   :-)    2012-06-21 16:35:22
nova-volume       os-host.int.seas.harvard.edu        nova  enabled   :-)    2012-06-21 16:35:15
nova-console      os-controller.int.seas.harvard.edu  nova  enabled   :-)    2012-06-21 16:35:16
nova-network      os-host.int.seas.harvard.edu        nova  enabled   :-)    2012-06-21 16:35:17

Creating volumes works just fine.
Re: [Openstack] Deleting a volume stuck in attaching state?
On Wed, Jun 20, 2012 at 02:30:12PM +, Thomas, Duncan wrote: nova-manage volume delete on a nova host works for this... Ah, that appears to do it. I wasn't previously aware that there were volume management commands in both 'nova' and 'nova-manage'. Thanks,
Re: [Openstack] Deleting a volume stuck in attaching state?
A strategy we are making in Nova (WIP) is to allow instance termination no matter what. Perhaps a similar strategy could be adopted for volumes too? Thanks, The 'nova-manage volume delete ...' solution worked just fine in this case...but in general, as a consumer of the software, we would really prefer to be able to delete things regardless of their state using established tools, rather than manipulating the database directly. I'm always worried that I'll screw something up due to my incomplete understanding of the database schema.
[Openstack] Problems accessing metadata service due to nova-network generated iptables rules
We seem to have OpenStack working correctly with a FlatDHCP network environment, running in multi_host mode. Outbound connectivity works just fine:

instance# curl http://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

We are having problems with connectivity from the instance to our OpenStack controller (particularly the metadata service). Our compute host is configured like this:

public_interface = em1
flat_interface = bond0.662
bridge = br662
fixed_range = 10.243.28.0/24
routing_source_ip = 10.243.16.151
flat_network_dhcp_start = 10.243.28.10

Here's the bridge, with one instance attached:

br662  8000.00212883a78c  no  bond0.662
                              vnet0

Our metadata server is at 10.243.21.36, and the example instance is at 10.243.28.28:

instance# ifconfig
eth0  Link encap:Ethernet  HWaddr fa:16:3e:40:f3:ad
      inet addr:10.243.28.28  Bcast:10.243.28.255  Mask:255.255.255.0

From the instance, it's not possible to access the metadata server, either directly or via the DNAT rule for 169.254.169.254. An attempt to access http://10.243.21.36:8775/ fails because this causes a packet to be emitted on public_interface with a source address from fixed_range on our private network:

host# tshark -i em1 -n host 10.243.21.36
0.00 10.243.28.28 -> 10.243.21.36 TCP 37070 > 8775 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 TSV=1699692 TSER=0 WS=3

This is promptly discarded by the routing infrastructure, since the source address of the packet does not match the address range of the network. The connection avoids the SNAT rule applied to other connections because of this rule:

-A nova-network-POSTROUTING -s 10.243.28.0/24 -d 10.243.21.36/32 -j ACCEPT

What is the reason for skipping SNAT for access to the controller?
The only configuration I can see in which this would work is to give the metadata server an interface on the private instance network...but this doesn't seem to match any of the architecture diagrams I've seen at openstack.org, and it poses another set of problems w/r/t the DNAT rule (with the metadata server on the *same* network as the instance, access via http://169.254.169.254/ will fail because returning packets will have the wrong source address). I'm assuming that some part of our configuration does not match the expectations of nova-network. I would be grateful for suggestions as to which part needs fixing.
Re: [Openstack] Problems accessing metadata service due to nova-network generated iptables rules
We are having problems with connectivity from the instance to our OpenStack controller (particularly the metadata service)... zynzel on #openstack suggested that the metadata api service is supposed to run alongside the compute service, so I've modified our configuration accordingly to start nova-api-metadata on our compute node. nova-network is now attaching the 169.254.169.254 address to the loopback interface:

host# ip addr show dev lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet 169.254.169.254/32 scope link lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

...and at this point, explicit access to the metadata port from the instance works as expected:

instance# curl http://169.254.169.254:8775/
1.0
2007-01-19
.
.
.
2009-04-04

Nova also adds the following rule to the nat PREROUTING chain:

-A nova-network-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 127.0.0.1:8775

This does *not* appear to work as expected. While packets to 169.254.169.254:80 clearly match this rule, they never make it to the metadata server. Replacing this rule with a REDIRECT rule does the right thing:

host# iptables -t nat -I nova-network-PREROUTING 1 \
    -d 169.254.169.254/32 -p tcp -m tcp \
    --dport 80 -j REDIRECT --to-ports 8775

Now on the instance I can access http://169.254.169.254/. Is the DNAT rule expected to work? Does linux_net.py need a special case for when the metadata address is on the local host? Thanks,
Re: [Openstack] Problems accessing metadata service due to nova-network generated iptables rules
or better yet, just the nova-api-metadata service alongside every nova-network. Right, that's what we've got. The issue right now appears to be that of a DNAT rule failing where a REDIRECT rule succeeds, and I'm not sure what's causing that behavior. Presumably other people have this working successfully, so I'm assuming there's something about the network configuration on this host that is awry.
Re: [Openstack] Problems accessing metadata service due to nova-network generated iptables rules
Is the DNAT rule expected to work? Does linux_net.py need a special case for when the metadata address is on the local host? For now, I've modified linux_net.py so that it conditionally creates a REDIRECT rule if FLAGS.metadata_host is 127.0.0.1:

def metadata_forward():
    """Create forwarding rule for metadata."""
    if FLAGS.metadata_host == '127.0.0.1':
        iptables_manager.ipv4['nat'].add_rule('PREROUTING',
                                              '-s 0.0.0.0/0 -d 169.254.169.254/32 '
                                              '-p tcp -m tcp --dport 80 -j REDIRECT '
                                              '--to-ports %s' % (FLAGS.metadata_port))
    else:
        iptables_manager.ipv4['nat'].add_rule('PREROUTING',
                                              '-s 0.0.0.0/0 -d 169.254.169.254/32 '
                                              '-p tcp -m tcp --dport 80 -j DNAT '
                                              '--to-destination %s:%s' %
                                              (FLAGS.metadata_host, FLAGS.metadata_port))
    iptables_manager.apply()
Re: [Openstack] Problems accessing metadata service due to nova-network generated iptables rules
Is the DNAT rule expected to work? Does linux_net.py need a special case for when the metadata address is on the local host? I have confirmed that the DNAT rule works *unless* metadata_host is 127.0.0.1, in which case you need a REDIRECT rule.
[Openstack] When are hostnames okay and when are ip addresses required?
I ran into an issue earlier today where I had metadata_host set to the *hostname* of our controller. This got stuffed into an iptables rule as... -d os-controller.int.seas.harvard.edu/32 ...which promptly failed. Setting this to an ip address fixed this particular error, leading me to wonder: - Is this expected behavior? - Should I always use ip addresses for *_host values? - Is this a bug? - Should linux_net.py resolve hostnames? Thanks,
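On the last question: if linux_net.py were to resolve hostnames itself, the logic could be as small as the sketch below. This is a hypothetical helper, not actual nova code; resolve_host is my name for it:

```python
# Hypothetical sketch of resolving *_host settings before they are
# interpolated into iptables rules -- NOT actual nova/linux_net.py code.
import socket

def resolve_host(host):
    """Return host unchanged if it is already an IPv4 address, else resolve it."""
    try:
        socket.inet_aton(host)   # raises an error for non-dotted-quad input
        return host
    except OSError:
        return socket.gethostbyname(host)

# Something like '-d %s/32' % resolve_host(metadata_host) would then
# always hand iptables a numeric address.
```

One argument against doing this is that resolution happens once, at rule-creation time, so a later DNS change would silently leave a stale address in the rule; that may be why explicit ip addresses are expected.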
[Openstack] Deleting a volume stuck in attaching state?
I attempted to attach a volume to a running instance, but later deleted the instance, leaving the volume stuck in the attaching state:

# nova volume-list
+----+-----------+--------------+------+-------------+-------------+
| ID | Status    | Display Name | Size | Volume Type | Attached to |
+----+-----------+--------------+------+-------------+-------------+
| 9  | attaching | None         | 1    | None        |             |
+----+-----------+--------------+------+-------------+-------------+

It doesn't appear to be possible to delete this with nova volume-delete:

# nova volume-delete 9
ERROR: Invalid volume: Volume status must be available or error (HTTP 400)

Other than directly editing the database (and I've had to do that an awful lot already), how do I recover from this situation?
Re: [Openstack] glance_api_servers vs. glance_host vs. keystone?
I don't see nova-network running... And in fact, that seems to have been at the root of a number of problems. Thanks! With some work over the weekend I'm now successfully booting instances with networking using the Flat network manager. Great. It wasn't clear from the documentation that nova-network was a *necessary* service (that is, it wasn't clear that the failure mode would be "fail to create an instance" vs. "your instance has no networking"). We were punting on network configuration until after we were able to successfully boot instances...so that was apparently a bad idea on our part.
Re: [Openstack] glance_api_servers vs. glance_host vs. keystone?
Thus, I suspect that nova may not even use the Keystone endpoints... That sounds crazy to me, but I just got here. That is, why go to the effort to develop an endpoint registration service and then decide not to use it? Given the asynchronous, distributed nature of OpenStack, an endpoint directory seems like a good idea. Just out of curiosity, what *does* use the endpoint registry in Keystone (in the Essex release)?
[Openstack] OpenStack ate my error message!
Working with OpenStack for the past few weeks, I've noticed a tendency for the tools to eat error messages in a way that makes problem determination tricky. For example:

Early on, there were some authentication issues in my configuration. The error message presented by the command line tools was:

  ERROR: list indices must be integers, not str

It was only by trawling through the DEBUG logs that I was able to find the actual traceback (which indicated, I think, that Keystone was returning a 503 error and not returning the JSON expected by the client).

And more recently: the error generated by nova-network not running was a series of tracebacks in the nova-compute log:

- One for nova.rpc.impl_qpid
- Another for nova.compute.manager
- Another for nova.rpc.amqp

I saw these errors prior to my mailing list post, but it was difficult to connect them to useful facts about our environment.

I'm not suggesting there's an easy fix to this. Delivering error messages correctly in this sort of asynchronous, RPC environment is difficult.

Thanks for all the hard work,
[Openstack] glance_api_servers vs. glance_host vs. keystone?
+-------------+-------------------------------------------------------------------------------------+
|   volume    | Value                                                                               |
+-------------+-------------------------------------------------------------------------------------+
| adminURL    | http://os-controller.int.seas.harvard.edu:8776/v1/76e2726cacca4be0bde6d8840f88c136  |
| internalURL | http://os-controller.int.seas.harvard.edu:8776/v1/76e2726cacca4be0bde6d8840f88c136  |
| publicURL   | http://os-controller.int.seas.harvard.edu:8776/v1/76e2726cacca4be0bde6d8840f88c136  |
| region      | SEAS                                                                                |
+-------------+-------------------------------------------------------------------------------------+
+-------------+---------------------------------------------------------------+
|     ec2     | Value                                                         |
+-------------+---------------------------------------------------------------+
| adminURL    | http://os-controller.int.seas.harvard.edu:8773/services/Admin |
| internalURL | http://os-controller.int.seas.harvard.edu:8773/services/Cloud |
| publicURL   | http://os-controller.int.seas.harvard.edu:8773/services/Cloud |
| region      | SEAS                                                          |
+-------------+---------------------------------------------------------------+
+-------------+--------------------------------------------------------------------------------------+
|    swift    | Value                                                                                |
+-------------+--------------------------------------------------------------------------------------+
| adminURL    | http://os-controller.int.seas.harvard.edu:/                                          |
| internalURL | http://os-controller.int.seas.harvard.edu:/v1/AUTH_76e2726cacca4be0bde6d8840f88c136  |
| publicURL   | http://os-controller.int.seas.harvard.edu:/v1/AUTH_76e2726cacca4be0bde6d8840f88c136  |
| region      | SEAS                                                                                 |
+-------------+--------------------------------------------------------------------------------------+
+-------------+-------------------------------------------------------+
|  keystone   | Value                                                 |
+-------------+-------------------------------------------------------+
| adminURL    | http://os-controller.int.seas.harvard.edu:35357/v2.0/ |
| internalURL | http://os-controller.int.seas.harvard.edu:5000/v2.0/  |
| publicURL   | http://os-controller.int.seas.harvard.edu:5000/v2.0/  |
| region      | SEAS                                                  |
+-------------+-------------------------------------------------------+
Re: [Openstack] glance_api_servers vs. glance_host vs. keystone?
Well, apologies for the stupid subject line. In the interests of sowing as much confusion as possible, here's the question that was supposed to go along with that subject:

nova.conf appears to sport several configuration options related to glance, including:

- glance_host
- glance_port
- glance_api_servers

These seem suspiciously similar. Do they do the same thing? And shouldn't this information actually come from Keystone, in which there is an endpoint registered for the glance service?
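For what it's worth, my current understanding (from reading the Essex source; treat this as an assumption, not documentation) is that glance_api_servers takes a comma-separated list of host:port pairs and defaults to the single pair built from glance_host and glance_port. A hedged nova.conf sketch, with illustrative values:

```ini
# Illustrative values only (not from this thread).
glance_host=os-controller.int.seas.harvard.edu
glance_port=9292
# glance_api_servers is a comma-separated host:port list; its default
# is built from the two options above, and when set explicitly it is
# what nova actually uses to reach glance.
glance_api_servers=os-controller.int.seas.harvard.edu:9292
```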
Re: [Openstack] glance_api_servers vs. glance_host vs. keystone?
I'm used to using rabbit, but I did notice you didn't include a nova-scheduler in your list above...

There is a nova-scheduler service running on the controller:

Binary          Host                                Zone  Status   State  Updated_At
nova-cert       os-controller.int.seas.harvard.edu  nova  enabled  :-)    2012-06-15 20:41:44
nova-scheduler  os-controller.int.seas.harvard.edu  nova  enabled  :-)    2012-06-15 20:41:45
nova-volume     os-controller.int.seas.harvard.edu  nova  enabled  :-)    2012-06-15 20:41:47
nova-compute    os-host.int.seas.harvard.edu        nova  enabled  :-)    2012-06-15 20:41:46

That was just me accidentally omitting it from the list.

can't find an endpoint for qpid... possibly related? Again, I know nothing about qpid, but is there some way to see if the message is hitting qpid and getting stuck there?

I don't really know anything about AMQ or how OpenStack utilizes the message broker. Are there any commands I can run that would exercise the message broker and confirm whether or not it is working correctly?

One final piece of info that would be interesting to know is the vm_state and task_state from the db for the instances stuck in build. That would let us know just how far the instance got in the building process. My guess is that it is stuck in scheduling.
Well, scheduling and/or deleting, due to my failed attempt to delete some of these instances:

mysql> select id,hostname,vm_state,task_state from instances;
+----+----------+----------+------------+
| id | hostname | vm_state | task_state |
+----+----------+----------+------------+
|  1 | lars0    | building | deleting   |
|  2 | lars1    | building | scheduling |
|  3 | lars0    | building | deleting   |
|  4 | lars2    | building | scheduling |
+----+----------+----------+------------+
4 rows in set (0.00 sec)
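As a reading aid for the table above, the task_state column tracks how far the boot process got. A rough sketch of the boot-time progression as I understand it from the Essex source (the exact state names are an assumption, not something established in this thread):

```python
# Rough sketch of the boot-time task_state progression in Essex;
# the exact state names here are an assumption.
BOOT_TASK_STATES = [
    "scheduling",            # waiting for nova-scheduler to pick a host
    "networking",            # nova-network allocating addresses
    "block_device_mapping",  # setting up volumes, if any
    "spawning",              # hypervisor actually creating the VM
]

def phases_completed(task_state):
    """Return how many boot phases finished before the instance stalled."""
    return BOOT_TASK_STATES.index(task_state)

# An instance stuck in "scheduling" never even reached a compute host:
print(phases_completed("scheduling"))  # 0
```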
Re: [Openstack] glance_api_servers vs. glance_host vs. keystone?
I don't really know anything about AMQ or how OpenStack utilizes the message broker. Are there any commands I can run that would exercise the message broker and confirm whether or not it is working correctly?

For what it's worth: qpid provides a test suite (qpid-python-test) that appears to complete successfully when run on the compute host against the broker on the controller.
Re: [Openstack] glance_api_servers vs. glance_host vs. keystone?
The reason these options have not gone away is probably a combination of supporting non-Keystone authentication and general programmer laziness...

Kevin, thanks for the reply; that makes sense. Just to make sure I understand things: it sounds like Nova does not currently query Keystone for endpoints and continues to rely on explicit configuration (or, to rephrase your answer, the reason these options have not gone away is that Nova does not yet have the necessary support for Keystone). Is that approximately correct?
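To make the question concrete, here is a hypothetical sketch (not actual nova code, and the data shapes are invented for illustration) of the lookup Nova would do if it consulted the Keystone service catalog first and fell back to explicit nova.conf settings:

```python
# Hypothetical sketch: catalog-first endpoint resolution with a
# fallback to explicit configuration. Not actual nova code.

def resolve_glance(catalog, conf):
    """Prefer the service catalog's image endpoint; fall back to
    glance_host/glance_port from the configuration."""
    for entry in catalog:
        if entry.get("type") == "image":
            return entry["internalURL"]
    return "http://%s:%s" % (conf["glance_host"], conf["glance_port"])

catalog = [{"type": "image",
            "internalURL": "http://glance.example.com:9292"}]
conf = {"glance_host": "127.0.0.1", "glance_port": 9292}

print(resolve_glance(catalog, conf))  # catalog wins when present
print(resolve_glance([], conf))       # otherwise explicit config is used
```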