Re: [Openstack-operators] regression testing before upgrade (operators view)
Hello. Thank you for the reply. Over the past few days I've tested both tempest and rally, and found rally better suited to my goals. That said, I have to admit it's rather overloaded: when I started poking around I quickly got stuck with 'verify' mode, which in turn got stuck on our neutron configuration (we have no tenant-allocated networks, only public ones). But once I found `rally task start` and tweaked the samples a bit to fit our needs, it came really, really close to what we want. I was disappointed by the Ubuntu packaging (as usual): their rally package is broken and does not create any entry points at all. It worked much better in a venv. I'll probably try to combine jenkins, grafana, kibana and rally together, maybe even for periodic service validation. Thank you for the advice.

On 08/29/2017 11:34 PM, Boris Pavlovic wrote: George, "(with reduction of load to normal levels)" - it's probably not the best idea to just run the samples; they are called samples for a reason ;) Basically you can run the same Rally task twice, before and after the upgrade, and compare the results (Rally has a sort of trends support). What I usually hear from Ops folks is this:
* Run Rally on a periodic basis
* Convert the data from the Rally DB into ElasticSearch
* Build Kibana/Grafana on top
FYI, I am working on making the above scenario work out of the box. So can you provide some more details on what you are looking for? What's missing? Best regards, Boris Pavlovic

On Tue, Aug 29, 2017 at 2:24 AM, George Shuklin <george.shuk...@gmail.com> wrote: Hello everyone. Does anyone do regression testing before performing an upgrade (within the same major version)? How do you do it? Do you know any tools for such tests? I've started to research this area, and I see three openstack-specific tools: rally (with reduction of load to normal levels), tempest (can it be used by operators?) and grenade. If you use any of these tools, how do you feel about them? Are they worth the time spent?
___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
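Boris's before/after suggestion is easy to script once the task results are exported. A minimal sketch of the comparison step (the duration layout and the 10% tolerance are assumptions for illustration, not Rally's actual report format):

```python
# Compare two sets of Rally scenario durations (before/after an upgrade)
# and flag scenarios whose mean load duration regressed beyond a tolerance.
# Input layout (scenario name -> list of durations in seconds) is assumed;
# adapt it to whatever `rally task results` actually emits on your version.

def mean(xs):
    return sum(xs) / float(len(xs))

def find_regressions(before, after, tolerance=0.10):
    """Return scenarios whose mean duration grew by more than `tolerance`."""
    regressions = {}
    for name, base in before.items():
        new = after.get(name)
        if not new:
            continue  # scenario missing from the second run
        b, a = mean(base), mean(new)
        if a > b * (1.0 + tolerance):
            regressions[name] = (b, a)
    return regressions

before = {"NovaServers.boot_and_delete": [10.1, 9.8, 10.3]}
after = {"NovaServers.boot_and_delete": [14.0, 13.5, 14.2]}
print(sorted(find_regressions(before, after)))
```

The same function works for periodic runs: feed yesterday's durations as `before` and today's as `after`, and push the result into the Kibana/Grafana pipeline Boris describes.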
[Openstack-operators] regression testing before upgrade (operators view)
Hello everyone. Does anyone do regression testing before performing an upgrade (within the same major version)? How do you do it? Do you know any tools for such tests? I've started to research this area, and I see three openstack-specific tools: rally (with reduction of load to normal levels), tempest (can it be used by operators?) and grenade. If you use any of these tools, how do you feel about them? Are they worth the time spent? Thanks!
Re: [Openstack-operators] Issue with snapshots of raw images
If anyone is curious about this bug: it was clearly a linux-3.13 bug; the issue was completely solved by moving to 4.4.

On 02/15/2017 08:03 PM, George Shuklin wrote: Hello. We've upgraded to mitaka (qemu 2.5/linux-3.13) and found that raw images now have BIG issues with snapshots. Symptoms: when the snapshot process reaches 'fallocated' blocks (see below), all IO in the guest starts lagging, including network IO. Windows starts losing pings for a very long period of time (~30-40 minutes); linux does so only briefly (~500-700ms, but regularly). Research so far: all those symptoms go away if every disk block is actually written on disk (dd if=disk of=disk conv=notrunc). If a file has fallocated blocks, it causes the problem. A sparse hole causes it too, but with preallocate_images = space there are no sparse holes in the file. The best way so far to distinguish a 'bad' disk from a 'good' one is filefrag -v: for a 'bad' disk it shows the "unwritten" flag. 1. Any idea how to prevent this? 2. Any idea how to force nova to actually write images completely without using 'fallocate'?
Re: [Openstack-operators] need input on log translations
The whole idea of log translation is half-baked anyway. About half of the important log messages contain the output of things outside openstack: libvirt, ip, sudo, the kernel, etc. So any i18n installation is going to have some amount of untranslated messages, which kills the whole idea of localization. A modern operator ought to know English at a 'technical reading' level anyway. Localization therefore doesn't achieve its goal, but causes pain instead: search segmentation, slightly misleading translations (e.g. 'stream' and 'thread' both translate into the Russian 'поток', which brings ambiguity), and different systems may use slightly different translations, causing even more mess. As a Russian speaker and openstack operator, I definitely don't want log translation.

On Mar 10, 2017 4:42 PM, "Doug Hellmann" wrote: There is a discussion on the -dev mailing list about the i18n team decision to stop translating log messages [1]. The policy change means that we may be able to clean up quite a lot of "clutter" throughout the service code, because without anyone actually translating the messages there is no need for the markup code used to tag those strings. If we do remove the markup from log messages, we will be effectively removing "multilingual logs" as a feature. Given the amount of work and code churn involved in the first roll out, I would not expect us to restore that feature later. Therefore, before we take what would almost certainly be an irreversible action, we would like some input about whether log message translations are useful to anyone. Please let us know if you or your customers use them. Thanks, Doug [1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/113365.html
[Openstack-operators] Issue with snapshots of raw images
Hello. We've upgraded to mitaka (qemu 2.5/linux-3.13) and found that raw images now have BIG issues with snapshots. Symptoms: when the snapshot process reaches 'fallocated' blocks (see below), all IO in the guest starts lagging, including network IO. Windows starts losing pings for a very long period of time (~30-40 minutes); linux does so only briefly (~500-700ms, but regularly). Research so far: all those symptoms go away if every disk block is actually written on disk (dd if=disk of=disk conv=notrunc). If a file has fallocated blocks, it causes the problem. A sparse hole causes it too, but with preallocate_images = space there are no sparse holes in the file. The best way so far to distinguish a 'bad' disk from a 'good' one is filefrag -v: for a 'bad' disk it shows the "unwritten" flag. 1. Any idea how to prevent this? 2. Any idea how to force nova to actually write images completely without using 'fallocate'?
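The filefrag -v check described above is easy to automate across a fleet of disks. A sketch, assuming the 'unwritten' flag simply appears somewhere in the extent lines (the sample output below is illustrative; the exact format varies across e2fsprogs versions):

```python
# Scan `filefrag -v` output for extents carrying the "unwritten" flag --
# per the thread, their presence marks a 'bad' raw disk that will stall
# guest IO during snapshot. We match the flag loosely by substring rather
# than by column position, since the layout differs between versions.

def has_unwritten_extents(filefrag_output):
    """True if any extent line mentions the 'unwritten' flag."""
    return any("unwritten" in line for line in filefrag_output.splitlines())

sample = """Filesystem type is: ef53
File size of disk is 10737418240 (2621440 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1146880..   1409023: 262144:             unwritten
"""
print(has_unwritten_extents(sample))  # -> True
```

In practice you would feed this the output of `subprocess.check_output(["filefrag", "-v", path])` for each instance disk and rewrite (or re-upload) the flagged ones before snapshotting.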
[Openstack-operators] allowed_address_pairs for port in neutron
Hello. I'm trying to allow more than one IP on an interface for a tenant, but neutron (Mitaka) rejects my requests:

$ neutron port-update b59bc3bb-7d34-4fbb-8e55-a9f1c5c88411 --allowed-address-pairs type=dict list=true ip_address=10.254.15.4
Unrecognized attribute(s) 'allowed_address_pairs'
Neutron server returns request_ids: ['req-9168f1f4-6e78-42fb-8521-c69b1cfd4f67']

Has anyone done this? Can you show your commands to neutron and name the version you are using? Thanks.
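For reference, the raw Networking v2 request body for this operation looks like the sketch below. The 'Unrecognized attribute' error usually suggests the allowed-address-pairs extension is not enabled server-side — that's an assumption worth checking against GET /v2.0/extensions before debugging the client:

```python
# Build the JSON body for PUT /v2.0/ports/<port-id> that sets a port's
# allowed address pairs. If the server rejects the attribute outright,
# check that the "allowed-address-pairs" extension alias is listed by
# GET /v2.0/extensions -- without it, the field is unrecognized.

def port_update_body(ip_addresses):
    """Body for updating a port's allowed_address_pairs."""
    return {
        "port": {
            "allowed_address_pairs": [
                {"ip_address": ip} for ip in ip_addresses
            ]
        }
    }

print(port_update_body(["10.254.15.4"]))
```

The body can be sent with any HTTP client carrying a keystone token; each pair may also include a "mac_address" key if the extra IP lives on a different MAC.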
Re: [Openstack-operators] Ironic with top-rack switches management
On 01/04/2017 07:31 PM, Clint Byrum wrote: Excerpts from George Shuklin's message of 2016-12-26 00:22:38 +0200: Hello everyone. Did anyone actually get Ironic running with ToR (top-of-rack) switches under neutron in production? Which switch vendor/plugin (and OS version) do you use? Do you have some switch configuration with parts outside of Neutron's reach? Is it worth spending effort on integration, etc?

We had an experimental setup with Ironic and the OVN Neutron driver and VTEP-capable switches (Juniper, I forget the model #, but Arista also has models that fully support VTEP). It was able to boot baremetal nodes on isolated L2's (including an isolated provisioning network). In theory this would also allow VM<->baremetal L2 networking (and with kuryr, you could get VM<->baremetal<->container working too). But we never proved this definitively, as we got tripped up on scheduling and hostmanager issues running with ironic in one host aggregate and libvirt in another. I believe these are solved, though I've not seen the documentation to prove it.

A few weeks later I can answer my own question. Most vendor drivers for Ironic suck: some do not support baremetal ports, others have issues with their own devices, or have no support for newer openstack releases. Nonetheless, there is a great 'networking_generic_switch' ML2 driver which can do everything needed to run Ironic with tenant networking. It is so well written that adding a new vendor is a bearable task for an average admin: a switch description is just ~15 lines of code with switch-specific configuration commands. Ironic should be at least Newton to support multitenancy. And it has plenty of bugs, most of which are obvious to fix, but they show that no one has ever done a production deployment before (or has, but patched it privately and kept the patch out of public view).

And one more question: does Ironic support snapshotting of baremetal servers, with some kind of agent, etc? I think that's asking too much, really.
The point of baremetal is that you _don't_ have any special agents between your workload and the hardware. Consider traditional backup strategies.

But we already have cloud-init in baremetal instances. Why can't it be a cloud-backup? The main advantage of openstack-based snapshots for baremetal is 'golden image' creation: you press a button, and your server becomes an image. And that image (with proper cloud-init) can boot as a VM or as baremetal. Convergence at its highest point.
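The ~15-line switch description mentioned above is essentially one ini section in the ML2 plugin config. A hedged sketch of what such a section could look like, written out with configparser (the section name, device_type value and credentials here are hypothetical placeholders, not a verified configuration):

```python
# Sketch of a networking_generic_switch device section of the kind the
# thread describes (~15 lines per switch). The host label, IP, and
# credentials are hypothetical; the device_type string must match one of
# the driver's supported device definitions for your switch vendor.
import configparser
import io

cfg = configparser.ConfigParser()
cfg["genericswitch:tor-switch-hostname"] = {
    "device_type": "netmiko_cisco_ios",  # vendor-specific driver name
    "ip": "192.0.2.10",                  # management address (example range)
    "username": "admin",
    "password": "secret",
}

buf = io.StringIO()
cfg.write(buf)
print(buf.getvalue())
```

The generated section would live alongside the rest of the ML2 configuration on the neutron server; one section per managed ToR switch.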
[Openstack-operators] Ironic with top-rack switches management
Hello everyone. Did anyone actually get Ironic running with ToR (top-of-rack) switches under neutron in production? Which switch vendor/plugin (and OS version) do you use? Do you have some switch configuration with parts outside of Neutron's reach? Is it worth spending effort on integration, etc? And one more question: does Ironic support snapshotting of baremetal servers, with some kind of agent, etc? Thanks.
Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python
Em... Sorry, I'm trying to create_image. And it traces on duplicate images during the creation process, not while passing an image name to some 'create instance' or 'delete image' function. Or do you mean I need to pass a uuid for the new image in image_create()? I've never heard of such a thing.

On 11/25/2016 12:48 PM, Ricardo Carrillo Cruz wrote: That is expected. The shade calls accept a name_or_id param for a lot of methods, for convenience. In your case, as there are multiple images with the same name, you should pass the ID of the image you want to use; otherwise shade cannot guess it.

2016-11-25 11:42 GMT+01:00 George Shuklin <george.shuk...@gmail.com>: shade fails if it sees duplicate images in the account.

o = shade.OpenStackCloud(**creds)
o.create_image(name='killme', filename='/tmp/random_junk', disk_format='qcow2', container_format='bare', wait=True)

Traceback (most recent call last):
...
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 2269, in create_image
    current_image = self.get_image(name)
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 1703, in get_image
    return _utils._get_entity(self.search_images, name_or_id, filters)
  File "/usr/lib/python2.7/dist-packages/shade/_utils.py", line 143, in _get_entity
    "Multiple matches found for %s" % name_or_id)
shade.exc.OpenStackCloudException: Multiple matches found for killme

On 11/18/2016 12:20 AM, Clint Byrum wrote: You may find the 'shade' library a straightforward choice: http://docs.openstack.org/infra/shade/ Excerpts from George Shuklin's message of 2016-11-17 20:17:08 +0200: Hello. I can't find proper documentation about how to use the openstack clients from inside a python application. All I can find is examples and a rather abstract (autogenerated) reference. Is there any normal documentation about the proper way to use openstack clients from python applications? Thanks.
Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python
shade fails if it sees duplicate images in the account.

o = shade.OpenStackCloud(**creds)
o.create_image(name='killme', filename='/tmp/random_junk', disk_format='qcow2', container_format='bare', wait=True)

Traceback (most recent call last):
...
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 2269, in create_image
    current_image = self.get_image(name)
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 1703, in get_image
    return _utils._get_entity(self.search_images, name_or_id, filters)
  File "/usr/lib/python2.7/dist-packages/shade/_utils.py", line 143, in _get_entity
    "Multiple matches found for %s" % name_or_id)
shade.exc.OpenStackCloudException: Multiple matches found for killme

On 11/18/2016 12:20 AM, Clint Byrum wrote: You may find the 'shade' library a straightforward choice: http://docs.openstack.org/infra/shade/ Excerpts from George Shuklin's message of 2016-11-17 20:17:08 +0200: Hello. I can't find proper documentation about how to use the openstack clients from inside a python application. All I can find is examples and a rather abstract (autogenerated) reference. Is there any normal documentation about the proper way to use openstack clients from python applications? Thanks.
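The failure is in shade's name_or_id resolution, which refuses to guess between duplicates. A simplified stand-in (not shade's actual code) showing the behavior, and why resolving by UUID still works when two images share a name:

```python
# Simplified model of shade's _get_entity lookup: a name_or_id string is
# matched against both name and id, and more than one hit is an error
# rather than a guess. Passing a unique id (or deleting the duplicate
# image first) is the way around the exception from the traceback above.

class MultipleMatches(Exception):
    pass

def get_entity(images, name_or_id):
    """Resolve name_or_id against a list of image dicts."""
    matches = [i for i in images
               if name_or_id in (i["name"], i["id"])]
    if len(matches) > 1:
        raise MultipleMatches("Multiple matches found for %s" % name_or_id)
    return matches[0] if matches else None

images = [{"id": "aaa", "name": "killme"},
          {"id": "bbb", "name": "killme"}]
print(get_entity(images, "bbb")["id"])  # unique by id -> bbb
```

Note this also explains the create_image trace: shade looks the name up *before* creating, so a pre-existing pair of same-named images breaks even a fresh upload.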
Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python
No. I've tried to use openstacksdk to set properties on an image, and there is zero information on how to do it; when I tried to discover it manually, I found it is broken. The bug report is already 7 months old and there is no motion at all. python-openstacksdk is broken and unusable. https://medium.com/@george.shuklin/openstacksdk-unable-to-update-image-properties-191bffb670f2#.wa23wa7nm Bug: https://bugs.launchpad.net/python-openstacksdk/+bug/1455620

On 11/17/2016 11:27 PM, Kostyantyn Volenbovskyi wrote: Hi, do you mean that the information in [1] is not adequate to your needs? BR, Konstantin [1] http://docs.openstack.org/user-guide/sdk.html

On Nov 17, 2016, at 7:17 PM, George Shuklin <george.shuk...@gmail.com> wrote: Hello. I can't find proper documentation about how to use the openstack clients from inside a python application. All I can find is examples and a rather abstract (autogenerated) reference. Is there any normal documentation about the proper way to use openstack clients from python applications? Thanks.
[Openstack-operators] openstack sdk: how to update image (create image with properties)
Hello. I'm trying to use the openstack SDK in my python code. I want to upload an image and set a few properties, and I can't. My code (without properties):

from openstack import connection
import os

con = connection.Connection(auth_url=os.environ['OS_AUTH_URL'],
                            project_name=os.environ['OS_TENANT_NAME'],
                            username=os.environ['OS_USERNAME'],
                            password=os.environ['OS_PASSWORD'])
con.image.upload_image(name='killme', data=file('/tmp/1', 'r'), disk_format="qcow2", container_format="bare")

With properties (a few different attempts):

con.image.upload_image(name='killme', data=file('/tmp/1', 'r'), disk_format="qcow2", container_format="bare", foo="bar")  # ignored
con.image.upload_image(name='killme', data=file('/tmp/1', 'r'), disk_format="qcow2", container_format="bare", properties="foo=bar")  # sets property 'properties' to 'foo=bar'
con.image.upload_image(name='killme', data=file('/tmp/1', 'r'), disk_format="qcow2", container_format="bare", properties={"foo": "bar"})  # returns an http error

How can I set properties for images via the openstack SDK? Is this behavior a bug or a feature?
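One workaround sketch: the Images v2 API accepts property updates as a JSON-Patch document, so the patch can be built by hand and sent with any HTTP client, bypassing the SDK limitation. Only the patch construction is shown below; the endpoint call, auth token, and the `application/openstack-images-v2.1-json-patch` content type are left to the caller:

```python
# Build the JSON-Patch body for PATCH /v2/images/<image-id> that adds
# custom properties to an already-uploaded image. The v2 API treats each
# custom property as a top-level key, hence the "/<key>" paths.

def property_patch(properties):
    """JSON-Patch ops adding custom properties to an image."""
    return [
        {"op": "add", "path": "/%s" % key, "value": value}
        for key, value in sorted(properties.items())
    ]

print(property_patch({"foo": "bar"}))
```

A two-step flow (upload first, then PATCH the properties on) sidesteps the question of how upload_image() forwards extra kwargs entirely.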
[Openstack-operators] Using novaclient, glanceclient, etc, from python
Hello. I can't find proper documentation about how to use the openstack clients from inside a python application. All I can find is examples and a rather abstract (autogenerated) reference. Is there any normal documentation about the proper way to use openstack clients from python applications? Thanks.
Re: [Openstack-operators] Allow to investigate instance actions after instance deletion
Yes, one more reason to upgrade. Thank you!

On 04/13/2016 06:23 PM, Kris G. Lindgren wrote: Work on this spec/feature has already been done and committed: https://review.openstack.org/#/q/topic:bp/os-instance-actions-read-deleted-instances It landed in mitaka. ___ Kris Lindgren Senior Linux Systems Engineer GoDaddy

From: Dina Belova <dbel...@mirantis.com> Date: Wednesday, April 13, 2016 at 4:08 AM To: George Shuklin <george.shuk...@gmail.com> Cc: "openstack-operators@lists.openstack.org" Subject: Re: [Openstack-operators] Allow to investigate instance actions after instance deletion

George, I really believe this can be processed via Ceilometer events. Events about all actions that happened to an instance come to Ceilometer. Cheers, Dina

On Wed, Apr 13, 2016 at 12:23 PM, George Shuklin <george.shuk...@gmail.com> wrote: I filed a bug (feature request) about the ability to see a deleted instance's action list: https://bugs.launchpad.net/nova/+bug/1569779 Any ideas?
I really want to see it like this:

+---------------+------------------------------------------+---------+-------------------------+
| Action        | Request_ID                               | Message | Start_Time              |
+---------------+------------------------------------------+---------+-------------------------+
| create        | req-31f61086-ce71-4e0a-9ef5-3d1bdd386043 | -       | 2015-05-26T12:09:54.00  |
| reboot        | req-4632c799-a83e-489c-bb04-5ed4f47705af | -       | 2015-05-26T14:21:53.00  |
| stop          | req-120635d8-ef53-4237-b95a-7d15f00ab6bf | -       | 2015-06-01T08:46:03.00  |
| migrate       | req-bdd680b3-06d5-48e6-868b-d3e4dc17796a | -       | 2015-06-01T08:48:14.00  |
| confirmResize | req-a9af49d4-833e-404e-86ac-7d8907badd9e | -       | 2015-06-01T08:58:03.00  |
| start         | req-5a2f5295-8b63-4cb7-84d9-dad1c6abf053 | -       | 2015-06-01T08:58:20.00  |
| delete        | req-----                                 | -       | 2016-04-01T00:00:00.00  |
+---------------+------------------------------------------+---------+-------------------------+
[Openstack-operators] Allow to investigate instance actions after instance deletion
I filed a bug (feature request) about the ability to see a deleted instance's action list: https://bugs.launchpad.net/nova/+bug/1569779 Any ideas? I really want to see it like this:

+---------------+------------------------------------------+---------+-------------------------+
| Action        | Request_ID                               | Message | Start_Time              |
+---------------+------------------------------------------+---------+-------------------------+
| create        | req-31f61086-ce71-4e0a-9ef5-3d1bdd386043 | -       | 2015-05-26T12:09:54.00  |
| reboot        | req-4632c799-a83e-489c-bb04-5ed4f47705af | -       | 2015-05-26T14:21:53.00  |
| stop          | req-120635d8-ef53-4237-b95a-7d15f00ab6bf | -       | 2015-06-01T08:46:03.00  |
| migrate       | req-bdd680b3-06d5-48e6-868b-d3e4dc17796a | -       | 2015-06-01T08:48:14.00  |
| confirmResize | req-a9af49d4-833e-404e-86ac-7d8907badd9e | -       | 2015-06-01T08:58:03.00  |
| start         | req-5a2f5295-8b63-4cb7-84d9-dad1c6abf053 | -       | 2015-06-01T08:58:20.00  |
| delete        | req-----                                 | -       | 2016-04-01T00:00:00.00  |
+---------------+------------------------------------------+---------+-------------------------+
Re: [Openstack-operators] Live snapshots on the raw disks never ends
Yes, we're using swift as the backend, and swift runs as a separate installation (its own keystone, etc). I can't find any logs about any problems with the backend. Should they be logged by glance?

On 09/29/2015 01:21 AM, David Wahlstrom wrote: George, what is your storage backend (Gluster/ceph/local disk/etc)? Some of the distributed backend drivers have bugs in them or mask the real issue (such as watchers on objects).

On Thu, Sep 24, 2015 at 8:11 AM, Kris G. Lindgren <klindg...@godaddy.com> wrote: I believe I was talking to Josh Harlow (he's harlowja in #openstack-operators on freenode) from Yahoo about something like this the other day. He was saying that recently on a few hypervisors they would randomly run into HV disks that were completely full due to snapshots. I have not personally run into this, so I can't be of more help. ___ Kris Lindgren Senior Linux Systems Engineer GoDaddy

On 9/24/15, 7:02 AM, "George Shuklin" <george.shuk...@gmail.com> wrote:

>Hello everyone.
>
>Has anyone ever seen the 'endless snapshot' problem? Some instances (with raw
>disks and live snapshotting enabled) are stuck at image_uploading forever.
>
>It looks like this:
>
>+-------------------------------------+-----------------------------------------------------+
>| Property                            | Value                                               |
>+-------------------------------------+-----------------------------------------------------+
>| status                              | ACTIVE                                              |
>| updated                             | 2015-07-16T08:07:00Z                                |
>| OS-EXT-STS:task_state               | image_uploading                                     |
>| OS-EXT-SRV-ATTR:host                | compute                                             |
>| key_name                            | ses                                                 |
>| image                               | Ubuntu 14.04 (3736af94-b25e-4b8d-96fd-fd5949bbd81e) |
>| OS-EXT-STS:vm_state                 | active                                              |
>| OS-EXT-SRV-ATTR:instance_name       | instance-000d                                       |
>| OS-SRV-USG:launched_at              | 2015-05-09T17:28:09.00                              |
>| OS-EXT-SRV-ATTR:hypervisor_hostname | compute.lab.internal                                |
>| flavor                              | flavor2 (2)                                         |
>| id                                  | f2365fe4-9b30-4c24-b7b9-f7fcb4165160                |
>| security_groups                     | [{u'name': u'default'}]                             |
>| OS-SRV-USG:terminated_at            | None                                                |
>| user_id                             | 61096c639d674e4cb8bf487cec01432a                    |
>| name                                | non-test                                            |
>| created                             | 2015-05-09T17:27:48Z                                |
>...etc
>
>Any ideas why this happens?
>All logs are clear, no errors or anything.
>And it happens at random, so no 'debug' log is available...

--
David W.
Unix, because every barista in Seattle has an MCSE.
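Since the symptom is an instance pinned in task_state image_uploading, one ad-hoc way to spot victims is to scan a server listing for stale entries. A sketch (the two-hour threshold is arbitrary, and the field names are assumptions based on the nova show output quoted above):

```python
# Flag instances stuck in task_state 'image_uploading' longer than a
# threshold -- the 'endless snapshot' symptom from the thread. Input is
# a list of dicts mirroring the fields in the nova show output above;
# the age cutoff is a guess, not an official diagnostic.
from datetime import datetime, timedelta

def stuck_uploads(servers, now, max_age=timedelta(hours=2)):
    out = []
    for s in servers:
        if s.get("OS-EXT-STS:task_state") != "image_uploading":
            continue
        updated = datetime.strptime(s["updated"], "%Y-%m-%dT%H:%M:%SZ")
        if now - updated > max_age:
            out.append(s["id"])
    return out

servers = [{"id": "f2365fe4", "OS-EXT-STS:task_state": "image_uploading",
            "updated": "2015-07-16T08:07:00Z"}]
now = datetime(2015, 7, 16, 12, 0, 0)
print(stuck_uploads(servers, now))  # -> ['f2365fe4']
```

Run periodically, this at least turns a random occurrence into an alert; what to do with the flagged instances (e.g. resetting their state) is a separate, riskier decision.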
Re: [Openstack-operators] KVM memory overcommit with fast swap
One note: even on a super-fast SSD there is huge overhead on IO. Basically, you can't go below ~50 µs per IO, and that is almost an eternity for a modern processor. And you get a minor page fault, which is not the fastest thing in the world, a few context switches, the filesystem/block device layers... And 50 µs is the best case; normally you will see something like 150 µs, which is very slow. It's ok to push some unused or rarely used part of guest memory to swap, but do not expect it to be a silver bullet. The borderline between 'normal swap operation' and a 'thrashing system' is very blurry, and the main symptom your guests will experience during overswapping is an extreme rise in latency (of everything: IO, networking...). And when this happens, you will have no knobs to fix things... Even if you kill some of the guests, it will take up to 10 minutes to work through the thrashing part of the swap and reduce congestion on IO. In my experience, for an average compute node no more than 20% of memory may be pushed to swap without significant consequences. ... And swap inside the guests is better, because a guest may throw away a few pages from its cache if needed, while the host will swap guest page cache just as readily as actual process memory. Allocate that SSD as an ephemeral drive to guests and let them swap.

On 07/03/2015 11:19 AM, Blair Bethwaite wrote: Damnit! So no-one has done this or has a feel for it? I was really hoping for the lazy option here. So, next question: ideas for contriving a reasonable test case? Assuming I've got a compute node with 256GB RAM and 350GB of PCIe SSD for swap, what next? We've got Rally going, so we could potentially use that, but I'm not sure whether it can run different tasks in parallel to simulate a set of varied workloads...
Ideally we'd want at least these workloads happening in parallel:
- web servers
- db servers
- idle servers
- batch processing

On 30 June 2015 at 03:24, Warren Wang <war...@wangspeed.com> wrote: I'm gonna forward this to my co-workers :) I've been kicking this idea around for some time now, and it hasn't caught traction. I think it could work for a modest overcommit, depending on the memory workload. We decided that it should be possible to do this sanely, but that it needed testing. I'm happy to help test this out. Sounds like the results could be part of a Tokyo talk :P Warren

On Mon, Jun 29, 2015 at 9:36 AM, Blair Bethwaite <blair.bethwa...@gmail.com> wrote: Hi all, question up-front: do the performance characteristics of modern PCIe-attached SSDs invalidate/challenge the old "don't overcommit memory with KVM" wisdom (recently discussed on this list and at meetups and summits)? Has anyone out there actually tested this? Long-form: I'm currently looking at possible options for increasing virtual capacity in a public/community KVM-based cloud. We started very conservatively at a 1:1 cpu allocation ratio, so perhaps predictably we have boatloads of CPU headroom to work with. We also see maybe 50% of memory actually in use on a host that is, from Nova's perspective, more or less full. The most obvious thing to do here is increase available memory. There are at least three ways to achieve that: 1/ physically add RAM 2/ reduce RAM per vcore (i.e., introduce lower-RAM flavors) 3/ increase virtual memory capacity (i.e., add swap) and make ram_allocation_ratio > 1. We're already doing a bit of #2, but at the end of the day, taking away flavors and trying to change user behaviour is actually harder than just upgrading hardware.
#1 is ideal, but I do wonder whether we'd be better off spending that same money on some PCIe SSD and using it for #3 (at least for our 'standard' flavor classes), the advantage being that SSD is cheaper per GB (and it might also help alleviate IOPS starvation for local-storage based hosts)... The question is whether the performance characteristics of modern PCIe-attached SSDs invalidate the old "don't overcommit memory with KVM" wisdom (recently discussed on this list: http://www.gossamer-threads.com/lists/openstack/operators/46104 and also apparently at the Kilo mid-cycle: https://etherpad.openstack.org/p/PHL-ops-capacity-mgmt where there was an action to update the default from 1.5 to 1.0, though that doesn't seem to have happened). Has anyone out there tried this? I'm also curious whether anyone has any recent info on the state of automated memory ballooning and/or memory hotplug? Ideally a RAM-overcommitted host would try to inflate guest balloons before swapping. -- Cheers, ~Blairo
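For what it's worth, combining the thread's numbers (256GB RAM, 350GB PCIe SSD for swap) with the "no more than 20% of memory in swap" rule of thumb mentioned elsewhere in the thread gives a rough ceiling on overcommit. A back-of-the-envelope sketch, not a tuning recommendation — the 20% figure is anecdotal operator experience, not a measured limit:

```python
# Rough capacity math for option #3 (add swap, raise ram_allocation_ratio):
# cap total guest memory so that (a) the swapped share stays under a
# fraction of guest memory, and (b) the overflow actually fits on the
# swap device. Both constraints must hold; take the smaller bound.

def max_overcommit_gb(ram_gb, swap_gb, swap_fraction=0.20):
    """Largest total guest RAM (GB) under both constraints."""
    # (a) total * swap_fraction <= total - ram_gb is rearranged as:
    by_fraction = ram_gb / (1.0 - swap_fraction)
    # (b) everything beyond physical RAM must fit in swap:
    by_device = ram_gb + swap_gb
    return min(by_fraction, by_device)

print(max_overcommit_gb(256, 350))  # -> 320.0, i.e. a ratio of ~1.25
```

So on the 256GB/350GB node the binding constraint is the 20% rule, not the swap device size: the 350GB SSD is far larger than the ~64GB it would ever usefully hold.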
Re: [Openstack-operators] Allow user to see instances of other users
Thank you! You saved me a day of work. Well, we'll move the script to an admin user instead of a normal user with the special role. PS And thanks for filing a bug report too.

On 06/11/2015 10:40 PM, Sławek Kapłoński wrote: Hello, I don't think it is possible, because in nova/db/sqlalchemy/api.py, in the function instance_get_all_by_filters, you have something like:

    if not context.is_admin:
        # If we're not admin context, add appropriate filter..
        if context.project_id:
            filters['project_id'] = context.project_id
        else:
            filters['user_id'] = context.user_id

This is from Juno, but in Kilo it is the same. So in fact, even if you set proper policy.json rules, it will still require an admin context to search instances from different tenants. Maybe I'm wrong and this is possible somewhere else; maybe someone will show me where, because I was also looking for it recently :) -- Pozdrawiam / Best regards Sławek Kapłoński sla...@kaplonski.pl

On Thursday, 11 June 2015 at 21:06:31, George Shuklin wrote: Hello. I'm trying to allow a user with a special role to see all instances of all tenants without giving him admin privileges. My initial attempt was to change policy.json for nova to compute:get_all_tenants: role:special_role or is_admin:True. But it didn't work well. The command (nova list --all-tenants) no longer fails (no 'ERROR (Forbidden): Policy doesn't allow compute:get_all_tenants to be performed.'), but the returned list is empty:

    nova list --all-tenants
    +----+------+--------+------------+-------------+----------+
    | ID | Name | Status | Task State | Power State | Networks |
    +----+------+--------+------------+-------------+----------+
    +----+------+--------+------------+-------------+----------+

Any ideas how to allow a user without admin privileges to see all instances?
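For reference, the attempted policy rule as generated JSON. Note that, per the reply quoted above, this is only the API-policy half of the problem: the DB-layer is_admin check still filters the result, so the rule alone produces an empty list rather than full visibility:

```python
# The policy.json fragment from the thread, emitted programmatically so
# the rule string stays exactly as nova's policy engine expects it.
import json

policy = {
    "compute:get_all_tenants": "role:special_role or is_admin:True",
}
print(json.dumps(policy, indent=4))
```

Merging this fragment into nova's policy.json changes only the authorization check; the database query in instance_get_all_by_filters still needs an admin context to drop the per-tenant filter.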
Re: [Openstack-operators] Gentoo image availability
On 06/09/2015 05:46 AM, Matthew Thode wrote: Ya, not sure how to do multi-interface yet. I'd love it if the cloud-init static IP support would work with it (a hash with MACs as keys and a list of IPs as the value for each interface). Then DHCP can go away (I tend to much prefer config-drive). The disk-image-builder support is on my todo list already :D I just updated the cloud-init ebuild with a better cloud.cfg; it could probably use more love, but it works. I am working on getting Gentoo to be a first-class citizen in openstack-ansible as well, which depends on the disk-image-builder work. So much work still to do :D

Aw. Don't discriminate against DHCP. It has many nice features (for example, if you add a new interface to an existing VM, cloud-init with a static config will ignore it, but DHCP will just work like magic). I don't know how it works in Gentoo, but in Debian, 'allow-hotplug' for all interfaces except eth0 covers most future interfaces. Same for CentOS: you can add a few eth scripts to the network configuration and they will work as soon as a new interface appears.
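The Debian trick described above looks roughly like this in /etc/network/interfaces (a sketch; the interface names are examples):

```
# /etc/network/interfaces sketch for a Debian guest.
# eth0 is always present, so bring it up at boot:
auto eth0
iface eth0 inet dhcp

# Interfaces that may be hot-added later: bring them up via DHCP
# as soon as the kernel announces them, no static config needed.
allow-hotplug eth1
iface eth1 inet dhcp

allow-hotplug eth2
iface eth2 inet dhcp
```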
Re: [Openstack-operators] Gentoo image availability
Nice to hear. You're doing a great job! A few things to make Gentoo a 'first class citizen' for openstack (guest side): 1. Check that all eth interfaces are supported, not only eth0. If an instance boots with two or more interfaces, it should be able to get all its addresses. 2. Add a Gentoo 'element' to disk-image-builder (https://github.com/openstack/diskimage-builder). 3. Ship the image with a proper cloud-init cloud.cfg. On 06/08/2015 06:26 PM, Matthew Thode wrote: Hi, I'm the packager of Openstack on Gentoo and have just started generating Gentoo Openstack images. Right now it is just a basic amd64 image, but I plan on adding nomultilib and hardened variants (for a total of at least 4 images). I plan on generating these images at least weekly. These images are not yet sanctioned by our infra team, but I plan on remedying that (being a member of said team should help). I am currently using the scripts at https://github.com/prometheanfire/gentoo-cloud-prep to generate the images (based on a heavily modified version of Matt Vandermeulen's scripts). If you have any issues please submit bugs there or contact me on irc (prometheanfire on freenode). Here's the link to the images; I'm currently gpg signing them with the same key I use to sign this email (offline master key smartcard setup for security minded folk). http://23.253.251.73/ Let me know if you have questions,
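For point 3, a minimal cloud.cfg could look something like this (an illustrative sketch; the datasource and module lists here are assumptions, not what the Gentoo ebuild actually ships):

```
# Minimal cloud.cfg sketch for an OpenStack guest image.
users:
  - default
disable_root: true

# Prefer config-drive, fall back to the EC2-style metadata service:
datasource_list: [ ConfigDrive, Ec2 ]

cloud_init_modules:
  - growpart
  - resizefs
  - set_hostname
  - ssh
```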
Re: [Openstack-operators] 100% CPU and hangs if syslog is restarted
On 05/28/2015 07:56 PM, George Shuklin wrote: Hello. Today we've discovered a very serious bug in juno: https://bugs.launchpad.net/nova/+bug/1459726 In short: if you're using syslog and restart rsyslog, all API processes will eventually get stuck at 100% CPU usage without doing anything. Has anyone hit this bug before? It looks very nasty.

Just to let everyone know: updating to the proposed version of python-eventlet fixes the problem. Proposed debs can be found here: https://launchpad.net/ubuntu/+source/python-eventlet
[Openstack-operators] 100% CPU and hangs if syslog is restarted
Hello. Today we've discovered a very serious bug in juno: https://bugs.launchpad.net/nova/+bug/1459726 In short: if you're using syslog and restart rsyslog, all API processes will eventually get stuck at 100% CPU usage without doing anything. Has anyone hit this bug before? It looks very nasty.
Re: [Openstack-operators] Fw: VM Stuck in Error State
Enable debug in nova.conf on the compute2 host, restart nova-compute and try again. You should see the reason in the log. It can be a bad connection to glance, or a problem with networking on the host. On 05/22/2015 12:04 PM, Abhishek Talwar wrote: Hi Folks, I know this is not the place to ask usage questions and doubts regarding my deployment, but no one is answering my questions on ask.openstack.org, so I thought of asking the same here. Problem: I have a multi-node set up with 2 compute nodes (Compute1 and Compute2). When I boot VMs on compute1 they go to ACTIVE status, but when I try to boot VMs on the compute2 host they are stuck in BUILD status. I can see both compute nodes UP in "nova service-list" and both hosts are present in "nova hypervisor-list". So what could be the reason that VMs are not going to ACTIVE status on the compute2 host? The latest scheduler log shows: nova.scheduler.host_manager [req-a4debdff-0ad5-4fb8-afd4-e0dc255c3c78 None] Host filter forcing available hosts to compute1 There is no error in the scheduler logs, so what could be the possible reason? Regards Abhishek Talwar
Re: [Openstack-operators] Raising the degree of the scandal
On 05/17/2015 04:33 PM, Miguel Ángel Ajo wrote: Probably the solution is not selected to be backported because: * It's an intrusive change * It introduces new dependencies * It's probably going to introduce a performance penalty, because ebtables is slow. I'm asking in reviews for this feature to be enabled/disabled via a flag.

I understand that. All the neutron/ovs stuff is horrible in terms of performance. But this trades a security fix against 'concerns'. I do not worry about the problem itself; I worry about trading security for something else. If this is not fixed, why bother talking about 'supported/unsupported' versions of openstack? Btw: if it is good enough for Liberty, why is it not good enough for Kilo?
Re: [Openstack-operators] Raising the degree of the scandal
On 05/15/2015 07:48 PM, Jay Pipes wrote: On 05/15/2015 12:38 PM, George Shuklin wrote: Just to let everyone know: broken antispoofing is not a 'security issue' and the fix is not planned to be backported to Juno/Kilo. https://bugs.launchpad.net/bugs/1274034 What can I say? All hail devstack! Who cares about production? George, I can understand you are frustrated with this issue and feel strongly about it. However, I don't think notes like this are all that productive. Would a more productive action be to tell the operator community a bit about the vulnerability and suggest appropriate remedies to take?

Ok, sorry. Short version of the issue: if several tenants use the same (shared) network, one tenant may disrupt the network activity of another tenant by sending specially crafted ARP packets on behalf of the victim. Normally, Openstack prohibits the use of unauthorized addresses (this feature is called 'antispoofing' and it is essential for multi-tenant clouds). This feature was subtly broken (a malicious tenant may not be able to use other addresses, but can still disrupt the activity of other tenants). Finally, that bug has been fixed. But now they say 'oh, it is not that important, we will not backport it to current releases, only to Liberty' because of the new ebtables dependency.
[Openstack-operators] Raising the degree of the scandal
Just to let everyone know: broken antispoofing is not a 'security issue' and the fix is not planned to be backported to Juno/Kilo. https://bugs.launchpad.net/bugs/1274034 What can I say? All hail devstack! Who cares about production?
Re: [Openstack-operators] Multiple vlan ranges on same physical interface [ml2]
On 05/11/2015 11:23 AM, Kevin Benton wrote: I apologize, but I didn't quite follow what the issue was with tenants allocating networks in your use case; can you elaborate a bit there? From what it sounded like, it seems like you could define the vlan range you want the tenants' internal networks to come from in network_vlan_ranges. Then any admin networks would just specify a segmentation id outside of that range. Why doesn't that work?

I (as admin) can use vlans outside of network_vlan_ranges in the [ml2_type_vlan] section of ml2_conf.ini? I've never tried... Yes, I can! Thank you.

Thanks, Kevin Benton On May 9, 2015 17:16, George Shuklin george.shuk...@gmail.com wrote: Yes, that's the result. My plan was to allow 'internal' networks in neutron (created by tenants themselves), but after some struggle we've decided to fall back to 'created by script during tenant bootstrapping'. Unfortunately, neutron has no concept of a 'default physical segment' for VLAN autoallocation for tenant networks (it just grabs the first available). On 05/09/2015 03:08 AM, Kevin Benton wrote: So if you don't let tenants allocate networks, then why do the VLAN ranges in neutron matter? It can just be part of your net-create scripts. On Fri, May 8, 2015 at 9:35 AM, George Shuklin george.shuk...@gmail.com wrote: We've got a bunch of business logic above openstack. It allocates VLANs on the fly for external networks and connects pieces outside neutron (configuring the hardware router, etc). Anyway, after some research we've decided to completely ditch the idea of 'tenant networks'. All networks are external and handled by our software with administrative rights. All networks for a tenant are created during tenant bootstrap, including local networks, which now look funny: 'external local network without gateway'. By nailing down every moving part in 'neutron net-create' we've got stable behaviour and kept the allocation database inside our software.
That kills a huge part of the openstack idea, but at least it works, straightforward and nice. I would really like to see all of that implemented in vendor plugins for neutron, but the average code and documentation quality for them is below any usable level, so we implement the hardware configuration ourselves. On 05/08/2015 09:15 AM, Kevin Benton wrote: If one set of VLANs is for external networks which are created by admins, why even specify network_vlan_ranges for that set? For example, even if network_vlan_ranges is 'local:1000:4000', you can still successfully run the following as an admin: neutron net-create --provider:network_type=vlan --provider:physical_network=local --provider:segmentation_id=40 myextnet --router:external On Thu, May 7, 2015 at 7:32 AM, George Shuklin george.shuk...@gmail.com wrote: Hello everyone. Got a problem: we want to use the same physical interface for external networks and virtual (tenant) networks, all inside vlans with different ranges. My expected config was:

    [ml2]
    type_drivers = vlan
    tenant_network_types = vlan

    [ml2_type_vlan]
    network_vlan_ranges = external:1:100,local:1000:4000

    [ovs]
    bridge_mappings = external:br-ex,local:br-ex

But it does not work: ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Parsing bridge_mappings failed: Value br-ex in mapping: 'gp:br-ex' not unique. Agent terminated! I understand that I can cheat and manually configure a pile of bridges (br-ex and br-loc both plugged into br-real, which is linked to the physical interface), but it looks very fragile. Is there a nicer way to do this? And why does ml2 (the ovs plugin?) not allow mapping many networks to one bridge?
-- Kevin Benton
Re: [Openstack-operators] Multiple vlan ranges on same physical interface [ml2]
Yes, that's the result. My plan was to allow 'internal' networks in neutron (created by tenants themselves), but after some struggle we've decided to fall back to 'created by script during tenant bootstrapping'. Unfortunately, neutron has no concept of a 'default physical segment' for VLAN autoallocation for tenant networks (it just grabs the first available). On 05/09/2015 03:08 AM, Kevin Benton wrote: So if you don't let tenants allocate networks, then why do the VLAN ranges in neutron matter? It can just be part of your net-create scripts. On Fri, May 8, 2015 at 9:35 AM, George Shuklin george.shuk...@gmail.com wrote: We've got a bunch of business logic above openstack. It allocates VLANs on the fly for external networks and connects pieces outside neutron (configuring the hardware router, etc). Anyway, after some research we've decided to completely ditch the idea of 'tenant networks'. All networks are external and handled by our software with administrative rights. All networks for a tenant are created during tenant bootstrap, including local networks, which now look funny: 'external local network without gateway'. By nailing down every moving part in 'neutron net-create' we've got stable behaviour and kept the allocation database inside our software. That kills a huge part of the openstack idea, but at least it works, straightforward and nice. I would really like to see all of that implemented in vendor plugins for neutron, but the average code and documentation quality for them is below any usable level, so we implement the hardware configuration ourselves. On 05/08/2015 09:15 AM, Kevin Benton wrote: If one set of VLANs is for external networks which are created by admins, why even specify network_vlan_ranges for that set?
For example, even if network_vlan_ranges is 'local:1000:4000', you can still successfully run the following as an admin: neutron net-create --provider:network_type=vlan --provider:physical_network=local --provider:segmentation_id=40 myextnet --router:external On Thu, May 7, 2015 at 7:32 AM, George Shuklin george.shuk...@gmail.com wrote: Hello everyone. Got a problem: we want to use the same physical interface for external networks and virtual (tenant) networks, all inside vlans with different ranges. My expected config was:

    [ml2]
    type_drivers = vlan
    tenant_network_types = vlan

    [ml2_type_vlan]
    network_vlan_ranges = external:1:100,local:1000:4000

    [ovs]
    bridge_mappings = external:br-ex,local:br-ex

But it does not work: ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Parsing bridge_mappings failed: Value br-ex in mapping: 'gp:br-ex' not unique. Agent terminated! I understand that I can cheat and manually configure a pile of bridges (br-ex and br-loc both plugged into br-real, which is linked to the physical interface), but it looks very fragile. Is there a nicer way to do this? And why does ml2 (the ovs plugin?) not allow mapping many networks to one bridge? -- Kevin Benton
[Openstack-operators] Multiple vlan ranges on same physical interface [ml2]
Hello everyone. Got a problem: we want to use the same physical interface for external networks and virtual (tenant) networks, all inside vlans with different ranges. My expected config was:

    [ml2]
    type_drivers = vlan
    tenant_network_types = vlan

    [ml2_type_vlan]
    network_vlan_ranges = external:1:100,local:1000:4000

    [ovs]
    bridge_mappings = external:br-ex,local:br-ex

But it does not work: ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Parsing bridge_mappings failed: Value br-ex in mapping: 'gp:br-ex' not unique. Agent terminated! I understand that I can cheat and manually configure a pile of bridges (br-ex and br-loc both plugged into br-real, which is linked to the physical interface), but it looks very fragile. Is there a nicer way to do this? And why does ml2 (the ovs plugin?) not allow mapping many networks to one bridge?
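The "not unique" rejection can be reproduced with a small parsing sketch (an illustrative re-implementation, not the actual neutron agent code): the agent also needs a reverse map from bridge back to physical network, so each bridge may appear only once.

```python
# Illustrative sketch of bridge_mappings parsing and why a duplicate
# bridge is rejected; not the real neutron OVS agent source.

def parse_bridge_mappings(value):
    """Parse 'physnet:bridge,physnet:bridge' into a dict.

    Because the agent effectively keeps a bridge -> physnet reverse map,
    mapping two physical networks onto one bridge is rejected up front.
    """
    mappings = {}
    bridges = set()
    for pair in value.split(','):
        physnet, bridge = pair.split(':')
        if physnet in mappings:
            raise ValueError("Physical network %s not unique" % physnet)
        if bridge in bridges:
            raise ValueError("Value %s in mapping: '%s' not unique"
                             % (bridge, pair))
        mappings[physnet] = bridge
        bridges.add(bridge)
    return mappings

print(parse_bridge_mappings('external:br-ex,local:br-loc'))  # fine
try:
    parse_bridge_mappings('external:br-ex,local:br-ex')
except ValueError as e:
    print(e)  # the same kind of complaint the agent logs before terminating
```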
Re: [Openstack-operators] Multiple vlan ranges on same physical interface [ml2]
On 05/07/2015 06:17 PM, gustavo panizzo (gfa) wrote: On 2015-05-07 22:32, George Shuklin wrote: Hello everyone. Got a problem: we want to use the same physical interface for external networks and virtual (tenant) networks, all inside vlans with different ranges. My expected config was: [ml2] type_drivers = vlan tenant_network_types = vlan [ml2_type_vlan] network_vlan_ranges = external:1:100,local:1000:4000 [ovs] bridge_mappings = external:br-ex,local:br-ex

That's wrong; you need something like:

    [ml2]
    type_drivers = vlan
    tenant_network_types = vlan

    [ml2_type_vlan]
    network_vlan_ranges = blabla:1:100

    [ovs]
    bridge_mappings = blabla:br-ex

    neutron net-create flat-network --provider:network-type flat --provider:physical_network blabla
    neutron net-create vlanN --provider:network-type vlan --provider:physical_network blabla --provider:segmentation_id N
    ...
    neutron net-create vlanN+nn --provider:network-type vlan --provider:physical_network blabla --provider:segmentation_id N+nn

On each physical interface you can put one flat network and up to 4096(?) vlans, but you can't define the same bridge_mapping twice.

Thanks. I wanted to put tenant networks and external networks on the same interface, but then I realized that there is no way to tell neutron to avoid specific vlan_ids once you set tenant_network_types=vlan and add the vlan_ids to the list available to neutron. It works fine while you allocate networks yourself (as admin), but neutron will allocate a random segment/id for a tenant on request (because tenants usually do not specify a physical network). Sad. I'll stick to vlan for external and shared networks and put private networks back on GRE.
Re: [Openstack-operators] [neutron] multiple external networks on the same host NIC
Can you put them into different vlans? After that it would be a very easy task. If not, AFAIK, neutron does not allow this. Or you can trick it into thinking there are separate networks: create a bridge (br-join) and plug eth into it. Create two fake external bridges (br-ex1, br-ex2). Join them to br-join with patch links (http://blog.scottlowe.org/2012/11/27/connecting-ovs-bridges-with-patch-ports/). Instruct neutron that there are two external networks: one on br-ex1, the second on br-ex2. But be aware that this is not a very stable configuration; you need to maintain it yourself. On 04/25/2015 10:13 PM, Mike Spreitzer wrote: Is there a way to create multiple external networks from Neutron's point of view, where both of those networks are accessed through the same host NIC? Obviously those networks would be using different subnets. I need this sort of thing because the two subnets are treated differently by the stuff outside of OpenStack, so I need a way that a tenant can get a floating IP of the sort he wants. Since Neutron equates floating IP allocation pools with external networks, I need two external networks. I found, for example, http://www.marcoberube.com/archives/248--- which describes how to have multiple external networks but uses a distinct host network interface for each one. Thanks, Mike
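The patch-link trick above could be set up along these lines (a sketch; bridge and port names are examples from the mail, and you'd want to verify against your own topology before relying on it):

```
# br-join owns the shared physical NIC:
ovs-vsctl add-br br-join
ovs-vsctl add-port br-join eth1

# Two fake external bridges, each joined to br-join by a patch pair:
ovs-vsctl add-br br-ex1
ovs-vsctl add-port br-ex1 patch-join1 \
    -- set interface patch-join1 type=patch options:peer=patch-ex1
ovs-vsctl add-port br-join patch-ex1 \
    -- set interface patch-ex1 type=patch options:peer=patch-join1

ovs-vsctl add-br br-ex2
ovs-vsctl add-port br-ex2 patch-join2 \
    -- set interface patch-join2 type=patch options:peer=patch-ex2
ovs-vsctl add-port br-join patch-ex2 \
    -- set interface patch-ex2 type=patch options:peer=patch-join2
```

Neutron is then pointed at br-ex1 and br-ex2 as if they were independent external networks.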
Re: [Openstack-operators] over commit ratios
Yes, it really depends on the backing technique used. We are using SSDs and raw images, so IO is not an issue. But memory is more important: if you lack IO capacity you are left with slow guests; if you lack memory you are left with dead guests (hello, OOM killer). BTW: swap is needed not for heavy swapin/swapout, but to relieve memory pressure. With properly configured memory, swapin/swapout should stay below 2-3. On 04/22/2015 09:49 AM, Tim Bell wrote: I'd also keep an eye on local I/O... we've found this to be the resource which can cause the worst noisy neighbours. Swapping makes this worse. Tim -----Original Message----- From: George Shuklin [mailto:george.shuk...@gmail.com] Sent: 21 April 2015 23:55 To: openstack-operators@lists.openstack.org Subject: Re: [Openstack-operators] over commit ratios It very much depends on the type of production. If you can control guests and predict their memory consumption, use that as the base for the ratio. If you can't (typical for public clouds), use 1 or smaller, together with reserved_host_memory_mb in nova.conf. And one more thing: some swap space is really necessary. Add at least twice reserved_host_memory_mb - it really improves performance and prevents strange OOMs in the situation of a very large host with a very small dom0 footprint. On 04/21/2015 10:59 PM, Caius Howcroft wrote: Just a general question: what kind of over commit ratios do people normally run in production with? We currently run 2 for cpu and 1 for memory (with some held back for OS/ceph), i.e.:

    default['bcpc']['nova']['ram_allocation_ratio'] = 1.0
    default['bcpc']['nova']['reserved_host_memory_mb'] = 1024 # often larger
    default['bcpc']['nova']['cpu_allocation_ratio'] = 2.0

Caius
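The arithmetic behind these ratios is simple enough to sketch (a back-of-the-envelope model using the numbers from this thread; nova's scheduler-side accounting of reserved memory differs slightly in detail):

```python
# Rough capacity model for overcommit ratios; illustrative only.

def vcpu_capacity(physical_cores, cpu_allocation_ratio):
    """Schedulable vCPUs on a host given a CPU overcommit ratio."""
    return physical_cores * cpu_allocation_ratio

def ram_capacity_mb(total_ram_mb, ram_allocation_ratio, reserved_host_memory_mb):
    """Schedulable guest RAM: reserve some for the host, then apply the ratio."""
    return (total_ram_mb - reserved_host_memory_mb) * ram_allocation_ratio

# With the thread's values (cpu ratio 2.0, ram ratio 1.0, 1024 MB reserved)
# on a hypothetical 32-core / 128 GB host:
print(vcpu_capacity(32, 2.0))              # 64 schedulable vCPUs
print(ram_capacity_mb(131072, 1.0, 1024))  # 130048 MB left for guests
```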
Re: [Openstack-operators] Draft Agenda for the Vancouver Ops Summit Sessions
What kind of projects will the sessions 'Architecture Show and Tell' and 'Architecture Show and Tell - Special Edition' be about? Thanks. On 04/13/2015 12:32 PM, Tom Fifield wrote: [cut]

General Sessions - Tuesday (Big Room 1 | Big Room 2 | Big Room 3)
11:15 - 11:55: Ops Summit 101 / The Story So Far | Federation - Keystone & other - what do people need? | RabbitMQ
12:05 - 12:45: How do we fix logging? | Architecture Show and Tell | Ceilometer - what needs fixing?
12:45 - 2:00:
2:00 - 2:40: Billing / show back / charge back - how do I do that? | Architecture Show and Tell - Special Edition | Cinder Feedback
2:50 - 3:30: [cut]
Re: [Openstack-operators] MTU on router interface (neutron GRE) without jumbo
Thanks! But those examples are for the same MTU on client and server. What if we have: client: 1500; router in the middle: 1500; OVS/GRE: 1458; server: 1458? For TCP this is ok. But can it somehow hurt other protocols? UDP, RST, etc? On 03/14/2015 08:52 PM, Joseph Bajin wrote: The MTU size only really matters for the server and client. The links in between need an MTU larger than the packets being sent.

Scenario 1: Server - 1400 MTU, Client - 1400 MTU, Switches - 9216 MTU, OVS - 1500 MTU. Result: Successful - traffic passes without any issue.
Scenario 2: Server - 1520 MTU, Client - 1520 MTU, Switches - 1516 MTU, OVS - 1500 MTU. Result: Failure - traffic will have issues passing through.

So just make sure everything in between is higher than your server and client. --Joe On Fri, Mar 13, 2015 at 9:28 AM, George Shuklin george.shuk...@gmail.com wrote: Hello. We've been hit badly by changes in the behaviour of OVS when we switched from the 3.8 to the 3.13 kernel. When running on 3.11 or above, OVS starts to use the kernel GRE services, and they copy the DNF (do not fragment) flag from the encapsulated packet to the GRE packet. This messes everything up, because ICMP messages about dropped GRE packets never reach either the source or the destination of the underlying TCP connection. We've fixed the problems with MTU by using a DHCP option for dnsmasq, which lowers the MTU inside instances. But there are routers (router namespaces), and they still use a 1500-byte MTU. I feel like this can cause problems with some types of traffic, when a client (outside of openstack) sends DNF packets to an instance (via floating IP) and the packet is silently dropped. 1) Do those concerns have any real-life implications? TCP should take the MTU on the server into account and work smoothly, but other protocols? 2) Is there any way to lower the MTU inside the router namespace? Thanks. P.S. Jumbo frames are not an option due to reasons outside of our reach.
Re: [Openstack-operators] [Neutron][Nova] No Valid Host when booting new VM with Public IP
Check whether you allowed nova to use external networks. Look somewhere around api-paste.ini, with 'external' in the name of the permission. If nova-compute rejects the binding, it will refuse to start the instance and pass that error to nova-scheduler, which will return 'No valid host found'. On 03/16/2015 10:52 PM, Adam Lawson wrote: Got a strange error and I'm really hoping to get some help with it, since it has me scratching my head. When I create a VM within Horizon and select the PRIVATE network, it boots up great. When I attempt to create a VM within Horizon and include the PUBLIC network (either by itself or with the private network), it fails with a No valid host found error. I looked at the nova-api and nova-scheduler logs on the controller and the most I've found are errors/warnings binding VIFs; I'm not 100% certain that's the root cause, although I believe it's related. I didn't find any WARNINGs or ERRORs on the compute or network node. Setup: 1 physical host running 4 KVM domains/guests: 1x Controller, 1x Network, 1x Volume, 1x Compute. Controller node: nova.conf (http://pastebin.com/q3e9cntH), neutron.conf (http://pastebin.com/ukEVzBbN), ml2_conf.ini (http://pastebin.com/w10jBGZC), nova-api.log (http://pastebin.com/My99Mg2z), nova-scheduler (http://pastebin.com/Nb75Z6yH), neutron-server.log (http://pastebin.com/EQVQPVDF). Network node: l3_agent.ini (http://pastebin.com/DBaD1F5x), neutron.conf (http://pastebin.com/Bb3qkNi7), ml2_conf.ini (http://pastebin.com/xEC1Bs9L). Compute node: nova.conf (http://pastebin.com/K6SiE9Pw), nova-compute.conf (http://pastebin.com/9Mz30b4v), neutron.conf (http://pastebin.com/Le4wYRr4), ml2_conf.ini (http://pastebin.com/nnyhC8mV). Back-end: physical switch. Any thoughts on what could be causing this? Adam Lawson AQORN, Inc. 427 North Tatnall Street Ste. 58461 Wilmington, Delaware 19801-2230 Toll-free: (844) 4-AQORN-NOW ext.
101 International: +1 302-387-4660 Direct: +1 916-246-2072
Re: [Openstack-operators] [Neutron][Nova] No Valid Host when booting new VM with Public IP
We have that configuration and it works fine; even better than L3 NAT on neutron routers. Tenants' VMs work perfectly with external networks and public ('white') IPs, but you should make the external network available on each compute node (ml2_conf.ini). On 03/18/2015 07:29 PM, Adam Lawson wrote: What I'm trying to do is force OpenStack to do something it normally doesn't do, for the sake of learning and experimentation, i.e. bind a public network to a VM so it can be accessed outside the cloud, where floating IPs are normally required. I know there are namespace issues at play which may prevent this from working; I'm just trying to scope the boundaries of what I can and cannot do, really. Adam Lawson AQORN, Inc. 427 North Tatnall Street Ste. 58461 Wilmington, Delaware 19801-2230 Toll-free: (844) 4-AQORN-NOW ext. 101 International: +1 302-387-4660 Direct: +1 916-246-2072 On Wed, Mar 18, 2015 at 7:08 AM, Pedro Sousa pgso...@gmail.com wrote: Hi Adam, for an external network you should use floating IPs to access your instances externally, if I understood correctly. Regards On 16/03/2015 20:56, Adam Lawson alaw...@aqorn.com wrote: Got a strange error and I'm really hoping to get some help with it, since it has me scratching my head. When I create a VM within Horizon and select the PRIVATE network, it boots up great. When I attempt to create a VM within Horizon and include the PUBLIC network (either by itself or with the private network), it fails with a No valid host found error. I looked at the nova-api and nova-scheduler logs on the controller and the most I've found are errors/warnings binding VIFs; I'm not 100% certain that's the root cause, although I believe it's related. I didn't find any WARNINGs or ERRORs on the compute or network node.
Setup: 1 physical host running 4 KVM domains/guests: 1x Controller, 1x Network, 1x Volume, 1x Compute. Controller node: nova.conf (http://pastebin.com/q3e9cntH), neutron.conf (http://pastebin.com/ukEVzBbN), ml2_conf.ini (http://pastebin.com/w10jBGZC), nova-api.log (http://pastebin.com/My99Mg2z), nova-scheduler (http://pastebin.com/Nb75Z6yH), neutron-server.log (http://pastebin.com/EQVQPVDF). Network node: l3_agent.ini (http://pastebin.com/DBaD1F5x), neutron.conf (http://pastebin.com/Bb3qkNi7), ml2_conf.ini (http://pastebin.com/xEC1Bs9L). Compute node: nova.conf (http://pastebin.com/K6SiE9Pw), nova-compute.conf (http://pastebin.com/9Mz30b4v), neutron.conf (http://pastebin.com/Le4wYRr4), ml2_conf.ini (http://pastebin.com/nnyhC8mV). Back-end: physical switch. Any thoughts on what could be causing this? Adam Lawson AQORN, Inc. 427 North Tatnall Street Ste. 58461 Wilmington, Delaware 19801-2230 Toll-free: (844) 4-AQORN-NOW ext. 101 International: +1 302-387-4660 Direct: +1 916-246-2072
[Openstack-operators] MTU on router interface (neutron GRE) without jumbo
Hello. We were hit badly by a change in OVS behaviour when we switched from the 3.8 to the 3.13 kernel. On 3.11 and above, OVS starts using the kernel's GRE implementation, which copies the DF (don't fragment) flag from the encapsulated packet to the GRE packet. This messes everything up, because the ICMP messages about dropped GRE packets never reach either the source or the destination of the underlying TCP connection. We fixed the MTU problems by setting a DHCP option via dnsmasq, which lowers the MTU inside instances. But the routers (router namespaces) still use a 1500-byte MTU. I suspect this can cause problems with some types of traffic, when a client outside of openstack sends DF-flagged packets to an instance (via a floating IP) and those packets are silently dropped. 1) Do these concerns have any real-life implications? TCP should take the server-side MTU into account and work smoothly, but what about other protocols? 2) Is there any way to lower the MTU inside a router namespace? Thanks. P.S. Jumbo frames are not an option, for reasons outside of our reach.
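For anyone hitting the same thing: the dnsmasq workaround mentioned above is usually wired up like this. A sketch only; 1454 is the value the old install guides suggested for GRE tunnels, and the file paths are the conventional ones, so adjust both for your setup:

```ini
# /etc/neutron/dhcp_agent.ini
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
# DHCP option 26 is "interface MTU"; this pushes a lowered MTU
# into instances at lease time, leaving room for GRE overhead
dhcp-option-force=26,1454
```

This only helps inside instances, which is exactly George's point: the router namespaces never see a DHCP lease, so their interfaces stay at 1500.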
Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested
Ceilometer is in a sad state. 1. The collector leaks memory. We ran it on the same host as mongo, and it grabbed 29GB out of 32, leaving mongo with less than a gig of memory. 2. The metering agent causes a huge load on neutron-server: O(n) in metering rules and tenants. A few bugs reported, one bugfix in review. 3. The metering agent simply does not work on multi-network-node installations; it expects all routers to be on the same host. Fixed or not, I don't know; we have our own crude fix. 4. Many rough edges. Ceilometer is much less tested than nova. Sometimes it throws tracebacks and skips counting. A fresh example: if metadata has a '.' in a name, ceilometer traces on it and does not count the image in glance usage. 5. Very slow on reports (it uses mongo's map-reduce). Overall feeling: barely usable, but given my experience with cloud billing, not the worst thing I've seen in my life. About load: apart from reporting and the memory leaks, it uses a rather small amount of resources. On 02/11/2015 09:37 PM, Maish Saidel-Keesing wrote: Is Ceilometer ready for prime time? I would be interested in hearing from people who have deployed OpenStack clouds with Ceilometer, and their experience. Some of the topics I am looking for feedback on are: - Database size - MongoDB management, sharding, replica sets etc. - Replication strategies - Database backup/restore - Overall usability - Gripes, pains and problems (things to look out for) - Possible replacements for Ceilometer that you have used instead If you are willing to share - I am sure it will be beneficial to the whole community. Thanks in advance. With best regards, Maish Saidel-Keesing Platform Architect Cisco
Re: [Openstack-operators] How to handle updates of public images?
On 02/07/2015 08:36 PM, Igor Bolotin wrote: Going back to the idea of archiving images: not allowing launch of new VMs from archived images and hiding them by default in Horizon/CLI (maybe still allowing list/show if requested, possibly as an admin-only function). Would it make sense to propose this as a blueprint for the next release? Yes, it sounds nice. But more importantly, I want '_base' to go away when raw disks are used in nova. Why is it needed at all in that case?
Re: [Openstack-operators] [openstack-dev] [Telco][NFV][infra] Review process of TelcoWG use cases
On 02/06/2015 09:14 PM, Marcos Garcia wrote: It does look like that. However, the intent here is to allow non-developer members of a Telco to provide the use cases they need to accomplish. This way the Telco WG can identify gaps and file a proper spec in each of the OpenStack projects. Indeed, what we're trying to do is help the non-developer members of the group articulate their use cases and tease them out to a level that is meaningful to someone who is not immersed in telecommunications themselves. In this way we hope in turn to be able to create meaningful specifications for the actual OpenStack projects impacted. It's possible that some of these will be truly cross-project and therefore head to openstack-specs, but initial indications seem to be that most will either be specific to a project, or cross only a couple of projects (e.g. nova and neutron); I am sure someone will come up with some more exceptions to this statement to prove me wrong :). Ok, I'm definitely out of the telco business, and I am indeed an openstack operator. My first question: what do you want to do, what problems do you want to solve? IMO most of the Telcos are asking Openstack developers to work in the following big areas (the first 3 are basically Carrier Grade): - Performance at the virtualization layer (NUMA, etc.): get baremetal-like performance in big VMs - QoS and capacity management: get deterministic behavior, always the same regardless of the load - Reliability (via HA, duplicate systems, live-migration, etc.): achieve 99.999% uptime - Management interfaces (OAM) compatible with their current OSS/BSS systems (i.e. SNMP traps, usage metering for billing): so as not to reinvent the wheel, they have other things to manage too (i.e. legacy) Most of this sounds really interesting for any operator, except maybe NUMA. But why do telcos want more performance? A few percent of performance lost in exchange for manageability: most companies accept this. HA is achievable, QoS maybe, duplication is OK.
But deterministic live migrations... why do telcos want that? If a system has its own way to rebalance load, there is a simpler way: terminate one instance and build a new one. Btw, I would really like to see a deterministic way to fight 'No valid hosts found'. I was at one 'NFV' session in Paris, and I expected it to be about SR-IOV and using VFs (virtual functions) of network cards for guest acceleration. But instead it was about something I just didn't get at all (sorry, Ericsson). So, what do you want to do? Not in terms of a 'business solution', but at a very low level. Run some specific appliance? Add VoIP support to Neutron? Something else? It's all about the SLAs established by telcos' customers: government, military and healthcare systems. SLAs are crazy there. And as IT operators, you'll all understand those requirements, so it's really not that different compared to Telco operators. Just remember that ETSI NFV is more than all that: you probably saw Ericsson speaking about high-level telco functions (MANO, VIM, EMS, VNFs, etc.). That's beyond the scope of you guys, and probably outside the scope of the whole Openstack world... that's why OPNFV exists. I will be a bit skeptical. It will not work with the current quality of the development process (the 'devstack syndrome'). I have just finished digging into yet another funny nova 'half-bug' around migration, and what I see in the code is... too agile for high-SLA systems. Maybe they (the telcos) can really change this, and I really hope so, but up to now... thousands of loosely coupled systems, each with its own bugs and its own world view. Just today I found a 'hung' network interface (any operation on the socket goes into 'D' state and cannot be terminated) due to an ixgbe/netconsole bug (https://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg10178.html). 99.999% uptime in those conditions? I just do not believe it. About Ericsson's presentation: yes, I was inspired by the level of detail in the previous Rackspace presentation about the depths of openvswitch, and then suddenly everyone around me started speaking a foreign language.
[Openstack-operators] How to handle updates of public images?
Hello everyone. We update our public images regularly (to provide them to customers in an up-to-date state). But there is a problem: once some instance starts from an image, that image becomes 'used'. That means: * the image is used as _base for nova * if the instance is reverted, the image is used to recreate the instance's disk * if the instance is rescued, the image is used as the rescue base * it is re-downloaded during resize/migration (on the new compute node) One more thing (specific to us): we use raw disks with _base on slow SATA drives (compared to the fast SSDs used for the disks themselves), and if that SATA drive fails, we replace it and nova re-downloads everything into _base. If the image has been deleted, this causes problems for nova (nova can't download _base). The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we are forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part. How do you handle public image updates in your clouds? Thanks!
Re: [Openstack-operators] How to handle updates of public images?
An updated report on the 'no image' / deleted '_base' behaviour in juno (my previous comment was about havana): 1. If a snapshot is removed, the original image is used (the image from which the first instance, the snapshot's source, was booted). Rather strange and unexpected, but nice (one headache less). 2. If all images in the chain are removed, the behaviour changes: * hard reboot works fine (raw disks) * reinstallation asks for a new image, seems to be no problem * rescue causes an ugly problem, rendering the instance completely broken (it does not work, but there is no ERROR state). https://bugs.launchpad.net/nova/+bug/1418590 I haven't tested migrations yet. On 02/05/2015 03:09 PM, George Shuklin wrote: Hello everyone. We update our public images regularly (to provide them to customers in an up-to-date state). But there is a problem: once some instance starts from an image, that image becomes 'used'. That means: * the image is used as _base for nova * if the instance is reverted, the image is used to recreate the instance's disk * if the instance is rescued, the image is used as the rescue base * it is re-downloaded during resize/migration (on the new compute node) One more thing (specific to us): we use raw disks with _base on slow SATA drives (compared to the fast SSDs used for the disks themselves), and if that SATA drive fails, we replace it and nova re-downloads everything into _base. If the image has been deleted, this causes problems for nova (nova can't download _base). The second part of the problem: glance does not allow updating an image in place (uploading a new image with the same ID), so we are forced to upload the updated image under a new ID and remove the old one. This causes the problems described above. And if a tenant boots from their own snapshot and removes the snapshot without removing the instance, the same problem occurs even without any action on our part. How do you handle public image updates in your clouds? Thanks!
Re: [Openstack-operators] Deprecation of in tree EC2 API in Nova for Kilo release
On 01/28/2015 09:56 PM, Sean Dague wrote: The following review for Kilo deprecates the EC2 API in Nova - https://review.openstack.org/#/c/150929/ There are a number of reasons for this. The EC2 API has been slowly rotting in the Nova tree, never was highly tested, implements a substantially older version of what AWS has, and currently can't work with any recent releases of the boto library (due to implementing extremely old version of auth). This has given the misunderstanding that it's a first class supported feature in OpenStack, which it hasn't been in quite sometime. Deprecating honestly communicates where we stand. There is a new stackforge project which is getting some activity now - https://github.com/stackforge/ec2-api. The intent and hope is that is the path forward for the portion of the community that wants this feature, and that efforts will be focused there. Comments are welcomed, but we've attempted to get more people engaged to address these issues over the last 18 months, and never really had anyone step up. Without some real maintainers of this code in Nova (and tests somewhere in the community) it's really no longer viable. I think, if we are talking about a 'mature openstack', the first step of deprecation should be removal from the sample configs and moving the documentation into an 'obsolete functions' chapter. A feature should be deprecated in the documentation, not in the code, for at least one release. For the next few releases it should just be marked as deprecated and print a warning in the logs. Only after that can it be removed from the code. To be honest, I don't really like the deprecation rate in Openstack. Compare it to the Linux motto: 'If it's used, it is not deprecated.' I understand that developers hate old code, but from a usability (operator's) point of view, everything should just keep working after an upgrade. How many applications stop working because of an 'obsolete syscall' after a kernel update? (E.g. I have been seeing notices about the deprecation of oom_adj for the last 5 years, and it is still fine to use.)
And now look at openstack! Half of the code is already deprecated, and the other half is a candidate for deprecation... From the user's point of view, all of openstack is just one big pile of changes. Half of the older client code does not work with neutron, or works incorrectly (it expects simple nova networking). And what should I (as an operator) say to a user who complains that their vagrant/fog code cannot connect to the network and uses a local-only network instead of the internet? (It uses whichever network it receives first by uuid.) Am I the guilty one (for using neutron instead of nova-network)? Is vagrant wrong? Is fog wrong? Is the user wrong? "I think the user is wrong. A wrong user with wrong money. They should go away." Deprecation: * if no one uses it, no one notices and no one complains; one or two releases and it's gone. * If someone does use it, removing it is like cutting off a leg. Maybe it is cancer. But if you can live with it, better not to cut.
Re: [Openstack-operators] Small openstack
Hello. If we have two computes, and each is also a network node, that means both host routers. Say we have two tenants, with one instance each, on two compute hosts: compute1-tenant1-instance1, compute2-tenant2-instance2. But neutron has no idea about this placement. Someone asks it to 'put the router on any l3-agent', and it puts router1 on compute2 and router2 on compute1. It will work, until it doesn't: if compute1 goes down, it not only affects instance1, but also disrupts network service for tenant2. And there is no way to control l3-agent placement: no respect for availability zones, instance placement, aggregates, etc. On 01/29/2015 01:34 AM, Thomas Goirand wrote: On 12/20/2014 11:16 PM, George Shuklin wrote: do 'network node on compute' is kinda sad Why? Thomas
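There is no scheduler hint for it, but the placement can at least be fixed up by hand with the neutron client. A sketch of the procedure; agent and router IDs are placeholders:

```shell
# which l3-agent currently hosts the router?
neutron l3-agent-list-hosting-router router1

# list agents (note the l3-agents and which host each runs on)
neutron agent-list

# move the router to the agent on the "right" compute
neutron l3-agent-router-remove <old-l3-agent-id> router1
neutron l3-agent-router-add <new-l3-agent-id> router1
```

This is exactly the manual babysitting the post complains about: it works, but nothing stops the scheduler from making the same bad choice for the next router.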
Re: [Openstack-operators] Packaging sample config versions
Yes! I had a discussion about exactly this with a colleague just yesterday. This seems to be the perfect solution. On 01/28/2015 12:00 AM, Tom Fifield wrote: Hi all, Based on Gustavo's excellent work below, talking with many ops, and after brief chats with Jeremy and a few other TC folks, here's what I'd propose as an end goal: * A git repository that has raw, sample configs in it for each project that will be automagically updated * Raw configs distributed in the tar files we make as part of the release Does that seem acceptable to us all? Regards, Tom On 21/01/15 13:22, gustavo panizzo (gfa) wrote: On 12/18/2014 09:57 AM, Jeremy Stanley wrote: 4. Set up a service that periodically regenerates sample configuration and tracks it over time. This attempts to address the stated desire to be able to see how sample configurations change, but note that this is a somewhat artificial presentation since there are a lot of variables (described earlier) influencing the contents of such samples: any attempt to render it as a linear/chronological series could be misleading. i've set up a github repo where i dump sample config files for the projects that autogenerate them; because i know nothing about rpm build tools i only do it for debian and ubuntu packages. if you build your own deb packages you can use my very simple and basic scripts to autogenerate the sample config files. the repo is here: https://github.com/gfa/os-sample-configs i will happily move it to osops or another community repo
Re: [Openstack-operators] RHEL 7 / CentOS 7 instances losing their network gateway
How many network interfaces does your instance have? If more than one, check the settings of the second network (subnet). It can have its own DHCP settings, which may mess up the routes of the main network. On 01/27/2015 06:08 PM, Joe Topjian wrote: Hello, I have run into two different OpenStack clouds where instances running either RHEL 7 or CentOS 7 images are randomly losing their network gateway. There's nothing in the logs that show any indication of why. There's no DHCP hiccup or anything like that. The gateway has just disappeared. If I log into the instance via another instance (so on the same subnet since there's no gateway), I can manually re-add the gateway and everything works... until it loses it again. One cloud is running Havana and the other is running Icehouse. Both are using nova-network and both are Ubuntu 12.04. On the Havana cloud, we decided to install the dnsmasq package from Ubuntu 14.04. This looks to have resolved the issue as this was back in November and I haven't heard an update since. However, we don't want to do that just yet on the Icehouse cloud. We'd like to understand exactly why this is happening and why updating dnsmasq resolves an issue that only one specific type of image is having. I can make my way around CentOS, but I'm not as familiar with it as I am with Ubuntu (especially CentOS 7). Does anyone know what change in RHEL7/CentOS7 might be causing this? Or does anyone have any other ideas on how to troubleshoot the issue? I currently have access to two instances in this state, so I'd be happy to act as remote hands and eyes. :) Thanks, Joe
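A few in-guest checks that would help narrow this down on the affected CentOS 7 instances. This is a generic troubleshooting sketch, not a known fix; the gateway address is a placeholder:

```shell
# is the default route actually gone, or just pointing out the wrong NIC?
ip route show

# CentOS 7 runs NetworkManager by default; its log shows every DHCP
# renewal and any route changes it made
journalctl -u NetworkManager | tail -50

# which dhcp client leases exist, and what router option did each carry?
grep -r "option routers" /var/lib/NetworkManager/ /var/lib/dhclient/ 2>/dev/null

# stop-gap while debugging: pin the route statically so the workload survives
ip route add default via 10.0.0.1 dev eth0
```

If the route vanishes exactly at a DHCP renewal from the second interface, that would point at the two-NIC theory above.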
Re: [Openstack-operators] :document an OpenStack production environment
In my earlier days I tried many formal schemes, but they always caused problems. By now I have settled on the following approach: a machine-used database (dns, chef, etc.) for explicit details like mac addresses, hardware, rack location and network connections. That database should be constantly in use, not 'write-only'; otherwise everyone starts to forget to update it, and it suddenly loses its authority to 'I wrote you about it in hipchat and then sent you an update via sms, and the final version is in your other skype account'. Usually this is some kind of 'work' system, or a 'control panel', or chef data bags. All topological schemes should be hand-drawn. Whiteboards are just perfect for that. Why? Because all tools except pen/pencil/marker restrain you, forcing you to use the terminology and link types of that tool. Even inkscape is restricting, because you cannot just leave a link half-specified, or draw a funny spiral ('here it goes somewhere...'). And free-form text goes in the corporate wiki. Yes, updates will change everything, but even after updates the original picture and text will be precious, because they tell the history and will help to debug strange issues with historical roots. Corporate blogs are a perfect place for updates and for ideas for future updates. Yes, it is a mess, but it is better than 'not enough information because of the format restrictions'. On 01/26/2015 03:45 PM, matt wrote: I really liked using sphinx for documentation back in the day; it has the benefit of being community compatible. I also enjoyed graphviz integration in sphinx for diagrams... and then there was templating gnuplots, but I think I was probably considered a masochist on this front. at the very least, management types did not like that they couldn't really edit our documentation. -matt On Mon, Jan 26, 2015 at 5:10 AM, George Shuklin george.shuk...@gmail.com wrote: We use chef to manage hosts. Data bags contain all the data for all hosts.
We keep hardware configuration and the DC-wide name in data bags too. For flowcharts we mostly use markers and a whiteboard; sometimes I sketch things in dia [1] or with a wacom tablet in mypaint. [1] http://sourceforge.net/projects/dia-installer/ On 01/25/2015 04:15 PM, Daniel Comnea wrote: Hi all, Can anyone who runs Openstack in a production environment / data center share how you document the whole infrastructure? What tools are used for drawing diagrams (i guess you need some pictures, otherwise it is hard to understand :)), maybe even an inventory, etc? Thanks, Dani P.S. In the past (10+ years ago) I used to maintain a red book, but I suspect the situation is different in 2015.
Re: [Openstack-operators] :document an OpenStack production environment
We use chef to manage hosts. Data bags contain all the data for all hosts. We keep hardware configuration and the DC-wide name in data bags too. For flowcharts we mostly use markers and a whiteboard; sometimes I sketch things in dia [1] or with a wacom tablet in mypaint. [1] http://sourceforge.net/projects/dia-installer/ On 01/25/2015 04:15 PM, Daniel Comnea wrote: Hi all, Can anyone who runs Openstack in a production environment / data center share how you document the whole infrastructure? What tools are used for drawing diagrams (i guess you need some pictures, otherwise it is hard to understand :)), maybe even an inventory, etc? Thanks, Dani P.S. In the past (10+ years ago) I used to maintain a red book, but I suspect the situation is different in 2015.
Re: [Openstack-operators] Small openstack (part 2), distributed glance
Directions: nova to switch port, switch port to glance, glance to switch port (towards swift). I assume the traffic from the switch to swift goes outside the installation. Glance-api receives and sends the same amount of traffic. It sounds like a minor issue until you start to count the CPU IRQ time of the network card (doubled compared to a single direction of traffic). Glance on the compute node will consume less CPU (because of the high-performance loopback). On 01/21/2015 07:20 PM, Michael Dorman wrote: This is great info, George. Could you explain the 3x snapshot transport under the traditional Glance setup, please? I understand that you have compute — glance, and glance — swift. But what's the third transfer? Thanks! Mike On 1/21/15, 10:36 AM, George Shuklin george.shuk...@gmail.com wrote: Ok, news so far: it works like magic. Nova has the option [glance] host=127.0.0.1, so I do not need to cheat with endpoint resolution (my initial plan was to resolve the glance endpoint to 127.0.0.1 with /etc/hosts magic). The normal glance-api replies to external client requests (image-create/download/list/etc.), and the local glance-apis (one per compute) are used to connect to swift. Glance registry works in normal mode (only on the 'official' api servers). I don't see any reason to centralize all traffic to swift through special dedicated servers, investing in fast CPUs and 10G links. With this solution the CPU load of glance-api is distributed evenly across all compute nodes, and the overall snapshot traffic (on the ports) was cut down three times! Why didn't I think of this earlier? On 01/16/2015 12:20 AM, George Shuklin wrote: Hello everyone. One more thing in the light of a small openstack. I really dislike the triple network load caused by the current glance snapshot operations. When a compute node takes a snapshot, it works with the files locally, then it sends them to glance-api, and (if glance is backed by swift) glance sends them on to swift. Basically, for each 100Gb disk there is 300Gb of network traffic. It is especially painful for glance-api, which needs more CPU and network bandwidth than we want to spend on it. So, an idea: put a glance-api without cache on each compute node. To make the compute node go to the proper glance, the endpoint points to an fqdn, and on each compute node that fqdn resolves to localhost (where the glance-api lives). Plus a normal glance-api on the API/controller node to serve dashboard/api clients. I haven't tested it yet. Any ideas on possible problems/bottlenecks? And how many glance-registry instances do I need for this?
Re: [Openstack-operators] Small openstack (part 2), distributed glance
Ok, news so far: it works like magic. Nova has the option [glance] host=127.0.0.1, so I do not need to cheat with endpoint resolution (my initial plan was to resolve the glance endpoint to 127.0.0.1 with /etc/hosts magic). The normal glance-api replies to external client requests (image-create/download/list/etc.), and the local glance-apis (one per compute) are used to connect to swift. Glance registry works in normal mode (only on the 'official' api servers). I don't see any reason to centralize all traffic to swift through special dedicated servers, investing in fast CPUs and 10G links. With this solution the CPU load of glance-api is distributed evenly across all compute nodes, and the overall snapshot traffic (on the ports) was cut down three times! Why didn't I think of this earlier? On 01/16/2015 12:20 AM, George Shuklin wrote: Hello everyone. One more thing in the light of a small openstack. I really dislike the triple network load caused by the current glance snapshot operations. When a compute node takes a snapshot, it works with the files locally, then it sends them to glance-api, and (if glance is backed by swift) glance sends them on to swift. Basically, for each 100Gb disk there is 300Gb of network traffic. It is especially painful for glance-api, which needs more CPU and network bandwidth than we want to spend on it. So, an idea: put a glance-api without cache on each compute node. To make the compute node go to the proper glance, the endpoint points to an fqdn, and on each compute node that fqdn resolves to localhost (where the glance-api lives). Plus a normal glance-api on the API/controller node to serve dashboard/api clients. I haven't tested it yet. Any ideas on possible problems/bottlenecks? And how many glance-registry instances do I need for this?
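The setup described boils down to one override in nova.conf on every compute node. A sketch of the scheme, assuming a cache-less glance-api process is listening locally on each compute:

```ini
# nova.conf on each compute node:
# nova-compute fetches/uploads images via the glance-api on localhost
[glance]
host = 127.0.0.1

# Note: the keystone image endpoint is left unchanged, still pointing
# at the "official" glance-api on the controller, so dashboard and CLI
# clients are unaffected by this override.
```

The design choice here is that image bytes then flow compute -> local glance-api -> swift over loopback plus one network hop, instead of compute -> central glance-api -> swift over two network hops.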
[Openstack-operators] Small openstack (part 2), distributed glance
Hello everyone. One more thing in the light of a small openstack. I really dislike the triple network load caused by the current glance snapshot operations. When a compute node takes a snapshot, it works with the files locally, then it sends them to glance-api, and (if glance is backed by swift) glance sends them on to swift. Basically, for each 100Gb disk there is 300Gb of network traffic. It is especially painful for glance-api, which needs more CPU and network bandwidth than we want to spend on it. So, an idea: put a glance-api without cache on each compute node. To make the compute node go to the proper glance, the endpoint points to an fqdn, and on each compute node that fqdn resolves to localhost (where the glance-api lives). Plus a normal glance-api on the API/controller node to serve dashboard/api clients. I haven't tested it yet. Any ideas on possible problems/bottlenecks? And how many glance-registry instances do I need for this?
Re: [Openstack-operators] Small openstack
On 01/09/2015 09:25 PM, Kris G. Lindgren wrote: Also, if you are running this configuration you should be aware of the following bug: https://bugs.launchpad.net/neutron/+bug/1274034 And the corresponding fix: https://review.openstack.org/#/c/141130/ Basically, Neutron security group rules do nothing to protect against arp spoofing/poisoning from vm's. So it's possible under a shared network configuration for a vm to arp for another vm's ip address and temporarily knock that vm offline. The above commit, which is still a WIP, adds ebtables rules to allow neutron to filter protocols other than IP (e.g. arp). Thank you! I have just finished playing with private networks (as external networks) and have started tuning the internet-facing network. And I saw something strange when I was doing a pentest from one of the instances. I'm going to check every item from the list in the bug description. But I thought that security groups, antispoofing and other such things were nova-driven?
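For the curious, the WIP fix amounts to per-port ebtables rules of roughly this shape on the compute node. This is a hand-written sketch of the idea, not the actual patch; the tap device name and the IP are placeholders:

```shell
# On the compute node hosting the VM:
# drop ARP packets arriving from the VM's tap device whose advertised
# source IP is not the VM's allocated fixed IP (anti-ARP-spoofing)
ebtables -A FORWARD -i tapXXXXXXXX -p ARP \
    --arp-ip-src ! 10.0.0.5 -j DROP
```

Without a rule like this, nothing at L2 stops a VM from answering ARP requests for a neighbour's address, which is exactly the spoofing described in the bug.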
Re: [Openstack-operators] glance directory traversal bug and havana
It seems I was wrong. Thanks, I'll look at it again. On 01/08/2015 07:37 PM, Jesse Keating wrote: On 1/7/15 8:47 PM, George Shuklin wrote: I spent a few hours trying to backport the fix to Havana, but then I found that Havana seems to be immune to the bug. I'm not 100% sure, so someone else should look too. The bug was that icehouse+ accepts all supported schemes; the fix excludes the 'bad' schemes. Havana, however, has an explicitly given list of accepted schemes for the location field, and the 'bad' schemes are not in it. Havana is certainly not immune. I was able to fetch content from the system fairly easily. Start with an updated glance client. Modify it as listed in https://bugs.launchpad.net/glance/+bug/1400966/comments/6 $ glance image-create --disk-format raw --container-format bare $ glance image-update --size 700 image_id $ glance --os-image-api-version 2 location-add --url file:///etc/passwd $ glance image-download image_id That got me (some of) the contents of /etc/passwd. The patch I posted prevented this from happening. It blocks adding a location that is file:// based, but still allows other location adds that should be allowed. https://github.com/blueboxgroup/glance/commit/7ab98b72802de1d5695d35306e32293463977496
Re: [Openstack-operators] glance directory traversal bug and havana
I spent a few hours trying to backport to Havana, but then I found that Havana seems to be immune to the bug. I'm not 100% sure, so it would be good if someone else looked too. The bug was that Icehouse+ accepts all supported schemes; the fix excludes the 'bad' ones. Havana, though, has an explicit list of accepted schemes for the location field, and the 'bad' schemes are not in it. On Jan 6, 2015 8:34 PM, Jesse Keating j...@bluebox.net wrote: Hopefully all of you have seen http://seclists.org/oss-sec/2015/q1/64 which is the glance v2 API directory traversal bug. Upstream has fixed master (kilo) and juno, but havana has not been fixed. We, unfortunately, have a few havana installs out there and we'd like to patch this ahead of our planned upgrade to Juno. I'm curious if anybody else out there is in the same situation and is working on backporting the glance patch. If not, I'll share the patch when I'm done, but if so I'd love to share in the work and help the effort. Cheers, and happy patching! -- -jlk
[Openstack-operators] Small openstack
Hello. I've suddenly got a request for a small installation of OpenStack (about 3-5 computes). They need almost nothing (just a management panel to spawn simple instances, for a few friendly tenants), and I'm curious: is nova-network a good solution for this? They don't want a network node, and doing 'network node on compute' is kind of sad. (And one more: has anyone tried putting the management stuff on a compute node in mild production?)
Re: [Openstack-operators] Packaging sample config versions
On 12/15/2014 10:49 AM, Thomas Goirand wrote: and ubuntu just put files in proper places without changing configs. Ahem... Ubuntu simply doesn't care much about config files. See what they ship for Nova and Cinder. I wouldn't say without changing configs in this case. We're using Chef for configuration, so the Ubuntu approach is better It's not better or worse, it's exactly the same as for Debian, as the Debian package will *never* change something you modified in a config file, as per Debian policy (if it does, then it's a bug you should report to the tracker). Thank you. It's rather unexpected, but I'll take this into account for the next installation. I'm not a big fan of the Ubuntu maintenance policy (they basically dropped it two months before the announced date), and I prefer to use Debian where possible. Now I see it's OK with OpenStack too, and that's good. I think there was some kind of implicit FUD, because I was absolutely sure that Canonical/RH packaging led the process and Debian was far behind in its tail. That is not true, and I'm happy. Anyway, I'm ready to help but have no idea how (within my limits). Do you have any experience building 3rd party CIs on OpenStack infra? Nope. I've only done stuff with debian-jenkins-glue. But I have some experience backporting patches from Icehouse to Havana (it's still in production and still needs fixes). I can research/fix something specific and local.
Re: [Openstack-operators] Packaging sample config versions
On 12/13/2014 05:13 PM, Thomas Goirand wrote: If I can help somehow, I'm ready to do something, but what should I do, exactly? There's a lot that can be done. If you like working on CI stuff, then you could help me with building the package validation CI which I'm trying to (re-)work. All of this is currently inside the debian/juno branch of the openstack-meta-packages (in the openstack-tempest-ci package, which uses the openstack-deploy package). In the past, I saw *A LOT* of CIs, and most of them were written in a very dirty way. In fact, it's easy to write a CI, but it's very hard to write it well. I'm not saying my approach is perfect, but IMO it's moving in the right direction. All CIs are dirty piles of bash scripts. Some of them have enough dirt to give birth to new life, which in turn starts a civilization, invents computers and starts doing its own CI. For the moment, the packaged CI can do a full all-in-one deployment from scratch (starting with an empty VM), install and configure tempest, and run the Keystone tempest unit tests. I'm having issues with nova-compute using Qemu, and also with the Neutron setup. But once that's fixed, I hope to be able to run most tempest tests. The next step will be to run on a multi-node setup. So, if you want to help on that, and as it seems you like doing CI stuff, you're most welcome to do so. Once we have this, then we could start building a repository with everything from trunk. And when that is done, starting the effort of building a 3rd party CI to do package validation on the gate. Your thoughts? Oops, I don't feel I can respond to this in a smart way. I don't know some of the stuff (like Tempest). It would be better if you gave me one concrete area to work on (something on the scale of a normal issue from the tracker). Btw: are we talking about Debian packages or Ubuntu ones? They differ: Debian heavily relies on answers to debconf, and Ubuntu just puts files in the proper places without changing configs.
We're using Chef for configuration, so the Ubuntu approach is better (when we started doing OpenStack, that was one of the deciding factors between Debian and Ubuntu).
Re: [Openstack-operators] cloud-init: ssh-keys for host changed after global reboot
We do not use config drive, only the metadata server. I think it's somehow related to the metadata server not working for some time, with standalone instances booting without metadata (we rebooted them afterwards, but cloud-init can still mess things up...). On 11/14/2014 04:21 AM, Abel Lopez wrote: Haven't seen that myself, I wonder if there is a conflict between cloud-init and libvirt_inject_key. Also curious if you're using the metadata api or config_drive. On Thursday, November 13, 2014, George Shuklin george.shuk...@gmail.com wrote: Hello. We had a planned power outage for one of our OS installations (havana). After everything booted back, we found that every instance had changed its own SSH host key (the server key the SSH server presents upon connection). Is this a bug or a feature? Has anyone seen this? Is there any way to prevent it? Thanks!
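If the culprit really is cloud-init regenerating host keys on a boot without metadata, one knob worth checking is cloud-init's `ssh_deletekeys` setting, which controls whether cloud-init deletes and regenerates the existing host keys. A minimal cloud-config sketch (assuming the images in question honour this option):

```yaml
#cloud-config
# Assumption: cloud-init's ssh module is what replaces the host keys.
# With ssh_deletekeys set to false, cloud-init leaves the existing
# /etc/ssh/ssh_host_* keys in place across reboots.
ssh_deletekeys: false
```

This is a sketch, not a confirmed diagnosis of the Havana behaviour described above; testing it on one instance before a fleet-wide change would be prudent.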
[Openstack-operators] cloud-init: ssh-keys for host changed after global reboot
Hello. We had a planned power outage for one of our OS installations (havana). After everything booted back, we found that every instance had changed its own SSH host key (the server key the SSH server presents upon connection). Is this a bug or a feature? Has anyone seen this? Is there any way to prevent it? Thanks!
Re: [Openstack-operators] floatin ip issue
I was wrong, sorry. Floating IPs are assigned as /32 on the external interface inside the network namespace. The single idea I have now is to try removing all the iptables NAT rules (it's destructive until the network node is rebooted or the router is deleted/recreated) and check whether the address replies to ping. If 'yes', the problem is in routing/NAT. If 'no', the problem is outside the OpenStack router (external net, provider routing, etc.). On 10/29/2014 06:23 PM, Paras pradhan wrote: Hi George, You mean .193 and .194 should be in different subnets? 192.168.122.193/24 is reserved from the allocation pool and 192.168.122.194/32 is the floating IP. Here are the outputs for the commands:

neutron port-list --device-id=8725dd16-8831-4a09-ae98-6c5342ea501f
| id                                   | mac_address       | fixed_ips                                                                      |
| 6f835de4-c15b-44b8-9002-160ff4870643 | fa:16:3e:85:dc:ee | {subnet_id: 0189699c-8ffc-44cb-aebc-054c8d6001ee, ip_address: 192.168.122.193} |
| be3c4294-5f16-45b6-8c21-44b35247d102 | fa:16:3e:72:ae:da | {subnet_id: d01a6522-063d-40ba-b4dc-5843177aab51, ip_address: 10.10.0.1}       |

neutron floatingip-list
| id                                   | fixed_ip_address | floating_ip_address | port_id                              |
| 55b00e9c-5b79-4553-956b-e342ae0a430a | 10.10.0.9        | 192.168.122.194     | 82bcbb91-827a-41aa-9dd9-cb7a4f8e7166 |

neutron net-list
| id                                   | name     | subnets                                               |
| dabc2c18-da64-467b-a2ba-373e460444a7 | demo-net | d01a6522-063d-40ba-b4dc-5843177aab51 10.10.0.0/24     |
| ceaaf189-5b6f-4215-8686-fbdeae87c12d | ext-net  | 0189699c-8ffc-44cb-aebc-054c8d6001ee 192.168.122.0/24 |

neutron subnet-list
| id                                   | name        | cidr             | allocation_pools                               |
| d01a6522-063d-40ba-b4dc-5843177aab51 | demo-subnet | 10.10.0.0/24     | {start: 10.10.0.2, end: 10.10.0.254}           |
| 0189699c-8ffc-44cb-aebc-054c8d6001ee | ext-subnet  | 192.168.122.0/24 | {start: 192.168.122.193, end: 192.168.122.222} |

P.S.: The external subnet is 192.168.122.0/24 and the internal VM instances' subnet is 10.10.0.0/24. Thanks Paras. On Mon, Oct 27, 2014 at 5:51 PM, George Shuklin george.shuk...@gmail.com wrote: I don't like this:

15: qg-d351f21a-08: BROADCAST,UP,LOWER_UP mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 192.168.122.193/24 brd 192.168.122.255 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever
    inet 192.168.122.194/32 brd 192.168.122.194 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever

Why did you get two IPs on the same interface with different netmasks? I just rechecked it on our installations; that should not happen. So next: either this is a bug, or this is an uncleaned network node (a lesser bug), or someone is messing with Neutron. Start with Neutron: show the ports for the router: neutron port-list --device-id=router-uuid-here neutron
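The destructive check described at the top of this message can be sketched as below. The router UUID and floating IP are taken from the outputs in this thread; EXEC defaults to echo so the commands are only printed (a dry run). Removing the dry run requires root on the network node, and the NAT rules stay gone until the router is deleted/recreated or the node is rebooted.

```shell
# Dry-run sketch of the "remove NAT and ping the floating IP" check.
ROUTER="qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c"
FIP="192.168.122.194"
EXEC="${EXEC:-echo}"   # set EXEC= (empty) and run as root to really apply

# 1. Inspect the SNAT/DNAT rules for the floating IP first:
$EXEC ip netns exec "$ROUTER" iptables -t nat -S

# 2. Destructive: flush the NAT table inside the router namespace,
#    then ping $FIP from outside.  Reply => the problem was in
#    routing/NAT; no reply => the problem is outside the OpenStack
#    router (external net, provider routing, etc.).
$EXEC ip netns exec "$ROUTER" iptables -t nat -F
```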
Re: [Openstack-operators] floatin ip issue
I don't like this:

15: qg-d351f21a-08: BROADCAST,UP,LOWER_UP mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 192.168.122.193/24 brd 192.168.122.255 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever
    inet 192.168.122.194/32 brd 192.168.122.194 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever

Why did you get two IPs on the same interface with different netmasks? I just rechecked it on our installations; that should not happen. So next: either this is a bug, or this is an uncleaned network node (a lesser bug), or someone is messing with Neutron. Start with Neutron and show the ports for the router:

neutron port-list --device-id=router-uuid-here
neutron floatingip-list
neutron net-list
neutron subnet-list

(trim to the related entries only) (and please mark again which IPs are 'internet' and which are 'internal'; I'm kind of getting lost in all the '192.168.*'). On 10/27/2014 04:47 PM, Paras pradhan wrote: Yes, it got its IP, which is 192.168.122.194 in the paste below.

root@juno2:~# ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ip -4 a
1: lo: LOOPBACK,UP,LOWER_UP mtu 65536 qdisc noqueue state UNKNOWN group default
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
14: qr-ac50d700-29: BROADCAST,UP,LOWER_UP mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 50.50.50.1/24 brd 50.50.50.255 scope global qr-ac50d700-29
       valid_lft forever preferred_lft forever
15: qg-d351f21a-08: BROADCAST,UP,LOWER_UP mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 192.168.122.193/24 brd 192.168.122.255 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever
    inet 192.168.122.194/32 brd 192.168.122.194 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever

stdbuf -e0 -o0 ip net exec qrouter... /bin/bash gives me the following:

root@juno2:~# ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:168 (168.0 B)  TX bytes:168 (168.0 B)

qg-d351f21a-08 Link encap:Ethernet  HWaddr fa:16:3e:79:0f:a2
          inet addr:192.168.122.193  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe79:fa2/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:2673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:205377 (205.3 KB)  TX bytes:6537 (6.5 KB)

qr-ac50d700-29 Link encap:Ethernet  HWaddr fa:16:3e:7e:6d:f3
          inet addr:50.50.50.1  Bcast:50.50.50.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe7e:6df3/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:345 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1719 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:27377 (27.3 KB)  TX bytes:164541 (164.5 KB)

Thanks Paras. On Sat, Oct 25, 2014 at 3:18 AM, George Shuklin george.shuk...@gmail.com wrote: Check whether the qrouter got the floating IP inside the network namespace (ip net exec qrouter... ip -4 a), or just bash into it (stdbuf -e0 -o0 ip net exec qrouter... /bin/bash) and play with it like a normal server. On 10/24/2014 07:38 PM, Paras pradhan wrote: Hello, I assigned a floating IP to an instance, but I can't ping the instance. The instance can reach the internet with no problem, but I can't ssh or icmp to it. It's not a security group issue. On my network node that runs l3, I can see the qrouter. The external subnet looks like this: allocation-pool start=192.168.122.193,end=192.168.122.222 --disable-dhcp --gateway 192.168.122.1 192.168.122.0/24. I can ping 192.168.122.193 using: ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ping 192.168.122.193 but not 192.168.122.194 (which is the floating IP). Doing a tcpdump on the interface that connects to the external world, I can see the ICMP request but no reply from the interface:

11:36:40.360255 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 312, length 64
11:36:41.360222 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 313, length 64

Ideas? Thanks Paras
Re: [Openstack-operators] floatin ip issue
Check whether the qrouter got the floating IP inside the network namespace (ip net exec qrouter... ip -4 a), or just bash into it (stdbuf -e0 -o0 ip net exec qrouter... /bin/bash) and play with it like a normal server. On 10/24/2014 07:38 PM, Paras pradhan wrote: Hello, I assigned a floating IP to an instance, but I can't ping the instance. The instance can reach the internet with no problem, but I can't ssh or icmp to it. It's not a security group issue. On my network node that runs l3, I can see the qrouter. The external subnet looks like this: allocation-pool start=192.168.122.193,end=192.168.122.222 --disable-dhcp --gateway 192.168.122.1 192.168.122.0/24. I can ping 192.168.122.193 using: ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ping 192.168.122.193 but not 192.168.122.194 (which is the floating IP). Doing a tcpdump on the interface that connects to the external world, I can see the ICMP request but no reply from the interface:

11:36:40.360255 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 312, length 64
11:36:41.360222 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 313, length 64

Ideas? Thanks Paras.