Re: [Openstack-operators] regression testing before upgrade (operators view)

2017-08-31 Thread George Shuklin

Hello.

Thank you for reply.

In the last few days I've tested tempest and rally. I found rally better 
suited to my goals. That said, I have to admit it is rather overloaded, and 
when I started poking around I quickly got stuck with 'verify' mode, which in 
turn got stuck on our neutron configuration (we have no tenant-allocated 
networks, only public ones).


But after I found 'rally task start', and mangled the samples a bit to fit 
our needs, it turned out to be really, really close to what we want to have. I 
was disappointed by the Ubuntu packaging (as usual) - their rally package is 
broken and does not create any entry points at all. It worked much 
better in a venv.


Probably I'll try to combine jenkins, grafana, kibana and rally 
together, maybe even for periodic service validation.


Thank you for advice.

On 08/29/2017 11:34 PM, Boris Pavlovic wrote:

George,

(with reduction of load to normal levels),

Probably it's not the best idea to just run the samples; they are called 
samples for a reason  ;)


Basically you can run the same Rally task twice, before and after the upgrade, 
and compare the results (Rally has a sort of trends support)
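For example, something like this (a rough sketch only; the task file name and
tags are made up, and the flags may differ slightly between Rally releases):

    # run the same task before and after the upgrade
    rally task start regression-task.yaml --tag pre-upgrade
    # ... perform the upgrade ...
    rally task start regression-task.yaml --tag post-upgrade

    # list both runs, then build a single report covering both for comparison
    rally task list
    rally task report <pre-upgrade-task-uuid> <post-upgrade-task-uuid> --out compare.html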

Usually what I have heard from Ops guys is the following:

  * Run Rally on a periodic basis
  * Export the data from the Rally DB to ElasticSearch
  * Build Kibana/Grafana on top


FYI I am working on making above scenario work out of the box.

--

So can you provide some more details on what you are looking for? 
What's missing?



Best regards,
Boris Pavlovic

On Tue, Aug 29, 2017 at 2:24 AM, George Shuklin 
<george.shuk...@gmail.com <mailto:george.shuk...@gmail.com>> wrote:


Hello everyone.

Does anyone do regression testing before performing an upgrade
(within the same major version)? How do you do it? Do you know any
tools for such tests? I started to research this area, and I see
three openstack-specific tools: rally (with a reduction of load to
normal levels), tempest (can it be used by operators?) and grenade.

If you use any of these tools, how do you feel about them? Are they
worth the time spent?



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] regression testing before upgrade (operators view)

2017-08-29 Thread George Shuklin

Hello everyone.

Does anyone do regression testing before performing an upgrade (within the 
same major version)? How do you do it? Do you know any tools for such 
tests? I started to research this area, and I see three 
openstack-specific tools: rally (with a reduction of load to normal 
levels), tempest (can it be used by operators?) and grenade.


If you use any of these tools, how do you feel about them? Are they worth the 
time spent?



Thanks!


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Issue with snapshots of raw images

2017-03-14 Thread George Shuklin
If anyone is curious about this bug: it was clearly a linux-3.13 bug; the issue 
was completely solved by moving to 4.4.


On 02/15/2017 08:03 PM, George Shuklin wrote:

Hello.

We've upgraded to mitaka (qemu 2.5, linux-3.13) and found that raw 
images now have BIG issues with snapshots.


Symptoms:

When the snapshot process reaches 'fallocated' blocks (see below), all IO in 
the guest starts lagging, including network IO. Windows starts losing 
pings for a very long period of time (~30-40 minutes); linux does this 
briefly (~500-700ms, but regularly).


Research so far:

All those symptoms can be resolved if every disk block is actually 
written on disk (dd if=disk of=disk conv=notrunc). If the file has 
fallocated blocks, it causes the problem. If it has sparse holes it 
causes the problem too, but with preallocate_images = space there are no 
sparse holes in the file.


The best way so far to distinguish a 'bad' disk from a 'good' one is to use 
filefrag -v. For a 'bad' disk it shows the "unwritten" flag.


1. Any idea how to prevent this?

2. Any idea how to force nova to actually write images completely 
without using 'fallocate'?





___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] need input on log translations

2017-03-11 Thread George Shuklin
The whole idea of log translation is half-baked anyway. About half of the
important log messages contain the output of things outside openstack: libvirt,
ip, sudo, kernel, etc. In any i18n installation there is going to be some
amount of untranslated messages. This kills the whole idea of localization.

A modern operator ought to know English at a 'technical reading' level anyway.
Therefore, localization does not achieve its goal, but causes pain instead:
search segmentation, slightly misleading translations (e.g. 'stream' and
'thread' both translate into the Russian 'поток', which brings ambiguity), and
different systems may use slightly different translations, causing even more
mess.

As a Russian speaker and an openstack operator, I definitely don't want
log translation.

On Mar 10, 2017 4:42 PM, "Doug Hellmann"  wrote:

There is a discussion on the -dev mailing list about the i18n team
decision to stop translating log messages [1]. The policy change means
that we may be able to clean up quite a lot of "clutter" throughout the
service code, because without anyone actually translating the messages
there is no need for the markup code used to tag those strings.

If we do remove the markup from log messages, we will be effectively
removing "multilingual logs" as a feature. Given the amount of work
and code churn involved in the first roll out, I would not expect
us to restore that feature later.

Therefore, before we take what would almost certainly be an
irreversible action, we would like some input about whether log
message translations are useful to anyone. Please let us know if
you or your customers use them.

Thanks,
Doug

[1] http://lists.openstack.org/pipermail/openstack-dev/2017-March/113365.html

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Issue with snapshots of raw images

2017-02-15 Thread George Shuklin

Hello.

We've upgraded to mitaka (qemu 2.5, linux-3.13) and found that raw images 
now have BIG issues with snapshots.


Symptoms:

When the snapshot process reaches 'fallocated' blocks (see below), all IO in 
the guest starts lagging, including network IO. Windows starts losing pings 
for a very long period of time (~30-40 minutes); linux does this briefly 
(~500-700ms, but regularly).


Research so far:

All those symptoms can be resolved if every disk block is actually 
written on disk (dd if=disk of=disk conv=notrunc). If the file has 
fallocated blocks, it causes the problem. If it has sparse holes it 
causes the problem too, but with preallocate_images = space there are no 
sparse holes in the file.


The best way so far to distinguish a 'bad' disk from a 'good' one is to use 
filefrag -v. For a 'bad' disk it shows the "unwritten" flag.
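A rough sketch of both checks (the disk path is only an example, and rewriting
a disk in place while the instance is running is risky):

    # a 'bad' disk shows extents with the "unwritten" flag
    filefrag -v /var/lib/nova/instances/<uuid>/disk | grep unwritten

    # force every block to be physically written, removing fallocated extents
    dd if=/var/lib/nova/instances/<uuid>/disk of=/var/lib/nova/instances/<uuid>/disk \
       conv=notrunc bs=1M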


1. Any idea how to prevent this?

2. Any idea how to force nova to actually write images completely 
without using 'fallocate'?



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] allowed_address_pairs for port in neutron

2017-01-23 Thread George Shuklin

Hello.

I'm trying to allow more than one IP on an interface for a tenant, but 
neutron (Mitaka) rejects my requests:


$ neutron port-update b59bc3bb-7d34-4fbb-8e55-a9f1c5c88411 
--allowed-address-pairs type=dict list=true ip_address=10.254.15.4


Unrecognized attribute(s) 'allowed_address_pairs'
Neutron server returns request_ids: 
['req-9168f1f4-6e78-42fb-8521-c69b1cfd4f67']


Has anyone done this? Can you show your neutron commands and name the 
version you are using?
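For reference, the first thing worth checking is whether the
allowed-address-pairs extension is loaded at all, since 'Unrecognized
attribute(s)' usually means the server does not know the attribute (the exact
alias name here is from memory and may differ):

    neutron ext-list | grep -i allowed-address-pairs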



Thanks.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Ironic with top-rack switches management

2017-01-17 Thread George Shuklin

On 01/04/2017 07:31 PM, Clint Byrum wrote:

Excerpts from George Shuklin's message of 2016-12-26 00:22:38 +0200:

Hello everyone.


Has anyone actually got Ironic running with ToR (top-of-rack) switches
under neutron in production? Which switch vendor/plugin (and OS version)
do you use? Do you have some switch configuration with parts outside of
Neutron's reach? Is it worth spending the effort on integration, etc.?


We had an experimental setup with Ironic and the OVN Neutron driver and
VTEP-capable switches (Juniper, I forget the model #, but Arista also has
models that fully support VTEP). It was able to boot baremetal nodes on
isolated L2's (including an isolated provisioning network). In theory this
would also allow VM<->baremetal L2 networking (and with kuryr, you could
get VM<->baremetal<->container working too). But we never proved this
definitively as we got tripped up on scheduling and hostmanager issues
running with ironic in one host agg and libvirt in another. I believe
these are solved, though I've not seen the documentation to prove it.

A few weeks later I can answer my own question.

Most of the vendor drivers for Ironic suck. Some of them do not support 
baremetal ports, others have issues with their own devices, or have no support 
for newer openstack releases.
Nonetheless, there is a great 'networking_generic_switch' ML2 driver 
which can do everything needed to run Ironic with tenant networking. It is 
so well written that adding a new vendor is a bearable task for an average 
admin. A switch description is just ~15 lines of code with switch-specific 
configuration commands.
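To give an idea, the per-switch part of the operator's config is roughly like
this (a sketch from memory - the device name, credentials and exact option
names are made up and may vary between releases):

    [genericswitch:tor-switch-1]
    device_type = netmiko_arista_eos
    ip = 192.0.2.10
    username = neutron
    password = secret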


Ironic should be at least Newton to support multitenancy.

And it has plenty of bugs, most of which are obvious to fix, but they show 
that no one has ever done a production deployment before (or has, but patched 
it themselves and kept that patch private).

And one more question: Does Ironic support snapshotting of baremetal
servers? With some kind of agent/etc?


I think that's asking too much really. The point of baremetal is that
you _don't_ have any special agents between your workload and hardware.
Consider traditional backup strategies.


But we already have cloud-init in baremetal instances. Why can't there be a 
cloud-backup? The main advantage of openstack-based snapshots for baremetal 
is 'golden image' creation. You press a button, and your server 
becomes an image. And that image (with proper cloud-init) can boot as a VM or 
as baremetal. Convergence at its highest point.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Ironic with top-rack switches management

2016-12-25 Thread George Shuklin

Hello everyone.


Has anyone actually got Ironic running with ToR (top-of-rack) switches 
under neutron in production? Which switch vendor/plugin (and OS version) 
do you use? Do you have some switch configuration with parts outside of 
Neutron's reach? Is it worth spending the effort on integration, etc.?


And one more question: Does Ironic support snapshotting of baremetal 
servers? With some kind of agent/etc?


Thanks.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python

2016-11-25 Thread George Shuklin
Em... Sorry, I'm trying to create_image. And it throws a traceback on duplicate 
images during the creation process, not while passing an image name to some 
'create instance' or 'delete image' function.


Or do you mean I need to pass a uuid for the new image in the image_create() 
function? I've never heard of such a thing.


On 11/25/2016 12:48 PM, Ricardo Carrillo Cruz wrote:

That is expected.

The shade calls accept a name_or_id param for a lot of methods, for 
convenience.
In your case, as there are multiple images with the same name, you 
should pass the ID of the image you want to use; otherwise 
shade cannot guess which one you mean.

2016-11-25 11:42 GMT+01:00 George Shuklin <george.shuk...@gmail.com 
<mailto:george.shuk...@gmail.com>>:


shade fails if it sees duplicate images in the account.

o = shade.OpenStackCloud(**creds)
o.create_image(name='killme', filename='/tmp/random_junk',
               disk_format='qcow2', container_format='bare', wait=True)

Traceback (most recent call last):
  ...
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 2269, in create_image
    current_image = self.get_image(name)
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 1703, in get_image
    return _utils._get_entity(self.search_images, name_or_id, filters)
  File "/usr/lib/python2.7/dist-packages/shade/_utils.py", line 143, in _get_entity
    "Multiple matches found for %s" % name_or_id)
shade.exc.OpenStackCloudException: Multiple matches found for killme

On 11/18/2016 12:20 AM, Clint Byrum wrote:

You may find the 'shade' library a straight forward choice:

http://docs.openstack.org/infra/shade/
<http://docs.openstack.org/infra/shade/>

Excerpts from George Shuklin's message of 2016-11-17 20:17:08 +0200:

Hello.

I can't find proper documentation about how to use openstack clients
from inside python application. All I can find is just examples and
rather abstract (autogenerated) reference. Is there any normal
documentation about proper way to use openstack clients from python
applications?


Thanks.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
<mailto:OpenStack-operators@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators>


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
<mailto:OpenStack-operators@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators>


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python

2016-11-25 Thread George Shuklin

shade fails if it sees duplicate images in the account.

o = shade.OpenStackCloud(**creds)
o.create_image(name='killme', filename='/tmp/random_junk', disk_format='qcow2',
               container_format='bare', wait=True)

Traceback (most recent call last):
  ...
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 2269, in create_image
    current_image = self.get_image(name)
  File "/usr/lib/python2.7/dist-packages/shade/openstackcloud.py", line 1703, in get_image
    return _utils._get_entity(self.search_images, name_or_id, filters)
  File "/usr/lib/python2.7/dist-packages/shade/_utils.py", line 143, in _get_entity
    "Multiple matches found for %s" % name_or_id)
shade.exc.OpenStackCloudException: Multiple matches found for killme
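A possible workaround (just a sketch of my own, not something shade documents
for this case): delete the duplicate images by ID before creating the new one,
so the internal lookup by name no longer finds multiple matches:

    import shade

    cloud = shade.OpenStackCloud(**creds)

    # remove every existing image with the clashing name, addressing each by ID
    for img in cloud.list_images():
        if img.name == 'killme':
            cloud.delete_image(img.id, wait=True)

    cloud.create_image(name='killme', filename='/tmp/random_junk',
                       disk_format='qcow2', container_format='bare', wait=True)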




On 11/18/2016 12:20 AM, Clint Byrum wrote:

You may find the 'shade' library a straight forward choice:

http://docs.openstack.org/infra/shade/

Excerpts from George Shuklin's message of 2016-11-17 20:17:08 +0200:

Hello.

I can't find proper documentation about how to use openstack clients
from inside python application. All I can find is just examples and
rather abstract (autogenerated) reference. Is there any normal
documentation about proper way to use openstack clients from python
applications?


Thanks.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Using novaclient, glanceclient, etc, from python

2016-11-25 Thread George Shuklin

No.

I've tried to use openstacksdk to set properties on an image, and there 
was zero information on how to do this; when I manually tried to discover 
'how to', I found it is broken. The bug report is already 7 months 
old and there is no motion at all.


python-openstacksdk is broken and unusable.

https://medium.com/@george.shuklin/openstacksdk-unable-to-update-image-properties-191bffb670f2#.wa23wa7nm
Bug: https://bugs.launchpad.net/python-openstacksdk/+bug/1455620

On 11/17/2016 11:27 PM, Kostyantyn Volenbovskyi wrote:

Hi,

do you mean that information in [1] is not adequate to your needs?

BR,
Konstantin
[1] http://docs.openstack.org/user-guide/sdk.html


On Nov 17, 2016, at 7:17 PM, George Shuklin <george.shuk...@gmail.com> wrote:

Hello.

I can't find proper documentation about how to use openstack clients from 
inside python application. All I can find is just examples and rather abstract 
(autogenerated) reference. Is there any normal documentation about proper way 
to use openstack clients from python applications?


Thanks.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] openstack sdk: how to update image (create image with properties)

2016-11-23 Thread George Shuklin

Hello.

I'm trying to use the openstack SDK in my python code.

I want to upload an image and set a few properties. And I can't.

My code (without properties):

from openstack import connection
import os

# build a connection from the usual OS_* environment variables
con = connection.Connection(auth_url=os.environ['OS_AUTH_URL'],
                            project_name=os.environ['OS_TENANT_NAME'],
                            username=os.environ['OS_USERNAME'],
                            password=os.environ['OS_PASSWORD'])
con.image.upload_image(name='killme', data=file('/tmp/1', 'r'),
                       disk_format="qcow2", container_format="bare")



with properties (few different attempts):

con.image.upload_image(name='killme', data=file('/tmp/1','r'), 
disk_format="qcow2", container_format="bare", foo="bar")  #ignored


con.image.upload_image(name='killme', data=file('/tmp/1','r'), 
disk_format="qcow2", container_format="bare", properties="foo=bar")  
#set property 'properties' to 'foo=bar'


con.image.upload_image(name='killme', data=file('/tmp/1','r'), 
disk_format="qcow2", container_format="bare", properties={"foo":"bar"})  
#return http error



How can I set properties for images via the openstack SDK? Is this behavior 
a bug or a feature?



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Using novaclient, glanceclient, etc, from python

2016-11-17 Thread George Shuklin

Hello.

I can't find proper documentation on how to use the openstack clients 
from inside a python application. All I can find is examples and a 
rather abstract (autogenerated) reference. Is there any normal 
documentation about the proper way to use the openstack clients from python 
applications?



Thanks.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Allow to investigate instance actions after instance deletion

2016-04-18 Thread George Shuklin

Yes, one more reason to upgrade.

Thank you!

On 04/13/2016 06:23 PM, Kris G. Lindgren wrote:

This spec/feature has already had work done on it and is committed:

https://review.openstack.org/#/q/topic:bp/os-instance-actions-read-deleted-instances

It landed in mitaka.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: Dina Belova <dbel...@mirantis.com>
Date: Wednesday, April 13, 2016 at 4:08 AM
To: George Shuklin <george.shuk...@gmail.com>
Cc: "openstack-operators@lists.openstack.org" <openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] Allow to investigate instance actions after instance deletion


George,

I really believe this can be handled via Ceilometer events. Events 
about all actions that happen to an instance are sent to Ceilometer.


Cheers,
Dina

On Wed, Apr 13, 2016 at 12:23 PM, George Shuklin 
<george.shuk...@gmail.com <mailto:george.shuk...@gmail.com>> wrote:


I filed a bug (feature request) about the ability to see the action
list of deleted instances:
https://bugs.launchpad.net/nova/+bug/1569779

Any ideas?

I really want to see it like this:


+---------------+------------------------------------------+---------+-------------------------+
| Action        | Request_ID                               | Message | Start_Time              |
+---------------+------------------------------------------+---------+-------------------------+
| create        | req-31f61086-ce71-4e0a-9ef5-3d1bdd386043 | -       | 2015-05-26T12:09:54.00  |
| reboot        | req-4632c799-a83e-489c-bb04-5ed4f47705af | -       | 2015-05-26T14:21:53.00  |
| stop          | req-120635d8-ef53-4237-b95a-7d15f00ab6bf | -       | 2015-06-01T08:46:03.00  |
| migrate       | req-bdd680b3-06d5-48e6-868b-d3e4dc17796a | -       | 2015-06-01T08:48:14.00  |
| confirmResize | req-a9af49d4-833e-404e-86ac-7d8907badd9e | -       | 2015-06-01T08:58:03.00  |
| start         | req-5a2f5295-8b63-4cb7-84d9-dad1c6abf053 | -       | 2015-06-01T08:58:20.00  |
| delete        | req-----                                 | -       | 2016-04-01T00:00:00.00  |
+---------------+------------------------------------------+---------+-------------------------+



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Allow to investigate instance actions after instance deletion

2016-04-13 Thread George Shuklin
I filed a bug (feature request) about the ability to see the action list of 
deleted instances: https://bugs.launchpad.net/nova/+bug/1569779


Any ideas?

I really want to see it like this:

+---------------+------------------------------------------+---------+-------------------------+
| Action        | Request_ID                               | Message | Start_Time              |
+---------------+------------------------------------------+---------+-------------------------+
| create        | req-31f61086-ce71-4e0a-9ef5-3d1bdd386043 | -       | 2015-05-26T12:09:54.00  |
| reboot        | req-4632c799-a83e-489c-bb04-5ed4f47705af | -       | 2015-05-26T14:21:53.00  |
| stop          | req-120635d8-ef53-4237-b95a-7d15f00ab6bf | -       | 2015-06-01T08:46:03.00  |
| migrate       | req-bdd680b3-06d5-48e6-868b-d3e4dc17796a | -       | 2015-06-01T08:48:14.00  |
| confirmResize | req-a9af49d4-833e-404e-86ac-7d8907badd9e | -       | 2015-06-01T08:58:03.00  |
| start         | req-5a2f5295-8b63-4cb7-84d9-dad1c6abf053 | -       | 2015-06-01T08:58:20.00  |
| delete        | req-----                                 | -       | 2016-04-01T00:00:00.00  |
+---------------+------------------------------------------+---------+-------------------------+




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Live snapshots on the raw disks never ends

2015-09-30 Thread George Shuklin
Yes, we are using swift as the backend, and swift runs as a separate 
installation (its own keystone, etc).


I can't find any logs about problems with the backend. Should they be 
logged by glance?



On 09/29/2015 01:21 AM, David Wahlstrom wrote:

George,

What is your storage backend using (Gluster/ceph/local disk/etc)?  
Some of the distributed backend drivers have bugs in them or mask the 
real issue (such as watchers on objects).


On Thu, Sep 24, 2015 at 8:11 AM, Kris G. Lindgren 
<klindg...@godaddy.com <mailto:klindg...@godaddy.com>> wrote:


I believe I was talking to Josh Harlow (he's harlowja in
#openstack-operators on freenode) from Yahoo, about something like
this the other day.  He was saying that recently on a few
hypervisors they would randomly run into HV disks that were
completely full due to snapshots.  I have not personally ran into
this, so I can't be of more help.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy





On 9/24/15, 7:02 AM, "George Shuklin" <george.shuk...@gmail.com
<mailto:george.shuk...@gmail.com>> wrote:

>Hello everyone.
>
>Has anyone ever seen the 'endless snapshot' problem? Some instances (with raw
>disks and live snapshotting enabled) are stuck at image_uploading forever.
>
>It looks like this:
>

>+--------------------------------------+------------------------------------------------------+
>| Property                             | Value                                                |
>+--------------------------------------+------------------------------------------------------+
>| status                               | ACTIVE                                               |
>| updated                              | 2015-07-16T08:07:00Z                                 |
>| OS-EXT-STS:task_state                | image_uploading                                      |
>| OS-EXT-SRV-ATTR:host                 | compute                                              |
>| key_name                             | ses                                                  |
>| image                                | Ubuntu 14.04 (3736af94-b25e-4b8d-96fd-fd5949bbd81e)  |
>| OS-EXT-STS:vm_state                  | active                                               |
>| OS-EXT-SRV-ATTR:instance_name        | instance-000d                                        |
>| OS-SRV-USG:launched_at               | 2015-05-09T17:28:09.00                               |
>| OS-EXT-SRV-ATTR:hypervisor_hostname  | compute.lab.internal                                 |
>| flavor                               | flavor2 (2)                                          |
>| id                                   | f2365fe4-9b30-4c24-b7b9-f7fcb4165160                 |
>| security_groups                      | [{u'name': u'default'}]                              |
>| OS-SRV-USG:terminated_at             | None                                                 |
>| user_id                              | 61096c639d674e4cb8bf487cec01432a                     |
>| name                                 | non-test                                             |
>| created                              | 2015-05-09T17:27:48Z                                 |
>...etc
>
>Any ideas why this happens? All logs are clear, no errors or anything.
>And it happens at random so no 'debug' log available...
>
>___
>OpenStack-operators mailing list
>OpenStack-operators@lists.openstack.org
<mailto:OpenStack-operators@lists.openstack.org>
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
<mailto:OpenStack-operators@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




--
David W.
Unix, because every barista in Seattle has an MCSE.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] KVM memory overcommit with fast swap

2015-07-03 Thread George Shuklin
One notice: even on a super-super-fast SSD, there is a huge overhead on IO. 
Basically, you can't go lower than ~50 us per IO, while memory access takes a 
few nanoseconds - almost an eternity for modern processors. And you get a 
minor page fault, which is not the fastest thing in the world, plus a few 
context switches and the filesystem/block device layer... And 50us is the best 
possible case. Normally you will see something like 150us, which is very slow.


It's ok to push some unused or rarely used part of the guests' memory to 
swap, but do not expect it to be a silver bullet. The borderline between 
'normal swap operations' and a 'thrashed system' is very blurry, and the main 
symptom your guests will experience during overswapping is an extreme rise in 
latency (of everything: IO, networking...). And when this happens you 
will have no knobs to fix things... Even if you kill some of the guests, 
it will take up to 10 minutes to finish thrashing through part of the swap and 
reduce the congestion on IO.


In my experience, on an average compute node no more than 20% of memory 
can be pushed to swap without significant consequences.


... And swap inside the guests is better, because a guest can throw away a few 
pages from its cache if needed, while the host will swap the guest's page 
cache just like actual process memory. Allocate that SSD as an ephemeral drive 
for the guests and let them swap.


On 07/03/2015 11:19 AM, Blair Bethwaite wrote:

Damnit! So no-one has done this or has a feel for it?
I was really hoping for the lazy option here.

So next question. Ideas for convoluting a reasonable test case?
Assuming I've got a compute node with 256GB RAM and 350GB of PCIe SSD
for swap, what next? We've got Rally going so could potentially use
that, but I'm not sure whether it can do different tasks in parallel
in order to simulate a set of varied workloads... Ideally we'd want at
least these workloads happening in parallel:
- web servers
- db servers
- idle servers
- batch processing

On 30 June 2015 at 03:24, Warren Wang war...@wangspeed.com wrote:

I'm gonna forward this to my co-workers :) I've been kicking this idea
around for some time now, and it hasn't caught traction. I think it could
work for a modest overcommit, depending on the memory workload. We decided
that it should be possible to do this sanely, but that it needed testing.
I'm happy to help test this out. Sounds like the results could be part of a
Tokyo talk :P

Warren

Warren

On Mon, Jun 29, 2015 at 9:36 AM, Blair Bethwaite blair.bethwa...@gmail.com
wrote:

Hi all,

Question up-front:

Do the performance characteristics of modern PCIe attached SSDs
invalidate/challenge the old "don't overcommit memory with KVM" wisdom
(recently discussed on this list and at meetups and summits)? Has
anyone out there tried & tested this?

Long-form:

I'm currently looking at possible options for increasing virtual
capacity in a public/community KVM based cloud. We started very
conservatively at a 1:1 cpu allocation ratio, so perhaps predictably
we have boatloads of CPU headroom to work with. We also see maybe 50%
memory actually in-use on a host that is, from Nova's perspective,
more-or-less full.

The most obvious thing to do here is increase available memory. There
are at least three ways to achieve that:
1/ physically add RAM
2/ reduce RAM per vcore (i.e., introduce lower RAM flavors)
3/ increase virtual memory capacity (i.e., add swap) and make
ram_allocation_ratio > 1

We're already doing a bit of #2, but at the end of the day, taking
away flavors and trying to change user behaviour is actually harder
than just upgrading hardware. #1 is ideal but I do wonder whether we'd
be better to spend that same money on some PCIe SSD and use it for #3
(at least for our 'standard' flavor classes), the advantage being that
SSD is cheaper per GB (and it might also help alleviate IOPs
starvation for local storage based hosts)...

The question is whether the performance characteristics of modern PCIe
attached SSDs invalidate the old "don't overcommit memory with KVM"
wisdom (recently discussed on this list:
http://www.gossamer-threads.com/lists/openstack/operators/46104 and
also apparently at the Kilo mid-cycle:
https://etherpad.openstack.org/p/PHL-ops-capacity-mgmt where there was
an action to update the default from 1.5 to 1.0, though that doesn't
seem to have happened). Has anyone out there tried this?

I'm also curious if anyone has any recent info re. the state of
automated memory ballooning and/or memory hotplug? Ideally a RAM
overcommitted host would try to inflate guest balloons before
swapping.

--
Cheers,
~Blairo

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators








___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Allow user to see instances of other users

2015-06-11 Thread George Shuklin

Thank you!

You saved me a day of work. Well, we'll move the script to an admin user 
instead of a normal user with the special role.


PS And thanks for filing a bug report too.

On 06/11/2015 10:40 PM, Sławek Kapłoński wrote:

Hello,

I don't think it is possible, because in nova/db/sqlalchemy/api.py, in the function
instance_get_all_by_filters, you have something like:

if not context.is_admin:
 # If we're not admin context, add appropriate filter..
 if context.project_id:
 filters['project_id'] = context.project_id
 else:
 filters['user_id'] = context.user_id

This is from Juno, but in Kilo it is the same. So in fact even if you set
proper policy.json rules, it will still require an admin context to search
instances from different tenants. Maybe I'm wrong and this is possible in some
other place; maybe someone will show me where, because I was also looking
for it recently :)

--
Pozdrawiam / Best regards
Sławek Kapłoński
sla...@kaplonski.pl

On Thursday, 11 June 2015 at 21:06:31, George Shuklin wrote:

Hello.

I'm trying to allow a user with a special role to see all instances of all
tenants without giving them admin privileges.

My initial attempt was to change policy.json for nova to
compute:get_all_tenants: role:special_role or is_admin:True.

But it didn't work well.

The command (nova list --all-tenants) is not failing anymore (no 'ERROR
(Forbidden): Policy doesn't allow compute:get_all_tenants to be
performed.'), but the returned list is empty:

nova list --all-tenants
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+


Any ideas how to allow a user without admin privileges to see all instances?



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Gentoo image availability

2015-06-09 Thread George Shuklin

On 06/09/2015 05:46 AM, Matthew Thode wrote:


Ya, not sure how to do multi-interface yet.  I'd love if the cloud-init
static ip support would work with it. (hash with macs being the key and
a list of IPs being the value for each interface).  Then dhcp can go
away (I tend to much prefer config-drive).

The disk-image-builder support is on my todo list already :D

I just updated the cloud-init ebuild with a better cloud.cfg, could
probably use more love, but it works.

I am working on getting gentoo as a first class citizen in
openstack-ansible as well, which depends on the disk-image-builder work.
  So much work still to do :D

Aw, don't discriminate against DHCP. It has many nice features (for example, if 
you add a new interface to an existing VM, cloud-init with a static config will 
ignore it, but DHCP will work like magic).


I don't know how it works in Gentoo, but in Debian 'allow-hotplug' for 
all interfaces except eth0 covers most future interfaces. 
Same for CentOS - you can add a few eth scripts to the network configuration 
and they will work as soon as a new interface appears.
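For example, a Debian-style /etc/network/interfaces along these lines (a
generic sketch, nothing Gentoo-specific) keeps extra NICs working via DHCP:

    auto lo
    iface lo inet loopback

    auto eth0
    iface eth0 inet dhcp

    # extra interfaces come up automatically whenever they appear
    allow-hotplug eth1
    iface eth1 inet dhcp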


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Gentoo image availability

2015-06-08 Thread George Shuklin

Nice to hear.

You're doing a great job!

A few things to make Gentoo a 'first class citizen' for openstack (as a guest):

1. Check that all eths are supported, not only eth0. If an instance boots 
with two or more interfaces, it should be able to get all its addresses.


2. Add Gentoo 'element'  to disk-image-builder 
(https://github.com/openstack/diskimage-builder)


3. Ship image with proper cloud-init cloud.cfg


On 06/08/2015 06:26 PM, Matthew Thode wrote:

Hi,

I'm the packager of Openstack on Gentoo and have just started generation
of Gentoo Openstack images.  Right now it is just a basic amd64 image,
but I plan on adding nomultilib and hardened variants (for a total of at
least 4 images).  I plan on generating these images at least weekly

These images are not yet sanctioned by our infra team, but I plan on
remedying that (being a member of said team should help).

I am currently using the scripts at
https://github.com/prometheanfire/gentoo-cloud-prep to generate the
images (based on a heavily modified version of Matt Vandermeulen's
scripts).  If you have any issues please submit bugs there or contact me
on irc (prometheanfire on freenode).

Here's the link to the images, I'm currently gpg signing them with the
same key I use to sign this email (offline master key smartcard setup
for security minded folk).

http://23.253.251.73/

Let me know if you have questions,



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] 100% CPU and hangs if syslog is restarted

2015-05-29 Thread George Shuklin


On 05/28/2015 07:56 PM, George Shuklin wrote:

Hello.

Today we've discovered a very serious bug in juno: 
https://bugs.launchpad.net/nova/+bug/1459726


In short: if you're using syslog and restart rsyslog, all API 
processes will eventually get stuck at 100% CPU usage without doing 
anything.


Has anyone hit this bug before? It looks very nasty.


Just to let everyone know: updating to the proposed version of 
python-eventlet fixes the problem.


Proposed debs can be found here: 
https://launchpad.net/ubuntu/+source/python-eventlet


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] 100% CPU and hangs if syslog is restarted

2015-05-28 Thread George Shuklin

Hello.

Today we've discovered a very serious bug in juno: 
https://bugs.launchpad.net/nova/+bug/1459726


In short: if you're using syslog and restart rsyslog, all API 
processes will eventually get stuck at 100% CPU usage without doing anything.


Has anyone hit this bug before? It looks very nasty.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Fw: VM Stuck in Error State

2015-05-22 Thread George Shuklin
Enable debug in nova.conf on the compute2 host, restart nova-compute and try 
again. You should see the reason in the log. It can be a bad connection to 
glance, or a problem with networking on the host.
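For example (paths and the service name may differ between distributions):

    # /etc/nova/nova.conf on compute2
    [DEFAULT]
    debug = True
    verbose = True

    # then restart the compute service
    service nova-compute restart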



On 05/22/2015 12:04 PM, Abhishek Talwar wrote:

Hi Folks,


I know this is not the place to ask usage questions and doubts 
regarding my deployment but no one is answering my questions on 
ask.openstack.org. So I thought of asking the same here.



Problem:


I have a multi-node setup with 2 compute nodes (Compute1 and 
Compute2). When I boot VMs on Compute1 they go to ACTIVE status, 
but when I try to boot VMs on the Compute2 host they are stuck in 
BUILD status. I can see both compute nodes UP in "nova 
service-list" and both hosts are present in "nova 
hypervisor-list". So what could be the reason that VMs are not going 
to ACTIVE status on the Compute2 host?



The latest scheduler log shows:

nova.scheduler.host_manager [req-a4debdff-0ad5-4fb8-afd4-e0dc255c3c78 
None] Host filter forcing available hosts to compute1


There is no error in the scheduler logs, so what could be the 
possible reason?




Regards
Abhishek Talwar
*

=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Raising the degree of the scandal

2015-05-22 Thread George Shuklin

On 05/17/2015 04:33 PM, Miguel Ángel Ajo wrote:

The solution is probably not selected to be backported because:

  * It's an intrusive change
  * It introduces new dependencies
  * It's probably going to introduce a performance penalty, because 
    ebtables is slow.


I'm asking in reviews for this feature to be enabled/disabled via a flag.


I understand that. All the neutron/ovs stuff is horrible in terms of 
performance, so this is a compromise between a security fix and 'concerns'. 
I do not worry about the problem itself, I worry about trading security 
for something else. If this is not fixed, why bother talking about 
'supported/unsupported' versions of openstack?


Btw: if it is good enough for Liberty, why is it not good enough for Kilo?
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Raising the degree of the scandal

2015-05-16 Thread George Shuklin

On 05/15/2015 07:48 PM, Jay Pipes wrote:

On 05/15/2015 12:38 PM, George Shuklin wrote:

Just to let everyone know: broken antispoofing is not a 'security
issue' and the fix is not planned to be backported to Juno/Kilo.

https://bugs.launchpad.net/bugs/1274034

What can I say? All hail devstack! Who cares about production?


George, I can understand you are frustrated with this issue and feel 
strongly about it. However, I don't think notes like this are all that 
productive.


Would a more productive action be to tell the operator community a bit 
about the vulnerability and suggest appropriate remedies to take?



Ok, sorry.

Short issue: if several tenants use the same network (a shared network), one 
tenant may disrupt the network activity of another tenant by sending 
specially crafted ARP packets on behalf of the victim. Normally, Openstack 
prohibits the use of unauthorized addresses (this feature is called 
'antispoofing' and it is essential for multi-tenant clouds). This 
feature was subtly broken (a malicious tenant may not be able to use other 
addresses, but can still disrupt the activities of other tenants).


Finally, that bug has been fixed. But now they say 'oh, it is not that 
important, we will not backport it to current releases, only to 
Liberty', because of the new ebtables dependency.



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Raising the degree of the scandal

2015-05-15 Thread George Shuklin
Just to let everyone know: broken antispoofing is not a 'security 
issue' and the fix is not planned to be backported to Juno/Kilo.


https://bugs.launchpad.net/bugs/1274034

What can I say? All hail devstack! Who cares about production?

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Multiple vlan ranges on same physical interface [ml2]

2015-05-14 Thread George Shuklin

On 05/11/2015 11:23 AM, Kevin Benton wrote:


I apologize but I didn't quite follow what the issue was with tenants 
allocating networks in your use case, can you elaborate a bit there?


From what it sounded like, it seems like you could define the vlan 
range you want the tenants' internal networks to come from in the 
network_vlan_ranges.  Then any admin networks would just specify the 
segmentation id outside of that range. Why doesn't that work?




So I (as admin) can use VLANs outside of network_vlan_ranges in the 
[ml2_type_vlan] section of ml2_conf.ini?


I've never tried...

Yes, I can!

Thank you.


Thanks,
Kevin Benton

On May 9, 2015 17:16, George Shuklin george.shuk...@gmail.com 
mailto:george.shuk...@gmail.com wrote:


Yes, that's result.

My plan was to allow 'internal' networks in neutron (by tenants
itself), but after some struggle we've decided to fallback to
'created by script during tenant bootstrapping'.

Unfortunately, neutron has no conception of 'default physical
segment' for VLAN autoallocation for tenant networks (it just
grabs first available).

On 05/09/2015 03:08 AM, Kevin Benton wrote:

So if you don't let tenants allocate networks, then why do the
VLAN ranges in neutron matter? It can just be part of your
net-create scripts.


On Fri, May 8, 2015 at 9:35 AM, George Shuklin
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:

We've got a bunch of business logic above openstack. It's
allocating VLANs on-fly for external networks and connect
pieces outside neutron (configuring hardware router, etc).

Anyway, after some research we've decided to completely ditch
idea of 'tenant networks'. All networks are external and
handled by our software with administrative rights.

All networks for tenant are created during tenant bootstrap,
including local networks which are now looking funny
'external local network without gateway'. By nailing every
moving part in 'neutron net-create' we've got stable
behaviour and kept allocation database inside our software.
That kills a huge part of openstack idea, but at least it
works straightforward and nice.

I really like to see all that been implemented in vendor
plugins for neutron, but average code and documentation
quality for them are below any usable level, so we implements
hw configuration by ourselves.


On 05/08/2015 09:15 AM, Kevin Benton wrote:

If one set of VLANs is for external networks which are
created by admins, why even specify network_vlan_ranges for
that set?

For example, even if network_vlan_ranges is
'local:1000:4000', you can still successfully run the
following as an admin:
neutron net-create --provider:network_type=vlan
--provider:physical_network=local
--provider:segmentation_id=40 myextnet --router:external

On Thu, May 7, 2015 at 7:32 AM, George Shuklin
george.shuk...@gmail.com mailto:george.shuk...@gmail.com
wrote:

Hello everyone.

Got a problem: we want to use same physical interface
for external networks and virtual (tenant) networks. All
inside vlans with different ranges.

My expected config was:

[ml2]
type_drivers = vlan
tenant_network_types = vlan
[ml2_type_vlan]
network_vlan_ranges = external:1:100,local:1000:4000
[ovs]
bridge_mappings = external:br-ex,local:br-ex

But it does not work:

ERROR
neutron.plugins.openvswitch.agent.ovs_neutron_agent [-]
Parsing bridge_mappings failed: Value br-ex in mapping:
'gp:br-ex' not unique. Agent terminated!

I understand that I can cheat and manually configure
bridge pile (br-ex and br-loc both plugged to br-real,
which linked to physical interface), but it looks very
fragile.

Is any nicer way to do this? And why ml2 (ovs plugin?)
does not allow to use mapping from many networks to one
bridge?

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




-- 
Kevin Benton



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




-- 
Kevin Benton

Re: [Openstack-operators] Multiple vlan ranges on same physical interface [ml2]

2015-05-09 Thread George Shuklin

Yes, that's the result.

My plan was to allow 'internal' networks in neutron (created by the tenants 
themselves), but after some struggle we've decided to fall back to 'created by 
script during tenant bootstrapping'.


Unfortunately, neutron has no concept of a 'default physical segment' 
for VLAN auto-allocation for tenant networks (it just grabs the first available).


On 05/09/2015 03:08 AM, Kevin Benton wrote:
So if you don't let tenants allocate networks, then why do the VLAN 
ranges in neutron matter? It can just be part of your net-create scripts.



On Fri, May 8, 2015 at 9:35 AM, George Shuklin 
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:


We've got a bunch of business logic above openstack. It's
allocating VLANs on-fly for external networks and connect pieces
outside neutron (configuring hardware router, etc).

Anyway, after some research we've decided to completely ditch idea
of 'tenant networks'. All networks are external and handled by our
software with administrative rights.

All networks for tenant are created during tenant bootstrap,
including local networks which are now looking funny 'external
local network without gateway'. By nailing every moving part in
'neutron net-create' we've got stable behaviour and kept
allocation database inside our software. That kills a huge part of
openstack idea, but at least it works straightforward and nice.

I really like to see all that been implemented in vendor plugins
for neutron, but average code and documentation quality for them
are below any usable level, so we implements hw configuration by
ourselves.


On 05/08/2015 09:15 AM, Kevin Benton wrote:

If one set of VLANs is for external networks which are created by
admins, why even specify network_vlan_ranges for that set?

For example, even if network_vlan_ranges is 'local:1000:4000',
you can still successfully run the following as an admin:
neutron net-create --provider:network_type=vlan
--provider:physical_network=local --provider:segmentation_id=40
myextnet --router:external

On Thu, May 7, 2015 at 7:32 AM, George Shuklin
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:

Hello everyone.

Got a problem: we want to use same physical interface for
external networks and virtual (tenant) networks. All inside
vlans with different ranges.

My expected config was:

[ml2]
type_drivers = vlan
tenant_network_types = vlan
[ml2_type_vlan]
network_vlan_ranges = external:1:100,local:1000:4000
[ovs]
bridge_mappings = external:br-ex,local:br-ex

But it does not work:

ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-]
Parsing bridge_mappings failed: Value br-ex in mapping:
'gp:br-ex' not unique. Agent terminated!

I understand that I can cheat and manually configure bridge
pile (br-ex and br-loc both plugged to br-real, which linked
to physical interface), but it looks very fragile.

Is any nicer way to do this? And why ml2 (ovs plugin?) does
not allow to use mapping from many networks to one bridge?

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




-- 
Kevin Benton



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




--
Kevin Benton


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Multiple vlan ranges on same physical interface [ml2]

2015-05-07 Thread George Shuklin

Hello everyone.

Got a problem: we want to use the same physical interface for external 
networks and virtual (tenant) networks, all inside VLANs with different 
ranges.


My expected config was:

[ml2]
type_drivers = vlan
tenant_network_types = vlan
[ml2_type_vlan]
network_vlan_ranges = external:1:100,local:1000:4000
[ovs]
bridge_mappings = external:br-ex,local:br-ex

But it does not work:

ERROR neutron.plugins.openvswitch.agent.ovs_neutron_agent [-] Parsing 
bridge_mappings failed: Value br-ex in mapping: 'gp:br-ex' not unique. 
Agent terminated!


I understand that I can cheat and manually configure a pile of bridges (br-ex 
and br-loc both plugged into br-real, which is linked to the physical interface), 
but it looks very fragile.


Is there any nicer way to do this? And why does ml2 (the ovs plugin?) not 
allow mapping many networks to one bridge?


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Multiple vlan ranges on same physical interface [ml2]

2015-05-07 Thread George Shuklin

On 05/07/2015 06:17 PM, gustavo panizzo (gfa) wrote:


On 2015-05-07 22:32, George Shuklin wrote:

Hello everyone.

Got a problem: we want to use same physical interface for external
networks and virtual (tenant) networks. All inside vlans with different
ranges.

My expected config was:

[ml2]
type_drivers = vlan
tenant_network_types = vlan
[ml2_type_vlan]
network_vlan_ranges = external:1:100,local:1000:4000
[ovs]
bridge_mappings = external:br-ex,local:br-ex

that's wrong

you need something like

[ml2]
type_drivers = vlan
tenant_network_types = vlan
[ml2_type_vlan]
network_vlan_ranges = blabla:1:100
[ovs]
bridge_mappings = blabla:br-ex


neutron  net-create flat-network --provider:network-type flat
--provider:physical_network blabla

neutron  net-create vlanN --provider:network-type vlan
--provider:physical_network blabla --provider:segmentation_id N

...

neutron  net-create vlanN+nn --provider:network-type vlan
--provider:physical_network blabla --provider:segmentation_id N+nn


on each physical interface you can put one flat and up to 4096(?) vlans
but you can't define the same bridge_mapping twice



Thanks.

I wanted to put tenant networks and external networks on the same physical 
network, but then I realised that there is no way to tell neutron to 
avoid specific vlan_ids once you set tenant_network_types=vlan and 
add the vlan_ids to the list available to neutron.


It works fine as long as you allocate networks yourself (as admin), but 
neutron will allocate a random segment/id for a tenant on request (because 
tenants usually do not specify a physical network).


Sad. I'll stick to vlan for external and shared networks and put private 
networks back on GRE.



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [neutron] multiple external networks on the same host NIC

2015-04-25 Thread George Shuklin

Can you put them into different vlans? After that it would be a very easy task.

If not, AFAIK, neutron does not allow this.

Or you can trick it into thinking these are separate networks.

Create a bridge (br-join) and plug eth into it.
Create two fake external bridges (br-ex1, br-ex2). Join them to br-join with 
patch links 
(http://blog.scottlowe.org/2012/11/27/connecting-ovs-bridges-with-patch-ports/)


Instruct neutron that there are two external networks: one on br-ex1, the 
second on br-ex2.


But be aware that this is not a very stable configuration; you need to 
maintain it yourself.
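Roughly, the OVS side of that trick looks like this (a sketch only; the NIC
and bridge names are examples):

    ovs-vsctl add-br br-join
    ovs-vsctl add-port br-join eth1     # the physical NIC

    ovs-vsctl add-br br-ex1
    ovs-vsctl add-br br-ex2

    # patch link br-ex1 <-> br-join
    ovs-vsctl add-port br-ex1 patch-ex1-join \
        -- set interface patch-ex1-join type=patch options:peer=patch-join-ex1
    ovs-vsctl add-port br-join patch-join-ex1 \
        -- set interface patch-join-ex1 type=patch options:peer=patch-ex1-join

    # patch link br-ex2 <-> br-join
    ovs-vsctl add-port br-ex2 patch-ex2-join \
        -- set interface patch-ex2-join type=patch options:peer=patch-join-ex2
    ovs-vsctl add-port br-join patch-join-ex2 \
        -- set interface patch-join-ex2 type=patch options:peer=patch-ex2-join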


On 04/25/2015 10:13 PM, Mike Spreitzer wrote:
Is there a way to create multiple external networks from Neutron's 
point of view, where both of those networks are accessed through the 
same host NIC?  Obviously those networks would be using different 
subnets.  I need this sort of thing because the two subnets are 
treated differently by the stuff outside of OpenStack, so I need a way 
that a tenant can get a floating IP of the sort he wants.  Since 
Neutron equates floating IP allocation pools with external networks, I 
need two external networks.


I found, for example, http://www.marcoberube.com/archives/248--- which 
describes how to have multiple external networks but uses a distinct 
host network interface for each one.


Thanks,
Mike


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] over commit ratios

2015-04-22 Thread George Shuklin
Yes, it really depends on the backing technique used. We are using SSDs and 
raw images, so IO is not an issue.


But memory is more important: if you lack IO capacity you are left with 
slow guests. If you lack memory you are left with dead guests (hello, OOM 
killer).


BTW: swap is needed not for constant swapin/swapout, but to relieve memory 
pressure. With properly configured memory, swapin/swapout should be less 
than 2-3.


On 04/22/2015 09:49 AM, Tim Bell wrote:

I'd also keep an eye on local I/O... we've found this to be the resource which 
can cause the worst noisy neighbours. Swapping makes this worse.

Tim


-Original Message-
From: George Shuklin [mailto:george.shuk...@gmail.com]
Sent: 21 April 2015 23:55
To: openstack-operators@lists.openstack.org
Subject: Re: [Openstack-operators] over commit ratios

It very much depends on the type of production.

If you can control guests and predict their memory consumption, use that as the base
for the ratio.
If you can't (typical for public clouds) - use 1 or smaller together with
reserved_host_memory_mb in nova.conf.

And one more thing: some swap space is really necessary. Add at least twice
reserved_host_memory_mb - it really improves performance and prevents
strange OOMs in the situation of a very large host with a very small dom0 footprint.

On 04/21/2015 10:59 PM, Caius Howcroft wrote:

Just a general question: what kind of over commit ratios do people
normally run in production with?

We currently run 2 for cpu and 1 for memory (with some held back for
OS/ceph)

i.e.:
default['bcpc']['nova']['ram_allocation_ratio'] = 1.0
default['bcpc']['nova']['reserved_host_memory_mb'] = 1024 # often
larger default['bcpc']['nova']['cpu_allocation_ratio'] = 2.0

Caius



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Draft Agenda for the Vancouver Ops Summit Sessions

2015-04-13 Thread George Shuklin



What kind of projects will the sessions 'Architecture Show and Tell' 
and 'Architecture Show and Tell - Special Edition' be about?


Thanks.


On 04/13/2015 12:32 PM, Tom Fifield wrote:

[cut]


_*General Sessions*_

Tuesday        | Big Room 1                                            | Big Room 2                                           | Big Room 3
11:15 - 11:55  | Ops Summit 101 / The Story So Far                     | Federation - Keystone & other - what do people need? | RabbitMQ
12:05 - 12:45  | How do we fix logging?                                | Architecture Show and Tell                           | Ceilometer - what needs fixing?
12:45 - 2:00   |                                                       |                                                      |
2:00 - 2:40    | Billing / show back / charge back - how do I do that? | Architecture Show and Tell - Special Edition         | Cinder Feedback
2:50 - 3:30    |                                                       |                                                      |




[cut]
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] MTU on router interface (neutron GRE) without jumbo

2015-03-18 Thread George Shuklin

Thanks!

But those examples use the same MTU for the client and the server. What if we have:

Client: 1500
router in the middle 1500
OVS/GRE: 1458
server: 1458

For TCP this is OK. But can it somehow hurt other protocols? UDP, RST, etc.?

On 03/14/2015 08:52 PM, Joseph Bajin wrote:
The size of MTU only really matters for the server and client.   The 
between connections need to be larger than the packets that are being 
sent.


Scenario 1:
Server - 1400 MTU
Client - 1400  MTU
Switches - 9216 MTU
OVS - 1500 MTU

Result: Successful - Traffic passes without any issue

Scenario 2:
Server - 1520 MTU
Client - 1520  MTU
Switches - 1516 MTU
OVS - 1500 MTU

Result: Failure - Traffic will have issues passing through.

So just make sure everything in-between is higher than your server and 
client.


--Joe



On Fri, Mar 13, 2015 at 9:28 AM, George Shuklin 
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:


Hello.

We've hit badly changes in behaviour of OVS when we switched from
3.08 to 3.13 kernel. When runs on  3.11 or above, OVS starts to
use kernel GRE services. And they copy DNF (do not fragment) flag
from encapsulated packet to GRE packet. And this mess up all
things, because ICMP messages about dropped GRE never reach
neither source nor destination of underlying TCP.

We've fixed problems with MTU by using option for DHCP for
dnsmasq. This lower MTU inside instances. But there are routers
(router namespaces) and they are still using 1500 bytes MTU.

I feel like this can cause problems with some types of traffic,
when client (outside of openstack) sending DNF packets to instance
(via floating) and that packet is silently dropped.

1) Is those concerns have any real life implication? TCP should
take in account MTU on server and works smoothly, but other protocols?
2) Is there any way to lower MTU inside router namespace?

Thanks.

P.S. Jumbo frames is not an option due reasons outside of our reach.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators





___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Neutron][Nova] No Valid Host when booting new VM with Public IP

2015-03-18 Thread George Shuklin
Check whether you have allowed nova to use external networks - somewhere in 
nova's policy configuration there is a permission with 'external' in its name. 
If nova-compute is not allowed to bind the port, it will refuse to start the 
instance and pass that error back to nova-scheduler, which will return 
'No valid host found'.
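
For reference, this is usually governed by a rule in nova's policy.json (the 
exact name and default value vary by release; treat this as an assumed example, 
not a guaranteed default):

  "network:attach_external_network": "rule:admin_api"

If that rule stays admin-only, a non-admin boot on the external network fails 
at port binding and surfaces as 'No valid host found'.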


On 03/16/2015 10:52 PM, Adam Lawson wrote:
Got a strange error and I'm really hoping to get some help with it 
since it has be scratching my head.


When I create a VM within Horizon and select the PRIVATE network, it 
boots up great.
When I attempt to create a VM within Horizon and include the PUBLIC 
network (either by itself or with the private network), it fails with 
a No valid host found error.


I looked at the nova-api and the nova-scheduler logs on the controller 
and the most I've found are errors/warnings binding VIF's but I'm not 
100% certain it's the root cause although I believe it's related.


I didn't find any WARNINGS or ERRORS in the compute or network node.

Setup:

  * 1 physical host running 4 KVM domains/guests
  o 1x Controller
  o 1x Networ
  o 1x Volume
  o 1x Compute


*Controller Node:*
nova.conf (http://pastebin.com/q3e9cntH)

  * neutron.conf (http://pastebin.com/ukEVzBbN)
  * ml2_conf.ini (http://pastebin.com/w10jBGZC)
  * nova-api.log (http://pastebin.com/My99Mg2z)
  * nova-scheduler (http://pastebin.com/Nb75Z6yH)
  * neutron-server.log (http://pastebin.com/EQVQPVDF)


*Network Node:*

  * l3_agent.ini (http://pastebin.com/DBaD1F5x)
  * neutron.conf (http://pastebin.com/Bb3qkNi7)
  * ml2_conf.ini (http://pastebin.com/xEC1Bs9L)


*Compute Node:*

  * nova.conf (http://pastebin.com/K6SiE9Pw)
  * nova-compute.conf (http://pastebin.com/9Mz30b4v)
  * neutron.conf (http://pastebin.com/Le4wYRr4)
  * ml2_conf.ini (http://pastebin.com/nnyhC8mV)


*Back-end:*
Physical switch

Any thoughts on what could be causing this?
*/
Adam Lawson/*

AQORN, Inc.
427 North Tatnall Street
Ste. 58461
Wilmington, Delaware 19801-2230
Toll-free: (844) 4-AQORN-NOW ext. 101
International: +1 302-387-4660
Direct: +1 916-246-2072



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Neutron][Nova] No Valid Host when booting new VM with Public IP

2015-03-18 Thread George Shuklin
We have that configuration and it works fine. Even better than L3 NAT on 
neutron routers.


Tenants' VMs work perfectly with external networks and public ('white') IPs, 
but you have to make the external network available on each compute node 
(ml2_conf.ini); a minimal sketch is below.
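
A minimal sketch of that per-compute configuration with the OVS agent (the 
physnet and bridge names are assumptions; use whatever your external network 
is defined with):

  # ml2_conf.ini (OVS agent section) on every compute node
  [ovs]
  bridge_mappings = external:br-ex

  # and the bridge itself, with the external NIC plugged into it:
  ovs-vsctl add-br br-ex
  ovs-vsctl add-port br-ex eth1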


On 03/18/2015 07:29 PM, Adam Lawson wrote:
What I'm trying to do is force OpenStack to do something it normally 
doesn't do for the sake of learning and experimentation. I.e. bind a 
public network to a VM so it can be accessed outside the cloud when 
floating IP's are normally required. I know there are namespace issues 
at play which may prevent this from working, just trying to scope the 
boundaries of what I can and cannot do really.


*/
Adam Lawson/*

AQORN, Inc.
427 North Tatnall Street
Ste. 58461
Wilmington, Delaware 19801-2230
Toll-free: (844) 4-AQORN-NOW ext. 101
International: +1 302-387-4660
Direct: +1 916-246-2072


On Wed, Mar 18, 2015 at 7:08 AM, Pedro Sousa pgso...@gmail.com 
mailto:pgso...@gmail.com wrote:


Hi Adam

For external network you should use floating ips to access
externally to your instances if I understood correctly.

Regards

Em 16/03/2015 20:56, Adam Lawson alaw...@aqorn.com
mailto:alaw...@aqorn.com escreveu:

Got a strange error and I'm really hoping to get some help
with it since it has be scratching my head.

When I create a VM within Horizon and select the PRIVATE
network, it boots up great.
When I attempt to create a VM within Horizon and include the
PUBLIC network (either by itself or with the private network),
it fails with a No valid host found error.

I looked at the nova-api and the nova-scheduler logs on the
controller and the most I've found are errors/warnings binding
VIF's but I'm not 100% certain it's the root cause although I
believe it's related.

I didn't find any WARNINGS or ERRORS in the compute or network
node.

Setup:

  * 1 physical host running 4 KVM domains/guests
  o 1x Controller
  o 1x Networ
  o 1x Volume
  o 1x Compute


*Controller Node:*
nova.conf (http://pastebin.com/q3e9cntH)

  * neutron.conf (http://pastebin.com/ukEVzBbN)
  * ml2_conf.ini (http://pastebin.com/w10jBGZC)
  * nova-api.log (http://pastebin.com/My99Mg2z)
  * nova-scheduler (http://pastebin.com/Nb75Z6yH)
  * neutron-server.log (http://pastebin.com/EQVQPVDF)


*Network Node:*

  * l3_agent.ini (http://pastebin.com/DBaD1F5x)
  * neutron.conf (http://pastebin.com/Bb3qkNi7)
  * ml2_conf.ini (http://pastebin.com/xEC1Bs9L)


*Compute Node:*

  * nova.conf (http://pastebin.com/K6SiE9Pw)
  * nova-compute.conf (http://pastebin.com/9Mz30b4v)
  * neutron.conf (http://pastebin.com/Le4wYRr4)
  * ml2_conf.ini (http://pastebin.com/nnyhC8mV)


*Back-end:*
Physical switch

Any thoughts on what could be causing this?
*/
Adam Lawson/*

AQORN, Inc.
427 North Tatnall Street
Ste. 58461
Wilmington, Delaware 19801-2230
Toll-free: (844) 4-AQORN-NOW ext. 101
International: +1 302-387-4660 tel:%2B1%20302-387-4660
Direct: +1 916-246-2072 tel:%2B1%20916-246-2072


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] MTU on router interface (neutron GRE) without jumbo

2015-03-13 Thread George Shuklin

Hello.

We've been hit badly by a change in OVS behaviour when we switched from the 
3.08 to the 3.13 kernel. When running on 3.11 or above, OVS starts to use the 
kernel GRE implementation, which copies the DF (do not fragment) flag from the 
encapsulated packet to the GRE packet. And this messes everything up, because 
the ICMP messages about dropped GRE packets reach neither the source nor the 
destination of the underlying TCP connection.


We've fixed the MTU problems by using a DHCP option served by dnsmasq, which 
lowers the MTU inside the instances (a sketch is below). But there are routers 
(router namespaces), and they are still using a 1500-byte MTU.
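
The dnsmasq-based workaround looks roughly like this (1454 is only an example 
value to leave room for the GRE overhead; pick whatever matches your underlay):

  # /etc/neutron/dhcp_agent.ini
  [DEFAULT]
  dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

  # /etc/neutron/dnsmasq-neutron.conf
  # DHCP option 26 = interface MTU
  dhcp-option-force=26,1454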


I feel this can cause problems for some types of traffic, when a client 
(outside of OpenStack) sends DF-flagged packets to an instance (via a floating 
IP) and those packets are silently dropped.


1) Do those concerns have any real-life implications? TCP should take the 
server-side MTU into account and work smoothly, but what about other protocols?

2) Is there any way to lower the MTU inside the router namespace?

Thanks.

P.S. Jumbo frames is not an option due reasons outside of our reach.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

2015-02-11 Thread George Shuklin

Ceilometer is in a sad state.

1. The collector leaks memory. We ran it on the same host as mongo, and it 
grabbed 29 GB out of 32, leaving mongo with less than a gig of memory available.
2. The metering agent causes huge load on neutron-server - O(n) in the number 
of metering rules and tenants. A few bugs reported, one bugfix in review.
3. The metering agent simply does not work on a multi-network-node installation; 
it expects all routers to be on the same host. Fixed or not - I don't know, we 
have our own crude fix.
4. Many rough edges. Ceilometer is much less tested than nova. Sometimes it 
throws a traceback and skips counting. A fresh example: if metadata has a '.' 
in its name, ceilometer traces on it and does not count it in glance usage.

5. Very slow on reports (using mongo's mapreduce).

Overall feeling: barely usable, but given my experience with cloud billing, 
not the worst thing I have seen in my life.


About load: except for reporting and the memory leaks, it uses a rather small 
amount of resources.


On 02/11/2015 09:37 PM, Maish Saidel-Keesing wrote:

Is Ceilometer ready for prime time?

I would be interested in hearing from people who have deployed 
OpenStack clouds with Ceilometer, and their experience. Some of the 
topics I am looking for feedback on are:


- Database Size
- MongoDB management, Sharding, replica sets etc.
- Replication strategies
- Database backup/restore
- Overall useability
- Gripes, pains and problems (things to look out for)
- Possible replacements for Ceilometer that you have used instead


If you are willing to share - I am sure it will be beneficial to the 
whole community.


Thanks in Advance


With best regards,


Maish Saidel-Keesing
Platform Architect
Cisco




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to handle updates of public images?

2015-02-07 Thread George Shuklin


On 02/07/2015 08:36 PM, Igor Bolotin wrote:
Going back to the idea of archiving images and not allowing launch of 
new VMs and hiding archived images by default in Horizon/CLI (maybe 
still can list/show if requested, possibly admin function only). Would 
it make sense to propose this as a blueprint for the next release?



Yes, that sounds nice.

But more importantly - I want '_base' to go away when raw disks are used in 
nova. Why is it needed there?



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [openstack-dev] [Telco][NFV][infra] Review process of TelcoWG use cases

2015-02-06 Thread George Shuklin


On 02/06/2015 09:14 PM, Marcos Garcia wrote:


It does look like that.  However, the intent here is to allow 
non-developer
members of a Telco provide the use cases they need to accomplish. 
This way

the Telco WG can identify gaps and file a proper spec into each of the
OpenStack projects.
Indeed, what we're trying to do is help the non-developer members of 
the group articulate their use cases and tease them out to a level 
that is meaningful to someone who is not immersed in 
telecommunications themselves. In this way we hope to in turn be 
able to create meaningful specifications for the actual OpenStack 
projects impacted.


It's possible that some of these will be truly cross-project and 
therefore head to openstack-specs but initial indications seem to be 
that most will either be specific to a project, or cross only a 
couple of projects (e.g. nova and neutron) - I am sure someone will 
come up with some more exceptions to this statement to prove me 
wrong :).



OK, I am definitely outside the telco business, and I am indeed an OpenStack 
operator. My first question: what do you want to do, what problems do you want 
to solve?


IMO most of the Telco's are asking Openstack developers to work in the 
following big areas (the first 3 are basically Carrier Grade):
- Performance on the virtualization layer (NUMA, etc) - get 
baremetal-like performance in big VM's
- QoS and capacity management - to get deterministic behavior, always 
the same regardless of the load
- Reliability (via HA, duplicate systems, live-migration, etc) - 
achieve 99'999% uptime,
- Management interfaces (OAM), compatible with their current OSS/BSS 
systems (i.e. SNMP traps, usage metering for billing)  - to don't 
reinvent the wheel, they have other things to manage too (i.e. legacy)


Most of this sounds really interesting for any operator, maybe except NUMA. 
But why do telcos want more performance? A few percent of loss in exchange for 
manageability - most companies accept this.


HA is achievable, QoS maybe, duplication is OK. But deterministic live 
migration... why do telcos want it? If the system has its own way to rebalance 
load, there is a simpler way: terminate one instance and build a new one. BTW, 
I would really like to see a deterministic way to fight 'No valid hosts found'.


I was at one 'NFV' session in Paris, and I expected it to be about SR-IOV and 
using the VFs (virtual functions) of network cards for guest acceleration. But 
instead it was about something I just didn't get at all (sorry, Ericsson). So, 
what do you want to do? Not in terms of a 'business solution', but at a very 
low level. Run some specific appliance? Add VoIP support to Neutron? Something 
different?


It's all about SLA's stablished by telco's customers: government, 
military and healthcare systems. SLA's are crazy there. And as an IT 
operators, you'll all understand those requirements, so it's really 
not that different compared to Telco operators.


Just remember that ETSI NFV is more than all that: you probably saw 
Ericsson speaking about high-level telco functions: MANO, VIM, EMS and 
VNFs, etc... that's beyond the scope of you guys, and probably outside 
the scope of all of the Openstack world.. that's why OPNFV exists.
I will be a bit of a skeptic. It will not work with the current quality of the 
development process (the 'devstack syndrome'). I have just finished digging 
into yet another funny nova 'half-bug' around migration, and what I see in the 
code is... too agile for high-SLA systems. Maybe they (the telcos) can really 
change this, and I really hope so, but up to now... thousands of loosely 
coupled systems, each with its own bugs and world view. Just today I found a 
'hung' network interface (any operation on a net socket goes to 'D' state and 
cannot be terminated) due to an ixgbe/netconsole bug. 99.99% under those 
conditions? I just do not believe it. 
(https://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg10178.html)



About Ericsson's presentation - yes, I was inspired by the details of the 
previous Rackspace presentation about the depths of Open vSwitch, and then 
suddenly everyone around started to speak a foreign language.
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] How to handle updates of public images?

2015-02-05 Thread George Shuklin

Hello everyone.

We update our public images regularly (to provide them to customers in an 
up-to-date state). But there is a problem: once some instance starts from an 
image, that image becomes 'used'. That means:

* that image is used as _base for nova;
* if the instance is reverted, this image is used to recreate the instance's disk;
* if the instance is rescued, this image is used as the rescue base;
* it is re-downloaded during resize/migration (on the new compute node).

One more (specific to us): we're using raw disks, with _base on slow SATA 
drives (as opposed to the fast SSDs used for the instance disks), and if that 
SATA drive fails, we replace it (and nova re-downloads the stuff in _base).


If the image is deleted, this causes problems with nova (nova can't re-download 
_base).


The second part of the problem: glance does not allow updating an image 
(uploading a new image with the same ID), so we're forced to upload the updated 
image under a new ID and remove the old one. This causes the problems described 
above. And if a tenant boots from their own snapshot and removes the snapshot 
without removing the instance, the same problem occurs even without any 
activity on our side.


How do you handle public image updates in your case?

Thanks!

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] How to handle updates of public images?

2015-02-05 Thread George Shuklin
An updated report on the 'no image' behaviour with a deleted '_base' in Juno 
(my previous comment was about Havana):

1. If the snapshot is removed, the original image is used (the image that the 
first instance was booted from to produce the snapshot). Rather strange and 
unexpected, but nice (minus one headache).

2. If all images in the chain are removed, the behaviour changes:
* hard reboot works fine (raw disks);
* reinstallation asks for a new image, seemingly no problem;
* rescue causes an ugly problem, rendering the instance completely broken (it 
does not work but shows no ERROR state). https://bugs.launchpad.net/nova/+bug/1418590


I didn't test migrations yet.

On 02/05/2015 03:09 PM, George Shuklin wrote:

Hello everyone.

We are updating our public images regularly (to provide them to 
customers in up-to-date state). But there is a problem: If some 
instance starts from image it becomes 'used'. That means:

* That image is used as _base for nova
* If instance is reverted this image is used to recreate instance's disk
* If instance is rescued this image is used as rescue base
* It is redownloaded during resize/migration (on a new compute node)

One more (our specific):
We're using raw disks with _base on slow SATA drives (in comparison to 
fast SSD for disks), and if that SATA fails, we replace it (and nova 
redownloads stuff in _base).


If image is deleted, it causes problems with nova (nova can't download 
_base).


The second part of the problem: glance disallows to update image 
(upload new image with same ID), so we're forced to upload updated 
image with new ID and to remove the old one. This causes problems 
described above. And if tenant boots from own snapshot and removes 
snapshot without removing instance, it causes same problem even 
without our activity.


How do you handle public image updates in your case?

Thanks!



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Deprecation of in tree EC2 API in Nova for Kilo release

2015-01-29 Thread George Shuklin

On 01/28/2015 09:56 PM, Sean Dague wrote:

The following review for Kilo deprecates the EC2 API in Nova -
https://review.openstack.org/#/c/150929/

There are a number of reasons for this. The EC2 API has been slowly
rotting in the Nova tree, never was highly tested, implements a
substantially older version of what AWS has, and currently can't work
with any recent releases of the boto library (due to implementing
extremely old version of auth). This has given the misunderstanding that
it's a first class supported feature in OpenStack, which it hasn't been
in quite sometime. Deprecating honestly communicates where we stand.

There is a new stackforge project which is getting some activity now -
https://github.com/stackforge/ec2-api. The intent and hope is that is
the path forward for the portion of the community that wants this
feature, and that efforts will be focused there.

Comments are welcomed, but we've attempted to get more people engaged to
address these issues over the last 18 months, and never really had
anyone step up. Without some real maintainers of this code in Nova (and
tests somewhere in the community) it's really no longer viable.


I think, if we are talking about a 'mature OpenStack', the first step of 
deprecation should be removal from the sample configs and moving its 
documentation to an 'obsolete functions' chapter. For at least one release it 
should be deprecated in the documentation, not in the code. The next few 
releases should just mark it as deprecated and print a warning in the logs. 
And only after that can it be removed from the code.


To be honest, I don't really like the deprecation rate in OpenStack. Compare it 
to the Linux motto: 'if it's used, it is not deprecated'. I understand that 
developers hate old code, but from the usability (operators') point of view, 
everything should just continue to work as it is after an upgrade. How many 
applications stop working due to an 'obsolete syscall' after a kernel update? 
(E.g. I have seen notices about the deprecation of oom_adj for the last 5 years 
- and it is still OK to use.) And look at OpenStack! Half of the code is 
already deprecated, and the second half is a candidate for deprecation...


From the user's point of view, all of OpenStack is just one big pile of 
changes. Half of the older code does not work with neutron, or works 
incorrectly (it expects simple nova networking). And what should I (as an 
operator) say to a user who complains that their vagrant/fog code cannot 
connect to networking and uses a local-only network instead of the internet? 
(It uses whatever first network it receives, by UUID.) Am I guilty (for using 
neutron instead of nova-network), is vagrant wrong, is fog wrong, is the user 
wrong? I think the user is wrong. A wrong user with wrong money. Who should go 
away.


Deprecation:
* if no one uses it, no one notices, no one complains - one or two releases and 
it's gone;
* if someone uses it, removing it is like cutting off a leg. Maybe it is 
cancer. But if you can live with it - better not to cut.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Small openstack

2015-01-29 Thread George Shuklin

Hello.

If we have two computes, and each of them is also a network node, that means 
both of them host routers.


Let's say we have two tenants with two instances and two compute hosts:

Compute1-tenant1-instance1
compute2-tenant2-instance2

But neutron has no idea about this. Someone asks it to 'put the router on any 
l3-agent', and it puts router1 on compute2 and router2 on compute1.


It will work - until it doesn't. If compute1 goes down, it not only affects 
instance1, but also disrupts network services for tenant2.


And there is no way to control l3-agent placement - no respect for 
availability zones, instance placement, aggregates, etc.


On 01/29/2015 01:34 AM, Thomas Goirand wrote:

On 12/20/2014 11:16 PM, George Shuklin wrote:

do 'network node on compute' is kinda sad

Why?

Thomas


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Packaging sample config versions

2015-01-28 Thread George Shuklin

Yes!

I just had a discussion about this with my colleague yesterday.

It seems to be the perfect solution.

On 01/28/2015 12:00 AM, Tom Fifield wrote:

Hi all,

Based on Gustavo's excellent work below, talking with many ops, and
after a brief chats with Jeremey and a few other TC folks, here's what
I'd propose as an end goal:


* A git repository that has raw, sample configs in it for each project
that will be automagically updated

* Raw configs distributed in the tar files we make as part of the release


Does that seem acceptable for us all?


Regards,



Tom



On 21/01/15 13:22, gustavo panizzo (gfa) wrote:


On 12/18/2014 09:57 AM, Jeremy Stanley wrote:

4. Set up a service that periodically regenerates sample
configuration and tracks it over time. This attempts to address the
stated desire to be able to see how sample configurations change,
but note that this is a somewhat artificial presentation since there
are a lot of variables (described earlier) influencing the contents
of such samples--any attempt to render it as a linear/chronological
series could be misleading.


i've setup a github repo where i dump sample config files for the
projects that autogenerate them, because i know nothing about rpm build
tools i only do it for debian and ubuntu packages.

if you build your deb packages you can use my, very simple and basic,
scripts to autogenerate the sample config files.


the repo is here

https://github.com/gfa/os-sample-configs

i will happily move to osops or other community repo



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] RHEL 7 / CentOS 7 instances losing their network gateway

2015-01-27 Thread George Shuklin
How many network interfaces does your instance have? If more than one, check 
the settings of the second network (subnet). It can have its own DHCP settings, 
which may mess up the routes of the main network; a quick check is sketched 
below.
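
One way to inspect and, if needed, stop a secondary subnet from advertising a 
default route via DHCP (the subnet id is a placeholder):

  neutron subnet-show <second-subnet-id>
  # if that subnet has a gateway you don't want pushed to instances:
  neutron subnet-update <second-subnet-id> --no-gateway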


On 01/27/2015 06:08 PM, Joe Topjian wrote:

Hello,

I have run into two different OpenStack clouds where instances running 
either RHEL 7 or CentOS 7 images are randomly losing their network 
gateway.


There's nothing in the logs that show any indication of why. There's 
no DHCP hiccup or anything like that. The gateway has just disappeared.


If I log into the instance via another instance (so on the same subnet 
since there's no gateway), I can manually re-add the gateway and 
everything works... until it loses it again.


One cloud is running Havana and the other is running Icehouse. Both 
are using nova-network and both are Ubuntu 12.04.


On the Havana cloud, we decided to install the dnsmasq package from 
Ubuntu 14.04. This looks to have resolved the issue as this was back 
in November and I haven't heard an update since.


However, we don't want to do that just yet on the Icehouse cloud. We'd 
like to understand exactly why this is happening and why updating 
dnsmasq resolves an issue that only one specific type of image is having.


I can make my way around CentOS, but I'm not as familiar with it as I 
am with Ubuntu (especially CentOS 7). Does anyone know what change in 
RHEL7/CentOS7 might be causing this? Or does anyone have any other 
ideas on how to troubleshoot the issue?


I currently have access to two instances in this state, so I'd be 
happy to act as remote hands and eyes. :)


Thanks,
Joe


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] :document an OpenStack production environment

2015-01-26 Thread George Shuklin
In my earlier days I tried many formal schemes, but they always caused 
problems.


For now I have settled on the following scheme:

A machine-used database (DNS, chef, etc.) for explicit details like MAC 
addresses, hardware, rack location and network connectivity. That database 
should be in constant use, not 'write only'; otherwise everyone starts to 
forget to update it, and suddenly it loses its authority to 'I wrote you about 
it in HipChat and then sent you an update via SMS, and the final version is in 
your other Skype account'. Usually it is some kind of 'work' system, a 
'control panel', or chef data bags.


All topological diagrams should be hand-drawn. Whiteboards are just perfect 
for that. Why? Because all tools except a pen/pencil/marker restrain you, 
forcing you to use the terminology and link types of that tool. Even Inkscape 
is restricting, because you cannot just leave a link dangling, or draw a funny 
spiral ('here it goes somewhere...').


And text goes into the corporate wiki in free form. Yes, updates will change 
everything, but even after updates the original pictures and text remain 
precious, because they tell the history and help to debug strange issues that 
have historical reasons. Corporate blogs are a perfect place for updates and 
ideas for future changes.


Yes, it is a mess, but it is better than 'not enough information because 
of the format restrictions'.



On 01/26/2015 03:45 PM, matt wrote:
I really liked using sphinx for documentation back in the day, it has 
the benefit of being community compatible.  I also enjoyed graphviz 
integration in sphinx for diagrams... and then there was templating 
gnuplots


but i think I was probably considered a masochist on this front.  at 
the very least management types did not like that they couldn't really 
edit our documentation.


-matt

On Mon, Jan 26, 2015 at 5:10 AM, George Shuklin 
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:


We using chef to manage hosts. Data bags contains all data of all
hosts. We keep hardware configuration and DC-wide-name in databags
too.

For the flowcharts we mostly use markers and whiteboard, sometime
I sketch stuff in dia [1] or with wacom tablet in mypaint.

[1] http://sourceforge.net/projects/dia-installer/



On 01/25/2015 04:15 PM, Daniel Comnea wrote:

Hi all,

Can anyone who runs Openstack in a production environment/ data
center share how you document the whole infrastructure, what
tools are used for drawing diagrams(i guess you need some
pictures otherwise is hard to understand it :)), maybe even an
inventory etc?



Thanks,
Dani



P.S in the past - 10+ - i used to have maintain a red book but i
suspect situation is different in 2015


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org  
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
mailto:OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] :document an OpenStack production environment

2015-01-26 Thread George Shuklin
We use chef to manage our hosts. Data bags contain all the data about all 
hosts; we keep the hardware configuration and the DC-wide name in data bags 
too.


For the flowcharts we mostly use markers and a whiteboard; sometimes I sketch 
things in dia [1] or with a Wacom tablet in MyPaint.


[1] http://sourceforge.net/projects/dia-installer/


On 01/25/2015 04:15 PM, Daniel Comnea wrote:

Hi all,

Can anyone who runs Openstack in a production environment/ data center 
share how you document the whole infrastructure, what tools are used 
for drawing diagrams(i guess you need some pictures otherwise is hard 
to understand it :)), maybe even an inventory etc?




Thanks,
Dani



P.S in the past - 10+ - i used to have maintain a red book but i 
suspect situation is different in 2015



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Small openstack (part 2), distributed glance

2015-01-21 Thread George Shuklin

Directions:
nova -> switch port, switch port -> glance, glance -> switch port (towards 
swift). I assume the traffic from the switch to swift leaves the installation.


Glance-api receives and sends the same amount of traffic. It sounds like a 
minor issue until you start to count the CPU IRQ time of the network card 
(doubled compared to a single direction of traffic).


Glance on the compute node will consume less CPU (thanks to the 
high-performance loopback interface).


On 01/21/2015 07:20 PM, Michael Dorman wrote:

This is great info, George.

Could you explain the 3x snapshot transport under the traditional Glance
setup, please?

I understand that you have compute — glance, and glance — swift.  But
what’s the third transfer?

Thanks!
Mike





On 1/21/15, 10:36 AM, George Shuklin george.shuk...@gmail.com wrote:


Ok, news so far:

It works like a magic. Nova have option
[glance]
host=127.0.0.1

And I do not need to cheat with endpoint resolving (my initial plan was
to resolve glance endpoint to 127.0.0.1 with /etc/hosts magic). Normal
glance-api reply to external clients requests
(image-create/download/list/etc), and local glance-apis (per compute)
are used to connect to swift.

Glance registry works in normal mode (only on 'official' api servers).

I don't see any reason why we should centralize all traffic to swift
through special dedicated servers, investing in fast CPU and 10G links.

With that solution CPU load on glance-api is distributed evenly on all
compute nodes, and overall snapshot traffic (on ports) was cut down 3
times!

Why I didn't thought about this earlier?

On 01/16/2015 12:20 AM, George Shuklin wrote:

Hello everyone.

One more thing in the light of small openstack.

I really dislike tripple network load caused by current glance
snapshot operations. When compute do snapshot, it playing with files
locally, than it sends them to glance-api, and (if glance API is
linked to swift), glance sends them to swift. Basically, for each
100Gb disk there is 300Gb on network operations. It is specially
painful for glance-api, which need to get more CPU and network
bandwidth than we want to spend on it.

So idea: put glance-api on each compute node without cache.

To help compute to go to the proper glance, endpoint points to fqdn,
and on each compute that fqdn is pointing to localhost (where
glance-api is live). Plus normal glance-api on API/controller node to
serve dashboard/api clients.

I didn't test it yet.

Any ideas on possible problems/bottlenecks? And how many
glance-registry I need for this?


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Small openstack (part 2), distributed glance

2015-01-21 Thread George Shuklin

Ok, news so far:

It works like a magic. Nova have option
[glance]
host=127.0.0.1

And I do not need to cheat with endpoint resolving (my initial plan was to 
resolve the glance endpoint to 127.0.0.1 with /etc/hosts magic). The normal 
glance-api replies to external client requests (image-create/download/list/etc.), 
and the local glance-apis (one per compute) are used to talk to swift.


Glance registry works in normal mode (only on 'official' api servers).

I don't see any reason why we should centralize all traffic to swift 
through special dedicated servers, investing in fast CPU and 10G links.


With that solution, the glance-api CPU load is distributed evenly over all 
compute nodes, and the overall snapshot traffic (on the switch ports) was cut 
down 3 times!


Why didn't I think of this earlier?

On 01/16/2015 12:20 AM, George Shuklin wrote:

Hello everyone.

One more thing in the light of small openstack.

I really dislike tripple network load caused by current glance 
snapshot operations. When compute do snapshot, it playing with files 
locally, than it sends them to glance-api, and (if glance API is 
linked to swift), glance sends them to swift. Basically, for each 
100Gb disk there is 300Gb on network operations. It is specially 
painful for glance-api, which need to get more CPU and network 
bandwidth than we want to spend on it.


So idea: put glance-api on each compute node without cache.

To help compute to go to the proper glance, endpoint points to fqdn, 
and on each compute that fqdn is pointing to localhost (where 
glance-api is live). Plus normal glance-api on API/controller node to 
serve dashboard/api clients.


I didn't test it yet.

Any ideas on possible problems/bottlenecks? And how many 
glance-registry I need for this?



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Small openstack (part 2), distributed glance

2015-01-15 Thread George Shuklin

Hello everyone.

One more thing in the light of small openstack.

I really dislike the triple network load caused by the current glance snapshot 
operations. When a compute node takes a snapshot, it plays with the files 
locally, then it sends them to glance-api, and (if the glance API is backed by 
swift) glance sends them on to swift. Basically, for each 100 GB disk there is 
300 GB of network traffic. It is especially painful for glance-api, which needs 
more CPU and network bandwidth than we want to spend on it.


So the idea: put a glance-api on each compute node, without a cache.

To help the compute node reach the proper glance, the endpoint points to an 
FQDN, and on each compute node that FQDN resolves to localhost (where the 
local glance-api lives). Plus a normal glance-api on the API/controller node 
to serve dashboard/API clients; a sketch is below.
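
A sketch of that trick (all names here are made up for illustration):

  # keystone image endpoint registered with a hostname instead of an IP:
  #   http://glance.internal.example:9292
  # /etc/hosts on every compute node:
  127.0.0.1   glance.internal.example
  # plus a local glance-api listening on each compute node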


I haven't tested it yet.

Any ideas on possible problems/bottlenecks? And how many glance-registries 
would I need for this?


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Small openstack

2015-01-09 Thread George Shuklin
On 01/09/2015 09:25 PM, Kris G. Lindgren wrote:
 Also, If you are running this configuration you should be aware of the
 following bug:

 https://bugs.launchpad.net/neutron/+bug/1274034

 And the corresponding fix: https://review.openstack.org/#/c/141130/

 Basically - Neutron security group rules do nothing to protect against arp
 spoofing/poisoning from vm's.  So its possible under a shared network
 configuration for a vm to arp for another vm's ip address and temporarily
 knock that vm offline.  The above commit - which is still a WIP adds
 ebtable rules to allow neutron to filter protocols other than IP (eg arp).
Thank you!

I have just finished playing with private networks (as external networks) and 
have started tuning the internet-facing network. And I saw something strange 
when I was running a pentest from one of the instances. I'm going to check each 
item from the list in the bug description.

But I thought that security groups, anti-spoofing and the other such things 
were nova-driven?


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] glance directory traversal bug and havana

2015-01-09 Thread George Shuklin

Seems I was wrong.

Thanks, I'll look at it again.

On 01/08/2015 07:37 PM, Jesse Keating wrote:

On 1/7/15 8:47 PM, George Shuklin wrote:

I spend few hours trying to backport to Havana, but than I found,  that
Havana seems be immune to the bug.  I'm not 100% sure, so someone else
advised to look too.

The bug was that icehouse+ accepts all supported schemas. Fix excludes
'bad' schemes. Although Havana have explicitly given list of accepted
schemes for location field, and 'bad' schemes are not in it.



Havana is certainly not immune. I was able to fetch content from the 
system fairly easily.


Start with an updated glance client

Modify it as listed in 
https://bugs.launchpad.net/glance/+bug/1400966/comments/6


$ glance image-create --disk-format raw --container-format bare

$ glance image-update --size 700 image_id

$ glance --os-image-api-version 2 location-add --url file:///etc/passwd

$ glance image-download image_id


That got me (some of) the contents of /etc/passwd.

The patch I posted prevented this from happening. It blocks adding a 
location that is file:// based, but still allows other location adds 
that should be allowed.


https://github.com/blueboxgroup/glance/commit/7ab98b72802de1d5695d35306e32293463977496 






___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] glance directory traversal bug and havana

2015-01-07 Thread George Shuklin
I spent a few hours trying to backport it to Havana, but then I found that
Havana seems to be immune to the bug. I'm not 100% sure, so someone else is
advised to take a look too.

The bug was that Icehouse+ accepts all supported schemes, and the fix excludes
the 'bad' schemes. Havana, on the other hand, has an explicitly given list of
accepted schemes for the location field, and the 'bad' schemes are not in it.
On Jan 6, 2015 8:34 PM, Jesse Keating j...@bluebox.net wrote:

 Hopefully all of you have seen http://seclists.org/oss-sec/2015/q1/64
 which is the glance v2 api directory traversal bug. Upstream has fixed
 master (kilo) and juno, but havana has not been fixed.

 We, unfortunately, have a few havana installs out there and we'd like to
 patch this ahead of our planned upgrade to Juno. I'm curious if anybody
 else out there is in the same situation and is working on backporting the
 glance patch. If not, I'll share the patch when I'm done, but if so I'd
 love to share in the work and help the effort.

 Cheers, and happy patching!

 --
 -jlk

 ___
 OpenStack-operators mailing list
 OpenStack-operators@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Small openstack

2014-12-20 Thread George Shuklin

Hello.

I've suddenly got a request for a small installation of OpenStack (about 3-5 
computes).


They need almost nothing (just a management panel to spawn simple instances, a 
few friendly tenants), and I'm curious: is nova-network a good solution for 
this? They don't want a dedicated network node, and doing 'network node on a 
compute' is kinda sad.


(And one more: has anyone tried to put the management stuff on a compute node 
in mild production?)


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Packaging sample config versions

2014-12-15 Thread George Shuklin

On 12/15/2014 10:49 AM, Thomas Goirand wrote:

and ubuntu just put files
in proper places without changing configs.

Ahem... Ubuntu simply doesn't care much about config files. See what
they ship for Nova and Cinder. I wouldn't say without changing configs
in this case.


We using chef for
configuration, so ubuntu approach is better

It's not better or worse, it's exactly the same as for Debian, as the
Debian package will *never* change something you modified in a config
file, as per Debian policy (if they do, then it's a bug you shall report
to the tracker).

Thank you.

It's rather unexpected, but I'll take this into account for the next 
installation. I'm not a big fan of Ubuntu's maintenance policy (they basically 
dropped it 2 months before the announced date), and I prefer to use Debian 
where possible. Now I see that it's OK with OpenStack too, and that's good. I 
think there is some kind of implicit FUD here, because I was absolutely sure 
that Canonical/RH packaging led the way and Debian was far behind in the 
process. It is not true, and I'm happy about that.

Anyway, I'm ready to help but have no idea how (within my limits).

Do you have any experience building 3rd party CIs on OpenStack infra?

Nope. I've only done stuff with debian-jenkins-glue. But I have some 
experience backporting patches from Icehouse to Havana (it is still in 
production and still needs fixes). I can research/fix something specific and 
local.


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Packaging sample config versions

2014-12-13 Thread George Shuklin

On 12/13/2014 05:13 PM, Thomas Goirand wrote:

If I can help somehow, I'm ready to do something, but What should I
do, exactly?

There's a lot that can be done. If you like working on CI stuff, then
you could help me with building the package validation CI which I'm
trying to (re-)work. All of this is currently inside the debian/juno of
the openstack-meta-packages (in the openstack-tempest-ci package, which
uses the openstack-deploy package).

In the past, I saw *A LOT* of CIs, and most of them were written in a
very dirty way. In fact, it's easy to write a CI, but it's very hard to
write it well. I'm not saying my approach is perfect, but IMO it's
moving toward the good direction.
All CIs are dirty piles of bash scripts. Some of them have enough dirt to give 
birth to new life, which in turn starts a civilization, invents computers and 
begins building its own CI.



For the moment, the packaged CI can do a full all-in-one deployment from
scratch (starting with an empty VM), install and configure tempest, and
run the Keystone tempest unit tests. I'm having issues with nova-compute
using Qemu, and also the Neutron setup. But once that's fixed, I hope to
be able to run most tempest tests. The next step will be to run on a
multi-node setup.

So, if you want to help on that, and as it seems you like doing CI
stuff, you're more welcome to do so.

Once we have this, then we could start building a repository with
everything from trunk. And when that is done, starting the effort of
building a 3rd party CI to do package validation on the gate.

Your thoughts?

Oops, I don't feel I can respond to this in a smart way. I do not know some of 
the pieces involved (like tempest). It's better if you give me one concrete 
area to work on (something on the scale of a normal issue from the tracker).


BTW: are we talking about Debian packages or Ubuntu ones? They differ - Debian 
heavily relies on answers to debconf, while Ubuntu just puts the files in the 
proper places without touching the configs. We use chef for configuration, so 
the Ubuntu approach suits us better (when we started doing OpenStack, that was 
one of the deciding factors between Debian and Ubuntu).


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] cloud-init: ssh-keys for host changed after global reboot

2014-11-14 Thread George Shuklin
We do not use config drive, only the metadata server. I think it is somehow 
related to the metadata server not working for some time and standalone 
instances booting without metadata (we rebooted them afterwards, but cloud-init 
can still mess things up...).
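
If the images run cloud-init, one possible mitigation (an assumption on my 
side, not something we have verified here) is to tell cloud-init not to delete 
and regenerate the host keys on what it thinks is a new instance:

  # inside the guest image, e.g. /etc/cloud/cloud.cfg.d/99-keep-host-keys.cfg
  ssh_deletekeys: false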


On 11/14/2014 04:21 AM, Abel Lopez wrote:
Haven't seen that myself, I wonder if there is a conflict between 
cloud-init and libvirt_inject_key. Also curious if you're using the 
metadata api or config_drive.


On Thursday, November 13, 2014, George Shuklin 
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:


Hello.

We had planned power outage for one of our OS installation
(havana). After everything booted back, we found every instance
has change it's own ssh key (server key ssh-server presents upon
connection).

Is this bug or feature? Someone saw that? Is any way to prevent this?

Thanks!

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators



___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] cloud-init: ssh-keys for host changed after global reboot

2014-11-13 Thread George Shuklin

Hello.

We had a planned power outage for one of our OpenStack installations (Havana). 
After everything booted back up, we found that every instance had changed its 
own SSH host key (the server key that the SSH server presents upon connection).


Is this a bug or a feature? Has anyone seen this? Is there any way to prevent 
it?

Thanks!

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] floatin ip issue

2014-10-31 Thread George Shuklin
I was wrong, sorry. Floating IPs are assigned as /32 on the external interface 
inside the network namespace. The single idea I have now is to flush all the 
iptables NAT rules (it's destructive until the network node is rebooted or the 
router is deleted/recreated) and check whether the address starts replying to 
ping; a sketch of the commands is below.


If 'yes' - the problem is in routing/NAT.
If 'no' - the problem is outside the OpenStack router (external net, provider 
routing, etc.).
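
A sketch of that check inside the router namespace (the router id is a 
placeholder; the qg- interface name is taken from your output above). Again, 
this wipes the NAT rules until the router is recreated or rescheduled:

  ip netns exec qrouter-<router-id> iptables -t nat -L -n -v   # inspect first
  ip netns exec qrouter-<router-id> iptables -t nat -F         # flush NAT (destructive)
  ip netns exec qrouter-<router-id> tcpdump -ni qg-d351f21a-08 icmp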


On 10/29/2014 06:23 PM, Paras pradhan wrote:

Hi George,


You mean .193 and .194 should be in the different subnets? 
192.168.122.193/24 http://192.168.122.193/24 reserved  from the 
allocation pool and 192.168.122.194/32 http://192.168.122.194/32 is 
the floating ip.


Here are the outputs for the commands

*neutron port-list --device-id=8725dd16-8831-4a09-ae98-6c5342ea501f
*

+--+--+---++

| id   | name | mac_address   | 
fixed_ips   |


+--+--+---++

| 6f835de4-c15b-44b8-9002-160ff4870643 |  | fa:16:3e:85:dc:ee | 
{subnet_id: 0189699c-8ffc-44cb-aebc-054c8d6001ee, ip_address: 
192.168.122.193} |


| be3c4294-5f16-45b6-8c21-44b35247d102 |  | fa:16:3e:72:ae:da | 
{subnet_id: d01a6522-063d-40ba-b4dc-5843177aab51, ip_address: 
10.10.0.1}   |


+--+--+---++


*neutron floatingip-list*

+--+--+-+--+

| id   | fixed_ip_address | 
floating_ip_address | port_id   |


+--+--+-+--+

| 55b00e9c-5b79-4553-956b-e342ae0a430a | 10.10.0.9| 
192.168.122.194 | 82bcbb91-827a-41aa-9dd9-cb7a4f8e7166 |


+--+--+-+--+


*neutron net-list*

+--+--+---+

| id   | name | subnets   
|


+--+--+---+

| dabc2c18-da64-467b-a2ba-373e460444a7 | demo-net | 
d01a6522-063d-40ba-b4dc-5843177aab51 10.10.0.0/24 
http://10.10.0.0/24 |


| ceaaf189-5b6f-4215-8686-fbdeae87c12d | ext-net | 
0189699c-8ffc-44cb-aebc-054c8d6001ee 192.168.122.0/24 
http://192.168.122.0/24 |


+--+--+---+


*neutron subnet-list*

+--+-+--++

| id   | name   | cidr | 
allocation_pools   |


+--+-+--++

| d01a6522-063d-40ba-b4dc-5843177aab51 | demo-subnet | 10.10.0.0/24 
http://10.10.0.0/24 | {start: 10.10.0.2, end: 
10.10.0.254}   |


| 0189699c-8ffc-44cb-aebc-054c8d6001ee | ext-subnet  | 
192.168.122.0/24 http://192.168.122.0/24 | {start: 
192.168.122.193, end: 192.168.122.222} |


+--+-+--++


P.S: External subnet is 192.168.122.0/24 http://192.168.122.0/24 and 
internal vm instance's subnet is 10.10.0.0/24 http://10.10.0.0/24



Thanks

Paras.


On Mon, Oct 27, 2014 at 5:51 PM, George Shuklin 
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:



I don't like this:

15: qg-d351f21a-08: BROADCAST,UP,LOWER_UP mtu 1500 qdisc noqueue
state UNKNOWN group default
inet 192.168.122.193/24 http://192.168.122.193/24 brd
192.168.122.255 scope global qg-d351f21a-08
   valid_lft forever preferred_lft forever
inet 192.168.122.194/32 http://192.168.122.194/32 brd
192.168.122.194 scope global qg-d351f21a-08
   valid_lft forever preferred_lft forever

Why you got two IPs on same interface with different netmasks?

I just rechecked it on our installations - it should not be happens.

Next: or this is a bug, or this is uncleaned network node (lesser
bug), or someone messing with neutron.

Starts from neutron:

show ports for router:

neutron port-list --device-id=router-uuid-here
neutron

Re: [Openstack-operators] floatin ip issue

2014-10-27 Thread George Shuklin


I don't like this:

15: qg-d351f21a-08: BROADCAST,UP,LOWER_UP mtu 1500 qdisc noqueue state 
UNKNOWN group default
inet 192.168.122.193/24 http://192.168.122.193/24 brd 
192.168.122.255 scope global qg-d351f21a-08

   valid_lft forever preferred_lft forever
inet 192.168.122.194/32 http://192.168.122.194/32 brd 
192.168.122.194 scope global qg-d351f21a-08

   valid_lft forever preferred_lft forever

Why do you have two IPs on the same interface with different netmasks?

I just rechecked this on our installations - it should not happen.

Next: either this is a bug, or this is an uncleaned network node (a lesser 
bug), or someone is messing with neutron.


Start with neutron:

show the ports for the router:

neutron port-list --device-id=router-uuid-here
neutron floatingip-list
neutron net-list
neutron subnet-list
(trim to related only)

(And please mark again which IPs are 'internet' and which are 'internal'; I'm 
getting a bit lost among all the '192.168.*' addresses.)



On 10/27/2014 04:47 PM, Paras pradhan wrote:

*Yes it got its ip which is 192.168.122.194 in the paste below.*

--

root@juno2:~# ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ip -4 a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
14: qr-ac50d700-29: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 50.50.50.1/24 brd 50.50.50.255 scope global qr-ac50d700-29
       valid_lft forever preferred_lft forever
15: qg-d351f21a-08: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    inet 192.168.122.193/24 brd 192.168.122.255 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever
    inet 192.168.122.194/32 brd 192.168.122.194 scope global qg-d351f21a-08
       valid_lft forever preferred_lft forever

---

stdbuf -e0 -o0 ip net exec qrouter... /bin/bash gives me the following:


--

root@juno2:~# ifconfig

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536 Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:168 (168.0 B)  TX bytes:168 (168.0 B)

qg-d351f21a-08 Link encap:Ethernet  HWaddr fa:16:3e:79:0f:a2
          inet addr:192.168.122.193  Bcast:192.168.122.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe79:fa2/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500 Metric:1
          RX packets:2673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:205377 (205.3 KB)  TX bytes:6537 (6.5 KB)

qr-ac50d700-29 Link encap:Ethernet  HWaddr fa:16:3e:7e:6d:f3
          inet addr:50.50.50.1  Bcast:50.50.50.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe7e:6df3/64 Scope:Link
          UP BROADCAST RUNNING  MTU:1500 Metric:1
          RX packets:345 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1719 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:27377 (27.3 KB)  TX bytes:164541 (164.5 KB)

--
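
One more check from my side: whether the floating IP resolves to the qg- interface's MAC (fa:16:3e:79:0f:a2 above) from another host on 192.168.122.0/24. Something along these lines, with iputils arping; the interface name on that host is just a placeholder:

arping -I eth0 -c 3 192.168.122.194

If ARP resolves to the qg- MAC, the echo requests are reaching the namespace and the NAT rules are the likely suspect; if it does not resolve, the problem is on the external network side.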


Thanks

Paras.



On Sat, Oct 25, 2014 at 3:18 AM, George Shuklin 
george.shuk...@gmail.com mailto:george.shuk...@gmail.com wrote:


Check whether the qrouter got the floating IP inside its network namespace (ip net exec qrouter... ip -4 a), or just bash into it (stdbuf -e0 -o0 ip net exec qrouter... /bin/bash) and play with it like a normal server.



On 10/24/2014 07:38 PM, Paras pradhan wrote:

Hello,

Assigned a floating IP to an instance, but I can't ping the instance. The instance can reach the internet with no problem, but I can't ssh or icmp to it. It's not a security group issue.

On my network node that runs l3, I can see the qrouter. The external subnet looks like this:

allocation-pool start=192.168.122.193,end=192.168.122.222 --disable-dhcp --gateway 192.168.122.1 192.168.122.0/24

I can ping 192.168.122.193 using: ip netns exec
qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ping 192.168.122.193

but not 192.168.122.194 (which is the floating ip)

Doing a tcpdump on the interface that connects to the external world, I can see ICMP requests but no replies from the interface:


11:36:40.360255 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 312, length 64

11:36:41.360222 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 313, length 64


Ideas?

Thanks

Paras

Re: [Openstack-operators] floatin ip issue

2014-10-25 Thread George Shuklin
Check whether the qrouter got the floating IP inside its network namespace (ip net exec qrouter... ip -4 a), or just bash into it (stdbuf -e0 -o0 ip net exec qrouter... /bin/bash) and play with it like a normal server.
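
In practice that boils down to something like this (the qrouter UUID is the one from your mail; substitute the instance's fixed IP and the qg- interface name that ip -4 a shows inside the namespace):

ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ip -4 a
ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ping -c 3 <instance-fixed-ip>
ip netns exec qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c tcpdump -ni <qg-interface> icmp

If the namespace can reach the instance's fixed IP but you never see echo replies leaving the qg- interface when you ping the floating IP from outside, look at the router's NAT rules; if replies do leave but never arrive back, look at the external network instead.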



On 10/24/2014 07:38 PM, Paras pradhan wrote:

Hello,

Assigned a floating IP to an instance, but I can't ping the instance. The instance can reach the internet with no problem, but I can't ssh or icmp to it. It's not a security group issue.


On my network node that runs l3, I can see the qrouter. The external subnet looks like this:


allocation-pool start=192.168.122.193,end=192.168.122.222 --disable-dhcp --gateway 192.168.122.1 192.168.122.0/24


I can ping 192.168.122.193 using: ip netns exec 
qrouter-34f3b828-b7b8-4f44-b430-14d9c5bd0d0c ping 192.168.122.193


but not 192.168.122.194 (which is the floating ip)

Doing a tcpdump on the interface that connects to the external world, I can see ICMP requests but no replies from the interface:



11:36:40.360255 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 312, length 64

11:36:41.360222 IP 192.168.122.1 > 192.168.122.194: ICMP echo request, id 2589, seq 313, length 64



Ideas?

Thanks

Paras.




___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators