Re: [openstack-dev] [Openstack-operators] [nova] Does anyone rely on PUT /os-services/disable for non-compute services?

2017-06-13 Thread Kris G. Lindgren
I am fine with #2, and I am also fine with calling it a bug, since the
enabled/disabled state for the other services didn’t actually do anything.


___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

On 6/13/17, 8:46 PM, "Dan Smith" wrote:

> Are we allowed to cheat and say auto-disabling non-nova-compute services
> on startup is a bug and just fix it that way for #2? :) Because (1) it
> doesn't make sense, as far as we know, and (2) it forces the operator to
> have to use the API to enable them later just to fix their nova
> service-list output.

Yes, definitely.

--Dan



Re: [openstack-dev] [kolla-ansible] [kolla] Am I doing this wrong?

2017-01-23 Thread Kris G. Lindgren
Hi Paul,

Thanks for responding.

> The fact gathering on every server is a compromise taken by Kolla to
> work around limitations in Ansible. It works well for the majority of
> situations; for more detail and potential improvements on this please
> have a read of this post:
> http://lists.openstack.org/pipermail/openstack-dev/2016-November/107833.html

So my problem with this is the logging in to the compute nodes.  While this may
be fine for a smaller deployment, logging into hundreds, or even thousands, of
nodes via Ansible to gather facts, just to do a deployment against 2 or 3 of
them, is not tenable.  Additionally, in our more heavily audited environments
(PKI/PCI) this will cause our auditors heartburn.
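
For what it's worth, the direction we have been experimenting with is plain
Ansible fact caching plus scoping the run, so repeat runs don't have to log into
every box.  A rough sketch only (these are stock Ansible options; the cache path
and playbook name are just assumptions, nothing kolla-specific):

# ansible.cfg - only gather facts when they aren't already cached
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /var/cache/ansible-facts
fact_caching_timeout = 86400

# then scope the actual run to the hosts we care about
ansible-playbook -i inventory site.yml --limit glance

That still doesn't fix needing the unrelated groups in the inventory, but it at
least keeps the login footprint down between full runs.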

> I'm not quite following you here, the config templates from
> kolla-ansible are one of its stronger pieces imo, they're reasonably
> well tested and maintained. What leads you to believe they shouldn't be
> used?
>
> > * Certain parts of it are 'reference only' (the config tasks),
>  > are not recommended
>
> This is untrue - kolla-ansible is designed to stand up a stable and
> usable OpenStack 'out of the box'. There are definitely gaps in the
> operator type tasks as you've highlighted, but I would not call it
> ‘reference only'.

http://eavesdrop.openstack.org/irclogs/%23openstack-kolla/%23openstack-kolla.2017-01-09.log.html#t2017-01-09T21:33:15

This is where we were told the config stuff was “reference only”?

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


Re: [openstack-dev] [kolla-ansible] [kolla] Am I doing this wrong?

2017-01-20 Thread Kris G. Lindgren
Adding [kolla] tag.


___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Friday, January 20, 2017 at 4:54 PM
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Cc: "openstack-operat...@lists.openstack.org" 
<openstack-operat...@lists.openstack.org>
Subject: Re: [kolla-ansible] Am I doing this wrong?

Poke.  Bueller?


___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Tuesday, January 10, 2017 at 5:34 PM
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Subject: [kolla-ansible] Am I doing this wrong?

Hello Kolla/Kolla-ansible peoples.

I have been trying to take kolla/kolla-ansible and use it to start moving our
existing openstack deployment into containers.  At the same time I am also trying
to fix some of the problems that we created with our previous deployment work
(everything was in puppet), where we had puppet doing *everything*, which
eventually created a system that effectively performed actions at a distance, as
we were never really 100% sure what puppet was going to do when we ran it, even
with NOOP mode enabled.  So, taking the example of building and deploying glance
via kolla-ansible, I am running into some problems/concerns and wanted to reach
out to make sure that I am not missing something.

Things that I am noticing:
 * I need to define a number of servers in my inventory outside of the specific
servers that I want to perform actions against.  I need to define the groups
baremetal, rabbitmq, memcached, and control (in addition to the glance-specific
groups); most of these seem to be gathering information for config? (Baremetal
was needed solely to try to run the bootstrap play.)  Running a change
specifically against "glance" causes fact gathering on a number of other
servers, not just the ones where glance is running?  My concern here is that I
want to be able to run kolla-ansible against a specific service and know that
only those servers are being logged into.

* I want to be able to do a dry run, seeing what will happen before it
happens, not during; during makes it really hard to see what will happen until
it happens.  Also, supporting `ansible --diff` would really help in
understanding what will be changed (before it happens).  Ideally, this wouldn't
be 100% needed, but the ability to figure out what a run would *ACTUALLY* do
on a box is what I was hoping to see.

* Database tasks are run on every deploy, and the status of the change-DB-permissions
task always reports as changed?  Even when nothing happens, which makes you wonder
"what changed"?  Seems like this is because the task either reports a 0 or a 1,
whereas it seems like there are 3 states: did nothing, updated something, failed
to do what was required.  Also, can someone tell me why the DB stuff is done on
a deployment task?  Seems like the db checks/migration work should only be done
on an upgrade or a bootstrap?

* Database services (at least the ones we have) are not managed by our team, so we
don't want kolla-ansible touching those (since it won't be able to).  Is there no
way to mark the DB as "externally managed"?  I.e. we don't have permissions to create
databases or add users, but we do have all other permissions on the databases that
are created, so normal db-manage tooling works.

* Maintenance-level operations: there doesn't seem to be anything built in to say
'take a server out of a production state, deploy to it, test it, put it back into
production'.  Seems like if kolla-ansible is doing haproxy for the APIs, it should
be managing this?  Or an extension point to allow us to run our own
maintenance/testing scripts?

* Config must come from kolla-ansible and generated templates.  I know we have
a patch up for externally managed service configuration.  But if we aren't
supposed to use kolla-ansible for generating configs (see below), why can't we
override this piece?

Hard to determine what kolla-ansible *should* be used for:

* Certain parts of it are 'reference only' (the config tasks), and some are not
recommended to be used at all (bootstrap?); which parts of kolla-ansible are
people actually using (and not just as a reference point)?  If parts of
kolla-ansible are just *reference only*, then we might as well be really upfront
about it and tell people how to disable/replace those reference pieces?

* Seems like this will cause everyone who needs to make tweaks to fork or 
create an "overlay" to override playbooks/tasks with specific functions?

Other questions:

Is kolla-ansible's design philosophy that every deployment is an upgrade?  Or that
every deployment should include all the base-level bootstrap tests?

Re: [openstack-dev] [kolla-ansible] Am I doing this wrong?

2017-01-20 Thread Kris G. Lindgren
Poke.  Bueller?


___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren" <klindg...@godaddy.com>
Date: Tuesday, January 10, 2017 at 5:34 PM
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Subject: [kolla-ansible] Am I doing this wrong?

Hello Kolla/Kolla-ansible peoples.

I have been trying to take kolla/kolla-ansible and use it to start moving our
existing openstack deployment into containers.  At the same time I am also trying
to fix some of the problems that we created with our previous deployment work
(everything was in puppet), where we had puppet doing *everything*, which
eventually created a system that effectively performed actions at a distance, as
we were never really 100% sure what puppet was going to do when we ran it, even
with NOOP mode enabled.  So, taking the example of building and deploying glance
via kolla-ansible, I am running into some problems/concerns and wanted to reach
out to make sure that I am not missing something.

Things that I am noticing:
 * I need to define a number of servers in my inventory outside of the specific
servers that I want to perform actions against.  I need to define the groups
baremetal, rabbitmq, memcached, and control (in addition to the glance-specific
groups); most of these seem to be gathering information for config? (Baremetal
was needed solely to try to run the bootstrap play.)  Running a change
specifically against "glance" causes fact gathering on a number of other
servers, not just the ones where glance is running?  My concern here is that I
want to be able to run kolla-ansible against a specific service and know that
only those servers are being logged into.

* I want to be able to do a dry run, seeing what will happen before it
happens, not during; during makes it really hard to see what will happen until
it happens.  Also, supporting `ansible --diff` would really help in
understanding what will be changed (before it happens).  Ideally, this wouldn't
be 100% needed, but the ability to figure out what a run would *ACTUALLY* do
on a box is what I was hoping to see.

* Database tasks are run on every deploy, and the status of the change-DB-permissions
task always reports as changed?  Even when nothing happens, which makes you wonder
"what changed"?  Seems like this is because the task either reports a 0 or a 1,
whereas it seems like there are 3 states: did nothing, updated something, failed
to do what was required.  Also, can someone tell me why the DB stuff is done on
a deployment task?  Seems like the db checks/migration work should only be done
on an upgrade or a bootstrap?

* Database services (at least the ones we have) are not managed by our team, so we
don't want kolla-ansible touching those (since it won't be able to).  Is there no
way to mark the DB as "externally managed"?  I.e. we don't have permissions to create
databases or add users, but we do have all other permissions on the databases that
are created, so normal db-manage tooling works.

* Maintenance-level operations: there doesn't seem to be anything built in to say
'take a server out of a production state, deploy to it, test it, put it back into
production'.  Seems like if kolla-ansible is doing haproxy for the APIs, it should
be managing this?  Or an extension point to allow us to run our own
maintenance/testing scripts?

* Config must come from kolla-ansible and generated templates.  I know we have
a patch up for externally managed service configuration.  But if we aren't
supposed to use kolla-ansible for generating configs (see below), why can't we
override this piece?

Hard to determine what kolla-ansible *should* be used for:

* Certain parts of it are 'reference only' (the config tasks), and some are not
recommended to be used at all (bootstrap?); which parts of kolla-ansible are
people actually using (and not just as a reference point)?  If parts of
kolla-ansible are just *reference only*, then we might as well be really upfront
about it and tell people how to disable/replace those reference pieces?

* Seems like this will cause everyone who needs to make tweaks to fork or 
create an "overlay" to override playbooks/tasks with specific functions?

Other questions:

Is kolla-ansible's design philosophy that every deployment is an upgrade?  Or that
every deployment should include all the base-level bootstrap tests?

Because it seems to me that you have a required set of tasks that should only
be done once (bootstrap), another set of tasks that should be done for day-to-day
care/feeding: service restarts, config changes, updates to code (new
container deployments), package updates (new docker container deployments), and
a final set of tasks for upgrades where you will need to do things like db
migrations and other special upgrade things.  It also seems like the day-to-day
care and feeding tasks shou

[openstack-dev] [kolla-ansible] Am I doing this wrong?

2017-01-10 Thread Kris G. Lindgren
Hello Kolla/Kolla-ansible peoples.

I have been trying to take kolla/kolla-ansible and use it to start moving our
existing openstack deployment into containers.  At the same time I am also trying
to fix some of the problems that we created with our previous deployment work
(everything was in puppet), where we had puppet doing *everything*, which
eventually created a system that effectively performed actions at a distance, as
we were never really 100% sure what puppet was going to do when we ran it, even
with NOOP mode enabled.  So, taking the example of building and deploying glance
via kolla-ansible, I am running into some problems/concerns and wanted to reach
out to make sure that I am not missing something.

Things that I am noticing:
 * I need to define a number of servers in my inventory outside of the specific
servers that I want to perform actions against.  I need to define the groups
baremetal, rabbitmq, memcached, and control (in addition to the glance-specific
groups); most of these seem to be gathering information for config? (Baremetal
was needed solely to try to run the bootstrap play.)  Running a change
specifically against "glance" causes fact gathering on a number of other
servers, not just the ones where glance is running?  My concern here is that I
want to be able to run kolla-ansible against a specific service and know that
only those servers are being logged into.

* I want to be able to do a dry run, seeing what will happen before it
happens, not during; during makes it really hard to see what will happen until
it happens.  Also, supporting `ansible --diff` would really help in
understanding what will be changed (before it happens).  Ideally, this wouldn't
be 100% needed, but the ability to figure out what a run would *ACTUALLY* do
on a box is what I was hoping to see (see the sketch after this list).

* Database tasks are run on every deploy, and the status of the change-DB-permissions
task always reports as changed?  Even when nothing happens, which makes you wonder
"what changed"?  Seems like this is because the task either reports a 0 or a 1,
whereas it seems like there are 3 states: did nothing, updated something, failed
to do what was required.  Also, can someone tell me why the DB stuff is done on
a deployment task?  Seems like the db checks/migration work should only be done
on an upgrade or a bootstrap?

* Database services (at least the ones we have) are not managed by our team, so we
don't want kolla-ansible touching those (since it won't be able to).  Is there no
way to mark the DB as "externally managed"?  I.e. we don't have permissions to create
databases or add users, but we do have all other permissions on the databases that
are created, so normal db-manage tooling works.

* Maintenance-level operations: there doesn't seem to be anything built in to say
'take a server out of a production state, deploy to it, test it, put it back into
production'.  Seems like if kolla-ansible is doing haproxy for the APIs, it should
be managing this?  Or an extension point to allow us to run our own
maintenance/testing scripts?

* Config must come from kolla-ansible and generated templates.  I know we have
a patch up for externally managed service configuration.  But if we aren't
supposed to use kolla-ansible for generating configs (see below), why can't we
override this piece?
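
To make the dry-run/targeting ask concrete (this is the sketch referenced in the
dry-run bullet above), here is roughly what I would like to be able to do.  It uses
stock Ansible flags only; the playbook path and the assumption that the tags/groups
line up with "glance" are mine, and I have not verified whether the kolla-ansible
wrapper passes these options through:

ansible-playbook -i inventory /path/to/kolla-ansible/ansible/site.yml \
    --tags glance --limit glance --check --diff

--check and --diff are what would let me see what a run would *actually* change on
a box before it changes it, and --limit is what would guarantee that only the
glance hosts get logged into.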

Hard to determine what kolla-ansible *should* be used for:

* Certain parts of it are 'reference only' (the config tasks), and some are not
recommended to be used at all (bootstrap?); which parts of kolla-ansible are
people actually using (and not just as a reference point)?  If parts of
kolla-ansible are just *reference only*, then we might as well be really upfront
about it and tell people how to disable/replace those reference pieces?

* Seems like this will cause everyone who needs to make tweaks to fork or 
create an "overlay" to override playbooks/tasks with specific functions?

Other questions:

Is kolla-ansible's design philosophy that every deployment is an upgrade?  Or that
every deployment should include all the base-level bootstrap tests?

Because it seems to me that you have a required set of tasks that should only
be done once (bootstrap), another set of tasks that should be done for day-to-day
care/feeding: service restarts, config changes, updates to code (new
container deployments), package updates (new docker container deployments), and
a final set of tasks for upgrades where you will need to do things like db
migrations and other special upgrade things.  It also seems like the day-to-day
care and feeding tasks should be incredibly targeted/explicit.  For example, when
deploying a new glance container (not in an upgrade scenario), I would expect
it to log in to the glance servers one at a time: place the server in
maintenance mode to ensure that actions are not performed against it,
download the new container, start the new container, test the new
container and, if successful, place the new container into rotation, then stop the
old container.

Re: [openstack-dev] [glance][nova] Globally disabling hw_qemu_guest_agent support

2016-07-18 Thread Kris G. Lindgren
I also happened to be looking at this today and was wondering about this as
well.  From the multiple places that talk about how to enable the qemu guest agent
for quiescing drives during snapshots, they all have a warning that this should
be enabled on trusted guests only. [1] [2] [3]  So, I am wondering: has anyone
actually solved any of the security issues called out in the tail end of [3]?
It seems interesting that we would make it so that the only flag needed to
enable/disable this is on the image metadata – which any user who is given
permission to upload images can set – since this opens up a communication channel
directly between the untrusted (for most people running a cloud) VM and libvirt
running on the HV.

[1] - 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/chap-QEMU_Guest_Agent.html#idp9487712
 (see the warning directly below the title)
[2] - http://wiki.libvirt.org/page/Qemu_guest_agent (see the last sentence)
[3] - http://wiki.qemu.org/Features/QAPI/GuestAgent (See the Security section)
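
To make the concern concrete: the knob in question is just an image property, so
(as a sketch, using the standard glance client, with a made-up image name) any
user who can upload or modify an image can flip it themselves:

glance image-update --property hw_qemu_guest_agent=yes my-image

and instances booted from that image should then get the guest-agent channel wired
through to the host.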
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


Re: [openstack-dev] [Openstack-operators] [nova] Rabbit-mq 3.4 crashing (anyone else seen this?)

2016-07-05 Thread Kris G. Lindgren
We tried some of these (well, I did last night), but the issue was that
eventually rabbitmq actually died.  I was trying some of the eval commands to
try to get what was in the mgmt_db, but any get-status call eventually led to
a timeout error.  Part of the problem is that we can go from a warning to a
zomg-out-of-memory in under 2 minutes.  Last night it was taking only 2 hours
to chew through 40GB of ram.  Messaging rates were in the 150-300/s range, which
is not all that high (another cell is doing a constant 1k-2k).

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: Matt Fischer
Date: Tuesday, July 5, 2016 at 11:25 AM
To: Joshua Harlow
Cc: "openstack-dev@lists.openstack.org", OpenStack Operators
Subject: Re: [Openstack-operators] [nova] Rabbit-mq 3.4 crashing (anyone else 
seen this?)


Yes! This happens often but I'd not call it a crash; the mgmt db just gets
behind and then eats all the memory. We've started monitoring it and have runbooks
on how to bounce just the mgmt db. Here are my notes on that:

restart rabbitmq mgmt server - this seems to clear the memory usage.

rabbitmqctl eval 'application:stop(rabbitmq_management).'
rabbitmqctl eval 'application:start(rabbitmq_management).'

run GC on rabbit_mgmt_db:
rabbitmqctl eval '(erlang:garbage_collect(global:whereis_name(rabbit_mgmt_db)))'

status of rabbit_mgmt_db:
rabbitmqctl eval 'sys:get_status(global:whereis_name(rabbit_mgmt_db)).'

Rabbitmq mgmt DB how much memory is used:
/usr/sbin/rabbitmqctl status | grep mgmt_db
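
A related check along the same lines (this assumes the mgmt DB process is still
registered globally as rabbit_mgmt_db, as the commands above imply):

how much memory the rabbit_mgmt_db process itself is holding:
rabbitmqctl eval 'erlang:process_info(global:whereis_name(rabbit_mgmt_db), memory).'

which is handy for alerting on before the vm_memory_high_watermark trips.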

Unfortunately I didn't see confirmation that an upgrade would fix this for sure,
and any settings changes to reduce the number of monitored events also require a
restart of the cluster. The other issue with an upgrade for us is the ancient
version of erlang shipped with trusty. When we upgrade to Xenial we'll upgrade
erlang and rabbit and hope it goes away. I'll also probably tweak the settings on
retention of events then too.

Also for the record the GC doesn't seem to help at all.

On Jul 5, 2016 11:05 AM, "Joshua Harlow" wrote:
Hi ops and dev-folks,

We over at godaddy (running rabbitmq with openstack) have been hitting an issue
that has been causing the `rabbit_mgmt_db` to consume nearly all of the process's
memory (after a given amount of time).

We've been thinking that this bug (or bugs?) may have existed for a while and that
our dual-version path (where we upgrade the control plane and then
slowly/eventually upgrade the compute nodes to the same version) has somehow
triggered this memory-leaking bug/issue, since it has happened most prominently
on our cloud which was running nova-compute at kilo and the other services at
liberty (thus using the versioned objects code path more frequently due to
needing translations of objects).

The rabbit we are running is 3.4.0 on CentOS Linux release 7.2.1511 with kernel 
3.10.0-327.4.4.el7.x86_64 (do note that upgrading to 3.6.2 seems to make the 
issue go away),

# rpm -qa | grep rabbit

rabbitmq-server-3.4.0-1.noarch

The logs that seem relevant:

```
**
*** Publishers will be blocked until this alarm clears ***
**

=INFO REPORT 1-Jul-2016::16:37:46 ===
accepting AMQP connection <0.23638.342> 
(127.0.0.1:51932 -> 
127.0.0.1:5671)

=INFO REPORT 1-Jul-2016::16:37:47 ===
vm_memory_high_watermark clear. Memory used:29910180640 allowed:47126781542
```

This happens quite often; the crashes have been affecting our cloud over the
weekend (which made some dev/ops not so happy, especially due to the july 4th
mini-vacation).

Looking to see if anyone else has seen anything similar?

For those interested, this is the upstream bug/mail thread where I'm also trying
to get confirmation from the upstream users/devs (it also has erlang crash dumps
attached/linked):

https://groups.google.com/forum/#!topic/rabbitmq-users/FeBK7iXUcLg

Thanks,

-Josh



Re: [openstack-dev] [ironic] using ironic as a replacement for existing datacenter baremetal provisioning

2016-06-07 Thread Kris G. Lindgren
Replying to a digest so sorry for the copy and pastes….


>> There's also been discussion of ways we could do ad-hoc changes in RAID 
>> level,
>> based on flavor metadata, during the provisioning process (rather than ahead 
>> of
>> time) but no code has been done for this yet, AFAIK.
>
> I'm still pretty interested in it, because I agree with anything said
> above about building RAID ahead-of-time not being convenient. I don't
> quite understand how such a feature would look like, we might add it as
> a topic for midcycle.

This sounds like an interesting/acceptable way to handle this problem to me.  
Update the node to set the desired raid state from the flavor.

>> - Inspection is geared towards using a different network and dnsmasq

>> infrastructure than what is in use for ironic/neutron.  Which also means 
>> that in
>> order to not conflict with dhcp requests for servers in ironic I need to use
>> different networks.  Which also means I now need to handle swinging server 
>> ports
>> between different networks.
>
> Inspector is designed to respond only to requests for nodes in the inspection
> phase, so that it *doesn't* conflict with provisioning of nodes by Ironic. 
> I've
> been using the same network for inspection and provisioning without issue -- 
> so
> I'm not sure what problem you're encountering here.

So I was mainly thinking about the use case of using inspector to onboard
unknown hosts into ironic (though I see I didn't mention that).  In a
datacenter we are always onboarding servers.  Right now we boot a linux agent
that "inventories" the box and adds it to our management system as a node that
can be consumed by a build request.  My understanding is that inspector
supports this as of Mitaka.  However, the install guide for inspection states
that you need to install its own dnsmasq instance for inspection.  To me this
implies that this is supposed to be a separate network, as if I have 2 dhcp
servers running on the same L2 network I am going to get races between the 2
dhcp servers for normal provisioning activities – especially if one dhcp server
is configured to respond to everything (so that it can onboard unknown
hardware) and the other only to specific hosts (ironic/neutron).  The only way
that wouldn't be an issue is if both inspector and ironic/neutron are using the
same dhcp servers.  Or am I missing something?
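
To make the single-network case concrete, the only way I can see it working is if
inspector's dnsmasq is explicitly told to ignore every MAC that ironic already
knows about, something like the sketch below (the range and MACs are made up, and
keeping that ignore list in sync with ironic is exactly the part that worries me):

# inspector's dnsmasq.conf
dhcp-range=10.20.30.100,10.20.30.200,12h
# nodes already enrolled in ironic - never answer these
dhcp-host=52:54:00:aa:bb:01,ignore
dhcp-host=52:54:00:aa:bb:02,ignore

Otherwise I don't see how two DHCP servers on the same L2 avoid racing each other.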

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


[openstack-dev] [ironic] using ironic as a replacement for existing datacenter baremetal provisioning

2016-06-06 Thread Kris G. Lindgren
Hi ironic folks,
As I'm trying to explore how GoDaddy can use ironic, I've created the following
in an attempt to document some of my concerns, and I'm wondering if you folks
could help me identify ongoing work to solve these (or alternatives?).
List of concerns with ironic:

1.) Nova <-> ironic interactions generally seem terrible?
  - How to accept raid config and partitioning(?) from end users?  There seems to
be no agreed-upon method yet between nova/ironic.
  - How to run multiple conductors/nova-computes?  Right now, as far as I can
tell, all of ironic is fronted by a single nova-compute, which I will have to
manage via a cluster technology between two or more nodes.  Because of this and
the way host-aggregates work I am unable to expose fault domains for ironic
instances (all of ironic can only be under a single AZ (the az that is assigned
to the nova-compute node)), unless I create multiple nova-compute servers and
manage multiple independent ironic setups.  This makes onboarding/querying of
hardware capacity painful.
  - Nova appears to be forcing a "we are 'compute' as long as 'compute' means VMs"
view, which means that we will have a baremetal flavor explosion (i.e. the
mismatch between baremetal and VMs).
  - This is a feeling I got from the ironic-nova cross project meeting in
Austin.  A general example goes back to the raid config above: I can configure a
single piece of hardware many different ways, but to fit into nova's world view
I need to have many different flavors exposed to the end user.  In this way many
flavors can map back to a single piece of hardware with just a slightly
different configuration applied.  So how am I supposed to do a single server with
6 drives as either: Raid 1 + Raid 5, Raid 5, Raid 10, Raid 6, or JBOD?  Seems
like I would need to pre-mark out servers that were going to be a specific raid
level, which means that I need to start managing additional sub-pools of
hardware just to deal with how the end user wants the raid configured; this is
pretty much a non-starter for us.  I have not really heard of what's being done
on this specific front.

2.) Inspector:
  - The IPA service doesn't gather port/switching information.
  - The inspection service doesn't process port/switching information, which means
that it won't add it to ironic.  This makes managing network swinging of the
host a non-starter, as I would inspect the host – then modify the ironic
record to add the details about what port/switch the server is connected to
from a different source.  At that point why wouldn't I just onboard everything
through the API?
  - It doesn't grab hardware disk configurations; if the server has multiple raids
(r1 + r5) it only reports the boot raid disk capacity.
  - Inspection is geared towards using a different network and dnsmasq
infrastructure than what is in use for ironic/neutron, which also means that
in order to not conflict with dhcp requests for servers in ironic I need to use
different networks, which also means I now need to handle swinging server
ports between different networks.

3.) IPA image:
  - The default build stuff is pinned to extremely old versions due to gate
failure issues, so I cannot work without a fork for onboarding of servers, due to
the fact that IPMI modules aren't built for the kernel, so inspection can never
match the node against ironic.  Seems like the current functionality here is the
MVP for the gate to work and to deploy images.  But if you need to do firmware,
bios-config, or any other hardware-specific features you are pretty much going to
need to roll your own IPA image and IPA modules to do standard provisioning
tasks.

4.) Conductor:
  - Serial-over-lan consoles require a unique port on the conductor server (I
have seen proposals to try and fix this?); this is painful to manage with large
numbers of servers.
  - SOL consoles aren't restarted when the conductor is restarted (I think this
might be fixed in newer versions of ironic?); again, if end users aren't supposed
to consume ironic APIs directly, this is painful to handle.
  - It's very easy to get a node to fall off the state-machine rails (reboot a
server while an image is being deployed to it); the only way I have seen to be
able to fix this is to update the DB directly.
  - As far as I can tell shell-in-a-box and SOL consoles aren't supported via nova
– so how are end users supposed to consume the shell-in-a-box console?
  - I have BMCs that need specific configuration (some require SOL on com2,
others on com1); this makes it pretty much impossible without per-box overrides
against the conductor's hardcoded templates.
  - Additionally, it would be nice to default to having a provisioning
kernel/image that was set as a single config option with per-server overrides –
rather than on each server.  If we ever change the IPA image, that means at
scale we would need to update thousands of ironic nodes.

What is ironic doing to monitor the hardware for failures?  I assume the answer 
here is nothing and that we will 

Re: [openstack-dev] [Openstack-operators] [barbican]barbican github installation failing

2016-05-10 Thread Kris G. Lindgren
uWSGI is a way to run the API portion of a Python code base.  You most likely
need to install uwsgi for your operating system.

http://uwsgi-docs.readthedocs.io/en/latest/
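
Something along these lines should get you going (package names vary by distro,
so treat these as examples):

# Ubuntu/Debian
sudo apt-get install uwsgi uwsgi-plugin-python

# or install it into whatever python environment barbican is using
pip install uwsgi
uwsgi --version

Once the uwsgi binary is on your PATH, ./bin/barbican.sh should be able to start
the API.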

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: Akshay Kumar Sanghai
Date: Tuesday, May 10, 2016 at 11:15 AM
To: "OpenStack Development Mailing List (not for usage questions)", openstack-operators
Subject: [Openstack-operators] [openstack-dev][barbican]barbican github 
installation failing

Hi,
I have a 4 node working setup of openstack (1 controller, 1 network node, 2
compute nodes).
I am trying to use the ssl offload feature of lbaas v2. For that I need tls
containers, hence barbican.
I did a git clone of the barbican repo from https://github.com/openstack/barbican
and then ran ./bin/barbican.sh install.
I am getting this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/mock/mock.py", line 1305, in patched
return func(*args, **keywargs)
  File "barbican/tests/queue/test_keystone_listener.py", line 327, in 
test_should_wait
msg_server = keystone_listener.MessageServer(self.conf)
  File "barbican/queue/keystone_listener.py", line 156, in __init__
endpoints=[self])
  File "barbican/queue/__init__.py", line 112, in get_notification_server
allow_requeue)
TypeError: __init__() takes exactly 3 arguments (5 given)
Ran 1246 tests in 172.776s (-10.533s)
FAILED (id=1, failures=4, skips=4)
error: testr failed (1)
Starting barbican...
./bin/barbican.sh: line 57: uwsgi: command not found

Please help me.

Thanks
Akshay


Re: [openstack-dev] [Openstack-operators] [keystone] RBAC usage at production

2015-12-09 Thread Kris G. Lindgren
In other projects the policy.json file is read on each API request, so
changes to the file take effect immediately.  I was 90% sure keystone was the
same way?
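
An easy way to convince yourself is to tighten a rule in a service's policy.json
and then immediately hit the API again as an affected (non-admin) user, with no
restart in between.  A rough sketch, with the rule and user made up:

# edit e.g. /etc/nova/policy.json and change a rule to "rule:admin_api"
# then, as a plain member of the project:
nova list        # should now return a 403 if the change took effect

That has matched the read-per-request behaviour for the services I have tried it
on; I have not personally verified keystone.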

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy







On 12/9/15, 1:39 AM, "Oguz Yarimtepe" wrote:

>Hi,
>
>I am wondering whether there are people using RBAC in production. The 
>policy.json file has a structure that requires a restart of the service 
>each time you edit the file. Is there an on-the-fly solution or tips 
>about it?
>
>
>


Re: [openstack-dev] [Performance][Proposal] Moving IRC meeting from 15:00 UTC to 16:00 UTC

2015-12-04 Thread Kris G. Lindgren
+1

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: Dina Belova
Date: Friday, December 4, 2015 at 2:46 AM
To: OpenStack Development Mailing List, openstack-operators
Subject: [Performance][Proposal] Moving IRC meeting from 15:00 UTC to 16:00 UTC

Dear performance folks,

There is a suggestion to move our meeting time from 15:00 UTC (Tuesdays) to
16:00 UTC (also Tuesdays) to make them more comfortable for US guys.

Please leave your +1 / -1 here in the email thread.

Btw +1 from me :)

Cheers,
Dina


[openstack-dev] [nova] [openstack-operators] Tools to move instances between projects?

2015-12-02 Thread Kris G. Lindgren
Hello,

I was wondering if someone has a set of tools/code that would allow admins to move
vm's from one tenant to another?  We get asked this fairly frequently in our
internal cloud (at least once a week, more when we start going through and
cleaning up resources for people who are no longer with the company).   I have
searched and was unable to find anything externally.

Matt Riedemann pointed me to an older nova spec:
https://review.openstack.org/#/c/105367/.  I realize that this will
most likely need to be a cross-project effort, since vm's consume resources
from multiple other projects, and to move a VM between projects would also
require that those other resources get updated as well.

Is anyone aware of a cross project spec to handle this – or of specs in other 
projects?
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


Re: [openstack-dev] [Openstack-operators] [nova] [openstack-operators] Tools to move instances between projects?

2015-12-02 Thread Kris G. Lindgren
I can describe our specific use cases; not sure the same limitations apply to
everyone.

Every developer in our company has a project created for them (user-<username>);
they are allowed to spin up 5 vm's in this project to do dev/test/POC/whatever.
These projects are not tied into the show-back or usage tracking that is done
internally for orgs.  It's simply done to allow any dev to have immediate access
to servers so that they can test out ideas/try something, etc.  Actual
applications/teams create projects.  Resources used in these projects are handled
on a show-back model to allow us to move fake money around to help purchase
capacity for the cloud.  We are moving to a lease model for user-<username>
projects, where we automatically, unless action is taken by the user, reclaim
those resources after x number of days.  Additionally, every so often we clean up
projects that are tied to users that are no longer with the company.  It's during
these actions that we usually find people asking if we can transfer vm's from one
project to another project.  Only the employee has access to their
user-<username> project within openstack.

For us - we don't allow snapshots in our private cloud.  We encourage all of
our devs to be able to rebuild any vm that is running in the cloud at any time,
which is the line we have been touting for these requests.  However, we would
still like to be able to support their requests.  Additionally, all of our vm's
are joined to a domain (both linux and windows); taking a snapshot of the
server and trying to spin up a replacement is problematic with servers
joined to the domain - specifically windows.  It also doesn't take care of
floating ip's, applied security group rules, volumes that are mapped, etc.

Taking a snapshot of the vm, making it public, booting another vm from that
snapshot, and deleting the old vm and the snapshot is pretty heavy handed... when
we really just need to update in nova which project the vm falls under.

We have also had people who didn't pay attention to which project they created 
vm's under and asked us later if we could move the vm from tenant x to tenant y.

We try to have cattle, but people, apparently, really like cows as pets.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy






On 12/2/15, 3:50 PM, "Matt Riedemann" <mrie...@linux.vnet.ibm.com> wrote:

>
>
>On 12/2/2015 2:52 PM, Kris G. Lindgren wrote:
>> Hello,
>>
>> I was wondering if someone has a set of tools/code to work allow admins
>> to move vm's from one tenant to another?  We get asked this fairly
>> frequently in our internal cloud (atleast once a week, more when we
>> start going through and cleaning up resources for people who are no
>> longer with the company).   I have searched and was unable to find
>> anything externally.
>>
>> Matt Riedemann pointed me to an older spec for nova :
>> https://review.openstack.org/#/c/105367/ for nova.  I realize that this
>> will most likely need to be a cross projects effort.  Since vm's consume
>> resources for multiple other projects, and to move a VM between projects
>> would also require that those other resources get updated as well.
>>
>> Is anyone aware of a cross project spec to handle this – or of specs in
>> other projects?
>> ___
>> Kris Lindgren
>> Senior Linux Systems Engineer
>> GoDaddy
>>
>>
>>
>
>I think we need a good understanding of what the use case is first. I 
>have to assume that these are pets and that's why we can't just snapshot 
>an instance and then the new user/project can boot an instance from that.
>
>Quotas are going to be a big issue here I'd think, along with any 
>orchestration that nova would need to do with other services like 
>cinder/glance/neutron to transfer ownership of volumes or network 
>resources (ports), and those projects also have their own quota frameworks.
>
>-- 
>
>Thanks,
>
>Matt Riedemann
>
>


Re: [openstack-dev] [Openstack-operators] [logs] Neutron not logging user information on wsgi requests by default

2015-11-19 Thread Kris G. Lindgren
Sorry, I missed this earlier.

I was in no way meaning to suggest running devstack for production.  I was
asking that operators look at their wsgi logs and see if they are logging a
username/tenant for Neutron requests, and, if not, providing a way to fix that
(which I happened to take from DevStack).  We had an issue where we tried to
trace an event that happened back to a user and it's impossible without that
logging being in place.  Indirectly, I was also pointing out that if the
devstack config modifies the logging context, that changed context should be used
to make sure that logging is consistently happening across projects.  It becomes
problematic when people don't override the same context in their configs and
end up wondering why devstack logs usernames/tenants but their install doesn't.
Additionally, people may not even realize it's happening until they need to
use the logs to figure out who did something.

Hi,

I started looking into the issue in Liberty cycle already. The reason why
devstack messes with the config option for multiple projects is to replace
user and project ids with their names. The fix we came up with oslo team is
to add a new configuration option for oslo.log that would allow to request
names without replacing the format string:
https://review.openstack.org/#/c/218139/

Once it’s in, we will make devstack set it, but otherwise avoid messing
with oslo.log config options. In that way, any updates for the format
string in oslo.log will propagate to all services.

It seems that you suggest you use devstack for production though. I would
like to note that it’s not designed for production but for developers only.
You should generally use something more stable and more secure than
devstack.

Ihar

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


[openstack-dev] [nova] [Openstack-operators] Profiling nova-conductor and eventlet

2015-11-19 Thread Kris G. Lindgren
Calling all profilers!

I am running into an issue with CPU usage on remote nova-conductor and I am
trying to profile it to see where it's consuming the most amount of cpu, so that
we can investigate further.  The etherpad where we have been working on this
issue is located at:
https://etherpad.openstack.org/p/remote-conductor-performance

I tried running nova-conductor under cProfile with 20 workers; however, it only
saw the main thread.  So I started the conductor with a single worker and cProfile
was able to see more requests.
I ran the following:

  *   python -m cProfile -o conductor-1worker /usr/bin/nova-conductor

  *   ./gprof2dot/gprof2dot.py -f pstats conductor-1worker | dot -Tsvg -o 
conductor-1worker.svg

  *   SVG output here: http://tempsend.com/5340867576/F283/conductor1worker.svg
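
One thing that did help a bit when digging through the dump, and which doesn't
involve eventlet at all, is the interactive pstats browser from the standard
library (a sketch; the sort key below is the Python 2.7 name):

python -m pstats conductor-1worker
# then at the prompt: sort cumulative, followed by stats 20

Of course that only tells you about whatever cProfile managed to see in the first
place.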

The SVG showed that most of the time was spent in multiple different locations,
but AMQP seemed to be the biggest consumer of time.  However, I am reading that
cProfile can get confused under eventlet, so I am wondering if the cProfile output
can be trusted.  I tried using greenletprofiler; however, that started to cause
errors to be thrown in the conductor logs around connecting to rabbitmq.

So what I am asking is: for those of you who have experience debugging python with
eventlet, what tooling do you use?
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


[openstack-dev] [Openstack-operators] [logs] Neutron not logging user information on wsgi requests by default

2015-11-06 Thread Kris G. Lindgren
Hello all,

I noticed the other day that in our Openstack install (Kilo), Neutron seems to be
the only project that was not logging the username/tenant information on every
wsgi request.  Nova/Glance/Heat all log a username and/or project on each
request.  Our wsgi logs from neutron look like the following:

2015-11-05 13:45:24.302 14549 INFO neutron.wsgi 
[req-ab633261-da6d-4ac7-8a35-5d321a8b4a8f ] 10.224.48.132 - - [05/Nov/2015 
13:45:24]
"GET /v2.0/networks.json?id=2d5fe344-4e98-4ccc-8c91-b8064d17c64c HTTP/1.1" 200 
655 0.027550

I did a fair amount of digging, and it seems that devstack is by default
overriding the context log format for neutron to add the username/tenant
information into the logs.  (There is active work to remove this
override from devstack[1].)  Regardless, using the devstack approach I was able to
true up our neutron wsgi logs to be in line with what the other services provide.

If you add:
loggin_context_format_string = %(asctime)s.%(msecs)03d %(levelname)s %(name)s 
[%(request_id)s %(user_name)s %(project_name)s] %(instance)s%(message)s

To the [DEFAULT] section of neutron.conf and restart neutron-server.  You will 
now get log output like the following:

 2015-11-05 18:07:31.033 INFO neutron.wsgi 
[req-ebf1d3c9-b556-48a7-b1fa-475dd9df0bf7  ] 10.224.48.132 - - [05/Nov/2015 18:07:31]
"GET /v2.0/networks.json?id=55e1b92a-a2a3-4d64-a2d8-4b0bee46f3bf HTTP/1.1" 200 
617 0.035515

So go forth, check your logs, and, before you need to use your logs to debug who
did what, when, and where, get the information that you need added to the wsgi
logs.  If you are not seeing wsgi logs for your projects, try enabling
verbose=true in the [DEFAULT] section as well.

Adding [logs] tag since it would be nice to have all projects logging to a 
standard wsgi format out of the gate.

[1] - https://review.openstack.org/#/c/172510/2
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


Re: [openstack-dev] [Openstack-operators] [logs] Neutron not logging user information on wsgi requests by default

2015-11-06 Thread Kris G. Lindgren
Fixes to below:

logging_context_format_string is the correct config option name.

The exact link that I wanted for [1] below is actually: 
https://review.openstack.org/#/c/172508/2/lib/neutron-legacy

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren" <klindg...@godaddy.com<mailto:klindg...@godaddy.com>>
Date: Friday, November 6, 2015 at 10:27 AM
To: 
"openstack-dev@lists.openstack.org<mailto:openstack-dev@lists.openstack.org>" 
<openstack-dev@lists.openstack.org<mailto:openstack-dev@lists.openstack.org>>
Subject: [Openstack-operators] [logs] Neutron not logging user information on 
wsgi requests by default

Hello all,

I noticed the other day that in our Openstack install (Kilo), Neutron seems to be
the only project that was not logging the username/tenant information on every
wsgi request.  Nova/Glance/Heat all log a username and/or project on each
request.  Our wsgi logs from neutron look like the following:

2015-11-05 13:45:24.302 14549 INFO neutron.wsgi 
[req-ab633261-da6d-4ac7-8a35-5d321a8b4a8f ] 10.224.48.132 - - [05/Nov/2015 
13:45:24]
"GET /v2.0/networks.json?id=2d5fe344-4e98-4ccc-8c91-b8064d17c64c HTTP/1.1" 200 
655 0.027550

I did a fair amount of digging, and it seems that devstack is by default
overriding the context log format for neutron to add the username/tenant
information into the logs.  (There is active work to remove this
override from devstack[1].)  Regardless, using the devstack approach I was able to
true up our neutron wsgi logs to be in line with what the other services provide.

If you add:
loggin_context_format_string = %(asctime)s.%(msecs)03d %(levelname)s %(name)s 
[%(request_id)s %(user_name)s %(project_name)s] %(instance)s%(message)s

To the [DEFAULT] section of neutron.conf and restart neutron-server.  You will 
now get log output like the following:

 2015-11-05 18:07:31.033 INFO neutron.wsgi 
[req-ebf1d3c9-b556-48a7-b1fa-475dd9df0bf7  ] 10.224.48.132 - - [05/Nov/2015 18:07:31]
"GET /v2.0/networks.json?id=55e1b92a-a2a3-4d64-a2d8-4b0bee46f3bf HTTP/1.1" 200 
617 0.035515

So go forth, check your logs, and, before you need to use your logs to debug who
did what, when, and where, get the information that you need added to the wsgi
logs.  If you are not seeing wsgi logs for your projects, try enabling
verbose=true in the [DEFAULT] section as well.

Adding [logs] tag since it would be nice to have all projects logging to a 
standard wsgi format out of the gate.

[1] - https://review.openstack.org/#/c/172510/2
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy


Re: [openstack-dev] [Openstack-operators] [nova] Min libvirt for Mitaka is 0.10.2 and suggest Nxxx uses 1.1.1

2015-10-07 Thread Kris G. Lindgren
Please see inline.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy







On 10/7/15, 6:12 AM, "Tim Bell" wrote:

>
>
>> -Original Message-
>> From: Daniel P. Berrange [mailto:berra...@redhat.com]
>> Sent: 07 October 2015 13:25
>> To: Tim Bell 
>> Cc: Sean Dague ; OpenStack Development Mailing List
>> (not for usage questions) ; openstack-
>> operat...@lists.openstack.org
>> Subject: Re: [Openstack-operators] [openstack-dev] [nova] Min libvirt for
>> Mitaka is 0.10.2 and suggest Nxxx uses 1.1.1
>>
>> On Wed, Oct 07, 2015 at 11:13:12AM +, Tim Bell wrote:
>> >
>> > Although Red Hat is no longer supporting RHEL 6 after Icehouse, a
>> > number of users such as GoDaddy and CERN are using Software
>> > Collections to run the Python 2.7 code.
>>
>> Do you have any educated guess as to when you might switch to deploying
>> new OpenStack version exclusively on RHEL 7 ? I understand such a switch is
>> likely to take a while so you can test its performance and reliability and 
>> so on,
>> but I'm assuming you'll eventually switch ?
>>
>
>I think we'll be all 7 by spring next year (i.e. when we install Liberty). The 
>software collections work is not for the faint hearted and 7 brings lots of 
>good things with it for operations so we want to get there as soon as 
>possible. Thus, I think we'd be fine with a change in Mitaka (especially given 
>the points you mention below).

Like CERN, we don't currently plan on doing the software collections + venv
trick past kilo.  We plan on having all of our HVs running CentOS 7+ before we
move to liberty.  That said, Liberty should still technically work under CentOS
6...

I am ok dropping support for RHEL/CentOS 6 in N.

>
>> > However, since this modification would only take place when Mitaka
>> > gets released, this would realistically give those sites a year to
>> > complete migration to RHEL/CentOS 7 assuming they are running from one
>> > of the community editions.
>> >
>> > What does the 1.1.1 version bring that is the motivation for raising
>> > the limit ?
>>
>> If we require 1.1.1 we could have unconditional support for
>>
>>  - Hot-unplug of PCI devices (needs 1.1.1)
>>  - Live snapshots (needs 1.0.0)
>>  - Live volume snapshotting (needs 1.1.1)
>>  - Disk sector discard support (needs 1.0.6)
>>  - Hyper-V clock tunables (needs 1.0.0 & 1.1.0)
>>
>> If you lack those versions, in case of hotunplug, and live volume snapshots
>> we just refuse the corresponding API call. With live snapshots we fallback 
>> to
>> non-live snapshots. For disk discard and hyperv clock we just run with
>> degraded functionality. The lack of hyperv clock tunables means Windows
>> guests will have unreliable time keeping and are likely to suffer random
>> BSOD, which I think is a particularly important issue.
>>
>> And of course we remove a bunch of conditional logic from Nova which
>> simplifies the code paths and removes code paths which rarely get testing
>> coverage.
>>
>> Regards,
>> Daniel
>> --
>> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ 
>> :|
>> |: http://libvirt.org  -o- http://virt-manager.org 
>> :|
>> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ 
>> :|
>> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc 
>> :|


Re: [openstack-dev] [magnum]swarm + compose = k8s?

2015-09-30 Thread Kris G. Lindgren
We are looking at deploying magnum as an answer for how we do containers
company-wide at Godaddy.  I am going to agree with both you and Josh.

I agree that managing one large system is going to be a pain, and past experience
tells me this won't be practical/scale; however, from experience I also know
exactly the pain Josh is talking about.

We currently have ~4k projects in our internal openstack cloud; about 1/4 of
the projects are currently doing some form of containers on their own, with
more joining every day.  If all of these projects were to convert over to the
current magnum configuration we would suddenly be attempting to
support/configure ~1k magnum clusters.  Considering that everyone will want it
HA, we are looking at a minimum of 2 kube nodes per cluster + lbaas vips +
floating ips.  From a capacity standpoint this is an excessive amount of
duplicated infrastructure to spin up in projects where people may be running
10–20 containers per project.  From an operator support perspective this is a
special level of hell that I do not want to get into.   Even if I am off by
75%, 250 still sucks.

From my point of view, an ideal use case for companies like ours (yahoo/godaddy)
would be to support hierarchical projects in magnum.  That way we could
create a project for each department, and then the subteams of those
departments can have their own projects.  We create a bay per department.
Sub-projects, if they want to, can support creation of their own bays (but
support of the kube cluster would then fall to that team).  When a sub-project
spins up a pod on a bay, minions get created inside that team's sub-project and
the containers in that pod run on the capacity that was spun up under that
project; the minions for each pod would be in a scaling group and as such
grow/shrink as dictated by load.

The above would make it so that we support a minimal, yet imho reasonable,
number of kube clusters, give people who can't/don’t want to fall in line with
the provided resources a way to make their own, and still offer a "good enough
for a single company" level of multi-tenancy.

>Joshua,
>
>If you share resources, you give up multi-tenancy.  No COE system has the
>concept of multi-tenancy (kubernetes has some basic implementation but it
>is totally insecure).  Not only does multi-tenancy have to “look like” it
>offers multiple tenants isolation, but it actually has to deliver the
>goods.

>

>I understand that at first glance a company like Yahoo may not want separate 
>bays for their various applications because of the perceived administrative 
>overhead. I would then challenge Yahoo to go deploy a COE like kubernetes 
>(which has no multi-tenancy or a very basic implementation of such) and get 
>it to work with hundreds of different competing applications. I would 
>speculate the administrative overhead of getting all that to work would be 
>greater than the administrative overhead of simply doing a bay create for the 
>various tenants.

>

>Placing tenancy inside a COE seems interesting, but no COE does that today. 
>Maybe in the future they will. Magnum was designed to present an integration 
>point between COEs and OpenStack today, not five years down the road. It's not 
>as if we took shortcuts to get to where we are.

>

>I will grant you that density is lower with the current design of Magnum >vs a 
>full on integration with OpenStack within the COE itself. However, >that model 
>which is what I believe you proposed is a huge design change to >each COE 
>which would overly complicate the COE at the gain of increased >density. I 
>personally don’t feel that pain is worth the gain.


___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ops] Operator Local Patches

2015-09-29 Thread Kris G. Lindgren
Hello All,

We have some pretty good contributions of local patches on the etherpad.  We 
are going through it right now, trying to group patches that multiple people 
are carrying along with patches that people may not be carrying but that solve 
a problem they are running into.  If you can take some time to either add your 
own local patches to the etherpad or add +1's next to the patches that are 
already laid out, it would help us immensely.

The etherpad can be found at: 
https://etherpad.openstack.org/p/operator-local-patches

Thanks for your help!

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren"
Date: Tuesday, September 22, 2015 at 4:21 PM
To: openstack-operators
Subject: Re: Operator Local Patches

Hello all,

Friendly reminder: If you have local patches and haven't yet done so, please 
contribute to the etherpad at: 
https://etherpad.openstack.org/p/operator-local-patches

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren"
Date: Friday, September 18, 2015 at 4:35 PM
To: openstack-operators
Cc: Tom Fifield
Subject: Operator Local Patches

Hello Operators!

During the ops meetup in Palo Alto we were talking about sessions for Tokyo. A 
session that I proposed, which got a bunch of +1's, was about the local patches 
that operators are carrying.  From my experience this is done to either 
implement business logic, fix assumptions in projects that do not apply to 
your implementation, implement business requirements that are not yet 
implemented in OpenStack, or fix scale-related bugs.  What I would like to do 
is get a working group together to do the following:

1.) Document local patches that operators have (even those that are in gerrit 
right now waiting to be committed upstream)
2.) Figure out commonality in those patches
3.) Either upstream the common fixes to the appropriate projects or figure out 
if a hook can be added to allow people to run their code at that specific point
4.) 
5.) Profit

To start this off, I have documented every patch that GoDaddy is running, along 
with a description of what it does and why we did it (where needed) [1].  
What I am asking is that the operator community please update the etherpad with 
the patches that you are running, so that we have a good starting point for 
discussions in Tokyo and beyond.

[1] - https://etherpad.openstack.org/p/operator-local-patches
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [ops] Operator Local Patches

2015-09-23 Thread Kris G. Lindgren

Cross-posting to the dev list as well for better coverage.

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren"
Date: Tuesday, September 22, 2015 at 4:21 PM
To: openstack-operators
Subject: Re: Operator Local Patches

Hello all,

Friendly reminder: If you have local patches and haven't yet done so, please 
contribute to the etherpad at: 
https://etherpad.openstack.org/p/operator-local-patches

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: "Kris G. Lindgren"
Date: Friday, September 18, 2015 at 4:35 PM
To: openstack-operators
Cc: Tom Fifield
Subject: Operator Local Patches

Hello Operators!

During the ops meetup in Palo Alto we were talking about sessions for Tokyo. A 
session that I proposed, which got a bunch of +1's, was about the local patches 
that operators are carrying.  From my experience this is done to either 
implement business logic, fix assumptions in projects that do not apply to 
your implementation, implement business requirements that are not yet 
implemented in OpenStack, or fix scale-related bugs.  What I would like to do 
is get a working group together to do the following:

1.) Document local patches that operators have (even those that are in gerrit 
right now waiting to be committed upstream)
2.) Figure out commonality in those patches
3.) Either upstream the common fixes to the appropriate projects or figure out 
if a hook can be added to allow people to run their code at that specific point
4.) 
5.) Profit

To start this off, I have documented every patch that GoDaddy is running, along 
with a description of what it does and why we did it (where needed) [1].  
What I am asking is that the operator community please update the etherpad with 
the patches that you are running, so that we have a good starting point for 
discussions in Tokyo and beyond.

[1] - https://etherpad.openstack.org/p/operator-local-patches
___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [Large Deployments Team][Performance Team] New informal working group suggestion

2015-09-23 Thread Kris G. Lindgren
Dina,

Do we have a place (an etherpad) to record the things we are seeing performance 
issues with?  I know we are seeing issues with CPU load under nova-conductor, as 
well as the neutron API timing out (it seems like it never responds to the 
request; there is no log entry on the neutron side).

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: Matt Van Winkle
Date: Tuesday, September 22, 2015 at 7:46 AM
To: Dina Belova, OpenStack Development Mailing List, 
"openstack-operat...@lists.openstack.org"
Subject: Re: [Openstack-operators] [Large Deployments Team][Performance Team] 
New informal working group suggestion

Thanks, Dina!

For context for the rest of the LDT folks: Dina reached out to me about working 
on this under our umbrella for now.  That makes sense until we understand whether 
it's a large enough effort to live as its own working group, because most of us 
have various performance concerns too.  So, like Public Clouds, we'll have to 
figure out how to integrate this subgroup.

I suspect the time slot for Tokyo is already packed, so the work for the 
Performance subgroup may have to be informal or in other sessions, but I'll 
start working with Tom and the folks covering the session for me (since I won't 
be able to make it) on what we might be able to do.  I've also asked Dina to 
join the Oct meeting prior to the Summit so we can further discuss the sub team.

Thanks!
VW

From: Dina Belova
Date: Tuesday, September 22, 2015 7:57 AM
To: OpenStack Development Mailing List, "openstack-operat...@lists.openstack.org"
Subject: [Large Deployments Team][Performance Team] New informal working group 
suggestion

Hey, OpenStackers!

I'm writing to propose organising a new informal team to work specifically on 
OpenStack performance issues. This will be a sub-team of the already existing 
Large Deployments Team, and I suppose it will be a good idea to gather people 
interested in OpenStack performance in one room, identify what issues are 
worrying contributors and what can be done, and share the results of 
performance research :)

So please volunteer to take part in this initiative. I hope many people will be 
interested, and we'll be able to use a cross-project session slot to meet in 
Tokyo and hold a kick-off meeting.

I would like to apologise for writing to two mailing lists at the same time, 
but I want to make sure that all potentially interested people notice the 
email.

Thanks and see you in Tokyo :)

Cheers,
Dina

--

Best regards,

Dina Belova

Senior Software Engineer

Mirantis Inc.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Make libguestfs available on pypi

2015-07-29 Thread Kris G. Lindgren
We are packaging nova in a venv so that we can run some Kilo code on top of 
some CentOS 6 nodes, where the default Python install is 2.6.  (We are also 
working on replacing the CentOS 6 nodes with a newer OS, but when you have a 
large number of machines, things take time.)  We are using the python27 
software collection, and pretty much everything is working.  The issue is that 
libguestfs cannot be installed into a venv via normal means (pip install).  I 
would like to request that libguestfs be added to PyPI.

The following bug was created over a year ago [1], and it looks like most of 
the work on the libguestfs side is already done [2].  Per the bug report, the 
remaining concern seems to be about licensing.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1075594
[2] - 
https://github.com/libguestfs/libguestfs/commit/fcbfc4775fa2a44020974073594a745ca420d614
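
For anyone stuck in the same place, below is a minimal sketch of the interim 
workaround we are left with: linking the distro-packaged bindings into the venv. 
It assumes the venv interpreter matches the Python the system bindings were 
built against (which is exactly what breaks on CentOS 6 with the python27 
collection), and the paths are assumptions for a 64-bit EL box:

# Interim workaround sketch, not a substitute for a proper PyPI release:
# expose the distro-packaged guestfs bindings inside a virtualenv by
# symlinking them into the venv's site-packages.  This only works when the
# venv interpreter is the same major/minor Python the bindings were compiled
# for -- a 2.6-built libguestfsmod.so will not import under a python27 venv,
# which is why a pip-installable libguestfs would help.

import glob
import os
import sys

# Assumed locations for a 64-bit EL system; adjust as needed.
SYSTEM_SITE = "/usr/lib64/python%d.%d/site-packages" % sys.version_info[:2]
VENV_SITE = os.path.join(sys.prefix, "lib",
                         "python%d.%d" % sys.version_info[:2], "site-packages")


def link_guestfs():
    """Symlink guestfs.py and libguestfsmod*.so from the system into the venv."""
    linked = []
    for pattern in ("guestfs*.py*", "libguestfsmod*.so*"):
        for path in glob.glob(os.path.join(SYSTEM_SITE, pattern)):
            dest = os.path.join(VENV_SITE, os.path.basename(path))
            if not os.path.exists(dest):
                os.symlink(path, dest)
                linked.append(dest)
    return linked


if __name__ == "__main__":
    print("\n".join(link_guestfs()) or "nothing linked - check SYSTEM_SITE")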



Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

2015-06-17 Thread Kris G. Lindgren

On 6/17/15, 10:59 AM, Neil Jerram neil.jer...@metaswitch.com wrote:



On 17/06/15 16:17, Kris G. Lindgren wrote:
 See inline.
 

 Kris Lindgren
 Senior Linux Systems Engineer
 GoDaddy, LLC.



 On 6/17/15, 5:12 AM, Neil Jerram neil.jer...@metaswitch.com wrote:

 Hi Kris,

 Apologies in advance for questions that are probably really dumb - but
 there are several points here that I don't understand.

 On 17/06/15 03:44, Kris G. Lindgren wrote:
 We are doing pretty much the same thing - but in a slightly different way.
 We extended the nova scheduler to help choose networks (IE. don't put
 vm's on a network/host that doesn't have any available IP address).

 Why would a particular network/host not have any available IP address?

   If a created network has 1024 ip's on it (/22) and we provision 1020 vms,
   anything deployed after that will not have an additional ip address because
   the network doesn't have any available ip addresses (we lose some ip's to
   the network itself).

OK, thanks, that certainly explains the particular network possibility.

So I guess this applies where your preference would be for network A,
but it would be OK to fall back to network B, and so on.  That sounds
like it could be a useful general enhancement.

(But, if a new VM absolutely _has_ to be on, say, the 'production'
network, and the 'production' network is already fully used, you're
fundamentally stuck, aren't you?)

Yes - this would be a scheduling failure - and I am ok with that.  It does
no good to have a vm on a network that doesn't work.


What about the /host part?  Is it possible in your system for a
network to have IP addresses available, but for them not to be usable on
a particular host?

Yes, this is also a possibility: the network allocated to a set of hosts has
IP's available, but there is no compute capacity left to spin up vms on those
hosts.  Again - I am ok with this.
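
To make the behaviour we are describing easier to follow, here is a small
self-contained Python sketch of the decision logic. It mirrors the idea of our
out-of-tree scheduler filter, not its actual code; the aggregate metadata key
and the data shapes are assumptions:

# Self-contained illustration of the logic discussed in this thread: a VM only
# lands on a host whose host-aggregate metadata names a neutron network that
# still has free IP addresses.  This mirrors the concept of the out-of-tree
# nova scheduler filter, not its real code; key names and shapes are assumed.

def pick_network(aggregate_metadata, free_ips_by_network, requested=None):
    """Return a usable network for this host, or None (a scheduling failure)."""
    # e.g. aggregate_metadata = {"networks": "public-1,public-2"}
    supported = set(aggregate_metadata.get("networks", "").split(","))
    candidates = [requested] if requested else sorted(supported)
    for net in candidates:
        # Usable only if this host's L2 actually carries the network *and*
        # the network still has addresses left.
        if net in supported and free_ips_by_network.get(net, 0) > 0:
            return net
    return None


free_ips = {"public-1": 0, "public-2": 37}
print(pick_network({"networks": "public-1,public-2"}, free_ips))  # public-2
print(pick_network({"networks": "public-1"}, free_ips))           # None -> NoValidHost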


 Then,
 we add into the host-aggregate that each HV is attached to a network
 metadata item which maps to the names of the neutron networks that host
 supports.  This basically creates the mapping of which host supports what
 networks, so we can correctly filter hosts out during scheduling.  We do
 allow people to choose a network if they wish and we do have the neutron
 end-point exposed.  However, by default if they do not supply a boot
 command with a network, we will filter the networks down and choose one
 for them.  That way they never hit [1].  This also works well for us,
 because the default UI that we provide our end-users is not horizon.

 Why do you define multiple networks - as opposed to just one - and why
 would one of your users want to choose a particular one of those?

 (Do you mean multiple as in public-1, public-2, ...; or multiple as in
 public, service, ...?)

   This is answered in the other email and the original email as well.  But
   basically we have multiple L2 segments that only exist on certain switches
   and thus are only tied to certain hosts.  With the way neutron is currently
   structured we need to create a network for each L2.  So that's why we define
   multiple networks.

Thanks!  Ok, just to check that I really understand this:

- You have real L2 segments connecting some of your compute hosts
together - and also I guess to a ToR that does L3 to the rest of the
data center.

Correct.



- You presumably then just bridge all the TAP interfaces, on each host,
to the host's outwards-facing interface.

                  +-- VM
                  |
        +-- Host -+-- VM
        |         |
        |         +-- VM
        |
        |         +-- VM
        |         |
ToR ----+-- Host -+-- VM
        |         |
        |         +-- VM
        |
        |         +-- VM
        |         |
        +-- Host -+-- VM
                  |
                  +-- VM

Also correct, we are using flat provider networks (shared=true) -
however provider vlan networks would work as well.


- You specify each such setup as a network in the Neutron API - and
hence you have multiple similar networks, for your data center as a whole.

Out of interest, do you do this just because it's the Right Thing
according to the current Neutron API - i.e. because a Neutron network is
L2 - or also because it's needed in order to get the Neutron
implementation components that you use to work correctly?  For example,
so that you have a DHCP agent for each L2 network (if you use the
Neutron DHCP agent).

Somewhat both.  It was a case of "how do I get neutron to handle this without
making drastic changes to the base-level neutron concepts?"  We currently do
have dhcp-agents and the nova-metadata agent running in each L2, and we
specifically assign them to hosts in that L2 space.  We are currently working
on ways to remove this requirement.


   For our end users - they only care about getting a vm with a single ip address

Re: [openstack-dev] [Openstack-operators] [nova] [neutron] Re: How do your end users use networking?

2015-06-17 Thread Kris G. Lindgren
I didn't know about the Neutron mid-cycle being next week, but I do happen to 
live in Fort Collins, so I could easily become available if you want to talk 
face-to-face about https://bugs.launchpad.net/neutron/+bug/1458890.


Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.

From: Kyle Mestery mest...@mestery.com
Date: Wednesday, June 17, 2015 at 7:08 AM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Cc: openstack-operat...@lists.openstack.org
Subject: Re: [Openstack-operators] [openstack-dev] [nova] [neutron] Re: How do 
your end users use networking?

On Wed, Jun 17, 2015 at 1:59 AM, Armando M. arma...@gmail.com wrote:


On 16 June 2015 at 22:36, Sam Morrison sorri...@gmail.com wrote:

On 17 Jun 2015, at 10:56 am, Armando M. arma...@gmail.com wrote:



On 16 June 2015 at 17:31, Sam Morrison sorri...@gmail.com wrote:
We at NeCTAR are starting the transition to neutron from nova-net and neutron 
almost does what we want.

We have 10 "public" networks and 10 "service" networks, and depending on which 
compute node you land on you get attached to one of them.

In neutron speak we have multiple shared externally routed provider networks. 
We don’t have any tenant networks or any other fancy stuff yet.
How I’ve currently got this set up is by creating 10 networks and subsequent 
subnets eg. public-1, public-2, public-3 … and service-1, service-2, service-3 
and so on.

In nova we have made a slight change in allocate_for_instance [1] whereby each 
compute node has designated, hardcoded network_ids for the public and service 
network it is physically attached to.
We have also made changes in the nova API so users can't select a network, and 
the neutron endpoint is not registered in keystone.

That all works fine, but ideally I want a user to be able to choose whether they 
want a public and/or service network.  We can't let them right now, since we 
have 10 public networks; we almost need something in neutron like a "network 
group" that allows a user to select "public" and have it allocate them a port on 
one of the underlying public networks.

I tried going down the route of having 1 public and 1 service network in 
neutron, then creating 10 subnets under each.  That works until you get to 
things like the dhcp-agent and metadata agent, although it looks like it could 
work with a few minor changes.  Basically I need a dhcp-agent to be spun up per 
subnet, and I need to ensure they are spun up in the right place.

I'm not sure what the correct way of doing this is.  What are other people doing 
in the interim until this kind of use case can be done in Neutron?
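
For what it's worth, the per-compute-node pinning described above can be 
pictured with a small Python sketch (a rough illustration only; the hostnames 
and mapping are made up and this is not the actual NeCTAR patch):

# Rough illustration of the interim approach described above: each compute
# node is pinned to the concrete public-N / service-N networks it is cabled
# to, and a user-facing "group" name ("public", "service") is resolved to
# whichever concrete network the chosen host actually carries.  Hostnames and
# network names below are invented for the example.

NODE_NETWORKS = {
    "compute-rack1-01": {"public": "public-1", "service": "service-1"},
    "compute-rack2-01": {"public": "public-2", "service": "service-2"},
}


def resolve_networks(host, groups=("public",)):
    """Map abstract group names to the real networks available on this host."""
    try:
        return [NODE_NETWORKS[host][group] for group in groups]
    except KeyError as missing:
        raise ValueError("host %s has no network for group %s" % (host, missing))


print(resolve_networks("compute-rack2-01", ("public", "service")))
# ['public-2', 'service-2']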

Would something like [1] be adequate to address your use case? If not, I'd 
suggest you file an RFE bug (more details in [2]) so that we can keep the 
discussion focused on this specific case.

HTH
Armando

[1] https://blueprints.launchpad.net/neutron/+spec/rbac-networks

That’s not applicable in this case. We don’t care about what the tenants are in 
this case.

[2] 
https://github.com/openstack/neutron/blob/master/doc/source/policies/blueprints.rst#neutron-request-for-feature-enhancements

The bug Kris mentioned outlines all I want too I think.

I don't know what you're referring to.


Armando, I think this is the bug he's referring to:

https://bugs.launchpad.net/neutron/+bug/1458890

This is something I'd like to look at next week during the mid-cycle, 
especially since Carl is there and his spec for routed networks [2] covers a 
lot of these use cases.

[2] https://review.openstack.org/#/c/172244/


Sam





Cheers,
Sam

[1] 
https://github.com/NeCTAR-RC/nova/commit/1bc2396edc684f83ce471dd9dc9219c4635afb12



 On 17 Jun 2015, at 12:20 am, Jay Pipes jaypi...@gmail.com wrote:

  Adding -dev because of the reference to the Neutron "Get me a network" spec. 
  Also adding [nova] and [neutron] subject markers.

 Comments inline, Kris.

 On 05/22/2015 09:28 PM, Kris G. Lindgren wrote:
 During the OpenStack summit this week I got to talk to a number of other
 operators of large OpenStack deployments about how they do networking.
 I was happy, surprised even, to find that a number of us are using a
 similar type of networking strategy, that we have similar challenges
 around networking, and that we are solving them in our own but very similar ways.
 It is always nice to see that other people are doing the same things
 as you or see the same issues as you do, and that you are not crazy.
 So in that vein, I wanted to reach out to the rest of the Ops Community
 and ask one pretty simple question.

 Would it be accurate to say that most of your end users want almost