Re: [openstack-dev] [trove][all][tc] A proposal to rearchitect Trove

2017-06-23 Thread Mike Bayer



On 06/22/2017 11:59 AM, Fox, Kevin M wrote:

My $0.02.

That view of dependencies is why Kubernetes development is outpacing OpenStack's 
and some users are leaving, IMO. Not trying to be mean here, but trying to shine 
some light on this issue.

Kubernetes at its core has essentially something kind of equivalent to keystone 
(k8s rbac), nova (container mgmt), cinder (pv/pvc/storageclasses), heat with 
convergence (deployments/daemonsets/etc), barbican (secrets), designate 
(kube-dns), and octavia (kube-proxy,svc,ingress) in one unit. Ops don't have to 
work hard to get all of it, users can assume it's all there, and devs don't have 
many silos to cross to implement features that touch multiple pieces.

This core functionality being combined has allowed them to land features that 
are really important to users but have proven difficult for OpenStack to do 
because of the silos. OpenStack's general pattern has been: stand up a new 
service for a new feature, then no one wants to depend on it, so it's ignored and 
each silo reimplements a lesser version of it themselves.

The OpenStack commons then continues to suffer.

We need to stop this destructive cycle.

OpenStack needs to figure out how to increase its commons. Both internally and 
externally. etcd as a common service was a step in the right direction.


+1 to this, and it's a similar theme to my dismay a few weeks ago when I 
realized projects are looking to ditch oslo rather than improve it; 
since then I've had to chase down a totally avoidable problem in Zaqar 
that's been confusing dozens of people, because Zaqar implemented its 
database layer as direct-to-SQLAlchemy rather than using oslo.db 
(https://bugs.launchpad.net/tripleo/+bug/1691951) and missed out on some 
basic stability features that oslo.db turns on.
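For the curious, a minimal sketch of the difference being described (the connection URL is hypothetical, and this is not Zaqar's actual code): oslo.db's engine facade wraps SQLAlchemy engine/session setup and turns on the OpenStack-wide defaults -- MySQL disconnect handling, a strict SQL_MODE, sane pool settings -- that a bare create_engine() call does not.

```python
from oslo_db.sqlalchemy import session as db_session

# direct-to-SQLAlchemy (what the bug report describes):
#   engine = sqlalchemy.create_engine('mysql+pymysql://zaqar:secret@db/zaqar')

# via oslo.db: same URL, but the engine/session come preconfigured with the
# stability behaviors mentioned above.
facade = db_session.EngineFacade('mysql+pymysql://zaqar:secret@db/zaqar')
session = facade.get_session()
print(session.execute('SELECT 1').scalar())
```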


There is a balance to be struck between monolithic and expansive for 
sure, but I think the monolith-phobia may be affecting the quality of 
the product.  It is possible to have clean modularity and separation of 
concerns in a codebase while still having tighter dependencies, it just 
takes more discipline to monitor the boundaries.





I think k8s needs to be another common service all the others can rely on. That 
could greatly simplify the rest of the OpenStack projects, as a lot of that 
functionality would no longer have to be implemented in each project.

We also need a way to break down the silo walls and allow more cross project 
collaboration for features. I fear the new push for letting projects run 
standalone will make this worse, not better, further fracturing OpenStack.

Thanks,
Kevin

From: Thierry Carrez [thie...@openstack.org]
Sent: Thursday, June 22, 2017 12:58 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [trove][all][tc] A proposal to rearchitect Trove

Fox, Kevin M wrote:

[...]
If you build a Tessmaster clone just to do mariadb, then you share nothing with 
the other communities and have to reinvent the wheel, yet again. Operators' load 
increases because the tool doesn't function like other tools.

If you rely on a container orchestration engine that's already cross-cloud and 
can be easily deployed by a user or cloud operator, and fill in the gaps with 
what Trove wants to support (easy management of DBs), you get to reuse a lot of 
the commons, and the user's slight increase in investment in dealing with the bit 
of extra plumbing allows other things to also be easily added to their 
cluster. It's very rare that a user would need to deploy/manage only a database. 
The net load on the operator decreases, not increases.


I think the user-side tool could totally deploy on Kubernetes clusters
-- if that was the only possible target that would make it a Kubernetes
tool more than an open infrastructure tool, but that's definitely a
possibility. I'm not sure work is needed there though, there are already
tools (or charts) doing that ?

For a server-side approach where you want to provide a DB-provisioning
API, I fear that making the functionality depend on K8s would mean
TroveV2/Hoard would not only depend on Heat and Nova, but also on
something that would deploy a Kubernetes cluster (Magnum?), which would
likely hurt its adoption (and reusability in simpler setups). Since
databases would work perfectly well in VMs, it feels like a
gratuitous dependency addition?

We generally need to be very careful about creating dependencies between
OpenStack projects. On one side there are base services (like Keystone)
that we said it was alright to depend on, but depending on anything else
is likely to reduce adoption. Magnum adoption suffers from its
dependency on Heat. If Heat starts depending on Zaqar, we make the
problem worse. I understand it's a hard trade-off: you want to reuse
functionality rather than reinvent it in every project... we just need
to recognize the cost of doing that.

--
Thierry Carrez (ttx)


Re: [openstack-dev] [trove][all][tc] A proposal to rearchitect Trove

2017-06-20 Thread Mike Bayer



On 06/20/2017 11:45 AM, Jay Pipes wrote:

Good discussion, Zane. Comments inline.

On 06/20/2017 11:01 AM, Zane Bitter wrote:


2) The database VMs are created in a project belonging to the operator 
of the service. They're connected to the user's network through 
, and isolated from other users' databases running in the same 
project through . 
Trove has its own quota management and billing. The user cannot 
interact with the server using Nova since it is owned by a different 
project. On a cloud that doesn't include Trove, a user could run Trove 
as an application themselves, by giving it credentials for their own 
project and disabling all of the cross-tenant networking stuff.


None of the above :)

Don't think about VMs at all. Or networking plumbing. Or volume storage 
or any of that.


Think only in terms of what a user of a DBaaS really wants. At the end 
of the day, all they want is an address in the cloud where they can 
point their application to write and read data from.


Do they want that data connection to be fast and reliable? Of course, 
but how that happens is irrelevant to them


Do they want that data to be safe and backed up? Of course, but how that 
happens is irrelevant to them.


Hi, I'm just a newb trying to follow along... isn't that what #2 is 
proposing?  It's just talking about the implementation a bit.


(Guess this comes down to the terms "user" and "operator" - e.g. 
"operator" has the VMs w/ the DBs, "user" gets a login to a DB.  "user" 
is the person who pushes the trove button to "give me a database")






The problem with many of these high-level *aaS projects is that they 
consider their user to be a typical tenant of general cloud 
infrastructure -- focused on launching VMs and creating volumes and 
networks etc. And the discussions around the implementation of these 
projects always comes back to minutia about how to set up secure 
communication channels between a control plane message bus and the 
service VMs.


If you create these projects as applications that run on cloud 
infrastructure (OpenStack, k8s or otherwise), then the discussions focus 
instead on how the real end-users -- the ones that actually call the 
APIs and utilize the service -- would interact with the APIs and not the 
underlying infrastructure itself.


Here's an example to think about...

What if a provider of this DBaaS service wanted to jam 100 database 
instances on a single VM and provide connectivity to those database 
instances to 100 different tenants?


Would those tenants know if those databases were all serviced from a 
single database server process running on the VM? Or 100 containers each 
running a separate database server process? Or 10 containers running 10 
database server processes each?


No, of course not. And the tenant wouldn't care at all, because the 
point of the DBaaS service is to get a database. It isn't to get one or 
more VMs/containers/baremetal servers.


At the end of the day, I think Trove is best implemented as a hosted 
application that exposes an API to its users that is entirely separate 
from the underlying infrastructure APIs like Cinder/Nova/Neutron.


This is similar to Kevin's k8s Operator idea, which I support but in a 
generic fashion that isn't specific to k8s.


In the same way that k8s abstracts the underlying infrastructure (via 
its "cloud provider" concept), I think that Trove and similar projects 
need to use a similar abstraction and focus on providing a different API 
to their users that doesn't leak the underlying infrastructure API 
concepts out.


Best,
-jay

Of course the current situation, as Amrith alluded to, where the 
default is option (1) except without the lock-down feature in Nova, 
though some operators are deploying option (2) but it's not tested 
upstream... clearly that's the worst of all possible worlds, and AIUI 
nobody disagrees with that.


To my mind, (1) sounds more like "applications that run on OpenStack 
(or other) infrastructure", since it doesn't require stuff like the 
admin-only cross-project networking that makes it effectively "part of 
the infrastructure itself" - as evidenced by the fact that 
unprivileged users can run it standalone with little more than a 
simple auth middleware change. But I suspect you are going to use 
similar logic to argue for (2)? I'd be interested to hear your thoughts.


cheers,
Zane.




Re: [openstack-dev] [oslo.db] Stepping down from core

2017-06-12 Thread Mike Bayer

hey Roman -

It was a huge pleasure working w/ you on oslo.db!I hope we can 
collaborate again soon.


- mike



On 06/11/2017 10:32 AM, Roman Podoliaka wrote:

Hi all,

I recently changed jobs and haven't been able to devote as much time to
oslo.db as is expected from a core reviewer. I'm no longer working
on OpenStack, so you won't see me around much.

Anyway, it's been an amazing experience to work with all of you! Best
of luck! And see ya at various PyCon's around the world! ;)

Thanks,
Roman



Re: [openstack-dev] [all] etcd3 as base service - update

2017-06-09 Thread Mike Bayer



On 06/09/2017 11:12 AM, Lance Bragstad wrote:



I should have clarified. The idea was to put the keys used to encrypt 
and decrypt the tokens in etcd so that synchronizing the repository 
across a cluster for keystone nodes is easier for operators (but not 
without other operator pain as Kevin pointed out). The tokens themselves 
will remain completely non-persistent. Fernet key creation is explicitly 
controlled by operators and isn't something that end users generate.


Makes sense, and I agree that's entirely appropriate. Thanks!
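Roughly the shape of that idea, sketched with the python-etcd3 client -- the key paths are made up and this is not keystone code, just an illustration of "rotate once, every node sees the new key set":

```python
import etcd3

client = etcd3.client(host='127.0.0.1', port=2379)

def publish_key(index, key_material):
    # a rotation run on any node writes the new key under a shared prefix
    client.put('/keystone/fernet-keys/%d' % index, key_material)

def fetch_keys():
    # get_prefix() yields (value, metadata) pairs for everything under the prefix
    return [value for value, _meta in client.get_prefix('/keystone/fernet-keys/')]
```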





[0] 
https://github.com/openstack/keystone/blob/c528539879e824b8e6d5654292a85ccbee6dcf89/keystone/conf/fernet_tokens.py#L44-L54

[1] https://launchpad.net/bugs/1649616






On Thu, Jun 8, 2017 at 11:37 AM, Mike Bayer <mba...@redhat.com> wrote:

On 06/08/2017 12:47 AM, Joshua Harlow wrote:

So just out of curiosity, but do people really even know what
etcd is good for? I am thinking that there should be some
guidance from folks in the community as to where etcd should be
used and where it shouldn't (otherwise we just all end up in a
mess).


So far I've seen a proposal of etcd3 as a replacement for memcached
in keystone, and a new dogpile connector was added to oslo.cache to
handle referring to etcd3 as a cache backend.  This is a really
simplistic / minimal kind of use case for a key-store.

But, keeping in mind I don't know anything about etcd3 other than
"it's another key-store", it's the only database used by Kubernetes
as a whole, which suggests it's doing a better job than Redis in
terms of "durable".   So I wouldn't be surprised if new / existing
openstack applications express some gravitational pull towards using
it as their own datastore as well.    I'll be trying to hang onto
the etcd3 track as much as possible so that if/when that happens I
still have a job :).


Perhaps a good idea to actually give examples of how it should
be used, how it shouldn't be used, what it offers, what it
doesn't... Or at least provide links for people to read up on this.

Thoughts?

Davanum Srinivas wrote:

One clarification: Since https://pypi.python.org/pypi/etcd3gw just
uses the HTTP API (/v3alpha) it will work under both eventlet and
non-eventlet environments.

Thanks,
Dims


On Wed, Jun 7, 2017 at 6:47 AM, Davanum Srinivas <dava...@gmail.com> wrote:

Team,

Here's the update to the base services resolution from the TC:
https://governance.openstack.org/tc/reference/base-services.html

First request is to Distros, Packagers, Deployers, anyone who
installs/configures OpenStack:
Please make sure you have latest etcd 3.x available in your
environment for Services to use, Fedora already does, we need help in
making sure all distros and architectures are covered.

Any project who want to use etcd v3 API via grpc, please use:
https://pypi.python.org/pypi/etcd3 (works only for non-eventlet services)

Those that depend on eventlet, please use the etcd3 v3alpha HTTP API using:
https://pypi.python.org/pypi/etcd3gw

Re: [openstack-dev] [all] etcd3 as base service - update

2017-06-09 Thread Mike Bayer



On 06/08/2017 04:24 PM, Julien Danjou wrote:

On Thu, Jun 08 2017, Mike Bayer wrote:



So I wouldn't be surprised if new / existing openstack applications
express some gravitational pull towards using it as their own
datastore as well. I'll be trying to hang onto the etcd3 track as much
as possible so that if/when that happens I still have a job :).


Sounds like a recipe for disaster. :)


What architectural decision in any of OpenStack is *not* considered by 
some subset of folks to be a "recipe for disaster"?  :)










Re: [openstack-dev] [all] etcd3 as base service - update

2017-06-09 Thread Mike Bayer



On 06/08/2017 01:34 PM, Lance Bragstad wrote:
After digging into etcd a bit, one place this might help deployer 
experience would be the handling of fernet keys for token encryption in 
keystone. Currently, all keys used to encrypt and decrypt tokens are 
kept on disk for each keystone node in the deployment. While simple, it 
requires operators to perform rotation on a single node and then push, 
or sync, the new key set to the rest of the nodes. This must be done in 
lock step in order to prevent early token invalidation and inconsistent 
token responses.


An alternative would be to keep the keys in etcd and make the fernet 
bits pluggable so that it's possible to read keys from disk or etcd 
(pending configuration). The advantage would be that operators could 
initiate key rotations from any keystone node in the deployment (or 
using etcd directly) and not have to worry about distributing the new 
key set. Since etcd associates metadata to the key-value pairs, we might 
be able to simplify the rotation strategy as well.


Interesting, I had the misconception that "fernet" keys no longer 
required any server-side storage (how is "kept-on-disk" now 
implemented?).  We've had continuous issues with the pre-fernet 
Keystone tokens filling up databases, even when operators were correctly 
expunging old tokens; some environments just did so many requests that 
the keystone token table still blew up to where MySQL could no longer 
delete from it without producing a too-large transaction for Galera.


So after all the "finally fernet solves this problem", we propose: hey, 
let's put them *back* in the database :).  That's great.  But let's 
please not leave "cleaning out old tokens" as some kind of 
cron/worry-about-it-later thing.  That was a terrible architectural 
decision, with apologies to whoever made it.    If you're putting some 
kind of "we create an infinite, rapidly growing, 
turns-to-garbage-in-30-seconds" kind of data in a database, removing 
that data robustly and ASAP needs to be part of the process.
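What "part of the process" looks like in practice is a bounded, batched purge of expired rows, each batch in its own transaction, so Galera never has to certify one enormous DELETE. The table/column names and URL below are illustrative, not keystone's actual schema:

```python
import datetime
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://keystone:secret@db/keystone')  # hypothetical
cutoff = datetime.datetime.utcnow()

while True:
    # each batch commits on its own, keeping the transaction small
    with engine.begin() as conn:
        deleted = conn.execute(
            text("DELETE FROM token WHERE expires < :cutoff LIMIT 1000"),
            {"cutoff": cutoff}).rowcount
    if deleted < 1000:
        break
```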








On Thu, Jun 8, 2017 at 11:37 AM, Mike Bayer <mba...@redhat.com> wrote:




On 06/08/2017 12:47 AM, Joshua Harlow wrote:

So just out of curiosity, but do people really even know what
etcd is good for? I am thinking that there should be some
guidance from folks in the community as to where etcd should be
used and where it shouldn't (otherwise we just all end up in a
mess).


So far I've seen a proposal of etcd3 as a replacement for memcached
in keystone, and a new dogpile connector was added to oslo.cache to
handle referring to etcd3 as a cache backend.  This is a really
simplistic / minimal kind of use case for a key-store.

But, keeping in mind I don't know anything about etcd3 other than
"it's another key-store", it's the only database used by Kubernetes
as a whole, which suggests it's doing a better job than Redis in
terms of "durable".   So I wouldn't be surprised if new / existing
openstack applications express some gravitational pull towards using
it as their own datastore as well.I'll be trying to hang onto
the etcd3 track as much as possible so that if/when that happens I
still have a job :).





Perhaps a good idea to actually give examples of how it should
be used, how it shouldn't be used, what it offers, what it
doesn't... Or at least provide links for people to read up on this.

Thoughts?

Davanum Srinivas wrote:

One clarification: Since https://pypi.python.org/pypi/etcd3gw just
uses the HTTP API (/v3alpha) it will work under both eventlet and
non-eventlet environments.

Thanks,
Dims


On Wed, Jun 7, 2017 at 6:47 AM, Davanum Srinivas <dava...@gmail.com> wrote:

Team,

Here's the update to the base services resolution from the TC:
https://governance.openstack.org/tc/reference/base-services.html

First request is to Distros, Packagers, Deployers,
anyone who
installs/configures OpenStack:
Please make sure you have latest etcd 3.x available in your
environment for Services to use, Fedora already does, we
need help in
making sure all distros and architectures are covered.

Any project who want to use etcd v3 API via grpc, please use:
https://pypi.python.org/pypi/etcd3 (works only for non-eventlet services)

Re: [openstack-dev] [all] etcd3 as base service - update

2017-06-08 Thread Mike Bayer



On 06/08/2017 12:47 AM, Joshua Harlow wrote:
So just out of curiosity, but do people really even know what etcd is 
good for? I am thinking that there should be some guidance from folks in 
the community as to where etcd should be used and where it shouldn't 
(otherwise we just all end up in a mess).


So far I've seen a proposal of etcd3 as a replacement for memcached in 
keystone, and a new dogpile connector was added to oslo.cache to handle 
referring to etcd3 as a cache backend.  This is a really simplistic / 
minimal kind of use case for a key-store.


But, keeping in mind I don't know anything about etcd3 other than "it's 
another key-store", it's the only database used by Kubernetes as a 
whole, which suggests it's doing a better job than Redis in terms of 
"durable".   So I wouldn't be surprised if new / existing openstack 
applications express some gravitational pull towards using it as their 
own datastore as well.I'll be trying to hang onto the etcd3 track as 
much as possible so that if/when that happens I still have a job :).






Perhaps a good idea to actually give examples of how it should be used, 
how it shouldn't be used, what it offers, what it doesn't... Or at least 
provide links for people to read up on this.


Thoughts?

Davanum Srinivas wrote:

One clarification: Since https://pypi.python.org/pypi/etcd3gw just
uses the HTTP API (/v3alpha) it will work under both eventlet and
non-eventlet environments.

Thanks,
Dims


On Wed, Jun 7, 2017 at 6:47 AM, Davanum Srinivas  
wrote:

Team,

Here's the update to the base services resolution from the TC:
https://governance.openstack.org/tc/reference/base-services.html

First request is to Distros, Packagers, Deployers, anyone who
installs/configures OpenStack:
Please make sure you have latest etcd 3.x available in your
environment for Services to use, Fedora already does, we need help in
making sure all distros and architectures are covered.

Any project who want to use etcd v3 API via grpc, please use:
https://pypi.python.org/pypi/etcd3 (works only for non-eventlet 
services)


Those that depend on eventlet, please use the etcd3 v3alpha HTTP API 
using:

https://pypi.python.org/pypi/etcd3gw

If you use tooz, there are 2 driver choices for you:
https://github.com/openstack/tooz/blob/master/setup.cfg#L29
https://github.com/openstack/tooz/blob/master/setup.cfg#L30

If you use oslo.cache, there is a driver for you:
https://github.com/openstack/oslo.cache/blob/master/setup.cfg#L33

Devstack installs etcd3 by default and points cinder to it:
http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/etcd3
http://git.openstack.org/cgit/openstack-dev/devstack/tree/lib/cinder#n356 



Review in progress for keystone to use etcd3 for caching:
https://review.openstack.org/#/c/469621/

Doug is working on proposal(s) for oslo.config to store some
configuration in etcd3:
https://review.openstack.org/#/c/454897/

So, feel free to turn on / test with etcd3 and report issues.

Thanks,
Dims

--
Davanum Srinivas :: https://twitter.com/dims
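For what it's worth, this is roughly what "use etcd3 via tooz" looks like from a service's point of view; the endpoint URL, member id and lock name are made up, and the 'etcd3+http' scheme is, as I understand it, the one that maps to the eventlet-safe etcd3gw driver linked above:

```python
from tooz import coordination

# connect to the etcd3 gateway endpoint and identify this process as a member
coordinator = coordination.get_coordinator(
    'etcd3+http://127.0.0.1:2379', b'my-service-node-1')
coordinator.start()

lock = coordinator.get_lock(b'resize-instance-42')
with lock:
    # only one member of the cluster runs this block at a time
    pass

coordinator.stop()
```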








Re: [openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master

2017-05-31 Thread Mike Bayer



On 05/30/2017 09:06 PM, Jay Pipes wrote:

On 05/30/2017 05:07 PM, Clint Byrum wrote:

Excerpts from Jay Pipes's message of 2017-05-30 14:52:01 -0400:

Sorry for the delay in getting back on this... comments inline.

On 05/18/2017 06:13 PM, Adrian Turjak wrote:

Hello fellow OpenStackers,

For the last while I've been looking at options for multi-region
multi-master Keystone, as well as multi-master for other services I've
been developing and one thing that always came up was there aren't many
truly good options for a true multi-master backend.


Not sure whether you've looked into Galera? We had a geo-distributed
12-site Galera cluster servicing our Keystone assignment/identity
information WAN-replicated. Worked a charm for us at AT&T. Much easier
to administer than master-slave replication topologies and the
performance (yes, even over WAN links) of the ws-rep replication was
excellent. And yes, I'm aware Galera doesn't have complete snapshot
isolation support, but for Keystone's workloads (heavy, heavy read, very
little write) it is indeed ideal.



This has not been my experience.

We had a 3 site, 9 node global cluster and it was _extremely_ sensitive
to latency. We'd lose even read ability whenever we had a latency storm
due to quorum problems.

Our sites were London, Dallas, and Sydney, so it was pretty common for
there to be latency between any of them.

I lost track of it after some reorgs, but I believe the solution was
to just have a single site 3-node galera for writes, and then use async
replication for reads. We even helped land patches in Keystone to allow
split read/write host configuration.


Interesting, thanks for the info. Can I ask, were you using the Galera 
cluster for read-heavy data like Keystone identity/assignment storage? 
Or did you have write-heavy data mixed in (like Keystone's old UUID 
token storage...)


I'd also throw in, there's lots of versions of Galera with different 
bugfixes / improvements as we go along, not to mention configuration 
settings. If Jay observes it working great on a distributed cluster 
and Clint observes it working terribly, it could be that these were not 
the same Galera versions being used.






It should be noted that CockroachDB's documentation specifically calls 
out that it is extremely sensitive to latency due to the way it measures 
clock skew... so might not be suitable for WAN-separated clusters?


Best,
-jay



Re: [openstack-dev] [upgrades][skip-level][leapfrog] - RFC - Skipping releases when upgrading

2017-05-26 Thread Mike Bayer



On 05/26/2017 10:56 AM, Dan Smith wrote:

As most of the upgrade issues center around database migrations, we
discussed some of the potential pitfalls at length. One approach was to
roll-up all DB migrations into a single repository and run all upgrades
for a given project in one step. Another was to simply have multiple
python virtual environments and just run in-line migrations from a
version specific venv (this is what the OSA tooling does). Does one way
work better than the other? Any thoughts on how this could be better?


IMHO, and speaking from a Nova perspective, I think that maintaining a
separate repo of migrations is a bad idea. We occasionally have to fix a
migration to handle a case where someone is stuck and can't move past a
certain revision due to some situation that was not originally
understood. If you have a separate copy of our migrations, you wouldn't
get those fixes. Nova hasn't compacted migrations in a while anyway, so
there's not a whole lot of value there I think.



+1 I think it's very important that migration logic not be duplicated. 
Nova's (and everyone else's) migration files have the information on how 
to move between specific schema versions.Any concatenation of these 
into an effective "N+X" migration should be on the fly as much as is 
possible.





The other thing to consider is that our _schema_ migrations often
require _data_ migrations to complete before moving on. That means you
really have to move to some milestone version of the schema, then
move/transform data, and then move to the next milestone. Since we
manage those according to releases, those are the milestones that are
most likely to be successful if you're stepping through things.

I do think that the idea of being able to generate a small utility
container (using the broad sense of the word) from each release, and
using those to step through N, N+1, N+2 to arrive at N+3 makes the most
sense.


+1




Nova has offline tooling to push our data migrations (even though the
command is intended to be runnable online). The concern I would have
would be over how to push Keystone's migrations mechanically, since I
believe they moved forward with their proposal to do data migrations in
stored procedures with triggers. Presumably there is a need for
something similar to nova's online-data-migrations command which will
trip all the triggers and provide a green light for moving on?


I haven't looked at what Keystone is doing, but to the degree they are 
using triggers, those triggers would only impact new data operations as 
they continue to run into the schema that is straddling between two 
versions (e.g. old column/table still exists, data should be synced to 
new column/table).   If they are actually running a stored procedure to 
migrate existing data (which would be surprising to me...) then I'd 
assume that invokes just like any other "ALTER TABLE" instruction in 
their migrations.  If those operations themselves rely on the triggers, 
that's fine.
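A generic illustration of the trigger pattern being described (not keystone's actual migration; the table, column and URL are invented): while old and new columns coexist, a trigger keeps the new column populated for writes that still target the old one.

```python
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://keystone:secret@db/keystone')  # hypothetical
with engine.begin() as conn:
    # expand step: add the new column alongside the old one
    conn.execute(text("ALTER TABLE widget ADD COLUMN name_new VARCHAR(255)"))
    # keep the new column in sync for services still writing the old column
    conn.execute(text(
        "CREATE TRIGGER widget_sync_name BEFORE INSERT ON widget "
        "FOR EACH ROW SET NEW.name_new = NEW.name"))
```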


But a keystone person to chime in would be much better than me just 
making stuff up.










In the end, projects support N->N+1 today, so if you're just stepping
through actual 1-version gaps, you should be able to do as many of those
as you want and still be running "supported" transitions. There's a lot
of value in that, IMHO.

--Dan



Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Mike Bayer



On 05/23/2017 03:16 PM, Edward Leafe wrote:

On May 23, 2017, at 1:43 PM, Jay Pipes  wrote:


[1] Witness the join constructs in Golang in Kubernetes as they work around 
etcd not being a relational data store:



Maybe it’s just me, but I found that Go code more understandable than some of 
the SQL we are using in the placement engine. :)

I assume that the SQL in a relational engine is faster than the same thing in 
code, but is that difference significant? For extremely large data sets I think 
that the database processing may be rate limiting, but is that the case here? 
Sometimes it seems that we are overly obsessed with optimizing data handling 
when the amount of data is relatively small. A few million records should be 
fast enough using just about anything.


When you write your app fresh, put some data into it, a few hundred 
rows, not at all.  Pull it all into memory and sort/filter all you want, 
SQL is too hard.  Push it to production!  Works great.   Send the 
customer your bill.


6 months later.   Customer has 10K rows.   The tools their contractor 
wrote seem a little sticky.Not sure when that happened?


A year later.  Customer is at 300K rows, nowhere near "a few million" 
records.  Application regularly crashes when asked to search and filter 
results.   Because the Python interpreter uses a fair amount of memory for a 
result set, multiplied by the overhead of Python object() / dict() per 
row == 100's / 1000's of megs of memory to have 300K objects in memory 
all at once.  Multiply by dozens of threads / processes handling 
concurrent requests, and the Python interpreter rarely returns memory.  Then add 
latency of fetching 300K rows over the wire, converting to objects. 
Concurrent requests pile up because they're slower; == more processes, 
== more memory.


New contractor is called in to rewrite the whole thing in MongoDB.   Now 
it's fast again!   Proceed to chapter 2, "So you decided to use 
MongoDB"   :)






Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Mike Bayer



On 05/23/2017 01:10 PM, Octave J. Orgeron wrote:

Comments below..

On 5/21/2017 1:38 PM, Monty Taylor wrote:


For example: An HA strategy using slave promotion and a VIP that 
points at the current write master paired with an application 
incorrectly configured to do such a thing can lead to writes to the 
wrong host after a failover event and an application that seems to be 
running fine until the data turns up weird after a while.


This is definitely a more complicated area that becomes more and more 
specific to the clustering technology being used. Galera vs. MySQL 
Cluster is a good example. Galera has an active/passive architecture 
where the above issues become a concern for sure. 


This is not my understanding; Galera is multi-master and if you lose a 
node, you don't lose any committed transactions; the writesets are 
validated as acceptable by, and pushed out to all nodes before your 
commit succeeds.   There's an option to make it wait until all those 
writesets are fully written to disk as well, but even with that option 
flipped off, if you COMMIT to one node and then that node explodes, you lose 
nothing: your writesets have been verified as acceptable by all 
the other nodes.


active/active is the second bullet point on the main homepage: 
http://galeracluster.com/products/





In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at 
runtime, we change them overriding conflicting environmental config, 
and if we can't, we hard-stop indicating an unsuitable environment. 
Rather than providing helper tools, we perform the steps needed 
ourselves, in the order they need to be performed, ensuring that they 
are done in the manner in which they need to be done.


This might be a trickier situation, especially if the database(s) are in 
a separate or dedicated environment that the OpenStack service processes 
don't have access to. Of course for SQL commands, this isn't a problem. 
But changing the configuration files and restarting the database may be 
a harder thing to expect.


Nevertheless, the HA setup within tripleo does do this, currently using 
Pacemaker and resource agents.    This is within the scope of at least 
parts of OpenStack.






In either approach the OpenStack service has to be able to talk to 
both old and new versions of the schema. And in either approach we 
need to make sure to limit the schema change operations to the set 
that can be accomplished in an online fashion. We also have to be 
careful to not start writing values to new columns until all of the 
nodes have been updated, because the replication stream can't 
replicate the new column value to nodes that don't have the new column.


This is another area where something like MySQL Cluster (NDB) would 
operate differently because it's an active/active architecture. So 
limiting the number of online changes while a table is locked across the 
cluster would be very important. There is also the timeouts for the 
applications to consider, something that could be abstracted again with 
oslo.db.


So the DDL we do on Galera, to confirm but also clarify Monty's point, 
is under the realm of "total order isolation", which means it's going to 
hold up the whole cluster while DDL is applied to all nodes.   Monty 
says this disqualifies it as an "online upgrade", which is because if 
you emitted DDL that had to run default values into a million rows then 
your whole cluster would temporarily have to wait for that to happen; we 
handle that by making sure we don't do migrations with that kind of data 
requirement and while yes, the DB has to wait for a schema change to 
apply, they are at least very short (in theory).   For practical 
purposes, it is *mostly* an "online" style of migration because all the 
services that talk to the database can keep on talking to the database 
without being stopped, upgraded to new software version, and restarted, 
which IMO is what's really hard about "online" upgrades.   It does mean 
that services will just have a little more latency while operations 
proceed.  Maybe we need a new term called "quasi-online" or something 
like that.


Facebook has released a Python version of their "online" schema 
migration tool for MySQL which does the full blown "create a new, blank 
table" approach, e.g. which contains the newer version of the schema, so 
that nothing at all stops or slows down at all.  And then to manage 
between the two tables while everything is running it also makes a 
"change capture" table to keep track of what's going on, and then to 
wire it all together it uses...triggers! 
https://github.com/facebookincubator/OnlineSchemaChange/wiki/How-OSC-works. 
  Crazy Facebook kids.  How we know that "make two more tables and wire 
it all together with new triggers" in fact is more performant than just, 
"add a column to the table", I'm not sure how/when they make that 
determination.   I don't see 

Re: [openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master

2017-05-22 Thread Mike Bayer



On 05/22/2017 05:02 AM, Thierry Carrez wrote:

Mike Bayer wrote:

On 05/18/2017 06:13 PM, Adrian Turjak wrote:


So, specifically in the realm of Keystone, since we are using sqlalchemy
we already have Postgresql support, and since Cockroachdb does talk
Postgres it shouldn't be too hard to back Keystone with it. At that
stage you have a Keystone DB that could be multi-region, multi-master,
consistent, and mostly impervious to disaster. Is that not the holy
grail for a service like Keystone? Combine that with fernet tokens and
suddenly Keystone becomes a service you can't really kill, and can
mostly forget about.


So this is exhibit A for why I think keeping some level of "this might
need to work on other databases" within a codebase is always a great
idea even if you are not actively supporting other DBs at the moment.
Even if Openstack dumped Postgresql completely, I'd not take the
rudimental PG-related utilities out of oslo.db nor would I rename all
the "mysql_XYZ" facilities to be "XYZ".
[...]

Yes, that sounds like another reason why we'd not want to aggressively
contract to the MySQL family of databases. At the very least, before we
do that, we should experiment with CockroachDB and see how reasonable it
would be to use in an OpenStack context. It might (might) hit a sweet
spot between performance, durability, database decentralization and
keeping SQL advanced features -- I'd hate it if we discovered that too late.


there's a difference between "architecting for pluggability" and 
"supporting database X, Y, and Z".   I only maintain we should keep the 
notion of pluggability around.  This doesn't mean you can't use MySQL 
specific features, it only means, anytime you're using a MySQL feature, 
it's in the context of a unit of code that would be swapped out when a 
different database backend were to be implemented.   The vast majority 
of our database code is like this already, mostly implicitly due to 
SQLAlchemy and in other cases explicitly as we see in a lot of the 
migration scripts.


I think the existence of the PG backend, combined with the immediate 
interest in getting NDB to work, and now Cockroach DB, not to mention 
that there are two major MySQL variants (MySQL, MariaDB) which do have 
significant differences (the JSON type one of the biggest examples) 
should suggest that any modern database-enabled application can't really 
afford to completely hardcode to a single database backend without at 
least basic layers of abstraction being present.	








Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-22 Thread Mike Bayer



On 05/22/2017 05:39 AM, Matthew Booth wrote:


There are also a couple of optimisations to make which I won't bother 
with up front. Dan suggested in his CellsV2 talk that we would only 
query cells where the user actually has instances. If we find users tend 
to clump in a small number of cells this would be a significant 
optimisation, although the overhead on the api node for a query 
returning no rows is probably very little. Also, I think you mentioned 
that there's an option to tell SQLA not to batch-process rows, but that 
it is less efficient for total throughput? I suspect there would be a 
point at which we'd want that. 


it's the yield_per() option and I think you should use it up front, just 
so it's there and we can hit any issues it might cause (shouldn't be any 
provided no eager loading is used).  Have it yield on about 50 rows at a 
time.  The pymysql driver these days I think does not actually buffer 
the rows, but 50 is very little anyway.
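A minimal sketch of the suggestion, against a throwaway SQLite database with an invented model: yield_per(50) streams rows in chunks of 50 instead of loading the full result into memory (and, as noted, must not be combined with eager loading).

```python
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session

Base = declarative_base()

class Instance(Base):
    __tablename__ = 'instances'
    id = sa.Column(sa.Integer, primary_key=True)
    hostname = sa.Column(sa.String(255))

engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all(Instance(hostname='node-%d' % i) for i in range(200))
session.commit()

# rows arrive in batches of 50 rather than all at once
for inst in session.query(Instance).yield_per(50):
    pass  # process one row at a time without buffering the whole result
```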





If there's a reasonable way to calculate

a tipping point, that might give us some additional life.

Bear in mind that the principal advantages to not using Searchlight are:

* It is simpler to implement
* It is simpler to manage
* It will return accurate results

Following the principal of 'as simple as possible, but no simpler', I 
think there's enormous benefit to this much simpler approach for anybody 
who doesn't need a more complex approach. However, while it reduces the 
urgency of something like the Searchlight solution, I expect there are 
going to be deployments which need that.



More over, during the query there are instances operation(
create, delete)  in parallel during the pagination/sort query,
there is situation some cells may not provide response in time,
or network connection broken, etc, many abnormal cases may
happen. How to deal with some of cells abnormal query response
is also one great factor to be considered.


Aside: For a query operation, what's the better user experience when a 
single cell is failing:


1. The whole query fails.
2. The user gets incomplete results.

Either of these are simple to implement. Incomplete results would also 
additionally be logged as an ERROR, but I can't think of any way to also 
return to the user that there's a problem with the data we returned 
without throwing an error.


Thoughts?

Matt


It's not good idea to support pagination and sort at the same
time (may not provide exactly the result end user want) if
searchlight should not be integrated.

In fact in Tricircle, when query ports from neutron where
tricircle central plugin is installed, the tricircle central
plugin do the similar cross local Neutron ports query, and not
support pagination/sort together.

Best Regards
Chaoyi Huang (joehuang)


From: Matt Riedemann [mriede...@gmail.com]
Sent: 19 May 2017 5:21
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [nova] Boston Forum session recap -
searchlight integration

Hi everyone,

After previous summits where we had vertical tracks for Nova
sessions I
would provide a recap for each session.

The Forum in Boston was a bit different, so here I'm only
attempting to
recap the Forum sessions that I ran. Dan Smith led a session on
Cells
v2, John Garbutt led several sessions on the VM and Baremetal
platform
concept, and Sean Dague led sessions on hierarchical quotas and API
microversions, and I'm going to leave recaps for those sessions
to them.

I'll do these one at a time in separate emails.


Using Searchlight to list instances across cells in nova-api


The etherpad for this session is here [1]. The goal for this
session was
to explain the problem and proposed plan from the spec [2] to the
operators in the room and get feedback.

Polling the room we found that not many people are deploying
Searchlight
but most everyone was using ElasticSearch.

An immediate concern that came up was the complexity involved with
integrating Searchlight, especially around issues with latency
for state
changes and questioning how this does not redo the top-level
cells v1
sync issue. It admittedly does to an extent, but we don't have
all of
the weird side code paths with cells v1 and it should be
self-healing.
Kris Lindgren noted that the instance.usage.exists periodic
notification
from the computes hammers their notification bus; we suggested
he 

Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-21 Thread Mike Bayer



On 05/21/2017 03:38 PM, Monty Taylor wrote:

documentation on the sequence of steps the operator should take.

In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at 
runtime, we change them overriding conflicting environmental config, and 
if we can't, we hard-stop indicating an unsuitable environment. Rather 
than providing helper tools, we perform the steps needed ourselves, in 
the order they need to be performed, ensuring that they are done in the 
manner in which they need to be done.


we do this in places like tripleo.   The MySQL configs and such are 
checked into the source tree; it includes details like 
innodb_file_per_table, timeouts used by haproxy, etc.   I know tripleo 
is not a service itself like Nova, but it's also not exactly 
something we hand off to the operators to figure out from scratch either.


We do some of it in oslo.db as well.  We set things like MySQL SQL_MODE. 
 We try to make sure the unicode-ish flags are set up and that we're 
using utf-8 encoding.




Some examples:

* Character Sets / Collations

We currently enforce at testing time that all database migrations are 
explicit about InnoDB. We also validate in oslo.db that table character 
sets have the string 'utf8' in them. (only on MySQL) We do not have any 
check for case-sensitive or case-insensitive collations (these affect 
sorting and comparison operations) Because we don't, different server 
config settings or different database backends for different clouds can 
actually behave differently through the REST API.


To deal with that:

First we'd have to decide whether case sensitive or case insensitive was 
what we wanted. If we decided we wanted case sensitive, we could add an 
enforcement of that in oslo.db, and write migrations to get from case 
insensitive indexes to case sensitive indexes on tables where we 
detected that a case insensitive collation had been used. If we decided 
we wanted to stick with case insensitive we could similarly add code to 
enforce it on MySQL. To enforce it actively on PostgresSQL, we'd need to 
either switch our code that's using comparisons to use the sqlalchemy 
case-insensitive versions explicitly, or maybe write some sort of 
overloaded driver for PG that turns all comparisons into 
case-insensitive, which would wrap both sides of comparisons in lower() 
calls (which has some indexing concerns, but let's ignore that for the 
moment) We could also take the 'external' approach and just document it, 
then define API tests and try to tie the insensitive behavior in the API 
to Interop Compliance. I'm not 100% sure how a db operator would 
remediate this - but PG has some fancy computed index features - so 
maybe it would be possible.


let's make the case sensitivity explicitly enforced!



A similar issue lurks with the fact that MySQL unicode storage is 3-byte 
by default and 4-byte is opt-in. We could take the 'external' approach 
and document it and assume the operator has configured their my.cnf with 
the appropriate default, or taken an 'active' approach where we override 
it in all the models and make migrations to get us from 3 to 4 byte.


let's force MySQL to use utf8mb4!   Although I am curious what is the 
actual use case we want to hit here (which gets into, zzzeek is ignorant 
as to which unicode glyphs actually live in 4-byte utf8 characters).
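The "override it in the models" flavor of the active approach, sketched with plain SQLAlchemy (the table and columns are invented, not an actual OpenStack model): MySQL-only keyword arguments pin the table to 4-byte utf8 and an explicit, here case-sensitive, collation instead of inheriting whatever the server default happens to be.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()

instances = Table(
    'instances', metadata,
    Column('id', Integer, primary_key=True),
    Column('display_name', String(255)),
    mysql_charset='utf8mb4',      # 4-byte utf8, so emoji and other astral glyphs fit
    mysql_collate='utf8mb4_bin',  # binary collation == case-sensitive comparisons
)
```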




* Schema Upgrades

The way you roll out online schema changes is highly dependent on your 
database architecture.


Just limiting to the MySQL world:

If you do Galera, you can do roll them out in Total Order or Rolling 
fashion. Total Order locks basically everything while it's happening, so 
isn't a candidate for "online". In rolling you apply the schema change 
to one node at a time. If you do that, the application has to be able to 
deal with both forms of the table, and you have to deal with ensuring 
that data can replicate appropriately while the schema change is happening.


Galera replicates DDL operations.   If I add a column on a node, it pops 
up on the other nodes too in a similar way as transactions are 
replicated, e.g. nearly synchronous.   I would *assume* it has to do 
this in the context of its usual transaction ordering, even though 
MySQL doesn't do transactional DDL, so that if the cluster sees 
transaction A, schema change B, transaction C that depends on B, that 
ordering is serialized appropriately.However, even if it doesn't do 
that, the rolling upgrades we do don't start the services talking to the 
new schema structures until the DDL changes are complete, and Galera is 
near-synchronous replication.


Also speaking to the "active" question, we certainly have all kinds of 
logic in Openstack (the optimistic update strategy in particular) that 
take "Galera" into account.  And of course we have Galera config inside 
of tripleo.  So that's kind of the "active" approach, I think.





If you do DRBD 

Re: [openstack-dev] [tc] revised Postgresql support status patch for governance

2017-05-21 Thread Mike Bayer



On 05/21/2017 03:51 PM, Monty Taylor wrote:


So I don't see the problem of "consistent utf8 support" having much to
do with whether or not we support Posgtresql - you of course need your
"CREATE DATABASE" to include the utf8 charset like we do on MySQL, but
that's it.


That's where we stand which means that we're doing 3 byte UTF8 on MySQL,
and 4 byte on PG. That's actually an API facing difference today. It's
work to dig out of from the MySQL side, maybe the PG one is just all
super cool and done. But it's still a consideration point.


The biggest concern for me is that we're letting API behavior be 
dictated by database backend and/or database config choices. The API 
should behave like the API behaves.


The API should behave like, "we store utf-8".  We should accept that 
"utf-8" means "up to four bytes" and make sure we are using utf8mb4 for 
all MySQL backends.  That the API of MySQL has made this bizarre 
decision about what utf-8 is to be would be a bug in MySQL that needs to 
be worked around by the calling application.   Other databases that want 
to work with openstack need to also do utf-8 with four bytes.  We can 
easily add some tests to oslo.db that round trip an assortment of 
unicode glyphs to confirm this (if there's one kind of test I've written 
more than anyone should, it's pushing out non-ascii bytes to a database 
and testing they come back the same).
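The kind of round-trip test described, in miniature; SQLite stands in here, but pointed at a MySQL URL it is exactly the check that utf8mb4 is really in play. The table and glyph set are arbitrary.

```python
import sqlalchemy as sa

engine = sa.create_engine('sqlite://')  # swap for a MySQL utf8mb4 URL in practice
metadata = sa.MetaData()
t = sa.Table('sample', metadata, sa.Column('name', sa.Unicode(100)))
metadata.create_all(engine)

glyphs = u'caf\u00e9 \u265e \U0001f40d'  # includes a 4-byte code point
with engine.begin() as conn:
    conn.execute(t.insert(), {'name': glyphs})
    # the value must come back byte-for-byte identical
    assert conn.execute(sa.select([t.c.name])).scalar() == glyphs
```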





Sure, it's work. But that's fine. The point of that list was that there
is stuff that is work because SQLA is a leaky abstraction. Which is fine
if there are people taking that work off the table.


I would not characterize this as SQLA being a leaky abstraction.


yee !   win!:)



I'd say that at some point we didn't make a decision as to what we 
wanted to do with text input and how it would be stored or not stored 
and how it would be searched and sorted. Case sensitive collations have 
been available to us the entire time, but we never decided whether our
API was case sensitive or case insensitive. OR - we *DID* decide that 
our API is case insensitive and the fact that it isn't on some deployments 
is a bug. I'm putting money on the 'nobody made a decision' answer.


I wasn't there, but perhaps early OpenStack versions didn't have "textual 
search" kinds of features?   Maybe they were added by folks who didn't 
consider the case sensitivity issue at that time. I'd be strongly in 
favor of making use of oslo.db / SQLAlchemy constructs that are 
explicitly case sensitive or not.  It's true, SQLAlchemy also does not 
force you to "make a decision" on this, if it did, this would be in the 
"hooray the abstraction did not leak!" category.   But SQLA makes lots 
of these kinds of decisions to be kind of hands-off about things like 
this as developers often don't want there to be a decision made here 
(lest it adds even more to the "SQLAlchemy forces me to make so many 
decisions!" complaint I have to read on twitter every day).
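What "explicitly case sensitive or not" can look like at the query level, using core constructs only (the table is invented): spell out the intent in the expression rather than inheriting it from the column's collation.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, collate, func, select

metadata = MetaData()
projects = Table('projects', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(64)))

name = u'Foo'
# explicitly case-insensitive, whatever the backend's collation happens to be
insensitive = select([projects]).where(func.lower(projects.c.name) == name.lower())
# explicitly case-sensitive on MySQL, via a binary collation
sensitive = select([projects]).where(collate(projects.c.name, 'utf8mb4_bin') == name)
```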










Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-21 Thread Mike Bayer



On 05/20/2017 12:04 PM, Julien Danjou wrote:

On Fri, May 19 2017, Mike Bayer wrote:


IMO that's a bug for them.


Of course it's a bug. IIRC Mehdi tried to fix it without much success.


I'm inspired to see that Keystone, Nova etc. are
able to move between and eventlet backend and a mod_wsgi backend.IMO
eventlet is really not needed for those services that present a REST interface.
Although for a message queue with lots of long-running connections that receive
events, that's a place where I *would* want to use a polling / non-blocking
model.  But I'd use it explicitly, not with monkeypatching.


+1


I'd ask why not oslo.cotyledon but it seems there's a faction here that is
overall moving out of the Openstack umbrella in any case.


Not oslo because it can be used by other projects than just OpenStack.
And it's a condition of success. As Mehdi said, Oslo has been deserted
in the recent cycles, so putting a lib there has very little chance of
seeing its community and maintenance help grow. Whereas trying to reach
the whole Python ecosystem is more likely to get traction.

As a maintainer of SQLAlchemy I'm surprised you even suggest that. Or do
you plan on doing oslo.sqlalchemy? ;)


I do oslo.db (which also is not "abandoned" in any way).  The point of 
oslo is that it is an openstack-centric mediation layer between some 
common service/library and openstack.


It looks like there already is essentially such a layer for cotyledon. 
I'd just name it "oslo.cotyledon" :)  or oslo.something.  We have a 
moose.  It's cool.






Basically I think openstack should be getting off eventlet in a big way so I
guess my sentiment here is that the Gnocchi / Cotyledon /etc. faction is just
splitting off rather than serving as any kind of direction for the rest of
Openstack to start looking.  But that's only an impression, maybe projects will
use Cotyledon anyway.   If every project goes off and uses something completely
different though, then I think we're losing.   The point of oslo was to prevent
that.


I understand your concern and opinion. I think you, me and Mehdi don't
have the experience as contributors in OpenStack. I invite you to try
moving any major OpenStack project to something like oslo.service2 or
Cotyledon or to achieve any technical debt resolution in OpenStack to
have a view on how hard it is to tackle. Then you'll see where we stand. :)


Sure, that's an area where I think the whole direction of OpenStack 
would benefit from more centralized planning, but I have been here just 
enough to observe that this kind of thing has been discussed before and 
it is of course very tricky to implement.





Re: [openstack-dev] [Keystone] Cockroachdb for Keystone Multi-master

2017-05-19 Thread Mike Bayer



On 05/18/2017 06:13 PM, Adrian Turjak wrote:


So, specifically in the realm of Keystone, since we are using sqlalchemy
we already have Postgresql support, and since Cockroachdb does talk
Postgres it shouldn't be too hard to back Keystone with it. At that
stage you have a Keystone DB that could be multi-region, multi-master,
consistent, and mostly impervious to disaster. Is that not the holy
grail for a service like Keystone? Combine that with fernet tokens and
suddenly Keystone becomes a service you can't really kill, and can
mostly forget about.


So this is exhibit A for why I think keeping some level of "this might 
need to work on other databases" within a codebase is always a great 
idea even if you are not actively supporting other DBs at the moment. 
Even if Openstack dumped Postgresql completely, I'd not take the 
rudimentary PG-related utilities out of oslo.db nor would I rename all 
the "mysql_XYZ" facilities to be "XYZ".


Cockroachdb advertises SQLAlchemy compatibility very prominently.  While 
their tutorial at 
https://www.cockroachlabs.com/docs/build-a-python-app-with-cockroachdb-sqlalchemy.html 
says it uses psycopg2 as the database driver, they have implemented 
their own "cockroachdb://" dialect on top of it, which likely smooths 
out the SQL dialect and connectivity quirks between real Postgresql and 
CockroachDB.
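
Connecting from SQLAlchemy is then the same as for any other backend; here's 
a minimal sketch, assuming the separate cockroachdb dialect package is 
installed (host, port and credentials below are placeholders):

from sqlalchemy import create_engine

# placeholders for host/port/credentials; the cockroachdb:// scheme is
# provided by the separately-installed cockroachdb dialect package
engine = create_engine(
    "cockroachdb://keystone@localhost:26257/keystone?sslmode=disable")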


This is not the first "distributed database" to build on the Postgresql 
protocol, I did a bunch of work for a database that started out called 
"Akiban", then got merged to "FoundationDB", and then sadly was sucked 
into a black hole shaped like a huge Apple and the entire product and 
staff were gone forever.  CockroachDB seems to be filling in that same 
hole that I was hoping FoundationDB was going to do (until they fell 
into said hole).




I'm welcome to being called mad, but I am curious if anyone has looked
at this. I'm likely to do some tests at some stage regarding this,
because I'm hoping this is the solution I've been hoping to find for
quite a long time.


I'd have a blast if Keystone wanted to get into this.   Distributed / 
NewSQL is something I have a lot of optimism about.   Please keep me 
looped in.






Further reading:
https://www.cockroachlabs.com/
https://github.com/cockroachdb/cockroach
https://www.cockroachlabs.com/docs/build-a-python-app-with-cockroachdb-sqlalchemy.html

Cheers,
- Adrian Turjak


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] can we make everyone drop eventlet? (was: Can we stop global requirements update?)

2017-05-19 Thread Mike Bayer

FTFY



On 05/19/2017 03:58 PM, Joshua Harlow wrote:

Mehdi Abaakouk wrote:

Not really, I just put some comments on reviews and discussed this on IRC,
since nobody except Telemetry has expressed/tried to get rid of eventlet.


Octavia is using cotyledon and they have gotten rid of eventlet. Didn't 
seem like it was that hard either to do it (of course the experience in 
how easy it was is likely not transferable to other projects...)


-Josh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-19 Thread Mike Bayer



On 05/19/2017 04:23 AM, Mehdi Abaakouk wrote:



And some applications rely on implicit internal contract/behavior/assumption.


IMO that's a bug for them.  I'm inspired to see that Keystone, Nova 
etc. are able to move between an eventlet backend and a mod_wsgi 
backend.  IMO eventlet is really not needed for those services that 
present a REST interface.   Although for a message queue with lots of 
long-running connections that receive events, that's a place where I 
*would* want to use a polling / non-blocking model.  But I'd use it 
explicitly, not with monkeypatching.




Since a new API is needed, why not write a new lib? Anyway, when you
get rid of eventlet you have so many things to change to ensure your
performance will not drop. 


While I don't know the specifics for your project(s), I don't buy that 
in general because IMO eventlet is not giving us any performance boost 
in the majority of cases.   Most of our IO is blocking on the database, 
and all the applications have DB connections throttled at about 50 per 
process at the max, and that's only recent; it used to be just 15.




Changing from oslo.service to cotyledon is really easy on the side.


I'd ask why not oslo.cotyledon but it seems there's a faction here that 
is overall moving out of the Openstack umbrella in any case.





Docs state: "oslo.service being impossible to fix and bringing an 
heavy dependency on eventlet, "  is there a discussion thread on that?


Not really, I just put some comments on reviews and discussed this on IRC,
since nobody except Telemetry has expressed/tried to get rid of eventlet.


Many (most?) of the web services can run under mod_wsgi with threads; 
Keystone seems to be standard on this now and I get the impression Nova 
is going in that direction too.  (anyone correct me if I'm wrong on 
any of that, I looked to ask around on IRC but it's too late in the day).






For the story: we first got rid of eventlet in Telemetry, fixing a couple of
performance issues due to using threads/processes instead of
greenlets/greenthreads.

Then we fell into some weird issues due to oslo.service's internal
implementation: processes not exiting properly, signals not received,
deadlocks when signals are received, unkillable processes,
tooz/oslo.messaging heartbeats not scheduled correctly, workers not
restarted when they are dead. Everything we expect from oslo.service
stopped working correctly because we removed the line
'eventlet.monkey_patch()'.


So, I've used gevent more than eventlet in my own upstream non-blocking 
work, and while this opinion is like spilling water in the ocean, I 
think applications should never use monkeypatching.   They should call 
into the eventlet/gevent greenlet API directly if that's what they want 
to do.


Of course this means that database logic has to move out of greenlets 
entirely since none of the DBAPIs use non-blocking IO.  That's fine. 
Database-oriented business logic should not be in greenlets.  I've 
written about this as well.  If one is familiar enough with greenlets 
and threads, you can write an application that makes explicit use of 
both.   However, that's application level stuff.   Web service apps like 
Nova conductor / Neutron / Keystone should not be aware of any of that. 
They should be coded to assume nothing about context switching.  IMO 
the threading model is "safer" to code towards since you have to handle 
locking and concurrency contingencies explicitly without hardwiring that 
to your assumptions about when context switching is to take place and 
when it's not.
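
To make the "explicit, no monkeypatching" style concrete, here's a small 
sketch using eventlet's API directly; the function names and port are 
placeholders, and blocking DBAPI work is pushed to real OS threads via tpool:

import eventlet
from eventlet import tpool


def write_to_database(payload):
    # placeholder for blocking DBAPI work; runs on a real OS thread below
    pass


def handle_client(sock):
    payload = sock.recv(4096)
    # hand blocking database work to a native thread explicitly, rather
    # than pretending the DBAPI is non-blocking via monkeypatching
    tpool.execute(write_to_database, payload)
    sock.close()


def serve():
    # eventlet.listen() returns a cooperative (green) socket, so the accept
    # loop yields to other greenlets without any monkeypatching
    listener = eventlet.listen(('127.0.0.1', 9000))
    pool = eventlet.GreenPool(size=1000)
    while True:
        sock, _addr = listener.accept()
        pool.spawn_n(handle_client, sock)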





For example, when oslo.service receives a signal, it can arrive on any
thread; this thread is paused and the callback is run in this thread's
context, but if the callback tries to talk to your code in this thread,
the process locks up, because your code is paused. Python
offers a tool to avoid that (signal.set_wakeup_fd), but oslo.service doesn't
use it. I have tried to run callbacks only on the main thread with
set_wakeup_fd, to avoid this kind of issue, but I failed. The whole
oslo.service code is clearly not designed to be threadsafe/signalsafe.
Well, it works for eventlet because you have only one real thread.

And this is just one example of the complicated things I tried to fix
before starting cotyledon.


I've no doubt oslo.service has major eventlet problems baked in; I've 
looked at it a little bit but didn't go too far with it.   That still 
doesn't mean there shouldn't be an "oslo.service2" that can effectively 
produce a concurrency-agnostic platform.  It of course would have the 
goal in mind of moving projects off eventlet since as I mentioned, 
eventlet monkeypatching should not be used, which means our services 
should do most of their "implicitly concurrent" work within threads.


Basically I think openstack should be getting off eventlet in a big way 
so I guess my sentiment here is that the Gnocchi / Cotyledon /etc. 
faction is just splitting off rather than serving as any kind of 

Re: [openstack-dev] [nova] Boston Forum session recap - searchlight integration

2017-05-19 Thread Mike Bayer



On 05/19/2017 02:46 AM, joehuang wrote:

Supporting sort and pagination together will be the biggest challenge: it depends on 
how many cells will be involved in the query. 3 or 5 may be OK: you can search 
each cell and cache the data. But how about 20, 50 or more, and how much data 
will be cached?



I've talked to Matthew in Boston and I am also a little concerned about 
this.  The approach involves trying to fetch just the smallest number 
of records possible from each backend, merging them as they come in, and 
then discarding the rest (unfetched) once there's enough for a page. 
But there is latency around invoking the query before any results are 
received, the database driver really wants to send out all the rows 
as well, and the ORM (with configurability) wants to convert 
the whole set of rows received to objects; all of this has overhead.


To at least handle the problem of 50 connections that have all executed 
a statement and are waiting on results, parallelizing means there 
needs to be a thread pool, greenlet pool, or explicit non-blocking 
approach put in place.  The "thread pool" would be the approach that's 
possible, which with eventlet monkeypatching transparently becomes a 
greenlet pool.  But that's where this starts getting a little intense 
for something you want to do in the context of "a web request".   So I 
think the DB-based solution here is feasible but I'm a little skeptical 
of it at higher scale.   Usually, the search engine would be something 
pluggable, like "SQL" or "searchlight".
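
Roughly, the plumbing would look something like the sketch below (function 
names and row shapes here are hypothetical; with eventlet monkeypatching the 
worker threads become greenlets transparently):

import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def list_instances(cell_connections, query_cell, page_size=50):
    # query_cell(conn) is assumed to yield (sort_key, row) tuples already
    # ordered by sort_key within that single cell
    with ThreadPoolExecutor(max_workers=len(cell_connections)) as pool:
        per_cell = pool.map(query_cell, cell_connections)
        # lazily merge the pre-sorted per-cell streams and keep one page
        merged = heapq.merge(*per_cell)
        page = list(itertools.islice(merged, page_size))
    return [row for _sort_key, row in page]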








Moreover, during the pagination/sort query there are instance operations (create, 
delete) happening in parallel, and there are situations where some cells may 
not respond in time, the network connection may be broken, etc.; many abnormal 
cases may happen. How to deal with abnormal query responses from some cells is 
also one great factor to be considered.

It's not a good idea to support pagination and sort at the same time (it may not 
provide exactly the result the end user wants) if Searchlight is not going to be 
integrated.

In fact in Tricircle, when querying ports from Neutron where the Tricircle central 
plugin is installed, the central plugin does a similar query across local 
Neutron ports, and does not support pagination/sort together.

Best Regards
Chaoyi Huang (joehuang)


From: Matt Riedemann [mriede...@gmail.com]
Sent: 19 May 2017 5:21
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [nova] Boston Forum session recap - searchlight
integration

Hi everyone,

After previous summits where we had vertical tracks for Nova sessions I
would provide a recap for each session.

The Forum in Boston was a bit different, so here I'm only attempting to
recap the Forum sessions that I ran. Dan Smith led a session on Cells
v2, John Garbutt led several sessions on the VM and Baremetal platform
concept, and Sean Dague led sessions on hierarchical quotas and API
microversions, and I'm going to leave recaps for those sessions to them.

I'll do these one at a time in separate emails.


Using Searchlight to list instances across cells in nova-api


The etherpad for this session is here [1]. The goal for this session was
to explain the problem and proposed plan from the spec [2] to the
operators in the room and get feedback.

Polling the room we found that not many people are deploying Searchlight
but most everyone was using ElasticSearch.

An immediate concern that came up was the complexity involved with
integrating Searchlight, especially around issues with latency for state
changes and questioning how this does not redo the top-level cells v1
sync issue. It admittedly does to an extent, but we don't have all of
the weird side code paths with cells v1 and it should be self-healing.
Kris Lindgren noted that the instance.usage.exists periodic notification
from the computes hammers their notification bus; we suggested he report
a bug so we can fix that.

It was also noted that if data is corrupted in ElasticSearch or is out
of sync, you could re-sync that from nova to searchlight, however,
searchlight syncs up with nova via the compute REST API, which if the
compute REST API is using searchlight in the backend, you end up getting
into an infinite loop of broken. This could probably be fixed with
bypass query options in the compute API, but it's not a fun problem.

It was also suggested that we store a minimal set of data about
instances in the top-level nova API database's instance_mappings table,
where all we have today is the uuid. Anything that is set in the API
would probably be OK for this, but operators in the room noted that they
frequently need to filter instances by an IP, which is set in the
compute. So this option turns into a slippery slope, and is potentially
not inter-operable across clouds.

Matt Booth is also skeptical that we can't have a multi-cell query
perform well, and 

Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-18 Thread Mike Bayer



On 05/18/2017 02:37 PM, Julien Danjou wrote:

On Thu, May 18 2017, Mike Bayer wrote:


I'm not understanding this?  do you mean this?


In the long run, yes. Unfortunately, we're not happy with the way Oslo
libraries are managed, and they are too OpenStack-centric. I've tried for the last
couple of years to move things on, but it's barely possible to deprecate
anything and contribute, so I feel it's safer to start fresh with a better
alternative. Cotyledon by Mehdi is a good example of what can be
achieved.



here's cotyledon:

https://cotyledon.readthedocs.io/en/latest/


replaces oslo.service with a multiprocessing approach that doesn't use 
eventlet.  Great!  Any openstack service that rides on oslo.service 
would like to be able to transparently switch from eventlet to 
multiprocessing the same way they can more or less switch to mod_wsgi at 
the moment.  IMO this should be part of oslo.service itself.   Docs 
state: "oslo.service being impossible to fix and bringing an heavy 
dependency on eventlet" -- is there a discussion thread on that?


I'm finding it hard to believe that only a few years ago, everyone saw 
the wisdom of not re-implementing everything in their own projects and 
using a common layer like oslo, and already that whole situation is 
becoming forgotten - not just for consistency, but also because when a bug is 
found, if it's fixed in oslo it gets fixed for everyone.


An increase in the scope of oslo is essential to dealing with the issue 
of "complexity" in openstack.  The state of openstack as dozens of 
individual software projects each with their own idiosyncratic quirks, 
CLIs, process and deployment models, and everything else that is visible 
to operators is ground zero for perceived operator complexity.









Though to comment on your example, oslo.db is probably the most useful
Oslo library that Gnocchi depends on and that won't go away in a snap.
:-(



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] Can we stop global requirements update?

2017-05-18 Thread Mike Bayer



On 05/16/2017 05:42 AM, Julien Danjou wrote:

On Wed, Apr 19 2017, Julien Danjou wrote:


So Gnocchi gate is all broken (again) because it depends on "pbr" and
some new release of oslo.* depends on pbr!=2.1.0.


Same thing happened today with Babel. As far as Gnocchi is concerned,
we're going to take the easiest route and remove all our oslo
dependencies over the next months in favor of sanely maintained
alternatives at this point.


I'm not understanding this?  do you mean this?

diff --git a/gnocchi/indexer/sqlalchemy.py b/gnocchi/indexer/sqlalchemy.py
index 3497b52..0ae99fd 100644
--- a/gnocchi/indexer/sqlalchemy.py
+++ b/gnocchi/indexer/sqlalchemy.py
@@ -22,11 +22,7 @@ import uuid

 from alembic import migration
 from alembic import operations
-import oslo_db.api
-from oslo_db import exception
-from oslo_db.sqlalchemy import enginefacade
-from oslo_db.sqlalchemy import utils as oslo_db_utils
-from oslo_log import log
+from ??? import ???
 try:
 import psycopg2
 except ImportError:








Cheers,



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] revised Postgresql support status patch for governance

2017-05-18 Thread Mike Bayer



On 05/17/2017 02:38 PM, Sean Dague wrote:


Some of the concerns/feedback has been "please describe things that are
harder by this being an abstraction", so examples are provided.


so let's go through this list:

- OpenStack services taking a more active role in managing the DBMS

, "managing" is vague to me, are we referring to the database 
service itself, e.g. starting / stopping / configuring?   installers 
like tripleo do this now, pacemaker is standard in HA for control of 
services, I think I need some background here as to what the more active 
role would look like.



- The ability to have zero down time upgrade for services such as
  Keystone.

So "zero down time upgrades" seems to have broken into:

* "expand / contract with the code carefully dancing around the 
existence of two schema concepts simultaneously", e.g. nova, neutron. 
AFAIK there is no particular issue supporting multiple backends on this 
because we use alembic or sqlalchemy-migrate to abstract away basic 
ALTER TABLE types of feature.


* "expand / contract using server side triggers to reconcile the two 
schema concepts", e.g. keystone.   This is more difficult because there 
is currently no "trigger" abstraction layer.   Triggers represent more 
of an imperative programming model vs. typical SQL,  which is why I've 
not taken on trying to build a one-size-fits-all abstraction for this in 
upstream Alembic or SQLAlchemy.   However, it is feasible to build a 
"one-size-that-fits-openstack-online-upgrades" abstraction.  I was 
trying to gauge interest in helping to create this back in the 
"triggers" thread, in my note at 
http://lists.openstack.org/pipermail/openstack-dev/2016-August/102345.html, 
which also referred to some very raw initial code examples.  However, it 
received strong pushback from a wide range of openstack veterans, which 
led me to believe this was not a thing that was happening.   Apparently 
Keystone has gone ahead and used triggers anyway, however I was not 
pulled into that process.   But if triggers are to be "blessed" by at 
least some projects, I can likely work on this problem for MySQL / 
Postgresql agnosticism.  If keystone is using triggers right now for 
online upgrades, I would ask, are they currently working on Postgresql 
as well with PG-specific triggers, or does Postgresql degrade into a 
"non-online" migration scenario if you're running Keystone?



- Consistent UTF8 4 & 5 byte support in our APIs

"5 byte support" appears to refer to utf-8's ability to be...well a 
total of 6 bytes.But in practice, unicode itself only needs 4 bytes 
and that is as far as any database supports right now since they target 
unicode (see https://en.wikipedia.org/wiki/UTF-8#Description).  That's 
all any database we're talking about supports at most.  So...lets assume 
this means four bytes.
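
A quick illustration of the byte counts in question (plain Python, nothing 
database-specific):

# supplementary-plane characters (e.g. emoji) need 4 bytes in utf-8, which
# is the most any of the databases under discussion have to store
print(len(u"\N{GRINNING FACE}".encode("utf-8")))   # 4
print(len(u"\u00e9".encode("utf-8")))              # 2  ('e' with acute)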


From the perspective of database-agnosticism with regards to database 
and driver support for non-ascii characters, this problem was 
solved by SQLAlchemy well before Python 3 existed, back when many DBAPIs would 
literally crash if they received a u'' string, and the rest of them 
would churn out garbage; SQLAlchemy implemented a full encode/decode 
layer on top of the Python DBAPI to fix this.  The situation is vastly 
improved now that all DBAPIs support unicode natively.


However, on the MySQL side there is this complexity that their utf-8 
support is a 3-byte only storage model, and you have to use utf8mb4 if 
you want the four byte model.   I'm not sure right now what projects are 
specifically hitting issues related to this.


Postgresql doesn't have such a limitation.   If your Postgresql server 
or specific database is set up for utf-8 (which should be the case), 
then you get full utf-8 character set support.


So I don't see the problem of "consistent utf8 support" having much to 
do with whether or not we support Postgresql - you of course need your 
"CREATE DATABASE" to include the utf8 charset like we do on MySQL, but 
that's it.
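
On the connection side, a sketch of what that amounts to (credentials and 
database names are placeholders):

from sqlalchemy import create_engine

# MySQL needs charset=utf8mb4 on the connection (and utf8mb4 on the
# database / tables) to get real 4-byte support
mysql_engine = create_engine(
    "mysql+pymysql://nova:secret@localhost/nova?charset=utf8mb4")

# a UTF8-encoded Postgresql database needs nothing special
pg_engine = create_engine(
    "postgresql+psycopg2://nova:secret@localhost/nova")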



- The requirement that Postgresql libraries are compiled for new users
  trying to just run unit tests (no equiv is true for mysql because of
  the pure python driver).

I would suggest that new developers for whom the presence of things like 
postgresql client libraries is a challenge (but somehow they are running 
a MySQL server for their pure python driver to talk to?)  don't actually 
have to worry about running the tests against Postgresql; this is how 
the "opportunistic" testing model in oslo.db has always worked; it only 
runs for the backends that you have set up.


Also, openstack got approximately all the way through Kilo using the 
native python-MySQL driver, which required a compiled client library as 
well as the MySQL dependencies to be installed.  The psycopg2 driver has a 
ton of whl's up on pypi (https://pypi.python.org/pypi/psycopg2) and all 
linux distros supply it as a package in any case, so an actual "compile" 
should not be needed.   Also, this is 

Re: [openstack-dev] [Zun]Use 'uuid' instead of 'id' as object ident in data model

2017-04-06 Thread Mike Bayer



On 04/05/2017 11:02 AM, gordon chung wrote:



On 05/04/17 09:00 AM, Monty Taylor wrote:


Please do NOT use uuid as a primary key in MySQL:

* UUID has 36 characters which makes it bulky.


you can store it as a binary if space is a concern.


this is highly inconvenient from a datadump / MySQL commandline 
perspective.






* InnoDB stores data in the PRIMARY KEY order and all the secondary keys
also contain PRIMARY KEY. So having UUID as PRIMARY KEY makes the index
bigger which can not be fit into the memory
* Inserts are random and the data is scattered.


can store an ordered uuid (uuid1) for performance but arguably not much
diff from just autoincrement



In cases where data has a large natural key (like a uuid) It is
considered a best practice to use an auto-increment integer as the
primary key and to put a second column in the table to store the uuid,
potentially with a unique index applied to it for consistency.

That way the external identifier for things like gnocchi can still be
the UUID, but the internal id for the database can be an efficient
auto-increment primary key.


very good points. i guess ultimately should probably just test to the
scale you hope for


there's no advantage to the UUID being the physical primary key of the 
table.  If you don't care about the surrogate integer, just ignore it; 
it gets created for you.   The only argument I can see is that you 
really want to generate rows in Python that refer to the UUID of another 
row and you want that UUID to go straight into a foreign-key constrained 
column, in which case I'd urge you to instead use idiomatic SQLAlchemy 
ORM patterns for data manipulation (e.g. relationships).


The surrogate integer thing is the use case that all database engines 
are very well tested for and while it is not "pure" from Codd's point of 
view, it is definitely the most pragmatic approach from many different 
perspectives.
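
A minimal sketch of that pattern with the SQLAlchemy ORM (the model names here 
are hypothetical, not Zun's actual schema):

import uuid

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


def _gen_uuid():
    return str(uuid.uuid4())


class Container(Base):
    __tablename__ = 'containers'
    id = Column(Integer, primary_key=True)        # internal surrogate key
    uuid = Column(String(36), default=_gen_uuid,
                  unique=True, nullable=False)    # external identifier


class ContainerAction(Base):
    __tablename__ = 'container_actions'
    id = Column(Integer, primary_key=True)
    container_id = Column(Integer, ForeignKey('containers.id'))
    # the relationship keeps the integer foreign key populated without the
    # application ever juggling surrogate ids by hand
    container = relationship(Container, backref='actions')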





cheers,



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zun]Use 'uuid' instead of 'id' as object ident in data model

2017-04-06 Thread Mike Bayer



On 04/05/2017 11:00 AM, Monty Taylor wrote:

On 04/05/2017 09:39 AM, Akihiro Motoki wrote:

I noticed this thread via Monty's reply. Sorry for my late reply :(

I think we need to think about 'id' separately for API modeling and DB
modeling.

From the API perspective, one of the important things is that 'id' is
not predictable
and it rarely conflicts. From this perspective, UUID works.

From the DB perspective, the context will be different.
Efficiency is another important point.
The auto-increment approach brings us good efficiency.

In most OpenStack projects, we use 'id' in a database as 'id' in the
API layer.
I am okay with using an incremental integer as 'id' in the DB, but I don't think
it is a good idea to use a predictable 'id' in the API layer.

I don't know how 'id' in the API and DB layers are related in the Zun
implementation,
but I believe this is one of the important points.


Yes! Very well said. UUID is the excellent choice for API - auto-inc is
the excellent choice for the database.


+1

with the primary key datatype, you also imply the datatype of the columns 
constrained by foreign key as well, which themselves usually get indexed too.








2017-04-05 22:00 GMT+09:00 Monty Taylor :

On 02/21/2017 07:28 AM, gordon chung wrote:




On 21/02/17 01:28 AM, Qiming Teng wrote:


in mysql[2].


Can someone remind me the benefits we get from Integer over UUID as
primary key? UUID, as its name implies, is meant to be an
identifier for
a resource. Why are we generating integer key values?



this ^. use UUID please. you can google why auto increment is probably
not a good idea.

from a selfish pov, as gnocchi captures data on all resources in
openstack, we store everything as a uuid anyways. even if your id
doesn't clash in zun, it has a higher chance of clashing when you
consider all the other resources from other services.

cheers,



sorry - I just caught this.

Please do NOT use uuid as a primary key in MySQL:

* UUID has 36 characters which makes it bulky.
* InnoDB stores data in the PRIMARY KEY order and all the secondary keys
also contain PRIMARY KEY. So having UUID as PRIMARY KEY makes the index
bigger which can not be fit into the memory
* Inserts are random and the data is scattered.

In cases where data has a large natural key (like a uuid) It is
considered a
best practice to use an auto-increment integer as the primary key and
to put
a second column in the table to store the uuid, potentially with a
unique
index applied to it for consistency.

That way the external identifier for things like gnocchi can still be
the
UUID, but the internal id for the database can be an efficient
auto-increment primary key.



__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][requirements][all] requesting assistance to unblock SQLAlchemy 1.1 from requirements

2017-03-15 Thread Mike Bayer


On 03/15/2017 11:42 AM, Sean Dague wrote:

Perhaps, but in doing so oslo.db is going to get the pin and uc from
stable/ocata, which is going to force it back to SQLA < 1.1, which will
prevent oslo.db changes that require >= 1.1 to work.


so do we want to make that job non-voting or something like that?





-Sean

On 03/15/2017 11:26 AM, Roman Podoliaka wrote:

Isn't the purpose of that specific job -
gate-tempest-dsvm-neutron-src-oslo.db-ubuntu-xenial-ocata - to test a
change to the library master branch with stable releases (i.e. Ocata)
- of all other components?

On Wed, Mar 15, 2017 at 5:20 PM, Sean Dague <s...@dague.net> wrote:

On 03/15/2017 10:38 AM, Mike Bayer wrote:



On 03/15/2017 07:30 AM, Sean Dague wrote:


The problem was the original patch kept a cap on SQLA, just moved it up
to the next pre-release, not realizing the caps in general are the
concern by the requirements team. So instead of upping the cap, I just
removed it entirely. (It also didn't help on clarity that there was a
completely unrelated fail in the tests which made it look like the
system was stopping this.)

This should hopefully let new SQLA releases very naturally filter out to
all our services and libraries.

-Sean



so the failure I'm seeing now is *probably* one I saw earlier when we
tried to do this, the tempest run fails on trying to run a keystone
request, but I can't find the same error in the logs this time.

In an earlier build of https://review.openstack.org/#/c/423192/, we saw
this:

ContextualVersionConflict: (SQLAlchemy 1.1.5
(/usr/local/lib/python2.7/dist-packages),
Requirement.parse('SQLAlchemy<1.1.0,>=1.0.10'), set(['oslo.db',
'keystone']))

stack trace was in the apache log:  http://paste.openstack.org/show/601583/


but now on our own oslo.db build, the same jobs are failing and are
halting at keystone, but I can't find any error:

the failure is:


http://logs.openstack.org/30/445930/1/check/gate-tempest-dsvm-neutron-src-oslo.db-ubuntu-xenial-ocata/815962d/


and is on:  https://review.openstack.org/#/c/445930/


if someone w/ tempest expertise could help with this that would be great.


It looks like oslo.db master is being used with ocata services?
http://logs.openstack.org/30/445930/1/check/gate-tempest-dsvm-neutron-src-oslo.db-ubuntu-xenial-ocata/815962d/logs/devstacklog.txt.gz#_2017-03-15_13_10_52_434


I suspect that's the root issue. That should be stable/ocata branch, right?

-Sean

--
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][requirements][all] requesting assistance to unblock SQLAlchemy 1.1 from requirements

2017-03-15 Thread Mike Bayer



On 03/15/2017 07:30 AM, Sean Dague wrote:


The problem was the original patch kept a cap on SQLA, just moved it up
to the next pre-release, not realizing the caps in general are the
concern by the requirements team. So instead of upping the cap, I just
removed it entirely. (It also didn't help on clarity that there was a
completely unrelated fail in the tests which made it look like the
system was stopping this.)

This should hopefully let new SQLA releases very naturally filter out to
all our services and libraries.

-Sean



so the failure I'm seeing now is *probably* one I saw earlier when we 
tried to do this, the tempest run fails on trying to run a keystone 
request, but I can't find the same error in the logs this time.


In an earlier build of https://review.openstack.org/#/c/423192/, we saw 
this:


ContextualVersionConflict: (SQLAlchemy 1.1.5 
(/usr/local/lib/python2.7/dist-packages), 
Requirement.parse('SQLAlchemy<1.1.0,>=1.0.10'), set(['oslo.db', 
'keystone']))


stack trace was in the apache log:  http://paste.openstack.org/show/601583/


but now on our own oslo.db build, the same jobs are failing and are 
halting at keystone, but I can't find any error:


the failure is:


http://logs.openstack.org/30/445930/1/check/gate-tempest-dsvm-neutron-src-oslo.db-ubuntu-xenial-ocata/815962d/ 



and is on:  https://review.openstack.org/#/c/445930/


if someone w/ tempest expertise could help with this that would be great.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo][requirements][all] requesting assistance to unblock SQLAlchemy 1.1 from requirements

2017-03-14 Thread Mike Bayer

Hello all -

As mentioned previously, SQLAlchemy 1.1 has now been released for about 
six months.   My work now is on SQLAlchemy 1.2 which should hopefully 
see initial releases in late spring.  SQLAlchemy 1.1 includes tons of 
features, bugfixes, and improvements, and in particular the most recent 
versions contain some critical performance improvements focused around 
the "joined eager loading" feature, most typically encountered when an 
application makes many, many queries for small, single-row result sets 
with lots of joined eager loading.   In other words, exactly the kinds 
of queries that Openstack applications do a lot; the fixes here were 
identified as a direct result of Neutron query profiling by myself and a 
few other contributors.


For many weeks now, various patches to attempt to bump requirements for 
SQLAlchemy 1.1 have been languishing with little interest, and I do not 
have enough knowledge of the requirements system to get exactly the 
correct patch that will accomplish the goal (nor do others).  The 
current gerrit is at https://review.openstack.org/#/c/423192/, where you 
can see that not just me, but a bunch of folks, have no idea what 
incantations we need to put here that will make this happen.  Tony 
Breeds has chimed in thusly:



To get this in we'll need to remove the cap in global-requirements
*and* at the same time add a heap of entries to 
upper-constraints-xfails.txt. This will allow us to merge the cap 
removal and keep the constraint in the 1.0 family while we wait for the 
requirements sync to propagate out.


I'm not readily familiar with what goes into upper-constraints-xfails 
and this file does not appear to be documented in common places like 
https://wiki.openstack.org/wiki/Requirements or 
https://git.openstack.org/cgit/openstack/requirements/tree/README.rst .


I'm asking on the list here for some assistance in moving this forward. 
SQLAlchemy development these days is closely attuned to the needs of 
Openstack now; a series of Openstack test suites are part of 
SQLAlchemy's own CI servers to ensure backwards compatibility with all 
changes, and 1.2 will have even more features that are directly 
consumable by oslo.db (features everyone will want, I promise you). 
Being able to bump requirements across Openstack so that new versions 
can be tested and integrated in a timely manner would be very helpful.


Thanks for reading!


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Some findings while profiling instances boot

2017-02-16 Thread Mike Bayer



On 02/15/2017 12:46 PM, Daniel Alvarez Sanchez wrote:

Also, while having a look at server profiling, around 33% of the
time was spent building SQL queries [1]. Mike Bayer went through this
and suggested having a look at baked queries and also submitted a sketch
of his proposal [2].


Neutron relies heavily on a big JOIN query that returns just one row. 
In the profiling, it seemed like joined eager loading overhead is 
significant.  Someone independently opened an upstream issue at 
https://bitbucket.org/zzzeek/sqlalchemy/issues/3915/performance-degradation-on-version-10xx#comment-34442856 
with similar comments.


While the "baked" query thing is the ultimate hammer for "ORM SQL 
building" overhead, it's a very heavy hammer to swing as folks will note 
in the gerrit that shows roughly how it would look, it's involved and 
not that easy to work with.
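
For reference, a rough sketch of what the baked style looks like (illustrative 
only, not the actual patch in [2]; the model here is a hypothetical stand-in 
for a Neutron object):

from sqlalchemy import Column, Integer, String, bindparam
from sqlalchemy.ext import baked
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Port(Base):
    __tablename__ = 'ports'
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


bakery = baked.bakery()


def get_port(session, port_id):
    # the lambdas are cached by the bakery, so the ORM builds the SQL
    # string once per process instead of on every call
    baked_query = bakery(lambda s: s.query(Port))
    baked_query += lambda q: q.filter(Port.id == bindparam('port_id'))
    return baked_query(session).params(port_id=port_id).one()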


Fortunately, the joined eager load codepaths here have never been 
optimized for the "many short queries" use case, and a large portion of 
the overhead is all rolled up into some SQL alias objects that can be 
memoized so that most of the work they do happens once, instead of 
thousands of times.


In https://gerrit.sqlalchemy.org/311  (note this is SQLAlchemy's gerrit, 
not openstack's) I have a patch that reduces the overhead associated 
specifically with joined eager loaded entities by around 270% for a 
worst-case scenario (which Neutron seems to be close to).  If those 
folks running the load tests can please try this revision out and see if 
it makes a dent, that would be helpful.


Note that SQLAlchemy 1.1 has been out for about five months now, and 
it's time that Openstack move up to the 1.1 series - that's where the 
performance enhancement will be.






I wanted to share these findings with you (probably most of you knew but
I'm quite new to OpenStack so It's been a really nice exercise for me to
better understand how things work) and gather your feedback about how
things can be improved. Also, I'll be happy to share the results and
discuss further if you think it's worth during the PTG next week.

Thanks a lot for reading and apologies for such a long email!

Cheers,
Daniel
IRC: dalvarez

[0] http://imgur.com/WQqaiYQ
[1] http://imgur.com/6KrfJUC
[2] https://review.openstack.org/430973


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-06 Thread Mike Bayer



On 02/06/2017 01:28 PM, Octave J. Orgeron wrote:

Hi Mike,

I've had a chance to look through the links you provided. I do think
this is a rather heavy solution that would be more suited if there were
actually significant dialect features to override from MySQL. MySQL and
NDB use the same dialect and the differences really just come down to
operation ordering, no support for savepoints, and no support for nested
transactions. Even if you tried to do those operations today, SQL
Alchemy is able to throw back appropriate errors telling you that you're
doing something wrong or that the feature isn't supported. If we go down
this path, we really only buy two things:

  * Ability to use the with_variant for setting column types.
  * Do some logic based on the selected dialect, which we would probably
still have to set in oslo.db anyways as the hook.

It doesn't solve the issue of proper ordering of FK, constraints, or
index operations. It doesn't remove the need to do variable
substitutions where things are hard coded. And it doesn't resolve the
issues where we have to intercept savepoints and nested transactions. It
looks like the only major impact it would have is to reduce the number
of if/then logic blocks in the SQL Alchemy and Alembic migration scripts.

But what does it cost to do this? Would the dialect be rolled into SQL
Alchemy for the community, or would it be a separate plugin like
Redshift's? Is it easier to maintain just the patches? Or would it mean
more overhead for me to support the patches and the ndb dialect? I'd
like to keep the overhead simple since it's just me at this point
working on this.


you are probably right that it's not worth it, if you are definitely 
sure this is the extent of the changes.


if you could please post an example of "proper ordering of FK, 
constraints, indexes" that would be helpful.







So what I propose is that I'll update my patches for keystone and cinder
next and post those for gerrit review. That will give folks a view into
what the patches will look like and we can figure out if we want to
change the approach. I'm also going to create a spec and blueprint to
cover the changes across the services. I'll post links once all of that
is up for review.

Thanks,
Octave

On 2/6/2017 7:53 AM, Mike Bayer wrote:



On 02/03/2017 11:59 AM, Octave J. Orgeron wrote:

Hi Mike,

Comments below..

Thanks,
Octave

On 2/3/2017 7:41 AM, Mike Bayer wrote:



On 02/02/2017 05:28 PM, Octave J. Orgeron wrote:

That refers to the total length of the row. InnoDB has a limit of 65k
and NDB is limited to 14k.

A simple example would be the volumes table in Cinder where the row
length goes beyond 14k. So in the IF logic block, I change column types
that are vastly oversized such as status and attach_status, which by
default are 255 chars.



let me give you a tip on IF blocks: they are a bit of an
anti-pattern.  If you want a column type to do one thing in one case,
and another in another case, create an object that does the thing you
want:


some_table = Table(
    'some_table', metadata,
    Column('my_column', VARCHAR(255).with_variant(VARCHAR(50), 'ndb'))
)


I think we might want to look into creating a stub dialect called
'ndb' that subclasses mysql+pymysql.   Treating ndb as a whole
different database means there's no longer the need for a flag in
oslo.db, the 'ndb' name would instead be interpreted as a new backend
- the main thing would be ensuring all the mysql-appropriate hooks in
oslo.db are also emitted for ndb, but this also gives us a way to pick
and choose which hooks apply.   It seems like there may be enough
different about it to separate it at this level.

Not sure if people on the list are seeing that we are simultaneously
talking about getting rid of Postgresql in the efforts to support only
"one database", while at the same time adding one that is in many ways
a new database.




This is an interesting approach as it would significantly reduce the
amount of code in my patches today. Do you have any pointers on where
this should be implemented as a stub? Would we have to take different
approaches for SQL Alchemy vs. Alembic?


there are simple plugin points for both libraries.

One of the popular 3rd party dialects right now is the
sqlalchemy-redshift dialect, which similarly to a lot of these
dialects is one that acts 95% like a "normal" dialect, in this case
postgresql, however various elements are overridden to provide
compatibility with Amazon's redshift. The overlay of an NDB style
dialect on top of mysql would be a similar idea.  The SQLAlchemy
plugin point consists of a setuptools entrypoint (see
https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/setup.py#L40
,
https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/sqlalchemy_redshift/dialect.py#L315)
and for Alembic, once the dialect is imported you define a special
Alembic class so that Alembic sees the engine nam

Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-06 Thread Mike Bayer



On 02/03/2017 11:59 AM, Octave J. Orgeron wrote:

Hi Mike,

Comments below..

Thanks,
Octave

On 2/3/2017 7:41 AM, Mike Bayer wrote:



On 02/02/2017 05:28 PM, Octave J. Orgeron wrote:

That refers to the total length of the row. InnoDB has a limit of 65k
and NDB is limited to 14k.

A simple example would be the volumes table in Cinder where the row
length goes beyond 14k. So in the IF logic block, I change column types
that are vastly oversized such as status and attach_status, which by
default are 255 chars.



let me give you a tip on IF blocks: they are a bit of an
anti-pattern.  If you want a column type to do one thing in one case,
and another in another case, create an object that does the thing you
want:


some_table = Table(
    'some_table', metadata,
    Column('my_column', VARCHAR(255).with_variant(VARCHAR(50), 'ndb'))
)


I think we might want to look into creating a stub dialect called
'ndb' that subclasses mysql+pymysql.   Treating ndb as a whole
different database means there's no longer the need for a flag in
oslo.db, the 'ndb' name would instead be interpreted as a new backend
- the main thing would be ensuring all the mysql-appropriate hooks in
oslo.db are also emitted for ndb, but this also gives us a way to pick
and choose which hooks apply.   It seems like there may be enough
different about it to separate it at this level.

Not sure if people on the list are seeing that we are simultaneously
talking about getting rid of Postgresql in the efforts to support only
"one database", while at the same time adding one that is in many ways
a new database.




This is an interesting approach as it would significantly reduce the
amount of code in my patches today. Do you have any pointers on where
this should be implemented as a stub? Would we have to take different
approaches for SQL Alchemy vs. Alembic?


there are simple plugin points for both libraries.

One of the popular 3rd party dialects right now is the 
sqlalchemy-redshift dialect, which similarly to a lot of these dialects 
is one that acts 95% like a "normal" dialect, in this case postgresql, 
however various elements are overridden to provide compatibility with 
Amazon's redshift. The overlay of an NDB style dialect on top of 
mysql would be a similar idea.The SQLAlchemy plugin point consists 
of a setuptools entrypoint (see 
https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/setup.py#L40 
, 
https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/sqlalchemy_redshift/dialect.py#L315) 
and for Alembic, once the dialect is imported you define a special 
Alembic class so that Alembic sees the engine name also (see 
https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/sqlalchemy_redshift/dialect.py#L19).
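
A very rough sketch of what such a stub could look like, following the redshift 
pattern above (all names here are hypothetical; nothing like this has actually 
been written yet):

from sqlalchemy.dialects.mysql.pymysql import MySQLDialect_pymysql
from alembic.ddl import mysql as alembic_mysql


class MySQLDialect_ndb(MySQLDialect_pymysql):
    """Behaves like mysql+pymysql, but gives projects an 'ndb' name to key
    behavior off of; NDB quirks (savepoints, nested transactions, row-size
    limits) would be handled by overrides here."""
    name = 'ndb'


class NDBImpl(alembic_mysql.MySQLImpl):
    # lets Alembic recognize the new engine name while reusing the MySQL
    # rendering rules
    __dialect__ = 'ndb'


# registered via a setuptools entry point, e.g. in setup.py:
#   entry_points={
#       'sqlalchemy.dialects': [
#           'ndb = mypackage.ndb_dialect:MySQLDialect_ndb',
#       ],
#   }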


In this case the NDB dialect seems like it may be a little bit of a 
heavy solution, but it would solve lots of issues: the "mysql_engine" 
flag would no longer be in conflict, special datatypes and naming 
schemes can be pulled in, etc.   It would at least allow conditionals 
against "ndb" in Openstack projects to switch on the same kind of 
criteria that they already do for sqlite/postgresql/mysql.


It is possible for the ndb "stub dialect" to be at least temporarily 
within oslo.db; however, the way to go about this would be to start 
getting ndb working as a proof of concept in terms of gerrit reviews. 
That is, propose reviews to multiple projects and work at that level, 
without actually merging anything.   We don't merge anything until it's 
actually "done" as a tested and working feature / fix.











__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-03 Thread Mike Bayer



On 02/03/2017 10:21 AM, Doug Hellmann wrote:

Excerpts from Mike Bayer's message of 2017-02-03 09:41:11 -0500:


On 02/02/2017 05:28 PM, Octave J. Orgeron wrote:

That refers to the total length of the row. InnoDB has a limit of 65k
and NDB is limited to 14k.

A simple example would be the volumes table in Cinder where the row
length goes beyond 14k. So in the IF logic block, I change column types
that are vastly oversized such as status and attach_status, which by
default are 255 chars.



let me give you a tip on IF blocks: they are a bit of an
anti-pattern.  If you want a column type to do one thing in one case,
and another in another case, create an object that does the thing you want:


some_table = Table(
    'some_table', metadata,
    Column('my_column', VARCHAR(255).with_variant(VARCHAR(50), 'ndb'))
)


I wonder if we want to do either, though. Shouldn't we try to use
the same (smaller) column size all the time? Otherwise we end up
with another incompatibility between different deployments, since
sometimes things like names might have different sizes in different
clouds.


in that case you have to do a migration, which as you know these days 
means the "old" column remains for a whole release cycle and the 
application must undergo significant complexity, either at the app level 
or in triggers, to keep data synchronized between the "old" and "new" 
columns simultaneously.   So one advantage to keeping this at the "create for 
NDB" level is that we don't need to get into schema migrations.


Unless we changed the value in the application and its migration files 
completely, and *didn't* migrate old applications, and just hope/ensure 
that they aren't writing larger data values.   Maybe that's possible 
though it seems a little scary.   Perhaps some kind of annotated type 
like VARCHAR(50, unmigrated=255) to note what's going on.
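
Just to illustrate that last idea, a purely hypothetical sketch of such an 
annotated type (nothing like this exists in oslo.db today):

from sqlalchemy import String
from sqlalchemy.types import TypeDecorator


class ShrinkingString(TypeDecorator):
    """Sketch of 'VARCHAR(50, unmigrated=255)': the schema keeps the old
    255-character column, but the application rejects values over the new
    limit so a later migration can shrink the column safely."""

    impl = String

    def __init__(self, new_length, unmigrated_length=255):
        # the underlying column stays at the unmigrated width
        super(ShrinkingString, self).__init__(unmigrated_length)
        self.new_length = new_length

    def process_bind_param(self, value, dialect):
        if value is not None and len(value) > self.new_length:
            raise ValueError(
                "value longer than %d characters" % self.new_length)
        return value


# e.g.  Column('attach_status', ShrinkingString(50))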








I think we might want to look into creating a stub dialect called 'ndb'
that subclasses mysql+pymysql.   Treating ndb as a whole different
database means there's no longer the need for a flag in oslo.db, the
'ndb' name would instead be interpreted as a new backend - the main
thing would be ensuring all the mysql-appropriate hooks in oslo.db are
also emitted for ndb, but this also gives us a way to pick and choose
which hooks apply.   It seems like there may be enough different about
it to separate it at this level.

Not sure if people on the list are seeing that we are simultaneously
talking about getting rid of Postgresql in the efforts to support only
"one database", while at the same time adding one that is in many ways a
new database.


Yes, that does seem a bit ironic. That's also why I was pointing
out that we're going to want to have people lined up to support the
work before starting. The lack of help with Postgresql testing
resulted in removing it from the gate, and possibly to dropping
support entirely.

For reference, the discussion in [1] led to this proposed TC
resolution [2].

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2017-February/thread.html#111357
[2] https://review.openstack.org/427880






So to determine a more appropriate size, I look
through the Cinder code to find where the possible options/states are
for those columns. Then I cut it down to a more reasonable size. I'm
very careful when I cut the size of a string column to ensure that all
of the possible values can be contained.

In cases where a column is extremely large for capturing the outputs of
a command, I will change the type to Text or TinyText depending on the
length required. A good example of this is in the agents table of
Neutron where there is a column for configurations that has a string
length of 4096 characters, which I change to Text. Text blobs are stored
differently and do not count against the row length.

I've also observed differences between Kilo, Mitaka, and tip where even
for InnoDB some of these tables are getting wider than can be supported.
So in the case of Cinder, some of the columns have been shifted to
separate tables to fit within 65k. I've seen the same thing in Neutron.
So I fully expect that some of the services that have table bloat will
have to cut the lengths or break the tables up over time anyways. As
that happens, it reduces the amount of work for me, which is a good thing.

The most complicated database schemas to patch up are cinder, glance,
neutron, and nova due to the size and complexity of their tables. Those
also have a lot of churn between releases where the schema changes more
often. Other services like keystone, heat, and ironic are considerably
easier to work with and have well laid out tables that don't change much.

Thanks,
Octave

On 2/2/2017 1:25 PM, Mike Bayer wrote:



On 02/02/2017 02:52 PM, Mike Bayer wrote:


But more critically I noticed you referred to altering the names of
columns to suit NDB.  How will this be accomplished?   Changing a column
name in an openstac

Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-03 Thread Mike Bayer



On 02/02/2017 05:28 PM, Octave J. Orgeron wrote:

That refers to the total length of the row. InnoDB has a limit of 65k
and NDB is limited to 14k.

A simple example would be the volumes table in Cinder where the row
length goes beyond 14k. So in the IF logic block, I change column types
that are vastly oversized such as status and attach_status, which by
default are 255 chars.



let me give you a tip on IF blocks: they are a bit of an 
anti-pattern.  If you want a column type to do one thing in one case, 
and another in another case, create an object that does the thing you want:


some_table = Table(
    'some_table', metadata,
    Column('my_column', VARCHAR(255).with_variant(VARCHAR(50), 'ndb'))
)


I think we might want to look into creating a stub dialect called 'ndb' 
that subclasses mysql+pymysql.   Treating ndb as a whole different 
database means there's no longer the need for a flag in oslo.db, the 
'ndb' name would instead be interpreted as a new backend - the main 
thing would be ensuring all the mysql-appropriate hooks in oslo.db are 
also emitted for ndb, but this also gives us a way to pick and choose 
which hooks apply.   It seems like there may be enough different about 
it to separate it at this level.


Not sure if people on the list are seeing that we are simultaneously 
talking about getting rid of Postgresql in the efforts to support only 
"one database", while at the same time adding one that is in many ways a 
new database.





So to determine a more appropriate size, I look
through the Cinder code to find where the possible options/states are
for those columns. Then I cut it down to a more reasonable size. I'm
very careful when I cut the size of a string column to ensure that all
of the possible values can be contained.

In cases where a column is extremely large for capturing the outputs of
a command, I will change the type to Text or TinyText depending on the
length required. A good example of this is in the agents table of
Neutron where there is a column for configurations that has a string
length of 4096 characters, which I change to Text. Text blobs are stored
differently and do not count against the row length.

I've also observed differences between Kilo, Mitaka, and tip where even
for InnoDB some of these tables are getting wider than can be supported.
So in the case of Cinder, some of the columns have been shifted to
separate tables to fit within 65k. I've seen the same thing in Neutron.
So I fully expect that some of the services that have table bloat will
have to cut the lengths or break the tables up over time anyways. As
that happens, it reduces the amount of work for me, which is a good thing.

The most complicated database schemas to patch up are cinder, glance,
neutron, and nova due to the size and complexity of their tables. Those
also have a lot of churn between releases where the schema changes more
often. Other services like keystone, heat, and ironic are considerably
easier to work with and have well laid out tables that don't change much.

Thanks,
Octave

On 2/2/2017 1:25 PM, Mike Bayer wrote:



On 02/02/2017 02:52 PM, Mike Bayer wrote:


But more critically I noticed you referred to altering the names of
columns to suit NDB.  How will this be accomplished?   Changing a column
name in an openstack application is no longer trivial, because online
upgrades must be supported for applications like Nova and Neutron.  A
column name can't just change to a new name, both columns have to exist
and logic must be added to keep these columns synchronized.



correction, the phrase was "Row character length limits 65k -> 14k" -
does this refer to the total size of a row?  I guess rows that store
JSON or tables like keystone tokens are what you had in mind here, can
you give specifics ?






--

Octave J. Orgeron | Sr. Principal Architect and Software Engineer
Oracle Linux OpenStack
Mobile: +1-720-616-1550
500 Eldorado Blvd. | Broomfield, CO 80021
Certified Oracle Enterprise Architect: Systems Infrastructure







Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-02 Thread Mike Bayer



On 02/02/2017 02:52 PM, Mike Bayer wrote:


But more critically I noticed you referred to altering the names of
columns to suit NDB.  How will this be accomplished?   Changing a column
name in an openstack application is no longer trivial, because online
upgrades must be supported for applications like Nova and Neutron.  A
column name can't just change to a new name, both columns have to exist
and logic must be added to keep these columns synchronized.



correction, the phrase was "Row character length limits 65k -> 14k" - 
does this refer to the total size of a row?  I guess rows that store 
JSON or tables like keystone tokens are what you had in mind here, can 
you give specifics ?






Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-02 Thread Mike Bayer



On 02/02/2017 02:16 PM, Octave J. Orgeron wrote:

Hi Doug,

Comments below..

Thanks,
Octave

On 2/2/2017 11:27 AM, Doug Hellmann wrote:

It sounds like part of the plan is to use the configuration setting
to control how the migration scripts create tables. How will that
work? Does each migration need custom logic, or can we build helpers
into oslo.db somehow? Or will the option be passed to the database
to change its behavior transparently?


These are good questions. For each service, when the db sync or db
manage operation is done it will call into SQL Alchemy or Alembic
depending on the methods used by the given service. For example, most
use SQL Alchemy, but there are services like Ironic and Neutron that use
Alembic. It is within these scripts under the /db/* hierarchy
that the logic exists today to configure the database schema for any
given service. Both approaches will look at the schema version in the
database to determine where to start the create, upgrade, heal, etc.
operations. What my patches do is that in the scripts where a table
needs to be modified, there will be custom IF/THEN logic to check the
cfg.CONF.database.mysql_storage_engine setting to make the required
modifications. There are also use cases where the api.py or model(s).py
under the /db/ hierarchy needs to look at this setting as well
for API and CLI operations where mysql_engine is auto-inserted into DB
operations. In those use cases, I replace the hard coded "InnoDB" with
the mysql_storage_engine variable.
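
A rough sketch of the kind of conditional being described (the option name 
comes from this proposal and is assumed to be registered as in the oslo.db 
review; the table and sizes are only illustrative):

from oslo_config import cfg
from sqlalchemy import Column, MetaData, String, Table

CONF = cfg.CONF
meta = MetaData()

# proposed option; defaults to InnoDB so existing deployments are unaffected
engine = CONF.database.mysql_storage_engine

if engine == "NDBCLUSTER":
    # trimmed so the total row length fits within the 14k NDB limit
    status = Column('status', String(64))
else:
    status = Column('status', String(255))

volumes = Table('volumes', meta, status, mysql_engine=engine)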


can you please clarify "replace the hard coded "InnoDB"" ?   Are you 
proposing to send reviews for patches against all occurrences of 
"InnoDB" in files like 
https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/migrate_repo/versions/216_havana.py 
?   The "InnoDB" keyword is hardcoded in hundreds of migration files 
across all openstack projects that use MySQL.   Are all of these going 
to be patched with some kind of conditional?





It would be interesting if we could develop some helpers to automate
this, but it would probably have to be at the SQL Alchemy or Alembic
levels.


not really, you can build a hook that intercepts operations like 
CreateTable, or that intercepts SQL as it is emitted over a connection, 
in order to modify these values on the fly.  But that is a specific kind 
of approach with its own set of surprises.   Alternatively you can make 
an alternate SQLAlchemy dialect that no longer recognizes "mysql_*" as 
the prefix for these arguments.   There are ways to do this part.
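
For example, the first approach (intercepting DDL as it is emitted) could 
look roughly like this; the string replacement is deliberately naive and a 
real version would be driven by configuration:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.schema import CreateTable

@compiles(CreateTable, "mysql")
def _create_table_for_ndb(create, compiler, **kw):
    # let the stock MySQL DDL compiler do its work first
    ddl = compiler.visit_create_table(create, **kw)
    # then rewrite the storage engine on the fly
    return ddl.replace("ENGINE=InnoDB", "ENGINE=NDBCLUSTER")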


But more critically I noticed you referred to altering the names of 
columns to suit NDB.  How will this be accomplished?   Changing a column 
name in an openstack application is no longer trivial, because online 
upgrades must be supported for applications like Nova and Neutron.  A 
column name can't just change to a new name, both columns have to exist 
and logic must be added to keep these columns synchronized.
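
For illustration only, one possible shape of that synchronization logic at 
the ORM layer; the model and column names here are hypothetical and this is 
not code from Nova or Neutron:

from sqlalchemy import Column, Integer, String, event
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Volume(Base):
    __tablename__ = 'volumes'
    id = Column(Integer, primary_key=True)
    display_name = Column(String(255))   # old column, still read by N-1 services
    name = Column(String(255))           # new column introduced in release N

@event.listens_for(Volume, "before_insert")
@event.listens_for(Volume, "before_update")
def _sync_renamed_column(mapper, connection, target):
    # during the transition, both columns must carry the same value
    if target.name is not None:
        target.display_name = target.name
    elif target.display_name is not None:
        target.name = target.display_name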


Unfortunately, throughout all of the OpenStack services today we
are hard coding things like mysql_engine, using InnoDB specific features
(savepoints, nested operations, etc.), and not following the strict SQL
orders for modifying table elements (foreign keys, constraints, and
indexes).


Savepoints aren't InnoDB specific, they are a standard SQL feature and 
also their use is not widespread right now.   I'm not sure what you mean 
by "the strict SQL orders", we use ALTER TABLE as is standard in MySQL 
for this and it's behind an abstraction layer that supports other 
databases such as Postgresql.





  * Many of the SQL Alchemy and Alembic scripts only import the minimal
set of python modules. If we imported others, we would also have to
initialize those namespaces, which means a lot more code :(


I'm not sure what this means, can you clarify ?


   * Reduces the amount of overhead required to make these changes.

What sort of "overhead", do you mean code complexity, performance ?








Keep in mind that we do not encourage code outside of libraries to
rely on configuration settings defined within libraries, because
that limits our ability to change the names and locations of the
configuration variables.  If migration scripts need to access the
configuration setting we will need to add some sort of public API
to oslo.db to query the value. The function can simply return the
configured value.


Configuration parameters within any given service will make use of a
large namespace that pulls in things from oslo and the .conf files for a
given service. So even when an API, CLI, or DB related call is made,
these namespaces are key for things to work. In the case of the SQL
Alchemy and Alembic scripts, they also make use of this namespace with
oslo, oslo.db, etc. to figure out how to connect to the database and
other database settings. I don't think we need a public API for these
kinds of calls as the community already makes use of the libraries to
build the namespace. My oslo.db setting 

Re: [openstack-dev] [nova][ceilometer][postgresql][gate][telemetry] PostgreSQL gate failure (again)

2017-02-02 Thread Mike Bayer



On 02/02/2017 11:42 AM, Sean Dague wrote:


That's all fine and good, we just need to rewrite about 100,000 unit
tests to do that. I'm totally cool with someone taking that task on, but
making a decision about postgresql shouldn't be filibustered on
rewriting all the unit tests in OpenStack because of the ways we use sqlite.


two points:

first is, you don't need to rewrite any tests, just reorganize the 
fixtures.   This is all done top-level and I've always been trying to 
get people to standardize on oslo-db built-in fixtures more anyway, this 
would ultimately make that all easier.   We would need to use an 
efficient process for tearing down of data within a schema so that tests 
can run quickly without schema rebuilds, this is all under the realm of 
"roll back the transaction" testing which oslo.db supports though nobody 
is using this right now.   It would be a big change in how things run 
and there'd be individual issues to fix but it's not a rewrite of actual 
tests.   I am not in any hurry to do any of this.
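
The general shape of that "roll back the transaction" pattern, written 
generically rather than with the actual oslo.db fixture API:

import unittest

from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("mysql+pymysql://user:pass@localhost/testdb")

class RollbackTransactionTestCase(unittest.TestCase):
    # the schema is built once; each test runs inside a transaction that is
    # rolled back on teardown, so test data never persists between tests

    def setUp(self):
        self.connection = engine.connect()
        self.transaction = self.connection.begin()
        self.session = Session(bind=self.connection)

    def tearDown(self):
        self.session.close()
        self.transaction.rollback()
        self.connection.close()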


second is, I'm not a "filibuster" vote at all :).   I'm like the least 
important person in the decision chain here and I didn't even -1 the 
proposal.Deprecating Postgresql alone and doing nothing else is 
definitely very easy and would make development simpler, whereas getting 
rid of SQLite would be a much bigger job.  I'm just pointing out that we 
shouldn't pretend we "target only one database" until we get rid of 
SQLite in our test suites.



OK, third bonus point.   If we do drop postgresql support, to the degree 
that we really remove it totally from test fixtures, oslo.db 
architectures, all of that, the codebase would probably become 
mysql-specific in subtle and not-so-subtle ways pretty quickly, and 
within a few cycles we should consider that we probably will never be 
able to target multiple databases again without a much larger 
"unwinding" effort.   So while not worrying about Postgresql is handy, I 
would miss the fact that targeting two real DBs keeps us honest in terms 
of being able to target multiple databases at all, because this is a 
door that once you close we're not going to be able to open again. I 
doubt that in oslo.db itself we would realistically ever drop the 
architectures that support multiple databases, though, and as oslo.db is 
a pretty simple library it should likely continue to target postgresql 
as nothing more than a "keeping things honest" sanity check.










-Sean





Re: [openstack-dev] [oslo][oslo.db] MySQL Cluster support

2017-02-02 Thread Mike Bayer



On 02/02/2017 10:25 AM, Monty Taylor wrote:

On 02/01/2017 09:33 PM, Octave J. Orgeron wrote:

Hi Folks,

I'm working on adding support for MySQL Cluster to the core OpenStack
services. This will enable the community to benefit from an
active/active, auto-sharding, and scale-out MySQL database. My approach
is to have a single configuration setting in each core OpenStack service
in the oslo.db configuration section called mysql_storage_engine that
will enable the logic in the SQL Alchemy or Alembic upgrade scripts to
handle the differences between InnoDB and NDB storage engines
respectively. When enabled, this logic will make the required table
schema changes around:

  * Row character length limits 65k -> 14k
  * Proper SQL ordering of foreign key, constraints, and index operations
  * Interception of savepoint and nested operations

By default this functionality will not be enabled and will have no
impact on the default InnoDB functionality. These changes have been
tested on Kilo and Mitaka in previous releases of our OpenStack
distributions with Tempest. I'm working on updating these patches for
upstream consumption. We are also working on a 3rd party CI for
regression testing against MySQL Cluster for the community.

The first change set is for oslo.db and can be reviewed at:

https://review.openstack.org/427970


Yay!

(You may not be aware, but there are several of us who used to be on the
MySQL Cluster team who are now on OpenStack. I've been wanting good NDB
support for a while. So thank you!)


as I noted on the review it would be nice to have some specifics of how 
this is to be accomplished as the code review posted doesn't show 
anything of how this would work.













Re: [openstack-dev] [nova][ceilometer][postgresql][gate][telemetry] PostgreSQL gate failure (again)

2017-02-02 Thread Mike Bayer



On 02/01/2017 10:22 AM, Monty Taylor wrote:


I personally continue to be of the opinion that without an explicit
vocal and well-staffed champion, supporting postgres is more trouble
than it is worth. The vast majority of OpenStack deployments are on
MySQL - and what's more, the code is written with MySQL in mind.
Postgres and MySQL have different trade offs, different things each are
good at and different places in which each has weakness. By attempting
to support Postgres AND MySQL, we prevent ourselves from focusing
adequate attention on making sure that our support for one of them is
top-notch and in keeping with best practices for that database.

So let me state my opinion slightly differently. I think we should
support one and only one RDBMS backend for OpenStack, and we should open
ourselves up to use advanced techniques for that backend. I don't
actually care whether that DB is MySQL or Postgres - but the corpus of
existing deployments on MySQL and the existing gate jobs I think make
the choice one way or the other simple.



well, let me blow your mind and agree, but noting that this means, *we 
drop SQLite also*.   IMO every openstack developer should have 
MySQL/MariaDB running on their machine and that is part of what runs if 
you expect to run database-related unit tests.   Targeting just one 
database is very handy but if you really want to use the features 
without roadblocks, you need to go all the way.









I agree that ceilometer should not be providing Postgres testing for the
rest of OpenStack.

Monty







Re: [openstack-dev] [All projects that use Alembic] Absence of pk on alembic_version table

2017-01-30 Thread Mike Bayer
 data loss or corruption.

If OS wants to support Galera, it needs to comply with the Galera
requirements.

On Mon, Jan 23, 2017 at 9:59 PM, Ihar Hrachyshka <ihrac...@redhat.com> wrote:

An alternative could also be, for Newton and earlier, to release a
note saying that operators should not run the code against ENFORCING
galera mode. What are the reasons to enable that mode in OpenStack
scope that would not allow operators to live without it for another
cycle?

Ihar

On Mon, Jan 23, 2017 at 10:12 AM, Anna Taraday
<akamyshnik...@mirantis.com> wrote:
> Hello everyone!
>
> Guys in our team faced an issue when they try to run alembic migrations on
> Galera with ENFORCING mode. [1]
>
> This was an issue with Alembic [2], which was quickly fixed by Mike Bayer
> (many thanks!) and a new version of alembic was released [3].
> The global requirements are updated [4].
>
> I think that it is desired to fix this for Newton at least. We cannot bump
> requirements for Newton, so a hot fix can be putting a pk on this table in the
> first migration like proposed [5].  Any other ideas?
>
> [1] - https://bugs.launchpad.net/neutron/+bug/1655610
> [2] - https://bitbucket.org/zzzeek/alembic/issues/406
> [3] - http://alembic.zzzcomputing.com/en/latest/changelog.html#change-0.8.10
> [4] - https://review.openstack.org/#/c/423118/
> [5] - https://review.openstack.org/#/c/419320/
>
>
> --
> Regards,
> Ann Taraday
>
>





--
Best regards,
Proskurin Kirill







Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate

2016-12-22 Thread Mike Bayer


On 12/20/2016 06:50 PM, Cathy Zhang wrote:

Hi Bernard,

Thanks for the email. I will take a look at this. Xiaodong has been working on 
tempest test scripts.
I will work with Xiaodong on this issue.


I've added a comment to the issue which refers to upstream SQLAlchemy 
issue https://bitbucket.org/zzzeek/sqlalchemy/issues/3803 as a potential 
contributor, though looking at the logs linked from the issue it appears 
that database deadlocks are also occurring which may also be a precursor 
here.   There are many improvements in SQLAlchemy 1.1 such that the 
"rollback()" state should not be as susceptible to a corrupted database 
connection as seems to be the case here.







Cathy


-Original Message-
From: Bernard Cafarelli [mailto:bcafa...@redhat.com]
Sent: Tuesday, December 20, 2016 3:00 AM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [networking-sfc] Intermittent database transaction 
issues, affecting the tempest gate

Hi everyone,

we have an open bug (thanks Igor for the report) on DB transaction issues:
https://bugs.launchpad.net/networking-sfc/+bug/1630503

The thing is, I am seeing  quite a few tempest gate failures that follow the 
same pattern: at some point in the test suite, the service gets warnings/errors 
from the DB layer (reentrant call, closed transaction, nested rollback, …), and 
all following tests fail.

This affects both master and stable/newton branches (not many changes for now 
in the DB parts between these branches)

Some examples:
* https://review.openstack.org/#/c/400396/ failed with console log
http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/console.html#_2016-12-16_12_44_47_564544
and service log
http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_12_44_32_301
* https://review.openstack.org/#/c/405391/ failed,
http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/console.html.gz#_2016-12-16_13_05_17_384323
and 
http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_13_04_11_840
* another on master branch: https://review.openstack.org/#/c/411194/
with 
http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/console.html.gz#_2016-12-15_22_36_15_216260
and 
http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-15_22_35_53_310

I took a look at the errors, but only found old-and-apparently-fixed pymysql 
bugs, and suggestions like:
* 
http://docs.sqlalchemy.org/en/latest/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar
*  https://review.openstack.org/#/c/230481/
Not really my forte, so if someone could take a look at these logs and fix the 
problem, it would be great! Especially with the upcoming multinode tempest gate

Thanks,
--
Bernard Cafarelli






Re: [openstack-dev] [nova][oslo][openstack-ansible] DB deadlocks, Mitaka, and You

2016-10-19 Thread Mike Bayer



On 10/19/2016 08:36 AM, Ian Cordasco wrote:

Hey Kevin,

So just looking at the pastes you have here, I'm inclined to believe
this is actually a bug in oslo_db/sqlalchemy. If you follow the trace,
there's a PyMySQL InternalError not being handled inside of
sqlalchemy. I'm not sure if SQLAlchemy considers InternalErrors to be
something it cannot retry, or something that the user should decide
how to handle, but I would start chatting with the folk who work on
oslo_db and SQLAlchemy in the community.


SQLAlchemy itself does not retry transactions.  A retry is typically at 
the method level where the calling application (nova in this case) would 
make use of the oslo retry decorator, seen here: 
https://github.com/openstack/oslo.db/blob/master/oslo_db/api.py#L85 . 
This decorator is configured to retry based on specific oslo-level 
exceptions being intercepted, of which DBDeadlock is the primary 
exception this function was written for.
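
On the application side the usage is roughly as follows; the function name 
and body are illustrative:

from oslo_db import api as oslo_db_api

@oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
def instance_update(context, instance_uuid, values):
    # if the body raises DBDeadlock, the decorator re-invokes the whole
    # function, up to max_retries times, with an increasing delay
    pass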


In this case, both stack traces illustrate the error being thrown is 
DBDeadlock, which is an oslo-db-specific error that is the result of the 
correct handling of this PyMySQL error code.   The original error object 
is maintained as a data member of DBDeadlock so that the source of the 
DBDeadlock can be seen.  The declaration of this interception is here: 
https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/exc_filters.py#L56 
.   SQLAlchemy re-throws this user-generated exception in the context of 
the original, so in Python 2 where stack traces are still a confusing 
affair, it's hard to see that this interception occurred, but DBDeadlock 
indicates that it has.









That said, this also looks like something that should be reported to
Nova. Something causing an unhandled exception is definitely bug worthy
(even if the fix belongs somewhere in one of its dependencies).

--
Ian Cordasco






Re: [openstack-dev] [cinder][db] lazy loading of an attribute impossible

2016-10-01 Thread Mike Bayer



On 09/30/2016 10:54 AM, Roman Podoliaka wrote:

Michał,

You are absolutely right: this exception is raised when you try to
lazy-load instance attributes outside a Session scope. There is an
obvious problem with that - instances do not communicate with a DB on
their own - it's left up to Session [1].

Unfortunately, it does not play nicely with the "classic" DB access
layer we have in Cinder and other projects, when you have a notion of
pluggable DB APIs and SQLAlchemy implementation that looks like:

@require_context
@handle_db_data_error
def snapshot_create(context, values):
    values['snapshot_metadata'] = _metadata_refs(values.get('metadata'),
                                                 models.SnapshotMetadata)
    if not values.get('id'):
        values['id'] = str(uuid.uuid4())

    session = get_session()
    with session.begin():
        snapshot_ref = models.Snapshot()
        snapshot_ref.update(values)
        session.add(snapshot_ref)

        return _snapshot_get(context, values['id'], session=session)

In this case a Session (and transaction) scope is bound to "public" DB
API functions. There are a few problems with this:

1) once a public DB function returns an instance, it becomes prone to
lazy-load errors, as the corresponding session (and DB transaction) is
already gone and it's not possible to load missing data (without
establishing a new session/transaction)

2) you have to carefully pass a Session object when doing calls to
"private" DB API functions to ensure they all participate in the very
same DB transaction. Otherwise snapshot_get() above would not see the
row created by snapshot_create() due to isolation of transactions in
RDBMS

3) if you do multiple calls to "public" DB API functions when handling
a single HTTP request it's no longer easy to do a rollback as every
function creates its own DB transaction

Mixing of Session objects creation with the actual business logic is
considered to be an anti-pattern in SQLAlchemy [2] due to problems
mentioned above.

At this point I suggest you take a look at [3] and start using in
Cinder: in Kilo we did a complete redesign of EngineFacade in oslo.db
- it won't solve all you problems with lazy-loading automatically, but
what it can do is provide a tool for declarative definition of session
(and transaction) scope, so that it's no longer limited to one
"public" DB API function and you can extend it when needed: you no
longer create a Session object explicitly, but rather mark methods
with a decorator, that will inject a session into the context, and all
callees will participate in the established session (thus, DB
transaction) rather than create a new one (my personal opinion is that
for web-services it's preferable to bind session/transaction scope to
the scope of one HTTP request, so that it's easy to roll back changes
on errors - we are not there yet, but some projects like Nova are
already moving the session scope up the stack, e.g. to objects layer).
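
For reference, the declarative scope described above looks roughly like 
this with the new EngineFacade from [3], reusing the snapshot example; the 
models import is assumed from the snippet earlier in this message:

from oslo_db.sqlalchemy import enginefacade

@enginefacade.writer
def snapshot_create(context, values):
    # the decorator injects a session (and transaction) onto the context;
    # any callee decorated the same way joins that same transaction
    snapshot_ref = models.Snapshot()
    snapshot_ref.update(values)
    context.session.add(snapshot_ref)
    return snapshot_ref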



+1 thanks Roman !




Thanks,
Roman

[1] 
http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#what-does-the-session-do
[2] 
http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
[3] 
https://specs.openstack.org/openstack/oslo-specs/specs/kilo/make-enginefacade-a-facade.html

On Thu, Sep 22, 2016 at 4:45 PM, Michał Dulko  wrote:

Hi,

I've just noticed another Cinder bug [1], similar to past bugs [2], [3].
All of them have a common exception causing them:

sqlalchemy.orm.exc.DetachedInstanceError: Parent instance
<{$SQLAlchemyObject} at {$MemoryLocation}> is not bound to a Session;
lazy load operation of attribute '{$ColumnName}' cannot proceed

We've normally fixed them by simply making the $ColumnName eager-loaded,
but as there's another similar bug report, I'm starting to think that we
have some issue with how we're managing our DB connections and
SQLAlchemy objects are losing their sessions too quickly, before we'll
manage to lazy-load required stuff.

I'm not too experienced with SQLAlchemy session management, so I would
welcome any help with investigation.

Thanks,
Michal


[1] https://bugs.launchpad.net/cinder/+bug/1626499
[2] https://bugs.launchpad.net/cinder/+bug/1517763
[3] https://bugs.launchpad.net/cinder/+bug/1501838







Re: [openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

2016-09-21 Thread Mike Bayer



On 09/21/2016 11:41 AM, Joshua Harlow wrote:


I've seen something similar at https://review.openstack.org/#/c/316935/

Maybe its time we asked again why are we still using eventlet and do we
need to anymore. What functionality of it are people actually taking
advantage of? If it's supporting libraries like oslo.service then it'd
probably be useful to talk to the ceilometer folks who replaced
oslo.service with something else (another oslo library for periodics and
https://github.com/sileht/cotyledon for service oriented tasks).


Plus Keystone has gotten off of it.

I actually like eventlet and gevent quite a lot.   I am using it in a 
new middleware component that will be involved with database connection 
pooling.  However, I *don't* use the global monkeypatching aspect. 
That's where this all goes very wrong.   Things that are designed for 
synchronous operations, like database-oriented business methods as well 
as the work of the database driver itself, should run within threads. 
You can in fact use eventlet/gevent's APIs explicitly and you can even 
combine it with traditional threading explicitly.   I'm actually using a 
stdlib Queue (carefully) to send data between greenlets and threads. 
Madness!
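
A toy sketch of that explicit style - no monkeypatching, Python 3 module 
names, and the care mentioned above (backpressure, shutdown, error handling) 
glossed over:

import queue
import threading

import eventlet  # used explicitly; eventlet.monkey_patch() is never called

work = queue.Queue()

def db_worker():
    # a real OS thread: a safe place for a synchronous database driver
    while True:
        item = work.get()
        if item is None:
            break
        # ... run the blocking database operation here ...

worker = threading.Thread(target=db_worker)
worker.start()

def producer():
    # a greenlet: hands units of work to the thread through the queue
    for i in range(10):
        work.put("statement %d" % i)

eventlet.spawn(producer).wait()
work.put(None)
worker.join()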








-Josh





Re: [openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

2016-09-14 Thread Mike Bayer



On 09/14/2016 11:05 PM, Mike Bayer wrote:


Are *these* errors also new as of version 4.13.3 of oslo.db ?   Because
here I have more suspicion of one particular oslo.db change here.


The version in question that has the changes to provisioning and 
anything really to do with this area is 4.12.0.   So if you didn't see 
any problem w/ 4.12 then almost definitely oslo.db is not the cause - 
the code changes subsequent to 4.12 have no relationship to any system 
used by the opportunistic test base.I would hope at least that 4.12 
is the version where we see things changing because there were small 
changes to the provisioning code.


But at the same time, I'm combing through the quite small adjustments to 
the provisioning code as of 4.12.0 and I'm not seeing what could 
introduce this issue.   That said, we really should never see the kind 
of error we see with the "DROP DATABASE" failing because it remains in 
use, however this can be a side effect of the test itself having 
problems with the state of a different connection, not being closed and 
locks remain held.


That is, there's poor failure modes for sure here, I just can't see 
anything in 4.13 or even 4.12 that would suddenly introduce them.


By all means if these failures disappear when we go to 4.11 vs. 4.12, 
that would be where we need to go and to look for next cycle. From 
my POV if the failures do disappear then that would be the best evidence 
that the oslo.db version is the factor.











 fits much more with your initial description

On 09/14/2016 10:48 PM, Mike Bayer wrote:



On 09/14/2016 07:04 PM, Alan Pevec wrote:

Oslo.db 4.13.3 did hit the scene about the time this showed up. So I
think we need to strongly consider blocking it and revisiting these
issues post newton.


So that means reverting all stable/newton changes, previous 4.13.x
have been already blocked https://review.openstack.org/365565
How would we proceed, do we need to revert all backport on
stable/newton?


In case my previous email wasn't clear, I don't *yet* see evidence that
the recent 4.13.3 release of oslo.db is the cause of this problem.
However, that is only based upon what I see in this stack trace, which
is that the test framework is acting predictably (though erroneously)
based on the timeout condition which is occurring.   I don't (yet) see a
reason that the same effect would not occur prior to 4.13.3 in the face
of a signal pre-empting the work of the pymysql driver mid-stream.
However, this assumes that the timeout condition itself is not a product
of the current oslo.db version and that is not known yet.

There's a list of questions that should all be answerable which could
assist in giving some hints towards this.

There's two parts to the error in the logs.  There's the "timeout"
condition, then there is the bad reaction of the PyMySQL driver and the
test framework as a result of the operation being interrupted within the
test.

* Prior to oslo.db 4.13.3, did we ever see this "timeout" condition
occur?   If so, was it also accompanied by the same "resource closed"
condition or did this second part of the condition only appear at 4.13.3?

* Did we see a similar "timeout" / "resource closed" combination prior
to 4.13.3, just with less frequency?

* Was the version of PyMySQL also recently upgraded (I'm assuming this
environment has been on PyMySQL for a long time at this point) ?   What
was the version change if so?  Especially if we previously saw "timeout"
but no "resource closed", perhaps an older version pf PyMySQL didn't
react in this way?

* Was the version of MySQL running in the CI environment changed?   What
was the version change if so?Were there any configurational changes
such as transaction isolation, memory or process settings?

* Have there been changes to the "timeout" logic itself in the test
suite, e.g. whatever it is that sets up fixtures.Timeout()?  Or some
change that alters how teardown of tests occurs when a test is
interrupted via this timeout?

* What is the magnitude of the "timeout" this fixture is using, is it on
the order of seconds, minutes, hours?

* If many minutes or hours, can the test suite be observed to be stuck
on this test?   Has someone tried to run a "SHOW PROCESSLIST" while this
condition is occurring to see what SQL is pausing?

* Has there been some change such that the migration tests are running
against non-empty tables or tables with much more data than was present
before?

* Is this failure only present within the Nova test suite or has it been
observed in the test suites of other projects?

* Is this failure present only on the "database migration" test suite or
is it present in other opportunistic tests, for Nova and others?

* Have there been new database migrations added to Nova which are being
exercised here and may be involved?

I'm not sure how much 

Re: [openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

2016-09-14 Thread Mike Bayer
There's a different set of logs attached to the launchpad issue, that's 
not what I was looking at before.


These logs are at 
http://logs.openstack.org/90/369490/1/check/gate-nova-tox-db-functional-ubuntu-xenial/085ac3e/console.html#_2016-09-13_14_54_18_098031 
.In these logs, I see something *very* different, not just the MySQL 
tests but the Postgresql tests are definitely hitting conflicts against 
the randomly generated database.


This set of traces, e.g.:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) database 
"dbzrtmgbxv" is being accessed by other users
2016-09-13 14:54:18.093723 | DETAIL:  There is 1 other session using 
the database.

2016-09-13 14:54:18.093736 |  [SQL: 'DROP DATABASE dbzrtmgbxv']

and

File 
"/home/jenkins/workspace/gate-nova-tox-db-functional-ubuntu-xenial/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", 
line 668, in _rollback_impl
2016-09-13 14:54:18.095470 | 
self.engine.dialect.do_rollback(self.connection)
2016-09-13 14:54:18.095513 |   File 
"/home/jenkins/workspace/gate-nova-tox-db-functional-ubuntu-xenial/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", 
line 420, in do_rollback

2016-09-13 14:54:18.095526 | dbapi_connection.rollback()
2016-09-13 14:54:18.095548 | sqlalchemy.exc.InterfaceError: 
(psycopg2.InterfaceError) connection already closed


are a very different animal. For one thing, they're on Postgresql where 
the driver and DB acts extremely rationally.   For another, there's no 
timeout exception here, and not all the conflicts are within the teardown.


Are *these* errors also new as of version 4.13.3 of oslo.db ?   Because 
here I have more suspicion of one particular oslo.db change here.








 fits much more with your initial description

On 09/14/2016 10:48 PM, Mike Bayer wrote:



On 09/14/2016 07:04 PM, Alan Pevec wrote:

Oslo.db 4.13.3 did hit the scene about the time this showed up. So I
think we need to strongly consider blocking it and revisiting these
issues post newton.


So that means reverting all stable/newton changes, previous 4.13.x
have been already blocked https://review.openstack.org/365565
How would we proceed, do we need to revert all backport on stable/newton?


In case my previous email wasn't clear, I don't *yet* see evidence that
the recent 4.13.3 release of oslo.db is the cause of this problem.
However, that is only based upon what I see in this stack trace, which
is that the test framework is acting predictably (though erroneously)
based on the timeout condition which is occurring.   I don't (yet) see a
reason that the same effect would not occur prior to 4.13.3 in the face
of a signal pre-empting the work of the pymysql driver mid-stream.
However, this assumes that the timeout condition itself is not a product
of the current oslo.db version and that is not known yet.

There's a list of questions that should all be answerable which could
assist in giving some hints towards this.

There's two parts to the error in the logs.  There's the "timeout"
condition, then there is the bad reaction of the PyMySQL driver and the
test framework as a result of the operation being interrupted within the
test.

* Prior to oslo.db 4.13.3, did we ever see this "timeout" condition
occur?   If so, was it also accompanied by the same "resource closed"
condition or did this second part of the condition only appear at 4.13.3?

* Did we see a similar "timeout" / "resource closed" combination prior
to 4.13.3, just with less frequency?

* Was the version of PyMySQL also recently upgraded (I'm assuming this
environment has been on PyMySQL for a long time at this point) ?   What
was the version change if so?  Especially if we previously saw "timeout"
but no "resource closed", perhaps an older version pf PyMySQL didn't
react in this way?

* Was the version of MySQL running in the CI environment changed?   What
was the version change if so?Were there any configurational changes
such as transaction isolation, memory or process settings?

* Have there been changes to the "timeout" logic itself in the test
suite, e.g. whatever it is that sets up fixtures.Timeout()?  Or some
change that alters how teardown of tests occurs when a test is
interrupted via this timeout?

* What is the magnitude of the "timeout" this fixture is using, is it on
the order of seconds, minutes, hours?

* If many minutes or hours, can the test suite be observed to be stuck
on this test?   Has someone tried to run a "SHOW PROCESSLIST" while this
condition is occurring to see what SQL is pausing?

* Has there been some change such that the migration tests are running
against non-empty tables or tables with much more data than was present
before?

* Is this failure only present within the Nova test suite or has it been
observed in the test suites of ot

Re: [openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

2016-09-14 Thread Mike Bayer



On 09/14/2016 07:04 PM, Alan Pevec wrote:

Oslo.db 4.13.3 did hit the scene about the time this showed up. So I
think we need to strongly consider blocking it and revisiting these
issues post newton.


So that means reverting all stable/newton changes, previous 4.13.x
have been already blocked https://review.openstack.org/365565
How would we proceed, do we need to revert all backport on stable/newton?


In case my previous email wasn't clear, I don't *yet* see evidence that 
the recent 4.13.3 release of oslo.db is the cause of this problem. 
However, that is only based upon what I see in this stack trace, which 
is that the test framework is acting predictably (though erroneously) 
based on the timeout condition which is occurring.   I don't (yet) see a 
reason that the same effect would not occur prior to 4.13.3 in the face 
of a signal pre-empting the work of the pymysql driver mid-stream. 
However, this assumes that the timeout condition itself is not a product 
of the current oslo.db version and that is not known yet.


There's a list of questions that should all be answerable which could 
assist in giving some hints towards this.


There's two parts to the error in the logs.  There's the "timeout" 
condition, then there is the bad reaction of the PyMySQL driver and the 
test framework as a result of the operation being interrupted within the 
test.


* Prior to oslo.db 4.13.3, did we ever see this "timeout" condition 
occur?   If so, was it also accompanied by the same "resource closed" 
condition or did this second part of the condition only appear at 4.13.3?


* Did we see a similar "timeout" / "resource closed" combination prior 
to 4.13.3, just with less frequency?


* Was the version of PyMySQL also recently upgraded (I'm assuming this 
environment has been on PyMySQL for a long time at this point) ?   What 
was the version change if so?  Especially if we previously saw "timeout" 
but no "resource closed", perhaps an older version pf PyMySQL didn't 
react in this way?


* Was the version of MySQL running in the CI environment changed?   What 
was the version change if so?Were there any configurational changes 
such as transaction isolation, memory or process settings?


* Have there been changes to the "timeout" logic itself in the test 
suite, e.g. whatever it is that sets up fixtures.Timeout()?  Or some 
change that alters how teardown of tests occurs when a test is 
interrupted via this timeout?


* What is the magnitude of the "timeout" this fixture is using, is it on 
the order of seconds, minutes, hours?


* If many minutes or hours, can the test suite be observed to be stuck 
on this test?   Has someone tried to run a "SHOW PROCESSLIST" while this 
condition is occurring to see what SQL is pausing?


* Has there been some change such that the migration tests are running 
against non-empty tables or tables with much more data than was present 
before?


* Is this failure only present within the Nova test suite or has it been 
observed in the test suites of other projects?


* Is this failure present only on the "database migration" test suite or 
is it present in other opportunistic tests, for Nova and others?


* Have there been new database migrations added to Nova which are being 
exercised here and may be involved?


I'm not sure how much of an inconvenience it is to downgrade oslo.db. 
If downgrading it is feasible, that would at least be a way to eliminate 
it as a possibility if these same failures continue to occur, or a way 
to confirm its involvement if they disappear.   But if downgrading is 
disruptive then there are other things to look at in order to have a 
better chance at predicting its involvement.






Cheers,
Alan






Re: [openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

2016-09-14 Thread Mike Bayer



On 09/14/2016 11:08 AM, Mike Bayer wrote:



On 09/14/2016 09:15 AM, Sean Dague wrote:

I noticed the following issues happening quite often now in the
opportunistic db tests for nova -
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22sqlalchemy.exc.ResourceClosedError%5C%22



It looks like some race has been introduced where the various db
connections are not fully isolated from each other like they used to be.
The testing magic for this is buried pretty deep in oslo.db.


that error message occurs when a connection that is intended against a
SELECT statement fails to provide a cursor.description attribute.  It is
typically a driver-level bug in the MySQL world and corresponds to
mis-handled failure modes from the MySQL connection.

By "various DB connections are not fully isolated from each other" are
you suggesting that a single in-Python connection object itself is being
shared among multiple greenlets?   I'm not aware of a change in oslo.db
that would be a relationship to such an effect.


So, I think by "fully isolated from each other" what you really mean is 
"operations upon a connection are not fully isolated from the subsequent 
use of that connection", since that's what I see in the logs.  A 
connection is attempting to be used during teardown to drop tables, 
however it's in this essentially broken state from a PyMySQL 
perspective, which would indicate something has gone wrong with this 
(pooled) connection in the preceding test that could not be detected or 
reverted once the connection was returned to the pool.


From Roman's observation, it looks like a likely source of this 
corruption is a timeout that is interrupting the state of the PyMySQL 
connection.   In the preceding stack trace, PyMySQL is encountering a 
raise as it attempts to call "self._sock.recv_into(b)", and it seems 
like some combination of eventlet's response to signals and the 
fixtures.Timeout() fixture is the cause of this interruption.   As an 
additional wart, something else is getting involved and turning it into 
an IndexError, I'm not sure what that part is yet though I can imagine 
that might be SQLAlchemy mis-interpreting what it expects to be a 
PyMySQL exception class, since we normally look inside of 
exception.args[0] to get the MySQL error code.   With a blank exception 
like fixtures.TimeoutException, .args is the empty tuple.


The PyMySQL connection is now in an invalid state and unable to perform 
a SELECT statement correctly, but the connection is not invalidated and 
is instead returned to the connection pool in a broken state.  So the 
subsequent teardown, if it uses this same connection (which is likely), 
fails because the connection has been interrupted in the middle of its 
work and not given the chance to clean up.


Seems like the use of fixtures.Timeout() fixture here is not organized 
to work with a database operation in progress, especially an 
eventlet-monkeypatched PyMySQL.   Ideally, if something like a timeout 
due to a signal handler occurs, the entire connection pool should be 
disposed (quickest way, engine.dispose()), or at the very least (and 
much more targeted), the connection that's involved should be 
invalidated from the pool, e.g. connection.invalidate().
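
Schematically, the targeted version of that cleanup; fixtures.TimeoutException 
is the exception discussed here, and the surrounding test plumbing is omitted:

import fixtures

def execute_with_timeout_protection(connection, stmt):
    try:
        return connection.execute(stmt)
    except fixtures.TimeoutException:
        # the driver may have been interrupted mid-protocol; make sure this
        # connection is not returned to the pool in a corrupted state
        connection.invalidate()
        raise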


The change to the environment here would be that this timeout is 
happening at all - the reason for that is not yet known.   If oslo.db's 
version were involved in this error, I would guess that it would be 
related to this timeout condition being caused, and not anything to do 
with the connection provisioning.











Oslo.db 4.13.3 did hit the scene about the time this showed up. So I
think we need to strongly consider blocking it and revisiting these
issues post newton.

-Sean





Re: [openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

2016-09-14 Thread Mike Bayer



On 09/14/2016 09:15 AM, Sean Dague wrote:

I noticed the following issues happening quite often now in the
opportunistic db tests for nova -
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22sqlalchemy.exc.ResourceClosedError%5C%22


It looks like some race has been introduced where the various db
connections are not fully isolated from each other like they used to be.
The testing magic for this is buried pretty deep in oslo.db.


that error message occurs when a connection that is intended against a 
SELECT statement fails to provide a cursor.description attribute.  It is 
typically a driver-level bug in the MySQL world and corresponds to 
mis-handled failure modes from the MySQL connection.


By "various DB connections are not fully isolated from each other" are 
you suggesting that a single in-Python connection object itself is being 
shared among multiple greenlets?   I'm not aware of a change in oslo.db 
that would be a relationship to such an effect.






Oslo.db 4.13.3 did hit the scene about the time this showed up. So I
think we need to strongly consider blocking it and revisiting these
issues post newton.

-Sean





Re: [openstack-dev] [oslo] [telemetry] [requirements] [FFE] Oslo.db 4.13.2

2016-09-07 Thread Mike Bayer



On 09/07/2016 01:29 PM, Doug Hellmann wrote:

Excerpts from Matthew Thode's message of 2016-09-07 09:11:58 -0500:

On 09/07/2016 08:58 AM, Doug Hellmann wrote:

Excerpts from Matthew Thode's message of 2016-09-07 08:21:50 -0500:

https://review.openstack.org/366298

This is just a bump to upper-constraints so is more minor to get testing
working and fix the bug that occurred in Gnocchi (and possibly others).

We are able to mask the 'bad' versions of oslo.db and unmask pymysql
0.7.7 after the freeze (and backport them to stable/newton) so this is
easier to merge.



If we have a known-bad version of the library, maybe it would be better
to incorporate that info into the global-requirements list before we
branch the requirements repository? I'm not sure what we've done in
this case for past cycles.

Doug



Are you fine with the knock-on effects that a gr update would cause?


What do we have that uses oslo.db and is itself a library that would
need to be re-released?


I would assume Gnocchi which acts as a backend for Ceilometer.





Doug






Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-09-02 Thread Mike Bayer



On 09/02/2016 01:53 PM, Doug Hellmann wrote:

Excerpts from Thierry Carrez's message of 2016-09-02 12:15:33 +0200:

Sean Dague wrote:

Putting DB trigger failure analysis into the toolkit required to manage
an upgrade failure is a really high bar for new ops.


I agree with Sean: increasing the variety of technologies used increases
the system complexity, which in turn requires more skills to fully
understand and maintain operationally. It should only be done as a last
resort, with pros and cons carefully weighted. We really should involve
operators in this discussion to get the full picture of arguments for
and against.



Yes, I would like to understand better what aspect of the approach
taken elsewhere is leading to the keystone team exploring other
options. So far I'm not seeing much upside to being different, and I'm
hearing a lot of cons.


I continue to maintain that the problems themselves being discussed at 
https://review.openstack.org/#/c/331740/ are different than what has 
been discussed in detail before.   To be "not different", this spec 
would need to no longer discuss the concept of "we need N to be reading 
from and writing to the old column to be compatible with N-1 as shown in 
the below diagram...Once all the N-1 services are upgraded, N services 
should be moved out of compatibility mode to use the new column. ". 
To my knowledge, there are no examples of code in Openstack that 
straddles table and column changes directly in the SQL access layer as 
this document describes.There's still a handful of folks including 
myself that think this is a new kind of awkwardness we've not had to 
deal with yet.   My only ideas on how to reduce it is to put the N-1/N 
differences on the write side, not the read side, and triggers are *not* 
the only way to do it.   But if "being different" means, "doing it on 
the write side", then it seems like that overall concept is being 
vetoed.  Which I actually appreciate knowing up front before I spend a 
lot of time on it.


















Doug






Re: [openstack-dev] [oslo] [telemetry] Oslo.db 4.13.1 broke Gnocchi

2016-09-02 Thread Mike Bayer

is the failure here something that comes up in gnocchi's test suite?

Could there be some way that oslo libraries run the test suites of all 
consuming projects before a patch and/or a release?  (apologies if we 
already do this).





On 09/02/2016 12:17 PM, Matthew Thode wrote:

On 09/02/2016 03:43 AM, Julien Danjou wrote:

On Fri, Sep 02 2016, Julien Danjou wrote:


I'll look into fixing that, though any help would be welcome.


My attempt at:
  https://review.openstack.org/364767






Sucks this was broken but let us (requirements) know when it's released
so we can make a feature freeze exception (and email the list with
[requirements][FFE] in the title).








Re: [openstack-dev] [oslo] [telemetry] Oslo.db 4.13.1 broke Gnocchi

2016-09-02 Thread Mike Bayer



On 09/02/2016 04:43 AM, Julien Danjou wrote:

On Fri, Sep 02 2016, Julien Danjou wrote:


I'll look into fixing that, though any help would be welcome.


My attempt at:
  https://review.openstack.org/364767


I've augmented it with mapper-level SQLAlchemy API use.











Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-09-01 Thread Mike Bayer



On 09/01/2016 11:52 AM, Dan Smith wrote:


The indirection service is really unrelated to this discussion, IMHO. If
you take RPC out of the picture, all you have left is a
direct-to-the-database facade to handle the fact that schema has
expanded underneath you. As Clint (et al) have said -- designing the
application to expect schema expansion (and avoiding unnecessary
contraction) is the key here.


Pretty much.  There's no fixed pattern for how to do these.  Every 
version of a data access API will be weighed down with baggage from the 
previous version and an inability to take full advantage of new 
improvements until the next release, and background migrations are 
complicated by the old application undoing their work.  Even small 
migrations mean all these issues have to be considered each time on a 
case-by-case basis.   These are the problems people are hoping to 
improve upon if possible.   The spec at 
https://review.openstack.org/#/c/331740/ is discussing these issues in 
detail and is the first such specification I've seen that tries to get 
into it at this level.





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-09-01 Thread Mike Bayer



On 09/01/2016 08:29 AM, Henry Nash wrote:


From a purely keystone perspective, my gut feeling is that actually the
trigger approach is likely to lead to a more robust, not less, solution - due
to the fact that we solve the very specific problems of a given migration
(i.e. need to keep column A in sync with column B) for a short period of time,
right at the point of pain, with well established techniques - albeit they be
complex ones that need experienced coders in those techniques.


This is really the same philosophy I'm going for: make a schema 
migration, accompany it with a data migration, and then you're done. 
The rest of the world need not be concerned.


It's not as much about "triggers" as it is, "handle the data difference 
on the write side, not the read side".  That is, writing data to a SQL 
database is squeezed through exactly three very boring forms of 
statement, the INSERT, UPDATE, and DELETE.   These are easy to intercept 
in the database, and since we use an abstraction like SQLAlchemy they 
are easy to intercept in the application layer too (foreshadowing). 
  When you put it on the read side, reading is of course (mostly) 
through just one statement, the SELECT, but it is a crazy beast in 
practice and it is all over the place in an unlimited number of forms.
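
(To illustrate that foreshadowing: a minimal sketch of what write-side 
interception can look like at the SQLAlchemy mapper level, keeping a 
deprecated column in sync with its replacement during the N-1/N window. 
The model and column names are invented for the example; this is not 
code from any particular project.)

import sqlalchemy as sa
from sqlalchemy import event
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Resource(Base):
    __tablename__ = 'resource'

    id = sa.Column(sa.Integer, primary_key=True)
    # old column, still read and written by the N-1 release
    display_name = sa.Column(sa.String(255))
    # new column, used by release N
    name = sa.Column(sa.String(255))


@event.listens_for(Resource, 'before_insert')
@event.listens_for(Resource, 'before_update')
def _mirror_name(mapper, connection, target):
    # every INSERT / UPDATE flushed through the ORM passes through here,
    # so whichever column the caller populated is copied to the other one
    if target.name is not None:
        target.display_name = target.name
    elif target.display_name is not None:
        target.name = target.display_name

The same kind of interception is available at the Core level via 
connection events, which is where a more general facility could live.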


If you can get your migrations to be, hey, we can just read JSON records 
from version 1.0 of the service and pump them into version 2.0, then 
you're doing read-side, but you've solved the problem at the service 
layer.  This only works for those situations where it "works", and the 
dual-layer service architecture has to be feasibly present as well.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-09-01 Thread Mike Bayer



On 08/31/2016 06:18 PM, Monty Taylor wrote:


I said this the other day in the IRC channel, and I'm going to say it
again here. I'm going to do it as bluntly as I can - please keeping in
mind that I respect all of the humans involved.

I think this is a monstrously terrible idea.

There are MANY reasons for this -but I'm going to limit myself to two.

OpenStack is One Project


Nova and Neutron have an approach for this. It may or may not be ideal -
but it exists right now. While it can be satisfying to discount the
existing approach and write a new one, I do not believe that is in the
best interests of OpenStack as a whole. To diverge in _keystone_ - which
is one of the few projects that must exist in every OpenStack install -
when there exists an approach in the two other most commonly deployed
projects - is such a terrible example of the problems inherent in
Conway's Law that it makes me want to push up a proposal to dissolve all
of the individual project teams and merge all of the repos into a single
repo.


So that is fine.  However, correct me if I'm wrong, but you're then 
proposing that these projects migrate to also use a new service layer with 
oslo.versionedobjects, because IIUC Nova/Neutron's approach is dependent 
on that area of indirection being present.  Otherwise, if you meant 
something like, "use an approach that's kind of like what Nova does w/ 
versionedobjects but without actually having to use versionedobjects", 
that still sounds like "come up with a new idea".


I suppose if you're thinking more at the macro level, where "current 
approach" means "do whatever you have to on the app side", then your 
position is consistent, but I think there's still a lot of confusion in 
that area when the indirection of a versioned service layer is not 
present.   It gets into the SQL nastiness I was discussing w/ Clint and 
I don't see anyone doing anything like that yet.


Triggers aside, since it clearly is "triggering" (ahem) allergic 
reactions, what's the approach when new approaches are devised that are 
alternatives to what "exists right now"?   E.g. I have yet another 
proposal in the works that allows for SQL-level translations but runs in 
the Python application space and does not use triggers.  Should I stop 
right now because Nova/Neutron already have a system that's "good 
enough"?  This would be fine.  I find it uncomfortable working in this 
ambiguous space where some projects rightly proclaim they've solved a 
problem, and others continue to disregard that and plow forward with 
other approaches without a universally accepted reason why the current 
solution is not feasible.





BUT - I also don't think it's a good technical solution. That isn't
because triggers don't work in MySQL (they do) - but because we've spent
the last six years explicitly NOT writing raw SQL. We've chosen an
abstraction layer (SQLAlchemy) which does its job well.


There's a canard in there which is that all along I've been proposing to 
start adding systems to oslo.db to help produce and maintain triggers 
which would certainly have among their goals that consuming projects 
wouldn't be writing raw SQL.  That part of the discomfort is more 
manageable than Clint's, which is that he doesn't want the database 
doing things with the data other than storing it, and I totally know 
where he's coming from on that.


The "be more similar" argument would be the only one you have to make. 
It basically says, "problem X is 'solved', other approaches are now 
unnecessary".   I'm skeptical that I am reading that correctly.  I have 
another approach to the issue of "rolling upgrades where we really need 
to translate at the SQL layer" that is in some ways similar to what 
triggers do, but entirely within the abstraction layer that you so 
appropriately appreciate :).   I have a binary decision to make here, 
"do I work on this new idea that Glance has already expressed an 
interest in and Keystone might like also? Or do I not, because this 
problem is solved?".   I have other projects to work on, so it's not 
like I'm looking for more.   It's just I'd like to see Glance and others 
have their rolling upgrades problem solved, at least with the benefit of 
a fixed and predictable pattern, rather than every schema change being 
an ongoing seat-of-the-pants type of operation as it is right now.


Finally, it's a known and accepted pattern in large

scale MySQL shops ... Roll out a new version of the app code which
understands both the old and the new schema version, then roll out a
no-downtime additive schema change to the database, then have the app
layer process and handle on the fly transformation if needed.



Right, as I've mentioned previously, I only take issue with the 
"monolithic app code that speaks both versions of the schema" part. 
Assuming there's no layer of service indirection where migration issues 
can be finessed outside of the SQL interaction layer, it means every 

Re: [openstack-dev] [oslo] pymysql change in error formatting has broken exception handing in oslo.db

2016-08-31 Thread Mike Bayer



On 08/31/2016 10:48 AM, Ihar Hrachyshka wrote:


Unless we fix the bug in the next pymysql, it’s not either/or; both will
be needed, and also a minimal oslo.db version bump.


upstream issue:

https://github.com/PyMySQL/PyMySQL/issues/507

PyMySQL tends to be very responsive to issues (plus I think I'm a 
committer anyway, so I could even commit a fix myself, I suppose).



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo] pymysql change in error formatting has broken exception handing in oslo.db

2016-08-31 Thread Mike Bayer

We need to decide how to handle this:

https://review.openstack.org/#/c/362991/


Basically, PyMySQL normally raises an error message like this:

(pymysql.err.IntegrityError) (1452, u'Cannot add or update a child row: 
a foreign key constraint fails (`vaceciqnzs`.`resource_entity`, 
CONSTRAINT `foo_fkey` FOREIGN KEY (`foo_id`) REFERENCES `resource_foo` 
(`id`))')


for some reason, PyMySQL 0.7.7 is now raising it like this:

(pymysql.err.IntegrityError) (1452, u'23000Cannot add or update a child 
row: a foreign key constraint fails (`vaceciqnzs`.`resource_entity`, 
CONSTRAINT `foo_fkey` FOREIGN KEY (`foo_id`) REFERENCES `resource_foo` 
(`id`))')


this impacts oslo.db's "exception re-handling" functionality which tries 
to classify this exception as a DBNonExistentConstraint exception.   It 
also breaks oslo.db's test suite locally, but in a downstream project 
would only impact its ability to intercept this exception appropriately.


Now, that "23000" there looks like a bug.  The above gerrit proposes to 
work around it.  However, if we didn't push out the above gerrit, we'd 
instead have to change requirements:


https://review.openstack.org/#/q/I33d5ef8f35747d3b6d3bc0bd4972ce3b7fd60371,n,z

It seems like at least one or the other would be needed for Newton.
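
(For illustration only: the kind of loosening the workaround implies is a 
message-matching pattern that tolerates an optional leading SQLSTATE code. 
This is not the actual oslo.db filter, just a sketch of the idea.)

import re

# tolerate the five-digit SQLSTATE ("23000") that PyMySQL 0.7.7 glues onto
# the front of the message text
FK_VIOLATION = re.compile(
    r"^(?:\d{5})?Cannot add or update a child row: "
    r"a foreign key constraint fails"
)

for msg in (
    "Cannot add or update a child row: a foreign key constraint fails (...)",
    "23000Cannot add or update a child row: "
    "a foreign key constraint fails (...)",
):
    assert FK_VIOLATION.match(msg)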





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-08-30 Thread Mike Bayer



On 08/30/2016 08:04 PM, Clint Byrum wrote:


My direct experience with this was MySQL 5.0 and 5.1. They worked as
documented, and no I don't think they've changed much since then.

When they were actually installed into the schema and up to date with
the code that expected them, and the debugging individual was aware of them, 
things were fine.

However, every other imperative part of the code was asserted with git,
package managers, ansible, puppet, pick your choice of thing that puts
file on disk and restarts daemons. These things all have obvious entry
points too. X is where wsgi starts running code. Y is where flask hands
off to the app, etc. But triggers are special and go in the database at
whatever time they go in. This means you lose all the benefit of all of
the tools you're used to using to debug and operate on imperative code.


To use your phrasing, I'd characterize this as "an unnecessarily bleak 
view" of the use of triggers as a whole.  I've no doubt you worked with 
some horrible trigger code (just as I've worked with some horrible 
application code, and with some horrible stored procedure / trigger 
stuff too).


The triggers that have been in play in the current Keystone proposals as 
well as the one we were working with in Neutron were simple one liners 
that essentially act as custom constraints - they check a condition then 
raise an error if it fails.  In particular, MySQL doesn't have support 
for CHECK constraints, so if you want to assert that values going into a 
row have some quality more exotic than "not null", you might have to use 
a trigger to get this effect.
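
(For concreteness, a custom-constraint trigger of that sort might look like 
the following, expressed as SQLAlchemy DDL attached to table creation rather 
than as hand-run SQL.  The table, column, and bounds are illustrative only.)

import sqlalchemy as sa
from sqlalchemy import DDL, event

metadata = sa.MetaData()

subnet = sa.Table(
    'subnet', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('prefix_len', sa.Integer, nullable=False),
)

# emulate a CHECK constraint on MySQL: reject out-of-range values at
# INSERT time by signalling an error from a trigger
prefix_len_check = DDL("""
CREATE TRIGGER subnet_prefix_len_check BEFORE INSERT ON subnet
FOR EACH ROW
BEGIN
    IF NEW.prefix_len < 0 OR NEW.prefix_len > 32 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'prefix_len out of range';
    END IF;
END
""")

event.listen(subnet, 'after_create',
             prefix_len_check.execute_if(dialect='mysql'))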


Clearly, a trigger that is so complex that it is invoking a whole series 
of imperative steps is not a trigger any of us should be considering. 
IMO these are not those triggers.





Of course, you can have books that get their edition 0 updated in book
while you're upgrading. But the editions feature code always treats
that old update as an update to edition 0.  It's still the same object
it always was, your app just makes some assumptions about it. You can
use a union in some cases where you need to see them all for instance,
and just select a literal '0' for the edition column of your union.


I find unions to be very awkward and really subject to poor performance. 
 Of course this can be made to work but I'm sticking to my preference 
for getting the data in the right shape on the write side, not the read 
side.




And one can say "old app is gone" when one knows it's gone. At that point,
one can run a migration that inserts 0 editions into book_edition, and
drops the book table. For OpenStack, we can say "all releases that used
that old schema are EOL, so we can simplify the code now". Our 6 month
pace and short EOL windows are built for this kind of thing.


Assuming we aren't able to use Nova's approach and we're stuck 
translating in the data access layer, we can simplify the code and put 
out a new release, although that "simplified" code now has to be 
"unsimplified" by all the *new* schema changes - code will always be 
carrying along junk to try and adapt it to the previous version of the 
software.   There's no problem if projects in this situation want to do 
it this way and I will gladly support everyone's efforts in going this 
route.However, I still think it's worth looking into approaches that 
can push the interaction between old and new app version into the write 
side instead of the read side, and if that interaction can be removed 
from the primary database access code into a separate layer.


To the degree that teams can just emulate Nova's finessing of the issue 
at the service level, that's even better.   This thread is just in 
response to particular teams who *want* to use triggers for a specific 
problem.Hopefully I will have time to flesh out my alternative 
technique for "application-level translation triggers" and maybe those 
folks might want to try that kind of thing too someday.






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-08-30 Thread Mike Bayer



On 08/30/2016 04:43 PM, Clint Byrum wrote:




Correct, it is harder for development. Since the database server has all
of the potential for the worst problems, being a stateful service, then
I believe moving complexity _out_ of it, is generally an operational
win, at the expense of some development effort. The development effort,
however, is mostly on the front of the pipeline where timelines can be
longer. Operations typically is operating under SLA's and with
requirements to move slowly in defense of peoples' data and performance
of the system. So I suggest that paying costs in dev, vs. at the
database is usually the highest value choice.

This is of course not the case if timelines are short for development as
well, but I can't really answer the question in that case. For OpenStack,
we nearly always find ourselves with more time to develop, than operators
do to operate.


So the idea of triggers is: hey, for easy things like "column X is now 
column Y elsewhere", instead of complicating the code, use a trigger to 
maintain that value.   Your argument against triggers is: "Triggers 
introduce emergent behaviors and complicate scaling and reasonable 
debugging in somewhat hidden ways that can frustrate even the most 
experienced DBA."

I'd wager that triggers probably work a little more smoothly in modern 
MySQL/Postgresql than a more classical "DBA" platform like a crusty old 
MS SQL Server or Oracle, but more examples on these emergent behaviors 
would be useful, as well as evidence that they apply to current versions 
of database software that are in use within Openstack, and are 
disruptive enough that even the most clear-cut case for triggers vs. 
in-application complexity should favor in-app complexity without question.







I don't think it's all that ambitious to think we can just use tried and
tested schema evolution techniques that work for everyone else.


People have been asking me for over a year how to do this, and I have no
easy answer, I'm glad that you do.  I would like to see some examples of
these techniques.

If you can show me the SQL access code that deals with the above change,
that would help a lot.



So schema changes fall into several categories. But basically, the only
one that is hard, is a relationship change. Basically, a new PK. Here's
an example:

Book.isbn was the PK, but we want to have a record per edition, so the
new primary key is (isbn, edition).

Solution: Maintain two tables. You have created an entirely new object!

CREATE TABLE book (
  isbn varchar(30) not null primary key,
  description text
)

CREATE TABLE book_editions (
  isbn varchar(30) not null,
  edition int not null,
  description text,
  primary key (isbn, edition)
)

And now on read, your new code has to do this:

SELECT b.isbn,
   COALESCE(be.edition, 0) AS edition,
   COALESCE(be.description, b.description) AS description
FROM book b
 LEFT OUTER JOIN book_editions be
 ON b.isbn = be.isbn
WHERE b.isbn = 'fooisbn'

And now, if a book has only ever been written by old code, you get one
record with a 0 edition. And if it were written by the new system, the
new system would need to go ahead and duplicate the book description into
the old table for as long as we have code that might expect it.


So some pain points here are:

1. you really can't ever trust what's in book_editions.description as 
long as any "old" application is running, since it can put new data into 
book.description at any time.  You shouldn't bother reading from it at 
all, just write to it. You won't be able to use it until the next 
version of the application, e.g. "new" + 1.  Or you support some kind 
of "old app is gone!" flag that modifies the behavior of the "new" app 
so that it changes all its queries, which is even more awkward.


2. deletes by "old" app of entries in "book" have to be synchronized 
offline by a background script of some kind.  You at least need to run a 
final, authoritative "clean up all the old book deletions" job before 
you go into "old app is gone" mode and the new app begins reading from 
book_editions alone.


3. LEFT OUTER JOINs can be a major performance hit.   You can't turn it 
off here until you go to version "new + 1" (bad performance locked in 
for a whole release cycle) or your app has a "turn off old app mode" 
flag (basically you have to write two different database access layers).


Contrast to the trigger approach, which removes all the SELECT pain and 
moves it all to writes:


1. new application has no code whatsoever referring to old application

2. no performance hit on SELECT

3. no "wait til version "new+1"" and/or "old app is gone" switch

If we have evidence that triggers are always, definitely, universally 
going to make even this extremely simple use case non-feasible, great, 
let's measure and test for that.   But in a case like this they look 
very attractive and I'd hate to just dispense with them unilaterally 
without a case-by-case examination.
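
(To make the contrast concrete, here is a write-side sketch for the book / 
book_editions example above: MySQL-flavored triggers issued from an expand 
migration, so that writes made by the old app against book show up in 
book_editions as edition 0.  Trigger names and the exact mirroring policy 
are illustrative, not a worked-out proposal.)

from alembic import op


def upgrade():
    # mirror old-app INSERTs into the new table as edition 0
    op.execute("""
        CREATE TRIGGER book_insert_mirror AFTER INSERT ON book
        FOR EACH ROW
            INSERT INTO book_editions (isbn, edition, description)
            VALUES (NEW.isbn, 0, NEW.description)
            ON DUPLICATE KEY UPDATE description = NEW.description
    """)
    # mirror old-app UPDATEs into the edition-0 row
    op.execute("""
        CREATE TRIGGER book_update_mirror AFTER UPDATE ON book
        FOR EACH ROW
            UPDATE book_editions
            SET description = NEW.description
            WHERE isbn = NEW.isbn AND edition = 0
    """)


def downgrade():
    op.execute("DROP TRIGGER book_update_mirror")
    op.execute("DROP TRIGGER book_insert_mirror")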


As I wrote this, 

Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-08-30 Thread Mike Bayer



On 08/30/2016 09:57 AM, Clint Byrum wrote:




As someone else brought up, this is an unnecessarily bleak view of how database
migrations work.


We aren't talking about database migrations.  We are talking about 
*online* database migrations, where we would like both the *old* and 
*new* versions of the code talking to the database at the same time.



If I write code that does this:


SELECT foo, bar FROM table

then I do a migration that replaces "bar" with some new table, the new 
SQL is:


SELECT table.foo, othertable.bar FROM table JOIN othertable ON 
table.id = othertable.foo_id


Those two SQL statements are incompatible.  The "new" version of the 
code must expect and maintain the old "bar" column for the benefit of 
the "old" version of the code still reading and writing to it.   To me, 
this seems to contradict your suggestion "don't delete columns, ignore 
them".  We can't ignore "bar" above.





Following these commandments, one can run schema changes at any time. A
new schema should be completely ignorable by older code, because their
columns keep working, and no new requirements are introduced. New code
can deal with defaulted new columns gracefully.


You need to specify how new code deals with the above two totally 
different SQL statements "gracefully", given that it has to accommodate 
both versions of the schema at the same time.   This may be 
"graceful" in operator land but in developer land, there is no easy 
solution for this.  Unless there is, and nobody has shown it to me yet:




I don't think it's all that ambitious to think we can just use tried and
tested schema evolution techniques that work for everyone else.


People have been asking me for over a year how to do this, and I have no 
easy answer, I'm glad that you do.  I would like to see some examples of 
these techniques.


If you can show me the SQL access code that deals with the above change, 
that would help a lot.


If the answer is, "oh well just don't do a schema change like that", 
then we're basically saying we aren't really changing our schemas 
anymore except for totally new features that otherwise aren't accessed 
by the older version of the code.  That's fine.   It's not what people 
coming to me are saying, though.



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-08-26 Thread Mike Bayer



On 08/25/2016 01:13 PM, Steve Martinelli wrote:

The keystone team is pursuing a trigger-based approach to support
rolling, zero-downtime upgrades. The proposed operator experience is
documented here:

  http://docs.openstack.org/developer/keystone/upgrading.html

This differs from Nova and Neutron's approaches to solve for rolling
upgrades (which use oslo.versionedobjects), however Keystone is one of
the few services that doesn't need to manage communication between
multiple releases of multiple service components talking over the
message bus (which is the original use case for oslo.versionedobjects,
and for which it is aptly suited). Keystone simply scales horizontally
and every node talks directly to the database.



Hi Steve -

I'm a strong proponent of looking into the use of triggers to smooth 
upgrades between database versions.Even in the case of projects 
using versioned objects, it still means a SQL layer has to include 
functionality for both versions of a particular schema change which 
itself is awkward.   I'm also still a little worried that not every case 
of this can be handled by orchestration at the API level, rather than as a 
single SQL layer method that integrates both versions of a schema change.


Using triggers would resolve the issue of SQL-specific application code 
needing to refer to two versions of a schema at once, at least for those 
areas where triggers and SPs can handle it.   In the "ideal", it means 
all the Python code can just refer to one version of a schema, and nuts 
and bolts embedded into database migrations would handle all the 
movement between schema versions, including the phase between expand and 
contract.   Not that I think the "ideal" is ever going to be realized 
100%, but maybe in some / many places, this can work.


So if Keystone wants to be involved in paving the way for working with 
triggers, IMO this would benefit other projects in that they could 
leverage this kind of functionality in those places where it makes sense.


The problem of "zero downtime database migrations" is an incredibly 
ambitious goal and I think it would be wrong to exclude any one 
particular technique in pursuing this.  A real-world success story would 
likely integrate many different techniques as they apply to specific 
scenarios, and triggers and SPs IMO are a really major one which I 
believe can be supported.





Database triggers are obviously a new challenge for developers to write,
honestly challenging to debug (being side effects), and are made even
more difficult by having to hand write triggers for MySQL, PostgreSQL,
and SQLite independently (SQLAlchemy offers no assistance in this case),
as seen in this patch:


So I would also note that we've been working on the availability of 
triggers and stored functions elsewhere, a very raw patch that is to be 
largely rolled into oslo.db is here:


https://review.openstack.org/#/c/314054/

This patch makes use of an Alembic pattern called "replaceable object", 
which is intended specifically as a means of versioning things like 
triggers and stored procedures:


http://alembic.zzzcomputing.com/en/latest/cookbook.html#replaceable-objects
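
(A stripped-down sketch of that idea, leaving out the custom operation 
plugins the full recipe defines: the DDL lives in one versioned object and 
the migration creates or drops it by reference.  Names are illustrative.)

from alembic import op


class ReplaceableObject(object):
    """A named chunk of DDL (trigger, stored function, view) paired with
    its DROP statement, so migrations can version it like anything else."""

    def __init__(self, name, create_sql, drop_sql):
        self.name = name
        self.create_sql = create_sql
        self.drop_sql = drop_sql


subnet_check_trigger_v1 = ReplaceableObject(
    "subnet_overlap_check",
    # full trigger body elided here for brevity
    create_sql="CREATE TRIGGER subnet_overlap_check ...",
    drop_sql="DROP TRIGGER subnet_overlap_check",
)


def upgrade():
    op.execute(subnet_check_trigger_v1.create_sql)


def downgrade():
    op.execute(subnet_check_trigger_v1.drop_sql)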

Within the above Neutron patch, one thing I want to move towards is that 
things like triggers and SPs would only need to be specified once, in 
the migration layer, and not within the model.   To achieve this, tests 
that work against MySQL and Postgresql would need to ensure that the 
test schema is built up using migrations, and not create_all.  This is 
already the case in some places and not in others.  There is work 
ongoing in oslo.db to provide a modernized fixture system that supports 
enginefacade cleanly as well as allows for migrations to be used 
efficiently (read: once per many tests) for all MySQL/Postgresql test 
suites, at https://review.openstack.org/#/c/351411/ .


As far as SQLite, I have a simple opinion with SQLite which is that 
migrations, triggers, and SPs should not be anywhere near a SQLite 
database.   SQLite should be used strictly for simple model unit tests, 
the schema is created using create_all(), and that's it.   The test 
fixture system accommodates this as well.
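
(In that model, a SQLite-backed model unit test needs nothing beyond 
create_all(); a minimal sketch with a throwaway model: )

import unittest

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Widget(Base):
    __tablename__ = 'widget'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(50))


class WidgetModelTest(unittest.TestCase):
    def setUp(self):
        # plain in-memory SQLite; the schema comes from create_all(), with
        # no migrations, triggers, or stored functions anywhere in sight
        self.engine = sa.create_engine('sqlite://')
        Base.metadata.create_all(self.engine)
        self.session = sessionmaker(bind=self.engine)()

    def test_roundtrip(self):
        self.session.add(Widget(name='w1'))
        self.session.commit()
        self.assertEqual(1, self.session.query(Widget).count())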




Our primary concern at this point are how to effectively test the
triggers we write against our supported database systems, and their
various deployment variations. We might be able to easily drop SQLite
support (as it's only supported for our own test suite), but should we
expect variation in support and/or actual behavior of triggers across
the MySQLs, MariaDBs, Perconas, etc, of the world that would make it
necessary to test each of them independently? If you have operational
experience working with triggers at scale: are there landmines that we
need to be aware of? What is it going to take for us to say we support
*zero* downtime upgrades with confidence?


*zero* downtime is an extremely difficult goal.   I appreciate that 
people are generally nervous about making more use of 

Re: [openstack-dev] Let's drop the postgresql gate job

2016-08-19 Thread Mike Bayer



On 08/18/2016 11:00 AM, Matt Riedemann wrote:

It's that time of year again to talk about killing this job, at least
from the integrated gate (move it to experimental for people that care
about postgresql, or make it gating on a smaller subset of projects like
oslo.db).



Running a full tempest load for Postgresql for everything is not very 
critical.   I'm sure the software gets full-integration tested against 
PG at some point past the gate at least so regressions are reportable, 
so if that's all that's being dropped, I don't see any issue.


Note that there is PG-specific code being proposed in Neutron [1].  The patch 
here is planned to be rolled largely into oslo.db, so most of it would 
be tested under oslo.db in any case, however Neutron would still have a 
specific issue that is addressed by this library code, so local unit 
testing of this issue would still be needed against both MySQL and 
Postgresql.


There is also the subject area of routines that are somehow dependent on 
the transaction isolation behavior of the backend database, such as code 
that's attempting to see if something has changed in another 
transaction.   This is usually in PG's favor because Postgresql defaults 
to a lower isolation level than MySQL, but there are probably some weird 
edges to this particular subject area.   Again, these things should be 
tested in a local unit-test kind of context.


For the specific goal of oslo.db running cross-project checks, I'd like 
that a lot, not necessarily for the Postgresql use case, but just to 
ensure that API changes in oslo.db don't break on any downstream 
projects.   I would think that for all of oslo, seeing that oslo is 
"horizontal" to openstack "verticals", that all oslo projects would 
somehow have cross-project testing of new patches against consuming 
projects.    I run a very small and focused version of this kind of testing 
on my own against downstream openstack for all proposed changes to 
SQLAlchemy, Alembic, and dogpile.cache.



[1] https://review.openstack.org/#/c/314054/



The postgresql job used to have three interesting things about it:

1. It ran keystone with eventlet (which is no longer a thing).
2. It runs the n-api-meta service rather than using config drive.
3. It uses postgresql for the database.

So #1 is gone, and for #3, according to the April 2016 user survey (page
40) [1], 4% of reporting deployments are using it in production.

I don't think we're running n-api-meta in any other integrated gate
jobs, but I'm pretty sure there is at least one neutron job out there
that's running with it that way. We could also consider making the
nova-net dsvm full gate job run n-api-meta, or vice-versa with the
neutron dsvm full gate job.

We also have to consider that with HP public cloud being gone as a node
provider and we've got fewer test nodes to run with, we have to make
tough decisions about which jobs we're going to run in the integrated gate.

I'm bringing this up again because Nova has a few more jobs it would
like to make voting on its repo (neutron LB and live migration, at
least in the check queue) but there are concerns about adding yet more
jobs that each change has to get through before it's merged, which means
if anything goes wrong in any of those we can have a 24 hour turnaround
on getting an approved change back through the gate.

[1]
https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][oslo.db] Inspecting sqlite db during unit tests

2016-07-25 Thread Mike Bayer



On 07/25/2016 12:55 PM, Carl Baldwin wrote:

On Fri, Jul 22, 2016 at 8:47 AM, Mike Bayer <mba...@redhat.com> wrote:



On 07/22/2016 04:02 AM, Kevin Benton wrote:

Now that we have switched to oslo.db for test provisioning the
responsibility of choosing a location lands
here:

https://github.com/openstack/oslo.db/blob/a79479088029e4fa51def91cb36bc652356462b6/oslo_db/sqlalchemy/provision.py#L505

The problem is that when you specify
OS_TEST_DBAPI_ADMIN_CONNECTION it
does end up creating the file, but then the logic above chooses
a URL
based on the random ident. So you can find an sqlite file in
your tmp
dir, it just won't be the one you asked for.

It seems like a bug in the oslo.db logic, but the commit that
added it
was part of a much larger refactor so I'm not sure if it was
intentional
to ensure that no two tests used the same db.


it is, the testr system runs tests in multiple subprocesses and I
think neutron has it set to four.  if they all shared the same
sqlite database file you'd have failed tests.


A potential improvement might be to replace
OS_TEST_DBAPI_ADMIN_CONNECTION with another environment variable which
could be used to provide a template for generating multiple unique
database names. That would make it a little more intuitive. But, I can
work with this for now.


Perhaps we can allow some kind of tokenized syntax within the 
OS_TEST_DBAPI_ADMIN_CONNECTION variable itself.  That env variable is 
already kind of a beast but at least there's just the one.







Carl





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][oslo.db] Inspecting sqlite db during unit tests

2016-07-22 Thread Mike Bayer



On 07/22/2016 04:02 AM, Kevin Benton wrote:

Now that we have switched to oslo.db for test provisioning the
responsibility of choosing a location lands
here: 
https://github.com/openstack/oslo.db/blob/a79479088029e4fa51def91cb36bc652356462b6/oslo_db/sqlalchemy/provision.py#L505

The problem is that when you specify OS_TEST_DBAPI_ADMIN_CONNECTION it
does end up creating the file, but then the logic above chooses a URL
based on the random ident. So you can find an sqlite file in your tmp
dir, it just won't be the one you asked for.

It seems like a bug in the oslo.db logic, but the commit that added it
was part of a much larger refactor so I'm not sure if it was intentional
to ensure that no two tests used the same db.


It is; the testr system runs tests in multiple subprocesses, and I think 
Neutron has it set to four.  If they all shared the same sqlite database 
file, you'd have failed tests.





On Thu, Jul 21, 2016 at 1:45 PM, Carl Baldwin wrote:

Hi,

In Neutron, we run unit tests with an in-memory sqlite instance. It
is impossible, as far as I know, to inspect this database using the
sqlite3 command line while the unit tests are running. So, we have
to resort to python / sqlalchemy to do it. This is inconvenient.

Months ago, I was able to get the unit tests to write the sqlite db
to a file so that I could inspect it while I was sitting at a
breakpoint in the code. That was very nice. Yesterday, I tried to
repeat that while traveling and was unable to figure it out. I had
to time box my effort to move on to other things.

As far as I remember, the mechanism that I used was to adjust the
neutron.conf for the tests [1]. I'm not totally sure about this
because I didn't take sufficient notes, I think because it was
pretty easy to figure it out at the time. This mechanism doesn't
seem to have any effect these days. I changed it to
'sqlite:tmp/unit-test.db' and never saw a file created there.

I did a little bit of digging and I tried one more thing. That was
to set OS_TEST_DBAPI_ADMIN_CONNECTION='sqlite:tmp/unit-test.db'
in the environment before running tests. I was encouraged because
this caused a file to be created at that location but the file
remained empty for the duration of the run.

Does anyone know off the top of their head how to get unit tests in
Neutron to use a file based sqlite db?

Carl

[1] 
https://github.com/openstack/neutron/blob/97c491294cf9eca0921336719d62d74ec4e1fa96/neutron/tests/etc/neutron.conf#L26





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [Neutron][oslo.db] Inspecting sqlite db during unit tests

2016-07-22 Thread Mike Bayer



On 07/22/2016 04:02 AM, Kevin Benton wrote:

Now that we have switched to oslo.db for test provisioning the
responsibility of choosing a location lands
here: 
https://github.com/openstack/oslo.db/blob/a79479088029e4fa51def91cb36bc652356462b6/oslo_db/sqlalchemy/provision.py#L505

The problem is that when you specify OS_TEST_DBAPI_ADMIN_CONNECTION it
does end up creating the file, but then the logic above chooses a URL
based on the random ident. So you can find an sqlite file in your tmp
dir, it just won't be the one you asked for.

It seems like a bug in the oslo.db logic, but the commit that added it
was part of a much larger refactor so I'm not sure if it was intentional
to ensure that no two tests used the same db.


There is also a very recent commit to Neutron at 
https://review.openstack.org/#/c/332476/ , which I think changes the 
system to actually use the provisioning for the SQLite database as well, 
whereas before it might not have been taking effect.  But in any case, 
the OS_TEST_DBAPI_ADMIN_CONNECTION thing still works in that if you give 
it a file-based URL, provisioning should be putting the database files 
in /tmp.  If your approach is "pdb.set_trace(); then look at the file", 
just do this:


$ OS_TEST_DBAPI_ADMIN_CONNECTION=sqlite:///myfile.db 
.tox/functional/bin/python -m unittest 
neutron.tests.unit.db.test_db_base_plugin_v2.TestBasicGet.test_single_get_admin


> 
/home/classic/dev/redhat/openstack/neutron/neutron/tests/unit/db/test_db_base_plugin_v2.py(790)test_single_get_admin()

-> plugin = neutron.db.db_base_plugin_v2.NeutronDbPluginV2()
(Pdb)
(Pdb) self.engine.url
sqlite:////tmp/hjbckefatl.db

then you can "sqlite3 /tmp/hjbckefatl.db" while the test is pending.






On Thu, Jul 21, 2016 at 1:45 PM, Carl Baldwin wrote:

Hi,

In Neutron, we run unit tests with an in-memory sqlite instance. It
is impossible, as far as I know, to inspect this database using the
sqlite3 command line while the unit tests are running. So, we have
to resort to python / sqlalchemy to do it. This is inconvenient.

Months ago, I was able to get the unit tests to write the sqlite db
to a file so that I could inspect it while I was sitting at a
breakpoint in the code. That was very nice. Yesterday, I tried to
repeat that while traveling and was unable to figure it out. I had
to time box my effort to move on to other things.

As far as I remember, the mechanism that I used was to adjust the
neutron.conf for the tests [1]. I'm not totally sure about this
because I didn't take sufficient notes, I think because it was
pretty easy to figure it out at the time. This mechanism doesn't
seem to have any effect these days. I changed it to
'sqlite:tmp/unit-test.db' and never saw a file created there.

I did a little bit of digging and I tried one more thing. That was
to set OS_TEST_DBAPI_ADMIN_CONNECTION='sqlite:tmp/unit-test.db'
in the environment before running tests. I was encouraged because
this caused a file to be created at that location but the file
remained empty for the duration of the run.

Does anyone know off the top of their head how to get unit tests in
Neutron to use a file based sqlite db?

Carl

[1] 
https://github.com/openstack/neutron/blob/97c491294cf9eca0921336719d62d74ec4e1fa96/neutron/tests/etc/neutron.conf#L26





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [oslo.db] [CC neutron] CIDR overlap functionality and constraints

2016-07-22 Thread Mike Bayer



On 07/21/2016 02:43 PM, Carl Baldwin wrote:


None of these operations are expected to be very contentious and
performance hasn't really been a concern yet. If it were a big concern,
I'd be very interested in the GiST index solution because, as I
understand it, detecting overlap without that capability requires a
linear search through the existing records. But, GiST index capability
isn't ubiquitous which makes it difficult to get excited about for
practical purposes. I do have an academic interest in it. Computational
geometry used to be a hobby of mine when I worked on tools for physical
design of microchips. I've been telling people for years that I thought
it'd be cool if databases had some facility for indexing potentially
overlapping ranges in one or more dimensions. This looks like some
pretty cool stuff.

Can you think of any other operations in Neutron -- or elsewhere in
OpenStack -- which will benefit from these new functions? I'll be
honest. Without some compelling benefit, it may be very difficult to
swallow the pill of dealing with special case code in each kind of DB
for this capability. But, if it is abstracted sufficiently by oslo db,
it might be worth looking at down the road. The changes to adopt such
functionality shouldn't be too difficult.


Well let me reiterate the idea, which is that:

1. we add features to oslo.db so that the use of a custom stored 
function is not a big deal


2. we add features to oslo.db that are based on using triggers, special 
constraints, or Gist indexes, so that the use of a database constraint 
that needs this kind of thing is not a big deal


3. the first proof of concept for this, is a CIDR function / trigger for 
this one reported issue in Neutron.


Now the question is, "can I think of any operation in openstack, besides 
this one, that would benefit from a custom stored function or a 
specialized constraint".   The answer for me is "not specifically but I 
bet if I started looking, I would".  Anytime there's an application 
loading some rows of data out of a table, doing some calculations on it, 
then dealing with a subset of those rows as a result, is a candidate for 
#1 (in fact I have some vague recollection of seeing some patch in 
Neutron that had this issue, it was the reason that compare-and-swap 
could not be used).   Anytime an application is trying to insert rows 
into a table which should be rejected based on some criteria beyond 
"unique key", that's a candidate for #2 - perhaps the plethora of 
UUID-based recipes throughout openstack in some cases could be better 
stated by more data-driven constraints.


If we were to decide that the Neutron issue right here doesn't need any 
changes, then I would be fine abandoning this initiative for now.  But 
as it stands, there seems to be a need to either do this change, *or* 
add a new UUID column to the subnets table, and basically I'm hoping to 
start steering the boat away from the island of 
add-a-new-column-everytime-theres-a-concurrency-problem.






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo.db] [CC neutron] CIDR overlap functionality and constraints

2016-07-19 Thread Mike Bayer

Oslo.db devs :

We've developed a system by which CIDR math, such as that of detecting 
region overlaps, can be performed on a MySQL database within queries [1] 
[2].   This feature makes use of a custom stored function I helped to 
produce which provides functionality similar to that which Postgresql 
provides built in [3].   SQLite also supports a simple way to add CIDR 
math functions as well which I've demonstrated at [4].
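
(The SQLite flavor of that is roughly along these lines; a minimal sketch 
using the stdlib sqlite3 create_function hook, independent of the gist 
linked at [4]: )

import ipaddress
import sqlite3


def cidr_overlaps(a, b):
    # return 1 if the two CIDR strings overlap, else 0
    return int(ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)))


conn = sqlite3.connect(':memory:')
conn.create_function('cidr_overlaps', 2, cidr_overlaps)

conn.execute('CREATE TABLE subnet (cidr TEXT)')
conn.execute("INSERT INTO subnet VALUES ('10.0.0.0/24')")

overlapping = conn.execute(
    "SELECT COUNT(*) FROM subnet WHERE cidr_overlaps(cidr, '10.0.0.128/25')"
).fetchone()[0]
assert overlapping == 1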


Note that I use the term "function" and not "procedure" to stress that 
this is not a "stored procedure" in the traditional sense of performing 
complex business logic and persistence operations - this CIDR function 
performs a calculation that is not at all specific to Openstack, and is 
provided already by other databases as a built-in, and nothing else.


The rationale for network-math logic being performed in the relational 
database is so that SQL like SELECT, UPDATE, and INSERT can make use of 
CIDR overlaps and other network math, such as to locate records that 
correspond to network ranges in some way and of course to provide guards 
and constraints, like that of concurrent UPDATE statements against 
conflicting ranges as well as being able to produce INSERT constraints 
for similar reasons.   Both MySQL and Postgresql have support for 
network number functions, Postgresql just has a lot more.


The INSERT constraint problem is also addressed by our patch and makes 
use of an INSERT trigger on MySQL [5], but on Postgresql we use a GIST 
index which has been shown to be more reliable under concurrent use than 
a trigger on this backend [6].
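
(The Postgresql side of that guard is essentially an exclusion constraint 
over the inet type; a hand-wavy sketch as it might appear in a migration, 
assuming Postgresql 9.4+ for the inet_ops GiST operator class and an 
inet-typed cidr column.  Table and constraint names are illustrative.)

from alembic import op


def upgrade():
    # the database itself rejects a row whose CIDR overlaps an existing
    # one; no trigger and no read-modify-write race in the application
    op.execute(
        "ALTER TABLE subnet ADD CONSTRAINT subnet_no_cidr_overlap "
        "EXCLUDE USING gist (cidr inet_ops WITH &&)"
    )


def downgrade():
    op.execute("ALTER TABLE subnet DROP CONSTRAINT subnet_no_cidr_overlap")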


Not surprisingly, there's a lot of verbosity to both the production of 
the MySQL CIDR overlap function and the corresponding trigger and 
constraint, as well as the fact that to support the addition of these 
functions / constraints at both the Alembic migration level as well as 
that of the model level (because we would like metadata.create_all() to 
work), they are currently stated twice within this patch within their 
full verbosity.This is sub-optimal, and while the patch here makes 
use of an Alembic recipe [7] to aid in the maintenance of special DDL 
constructs, it's adding lots of burden to the Neutron codebase that 
could be better stated elsewhere.


The general verbosity and unfamiliarity of these well known SQL features 
is understandably being met with trepidation.  I've identified that this 
trepidation is likely rooted in the fact that unlike the many other 
elaborate SQL features we use like ALTER TABLE, savepoints, subqueries, 
SELECT FOR UPDATE, isolation levels, etc. etc., there is no warm and 
fuzzy abstraction layer here that is both greatly reducing the amount of 
explicit code needed to produce and upgrade the feature, as well as 
indicating that "someone else" will fix this system when it has problems.


Rather than hobbling the entire Openstack ecosystem to using a small 
subset of what our relational databases are capable of, I'd like to 
propose that preferably somewhere in oslo.db, or elsewhere, we begin 
providing the foundation for the use of SQL features that are rooted in 
mechanisms such as triggers and small use of stored functions, and more 
specifically begin to produce network-math SQL features as the public 
API, starting with this one.
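
(To make "public API" a little more concrete, this is the sort of thing 
callers might end up writing, with the library hiding the dialect 
difference.  None of these names exist today; the stored function is the 
one a migration would install on MySQL.)

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import INET

metadata = sa.MetaData()
subnet = sa.Table(
    'subnet', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('cidr', sa.String(64)),
)

# Postgresql flavor: the built-in && "overlaps" operator on inet
overlaps_pg = sa.cast(subnet.c.cidr, INET).op('&&')(
    sa.cast('10.0.0.0/24', INET))

# MySQL flavor: the custom stored function installed by a migration
overlaps_mysql = sa.func.cidr_overlaps(subnet.c.cidr, '10.0.0.0/24') == 1

# a library-level helper would pick the right expression per dialect;
# either form drops into ordinary queries
stmt = sa.select([subnet.c.id]).where(overlaps_pg)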



[1] 
https://review.openstack.org/gitweb?p=openstack/neutron.git;a=blob;f=neutron/db/migration/alembic_migrations/versions/newton/expand/5bbf1e0b1774_add_stored_procedure_and_trigger_for_.py;h=8af394d319d119f57b224d391c844c0a87178856;hb=90f46e235672d3917015e5c49aa0513fb1de7ba9#l36


[2] https://review.openstack.org/#/c/314054/

[3] https://www.postgresql.org/docs/9.1/static/functions-net.html

[4] https://gist.github.com/zzzeek/a3bccad40610b9b69803531cc71a79b1

[5] 
https://review.openstack.org/gitweb?p=openstack/neutron.git;a=blob;f=neutron/db/migration/alembic_migrations/versions/newton/expand/5bbf1e0b1774_add_stored_procedure_and_trigger_for_.py;h=8af394d319d119f57b224d391c844c0a87178856;hb=90f46e235672d3917015e5c49aa0513fb1de7ba9#l92


[6] 
https://review.openstack.org/gitweb?p=openstack/neutron.git;a=blob;f=neutron/db/migration/alembic_migrations/versions/newton/expand/5bbf1e0b1774_add_stored_procedure_and_trigger_for_.py;h=8af394d319d119f57b224d391c844c0a87178856;hb=90f46e235672d3917015e5c49aa0513fb1de7ba9#l116


[7] 
http://alembic.zzzcomputing.com/en/latest/cookbook.html#replaceable-objects


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo] [keystone] dogpile.cache 0.6.0 released

2016-06-06 Thread Mike Bayer


Hey all -

I've released dogpile.cache 0.6.0.  As discussed earlier in this thread, 
the big change in this is that we've retired the dogpile.core package; 
while that package will stay out on pypi as it is, the actual 
implementation has been rolled into dogpile.cache itself and the 
namespace packaging logic is removed.


In order to prevent any namespace-packaging debacles, the "dogpile.core" 
path itself is no longer used internally by dogpile.cache; however the 
package itself will still provide a dogpile.core import point for 
applications which may have been using dogpile.core directly (this 
should be very rare).
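
(A quick import-level smoke test of the sort a consuming project might run; 
this assumes the legacy names below are among those the compatibility shim 
re-exports.)

# normal dogpile.cache usage, unchanged by the 0.6.0 reorganization
from dogpile.cache import make_region

# legacy import path, satisfied by the dogpile/core.py compatibility shim
from dogpile.core import Lock, NeedRegenerationException  # noqa

region = make_region().configure('dogpile.cache.memory')
assert region.get_or_create('key', lambda: 'value') == 'value'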


Changelog for 0.6.0 is at:

http://dogpilecache.readthedocs.io/en/latest/changelog.html#change-0.6.0






On 06/01/2016 04:54 PM, Mike Bayer wrote:

Just a reminder, dogpile.cache is doing away with namespace packaging in
version 0.6.0, due for the end of this week or sometime next week.
dogpile.core is being retired and left as-is.   No changes should be
needed by anyone using only dogpile.cache.



On 05/30/2016 06:17 PM, Mike Bayer wrote:

Hi all -

Just a heads up what's happening for dogpile.cache, in version 0.6.0 we
are rolling the functionality of the dogpile.core package into
dogpile.cache itself, and retiring the use of namespace package naming
for dogpile.cache.

Towards retiring the use of namespace packaging, the magic
"declare_namespace() / extend_path()" logic is being removed from the
file dogpile/__init__.py from dogpile.cache, and the "namespace_package"
directive being removed from setup.py.

However, currently, the plan is to leave alone entirely the
"dogpile.core" package as is, and to no longer use the name
"dogpile.core" within dogpile.cache at all; the constructs that it
previously imported from "dogpile.core" it now just imports from
"dogpile" and "dogpile.util" from within the dogpile.cache package.

The caveat here is that Python environments that have dogpile.cache
0.5.7 or earlier installed will also have dogpile.core 0.4.1 installed
as well, and dogpile.core *does* still contain the namespace package
verbiage as before.   From our testing, we don't see there being any
problem with this, however, I know there are people on this list who are
vastly more familiar than I am with namespace packaging and I would
invite them to comment on this as well as on the gerrit review [1] (the
gerrit invites anyone with a Github account to register and comment).

Note that outside of the Openstack world, there are a very small number
of applications that make use of dogpile.core directly.  From our
grepping we can find no mentions of "dogpile.core" in any Openstack
requirements files.For these applications, if a Python environment
already has dogpile.core installed, this would continue to be used;
however dogpile.cache also includes a file dogpile/core.py which sets up
a compatible namespace, so that applications which list only
dogpile.cache in their requirements but make use of "dogpile.core"
constructs will continue to work as before.

I would ask that anyone reading this to please alert me to anyone, any
project, or any announcement medium which may be necessary in order to
ensure that anyone who needs to be made aware of these changes are aware
of them and have vetted them ahead of time.   I would like to release
dogpile.cache 0.6.0 by the end of the week if possible.  I will send
this email a few more times to the list to make sure that it is seen.


[1] https://gerrit.sqlalchemy.org/#/c/89/



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] [keystone] rolling dogpile.core into dogpile.cache, removing namespace packaging (PLEASE REVIEW)

2016-06-01 Thread Mike Bayer
Just a reminder, dogpile.cache is doing away with namespace packaging in 
version 0.6.0, due for the end of this week or sometime next week. 
dogpile.core is being retired and left as-is.   No changes should be 
needed by anyone using only dogpile.cache.




On 05/30/2016 06:17 PM, Mike Bayer wrote:

Hi all -

Just a heads up what's happening for dogpile.cache, in version 0.6.0 we
are rolling the functionality of the dogpile.core package into
dogpile.cache itself, and retiring the use of namespace package naming
for dogpile.cache.

Towards retiring the use of namespace packaging, the magic
"declare_namespace() / extend_path()" logic is being removed from the
file dogpile/__init__.py from dogpile.cache, and the "namespace_package"
directive being removed from setup.py.

However, currently, the plan is to leave alone entirely the
"dogpile.core" package as is, and to no longer use the name
"dogpile.core" within dogpile.cache at all; the constructs that it
previously imported from "dogpile.core" it now just imports from
"dogpile" and "dogpile.util" from within the dogpile.cache package.

The caveat here is that Python environments that have dogpile.cache
0.5.7 or earlier installed will also have dogpile.core 0.4.1 installed
as well, and dogpile.core *does* still contain the namespace package
verbiage as before.   From our testing, we don't see there being any
problem with this, however, I know there are people on this list who are
vastly more familiar than I am with namespace packaging and I would
invite them to comment on this as well as on the gerrit review [1] (the
gerrit invites anyone with a Github account to register and comment).

Note that outside of the Openstack world, there are a very small number
of applications that make use of dogpile.core directly.  From our
grepping we can find no mentions of "dogpile.core" in any Openstack
requirements files.For these applications, if a Python environment
already has dogpile.core installed, this would continue to be used;
however dogpile.cache also includes a file dogpile/core.py which sets up
a compatible namespace, so that applications which list only
dogpile.cache in their requirements but make use of "dogpile.core"
constructs will continue to work as before.

I would ask that anyone reading this to please alert me to anyone, any
project, or any announcement medium which may be necessary in order to
ensure that anyone who needs to be made aware of these changes are aware
of them and have vetted them ahead of time.   I would like to release
dogpile.cache 0.6.0 by the end of the week if possible.  I will send
this email a few more times to the list to make sure that it is seen.


[1] https://gerrit.sqlalchemy.org/#/c/89/



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [oslo] [keystone] rolling dogpile.core into dogpile.cache, removing namespace packaging (PLEASE REVIEW)

2016-05-30 Thread Mike Bayer

Hi all -

Just a heads up what's happening for dogpile.cache, in version 0.6.0 we 
are rolling the functionality of the dogpile.core package into 
dogpile.cache itself, and retiring the use of namespace package naming 
for dogpile.cache.


Towards retiring the use of namespace packaging, the magic 
"declare_namespace() / extend_path()" logic is being removed from the 
file dogpile/__init__.py from dogpile.cache, and the "namespace_package" 
directive being removed from setup.py.


However, currently, the plan is to leave alone entirely the 
"dogpile.core" package as is, and to no longer use the name 
"dogpile.core" within dogpile.cache at all; the constructs that it 
previously imported from "dogpile.core" it now just imports from 
"dogpile" and "dogpile.util" from within the dogpile.cache package.


The caveat here is that Python environments that have dogpile.cache 
0.5.7 or earlier installed will also have dogpile.core 0.4.1 installed 
as well, and dogpile.core *does* still contain the namespace package 
verbiage as before.   From our testing, we don't see there being any 
problem with this, however, I know there are people on this list who are 
vastly more familiar than I am with namespace packaging and I would 
invite them to comment on this as well as on the gerrit review [1] (the 
gerrit invites anyone with a Github account to register and comment).


Note that outside of the Openstack world, there are a very small number 
of applications that make use of dogpile.core directly.  From our 
grepping we can find no mentions of "dogpile.core" in any Openstack 
requirements files.  For these applications, if a Python environment 
already has dogpile.core installed, this would continue to be used; 
however dogpile.cache also includes a file dogpile/core.py which sets up 
a compatible namespace, so that applications which list only 
dogpile.cache in their requirements but make use of "dogpile.core" 
constructs will continue to work as before.


I would ask that anyone reading this please alert me to anyone, any 
project, or any announcement medium which may be necessary in order to 
ensure that anyone who needs to be made aware of these changes is aware 
of them and has vetted them ahead of time.   I would like to release 
dogpile.cache 0.6.0 by the end of the week if possible.  I will send 
this email a few more times to the list to make sure that it is seen.



[1] https://gerrit.sqlalchemy.org/#/c/89/


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Seeing db lockout issues in neutron add_router_interface

2016-05-10 Thread Mike Bayer



On 05/10/2016 04:57 PM, Divya wrote:

Hi,
I am trying to run this rally test on stable/kilo
https://github.com/openstack/rally/blob/master/samples/tasks/scenarios/neutron/create_and_delete_routers.json

with concurrency 50 and iterations 2000.

This test basically creates routers and subnets
and then calls
router-interface-add
router-interface-delete


And I am running this against a 3rd party Nuage plugin.

In the NuagePlugin:

add_router_interface is something like this:

super().add_router_interface
try:
   some calls to external rest server
   super().delete_port
except:

remove_router_interface:
---
super().remove_router_interface
some calls to external rest server
super().create_port()
some calls to external rest server


If I comment out delete_port in add_router_interface, I am not hitting
the db lockout issue.
delete_port or any other operations are not within any transaction.
So I'm not sure why this is leading to db lock timeouts on the insert into routerport.

error trace
http://paste.openstack.org/show/496626/



Really appreciate any help on this.



I'm not on the Neutron team, but in general, Openstack applications 
should be employing retry logic internally which anticipates database 
deadlocks like these and retries the operation.  I'd report this stack 
trace (especially if it is reproducible) as a bug to this plugin's 
launchpad project.
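
For what it's worth, oslo.db already ships a decorator for that retry
pattern; a rough sketch, where the function name and arguments are only
illustrative:

from oslo_db import api as oslo_db_api

# Hypothetical function shown only to illustrate the pattern; the
# decorator re-invokes it, up to max_retries times, when the database
# raises a deadlock error.
@oslo_db_api.wrap_db_retry(max_retries=5, retry_on_deadlock=True)
def add_router_interface(context, router_id, interface_info):
    # ... the DB work that can deadlock goes here ...
    pass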






Thanks,
Divya














__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Distributed Database

2016-05-03 Thread Mike Bayer



On 05/02/2016 01:48 PM, Clint Byrum wrote:




FWIW, I agree with you. If you're going to use SQLAlchemy, use it to
take advantage of the relational model.

However, how is what you describe a win? Whether you use SELECT .. FOR
UPDATE, or a stored procedure, the lock is not distributed, and thus, will
still suffer rollback failures in Galera. For single DB server setups, you
don't have to worry about that, and SELECT .. FOR UPDATE will work fine.


Well it's a "win" vs. the lesser approach considered which also did not 
include a distributed locking system like Zookeeper.   It is also a win 
even with a Zookeeper-like system in place because it allows a SQL query 
to be much smarter about selecting data that involves IP numbers and 
CIDRs, without the need to pull data into memory and process it there. 
This is the most common mistake in SQL programming, not taking advantage 
of SQL's set-based nature and instead pulling data into memory 
unnecessarily.


Also, the "federated MySQL" approach of Cells V2 would still be OK with 
pessimistic locking, since this lock is not "distributed" across the 
entire dataspace.   Only the usual Galera caveats apply, e.g. point to 
only one galera "master" at a time and/or wait for Galera to support 
"SELECT FOR UPDATE" across the cluster.





Furthermore, any logic that happens inside the database server is extra
load on a much much much harder resource to scale, using code that is
much more complicated to update.


So I was careful to use the term "stored function" and not "stored 
procedure".   As ironic as it is for me to defend both the ORM 
business-logic-in-the-application-not-the-database position, *and* the 
let-the-database-do-things-not-the-application at the same time, using 
database functions to allow new kinds of math and comparison operations 
to take place over sets is entirely reasonable, and should not be 
confused with the old-school big-business approach of building an entire 
business logic layer as a huge wall of stored procedures; this is 
nothing like that.


The Postgresql database has native INET and CIDR types which include the 
same overlap logic we are implementing here as a MySQL stored function, 
so the addition of math functions like these shouldn't be controversial. 
  The "load" of this function is completely negligible (however I would 
be glad to assist in load testing it to confirm), especially compared to 
pulling the same data across the wire, processing it in Python, then 
sending just a tiny portion of it back again after we've extracted the 
needle from the haystack.


In pretty much every kind of load testing scenario we do with Openstack, 
the actual "load" on the database barely pushes anything.   The only 
database "resource" issue we have is Openstack using far more idle 
connections than it should, which is on my end to work on improvements 
to the connection pooling system which does not scale well across 
Openstack's tons-of-processes model.





To be clear, it's not the amount of data, but the size of the failure
domain. We're more worried about what will happen to those 40,000 open
connections from our 4000 servers when we do have to violently move them.


That's a really big number and I will admit I would need to dig into 
this particular problem domain more deeply to understand what exactly 
the rationale of that kind of scale would be here.   But it does seem 
like if you were using SQL databases, and the 4000 server system is in 
fact grouped into hundreds of "silos" that only deal with strict 
segments of the total dataspace, a federated approach would be exactly 
what you'd want to go with.





That particular problem isn't as scary if you have a large
Cassandra/MongoDB/Riak/ROME cluster, as the client libraries are
generally connecting to all or most of the nodes already, and will
simply use a different connection if the initial one fails. However,
these other systems also bring a whole host of new problems which the
simpler SQL approach doesn't have.


Regarding ROME, I only seek to make the point that if you're going to 
switch to NoSQL, you have to switch to NoSQL.   Bolting SQLAlchemy on 
top of Redis without a mature and widely-proven relational layer in 
between, down to the level of replicating the actual tables that were 
built within a relational schema, is a denial of the reality of the 
problem to be solved.






So it's worth doing an actual analysis of the failure handling before
jumping to the conclusion that a pile of cells/sharding code or a rewrite
to use a distributed database would be of benefit.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not 

Re: [openstack-dev] [nova] Distributed Database

2016-05-02 Thread Mike Bayer



On 05/02/2016 07:38 AM, Matthieu Simonin wrote:



As far as we understand the idea of an ORM is to hide the relational database 
with an Object oriented API.


I actually disagree with that completely.  The reason ORMs are so 
maligned is because of this misconception; developers attempt to use an 
ORM so that they will not have to have any awareness of their 
database, how queries are constructed, or even its schema's design; 
witness tools such as Django ORM and Rails ActiveRecord which promise 
this.   You then end up with an inefficient and unextensible mess 
because the developers never considered anything about how the database 
works or how it is queried, nor do they even have easy ways to monitor 
or control it while still making use of the tool.   There are many blog 
posts and articles that discuss this and it is in general known as the 
"object relational impedance mismatch".


SQLAlchemy's success comes from its rejection of this entire philosophy. 
 The purpose of SQLAlchemy's ORM is not to "hide" anything but rather 
to apply automation to the many aspects of relational database 
communication as well as row->object mapping that otherwise express 
themselves in an application as either a large amount of repetitive 
boilerplate throughout an application or as an awkward series of ad-hoc 
abstractions that don't really do the job very well.   SQLAlchemy is 
designed to expose both the schema design as well as the structure of 
queries completely.   My talk at [1] goes into this topic in detail 
including specific API architectures that facilitate this concept.
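
A tiny example of what "exposing the schema and the query" means in
practice (the model and query here are invented for illustration):

import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Instance(Base):
    # the schema is stated explicitly rather than hidden or inferred
    __tablename__ = "instances"
    id = sa.Column(sa.Integer, primary_key=True)
    host = sa.Column(sa.String(255), index=True)
    vcpus = sa.Column(sa.Integer, nullable=False)

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine)()

# the SQL structure (grouping, aggregation) is spelled out in the query
# itself instead of emerging implicitly from attribute access
vcpus_per_host = (
    session.query(Instance.host, sa.func.sum(Instance.vcpus))
    .group_by(Instance.host)
    .all()
)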


It's for that reason that I've always rejected notions of attempting to 
apply SQLAlchemy directly on top of a datastore that is explicitly 
non-relational.   By doing so, you remove a vast portion of the 
functionality that relational databases provide and there's really no 
point in using a tool like SQLAlchemy that is very explicit about DDL 
and SQL on top of that kind of database.


To effectively put SQLAlchemy on top of a non-relational datastore, what 
you really want to do is build an entire SQL engine on top of it.  This 
is actually feasible; I was doing work for the now-defunct FoundationDB 
(was bought by Apple) who had a very good implementation of 
SQL-on-top-of-distributed keystore going, and the Cockroach and TiDB 
projects you mention are definitely the most appropriate choice to take 
if a certain variety of distribution underneath SQL is desired.


 Concerning SQLAlchemy,

the relational aspect of the underlying database may also be used by the user, but 
we observed that in Nova, most
of the db interactions are written in an Object-oriented style (few queries are 
using SQL),
thus we don't think that Nova requires a relational database, it just requires 
an object oriented abstraction to manipulate a database.


Well IMO that's actually often a problem.  My goal across Openstack 
projects in general is to allow them to make use of SQL more effectively 
than they do right now; for example, in Neutron I am helping them to 
move a block of code that inefficiently needs to load a block of data 
into memory, scan it for CIDR overlaps, and then push data back out. 
This approach prevents it from performing a single UPDATE statement and 
ushers in the need for pessimistic locking against concurrent 
transactions.  Instead, I've written for them a simple stored function 
proof-of-concept [2] that will allow the entire operation to be 
performed on the database side alone in a single statement.  Wins like 
these are much less feasible if not impossible when a project decides it 
wants to split its backend store between dramatically different 
databases which don't offer such features.
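
To make the contrast concrete, here is a rough sketch of the set-based
shape (the table and the cidr_overlaps() SQL function are invented for
the example; this is not the actual Neutron schema or the
proof-of-concept in [2]):

import sqlalchemy as sa

metadata = sa.MetaData()
allocation_pools = sa.Table(
    "allocation_pools", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("network_id", sa.Integer, nullable=False),
    sa.Column("cidr", sa.String(43), nullable=False),
    sa.Column("allocated", sa.Boolean, default=False),
)

def mark_overlapping_allocated(conn, network_id, new_cidr):
    # the overlap test runs inside the database via the (hypothetical)
    # cidr_overlaps() function, so no rows travel into Python and no
    # SELECT ... FOR UPDATE pre-locking pass is needed
    stmt = (
        allocation_pools.update()
        .where(allocation_pools.c.network_id == network_id)
        .where(sa.func.cidr_overlaps(allocation_pools.c.cidr, new_cidr))
        .values(allocated=True)
    )
    return conn.execute(stmt).rowcount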




Concretely, we think that there are three possible approaches:
 1) We can use the SQLAlchemy API as the common denominator between a 
relational and a non-relational implementation of the db.api component. These two 
implementations could continue to converge by sharing a large amount of code.
 2) We create a new non-relational implementation (from scratch) of the 
db.api component. It would require probably more work.
 3) We are also studying a last alternative: writing a SQLAlchemy engine 
that targets NewSQL databases (scalability + ACID):
  - https://github.com/cockroachdb/cockroach
  - https://github.com/pingcap/tidb


Going with a NewSQL backend is by far the best approach here.   That 
way, very little needs to be reinvented and the application's approach 
to data doesn't need to dramatically change.


But also, w.r.t. Cells there seems to be some remaining debate over why 
exactly a distributed approach is even needed.  As others have posted, a 
single MySQL database, replicated across Galera or not, scales just fine 
for far more data than Nova ever needs to store.  So it's not clear why 
the need for a dramatic rewrite of its datastore is called for.



[1] 

Re: [openstack-dev] [Fuel][MySQL][DLM][Oslo][DB][Trove][Galera][operators] Multi-master writes look OK, OCF RA and more things

2016-04-30 Thread Mike Bayer



On 04/30/2016 10:50 AM, Clint Byrum wrote:

Excerpts from Roman Podoliaka's message of 2016-04-29 12:04:49 -0700:




I'm curious why you think setting wsrep_sync_wait=1 wouldn't help.

The exact example appears in the Galera documentation:

http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-sync-wait

The moment you say 'SET SESSION wsrep_sync_wait=1', the behavior should
prevent the list problem you see, and it should not matter that it is
a separate session, as that is the entire point of the variable:



we prefer to keep it off and just point applications at a single node 
using master/passive/passive in HAProxy, so that we don't have the 
unnecessary performance hit of waiting for all transactions to 
propagate; we just stick on one node at a time.   We've fixed a lot of 
issues in our config in ensuring that HAProxy definitely keeps all 
clients on exactly one Galera node at a time.




"When you enable this parameter, the node triggers causality checks in
response to certain types of queries. During the check, the node blocks
new queries while the database server catches up with all updates made
in the cluster to the point where the check was begun. Once it reaches
this point, the node executes the original query."

In the active/passive case where you never use the passive node as a
read slave, one could actually set wsrep_sync_wait=1 globally. This will
cause a ton of lag while new queries happen on the new active and old
transactions are still being applied, but that's exactly what you want,
so that when you fail over, nothing proceeds until all writes from the
original active node are applied and available on the new active node.
It would help if your failover technology actually _breaks_ connections
to a presumed dead node, so writes stop happening on the old one.


If HAProxy is failing over from the master, which is no longer 
reachable, to another passive node, which is reachable, that means that 
master is partitioned and will leave the Galera primary component.   It 
also means all current database connections are going to be bounced off, 
which will cause errors for those clients either in the middle of an 
operation, or if a pooled connection is reused before it is known that 
the connection has been reset.  So failover is usually not an error-free 
situation in any case from a database client perspective and retry 
schemes are always going to be needed.


Additionally, the purpose of the enginefacade [1] is to allow Openstack 
applications to fix their often incorrectly written database access 
logic such that in many (most?) cases, a single logical operation is no 
longer unnecessarily split among multiple transactions when possible. 
I know that this is not always feasible in the case where multiple web 
requests are coordinating, however.
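
The usage pattern, roughly (the model and function names are invented;
the point is that the decorator scopes one transaction around the whole
logical operation):

from oslo_db.sqlalchemy import enginefacade

# hypothetical service-layer function; Port stands in for some
# declaratively-mapped model.  enginefacade supplies context.session and
# commits (or rolls back) exactly once, when the outermost decorated
# function returns.
@enginefacade.writer
def reassign_ports(context, old_host, new_host):
    ports = (context.session.query(Port)
             .filter_by(host=old_host)
             .all())
    for port in ports:
        port.host = new_host
    # no explicit commit and no second transaction needed here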


That leaves only the very infrequent scenario of, the master has 
finished sending a write set off, the passives haven't finished 
committing that write set, the master goes down and HAProxy fails over 
to one of the passives, and the application that just happens to also be 
connecting fresh onto that new passive node in order to perform the next 
operation that relies upon the previously committed data so it does not 
see a database error, and instead runs straight onto the node where the 
committed data it's expecting hasn't arrived yet.   I can't make the 
judgment for all applications if this scenario can't be handled like any 
other transient error that occurs during a failover situation, however 
if there is such a case, then IMO the wsrep_sync_wait (formerly known as 
wsrep_causal_reads) may be used on a per-transaction basis for that very 
critical, not-retryable-even-during-failover operation.  Allowing this 
variable to be set for the scope of a transaction and reset afterwards, 
and only when talking to Galera, is something we've planned to work into 
the enginefacade as well as an declarative transaction attribute that 
would be a pass-through on other systems.


[1] 
https://specs.openstack.org/openstack/oslo-specs/specs/kilo/make-enginefacade-a-facade.html





Also, If you thrash back and forth a bit, that could cause your app to
virtually freeze, but HAProxy and most other failover technologies allow
tuning timings so that you can stay off of a passive server long enough
to calm it down and fail more gracefully to it.

Anyway, this is why sometimes I do wonder if we'd be better off just
using MySQL with DRBD and good old pacemaker.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 

Re: [openstack-dev] [Fuel][MySQL][DLM][Oslo][DB][Trove][Galera][operators] Multi-master writes look OK, OCF RA and more things

2016-04-30 Thread Mike Bayer



On 04/30/2016 02:57 AM, bdobre...@mirantis.com wrote:

Hi Roman.
That's interesting, although it's hard to believe (there is no slave lag in
galera multi master). I can only suggest us to create another jepsen
test to verify exactly scenario you describe. As well as other OpenStack
specific patterns.



There is definitely slave lag in Galera and it can be controlled using 
the wsrep_causal_reads flag.


Demonstration script, whose results I have confirmed separately using 
Python scripts, is at:


https://www.percona.com/blog/2013/03/03/investigating-replication-latency-in-percona-xtradb-cluster/




Regards,
Bogdan.

*From:* Roman Podoliaka 
*Sent:* Friday, 29 April 2016 21:04
*To:* OpenStack Development Mailing List (not for usage questions)

*Cc:* openstack-operat...@lists.openstack.org


Hi Bogdan,

Thank you for sharing this! I'll need to familiarize myself with this
Jepsen thing, but overall it looks interesting.

As it turns out, we already run Galera in multi-writer mode in Fuel
unintentionally in the case, when the active MySQL node goes down,
HAProxy starts opening connections to a backup, then the active goes
up again, HAProxy starts opening connections to the original MySQL
node, but OpenStack services may still have connections opened to the
backup in their connection pools - so now you may have connections to
multiple MySQL nodes at the same time, exactly what you wanted to
avoid by using active/backup in the HAProxy configuration.

^ this actually leads to an interesting issue [1], when the DB state
committed on one node is not immediately available on another one.
Replication lag can be controlled  via session variables [2], but that
does not always help: e.g. in [1] Nova first goes to Neutron to create
a new floating IP, gets 201 (and Neutron actually *commits* the DB
transaction) and then makes another REST API request to get a list of
floating IPs by address - the latter can be served by another
neutron-server, connected to another Galera node, which does not have
the latest state applied yet due to 'slave lag' - it can happen that
the list will be empty. Unfortunately, 'wsrep_sync_wait' can't help
here, as it's two different REST API requests, potentially served by
two different neutron-server instances.

Basically, you'd need to *always* wait for the latest state to be
applied before executing any queries, which Galera is trying to avoid
for performance reasons.

Thanks,
Roman

[1] https://bugs.launchpad.net/fuel/+bug/1529937
[2]
http://galeracluster.com/2015/06/achieving-read-after-write-semantics-with-galera/

On Fri, Apr 22, 2016 at 10:42 AM, Bogdan Dobrelya
 wrote:
 > [crossposting to openstack-operat...@lists.openstack.org]
 >
 > Hello.
 > I wrote this paper [0] to demonstrate an approach how we can leverage a
 > Jepsen framework for QA/CI/CD pipeline for OpenStack projects like Oslo
 > (DB) or Trove, Tooz DLM and perhaps for any integration projects which
 > rely on distributed systems. Although all tests are yet to be finished,
 > results are quite visible, so I better off share early for a review,
 > discussion and comments.
 >
 > I have similar tests done for the RabbitMQ OCF RA clusterers as well,
 > although have yet wrote a report.
 >
 > PS. I'm sorry for so many tags I placed in the topic header, should I've
 > used just "all" :) ? Have a nice weekends and take care!
 >
 > [0] https://goo.gl/VHyIIE
 >
 > --
 > Best regards,
 > Bogdan Dobrelya,
 > Irc #bogdando
 >
 >
 >
 >
__
 > OpenStack Development Mailing List (not for usage questions)
 > Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Distributed Database

2016-04-28 Thread Mike Bayer



On 04/28/2016 08:25 PM, Edward Leafe wrote:


Your own tests showed that a single RDBMS instance doesn’t even break a sweat
under your test loads. I don’t see why we need to shard it in the first
place, especially if in doing so we add another layer of complexity and
another dependency in order to compensate for that choice. Cells are a useful
concept, but this proposed implementation is adding way too much complexity
and debt to make it worthwhile.


now that is a question I have also.  Horizontal sharding is usually for 
the case where you need to store say, 10B rows, and you'd like to split 
it up among different silos.  Nothing that I've seen about Nova suggests 
this is a system with any large data requirements, or even medium size 
data (a few million rows in relational databases is nothing).  I 
didn't have the impression that this was the rationale behind Cells; it 
seems like this is more of a logical separation of some kind 
that somehow suits some environments (but I don't know how). 
Certainly, if you're proposing a single large namespace of data across a 
partition of nonrelational databases, and then the data size itself is 
not that large, as long as "a single namespace" is appropriate then 
there's no reason to break out of more than one MySQL database.  There's 
not much reason to transparently shard unless you are concerned about 
adding limitless storage capacity.   The Cells sharding seems to be 
intentionally explicit and non-transparent.






-- Ed Leafe





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Distributed Database

2016-04-28 Thread Mike Bayer



On 04/28/2016 08:44 AM, Edward Leafe wrote:

On Apr 24, 2016, at 3:28 PM, Robert Collins  wrote:


For instance, the things I think are essential for a distributed
database based datastore:
- good single-machine developer story. Must not need a physical
cluster to hack on OpenStack
- deal gracefully with single node/rack/site failures (when deployed
appropriately) - allow limiting failure domain impact
- straightforward programming model: wrong uses should be obvious to reviewers
- low latency performance with big datasets: e.g. nova list as an
admin should be able to get the Nth page as rapidly as the 2nd or 3rd.
- code to deliver that should be (approximately) no worse than the current code


Agree on all of these points, as well as the rest of your post.

After several hallway track discussions, as well as yesterday’s Cells V2 
discussion, I’ve written a follow-up post:

http://blog.leafe.com/index.php/2016/04/28/fragmented-data/

Feedback, of course, is welcomed!



Regarding ROME [1], I've taken a look at its source code and while it is 
certainly interesting, I wouldn't recommend lifting and moving all of 
Nova's database infrastructure onto it as a dependency within the near 
term, as the state of this code is very immature.  SQLAlchemy itself was 
once immature as well, so there is no sin here, but that was eleven 
years ago.


The internals here are not only highly dependent on SQLAlchemy internals 
(pinned at the 0.9 series which is obsolete), it is using these APIs in 
a very brittle and non-performant way [2].  In this code example, the 
internal elements of SQLAlchemy expression objects are repeatedly run 
through str() which on each call runs a full string compilation step in 
order to test for what their actual type is.  It can't be overstated how 
inappropriate this approach is and the author of the library would have 
benefited from reaching out to me in order to get some guidance on the 
correct way to introspect SQLAlchemy expression objects.  Basic Python 
idioms like type checking also seem to be misunderstood [3].
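
(For anyone curious, the non-brittle way to introspect these objects is
structural, not string-based; a rough sketch, with the particular checks
chosen only as examples:)

from sqlalchemy.sql import column, elements, operators, table

def describe_criterion(expr):
    # inspect the expression tree directly instead of rendering it to a
    # string and parsing the result
    if isinstance(expr, elements.BinaryExpression):
        return ("binary", expr.left, expr.operator, expr.right)
    if isinstance(expr, elements.BooleanClauseList):
        return ("boolean", list(expr.clauses))
    return ("other", expr)

users = table("users", column("id"), column("name"))
kind, left, op, right = describe_criterion(users.c.id == 5)
assert kind == "binary" and op is operators.eq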


I don't think anyone denies that Nova can use any kind of database 
backend but the point was raised that to start from scratch with an 
entirely new database approach is an enormous job.   If the first step 
of that job is in fact "port SQLAlchemy and the relational model to 
Redis", that makes the job extremely more involved and I'd disagree with 
your post's assertion that "It's not too late" if this is the case. 
If the admission of ROME for Nova is that the relational model is in 
fact necessary for Nova, then that disqualifies NoSQL databases out of 
the gate - it's one thing to lament that MySQL is not as "distributed" 
out of the box as a NoSQL database, but it's another to lament that 
non-relational databases are not in fact relational.


[1] https://github.com/BeyondTheClouds/rome

[2] 
https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L172


[3] 
https://github.com/BeyondTheClouds/rome/blob/master/lib/rome/core/expression/expression.py#L102





-- Ed Leafe






__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo.config] Encrypt the sensitive options

2016-04-26 Thread Mike Bayer



On 04/26/2016 09:32 AM, Daniel P. Berrange wrote:


IMHO encrypting oslo config files is addressing the wrong problem.
Rather than having sensitive passwords stored in the main config
files, we should have them stored completely separately by a secure
password manager of some kind. The config file would then merely
contain the name or uuid of an entry in the password manager. The
service (eg nova-compute) would then query that password manager
to get the actual sensitive password data it requires. At this point
oslo.config does not need to know/care about encryption of its data
as there's no longer sensitive data stored.


at the end of the day, if someone is on the machine where they can read 
those config files, they are on that machine where they can run any 
Python code they want, including exactly the code in the 
openstack app that contacts this password service and gets the same 
information.   Or put another way, nova-compute still needs a password 
or key of some kind to connect to this password service anyway.


If what we're going for as far as passwords in config files is that they 
don't get committed to source repositories or copied out to public 
places, then fine, store them "somewhere else" just to note that these 
are special values.  But as far as someone on the machine (assuming 
per-user permissions to read the same files that the app can see have 
been acquired), there's always a key/password/token needed to get to 
"the password service", so they have access.   The best you can do is 
run some closed-source executable that has private keys buried within 
it, to at least make this attack more difficult, or if you are really 
looking for something inconvenient, an administrator has to manually 
type in a passphrase when starting up the services.  But we're using 
open source, source-code-present Python and I don't think we're doing 
passphrase-on-startup.   So being on the box means, you have the passwords.





Regards,
Daniel



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Distributed Database

2016-04-23 Thread Mike Bayer



On 04/22/2016 04:27 PM, Ed Leafe wrote:

OK, so I know that Friday afternoons are usually the worst times to
write a blog post and start an email discussion, and that the Friday
immediately before a Summit is the absolute worst, but I did it anyway.

http://blog.leafe.com/index.php/2016/04/22/distributed_data_nova/

Summary: we are creating way too much complexity by trying to make Nova
handle things that are best handled by a distributed database. The
recent split of the Nova DB into an API database and separate cell
databases is the glaring example of going down the wrong road.

Anyway, read it on your flight (or, in my case, drive) to Austin, and
feel free to pull me aside to explain just how wrong I am. ;-)


Distributed databases aren't mutually exclusive against SQL databases. 
  I am only vaguely familiar with Cells and how it divides up data into 
entirely different databases of the same schema, and perhaps it wasn't 
executed well, however a discussion like this would need to separate the 
concept of "distributed" from the notion that "that means we need a 
database that advertises itself as distributed!".


The general problem Cells is solving strikes me very much as a 
traditional horizontal sharding problem.  While key stores like to 
advertise that cross-database sharding is very easy with plain 
key/values, that's at the expense of the enormous amount of 
functionality you give up, including ACID and the relational model. 
There's no reason you can't horizontally shard a relational database, 
and while Cells seems like it's made this approach somewhat rigid, it 
doesn't have to be that way.   SQLAlchemy has long had a horizontal 
sharding extension and relational databases like Postgresql also include 
horizontal sharding structures built in (see 
http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html).  If 
you shard your data into compartments the way Cells does, you can still 
pretty much keep ACID local to one database at a time, or if you want to 
distribute a transaction you can use two phase commit which MySQL and 
Postgresql both support.
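
For reference, the extension's API looks roughly like this (the engines,
routing rules, and "cell" naming are invented for the example):

from sqlalchemy import create_engine
from sqlalchemy.ext.horizontal_shard import ShardedSession
from sqlalchemy.orm import sessionmaker

# two hypothetical "cells", each its own MySQL database
shards = {
    "cell1": create_engine("mysql+pymysql://nova:nova@db1/nova_cell1"),
    "cell2": create_engine("mysql+pymysql://nova:nova@db2/nova_cell2"),
}

def shard_chooser(mapper, instance, clause=None):
    # route flushes of new objects by some attribute of the instance
    return instance.cell_name

def id_chooser(query, ident):
    # a bare primary-key lookup may have to consult any shard
    return list(shards)

def query_chooser(query):
    # without more information, fan the query out to every shard
    return list(shards)

Session = sessionmaker(
    class_=ShardedSession,
    shards=shards,
    shard_chooser=shard_chooser,
    id_chooser=id_chooser,
    query_chooser=query_chooser,
)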


A key reason the NoSQL movement failed to completely replace relational 
databases as its advocates seemed to think would happen about five years 
ago, was that they spent lots of time claiming to solve problems in SQL 
that weren't actually problems, such as the idea that "schemaless" is 
easier to work with (there's always a schema, NoSQL just has no way of 
validating or enforcing it), or that you just couldn't do key/value 
transactions nearly as fast with ACID (until Postgresql made a few 
tweaks and successfully beats MongoDB at this task now).


It may or may not be the case that "Cells didn't do a very good job of 
distributing SQL" but that doesn't mean "SQL is not appropriate for 
distributing data".   Facebook and LinkedIn have built distributed 
database systems based on MySQL at profoundly massive scales. 
Openstack's problem I'm going to guess isn't as hard as that.










__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Deleting a cluster in Sahara SQL/PyMYSQL Error

2016-03-25 Thread Mike Bayer
and on the subject of "more folks reading this message", here's that 
bug report!   https://bugs.launchpad.net/sahara/+bug/1561807




On 03/25/2016 12:42 AM, Vitaly Gridnev wrote:

Hi. Thanks for your bug report! Also I would recommend to add
appropriate tags in future (like sahara in our case) to subject of the
message so that more folks can read this message.

Let's continue discussions about the bug in launchpad.

On Fri, Mar 25, 2016 at 4:23 AM, Jerico Revote
<jerico.rev...@monash.edu> wrote:

Yes, first, when creating a new cluster, it gets stuck on "validating";
then I tried deleting it, but it gets stuck again on "deleting",
and I see that SQL error.
Will submit a bug @ Launchpad.

On 25 March 2016 at 01:56, Mike Bayer <mba...@redhat.com> wrote:

Id recommend filing a bug in Launchpad against Sahara for that.
Can you reproduce it ?




On 03/23/2016 07:10 PM, Jerico Revote wrote:

Hello,

When trying to delete a cluster in sahara,
I'm getting the following error:

code 500 and message 'Internal Server Error'
2016-03-23 17:25:21.651 18827 ERROR
sahara.utils.api
[req-d797bbc8-7932-4187-a428-565f9d834f8b ] Traceback
(most recent
call last):
OperationalError: (pymysql.err.OperationalError) (2014,
'Command Out
of Sync')
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
[req-377ef364-f2c7-4343-b32c-3741bfc0a05b ] DB exception
wrapped.
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
Traceback (most recent call last):
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 1139, in _execute_context
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  context)
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py",
line 450, in do_execute
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  cursor.execute(statement, parameters)
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/pymysql/cursors.py",
line 132,
in execute
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  result = self._query(query)
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/pymysql/cursors.py",
line 271,
in _query
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  conn.query(q)
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/pymysql/connections.py",
line
726, in query
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  self._affected_rows =
self._read_query_result(unbuffered=unbuffered)
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/pymysql/connections.py",
line
861, in _read_query_result
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  result.read()
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages/pymysql/connections.py",
line
1064, in read
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  first_packet = self.connection._read_packet()
2016-03-23 17:25:35.803 18823 ERROR
oslo_db.sqlalchemy.exc_filters
  File
"/usr/lib/python2.7/dist-packages

Re: [openstack-dev] Deleting a cluster in Sahara SQL/PyMYSQL Error

2016-03-24 Thread Mike Bayer
Id recommend filing a bug in Launchpad against Sahara for that.  Can you 
reproduce it ?




On 03/23/2016 07:10 PM, Jerico Revote wrote:

Hello,

When trying to delete a cluster in sahara,
I'm getting the following error:


code 500 and message 'Internal Server Error'
2016-03-23 17:25:21.651 18827 ERROR sahara.utils.api
[req-d797bbc8-7932-4187-a428-565f9d834f8b ] Traceback (most recent
call last):
OperationalError: (pymysql.err.OperationalError) (2014, 'Command Out
of Sync')
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
[req-377ef364-f2c7-4343-b32c-3741bfc0a05b ] DB exception wrapped.
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
Traceback (most recent call last):
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py",
line 1139, in _execute_context
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 context)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py",
line 450, in do_execute
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 cursor.execute(statement, parameters)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 132,
in execute
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 result = self._query(query)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/cursors.py", line 271,
in _query
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 conn.query(q)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
726, in query
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
861, in _read_query_result
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 result.read()
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
1064, in read
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 first_packet = self.connection._read_packet()
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
825, in _read_packet
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 packet = packet_type(self)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
242, in __init__
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 self._recv_packet(connection)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
248, in _recv_packet
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 packet_header = connection._read_bytes(4)
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 File "/usr/lib/python2.7/dist-packages/pymysql/connections.py", line
839, in _read_bytes
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
 if len(data) < num_bytes:
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
TypeError: object of type 'NoneType' has no len()
2016-03-23 17:25:35.803 18823 ERROR oslo_db.sqlalchemy.exc_filters
2016-03-23 17:25:35.808 18823 ERROR sahara.utils.api
[req-377ef364-f2c7-4343-b32c-3741bfc0a05b ] Request aborted with
status code 500 and message 'Internal Server Error'
2016-03-23 17:25:35.809 18823 ERROR sahara.utils.api
[req-377ef364-f2c7-4343-b32c-3741bfc0a05b ] Traceback (most recent
call last):
OperationalError: (pymysql.err.OperationalError) (2014, 'Command Out
of Sync')


Any idea what this could mean? Thanks
As a result, sahara clusters are stuck in "Deleting" state.


dpkg -l | grep -i sahara
ii  python-sahara1:3.0.0-0ubuntu1~cloud0
all  OpenStack data processing cluster as a service - library
ii sahara-api   1:3.0.0-0ubuntu1~cloud0
  all  OpenStack data processing cluster as a service - API
ii sahara-common1:3.0.0-0ubuntu1~cloud0
  all  OpenStack data processing cluster as a service - common
files


Regards,

Jerico





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__

Re: [openstack-dev] [tricircle] Using in-memory database for unit tests

2016-03-23 Thread Mike Bayer



On 03/23/2016 01:33 AM, Vega Cai wrote:



On 22 March 2016 at 12:09, Shinobu Kinjo wrote:

Thank you for your comment (inline for my message).

On Tue, Mar 22, 2016 at 11:53 AM, Vega Cai wrote:
> Let me try to explain some.
>
> On 22 March 2016 at 10:09, Shinobu Kinjo wrote:
>>
>> On Tue, Mar 22, 2016 at 10:22 AM, joehuang wrote:
>> > Hello, Shinobu,
>> >
>> > Yes, as what you described here, the "initialize" in "core.py" is used
>> > for unit/function test only. For system integration test( for example,
>> > tempest ), it would be better to use mysql like DB, this is done by the
>> > configuration in DB part.
>>
>> Thank you for your thought.
>>
>> >
>> > From my point of view, the tricircle DB part could be enhanced in the 
DB
>> > model and migration scripts. Currently unit test use DB model to 
initialize
>> > the data base, but not using the migration scripts,
>>
>> I'm assuming the migration scripts are in "tricircle/db". Is it right?
>
>
> migration scripts are in tricircle/db/migrate_repo
>>
>>
>> What is the DB model?
>> Why do we need 2-way-methods at the moment?
>
>
> DB models are defined in tricircle/db/models.py. Models.py defines tables 
in
> object level, so other modules can import models.py then operate the 
tables
> by operating the objects. Migration scripts defines tables in table level,
> you define table fields, constraints in the scripts then migration tool 
will
> read the scripts and build the tables.

Dose "models.py" manage database schema(e.g., create / delete columns,
tables, etc)?


In "models.py" we only define database schema. SQLAlchemy provides
functionality to create tables based on schema definition, which is
"ModelBase.metadata.create_all". This is used to initialized the
in-memory database for tests currently.


FTR this is the best way to do this.   SQLite's migration patterns are 
entirely different than for any other database, so while Alembic has a 
"batch" mode that can provide some level of code-compatibility (with 
many caveats, difficulties, and dead-end cases) between a SQLite 
migration and a migration for all the other databases, it is far 
preferable to not use any migration pattern at all for the SQLite 
database and just do a create_all().  It's also much faster, especially 
in the SQLite case where migrations require that the whole table is 
dropped and re-created for most changes.
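
In other words, the test fixture can be as simple as the following
(the import path and ModelBase name are taken from the description
above, so treat them as approximate):

import sqlalchemy as sa

from tricircle.db.models import ModelBase  # hypothetical import path

def make_test_engine():
    # a fresh in-memory SQLite database per test: build the schema
    # straight from the models, no migration scripts involved
    engine = sa.create_engine("sqlite://")
    ModelBase.metadata.create_all(engine)
    return engine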








> Migration tool has a feature to
> generate migration scripts from DB models automatically but it may make
> mistakes sometimes, so currently we manually maintain the table structure 
in
> both DB model and migration scripts.

Is *migration tool* different from both DB models and migration scripts?


Migration tool is Alembic, a lightweight database migration tool for
usage of SQLAlchemy:

https://alembic.readthedocs.org/en/latest/

It runs migration scripts to update database schema. Each database
version has one migrate script. After defining "upgrade" and "downgrade"
method in the script, you can update your database from one version to
another version. Alembic isn't aware of DB models defined in
"models.py", users need to guarantee the version of database and the
version of "models.py" match.

If you create a new database, both "ModelBase.metadata.create_all" and
Alembic can be used. But Alembic can also be used to update an existing
database to a specific version of schema.


 >>
 >>
 >> > so the migration scripts can only be tested when using
devstack for
 >> > integration test. It would better to using migration script to
instantiate
 >> > the DB, and tested in the unit test too.
 >>
 >> If I understand you correctly, we are moving forward to using the
 >> migration scripts for both unit and integration tests.
 >>
 >> Cheers,
 >> Shinobu
 >>
 >> >
 >> > (Also move the discussion to the openstack-dev mail-list)
 >> >
 >> > Best Regards
 >> > Chaoyi Huang ( joehuang )
 >> >
 >> > -Original Message-
 >> > From: Shinobu Kinjo [mailto:ski...@redhat.com]
 >> > Sent: Tuesday, March 22, 2016 7:43 AM
 >> > To: joehuang; khayam.gondal; zhangbinsjtu; shipengfei92; newypei;
 >> > Liuhaixia; caizhiyuan (A); huangzhipeng
 >> > Subject: Using in-memory database for unit tests
 >> >
 >> > Hello,
 >> >
 >> > In "initialize" method defined in "core.py", we're using
*in-memory*
 >> > strategy making use of sqlite. AFAIK we are using this
solution for only
 >> > testing purpose. Unit tests using this solution should be 

Re: [openstack-dev] [oslo] documentation on using the oslo.db opportunistic test feature

2016-02-23 Thread Mike Bayer



On 02/23/2016 12:20 PM, Sean Dague wrote:

On 02/23/2016 11:29 AM, Mike Bayer wrote:



On 02/22/2016 08:18 PM, Sean Dague wrote:

On 02/22/2016 08:08 PM, Davanum Srinivas wrote:

Sean,

You need to set the env variable like so. See testenv:mysql-python
for example
OS_TEST_DBAPI_ADMIN_CONNECTION=mysql://openstack_citest:openstack_citest@localhost


Thanks,
Dims

[1]
http://codesearch.openstack.org/?q=OS_TEST_DBAPI_ADMIN_CONNECTION=nope==



If I am reading this correctly, this needs full access to the whole
mysql administratively?


the openstack_citest user needs permission to create and use new
databases when the multiprocessing feature of testr is used.   This is
not a new requirement and the provisioning refactor in oslo.db did not
invent this.


Ok, well it was invented somewhere after it was extracted from Nova. :)


Is that something that could be addressed? In many of my environments
the mysql db does other things as well, so giving full admin to
arbitrary test code is a bit concerning.


I'd suggest that running any test suite against a database that is used
for other things is not an optimal practice; test suites by definition
can break things.   Even if the test suite user has limited permissions,
there's still many ways a bad test can break your database even though
it's less likely.   Running an additional mysql server against an
alternate data directory with a different port is one option here.


  Tempest ran into a similar

issue and addressed this by allowing for preallocation of accounts. That
kind of approach seems like it would work here given that you could do
grants on well known names.


This is a feature that could be supported by oslo.db provisioning. Right
now the multi-process provisioning is hardcoded to use random names, but
options or environment variables could certainly be added so that it
works from a predefined set of names.    But you'd have to ensure that multiple test suites
aren't using the same set of names at the same time.

Feel free to suggest the preferred system of establishing these
pre-defined database names and I or someone else (since im on PTO all
next week) can work something up.


2 thoughts on that:

1) Being able to do a grant with a prefix like

GRANT all on 'openstack_ci%'.* to openstack_citest

Then using that prefix in the random db generation. That would at least
limit scope. That seems the easiest to do with the existing infrastructure.


a prefix would be very easy, and I almost wonder if we should just have 
an identifiable prefix on the username in all cases anyway.   However, 
the wildcard scheme here is only useful on MySQL.  Other backends don't 
support such a liberal setting.





2) Have a set of stack dbs with openstack_citest## where # is number,
and the testr worker id is used to set the number.

That would be more like the static accounts model used in Tempest.

-Sean



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] oslo.db reset session?

2016-02-23 Thread Mike Bayer



On 02/23/2016 12:06 PM, Roman Podoliaka wrote:

Mike,

I think that won't work as Nova creates its own instance of
_TransactionContextManager:

https://github.com/openstack/nova/blob/d8ddecf6e3ed1e8193e5f6dba910eb29bbe6dac6/nova/db/sqlalchemy/api.py#L134-L135

Maybe we could change _TestTransactionFactory a bit, so that it takes
a context manager instance as an argument?


If they aren't using the enginefacade global context, then that's even 
easier.  They should be able to drop in _TestTransactionFactory or any 
other TransactionFactory into the _TransactionContextManager they have 
and then swap it back.   If there aren't API methods for this already, 
because everything in enginefacade is underscored, feel free to add. 
Also I'm not sure how the enginefacade integration with nova didn't 
already cover this; I guess it doesn't yet impact all of those existing 
MySQLOpportunisticTest classes it has.







On Tue, Feb 23, 2016 at 6:09 PM, Mike Bayer <mba...@redhat.com> wrote:



On 02/23/2016 09:22 AM, Sean Dague wrote:


With enginefascade working coming into projects, there seems to be some
new bits around oslo.db global sessions.

The effect of this on tests is a little problematic. Because it builds
global state which couples between tests. I've got a review to use mysql
connection explicitly for some Nova functional tests which correctly
fails and exposes a bug when run individually. However, when run in a
full test run, the global session means that it's not run against mysql,
it's run against sqlite, and passes.

https://review.openstack.org/#/c/283364/

We need something that's the inverse of session.configure() -

https://github.com/openstack/nova/blob/d8ddecf6e3ed1e8193e5f6dba910eb29bbe6dac6/nova/tests/fixtures.py#L205
to reset the global session.

Pointers would be welcomed.



from the oslo.db side, we have frameworks for testing that handle all of
these details (e.g. oslo_db.sqlalchemy.test_base.DbTestCase and DbFixture).
I don't believe Nova uses these frameworks (I think it should long term),
but for now the techniques used by oslo.db's framework should likely be
used:

self.test.enginefacade = enginefacade._TestTransactionFactory(
 self.test.engine, self.test.sessionmaker, apply_global=True,
 synchronous_reader=True)

self.addCleanup(self.test.enginefacade.dispose_global)


The above apply_global flag indicates that the global enginefacade should
use this TestTransactionFactory until disposed.







 -Sean



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] documentation on using the oslo.db opportunistic test feature

2016-02-23 Thread Mike Bayer



On 02/22/2016 08:18 PM, Sean Dague wrote:

On 02/22/2016 08:08 PM, Davanum Srinivas wrote:

Sean,

You need to set the env variable like so. See testenv:mysql-python for example
OS_TEST_DBAPI_ADMIN_CONNECTION=mysql://openstack_citest:openstack_citest@localhost

Thanks,
Dims

[1] 
http://codesearch.openstack.org/?q=OS_TEST_DBAPI_ADMIN_CONNECTION=nope==


If I am reading this correctly, this needs full access to the whole
mysql administratively?


the openstack_citest user needs permission to create and use new 
databases when the multiprocessing feature of testr is used.   This is 
not a new requirement and the provisioning refactor in oslo.db did not 
invent this.






Is that something that could be addressed? In many of my environments
the mysql db does other things as well, so giving full admin to
arbitrary test code is a bit concerning.


I'd suggest that running any test suite against a database that is used 
for other things is not an optimal practice; test suites by definition 
can break things.   Even if the test suite user has limited permissions, 
there's still many ways a bad test can break your database even though 
it's less likely.   Running an additional mysql server against an 
alternate data directory with a different port is one option here.



 Tempest ran into a similar

issue and addressed this by allowing for preallocation of accounts. That
kind of approach seems like it would work here given that you could do
grants on well known names.


This is a feature that could be supported by oslo.db provisioning. 
Right now the multi-process provisioning is hardcoded to use random 
names, but options or environment variables could certainly be added so 
that it works from a predefined set of names.  But you'd have to ensure that multiple test 
suites aren't using the same set of names at the same time.


Feel free to suggest the preferred system of establishing these 
pre-defined database names and I or someone else (since im on PTO all 
next week) can work something up.







-Sean



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo] documentation on using the oslo.db opportunistic test feature

2016-02-23 Thread Mike Bayer



On 02/22/2016 08:08 PM, Davanum Srinivas wrote:

Sean,

You need to set the env variable like so. See testenv:mysql-python for example
OS_TEST_DBAPI_ADMIN_CONNECTION=mysql://openstack_citest:openstack_citest@localhost


you should not need to set this if you're using the default URL.  The 
default is right here:


https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/provision.py#L457

if that default is not working when OS_TEST_DBAPI_ADMIN_CONNECTION is 
not set, then that's a bug in oslo.db that should be reported.


It is using pymysql now though, so if you are trying to run against 
python-mysql then you'd need to set this.






Thanks,
Dims

[1] 
http://codesearch.openstack.org/?q=OS_TEST_DBAPI_ADMIN_CONNECTION=nope==


On Mon, Feb 22, 2016 at 8:02 PM, Sean Dague  wrote:

Before migrating into oslo.db the opportunistic testing for database
backends was pretty simple. Create an openstack_citest@openstack_citest
pw:openstack_citest and you could get tests running on mysql. This no
longer seems to be the case.

I went digging through the source code a bit and it's not entirely
evident what the new required setup is. Can someone point me to the docs
to use this? Or explain what the setup for local testing is now? We've
got some bugs which expose on mysql and not sqlite in nova that we'd
like to get some test cases written for.

 -Sean

--
Sean Dague
http://dague.net


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev









Re: [openstack-dev] [oslo] documentation on using the oslo.db opportunistic test feature

2016-02-23 Thread Mike Bayer



On 02/22/2016 08:02 PM, Sean Dague wrote:

Before migrating into oslo.db the opportunistic testing for database
backends was pretty simple. Create an openstack_citest@openstack_citest
pw:openstack_citest and you could get tests running on mysql. This no
longer seems to be the case.


this is still the case.   The provisioning system hardcodes this URL as 
the default and no changes were needed to any classes using the existing 
MySQLOpportunisticTestCase base.


Nova has plenty of test cases that use this and I run these tests 
against MySQL on my own CI daily:


grep -hC3  "MySQLOpportunistic" `find nova/tests -name "*.py"`


class TestMySQLSqlalchemyTypesRepr(TestSqlalchemyTypesRepr,
test_base.MySQLOpportunisticTestCase):
pass


--


class TestMigrationUtilsMySQL(TestMigrationUtilsSQLite,
  test_base.MySQLOpportunisticTestCase):
pass
--


class TestNovaMigrationsMySQL(NovaMigrationsCheckers,
  test_base.MySQLOpportunisticTestCase,
  test.NoDBTestCase):
def test_innodb_tables(self):
with mock.patch.object(sa_migration, 'get_engine',
--


class TestNovaAPIMigrationsMySQL(NovaAPIModelsSync,
 test_base.MySQLOpportunisticTestCase,
 test.NoDBTestCase):
pass

--


class TestNovaAPIMigrationsWalkMySQL(NovaAPIMigrationsWalk,
 test_base.MySQLOpportunisticTestCase,
 test.NoDBTestCase):
pass
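
For anyone looking for a minimal starting point, an opportunistic test 
case is roughly just the following; the class name and the assertion are 
invented for illustration, but the base class is the one from oslo.db:

import sqlalchemy

from oslo_db.sqlalchemy import test_base


class TestSimpleMySQLQuery(test_base.MySQLOpportunisticTestCase):
    # runs against the openstack_citest MySQL database when it is
    # reachable, and is skipped otherwise
    def test_select_one(self):
        result = self.engine.execute(sqlalchemy.text("SELECT 1"))
        self.assertEqual(1, result.scalar())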





I went digging through the source code a bit and it's not entirely
evident what the new required setup is. Can someone point me to the docs
to use this? Or explain what the setup for local testing is now? We've
got some bugs which expose on mysql and not sqlite in nova that we'd
like to get some test cases written for.

-Sean



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





Re: [openstack-dev] [oslo] oslo.db reset session?

2016-02-23 Thread Mike Bayer



On 02/23/2016 09:22 AM, Sean Dague wrote:

With enginefascade working coming into projects, there seems to be some
new bits around oslo.db global sessions.

The effect of this on tests is a little problematic. Because it builds
global state which couples between tests. I've got a review to use mysql
connection explicitly for some Nova functional tests which correctly
fails and exposes a bug when run individually. However, when run in a
full test run, the global session means that it's not run against mysql,
it's run against sqlite, and passes.

https://review.openstack.org/#/c/283364/

We need something that's the inverse of session.configure() -
https://github.com/openstack/nova/blob/d8ddecf6e3ed1e8193e5f6dba910eb29bbe6dac6/nova/tests/fixtures.py#L205
to reset the global session.

Pointers would be welcomed.


from the oslo.db side, we have frameworks for testing that handle all of 
these details (e.g. oslo_db.sqlalchemy.test_base.DbTestCase and 
DbFixture).   I don't believe Nova uses these frameworks (I think it 
should long term), but for now the techniques used by oslo.db's 
framework should likely be used:


self.test.enginefacade = enginefacade._TestTransactionFactory(
self.test.engine, self.test.sessionmaker, apply_global=True,
synchronous_reader=True)

self.addCleanup(self.test.enginefacade.dispose_global)


The above apply_global flag indicates that the global enginefacade 
should use this TestTransactionFactory until disposed.
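
A rough sketch of how that could be packaged up as a fixture on the Nova 
side follows; the fixture name is invented, and it assumes the test has 
already provisioned a MySQL engine and sessionmaker (e.g. via oslo.db 
provisioning):

import fixtures

from oslo_db.sqlalchemy import enginefacade


class GlobalEngineFacadeFixture(fixtures.Fixture):
    """Point the global enginefacade at a provisioned test database."""

    def __init__(self, engine, sessionmaker):
        self.engine = engine
        self.sessionmaker = sessionmaker

    def setUp(self):
        super(GlobalEngineFacadeFixture, self).setUp()
        self.factory = enginefacade._TestTransactionFactory(
            self.engine, self.sessionmaker, apply_global=True,
            synchronous_reader=True)
        # put the real global enginefacade back when the test finishes
        self.addCleanup(self.factory.dispose_global)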








-Sean



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] nova-compute blocking main thread under heavy disk IO

2016-02-22 Thread Mike Bayer



On 02/22/2016 11:30 AM, Chris Friesen wrote:

On 02/22/2016 11:17 AM, Jay Pipes wrote:

On 02/22/2016 10:43 AM, Chris Friesen wrote:

Hi all,

We've recently run into some interesting behaviour that I thought I
should bring up to see if we want to do anything about it.

Basically the problem seems to be that nova-compute is doing disk I/O
from the main thread, and if it blocks then it can block all of
nova-compute (since all eventlets will be blocked).  Examples that we've
found include glance image download, file renaming, instance directory
creation, opening the instance xml file, etc.  We've seen nova-compute
block for upwards of 50 seconds.

Now the specific case where we hit this is not a production
environment.  It's only got one spinning disk shared by all the guests,
the guests were hammering on the disk pretty hard, the IO scheduler for
the instance disk was CFQ which seems to be buggy in our kernel.

But the fact remains that nova-compute is doing disk I/O from the main
thread, and if the guests push that disk hard enough then nova-compute
is going to suffer.

Given the above...would it make sense to use eventlet.tpool or similar
to perform all disk access in a separate OS thread?  There'd likely be a
bit of a performance hit, but at least it would isolate the main thread
from IO blocking.


This is probably a good idea, but will require quite a bit of code
change. I
think in the past we've taken the expedient route of just exec'ing
problematic
code in a greenthread using utils.spawn().


I'm not an expert on eventlet, but from what I've seen this isn't
sufficient to deal with disk access in a robust way.

It's my understanding that utils.spawn() will result in the code running
in the same OS thread, but in a separate eventlet greenthread.  If that
code tries to access the disk via a potentially-blocking call the
eventlet subsystem will not jump to another greenthread.  Because of
this it can potentially block the whole OS thread (and thus all other
greenthreads running in that OS thread).


not sure what utils.spawn() does, but if it is in fact an "exec" (or if 
Jay is suggesting that an exec() be used within it), then the code would 
be in a different process entirely, and communicating with it becomes an 
issue of pipe IO over unix sockets, which IIRC can be done non-blocking.





I think we need to eventlet.tpool for disk IO (or else fork a whole
separate process).  Basically we need to ensure that the main OS thread
never issues a potentially-blocking syscall.


tpool would probably be easier (and more performant, since no socket is 
needed).
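
A minimal sketch of the tpool approach, for illustration (the path shown 
in the usage comment is made up and error handling is omitted):

from eventlet import tpool


def _read_file(path):
    # ordinary blocking file I/O; safe here because it runs in one of
    # tpool's native OS threads rather than on the eventlet hub
    with open(path, 'rb') as f:
        return f.read()


# usage: the calling greenlet yields until the worker thread finishes,
# so other greenlets keep running even if the disk stalls, e.g.:
# data = tpool.execute(_read_file, '/var/lib/nova/instances/<uuid>/disk.info')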





Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Please do *not* use git (and specifically "git log") when generating the docs

2016-02-18 Thread Mike Bayer



On 02/18/2016 04:39 PM, Dolph Mathews wrote:


On Thu, Feb 18, 2016 at 11:17 AM, Thomas Goirand wrote:

Hi,

I've seen Reno doing it, then some more. It's time that I raise the
issue globally in this list before the epidemic spreads to the whole of
OpenStack ! :)

The last occurence I have found is in oslo.config (but please keep in
mind this message is for all projects), which has, its
doc/source/conf.py:

git_cmd = ["git", "log", "--pretty=format:'%ad, commit %h'",
"--date=local","-n1"]
html_last_updated_fmt = subprocess.check_output(git_cmd,
 stdin=subprocess.PIPE)


Probably a dumb question, but why do you need to build the HTML docs
when you're building a package for Debian?


Sphinx builds in many formats, not just HTML, including, among others, 
the man page format, which is probably relevant to a Debian package.
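
As an aside, the breakage zigo describes below can be avoided with a 
guard in conf.py; a sketch (not what oslo.config does today) might look 
like:

import datetime
import subprocess

try:
    git_cmd = ["git", "log", "--pretty=format:'%ad, commit %h'",
               "--date=local", "-n1"]
    html_last_updated_fmt = subprocess.check_output(git_cmd)
except (OSError, subprocess.CalledProcessError):
    # no git binary or no .git directory (e.g. building from an sdist or
    # a distro source package): fall back to the build date
    html_last_updated_fmt = datetime.date.today().isoformat()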







Of course, the .git folder is *NOT* available when building a package in
Debian (and more generally, in downstream distros). This means that this
kind of joke *will* break the build of the packages when they also build
the docs of your project. And consequently, the package maintainers have
to patch out the above lines from conf.py. It'd be best if it wasn't
needed to do so.

As a consequence, it is *not ok* to do "git log" anywhere in the sphinx
docs. Please keep this in mind.

More generally, it is wrong to assume that even the git command is
present. For Mitaka b2, I had to add git as build-dependency on nearly
all server packages, otherwise they would FTBFS (fail to build from
source). This is plain wrong and makes no sense. I hope this can be
reverted somehow.

Thanks in advance for considering the above, and to try to see things
from the package maintainer's perspective,
Cheers,

Thomas Goirand (zigo)





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [oslo][oslo.versionedobjects] Is it possible to make changes to oslo repos?

2016-01-28 Thread Mike Bayer


On 01/28/2016 01:52 PM, Doug Hellmann wrote:
> Excerpts from Hayes, Graham's message of 2016-01-28 17:01:09 +:
>> Recently I tried started to use oslo.versionedobjects for a project.
>>
>> After playing around with it for a while, I noticed I could set "this is
>> not a uuid" as the value of a UUIDField.
>>
>> After making sure I made no mistakes - I looked at the underlying code,
>> and found:[0]
>>
>> class UUID(FieldType):
>>  @staticmethod
>>  def coerce(obj, attr, value):
>>  # FIXME(danms): We should actually verify the UUIDness here
>>  return str(value)
>>
>> So, I went to implement this. [1]
>>
>> it quickly got -2'd as it would break Nova - so I went and implemented 2
>> steps of a 4 step process to get this field working as it should.
>>
>> In the review I was told: [2]
>>
>> "... I think that if a project wants that level of enforcement it
>> needs to land the project, not in the library. Libraries ideally should
>> support all supported branches of OpenStack."
>>
>> Basically - if a project wants the UUIDField to act like a UUIDField,
>> and not a field that str()'s all input, they should copy and paste code
>> around.
> 
> That's not actually the only option, as you point out below.
> 
>>
>> This is being blocked by the -2 until Nova's unit tests are fixed (just
>> Nova's - we have no way of knowing how many projects assumed it was
>> testing UUIDness and will break)
>>
>> The steps I had looked at doing was this:
>>
>> 1. Allow a "validate" flag on the Field __init__() defaulting to False.
>> 1.1. This would allow current projects to continue as is, and projects
>>   starting for the first time to do the right thing.
>> 2. Deprecate the default value - issue a FutureWarning that it is
>> changing to True
>> 3. Deprecate the option entirely.
>> 4. Remove the option, and always validate.
>>
>> 3 & 4 are even optional if some projects want to keep using UUIDFields
>> like StringFields.
>>
>> Currently the -2 still stands as the reviewer does not like the idea of
>> a flag.
>>
>> What are the options for this now? If we are supposed to support all
>> stable branches of all projects, this is the only option if it is going
>> to merge in the next 2 years.
>>
>> Or we can create a ActuallyValidatingUUIDField?
> 
> I like the idea of adding a new class, though maybe not the name
> you've proposed here. Projects that want enforcement could use that
> instead of the UUIDField. Then, as we're able to "fix" UUIDs in
> other projects, the existing UUIDField class can be deprecated in
> favor of the new one.

I'm +1 on a new class, -1 on consuming projects implementing this
themselves (e.g. more cut-and-paste of key functionality).   Normally
I'd be +1 on the "validates=True" flag approach as well but that makes
it impossible to ever change the default to True someday.  Better to
deprecate UUIDField in favor of a new class.
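
To make that concrete, the new class could look something like the 
following; the names are placeholders and this is not the code proposed 
in the review, just the shape of it:

import uuid

from oslo_versionedobjects import fields


class ValidatedUUID(fields.FieldType):
    @staticmethod
    def coerce(obj, attr, value):
        # unlike today's UUIDField, reject anything that isn't a UUID
        # instead of silently str()'ing it
        return str(uuid.UUID(str(value)))


class ValidatedUUIDField(fields.AutoTypedField):
    AUTO_TYPE = ValidatedUUID()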

> 
>>
>> Also, olso seem to be very -2 heavy. This means that alternative views
>> on the review are very unlikely. My question is what is the difference
>> between a -1 and a -2 for oslo?
> 
> I'm not sure the Oslo review team's patterns are the same as in some
> other projects. We do tend to discuss things that have negative reviews.
> 
> [1] https://review.openstack.org/270178
> 
>>
>> In designate we reserve -2 for things that will completely break our
>> code, or is completely out of line for the project. (I would hope
>> implementing a FIXME is not out of line for the project)
> 
> No, but fixing it in a way that is known to break other projects
> is. In this case, the change is known to break at least one project.
> We have to be extremely careful with things that look like breaking
> changes, since we can break *everyone* with a release. So I think
> in this case the -2 is warranted.
> 
> The other case you've pointed out on IRC, of the logging timezone
> thing [1], is my -2. It was originally implemented as a breaking
> change.  That has been fixed, but it still needs some discussion
> on the mailing list, at least in part because I don't see the point
> of the change.
> 
> Doug
> 
>>
>> Thanks,
>>
>> Graham
>>
>> 0 - 
>> https://git.openstack.org/cgit/openstack/oslo.versionedobjects/tree/oslo_versionedobjects/fields.py#n305
>>
>> 1 - https://review.openstack.org/#/c/250493/
>>
>> 2 - 
>> https://review.openstack.org/#/c/250493/9/oslo_versionedobjects/fields.py
>>
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction.

2016-01-11 Thread Mike Bayer


On 01/11/2016 03:58 AM, Koteswar wrote:
> Hi All,
> 
>  
> 
> In my mechanism driver, I am reading/writing into sql db in a fixed
> interval looping call. Sometimes I get the following error when I stop
> and start neutron server
> 
> InvalidRequestError: This session is in 'prepared' state; no further SQL
> can be emitted within this transaction.
> 
>  
> 
> I am using context.session.query() for add, delete, update and get.
> Please help me if any one resolved an issue like this.

the exception is unfortunately re-raised from the ml2.managers code
without retaining the original traceback; use this form to re-raise with
the original tb:

exc_info = sys.exc_info()
raise type(e), e, exc_info[2]

There's likely helpers somewhere in oslo that do this.
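
For reference, six.reraise (visible in the oslo_utils.excutils frames of
the trace below) is one such helper; a condensed sketch:

import sys

import six


def do_commit():
    # stand-in for whatever operation actually fails
    raise Exception("commit failed")


try:
    do_commit()
except Exception:
    exc_info = sys.exc_info()
    # ... cleanup / logging here ...
    # re-raise the original exception with its original traceback
    six.reraise(exc_info[0], exc_info[1], exc_info[2])

# oslo_utils.excutils.save_and_reraise_exception() wraps the same
# pattern in a context manager.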

The cause of this error is that a transaction commit is failing, the
error is being caught, and the same Session is being used again without
rollback() being called first.   The code below illustrates the problem
and how to solve it.

from sqlalchemy import create_engine
from sqlalchemy.orm import Session


e = create_engine("sqlite://")

s = Session(e)


conn = s.connection()


def boom():
raise Exception("sqlite commit failed")

# "break" connection.commit(),
# so that the commit fails
conn.connection.commit = boom
try:
# fail
s.commit()
except Exception, e:
# uncomment this to fix the error
# s.rollback()
pass
finally:
boom = False


# prepared state error
s.connection()

> 
>  
> 
> Full trace is as follows:
> 
> 2016-01-06 15:33:21.799 ERROR neutron.plugins.ml2.managers
> [req-d940a1b6-253a-43d2-b5ff-6c784c8a520f admin
> 83b5358da62a407f88155f447966356f] Mechanism driver
> 'hp' failed in create_port_precommit
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
> Traceback (most recent call last):
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
>   File "/opt/stack/neutron/neutron/plugins/ml2/managers.py",
> line 394, in _call_on_drivers
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
> getattr(driver.obj, method_name)(context)
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
>   File
> "/usr/local/lib/python2.7/dist-packages/baremetal_network_provisioning/ml2/mechanism_hp.py",
> line 67, in create_port_precommit
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
> raise e
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
> InvalidRequestError: This session is in 'prepared' state; no
> further SQL can be emitted within this transaction.
> 
> 2016-01-06 15:33:21.799 TRACE neutron.plugins.ml2.managers
> 
> 2016-01-06 15:33:21.901 ERROR neutron.api.v2.resource
> [req-d940a1b6-253a-43d2-b5ff-6c784c8a520f admin
> 83b5358da62a407f88155f447966356f] create failed
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> Traceback (most recent call last):
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File "/opt/stack/neutron/neutron/api/v2/resource.py", line
> 83, in resource
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> result = method(request=request, **args)
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File
> "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 146, in
> wrapper
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> ectxt.value = e.inner_exc
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File
> "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line
> 195, in __exit__
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> six.reraise(self.type_, self.value, self.tb)
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File
> "/usr/local/lib/python2.7/dist-packages/oslo_db/api.py", line 136, in
> wrapper
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> return f(*args, **kwargs)
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File "/opt/stack/neutron/neutron/api/v2/base.py", line 516,
> in create
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> obj = do_create(body)
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File "/opt/stack/neutron/neutron/api/v2/base.py", line 498,
> in do_create
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
> request.context, reservation.reservation_id)
> 
> 2016-01-06 15:33:21.901 TRACE neutron.api.v2.resource
>   File
> "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line
> 195, in __exit__
> 
> 2016-01-06 

Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-11 Thread Mike Bayer


On 01/11/2016 05:39 AM, Radomir Dopieralski wrote:
> On 01/08/2016 09:51 PM, Mike Bayer wrote:
>>
>>
>> On 01/08/2016 04:44 AM, Radomir Dopieralski wrote:
>>> On 01/07/2016 05:55 PM, Mike Bayer wrote:
>>>
>>>> but also even if you're under something like
>>>> mod_wsgi, you can spawn a child process or worker thread regardless.
>>>> You always have a Python interpreter running and all the things it can
>>>> do.
>>>
>>> Actually you can't, reliably. Or, more precisely, you really shouldn't.
>>> Most web servers out there expect to do their own process/thread
>>> management and get really embarrassed if you do something like this,
>>> resulting in weird stuff happening.
>>
>> I have to disagree with this as an across-the-board rule, partially
>> because my own work in building an enhanced database connection
>> management system is probably going to require that a background thread
>> be running in order to reap stale database connections.   Web servers
>> certainly do their own process/thread management, but a thoughtfully
>> organized background thread in conjunction with a supporting HTTP
>> service allows this to be feasible.   In the case of mod_wsgi,
>> particularly when using mod_wsgi in daemon mode, spawning of threads,
>> processes and in some scenarios even wholly separate applications are
>> supported use cases.
> 
> [...]
> 
>> It is certainly reasonable that not all web application containers would
>> be effective with apps that include custom background threads or
>> processes (even though IMO any system that's running a Python
>> interpreter shouldn't have any issues with a limited number of
>> well-behaved daemon-mode threads), but at least in the case of mod_wsgi,
>> this is supported; that gives Openstack's HTTP-related applications with
>> carefully/thoughtfully organized background threads at least one
>> industry-standard alternative besides being forever welded to its
>> current homegrown WSGI server implementation.
> 
> This is still writing your application for a specific configuration of a
> specific version of a specific implementation of the protocol on a
> specific web server. While this may work as a stopgap solution, I think
> it's a really bad long-term strategy. We should be programming for a
> protocol specification (WSGI in this case), not for a particular
> implementation (unless we need to throw in workarounds for
> implementation bugs). 

That is fine, but then you are saying that all of those aforementioned
Nova services which do in fact use WSGI with its own homegrown eventlet
server should nevertheless be rewritten to not use any background
threads, which I also presented as the ideal choice.   Right now, the
fact that these Nova services use background threads is being used as a
justification for why these services can never move to use a proper web
server, even though they are still WSGI apps running inside of a WSGI
container, so they are already doing the thing that claims to prevent
this move from being possible.

Also, mod_wsgi's compatibility with background threads is not linked to
a "specific version"; it's intrinsic to the organization of the product.
  I would wager that most other WSGI containers can probably handle this
use case as well, but this would need to be confirmed.





> 
> At least it seems so to my naive programmer mind. Sorry for ranting,
> I'm sure that you are aware of the trade-off here.
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-08 Thread Mike Bayer


On 01/08/2016 04:44 AM, Radomir Dopieralski wrote:
> On 01/07/2016 05:55 PM, Mike Bayer wrote:
> 
>> but also even if you're under something like
>> mod_wsgi, you can spawn a child process or worker thread regardless.
>> You always have a Python interpreter running and all the things it can
>> do.
> 
> Actually you can't, reliably. Or, more precisely, you really shouldn't.
> Most web servers out there expect to do their own process/thread
> management and get really embarrassed if you do something like this,
> resulting in weird stuff happening.

I have to disagree with this as an across-the-board rule, partially
because my own work in building an enhanced database connection
management system is probably going to require that a background thread
be running in order to reap stale database connections.   Web servers
certainly do their own process/thread management, but a thoughtfully
organized background thread in conjunction with a supporting HTTP
service allows this to be feasible.   In the case of mod_wsgi,
particularly when using mod_wsgi in daemon mode, spawning of threads,
processes and in some scenarios even wholly separate applications are
supported use cases.

In mod_wsgi daemon mode (which is its recommended mode of use [1]), the
Python interpreter is not in-process with Apache in any case, and if you
set your own thread to be a "daemon", it won't block the process from
exiting.   I have successfully used this technique (again, carefully and
thoughtfully) to achieve asynchronous workers within Apache mod_wsgi
daemon-mode processes, without negative consequences.
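
By way of illustration, the kind of "well-behaved" background worker I
mean is nothing more exotic than this (details invented; the reaping
logic is just a placeholder):

import threading
import time


def _reap_stale_connections():
    while True:
        time.sleep(30)
        # placeholder for the actual work, e.g. closing database
        # connections that have been idle for too long
        pass


reaper = threading.Thread(target=_reap_stale_connections)
# daemon=True means the thread never prevents the mod_wsgi daemon
# process from shutting down or being recycled
reaper.daemon = True
reaper.start()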

Graham Dumpleton's own mod_wsgi documentation illustrates how to run a
background thread on development servers in mod_wsgi daemon mode in
order to achieve code reloading [2], and he also has produced a tool [3]
that uses a background thread in order to provide a debugging shell to a
running WSGI application which can work in either embedded or daemon
mode.

In [4], he illustrates using the WSGIImportScript mod_wsgi directive
under mod_wsgi daemon mode to actually spawn a whole docker container
when an Apache mod_wsgi process starts up; this isn't something I'd want
to do myself, but this is the author of mod_wsgi illustrating even
something as heavy as spinning up a whole docker instance under mod_wsgi
which then runs it's own WSGI process as a supported technique.

It is certainly reasonable that not all web application containers would
be effective with apps that include custom background threads or
processes (even though IMO any system that's running a Python
interpreter shouldn't have any issues with a limited number of
well-behaved daemon-mode threads), but at least in the case of mod_wsgi,
this is supported; that gives Openstack's HTTP-related applications with
carefully/thoughtfully organized background threads at least one
industry-standard alternative besides being forever welded to its
current homegrown WSGI server implementation.

[1] http://lanyrd.com/2013/pycon/scdyzk/

[2] https://code.google.com/p/modwsgi/wiki/ReloadingSourceCode

[3] https://github.com/GrahamDumpleton/ispyd

[4]
http://blog.dscpl.com.au/2015/07/using-apache-to-start-and-manage-docker.html


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-07 Thread Mike Bayer


On 01/07/2016 11:02 AM, Sean Dague wrote:
> On 01/07/2016 09:56 AM, Brant Knudson wrote:
>>
>>
>> On Thu, Jan 7, 2016 at 6:39 AM, Clayton O'Neill wrote:
>>
>> On Thu, Jan 7, 2016 at 2:49 AM, Roman Podoliaka wrote:
>> >
>> > Linux gurus please correct me here, but my understanding is that Linux
>> > kernel queues up to $backlog number of connections *per socket*. In
>> > our case child processes inherited the FD of the socket, so they will
>> > accept() connections from the same queue in the kernel, i.e. the
>> > backlog value is for *all* child processes, not *per* process.
>>
>>
>> Yes, it will be shared across all children.
>>
>> >
>> > In each child process eventlet WSGI server calls accept() in a loop to
>> > get a client socket from the kernel and then puts into a greenlet from
>> > a pool for processing:
>>
>> It’s worse than that.  What I’ve seen (via strace) is that eventlet
>> actually
>> converts socket into a non-blocking socket, then converts that
>> accept() into a
>> epoll()/accept() pair in every child.  Then when a connection comes
>> in, every
>> child process wakes up out of poll and races to try to accept on the the
>> non-blocking socket, and all but one of them fails.
>>
>> This means that every time there is a request, every child process
>> is woken
>> up, scheduled on CPU and then put back to sleep.  This is one of the
>> reasons we’re (slowly) moving to uWSGI.
>>
>>
>> I just want to note that I've got a change proposed to devstack that
>> adds a config option to run keystone in uwsgi (rather than under
>> eventlet or in apache httpd mod_wsgi), see
>> https://review.openstack.org/#/c/257571/ . It's specific to keystone
>> since I didn't think other projects were moving away from eventlet, too.
> 
> I feel like this is a confused point that keeps being brought up.
> 
> The preferred long term direction of all API services is to be deployed
> on a real web server platform. It's a natural fit for those services as
> they are accepting HTTP requests and doing things with them.
> 
> Most OpenStack projects have worker services beyond just an HTTP server.
> (Keystone is one of the very few exceptions here). Nova has nearly a
> dozen of these worker services. These don't naturally fit as wsgi apps,
> they are more traditional daemons, which accept requests over the
> network, but also have periodic jobs internally and self initiate
> actions. They are not just call / response. There is no long term
> direction for these to move off of eventlet.

This is totally speaking as an outsider without taking into account all
the history of these decisions, but the notion of "Python + we're a
daemon" == "we must use eventlet" seems a little bit rigid.  Also, the
notion of "we have background tasks" == "we cant run in a web server",
also not clear.  If a service intends to serve HTTP requests, that
portion of that service should be deployed in a web server; if the
system has other "background tasks", ideally those are in a separate
daemon altogether, but also even if you're under something like
mod_wsgi, you can spawn a child process or worker thread regardless.
You always have a Python interpreter running and all the things it can do.

> 
>   -Sean
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [all] Excessively high greenlet default + excessively low connection pool defaults leads to connection pool latency, timeout errors, idle database connections / workers

2016-01-06 Thread Mike Bayer


On 01/06/2016 09:11 AM, Roman Podoliaka wrote:
> Hi Mike,
> 
> Thank you for this brilliant analysis! We've been seeing such timeout
> errors in downstream periodically and this is the first time someone
> has analysed the root cause thoroughly.
> 
> On Fri, Dec 18, 2015 at 10:33 PM, Mike Bayer <mba...@redhat.com> wrote:
> 
>> But if we only have a super low number of greenlets and only a few dozen
>> workers, what happens if we have more than 240 requests come in at once,
>> aren't those connections going to get rejected?  No way!  eventlet's
>> networking system is better than that, those connection requests just
>> get queued up in any case, waiting for a greenlet to be available.  Play
>> with the script and its settings to see.
> 
> Right, it must be controlled by the backlog argument value here:
> 
> https://github.com/openstack/oslo.service/blob/master/oslo_service/wsgi.py#L80

oh wow, totally missed that!  But how does backlog here interact with
multiple processes?   E.g. if all workers are saturated, will it place a
waiting connection onto a random greenlet which then has to wait?  It
would be better if the "backlog" were pushed up to the parent process;
not sure if that's possible.
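
To make the question concrete, the pattern under discussion boils down
to roughly this (a condensed sketch, not the actual oslo.service code):

import os

import eventlet
import eventlet.wsgi


def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello\n']


# the parent creates the listening socket; the backlog queue lives in
# the kernel and is attached to this single socket
sock = eventlet.listen(('0.0.0.0', 8080), backlog=128)

for _ in range(3):
    if os.fork() == 0:
        # each forked worker inherits the same FD and accept()s from
        # the same shared backlog
        eventlet.wsgi.server(sock, app)
        os._exit(0)

# the parent serves as well (and would normally also supervise children)
eventlet.wsgi.server(sock, app)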



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] The command "neutron-db-manage" of 8.0.0~b1 fails

2016-01-04 Thread Mike Bayer


On 01/04/2016 06:59 AM, Ihar Hrachyshka wrote:
> Martinx - ジェームズ  wrote:
> 
>> Guys,
>>
>>  I'm trying to experiment Mitaka on Ubuntu Xenial, which already have
>> beta version on its repositories, however, "neutron-db-manage" fails.
>>
>>  Here is the output of it:
>>
>>  http://paste.openstack.org/show/482920/
>>
>>  Any clue?
>>
>>  I'm using the Kilo instructions as a start point, of course, I'm
>> using new neutron.conf and new ml2_conf.ini as well.
>>
>> Thanks in advance!
>> Thiago
> 
> I believe it was fixed in:
> https://review.openstack.org/#/c/253150/2/neutron/db/migration/alembic_migrations/versions/mitaka/contract/8a6d8bdae39_migrate_neutron_resources_table.py

doh and I'm the one who fixed it!



> 
> 
> Ihar
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] The command "neutron-db-manage" of 8.0.0~b1 fails

2016-01-04 Thread Mike Bayer


On 01/03/2016 10:57 PM, Martinx - ジェームズ wrote:
> On 4 January 2016 at 01:28, Mike Bayer <mba...@redhat.com> wrote:
>>
>>
>> On 01/03/2016 05:15 PM, Martinx - ジェームズ wrote:
>>> Guys,
>>>
>>>  I'm trying to experiment Mitaka on Ubuntu Xenial, which already have
>>> beta version on its repositories, however, "neutron-db-manage" fails.
>>>
>>>  Here is the output of it:
>>>
>>>  http://paste.openstack.org/show/482920/
>>>
>>>  Any clue?
>>
>> this is a new error added to MySQL as of version 5.6.7:
>>
>> https://dev.mysql.com/doc/refman/5.6/en/error-messages-server.html#error_er_fk_column_cannot_change
>>
>> some discussion is at http://stackoverflow.com/a/17019351/34549.
>>
>> the issue here is either that the table contains NULL values or that the
>> BIGINT datatype is not compatible with the column to which the foreign
>> key refers.
>>
>> This error looks familiar but I don't recall if I saw it specific to
>> Neutron already having this issue before.
>>
> 
> Wow! Thank you! It is working now!
> 
> What I did?
> 
> 
> To turn off foreign key constraint globally, by running directly on
> MySQL root shell:
> 
> SET GLOBAL FOREIGN_KEY_CHECKS=0;
> 
> and remember to set it back when you are done... Then, neutron-db-manage 
> worked!
> 
> su -s /bin/sh -c "neutron-db-manage --config-file
> /etc/neutron/neutron.conf --config-file
> /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head" neutron
> 
> After that, I re-enabled foreign_key_checks:
> 
> SET GLOBAL FOREIGN_KEY_CHECKS=1;


the problem with doing this is that whatever invalid conditions exist
with this foreign key aren't checked.   Probably OK in this case, but as
a general approach, turning off FKs, while a popular solution on Google,
should be avoided if possible.

> 
> Apparently, people will face this problem while trying Mitaka on
> Xenial... Isn't "neutron-db-manage" aware of this new feature /
> situation of MySQL 5.6?
> 
> Should I fill a bug report on Launchpad about this?
> 
> Continuing my tests now...   :-D
> 
> Thanks again!
> Thiago
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] The command "neutron-db-manage" of 8.0.0~b1 fails

2016-01-03 Thread Mike Bayer


On 01/03/2016 05:15 PM, Martinx - ジェームズ wrote:
> Guys,
> 
>  I'm trying to experiment Mitaka on Ubuntu Xenial, which already have
> beta version on its repositories, however, "neutron-db-manage" fails.
> 
>  Here is the output of it:
> 
>  http://paste.openstack.org/show/482920/
> 
>  Any clue?

this is a new error added to MySQL as of version 5.6.7:

https://dev.mysql.com/doc/refman/5.6/en/error-messages-server.html#error_er_fk_column_cannot_change

some discussion is at http://stackoverflow.com/a/17019351/34549.

the issue here is either that the table contains NULL values or that the
BIGINT datatype is not compatible with the column to which the foreign
key refers.

This error looks familiar but I don't recall if I saw it specific to
Neutron already having this issue before.




> 
>  I'm using the Kilo instructions as a start point, of course, I'm
> using new neutron.conf and new ml2_conf.ini as well.
> 
> Thanks in advance!
> Thiago
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Nova scheduler startup when database is not available

2015-12-23 Thread Mike Bayer


On 12/23/2015 01:32 PM, Jay Pipes wrote:
> On 12/23/2015 12:27 PM, Lars Kellogg-Stedman wrote:
>> I've been looking into the startup constraints involved when launching
>> Nova services with systemd using Type=notify (which causes systemd to
>> wait for an explicit notification from the service before considering
>> it to be "started".  Some services (e.g., nova-conductor) will happily
>> "start" even if the backing database is currently unavailable (and
>> will enter a retry loop waiting for the database).
>>
>> Other services -- specifically, nova-scheduler -- will block waiting
>> for the database *before* providing systemd with the necessary
>> notification.
>>
>> nova-scheduler blocks because it wants to initialize a list of
>> available aggregates (in scheduler.host_manager.HostManager.__init__),
>> which it gets by calling objects.AggregateList.get_all.
>>
>> Does it make sense to block service startup at this stage?  The
>> database disappearing during runtime isn't a hard error -- we will
>> retry and reconnect when it comes back -- so should the same situation
>> at startup be a hard error?  As an operator, I am more interested in
>> "did my configuration files parse correctly?" at startup, and would
>> generally prefer the service to start (and permit any dependent
>> services to start) even when the database isn't up (because that's
>> probably a situation of which I am already aware).
> 
> If your configuration file parsed correctly but has the wrong database
> connection URI, what good is the service in an active state? It won't be
> able to do anything at all.

this is true, but to be fair, Nova doesn't work like this at all, at
least not in nova/db/sqlalchemy/api.py.  It is very intentionally
designed to *not* connect to the database until an API call is first
accessed, to the extent that it does an end-run around oslo.db's
create_engine() feature which itself does a "test" connection when it is
called (FTR, SQLAlchemy's create_engine() that is called by oslo.db is
in fact a lazy-initializing function).I find it quite awkward
overall that oslo.db reverses SQLAlchemy's "lazyness", but then nova and
others re-reverse *back* to "lazyness", but at the expense of allowing
oslo.db's create_engine() to receive its configuration up front.

In the reworked enginefacade API I went through a lot of effort to
replicate this behavior.   It would be nice if all Openstack apps could
just pick one paradigm and stick with it so that we can just make
oslo.db do *one* pattern and that's all (probably too late though).
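
For readers not familiar with it, the "lazy" pattern being described is
essentially the following (heavily condensed; it assumes oslo.db's
[database] options are already registered, and the helper names here are
schematic):

from oslo_config import cfg
from oslo_db.sqlalchemy import enginefacade

CONF = cfg.CONF

_context_manager = None


def _get_context_manager():
    # nothing connects at import or service-start time; the facade is
    # configured, and a connection first attempted, only when an API
    # call actually needs the database
    global _context_manager
    if _context_manager is None:
        _context_manager = enginefacade.transaction_context()
        _context_manager.configure(connection=CONF.database.connection)
    return _context_manager


def get_engine():
    return _get_context_manager().get_legacy_facade().get_engine()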


> 
> This is why I think it's better to have hard checks like for connections
> on startup and not have services active if they won't be able to do
> anything useful.
> 
>> It would be relatively easy to have the scheduler lazy-load the list
>> of aggregates on first references, rather than at __init__.
> 
> Sure, but if the root cause of the issue is a problem due to
> misconfigured connection string, then that lazy-load will just bomb out
> and the scheduler will be useless anyway. I'd rather have a
> fail-early/fast occur here than a fail-late.
> 
> Best,
> -jay
> 
>> I'm not
>> familiar enough with the nova code to know if there would be any
>> undesirable implications of this behavior.  We're already punting
>> initializing the list of instances to an asynchronous task in order to
>> avoid blocking service startup.
>>
>> Does it make sense to permit nova-scheduler to complete service
>> startup in the absence of the database (and then retry the connection
>> in the background)?
>>
>>
>>
>> __
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

