Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Jay Pipes

On 05/23/2017 07:07 PM, Boris Pavlovic wrote:
And how can someone who is trying to deploy OpenStack understand/find 
the right config for the DB? Or is that an Ops task that the community 
doesn't care about?


Neither. It's the ops' responsibility to understand database 
configuration fundamentals (since, you know, they are operating a 
database...).


It's the dev community's responsibility to document necessary 
configuration settings, provide reasonable defaults and advise on 
how/when ops should adjust those defaults.


It's a give and take.

Best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Boris Pavlovic
Zane,


> This is your periodic reminder that we have ~50 applications sharing the
> same database and not only do none of them know how the deployer will
> configure the database, most will not even have an idea which set of
> assumptions the other ~49 are making about how the deployer will configure
> the database.


And how can someone who is trying to deploy OpenStack understand/find
the right config for the DB? Or is that an Ops task that the community
doesn't care about?

I would rather give Ops one config, say that everything should work
with it, and find a way to align everybody in the community and make it
the default for all projects.

Best regards,
Boris Pavlovic

On Tue, May 23, 2017 at 2:18 PM, Zane Bitter  wrote:

> On 21/05/17 15:38, Monty Taylor wrote:
>
>> One might argue that HA strategies are an operator concern, but in
>> reality the set of workable HA strategies is tightly constrained by how
>> the application works, and pairing an application expecting one HA
>> strategy with a deployment implementing a different one can have
>> negative results ranging from unexpected downtime to data corruption.
>>
>
> This is your periodic reminder that we have ~50 applications sharing the
> same database and not only do none of them know how the deployer will
> configure the database, most will not even have an idea which set of
> assumptions the other ~49 are making about how the deployer will configure
> the database.
>
> (Ditto for RabbitMQ.)
>
> - ZB
>
>
>


Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Zane Bitter

On 21/05/17 15:38, Monty Taylor wrote:

One might argue that HA strategies are an operator concern, but in
reality the set of workable HA strategies is tightly constrained by how
the application works, and pairing an application expecting one HA
strategy with a deployment implementing a different one can have
negative results ranging from unexpected downtime to data corruption.


This is your periodic reminder that we have ~50 applications sharing the 
same database and not only do none of them know how the deployer will 
configure the database, most will not even have an idea which set of 
assumptions the other ~49 are making about how the deployer will 
configure the database.


(Ditto for RabbitMQ.)

- ZB



Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Mike Bayer



On 05/23/2017 03:16 PM, Edward Leafe wrote:

On May 23, 2017, at 1:43 PM, Jay Pipes  wrote:


[1] Witness the join constructs in Golang in Kubernetes as they work around 
etcd not being a relational data store:



Maybe it’s just me, but I found that Go code more understandable than some of 
the SQL we are using in the placement engine. :)

I assume that the SQL in a relational engine is faster than the same thing in 
code, but is that difference significant? For extremely large data sets I think 
that the database processing may be rate limiting, but is that the case here? 
Sometimes it seems that we are overly obsessed with optimizing data handling 
when the amount of data is relatively small. A few million records should be 
fast enough using just about anything.


When you write your app fresh and put some data into it, a few hundred 
rows, not at all.  Pull it all into memory and sort/filter all you want; 
SQL is too hard.  Push it to production!  Works great.   Send the 
customer your bill.


6 months later.   Customer has 10K rows.   The tools their contractor 
wrote seem a little sticky.Not sure when that happened?


A year later.  Customer is at 300K rows, nowhere near "a few million" 
records.  The application regularly crashes when asked to search and 
filter results, because the Python interpreter uses a fair amount of 
memory for a result set, multiplied by the overhead of a Python object() 
/ dict() per row == 100's / 1000's of megs of memory to have 300K 
objects in memory all at once.  Multiply by dozens of threads / 
processes handling concurrent requests, and the Python interpreter 
rarely returns memory.  Then add the latency of fetching 300K rows over 
the wire and converting them to objects.  Concurrent requests pile up 
because they're slower; == more processes, == more memory.
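That failure mode is easy to reproduce in miniature. Here's a hedged sketch (sqlite3 stands in for MySQL, and the `orders` table is invented for illustration, not taken from any real app):

```python
import sqlite3

# Toy table standing in for the customer data in the story above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (status, total) VALUES (?, ?)",
    [("open" if i % 10 else "closed", i * 1.5) for i in range(100_000)],
)

# The "pull it all into memory" approach: every row becomes a Python
# tuple before most of them are thrown away by the filter.
all_rows = conn.execute("SELECT id, status, total FROM orders ORDER BY id").fetchall()
open_in_py = [r for r in all_rows if r[1] == "open" and r[2] > 100]

# The relational approach: the database does the filtering, and only
# matching rows cross the wire.
open_in_sql = conn.execute(
    "SELECT id, status, total FROM orders"
    " WHERE status = 'open' AND total > 100 ORDER BY id"
).fetchall()

assert open_in_py == open_in_sql
```

Both produce the same result; the difference is that the first one materializes all 100K rows in the interpreter every time, which is exactly what stops scaling somewhere between 10K and 300K rows.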


New contractor is called in to rewrite the whole thing in MongoDB.   Now 
it's fast again!   Proceed to chapter 2, "So you decided to use 
MongoDB"   :)






Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Jay Pipes

On 05/23/2017 03:16 PM, Edward Leafe wrote:

On May 23, 2017, at 1:43 PM, Jay Pipes  wrote:


[1] Witness the join constructs in Golang in Kubernetes as they work around 
etcd not being a relational data store:


Maybe it’s just me, but I found that Go code more understandable than some of 
the SQL we are using in the placement engine. :)


Err, apples, oranges.

The Golang code is doing a single JOIN operation. The placement API is 
doing dozens of join operations, aggregate operations, and more.



I assume that the SQL in a relational engine is faster than the same thing in 
code, but is that difference significant? For extremely large data sets I think 
that the database processing may be rate limiting, but is that the case here? 
Sometimes it seems that we are overly obsessed with optimizing data handling 
when the amount of data is relatively small. A few million records should be 
fast enough using just about anything.


You are more than welcome to implement the placement API in etcd or 
Cassandra, Ed. :)


Best,
-jay



Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Edward Leafe
On May 23, 2017, at 1:43 PM, Jay Pipes  wrote:

> [1] Witness the join constructs in Golang in Kubernetes as they work around 
> etcd not being a relational data store:


Maybe it’s just me, but I found that Go code more understandable than some of 
the SQL we are using in the placement engine. :)

I assume that the SQL in a relational engine is faster than the same thing in 
code, but is that difference significant? For extremely large data sets I think 
that the database processing may be rate limiting, but is that the case here? 
Sometimes it seems that we are overly obsessed with optimizing data handling 
when the amount of data is relatively small. A few million records should be 
fast enough using just about anything.

-- Ed Leafe








Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Chris Dent

On Tue, 23 May 2017, Jay Pipes wrote:

Err, in my experience, having a *completely* dumb persistence layer -- i.e. 
one that tries to assuage the differences between, say, relational and 
non-relational stores -- is a recipe for disaster. The developer just ends up 
writing join constructs in that business layer instead of using a relational 
data store the way it is intended to be used. Same for aggregate operations. 
[1]


Now, if what you're referring to is "don't use vendor-specific extensions in 
your persistence layer", then yes, I agree with you.


If you've committed to doing an RDBMS then, yeah, stick with
relational, but dumb relational. Since that's where we are [3] in
OpenStack, we should go with that.

[3] Of course sometimes I'm sad that we made that commitment and
instead we had an abstract storage interface, an implementation
of which was stupid text files on disk, another which was generic
sqlalchemy, and another which was raw SQL extracted wholesale from
the mind of jaypipes, optimized for Drizzle 8.x. But then I'm often
sad about completely unrealistic things.

--
Chris Dent  ┬──┬◡ノ(° -°ノ)   https://anticdent.org/
freenode: cdent tw: @anticdent


Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Jay Pipes

On 05/23/2017 07:23 AM, Chris Dent wrote:

That "higher dev cost" is one of my objections to the 'active'
approach but it is another implication that worries me more. If we
limit deployer architecture choices at the persistence layer then it
seems very likely that we will be tempted to build more and more
power and control into the persistence layer rather than in the
so-called "business" layer. In my experience this is a recipe for
ossification. The persistence layer needs to be dumb and
replaceable.


Err, in my experience, having a *completely* dumb persistence layer -- 
i.e. one that tries to assuage the differences between, say, relational 
and non-relational stores -- is a recipe for disaster. The developer 
just ends up writing join constructs in that business layer instead of 
using a relational data store the way it is intended to be used. Same 
for aggregate operations. [1]


Now, if what you're referring to is "don't use vendor-specific 
extensions in your persistence layer", then yes, I agree with you.


Best,
-jay

[1] Witness the join constructs in Golang in Kubernetes as they work 
around etcd not being a relational data store:


https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/deployment_controller.go#L528-L556

Instead of a single SQL statement:

SELECT p.* FROM pods AS p
JOIN deployments AS d
ON p.deployment_id = d.id
WHERE d.name = $name;

the deployments controller code has to read every Pod message from etcd 
and loop through each Pod message, returning a list of Pods that match 
the deployment searched for.


Similarly, the Kubernetes API does not support any aggregate (SUM, GROUP 
BY, etc.) functionality. Instead, clients are required to perform these 
kinds of calculations/operations in memory. This is because etcd, being 
an (awesome) key/value store, is not designed for aggregate operations 
(just as Cassandra or CockroachDB do not allow most aggregate operations).
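A rough Python analogue of that client-side join and aggregation (the deployments/pods data here is made up purely for illustration; the real controller code is at the Go links below):

```python
# The controller holds every record in memory and filters by hand,
# because the key/value store can't do it server-side.
deployments = [{"id": 1, "name": "web"}, {"id": 2, "name": "db"}]
pods = [
    {"name": "web-1", "deployment_id": 1},
    {"name": "web-2", "deployment_id": 1},
    {"name": "db-1", "deployment_id": 2},
]

def pods_for_deployment(name):
    # Hand-rolled equivalent of:
    #   SELECT p.* FROM pods p JOIN deployments d
    #   ON p.deployment_id = d.id WHERE d.name = ?
    dep_ids = {d["id"] for d in deployments if d["name"] == name}
    return [p for p in pods if p["deployment_id"] in dep_ids]

# Aggregates (the GROUP BY / COUNT the store can't do) also end up
# computed client-side:
pods_per_deployment = {}
for p in pods:
    pods_per_deployment[p["deployment_id"]] = \
        pods_per_deployment.get(p["deployment_id"], 0) + 1

print(pods_for_deployment("web"))  # the two web pods
print(pods_per_deployment)         # {1: 2, 2: 1}
```

Trivial at three rows; the cost shows up when every component has to carry this logic and keep the data it joins over in memory.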


My point here is not to denigrate Kubernetes. Far from it. They (to 
date) have a relatively shallow relational schema and doing join and 
index maintenance [2] operations in client-side code has so far been a 
cost that the project has been OK carrying. The point I'm trying to make 
is that the choice of data store semantics (relational or not, columnar 
or not, eventually-consistent or not, etc) *does make a difference* to 
the architecture of a project, its deployment and the amount of code 
that the project needs to keep to properly handle its data schema. 
There's no way -- in my experience -- to make a "persistence layer" that 
papers over these differences and ends up being useful.


[2] In Kubernetes, all services are required to keep all relevant data 
in memory:


https://github.com/kubernetes/community/blob/master/contributors/design-proposals/principles.md

This means that code that maintains a bunch of in-memory indexes of 
various data objects ends up being placed into every component. Here's 
an example of this in the kubelet (the equivalent-ish of the 
nova-compute daemon) pod manager, keeping an index of pods and mirrored 
pods in memory:


https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L104-L114

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pod/pod_manager.go#L159-L181



Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Mike Bayer



On 05/23/2017 01:10 PM, Octave J. Orgeron wrote:

Comments below..

On 5/21/2017 1:38 PM, Monty Taylor wrote:


For example: An HA strategy using slave promotion and a VIP that 
points at the current write master paired with an application 
incorrectly configured to do such a thing can lead to writes to the 
wrong host after a failover event and an application that seems to be 
running fine until the data turns up weird after a while.


This is definitely a more complicated area that becomes more and more 
specific to the clustering technology being used. Galera vs. MySQL 
Cluster is a good example. Galera has an active/passive architecture 
where the above issues become a concern for sure. 


This is not my understanding; Galera is multi-master, and if you lose a 
node, you don't lose any committed transactions; the writesets are 
validated as acceptable by, and pushed out to, all nodes before your 
commit succeeds.   There's an option to make it wait until all those 
writesets are fully written to disk as well, but even with that option 
flipped off, if you COMMIT to one node and then that node explodes, you 
lose nothing: your writesets have been verified and will be accepted by 
all the other nodes.


active/active is the second bullet point on the main homepage: 
http://galeracluster.com/products/





In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at 
runtime, we change them overriding conflicting environmental config, 
and if we can't, we hard-stop indicating an unsuitable environment. 
Rather than providing helper tools, we perform the steps needed 
ourselves, in the order they need to be performed, ensuring that they 
are done in the manner in which they need to be done.


This might be a trickier situation, especially if the database(s) are in 
a separate or dedicated environment that the OpenStack service processes 
don't have access to. Of course for SQL commands, this isn't a problem. 
But changing the configuration files and restarting the database may be 
a harder thing to expect.


Nevertheless the HA setup within tripleo does do this, currently using 
Pacemaker and resource agents.    This is within the scope of at least 
parts of OpenStack.






In either approach the OpenStack service has to be able to talk to 
both old and new versions of the schema. And in either approach we 
need to make sure to limit the schema change operations to the set 
that can be accomplished in an online fashion. We also have to be 
careful to not start writing values to new columns until all of the 
nodes have been updated, because the replication stream can't 
replicate the new column value to nodes that don't have the new column.


This is another area where something like MySQL Cluster (NDB) would 
operate differently because it's an active/active architecture. So 
limiting the number of online changes while a table is locked across the 
cluster would be very important. There are also the timeouts for the 
applications to consider, something that could be abstracted again with 
oslo.db.


So the DDL we do on Galera, to confirm but also clarify Monty's point, 
is under the realm of "total order isolation", which means it's going to 
hold up the whole cluster while DDL is applied to all nodes.   Monty 
says this disqualifies it as an "online upgrade", which is because if 
you emitted DDL that had to write default values into a million rows then 
your whole cluster would temporarily have to wait for that to happen; we 
handle that by making sure we don't do migrations with that kind of data 
requirement and while yes, the DB has to wait for a schema change to 
apply, they are at least very short (in theory).   For practical 
purposes, it is *mostly* an "online" style of migration because all the 
services that talk to the database can keep on talking to the database 
without being stopped, upgraded to new software version, and restarted, 
which IMO is what's really hard about "online" upgrades.   It does mean 
that services will just have a little more latency while operations 
proceed.  Maybe we need a new term called "quasi-online" or something 
like that.


Facebook has released a Python version of their "online" schema 
migration tool for MySQL which does the full blown "create a new, blank 
table" approach, e.g. which contains the newer version of the schema, so 
that nothing at all stops or slows down at all.  And then to manage 
between the two tables while everything is running it also makes a 
"change capture" table to keep track of what's going on, and then to 
wire it all together it uses...triggers! 
https://github.com/facebookincubator/OnlineSchemaChange/wiki/How-OSC-works. 
  Crazy Facebook kids.  As for how they know that "make two more tables 
and wire it all together with new triggers" is in fact more performant 
than just "add a column to the table", I'm not sure how/when they make 
that determination.   I don't see 

Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Octave J. Orgeron

Comments below..

On 5/21/2017 1:38 PM, Monty Taylor wrote:

Hi all!

As the discussion around PostgreSQL has progressed, it has become clear 
to me that there is a decently deep philosophical question on which we 
do not currently share either definition or agreement. I believe that 
the lack of clarity on this point is one of the things that makes the 
PostgreSQL conversation difficult.


I believe the question is between these two things:

* Should OpenStack assume the existence of an external database 
service that it treats as a black box on the other side of a 
connection string?


* Should OpenStack take an active and/or opinionated role in managing 
the database service?


A potentially obvious question about that (asked by Mike Bayer in a 
different thread) is: "what do you mean by managing?"


What I mean by managing is doing all of the things you can do related 
to database operational controls short of installing the software, 
writing the basic db config files to disk and stopping and starting 
the services. It means being much more prescriptive about what types 
of config we support, validating config settings that cannot be 
overridden at runtime and refusing to operate if they are unworkable.


I think it's helpful and important for us to have automation tooling 
like tripleo, puppet, etc. that can stand up a MySQL database. But we 
also have to realize that as shops mature, they will deploy more 
complicated database topologies, clustered configurations, and 
replication scenarios. So I think we shouldn't go overboard with being 
prescriptive. We also have to realize that in the enterprise space, 
databases are usually deployed and managed by a separate database team, 
which means less control over that layer. So we shouldn't force people 
into this model. We should provide best practice documentation, examples 
(tripleo, puppet, ansible, etc.), and leave it up to the operator.




Why would we want to be 'more active'? When managing and tuning 
databases, there are some things that are driven by the environment 
and some things that are driven by the application.


Things that are driven by the environment include things like the 
amount of RAM actually available, whether or not the machines running 
the database are dedicated or shared, firewall settings, selinux 
settings and what versions of software are available.


This is a good example of an area that we should focus on documenting 
best practices and leave it to the operator to implement. Guidelines 
around cpu, memory, security settings, tunables, etc. are what's needed 
here. Today, there isn't really any guidance or best practices on even 
sizing the database(s) for a given deployment size.




Things that are driven by the application are things like character 
set and collation, schema design, data types, schema upgrade and HA 
strategies.


These are things that we can have a bit more control or direction on.



One might argue that HA strategies are an operator concern, but in 
reality the set of workable HA strategies is tightly constrained by 
how the application works, and pairing an application expecting 
one HA strategy with a deployment implementing a different one can 
have negative results ranging from unexpected downtime to data 
corruption.


For example: An HA strategy using slave promotion and a VIP that 
points at the current write master paired with an application 
incorrectly configured to do such a thing can lead to writes to the 
wrong host after a failover event and an application that seems to be 
running fine until the data turns up weird after a while.


This is definitely a more complicated area that becomes more and more 
specific to the clustering technology being used. Galera vs. MySQL 
Cluster is a good example. Galera has an active/passive architecture 
where the above issues become a concern for sure. MySQL Cluster (NDB), 
on the other hand, is an active/active architecture, so losing a node 
only affects uncommitted transactions, which could easily be addressed 
with a retry. These topologies will become more complicated as people 
start looking at cross-regional replication and DR.
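The retry-on-transient-failure pattern mentioned above can be sketched like this (oslo.db ships similar retry machinery; the exception class and decorator names here are made up for illustration):

```python
import time

class TransientDBError(Exception):
    """Stand-in for a driver's 'node went away, please retry' exception."""

def with_db_retry(attempts=3, delay=0.01):
    # Minimal sketch: re-run the transaction on a transient failure,
    # giving up after a fixed number of attempts.
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except TransientDBError:
                    if attempt == attempts:
                        raise
                    time.sleep(delay)  # back off before retrying
        return wrapper
    return decorator

calls = {"n": 0}

@with_db_retry()
def flaky_commit():
    # Simulate a commit that fails twice because a node was lost,
    # then succeeds once the cluster has a healthy node again.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDBError("node lost mid-transaction")
    return "committed"

print(flaky_commit())  # succeeds on the third attempt
```

The point is that with an active/active cluster the application-side handling of a node loss can be this simple, whereas a failover in a promotion-based topology needs coordination outside the application.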




For the areas in which the characteristics of the database are tied 
closely to the application behavior, there is a constrained set of 
valid choices at the database level. Sometimes that constrained set 
only has one member.


The approach to those is what I'm talking about when I ask the 
question about "external" or "active".


In the "external" approach, we document the expectations and then 
write the code assuming that the database is set up appropriately. We 
may provide some helper tools, such as 'nova-manage db sync' and 
documentation on the sequence of steps the operator should take.


In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at 
runtime, we change them overriding conflicting environmental config, 
and if we can't, 

Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Chris Dent

On Tue, 23 May 2017, Sean Dague wrote:


Do you have an example of an Open Source project that (after it was
widely deployed) replaced their core storage engine for their existing
users?


That's not the point here. The point is that new deployments may
choose to use a different one and old ones can choose to change if
they like (but don't have to) if storage is abstracted.

The notion of a "core storage engine" is not something that I see as
currently existing in OpenStack. It is clear it is something that at
least you and likely several other people would like to see.

But it is most definitely not something we have now and as I
responded to Monty, getting there from where we are now would be a
huge undertaking with as yet unproven value [1].


I do get that when building more targeted things, this might be a value,
but I don't see that as a useful design constraint for OpenStack.


Completely the opposite from my point of view. When something is as
frameworky as OpenStack is (perhaps accidentally and probably
unfortunately) then _of course_ replaceable DBs are the norm,
expected, useful and potentially required to satisfy more use cases.

Adding specialization (tier 1?) is probably something we want and
want to encourage but it is not something we should build into the
"core" of the "product".

But there's that philosophical disagreement again. I'm not sure we
can resolve that. What I'm hoping is that by starting the ball
rolling other people will join in and people like you and me can
step out of the way.

[1] Of the issues described elsewhere in the thread the only one
which seems to be a bit sticking point is the trigger thing, and
there's significant disagreement on that being "okay".

--
Chris Dent  ┬──┬◡ノ(° -°ノ)   https://anticdent.org/
freenode: cdent tw: @anticdent


Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Sean Dague
On 05/23/2017 07:23 AM, Chris Dent wrote:

>> Some operations have one and only one "right" way to be done. For
>> those operations if we take an 'active' approach, we can implement
>> them once and not make all of our deployers and distributors each
>> implement and run them. However, there is a cost to that. Automatic
>> and prescriptive behavior has a higher dev cost that is proportional
>> to the number of supported architectures. This then implies a need to
>> limit deployer architecture choices.
> 
> That "higher dev cost" is one of my objections to the 'active'
> approach but it is another implication that worries me more. If we
> limit deployer architecture choices at the persistence layer then it
> seems very likely that we will be tempted to build more and more
> power and control into the persistence layer rather than in the
> so-called "business" layer. In my experience this is a recipe for
> ossification. The persistence layer needs to be dumb and
> replaceable.

Why?

Do you have an example of an Open Source project that (after it was
widely deployed) replaced their core storage engine for their existing
users?

I do get that when building more targeted things, this might be a value,
but I don't see that as a useful design constraint for OpenStack.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-23 Thread Chris Dent

On Sun, 21 May 2017, Monty Taylor wrote:

As the discussion around PostgreSQL has progressed, it has become clear to me 
that there is a decently deep philosophical question on which we do not 
currently share either definition or agreement. I believe that the lack of 
clarity on this point is one of the things that makes the PostgreSQL 
conversation difficult.


Good analysis. I think this does hit to at least some of the core
differences, maybe even most. And as with so many other things we do
in OpenStack, because we have landed somewhere in the middle between
the two positions we find ourselves in a pickle (see, for example,
the different needs for and attitudes to orchestration underlying
this thread [1]).

You're right to say we need to pick one and move in that direction
but our standard struggles with reaching agreement across the entire
community, especially on an opinionated position, will need to be
overcome. Writing about it to make it visible is a good start.

In the "external" approach, we document the expectations and then write the 
code assuming that the database is set up appropriately. We may provide some 
helper tools, such as 'nova-manage db sync' and documentation on the sequence 
of steps the operator should take.


In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at runtime, 
we change them overriding conflicting environmental config, and if we can't, 
we hard-stop indicating an unsuitable environment. Rather than providing 
helper tools, we perform the steps needed ourselves, in the order they need 
to be performed, ensuring that they are done in the manner in which they need 
to be done.


I think there's a middle ground here, "externalize but validate", which
is:

* document expectations
* validate them
* do _not_ change at runtime, but tell people what's wrong
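A minimal sketch of that validate-and-report step at service startup (sqlite3 PRAGMAs stand in for the MySQL server variables a real service would check; every name here is illustrative, not any actual OpenStack code):

```python
import sqlite3

def validate_db_settings(conn, expectations):
    """Check documented expectations; report problems, don't mutate.

    'expectations' maps a setting name to its required value. We only
    read settings and describe mismatches; changing them is left to
    the operator, per the "externalize but validate" approach.
    """
    problems = []
    for name, wanted in expectations.items():
        actual = conn.execute(f"PRAGMA {name}").fetchone()[0]
        if str(actual).lower() != str(wanted).lower():
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    return problems

conn = sqlite3.connect(":memory:")
issues = validate_db_settings(conn, {"encoding": "UTF-8"})
assert issues == []  # this environment matches the documented expectations
```

A real service would call something like this once at startup and refuse to proceed (with the full list of problems) rather than silently limping along on an unsuitable database.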

Some operations have one and only one "right" way to be done. For those 
operations if we take an 'active' approach, we can implement them once and 
not make all of our deployers and distributors each implement and run them. 
However, there is a cost to that. Automatic and prescriptive behavior has a 
higher dev cost that is proportional to the number of supported 
architectures. This then implies a need to limit deployer architecture 
choices.


That "higher dev cost" is one of my objections to the 'active'
approach but it is another implication that worries me more. If we
limit deployer architecture choices at the persistence layer then it
seems very likely that we will be tempted to build more and more
power and control into the persistence layer rather than in the
so-called "business" layer. In my experience this is a recipe for
ossification. The persistence layer needs to be dumb and
replaceable.

On the other hand, taking an 'external' approach allows us to federate the 
work of supporting the different architectures to the deployers. This means 
more work on the deployer's part, but also potentially a greater amount of 
freedom on their part to deploy supporting services the way they want. It 
means that some of the things that have been requested of us - such as easier 
operation and an increase in the number of things that can be upgraded with 
no-downtime - might become prohibitively costly for us to implement.


That's not necessarily the case. Consider that in an external
approach, where the persistence layer is opaque to the application, it
means that third parties (downstream consumers, the market, the
invisible hand, etc) have the option to do all kinds of wacky stuff.
Probably avec containers™.

In that model, the core functionality is simple and adequate but not
deluxe. Deluxe is an after-market add on.

BUT - without a decision as to what our long-term philosophical intent in 
this space is that is clear and understandable to everyone, we cannot have 
successful discussions about the impact of implementation choices, since we 
will not have a shared understanding of the problem space or the solutions 
we're talking about.


Yes.

For my part - I hear complaints that OpenStack is 'difficult' to operate and 
requests for us to make it easier. This is why I have been advocating some 
actions that are clearly rooted in an 'active' worldview.


If OpenStack were more of a monolith instead of a system with 3 to
many different databases, along with some optional number of other
ways to do other kinds of (short term) persistence, I would find the
'active' model a good option. If we were to start over I'd say let's
do that.

But as it stands, implementing actually useful 'active' management of
the database feels like a very large amount of work that will take
so long that by the time we complete it, it will be not just out of
date but will also limit us.

External but validate feels much more viable. What we really want is
that people can get reasonably good results without trying that hard
and great (but also various) results 

Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-22 Thread Sean Dague
On 05/21/2017 10:09 PM, Mike Bayer wrote:

>>
>> A similar issue lurks with the fact that MySQL unicode storage is
>> 3-byte by default and 4-byte is opt-in. We could take the 'external'
>> approach and document it and assume the operator has configured their
>> my.cnf with the appropriate default, or taken an 'active' approach
>> where we override it in all the models and make migrations to get us
>> from 3 to 4 byte.
> 
> let's force MySQL to use utf8mb4!   Although I am curious what is the
> actual use case we want to hit here (which gets into, zzzeek is ignorant
> as to which unicode glyphs actually live in 4-byte utf8 characters).

There are sets of existing CJK ideographs in the 4-byte range, and the
reality is that all the world's languages are still not encoded in
unicode, so more Asian languages probably land in here in the future.
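To make the 3-byte vs 4-byte distinction concrete: anything outside the Basic Multilingual Plane (code points above U+FFFF) takes four bytes in UTF-8, which MySQL's legacy 3-byte "utf8" charset cannot store; utf8mb4 can. A minimal stdlib sketch:

```python
# Any code point above U+FFFF needs four bytes in UTF-8; MySQL's 3-byte
# "utf8" charset rejects such values, while "utf8mb4" stores them fine.
bmp_len = len("\u96ea".encode("utf-8"))        # 雪, a BMP CJK ideograph
ext_b_len = len("\U0002070e".encode("utf-8"))  # 𠜎, CJK Extension B
emoji_len = len("\U0001f600".encode("utf-8"))  # 😀, also supplementary plane
print(bmp_len, ext_b_len, emoji_len)  # 3 4 4
```

So CJK Extension B ideographs (used in some personal and place names) and emoji are the practical cases that hit the 4-byte range today.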

We've had specific bug reports in Nova on this, but it's actually a lot
to dig out of because that db migration seems expensive.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [tc] Active or passive role with our database layer

2017-05-21 Thread Mike Bayer



On 05/21/2017 03:38 PM, Monty Taylor wrote:

documentation on the sequence of steps the operator should take.

In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at 
runtime, we change them, overriding conflicting environmental config, and 
if we can't, we hard-stop indicating an unsuitable environment. Rather 
than providing helper tools, we perform the steps needed ourselves, in 
the order they need to be performed, ensuring that they are done in the 
manner in which they need to be done.


we do this in places like TripleO.  The MySQL configs and such are 
checked into the source tree, including details like 
innodb_file_per_table, the timeouts used by haproxy, etc.  I know TripleO 
is not a service itself the way Nova is, but it's also not exactly 
something we hand off to the operators to figure out from scratch either.


We do some of it in oslo.db as well.  We set things like MySQL SQL_MODE. 
 We try to make sure the unicode-ish flags are set up and that we're 
using utf-8 encoding.
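For reference, this kind of enforcement boils down to a handful of session-level settings; a hedged sketch (the names are standard MySQL, but the exact values oslo.db applies may differ by release):

```sql
-- Strict errors instead of silent truncation/coercion on bad data:
SET SESSION sql_mode = 'TRADITIONAL';
-- Client/connection/results character set:
SET NAMES utf8mb4;
```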




Some examples:

* Character Sets / Collations

We currently enforce at testing time that all database migrations are 
explicit about InnoDB. We also validate in oslo.db that table character 
sets have the string 'utf8' in them. (only on MySQL) We do not have any 
check for case-sensitive or case-insensitive collations (these affect 
sorting and comparison operations) Because we don't, different server 
config settings or different database backends for different clouds can 
actually behave differently through the REST API.
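A minimal sketch of the behavior difference, using stdlib sqlite3 as a stand-in: BINARY here plays the role of a case-sensitive MySQL collation (e.g. utf8_bin) and NOCASE a case-insensitive one (e.g. utf8_general_ci).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Same logical schema, two different collation choices a deployer might make:
cur.execute("CREATE TABLE t_cs (name TEXT COLLATE BINARY)")
cur.execute("CREATE TABLE t_ci (name TEXT COLLATE NOCASE)")
cur.execute("INSERT INTO t_cs VALUES ('Server-1')")
cur.execute("INSERT INTO t_ci VALUES ('Server-1')")

# The identical API-level lookup returns different results per deployment:
cs_matches = cur.execute(
    "SELECT COUNT(*) FROM t_cs WHERE name = 'server-1'").fetchone()[0]
ci_matches = cur.execute(
    "SELECT COUNT(*) FROM t_ci WHERE name = 'server-1'").fetchone()[0]
print(cs_matches, ci_matches)  # 0 1
```

That divergence is exactly what would leak through the REST API today.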


To deal with that:

First we'd have to decide whether case sensitive or case insensitive was 
what we wanted. If we decided we wanted case sensitive, we could add an 
enforcement of that in oslo.db, and write migrations to get from case 
insensitive indexes to case sensitive indexes on tables where we 
detected that a case insensitive collation had been used. If we decided 
we wanted to stick with case insensitive we could similarly add code to 
enforce it on MySQL. To enforce it actively on PostgreSQL, we'd need to 
either switch our code that's using comparisons to use the sqlalchemy 
case-insensitive versions explicitly, or maybe write some sort of 
overloaded driver for PG that turns all comparisons into 
case-insensitive, which would wrap both sides of comparisons in lower() 
calls (which has some indexing concerns, but let's ignore that for the 
moment) We could also take the 'external' approach and just document it, 
then define API tests and try to tie the insensitive behavior in the API 
to Interop Compliance. I'm not 100% sure how a db operator would 
remediate this - but PG has some fancy computed index features - so 
maybe it would be possible.
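On the "fancy computed index" remediation: a functional index is probably what a PG operator would reach for, so the lower()-wrapped comparisons stay indexable. A hypothetical sketch (table and column names invented for illustration):

```sql
-- Index the lowercased value so case-insensitive lookups avoid a seq scan:
CREATE INDEX instances_lower_hostname_idx ON instances (lower(hostname));
-- Queries of this shape can then use the index:
--   SELECT * FROM instances WHERE lower(hostname) = lower('Host-1');
```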


let's make the case sensitivity explicitly enforced!



A similar issue lurks with the fact that MySQL unicode storage is 3-byte 
by default and 4-byte is opt-in. We could take the 'external' approach 
and document it and assume the operator has configured their my.cnf with 
the appropriate default, or taken an 'active' approach where we override 
it in all the models and make migrations to get us from 3 to 4 byte.


let's force MySQL to use utf8mb4!   Although I am curious what is the 
actual use case we want to hit here (which gets into, zzzeek is ignorant 
as to which unicode glyphs actually live in 4-byte utf8 characters).




* Schema Upgrades

The way you roll out online schema changes is highly dependent on your 
database architecture.


Just limiting to the MySQL world:

If you do Galera, you can roll them out in Total Order or Rolling 
fashion. Total Order locks basically everything while it's happening, so 
isn't a candidate for "online". In rolling you apply the schema change 
to one node at a time. If you do that, the application has to be able to 
deal with both forms of the table, and you have to deal with ensuring 
that data can replicate appropriately while the schema change is happening.
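For reference, the knob that selects between these two modes in Galera is, to the best of my knowledge, a single my.cnf setting:

```ini
[mysqld]
# TOI (Total Order Isolation, the default): DDL replicates in commit order
# and effectively blocks the whole cluster while it runs.
# RSU (Rolling Schema Upgrade): DDL applies node-by-node while the node is
# desynced; the application must tolerate both schema versions meanwhile.
wsrep_OSU_method = TOI
```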


Galera replicates DDL operations.   If I add a column on a node, it pops 
up on the other nodes too in a similar way as transactions are 
replicated, e.g. nearly synchronous.   I would *assume* it has to do 
this in the context of its usual transaction ordering, even though 
MySQL doesn't do transactional DDL, so that if the cluster sees 
transaction A, schema change B, transaction C that depends on B, that 
ordering is serialized appropriately.However, even if it doesn't do 
that, the rolling upgrades we do don't start the services talking to the 
new schema structures until the DDL changes are complete, and Galera is 
near-synchronous replication.


Also speaking to the "active" question, we certainly have all kinds of 
logic in OpenStack (the optimistic update strategy in particular) that 
take "Galera" into account.  And of course we have Galera config inside 
of tripleo.  So that's kind of the "active" approach, I think.
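For readers unfamiliar with it, the optimistic update strategy is a compare-and-swap: instead of SELECT ... FOR UPDATE locks (which interact badly with Galera's certification-based replication), the UPDATE only applies if the row is still in the state we last read, and the caller retries on failure. A hedged sketch with an invented table, using stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE instance (id INTEGER PRIMARY KEY, state TEXT, version INTEGER)")
cur.execute("INSERT INTO instance VALUES (1, 'BUILD', 1)")

def compare_and_swap(cur, instance_id, expected_version, new_state):
    # Only touch the row if nobody bumped the version since our read.
    cur.execute(
        "UPDATE instance SET state = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_state, instance_id, expected_version),
    )
    return cur.rowcount == 1  # zero rows touched means we lost the race

won = compare_and_swap(cur, 1, 1, "ACTIVE")   # version matched: succeeds
lost = compare_and_swap(cur, 1, 1, "ERROR")   # stale version: no-op
print(won, lost)  # True False
```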





If you do DRBD