Re: [openstack-dev] bad default values in conf files

2014-02-16 Thread Robert Collins
On 15 February 2014 12:15, Dirk Müller d...@dmllr.de wrote:

 I agree, and changing defaults has a cost as well: Every deployment
 solution out there has to detect the value change, update their config
 templates and potentially also migrate the setting from the old to the
 new default for existing deployments. Being in that situation, it has
 happened that we were surprised by default changes that had
 undesirable side effects, just because we chose to overwrite a
 different default elsewhere.

 I'm totally on board with having production ready defaults, but that
 also means that they seldom change and change only for a very
 good, possibly documented reason.

Indeed! And in classic ironic fashion -
https://bugs.launchpad.net/keystone/+bug/1280692 was caused by
https://review.openstack.org/#/c/73621 - a patch of yours, changing
defaults and breaking anyone running with them!

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] bad default values in conf files

2014-02-15 Thread Michael Chapman
Have the folks creating our puppet modules and install recommendations
taken a close look at all the options and determined
that the defaults are appropriate for deploying RHEL OSP in the
configurations we are recommending?

If by "our puppet modules" you mean the ones in stackforge, in the vast
majority of cases they follow the defaults provided. I check that this is
the case during review, and the only exceptions should be things like the db
and mq locations, which have to change for almost every install.

 - Michael



On Sat, Feb 15, 2014 at 10:15 AM, Dirk Müller d...@dmllr.de wrote:

  were not appropriate for real deployment, and our puppet modules were
  not providing better values
  https://bugzilla.redhat.com/show_bug.cgi?id=1064061.

 I'd agree that raising the caching timeout is not a good production
 default choice. I'd also argue that the underlying issue is fixed
 with https://review.openstack.org/#/c/69884/

 In our testing this patch sped up revocation retrieval by a factor of 120.

  The default probably is too low, but raising it too high will cause
  concern with those who want revoked tokens to take effect immediately
  and are willing to scale the backend to get that result.

 I agree, and changing defaults has a cost as well: Every deployment
 solution out there has to detect the value change, update their config
 templates and potentially also migrate the setting from the old to the
 new default for existing deployments. Being in that situation, it has
 happened that we were surprised by default changes that had
 undesirable side effects, just because we chose to overwrite a
 different default elsewhere.

 I'm totally on board with having production ready defaults, but that
 also means that they seldom change and change only for a very
 good, possibly documented reason.


 Greetings,
 Dirk



Re: [openstack-dev] bad default values in conf files

2014-02-14 Thread Lingxian Kong
2014-02-13 23:19 GMT+08:00 Jay Pipes jaypi...@gmail.com:

 On Thu, 2014-02-13 at 09:38 -0500, David Kranz wrote:
  I was recently bitten by a case where some defaults in keystone.conf
  were not appropriate for real deployment, and our puppet modules were
  not providing better values
  https://bugzilla.redhat.com/show_bug.cgi?id=1064061. Since there are
  hundreds (thousands?) of options across all the services, I am wondering
  whether there are other similar issues lurking and if we have done what
  we can to flush them out.
 
  Defaults in conf files seem to be one of the following:
 
  - Generic, appropriate for most situations
  - Appropriate for devstack
  - Appropriate for small, distro-based deployment
  - Appropriate for large deployment
 
  Upstream, I don't think there is a shared view of how defaults should be
  chosen.
 
  Keeping bad defaults can have a huge impact on performance and when a
  system falls over but the problems may not be visible until some time
  after a system gets into real use. Have the folks creating our puppet
  modules and install recommendations taken a close look at all the
  options and determined
  that the defaults are appropriate for deploying RHEL OSP in the
  configurations we are recommending?

 This is a very common problem in the configuration management space,
 frankly. One good example is the upstream mysql Chef cookbook keeping
 ludicrously low InnoDB buffer pool, log and data file sizes. The
 defaults from MySQL -- which were chosen, frankly, in the 1990s -- are
 useful for nothing more than a test environment, but unfortunately they
 propagate to far too many deployments with folks unaware of the serious
 side-effects on performance and scalability until it's too late [1].

 I think it's an excellent idea to do a review of the values in all of
 the configuration files and do the following:

 * Identify settings that simply aren't appropriate for anything and make
 the change to a better default.

 * Identify settings that need to scale with the size of the underlying
 VM or host capabilities, and provide patches to the configuration file
 comments that clearly indicate a recommended scaling factor. Remember
 that folks writing Puppet modules, Ansible scripts, Salt SLS files, and
 Chef cookbooks look first to the configuration files to get an idea of
 how to set the values.


Good idea! +1 for providing a recommended scaling factor for the related
settings.




 Best,
 -jay

 [1] The reason I say it's too late is because for some configuration
 value -- notably innodb_log_file_size and innodb_data_file_size -- it is
 not possible to change the configuration values after data has been
 written to disk. You need to literally dump the contents of the DBs and
 reload the database after removing the files and restarting the DBs
 after changing the configuration options in my.cnf. See this bug for
 details on this pain in the behind:

 https://tickets.opscode.com/browse/COOK-2100






-- 
*---*
*Lingxian Kong*
Huawei Technologies Co.,LTD.
IT Product Line CloudOS PDU
China, Xi'an
Mobile: +86-18602962792
Email: konglingx...@huawei.com; anlin.k...@gmail.com


Re: [openstack-dev] bad default values in conf files

2014-02-14 Thread Greg C
On Thu, Feb 13, 2014 at 6:38 AM, David Kranz dkr...@redhat.com wrote:


 Defaults in conf files seem to be one of the following:

 - Generic, appropriate for most situations
 - Appropriate for devstack
 - Appropriate for small, distro-based deployment
 - Appropriate for large deployment


In my experience creating OpenStack production systems, it appears the
answer is mostly "Appropriate for devstack".  I haven't used devstack
myself, only created production systems from OpenStack releases.  For
practically every OpenStack component the message queue config is missing,
and every api-paste.ini needs a [filter:authtoken] section for keystone.

It appears to me that those things are somehow covered when using
devstack.  Besides having to be added, the pitfall this creates is that
documentation for releases will not point out that they need to be added
and configured, because somehow devstack doesn't require it.

I use a VLAN model of networking as well, which as far as I can tell
devstack doesn't test/support, so I have to chase down a bunch of other
config items that are missing and scarcely documented.  The whole thing is
quite a chore.  I don't know why those common keystone and message queue
configs can't be in there from the start.
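
For reference, the kind of section Greg describes adding by hand looked
roughly like this in that era (hostnames and credentials below are
illustrative placeholders, and exact option names varied between releases):

```ini
# Hypothetical api-paste.ini fragment wiring the keystone auth_token
# middleware into a service's WSGI pipeline; all values are placeholders.
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
auth_host = keystone.example.com
auth_port = 35357
auth_protocol = http
admin_tenant_name = service
admin_user = nova
admin_password = SERVICE_PASSWORD
```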


Re: [openstack-dev] bad default values in conf files

2014-02-14 Thread Jay Pipes
On Fri, 2014-02-14 at 11:30 -0800, Greg C wrote:
 
 On Thu, Feb 13, 2014 at 6:38 AM, David Kranz dkr...@redhat.com
 wrote:
 
 Defaults in conf files seem to be one of the following:
 
 - Generic, appropriate for most situations
 - Appropriate for devstack
 - Appropriate for small, distro-based deployment
 - Appropriate for large deployment
 
 
  In my experience creating OpenStack production systems, it
 appears the answer is mostly "Appropriate for devstack".  I haven't
 used devstack myself, only created production systems from OpenStack
 releases.  For practically every OpenStack component the message queue
 config is missing

Actually, every OpenStack component has MQ configs in its conf files,
and they are well documented:

Nova MQ configs:

https://github.com/openstack/nova/blob/master/etc/nova/nova.conf.sample#L9-L187

Cinder:

https://github.com/openstack/cinder/blob/master/etc/cinder/cinder.conf.sample#L610-L815

Keystone:

https://github.com/openstack/keystone/blob/master/etc/keystone.conf.sample#L141-L163

Glance:

https://github.com/openstack/glance/blob/master/etc/glance-api.conf#L246-L283

Neutron:

https://github.com/openstack/neutron/blob/master/etc/neutron.conf#L105-L175

Ceilometer:

https://github.com/openstack/ceilometer/blob/master/etc/ceilometer/ceilometer.conf.sample#L305-L475
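
As an illustrative fragment (host and credentials are placeholders, not a
recommendation), the MQ settings those sample files document looked roughly
like this in a nova.conf of that era:

```ini
[DEFAULT]
# RPC backend selection and RabbitMQ location; the shipped defaults
# assume localhost, which is why nearly every real install overrides them.
rpc_backend = nova.openstack.common.rpc.impl_kombu
rabbit_host = mq.example.com
rabbit_port = 5672
rabbit_userid = openstack
rabbit_password = RABBIT_PASSWORD
```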

 , and every api-paste.ini needs a [filter:authtoken] section for
 keystone.

Actually, every OpenStack project has this in its regular conf files:

Nova:

https://github.com/openstack/nova/blob/master/etc/nova/nova.conf.sample#L2624

Cinder:

https://github.com/openstack/cinder/blob/master/etc/cinder/cinder.conf.sample#L1879

Glance:

https://github.com/openstack/glance/blob/master/etc/glance-api.conf#L551

Neutron:

https://github.com/openstack/neutron/blob/master/etc/neutron.conf#L332

Ceilometer:

https://github.com/openstack/ceilometer/blob/master/etc/ceilometer/ceilometer.conf.sample#L713

 It appears to me that those things are somehow covered when using
 devstack.  Besides having to be added, the pitfall this creates is
 that documentation for releases will not point out that they need to
 be added and configured, because somehow devstack doesn't require it.

I'm not sure why you think this. What configuration management system
are you using to deploy OpenStack?

 I use a VLAN model of networking as well, which as far as I can tell
 devstack doesn't test/support, 

Incorrect. export NETWORK_MANAGER=VlanManager in your localrc.

 so I have to chase down a bunch of other config items that are missing
 and scarcely documented.  The whole thing is quite a chore.  I don't
 know why those common keystone and message queue configs can't be in
 there from the start.

They are.

Best,
-jay





Re: [openstack-dev] bad default values in conf files

2014-02-14 Thread Dirk Müller
 were not appropriate for real deployment, and our puppet modules were
 not providing better values
 https://bugzilla.redhat.com/show_bug.cgi?id=1064061.

I'd agree that raising the caching timeout is not a good production
default choice. I'd also argue that the underlying issue is fixed
with https://review.openstack.org/#/c/69884/

In our testing this patch sped up revocation retrieval by a factor of 120.

 The default probably is too low, but raising it too high will cause
 concern with those who want revoked tokens to take effect immediately
 and are willing to scale the backend to get that result.

I agree, and changing defaults has a cost as well: Every deployment
solution out there has to detect the value change, update their config
templates and potentially also migrate the setting from the old to the
new default for existing deployments. Being in that situation, it has
happened that we were surprised by default changes that had
undesirable side effects, just because we chose to overwrite a
different default elsewhere.

I'm totally on board with having production ready defaults, but that
also means that they seldom change and change only for a very
good, possibly documented reason.


Greetings,
Dirk



Re: [openstack-dev] bad default values in conf files

2014-02-13 Thread Clint Byrum
Excerpts from David Kranz's message of 2014-02-13 06:38:52 -0800:
 I was recently bitten by a case where some defaults in keystone.conf 
 were not appropriate for real deployment, and our puppet modules were 
 not providing better values 
 https://bugzilla.redhat.com/show_bug.cgi?id=1064061.

Just taking a look at that issue, Keystone's PKI and revocation are
causing all kinds of issues with performance that are being tackled with
a bit of a redesign. I doubt we can find a cache timeout setting that
will work generically for everyone, but if we make detecting revocation
scale, we won't have to.

The default probably is too low, but raising it too high will cause
concern with those who want revoked tokens to take effect immediately
and are willing to scale the backend to get that result.
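
(Illustrative only: the knob in question is the auth_token middleware's
revocation-list cache timeout. The option name, section, and default
changed between releases of that era, so treat this fragment as a sketch,
not a reference.)

```ini
# Hypothetical fragment of a service's conf file.
[keystone_authtoken]
# How long (in seconds) to cache the token revocation list before
# re-fetching it from Keystone. Higher = less Keystone load, but
# revoked tokens stay usable for longer.
revocation_cache_time = 300
```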

 Since there are 
 hundreds (thousands?) of options across all the services, I am wondering 
 whether there are other similar issues lurking and if we have done what 
 we can to flush them out.
 
 Defaults in conf files seem to be one of the following:
 
 - Generic, appropriate for most situations
 - Appropriate for devstack
 - Appropriate for small, distro-based deployment
 - Appropriate for large deployment
 
 Upstream, I don't think there is a shared view of how defaults should be 
 chosen.
 

I don't know that we have been clear enough about this, but nobody has
ever challenged the assertion we've been making for a while in TripleO,
which is that OpenStack _must_ have production defaults. We don't make
OpenStack for devstack.

In TripleO, we consider it a bug when we can't run with a default value
that isn't directly related to whatever makes that cloud unique. So
the virt driver: meh, that's a choice, but leaving file injection on is
really not appropriate for 99% of users in production. Also you'll see
quite a few commits from me in the keystone SQL token driver trying to
speed it up because the old default token backend was KVS (in-memory),
which was fast, but REALLY not useful in production. We found these
things by running defaults and noticing in a long running cloud where
the performance problems are, and we intend to keep doing that.

So perhaps we should encode this assertion in
https://wiki.openstack.org/wiki/ReviewChecklist

 Keeping bad defaults can have a huge impact on performance and when a 
 system falls over but the problems may not be visible until some time 
 after a system gets into real use. Have the folks creating our puppet 
 modules and install recommendations taken a close look at all the 
 options and determined
 that the defaults are appropriate for deploying RHEL OSP in the 
 configurations we are recommending?


TripleO is the official deployment program. We are taking the approach
described above. We're standing up several smallish (50 nodes) clouds
with the intention of testing the defaults on real hardware in the gate
of OpenStack eventually.



Re: [openstack-dev] bad default values in conf files

2014-02-13 Thread Boris Pavlovic
David,

Good that you raised this topic. It is actually sad that you have to do a
deep investigation of OpenStack config params before you are able to use
OpenStack. I think this work should mostly be done upstream.


So I have a couple of ideas for how we can simplify investigating how CONF
values impact performance at scale (without having tons of servers).

In the near future you will be able to use Rally [1] for it:
1) (WIP) Deploy a multinode installation in one click inside LXC containers [2]
(you need only 200 MB of RAM for 1 compute node)
2) Use fake virtualization
3) Run Rally benchmarks and get performance numbers for different conf
parameters
4) Analyze the results and set the best one as the default.
5) (WIP) I am also working on a pure OpenStack cross-service profiler [3]
that will allow us to find slow parts of the code and analyze the CONF args
related to them (not the whole list of conf params).



[1] https://wiki.openstack.org/wiki/Rally
[2]
https://review.openstack.org/#/c/57240/27/doc/samples/deployments/multihost.rst
[3] https://github.com/pboris/osprofiler

Best regards,
Boris Pavlovic



On Thu, Feb 13, 2014 at 6:38 PM, David Kranz dkr...@redhat.com wrote:

 I was recently bitten by a case where some defaults in keystone.conf were
 not appropriate for real deployment, and our puppet modules were not
  providing better values https://bugzilla.redhat.com/show_bug.cgi?id=1064061.
  Since there are hundreds (thousands?) of options across all the services,
  I am wondering whether there are other similar
 issues lurking and if we have done what we can to flush them out.

 Defaults in conf files seem to be one of the following:

 - Generic, appropriate for most situations
 - Appropriate for devstack
 - Appropriate for small, distro-based deployment
 - Appropriate for large deployment

 Upstream, I don't think there is a shared view of how defaults should be
 chosen.

 Keeping bad defaults can have a huge impact on performance and when a
 system falls over but the problems may not be visible until some time after
 a system gets into real use. Have the folks creating our puppet modules and
 install recommendations taken a close look at all the options and determined
 that the defaults are appropriate for deploying RHEL OSP in the
 configurations we are recommending?

  -David



Re: [openstack-dev] bad default values in conf files

2014-02-13 Thread Jay Pipes
On Thu, 2014-02-13 at 09:38 -0500, David Kranz wrote:
 I was recently bitten by a case where some defaults in keystone.conf 
 were not appropriate for real deployment, and our puppet modules were 
 not providing better values 
 https://bugzilla.redhat.com/show_bug.cgi?id=1064061. Since there are 
 hundreds (thousands?) of options across all the services, I am wondering 
 whether there are other similar issues lurking and if we have done what 
 we can to flush them out.
 
 Defaults in conf files seem to be one of the following:
 
 - Generic, appropriate for most situations
 - Appropriate for devstack
 - Appropriate for small, distro-based deployment
 - Appropriate for large deployment
 
 Upstream, I don't think there is a shared view of how defaults should be 
 chosen.
 
 Keeping bad defaults can have a huge impact on performance and when a 
 system falls over but the problems may not be visible until some time 
 after a system gets into real use. Have the folks creating our puppet 
 modules and install recommendations taken a close look at all the 
 options and determined
 that the defaults are appropriate for deploying RHEL OSP in the 
 configurations we are recommending?

This is a very common problem in the configuration management space,
frankly. One good example is the upstream mysql Chef cookbook keeping
ludicrously low InnoDB buffer pool, log and data file sizes. The
defaults from MySQL -- which were chosen, frankly, in the 1990s -- are
useful for nothing more than a test environment, but unfortunately they
propagate to far too many deployments with folks unaware of the serious
side-effects on performance and scalability until it's too late [1].

I think it's an excellent idea to do a review of the values in all of
the configuration files and do the following:

* Identify settings that simply aren't appropriate for anything and make
the change to a better default.

* Identify settings that need to scale with the size of the underlying
VM or host capabilities, and provide patches to the configuration file
comments that clearly indicate a recommended scaling factor. Remember
that folks writing Puppet modules, Ansible scripts, Salt SLS files, and
Chef cookbooks look first to the configuration files to get an idea of
how to set the values.
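
As a sketch of what such a "recommended scaling factor" comment could
translate to in deployment tooling, here is a hedged example. The
70%-of-RAM heuristic for an InnoDB buffer pool on a dedicated DB host is a
common rule of thumb, not an official recommendation, and the function name
is made up for illustration:

```python
def suggest_buffer_pool_bytes(total_ram_bytes, factor=0.7):
    """Suggest an InnoDB buffer pool size as a fraction of host RAM.

    A config-file comment like "set to ~70% of RAM on a dedicated DB
    host" becomes a one-liner in Puppet/Chef/Ansible templating logic.
    """
    return int(total_ram_bytes * factor)

# Example: a host with 16 GiB of RAM gets roughly 11.2 GiB suggested.
ram = 16 * 1024 ** 3
print(suggest_buffer_pool_bytes(ram))
```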

Best,
-jay

[1] The reason I say it's too late is because for some configuration
value -- notably innodb_log_file_size and innodb_data_file_size -- it is
not possible to change the configuration values after data has been
written to disk. You need to literally dump the contents of the DBs and
reload the database after removing the files and restarting the DBs
after changing the configuration options in my.cnf. See this bug for
details on this pain in the behind:

https://tickets.opscode.com/browse/COOK-2100
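
For contrast with those 1990s defaults, a hedged my.cnf fragment for a
dedicated database host might look like the following. The sizes are
illustrative, not prescriptive, and as the footnote above explains,
innodb_log_file_size cannot simply be changed in place once data has been
written:

```ini
[mysqld]
# Sized for a dedicated DB host with ~16 GB RAM; scale with the machine.
innodb_buffer_pool_size = 12G
# Larger redo logs smooth out write bursts; changing this later requires
# a clean shutdown and removal of the old log files (see [1]).
innodb_log_file_size = 512M
innodb_flush_method = O_DIRECT
```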




Re: [openstack-dev] bad default values in conf files

2014-02-13 Thread Ben Nemec

On 2014-02-13 09:01, Clint Byrum wrote:

Excerpts from David Kranz's message of 2014-02-13 06:38:52 -0800:

I was recently bitten by a case where some defaults in keystone.conf
were not appropriate for real deployment, and our puppet modules were
not providing better values
https://bugzilla.redhat.com/show_bug.cgi?id=1064061.


Just taking a look at that issue, Keystone's PKI and revocation are
causing all kinds of issues with performance that are being tackled with
a bit of a redesign. I doubt we can find a cache timeout setting that
will work generically for everyone, but if we make detecting revocation
scale, we won't have to.

The default probably is too low, but raising it too high will cause
concern with those who want revoked tokens to take effect immediately
and are willing to scale the backend to get that result.


Since there are hundreds (thousands?) of options across all the
services, I am wondering whether there are other similar issues lurking
and if we have done what we can to flush them out.

Defaults in conf files seem to be one of the following:

- Generic, appropriate for most situations
- Appropriate for devstack
- Appropriate for small, distro-based deployment
- Appropriate for large deployment

Upstream, I don't think there is a shared view of how defaults should be
chosen.



I don't know that we have been clear enough about this, but nobody has
ever challenged the assertion we've been making for a while in TripleO
which is that OpenStack _must_ have production defaults. We don't make
OpenStack for devstack.


Especially since devstack has config overrides in place to make sure 
everything is set up the way it needs to be.  There's absolutely no reason 
to set a default because devstack needs it - just have devstack set it when 
it runs.
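
That is, dev-only settings can live in devstack's own localrc rather than
in the projects' shipped defaults. An illustrative fragment (the
NETWORK_MANAGER line is from Jay's reply earlier in the thread; the second
variable is an example of the pattern, not a tested configuration):

```shell
# Hypothetical localrc fragment: devstack applies these overrides itself
# at run time, so projects don't have to ship dev-friendly defaults.
export NETWORK_MANAGER=VlanManager
export API_RATE_LIMIT=False
```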


Of course, what qualifies as production-ready for my single-node 
OpenStack installation may not be appropriate for a large multi-node 
installation.  Basically what you talked about above with Keystone.  
Some of those defaults might not be as easy to set, but it should be a 
more manageable subset of options that has that problem.




In TripleO, we consider it a bug when we can't run with a default value
that isn't directly related to whatever makes that cloud unique. So
the virt driver: meh, that's a choice, but leaving file injection on is
really not appropriate for 99% of users in production. Also you'll see
quite a few commits from me in the keystone SQL token driver trying to
speed it up because the old default token backend was KVS (in-memory),
which was fast, but REALLY not useful in production. We found these
things by running defaults and noticing in a long running cloud where
the performance problems are, and we intend to keep doing that.

So perhaps we should encode this assertion in
https://wiki.openstack.org/wiki/ReviewChecklist


+1




Keeping bad defaults can have a huge impact on performance and when a
system falls over but the problems may not be visible until some time
after a system gets into real use. Have the folks creating our puppet
modules and install recommendations taken a close look at all the
options and determined
that the defaults are appropriate for deploying RHEL OSP in the
configurations we are recommending?



TripleO is the official deployment program. We are taking the approach
described above. We're standing up several smallish (50 nodes) clouds
with the intention of testing the defaults on real hardware in the gate
of OpenStack eventually.


