Re: [openstack-dev] bad default values in conf files
On 15 February 2014 12:15, Dirk Müller d...@dmllr.de wrote:
> I agree, and changing defaults has a cost as well: every deployment
> solution out there has to detect the value change, update their config
> templates, and potentially also migrate the setting from the old to the
> new default for existing deployments.
>
> Being in that situation, it has happened that we were surprised by
> default changes that had undesirable side effects, just because we
> chose to override a different default elsewhere. I'm totally on board
> with having production-ready defaults, but that also means they seldom
> change, and change only for a very good, possibly documented reason.

Indeed! And in classic ironic fashion -
https://bugs.launchpad.net/keystone/+bug/1280692 was caused by
https://review.openstack.org/#/c/73621 - a patch of yours, changing
defaults and breaking anyone running with them!

-Rob

--
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] bad default values in conf files
> Have the folks creating our puppet modules and install recommendations
> taken a close look at all the options and determined that the defaults
> are appropriate for deploying RHEL OSP in the configurations we are
> recommending?

If by "our puppet modules" you mean the ones in stackforge, in the vast
majority of cases they follow the defaults provided. I check that this
is the case during review, and the only exceptions should be things like
the db and mq locations that have to change for almost every install.

- Michael

On Sat, Feb 15, 2014 at 10:15 AM, Dirk Müller d...@dmllr.de wrote:
>> were not appropriate for real deployment, and our puppet modules were
>> not providing better values
>> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.
>
> I'd agree that raising the caching timeout is not a good production
> default choice. I'd also argue that the underlying issue is fixed with
> https://review.openstack.org/#/c/69884/ - in our testing this patch
> has sped up revocation retrieval by a factor of 120.
>
>> The default probably is too low, but raising it too high will cause
>> concern with those who want revoked tokens to take effect immediately
>> and are willing to scale the backend to get that result.
>
> I agree, and changing defaults has a cost as well: every deployment
> solution out there has to detect the value change, update their config
> templates, and potentially also migrate the setting from the old to
> the new default for existing deployments.
>
> Being in that situation, it has happened that we were surprised by
> default changes that had undesirable side effects, just because we
> chose to override a different default elsewhere. I'm totally on board
> with having production-ready defaults, but that also means they seldom
> change, and change only for a very good, possibly documented reason.
>
> Greetings,
> Dirk
Re: [openstack-dev] bad default values in conf files
2014-02-13 23:19 GMT+08:00 Jay Pipes jaypi...@gmail.com:
> On Thu, 2014-02-13 at 09:38 -0500, David Kranz wrote:
>> I was recently bitten by a case where some defaults in keystone.conf
>> were not appropriate for real deployment, and our puppet modules were
>> not providing better values
>> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.
>>
>> Since there are hundreds (thousands?) of options across all the
>> services, I am wondering whether there are other similar issues
>> lurking and if we have done what we can to flush them out. Defaults
>> in conf files seem to be one of the following:
>> - Generic, appropriate for most situations
>> - Appropriate for devstack
>> - Appropriate for small, distro-based deployment
>> - Appropriate for large deployment
>> Upstream, I don't think there is a shared view of how defaults should
>> be chosen.
>>
>> Keeping bad defaults can have a huge impact on performance and on
>> when a system falls over, but the problems may not be visible until
>> some time after a system gets into real use. Have the folks creating
>> our puppet modules and install recommendations taken a close look at
>> all the options and determined that the defaults are appropriate for
>> deploying RHEL OSP in the configurations we are recommending?
>
> This is a very common problem in the configuration management space,
> frankly. One good example is the upstream mysql Chef cookbook keeping
> ludicrously low InnoDB buffer pool, log and data file sizes. The
> defaults from MySQL -- which were chosen, frankly, in the 1990s -- are
> useful for nothing more than a test environment, but unfortunately
> they propagate to far too many deployments, with folks unaware of the
> serious side effects on performance and scalability until it's too
> late [1].
>
> I think it's an excellent idea to do a review of the values in all of
> the configuration files and do the following:
>
> * Identify settings that simply aren't appropriate for anything and
> make the change to a better default.
> * Identify settings that need to scale with the size of the underlying
> VM or host capabilities, and provide patches to the configuration file
> comments that clearly indicate a recommended scaling factor. Remember
> that folks writing Puppet modules, Ansible scripts, Salt SLS files,
> and Chef cookbooks look first to the configuration files to get an
> idea of how to set the values.

Good idea! +1 for providing a recommended scaling factor for the related
settings.

> Best,
> -jay
>
> [1] The reason I say it's too late is that for some configuration
> values -- notably innodb_log_file_size and innodb_data_file_size -- it
> is not possible to change them after data has been written to disk.
> You need to literally dump the contents of the DBs and reload the
> database after removing the files and restarting the DBs after
> changing the configuration options in my.cnf. See this bug for details
> on this pain in the behind:
> https://tickets.opscode.com/browse/COOK-2100

--
*Lingxian Kong*
Huawei Technologies Co.,LTD.
IT Product Line CloudOS PDU
China, Xi'an
Mobile: +86-18602962792
Email: konglingx...@huawei.com; anlin.k...@gmail.com
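A scaling-factor comment of the kind Jay suggests could look something
like this in a sample conf file (a hypothetical sketch; the option name
is a real nova option, but the recommended factor shown here is
illustrative, not taken from any actual sample config):

```ini
# osapi_compute_workers: number of API worker processes to spawn.
# Recommended scaling factor (illustrative): one worker per CPU core
# on the host, up to roughly 2x cores for I/O-bound workloads.
#osapi_compute_workers = <number of CPU cores>
```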
Re: [openstack-dev] bad default values in conf files
On Thu, Feb 13, 2014 at 6:38 AM, David Kranz dkr...@redhat.com wrote:
> Defaults in conf files seem to be one of the following:
> - Generic, appropriate for most situations
> - Appropriate for devstack
> - Appropriate for small, distro-based deployment
> - Appropriate for large deployment

In my experience creating OpenStack production systems, the answer is
mostly "Appropriate for devstack". I haven't used devstack myself, only
created production systems from OpenStack releases. For practically
every OpenStack component the message queue config is missing, and every
api-paste.ini needs a [filter:authtoken] section for keystone. It
appears to me that those things are somehow covered when using devstack.

Besides having to be added, the pitfall this creates is that
documentation for releases will not point out that they need to be added
and configured, because somehow devstack doesn't require it. I use a
VLAN model of networking as well, which as far as I can tell devstack
doesn't test/support, so I have to chase down a bunch of other config
items that are missing and scarcely documented. The whole thing is quite
a chore. I don't know why those common keystone and message queue
configs can't be in there from the start.
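For reference, the [filter:authtoken] section being described typically
looked something like the following in an api-paste.ini of that era (a
sketch; the host and credentials are placeholders, and the exact
filter_factory path varies by project and release):

```ini
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
# Where the keystone admin endpoint lives (placeholder values)
auth_host = 127.0.0.1
auth_port = 35357
auth_protocol = http
# Service credentials used to validate incoming tokens
admin_tenant_name = service
admin_user = nova
admin_password = SERVICE_PASSWORD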
Re: [openstack-dev] bad default values in conf files
On Fri, 2014-02-14 at 11:30 -0800, Greg C wrote:
> On Thu, Feb 13, 2014 at 6:38 AM, David Kranz dkr...@redhat.com wrote:
>> Defaults in conf files seem to be one of the following:
>> - Generic, appropriate for most situations
>> - Appropriate for devstack
>> - Appropriate for small, distro-based deployment
>> - Appropriate for large deployment
>
> In my experience creating OpenStack production systems, the answer is
> mostly "Appropriate for devstack". I haven't used devstack myself,
> only created production systems from OpenStack releases. For
> practically every OpenStack component the message queue config is
> missing

Actually, every OpenStack component has MQ configs in their conf files,
and well documented:

Nova:
https://github.com/openstack/nova/blob/master/etc/nova/nova.conf.sample#L9-L187
Cinder:
https://github.com/openstack/cinder/blob/master/etc/cinder/cinder.conf.sample#L610-L815
Keystone:
https://github.com/openstack/keystone/blob/master/etc/keystone.conf.sample#L141-L163
Glance:
https://github.com/openstack/glance/blob/master/etc/glance-api.conf#L246-L283
Neutron:
https://github.com/openstack/neutron/blob/master/etc/neutron.conf#L105-L175
Ceilometer:
https://github.com/openstack/ceilometer/blob/master/etc/ceilometer/ceilometer.conf.sample#L305-L475

> , and every api-paste.ini needs a [filter:authtoken] section for
> keystone.

Actually, every OpenStack project has this in their regular conf files:

Nova:
https://github.com/openstack/nova/blob/master/etc/nova/nova.conf.sample#L2624
Cinder:
https://github.com/openstack/cinder/blob/master/etc/cinder/cinder.conf.sample#L1879
Glance:
https://github.com/openstack/glance/blob/master/etc/glance-api.conf#L551
Neutron:
https://github.com/openstack/neutron/blob/master/etc/neutron.conf#L332
Ceilometer:
https://github.com/openstack/ceilometer/blob/master/etc/ceilometer/ceilometer.conf.sample#L713

> It appears to me that those things are somehow covered when using
> devstack. Besides having to be added, the pitfall this creates is that
> documentation for releases will not point out that they need to be
> added and configured, because somehow devstack doesn't require it.

I'm not sure why you think this. What configuration management system
are you using to deploy OpenStack?

> I use a VLAN model of networking as well, which as far as I can tell
> devstack doesn't test/support,

Incorrect. export NETWORK_MANAGER=VlanManager in your localrc.

> so I have to chase down a bunch of other config items that are missing
> and scarcely documented. The whole thing is quite a chore. I don't
> know why those common keystone and message queue configs can't be in
> there from the start.

They are.

Best,
-jay
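As a concrete illustration, the RabbitMQ settings Jay links to amounted
to something like the following in the nova.conf of that era (a sketch;
the values are placeholders, and these options later moved into an
[oslo_messaging_rabbit] section):

```ini
[DEFAULT]
# Use the kombu (RabbitMQ) RPC driver
rpc_backend = nova.openstack.common.rpc.impl_kombu
# Broker location and credentials (placeholder values)
rabbit_host = controller.example.com
rabbit_port = 5672
rabbit_userid = guest
rabbit_password = RABBIT_PASS
```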
Re: [openstack-dev] bad default values in conf files
> were not appropriate for real deployment, and our puppet modules were
> not providing better values
> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.

I'd agree that raising the caching timeout is not a good production
default choice. I'd also argue that the underlying issue is fixed with
https://review.openstack.org/#/c/69884/ - in our testing this patch has
sped up revocation retrieval by a factor of 120.

> The default probably is too low, but raising it too high will cause
> concern with those who want revoked tokens to take effect immediately
> and are willing to scale the backend to get that result.

I agree, and changing defaults has a cost as well: every deployment
solution out there has to detect the value change, update their config
templates, and potentially also migrate the setting from the old to the
new default for existing deployments.

Being in that situation, it has happened that we were surprised by
default changes that had undesirable side effects, just because we chose
to override a different default elsewhere. I'm totally on board with
having production-ready defaults, but that also means they seldom
change, and change only for a very good, possibly documented reason.

Greetings,
Dirk
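The caching timeout under discussion is the revocation list cache in
the auth_token middleware; the knob looked roughly like this (a sketch;
the section and option name reflect the middleware of that era, and the
value shown is illustrative, not a recommendation):

```ini
[keystone_authtoken]
# How long (in seconds) to cache the token revocation list before
# re-fetching it from keystone. Lower values mean revoked tokens take
# effect sooner, at the cost of more load on the keystone backend.
revocation_cache_time = 300
```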
Re: [openstack-dev] bad default values in conf files
Excerpts from David Kranz's message of 2014-02-13 06:38:52 -0800:
> I was recently bitten by a case where some defaults in keystone.conf
> were not appropriate for real deployment, and our puppet modules were
> not providing better values
> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.

Just taking a look at that issue: Keystone's PKI and revocation are
causing all kinds of issues with performance that are being tackled with
a bit of a redesign. I doubt we can find a cache timeout setting that
will work generically for everyone, but if we make detecting revocation
scale, we won't have to. The default probably is too low, but raising it
too high will cause concern with those who want revoked tokens to take
effect immediately and are willing to scale the backend to get that
result.

> Since there are hundreds (thousands?) of options across all the
> services, I am wondering whether there are other similar issues
> lurking and if we have done what we can to flush them out. Defaults in
> conf files seem to be one of the following:
> - Generic, appropriate for most situations
> - Appropriate for devstack
> - Appropriate for small, distro-based deployment
> - Appropriate for large deployment
> Upstream, I don't think there is a shared view of how defaults should
> be chosen.

I don't know that we have been clear enough about this, but nobody has
ever challenged the assertion we've been making for a while in TripleO,
which is that OpenStack _must_ have production defaults. We don't make
OpenStack for devstack.

In TripleO, we consider it a bug when we can't run with a default value
that isn't directly related to whatever makes that cloud unique. So the
virt driver: meh, that's a choice. But leaving file injection on is
really not appropriate for 99% of users in production. Also you'll see
quite a few commits from me in the keystone SQL token driver trying to
speed it up, because the old default token backend was KVS (in-memory),
which was fast, but REALLY not useful in production.

We found these things by running the defaults and noticing, in a
long-running cloud, where the performance problems are, and we intend to
keep doing that. So perhaps we should encode this assertion in
https://wiki.openstack.org/wiki/ReviewChecklist

> Keeping bad defaults can have a huge impact on performance and on when
> a system falls over, but the problems may not be visible until some
> time after a system gets into real use. Have the folks creating our
> puppet modules and install recommendations taken a close look at all
> the options and determined that the defaults are appropriate for
> deploying RHEL OSP in the configurations we are recommending?

TripleO is the official deployment program. We are taking the approach
described above. We're standing up several smallish (50 node) clouds
with the intention of eventually testing the defaults on real hardware
in the gate of OpenStack.
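The token backend switch referred to above was a one-line driver
setting in keystone.conf; roughly like this for the keystone of that
era (a sketch; the driver class paths changed in later releases):

```ini
[token]
# The in-memory KVS backend is fast but loses all tokens on restart
# and cannot be shared across nodes - not useful in production.
#driver = keystone.token.backends.kvs.Token
driver = keystone.token.backends.sql.Token
```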
Re: [openstack-dev] bad default values in conf files
David,

Good that you raise this topic. It is actually sad that you have to do a
big investigation of OpenStack config params before you are able to use
OpenStack. I think this work should be done mostly inside upstream.

So I have a couple of ideas for how we can simplify investigating how
CONF values impact performance at scale (without having tons of
servers). In the near future you will be able to use Rally [1] for it:

1) (WIP) Deploy a multinode installation in one click inside lxc
containers [2] (you need only 200 MB of RAM for 1 compute node)
2) Use fake virtualization
3) Run rally benchmarks and get performance for different conf
parameters
4) Analyze the results and set the best one as default
5) (WIP) I am working as well on a pure-OpenStack cross-service profiler
[3] that will allow us to find slow parts of the code and analyze the
CONF args that are related to them (not the whole list of conf params)

[1] https://wiki.openstack.org/wiki/Rally
[2] https://review.openstack.org/#/c/57240/27/doc/samples/deployments/multihost.rst
[3] https://github.com/pboris/osprofiler

Best regards,
Boris Pavlovic

On Thu, Feb 13, 2014 at 6:38 PM, David Kranz dkr...@redhat.com wrote:
> I was recently bitten by a case where some defaults in keystone.conf
> were not appropriate for real deployment, and our puppet modules were
> not providing better values
> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.
>
> Since there are hundreds (thousands?) of options across all the
> services, I am wondering whether there are other similar issues
> lurking and if we have done what we can to flush them out. Defaults in
> conf files seem to be one of the following:
> - Generic, appropriate for most situations
> - Appropriate for devstack
> - Appropriate for small, distro-based deployment
> - Appropriate for large deployment
> Upstream, I don't think there is a shared view of how defaults should
> be chosen.
>
> Keeping bad defaults can have a huge impact on performance and on when
> a system falls over, but the problems may not be visible until some
> time after a system gets into real use. Have the folks creating our
> puppet modules and install recommendations taken a close look at all
> the options and determined that the defaults are appropriate for
> deploying RHEL OSP in the configurations we are recommending?
>
> -David
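Step 3 above ("run rally benchmarks") takes a task definition; a minimal
one looked roughly like this (a sketch; the scenario name, arguments,
and runner settings are illustrative of Rally's task format at the time
and may differ in your version):

```yaml
NovaServers.boot_and_delete_server:
  - args:
      flavor:
        name: "m1.tiny"
      image:
        name: "cirros-0.3.1-x86_64-uec"
    runner:
      # Boot and delete a server 10 times, 2 at a time
      type: "constant"
      times: 10
      concurrency: 2
```

Running the same task against deployments that differ only in one conf
value is one way to compare defaults.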
Re: [openstack-dev] bad default values in conf files
On Thu, 2014-02-13 at 09:38 -0500, David Kranz wrote:
> I was recently bitten by a case where some defaults in keystone.conf
> were not appropriate for real deployment, and our puppet modules were
> not providing better values
> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.
>
> Since there are hundreds (thousands?) of options across all the
> services, I am wondering whether there are other similar issues
> lurking and if we have done what we can to flush them out. Defaults in
> conf files seem to be one of the following:
> - Generic, appropriate for most situations
> - Appropriate for devstack
> - Appropriate for small, distro-based deployment
> - Appropriate for large deployment
> Upstream, I don't think there is a shared view of how defaults should
> be chosen.
>
> Keeping bad defaults can have a huge impact on performance and on when
> a system falls over, but the problems may not be visible until some
> time after a system gets into real use. Have the folks creating our
> puppet modules and install recommendations taken a close look at all
> the options and determined that the defaults are appropriate for
> deploying RHEL OSP in the configurations we are recommending?

This is a very common problem in the configuration management space,
frankly. One good example is the upstream mysql Chef cookbook keeping
ludicrously low InnoDB buffer pool, log and data file sizes. The
defaults from MySQL -- which were chosen, frankly, in the 1990s -- are
useful for nothing more than a test environment, but unfortunately they
propagate to far too many deployments, with folks unaware of the serious
side effects on performance and scalability until it's too late [1].

I think it's an excellent idea to do a review of the values in all of
the configuration files and do the following:

* Identify settings that simply aren't appropriate for anything and make
the change to a better default.
* Identify settings that need to scale with the size of the underlying
VM or host capabilities, and provide patches to the configuration file
comments that clearly indicate a recommended scaling factor. Remember
that folks writing Puppet modules, Ansible scripts, Salt SLS files, and
Chef cookbooks look first to the configuration files to get an idea of
how to set the values.

Best,
-jay

[1] The reason I say it's too late is that for some configuration values
-- notably innodb_log_file_size and innodb_data_file_size -- it is not
possible to change them after data has been written to disk. You need to
literally dump the contents of the DBs and reload the database after
removing the files and restarting the DBs after changing the
configuration options in my.cnf. See this bug for details on this pain
in the behind: https://tickets.opscode.com/browse/COOK-2100
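To make the example concrete: the settings in question live in my.cnf,
and the log and data file sizes among them cannot simply be edited after
the first start (a sketch; the values are illustrative, not
recommendations, and should themselves scale with the host):

```ini
[mysqld]
# Sized for a real workload; the upstream defaults of this era were
# tiny (e.g. a 5MB redo log). The buffer pool can be resized later
# with only a restart.
innodb_buffer_pool_size = 4G
# Changing these two after the data files exist requires dumping the
# databases, removing the old ib_logfile*/ibdata files, restarting,
# and reloading the dump.
innodb_log_file_size = 256M
innodb_data_file_path = ibdata1:1G:autoextend
```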
Re: [openstack-dev] bad default values in conf files
On 2014-02-13 09:01, Clint Byrum wrote:
> Excerpts from David Kranz's message of 2014-02-13 06:38:52 -0800:
>> I was recently bitten by a case where some defaults in keystone.conf
>> were not appropriate for real deployment, and our puppet modules were
>> not providing better values
>> https://bugzilla.redhat.com/show_bug.cgi?id=1064061.
>
> Just taking a look at that issue: Keystone's PKI and revocation are
> causing all kinds of issues with performance that are being tackled
> with a bit of a redesign. I doubt we can find a cache timeout setting
> that will work generically for everyone, but if we make detecting
> revocation scale, we won't have to. The default probably is too low,
> but raising it too high will cause concern with those who want revoked
> tokens to take effect immediately and are willing to scale the backend
> to get that result.
>
>> Since there are hundreds (thousands?) of options across all the
>> services, I am wondering whether there are other similar issues
>> lurking and if we have done what we can to flush them out. Defaults
>> in conf files seem to be one of the following:
>> - Generic, appropriate for most situations
>> - Appropriate for devstack
>> - Appropriate for small, distro-based deployment
>> - Appropriate for large deployment
>> Upstream, I don't think there is a shared view of how defaults should
>> be chosen.
>
> I don't know that we have been clear enough about this, but nobody has
> ever challenged the assertion we've been making for a while in
> TripleO, which is that OpenStack _must_ have production defaults. We
> don't make OpenStack for devstack.

Especially since devstack has config overrides in place to make sure
everything is set up the way it needs. There's absolutely no reason to
set a default because devstack needs it - just have devstack set it when
it runs.

Of course, what qualifies as production-ready for my single-node
OpenStack installation may not be appropriate for a 1000-node
installation. Basically what you talked about above with Keystone. Some
of those defaults might not be as easy to set, but it should be a more
manageable subset of options that have that problem.

> In TripleO, we consider it a bug when we can't run with a default
> value that isn't directly related to whatever makes that cloud unique.
> So the virt driver: meh, that's a choice. But leaving file injection
> on is really not appropriate for 99% of users in production. Also
> you'll see quite a few commits from me in the keystone SQL token
> driver trying to speed it up, because the old default token backend
> was KVS (in-memory), which was fast, but REALLY not useful in
> production.
>
> We found these things by running the defaults and noticing, in a
> long-running cloud, where the performance problems are, and we intend
> to keep doing that. So perhaps we should encode this assertion in
> https://wiki.openstack.org/wiki/ReviewChecklist

+1

>> Keeping bad defaults can have a huge impact on performance and on
>> when a system falls over, but the problems may not be visible until
>> some time after a system gets into real use. Have the folks creating
>> our puppet modules and install recommendations taken a close look at
>> all the options and determined that the defaults are appropriate for
>> deploying RHEL OSP in the configurations we are recommending?
>
> TripleO is the official deployment program. We are taking the approach
> described above. We're standing up several smallish (50 node) clouds
> with the intention of eventually testing the defaults on real hardware
> in the gate of OpenStack.