Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
On 05/24/2016 04:38 PM, Gorka Eguileor wrote:
> On 23/05, Ivan Kolodyazhny wrote:
>> Hi developers and operators,
>> I would like to get any feedback from you about my idea before I start
>> work on a spec.
>>
>> In Nova, we've got the max_concurrent_builds option [1] to set the
>> 'Maximum number of instance builds to run concurrently' per compute
>> node. There is no equivalent in Cinder.
>
> Hi,
>
> First I want to say that I think this is a good idea, because I know this
> message will get diluted once I start with my mumbling. ;-)
>
> The first thing we should allow operators to control is the number of
> workers per service, since we currently only allow setting it for the
> API nodes and all other nodes use a default of 1000. I posted a patch
> [1] to allow this and it's been sitting there for the last 3 months. :'-(
>
> As I see it, not all of the mentioned problems are equal, and the main
> distinction comes from Cinder being not only in the control path but
> also in the data path. As a result, some of the issues are
> backend-specific limitations, which I believe should be addressed
> differently in the specs.
>
> For operations where Cinder is in the control path, we should be
> limiting/queuing operations in the Cinder core code (for example the
> manager), whereas when the limitation only applies to some drivers it
> should be addressed by the drivers themselves. The spec should still
> provide a clear mechanism/pattern to solve it in the drivers, so all
> drivers can use a similar pattern; that consistency makes the code
> easier to review and maintain.
>
> The queuing should preserve the order of arrival of operations, which
> file locks from oslo.concurrency and Tooz don't do.

I would be seriously opposed to queuing done inside Cinder code. It makes
draining a service harder and increases the impact of a failure of a
single service. We already have a queue system, and it is whatever you're
running under oslo.messaging (RabbitMQ, mostly).
Making our RPC worker count configurable for each service sounds like the
best shot to me.

>> Why do we need it for Cinder? IMO, it could help us to address the
>> following issues:
>>
>> - Creation of N volumes at the same time increases resource usage by
>>   the cinder-volume service a lot. The image caching feature [2] could
>>   help us a bit in the case where we create a volume from an image, but
>>   we still have to upload N images to the volume backend at the same
>>   time.
>
> This is an example where we are in the data path.
>
>> - Deletion of N volumes in parallel. Usually it's not a very hard task
>>   for Cinder, but if you have to delete 100+ volumes at once, you can
>>   hit various issues with DB connections, CPU and memory usage. In the
>>   case of LVM, it could also use the 'dd' command to clean up volumes.
>
> This is a case where it is a backend limitation and should be handled by
> the drivers.
>
> I know some people say that deletion and attaching have problems when a
> lot of them are requested to the c-vol nodes and that Cinder cannot
> handle the workload properly, but in my experience these cases are
> always due to suboptimal Cinder configuration, like a low number of DB
> connections configured in Cinder that makes operations fight for a DB
> connection, creating big delays in completing operations.
>
>> - It would provide some kind of load balancing in HA mode: if a
>>   cinder-volume process is busy with current operations, it will not
>>   pick up the message from RabbitMQ and another cinder-volume service
>>   will do it.
>
> I don't understand what you mean by this. Do you mean that the Cinder
> service will stop listening to the message queue when it reaches a
> certain workload of "heavy" operations? Then wouldn't it also stop
> processing "light" operations?
>
>> - From the user's perspective, it seems the better way is to
>>   create/delete N volumes a bit more slowly than to fail after X
>>   volumes were created/deleted.
>
> I agree, it's better not to fail. :-)
>
> Cheers,
> Gorka.
>> [1]
>> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
>> [2]
>> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>>
>> Regards,
>> Ivan Kolodyazhny,
>> http://blog.e0ne.info/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
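Duncan's position - keep the queue in RabbitMQ and only bound the per-service worker count - can be sketched in a few lines. This is a minimal illustration, not Cinder code: `RPC_WORKER_COUNT` is a hypothetical per-service option, and the executor's internal queue stands in for the message broker (real Cinder dispatch goes through oslo.messaging).

```python
# Sketch: with a bounded worker pool per service, the transport itself
# becomes the queue -- excess requests simply wait their turn.
# RPC_WORKER_COUNT is a hypothetical per-service setting, not a real
# Cinder option.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

RPC_WORKER_COUNT = 2  # hypothetical per-service worker limit

in_flight = 0
peak = 0
lock = threading.Lock()

def handle_create_volume(volume_id):
    """Pretend RPC handler for a volume-create message."""
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.05)  # simulate backend work
    with lock:
        in_flight -= 1
    return volume_id

# Ten "messages" arrive at once; only RPC_WORKER_COUNT run concurrently,
# the rest wait in the executor's queue (the broker, in real life).
with ThreadPoolExecutor(max_workers=RPC_WORKER_COUNT) as pool:
    results = list(pool.map(handle_create_volume, range(10)))

print(peak, results)
```

The point of the sketch: no operation is rejected, and nothing in the service code implements a queue; the bound on workers alone throttles the service.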
Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
On 23/05, Ivan Kolodyazhny wrote:
> Hi developers and operators,
> I would like to get any feedback from you about my idea before I start
> work on a spec.
>
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.

Hi,

First I want to say that I think this is a good idea, because I know this
message will get diluted once I start with my mumbling. ;-)

The first thing we should allow operators to control is the number of
workers per service, since we currently only allow setting it for the API
nodes and all other nodes use a default of 1000. I posted a patch [1] to
allow this and it's been sitting there for the last 3 months. :'-(

As I see it, not all of the mentioned problems are equal, and the main
distinction comes from Cinder being not only in the control path but also
in the data path. As a result, some of the issues are backend-specific
limitations, which I believe should be addressed differently in the specs.

For operations where Cinder is in the control path, we should be
limiting/queuing operations in the Cinder core code (for example the
manager), whereas when the limitation only applies to some drivers it
should be addressed by the drivers themselves. The spec should still
provide a clear mechanism/pattern to solve it in the drivers, so all
drivers can use a similar pattern; that consistency makes the code easier
to review and maintain.

The queuing should preserve the order of arrival of operations, which
file locks from oslo.concurrency and Tooz don't do.

> Why do we need it for Cinder? IMO, it could help us to address the
> following issues:
>
> - Creation of N volumes at the same time increases resource usage by the
>   cinder-volume service a lot. The image caching feature [2] could help
>   us a bit in the case where we create a volume from an image, but we
>   still have to upload N images to the volume backend at the same time.
This is an example where we are in the data path.

> - Deletion of N volumes in parallel. Usually it's not a very hard task
>   for Cinder, but if you have to delete 100+ volumes at once, you can
>   hit various issues with DB connections, CPU and memory usage. In the
>   case of LVM, it could also use the 'dd' command to clean up volumes.

This is a case where it is a backend limitation and should be handled by
the drivers.

I know some people say that deletion and attaching have problems when a
lot of them are requested to the c-vol nodes and that Cinder cannot
handle the workload properly, but in my experience these cases are always
due to suboptimal Cinder configuration, like a low number of DB
connections configured in Cinder that makes operations fight for a DB
connection, creating big delays in completing operations.

> - It would provide some kind of load balancing in HA mode: if a
>   cinder-volume process is busy with current operations, it will not
>   pick up the message from RabbitMQ and another cinder-volume service
>   will do it.

I don't understand what you mean by this. Do you mean that the Cinder
service will stop listening to the message queue when it reaches a
certain workload of "heavy" operations? Then wouldn't it also stop
processing "light" operations?

> - From the user's perspective, it seems the better way is to
>   create/delete N volumes a bit more slowly than to fail after X volumes
>   were created/deleted.

I agree, it's better not to fail. :-)

Cheers,
Gorka.
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/
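Gorka's ordering requirement - that queuing must preserve arrival order, which file locks don't - is easy to show with a sketch. Nothing here is Cinder code; it only illustrates why an explicit FIFO queue gives the guarantee, whereas lock or semaphore waiter wake-up order is unspecified.

```python
# Sketch: an explicit FIFO queue guarantees operations *start* in arrival
# order. A bare lock or semaphore does not -- which waiter wakes first is
# unspecified. Illustration only, not Cinder code.
import queue
import threading

arrivals = list(range(8))   # operation ids, in arrival order
start_order = []            # order in which operations actually begin

work = queue.Queue()
for op in arrivals:
    work.put(op)            # queue.Queue is strictly FIFO

def dispatcher():
    """Drain the queue in arrival order.

    A real dispatcher would hand each op to a pool for the (parallel)
    heavy lifting; a single dispatch thread is what preserves the order.
    """
    while True:
        try:
            op = work.get_nowait()
        except queue.Empty:
            return
        start_order.append(op)  # record when the op begins
        work.task_done()

t = threading.Thread(target=dispatcher)
t.start()
t.join()

print(start_order)
```

With a file lock instead of the queue, eight blocked waiters could be granted the lock in any order, so a volume requested first could easily start last.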
Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
On 24 May 2016 at 05:46, John Griffith wrote:
> Just curious about a couple of things: Is this attempting to solve a
> problem in the actual Cinder volume service, or is this trying to solve
> problems with backends that can't keep up and deliver resources under
> heavy load?

I would posit that no backend can cope with infinite load, and with
things like A/A c-vol on the way, Cinder is likely to get efficient
enough that it will start stressing more backends. It is certainly worth
thinking about. We've more than enough backend technologies that have
different but entirely reasonable metadata performance limitations, and
several pieces of code outside of the backends' control (examples: FC
zoning, iSCSI multipath) seem to have clear scalability issues. I share
the worry that putting limits everywhere becomes a band-aid that avoids
fixing deeper problems, whether in Cinder or on the backends themselves.

> I get the copy-image-to-volume case; that's a special case that
> certainly does impact Cinder services and the Cinder node itself, but
> there's already throttling going on there, at least in terms of the IO
> allowed.

Which is probably not the behaviour we want - queuing generally gives a
better user experience than fair sharing beyond a certain point, since
past that point *nothing* gets completed in a reasonable amount of time,
even under only moderate load.

It also seems to be a very common thing for customers to try to boot 300
instances from volume as an early smoke test of a new cloud deployment.
I've no idea why, but I've seen it many times, and others have reported
the same thing. While I'm not entirely convinced it is a reasonable test,
we should probably make sure that the usual behaviour for this is not
horrible breakage. The image cache, if turned on, certainly helps
massively with this, but I think some form of queuing is a good thing for
both the image cache work and probably backups too, eventually.

> Also, I'm curious...
> would the existing API rate limit configuration achieve the same sort
> of thing you want to do here? Granted it's not selective, but maybe
> it's worth mentioning.

Certainly worth mentioning, since I'm not sure how many people are aware
it exists. My experience of it was that it was too limited to be actually
useful: it only rate-limits a single process, and we've usually got more
than enough API workers across multiple nodes that very significant loads
are possible before tripping any reasonable per-process rate limit.

> If we did do something like this, I would like to see it implemented as
> a driver config; but that wouldn't help if the problem lies in the
> Rabbit or RPC space. That brings me back to wondering exactly where we
> want to solve problems, and exactly which ones. If delete is causing
> problems like you describe, I'd suspect we have an issue in our DB code
> (too many calls to start with) and that we've got some overhead
> elsewhere that should be eradicated. Delete is a super simple operation
> on the Cinder side of things (and on most backends), so I'm a bit
> freaked out thinking that it's taxing resources heavily.

I agree we should definitely do more analysis of where the breakage
occurs before adding many limits or queues. The image copy stuff is an
easy first case to analyse - iostat can tell you exactly where the
problem is. Using the fake backend and a large number of API workers /
nodes with a pathological load trivially finds breakages currently,
though exactly where the issues are depends on which version of the code
you're running. The compare-and-update changes (aka the race avoidance
patches) have removed a bunch of these, but seem to have led to a
significant increase in DB load that makes it easier to hit DB timeouts
and other issues.

As for delete being resource heavy, our reference driver provides a
pathological example with the secure delete code.
Now that we've got a high degree of confidence in the LVM thin code
(specifically, I'm not aware of any instances where it is worse than the
LVM-thick code, and I don't see any open bugs that disagree), is it time
to dump the LVM-thick support completely?

--
Duncan Thomas
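The copy-image throttling John and Duncan refer to caps the bytes per second a copy operation may consume. For illustration only, here is a rough user-space sketch of that idea; Cinder's actual throttling is applied around the copy process at the block-I/O level, not with a Python loop like this.

```python
# Rough sketch of bandwidth-capped copying, in the spirit of the per-copy
# I/O throttling discussed in the thread. Illustration only -- Cinder's
# real throttle wraps the copy process at the block-I/O layer.
import io
import time

def throttled_copy(src, dst, bps, chunk=4096):
    """Copy src to dst, sleeping as needed to stay under `bps` bytes/sec."""
    start = time.monotonic()
    copied = 0
    while True:
        data = src.read(chunk)
        if not data:
            break
        dst.write(data)
        copied += len(data)
        # If we are ahead of the allowed rate, sleep off the difference.
        expected = copied / bps
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return copied

src = io.BytesIO(b"x" * 64 * 1024)
dst = io.BytesIO()
n = throttled_copy(src, dst, bps=1024 * 1024)
print(n)
```

Note Duncan's objection applies here: a rate cap shares bandwidth fairly among however many copies are running, so under heavy load every copy slows down, whereas queuing would let each copy finish at full speed in turn.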
Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
On Mon, May 23, 2016 at 8:32 AM, Ivan Kolodyazhny wrote:
> Hi developers and operators,
> I would like to get any feedback from you about my idea before I start
> work on a spec.
>
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.
>
> Why do we need it for Cinder? IMO, it could help us to address the
> following issues:
>
> - Creation of N volumes at the same time increases resource usage by the
>   cinder-volume service a lot. The image caching feature [2] could help
>   us a bit in the case where we create a volume from an image, but we
>   still have to upload N images to the volume backend at the same time.
> - Deletion of N volumes in parallel. Usually it's not a very hard task
>   for Cinder, but if you have to delete 100+ volumes at once, you can
>   hit various issues with DB connections, CPU and memory usage. In the
>   case of LVM, it could also use the 'dd' command to clean up volumes.
> - It would provide some kind of load balancing in HA mode: if a
>   cinder-volume process is busy with current operations, it will not
>   pick up the message from RabbitMQ and another cinder-volume service
>   will do it.
> - From the user's perspective, it seems the better way is to
>   create/delete N volumes a bit more slowly than to fail after X volumes
>   were created/deleted.
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/

Just curious about a couple of things: Is this attempting to solve a
problem in the actual Cinder volume service, or is this trying to solve
problems with backends that can't keep up and deliver resources under
heavy load?

I get the copy-image-to-volume case; that's a special case that certainly
does impact Cinder services and the Cinder node itself, but there's
already throttling going on there, at least in terms of the IO allowed.

Also, I'm curious... would the existing API rate limit configuration
achieve the same sort of thing you want to do here? Granted it's not
selective, but maybe it's worth mentioning.

If we did do something like this, I would like to see it implemented as a
driver config; but that wouldn't help if the problem lies in the Rabbit
or RPC space. That brings me back to wondering exactly where we want to
solve problems, and exactly which ones. If delete is causing problems
like you describe, I'd suspect we have an issue in our DB code (too many
calls to start with) and that we've got some overhead elsewhere that
should be eradicated. Delete is a super simple operation on the Cinder
side of things (and on most backends), so I'm a bit freaked out thinking
that it's taxing resources heavily.

Thanks,
John
Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
On Mon, May 23, 2016 at 05:32:45PM +0300, Ivan Kolodyazhny wrote:
> Hi developers and operators,
> I would like to get any feedback from you about my idea before I start
> work on a spec.
>
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.

I do think it would be worth having some sort of operation throttling. So
are you thinking of something more like a "maximum number of volume
builds to run concurrently"? As you point out below, there are other
operations besides creation that can cause problems when run in bulk.

I do like the idea!

> Why do we need it for Cinder? IMO, it could help us to address the
> following issues:
>
> - Creation of N volumes at the same time increases resource usage by the
>   cinder-volume service a lot. The image caching feature [2] could help
>   us a bit in the case where we create a volume from an image, but we
>   still have to upload N images to the volume backend at the same time.
> - Deletion of N volumes in parallel. Usually it's not a very hard task
>   for Cinder, but if you have to delete 100+ volumes at once, you can
>   hit various issues with DB connections, CPU and memory usage. In the
>   case of LVM, it could also use the 'dd' command to clean up volumes.
> - It would provide some kind of load balancing in HA mode: if a
>   cinder-volume process is busy with current operations, it will not
>   pick up the message from RabbitMQ and another cinder-volume service
>   will do it.
> - From the user's perspective, it seems the better way is to
>   create/delete N volumes a bit more slowly than to fail after X volumes
>   were created/deleted.
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/
Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
This sounds like a good idea to me. The queue doesn't handle this, since
we just read everything off it immediately anyway. I have seen issues
where customers have had to write scripts that build 5 volumes, sleep,
then build more until they get more than 100 volumes, just because the
cinder-volume service will otherwise clobber itself.

-Alex

On Mon, May 23, 2016 at 10:32 AM, Ivan Kolodyazhny wrote:
> Hi developers and operators,
> I would like to get any feedback from you about my idea before I start
> work on a spec.
>
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.
>
> Why do we need it for Cinder? IMO, it could help us to address the
> following issues:
>
> - Creation of N volumes at the same time increases resource usage by the
>   cinder-volume service a lot. The image caching feature [2] could help
>   us a bit in the case where we create a volume from an image, but we
>   still have to upload N images to the volume backend at the same time.
> - Deletion of N volumes in parallel. Usually it's not a very hard task
>   for Cinder, but if you have to delete 100+ volumes at once, you can
>   hit various issues with DB connections, CPU and memory usage. In the
>   case of LVM, it could also use the 'dd' command to clean up volumes.
> - It would provide some kind of load balancing in HA mode: if a
>   cinder-volume process is busy with current operations, it will not
>   pick up the message from RabbitMQ and another cinder-volume service
>   will do it.
> - From the user's perspective, it seems the better way is to
>   create/delete N volumes a bit more slowly than to fail after X volumes
>   were created/deleted.
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/
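The client-side workaround Alex describes looks roughly like the sketch below. `fake_create_volume` is a stand-in for a real volume-create call (e.g. via cinderclient); the batch size, pause, and function names are all illustrative, not anything customers actually shipped.

```python
# Roughly the workaround Alex describes: create volumes in small batches
# with a pause between them, so cinder-volume isn't hit with 100+ creates
# at once. fake_create_volume stands in for a real client call.
import time

def fake_create_volume(name):
    """Stand-in for a real volume-create API call."""
    return {"name": name, "status": "creating"}

def batched_create(total, batch_size=5, pause=0.01):
    """Create `total` volumes, `batch_size` at a time, pausing between
    batches to let the service catch up."""
    created = []
    for start in range(0, total, batch_size):
        for i in range(start, min(start + batch_size, total)):
            created.append(fake_create_volume("vol-%d" % i))
        time.sleep(pause)
    return created

volumes = batched_create(100)
print(len(volumes))
```

The proposal in this thread would move exactly this pacing logic out of every customer's script and into the service itself.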
[openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder
Hi developers and operators,

I would like to get any feedback from you about my idea before I start
work on a spec.

In Nova, we've got the max_concurrent_builds option [1] to set the
'Maximum number of instance builds to run concurrently' per compute node.
There is no equivalent in Cinder.

Why do we need it for Cinder? IMO, it could help us to address the
following issues:

- Creation of N volumes at the same time increases resource usage by the
  cinder-volume service a lot. The image caching feature [2] could help
  us a bit in the case where we create a volume from an image, but we
  still have to upload N images to the volume backend at the same time.
- Deletion of N volumes in parallel. Usually it's not a very hard task
  for Cinder, but if you have to delete 100+ volumes at once, you can hit
  various issues with DB connections, CPU and memory usage. In the case
  of LVM, it could also use the 'dd' command to clean up volumes.
- It would provide some kind of load balancing in HA mode: if a
  cinder-volume process is busy with current operations, it will not pick
  up the message from RabbitMQ and another cinder-volume service will do
  it.
- From the user's perspective, it seems the better way is to
  create/delete N volumes a bit more slowly than to fail after X volumes
  were created/deleted.

[1]
https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
[2]
https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html

Regards,
Ivan Kolodyazhny,
http://blog.e0ne.info/
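A minimal sketch of the proposed option, in the spirit of how Nova bounds concurrent builds [1] with a semaphore around the build path. Everything here is illustrative: `MAX_CONCURRENT_CREATES`, the `VolumeManager` class, and the timings are hypothetical, not Cinder code. The key property is the one the last bullet asks for: extra requests wait, so creation gets "a bit slower" instead of failing after X volumes.

```python
# Sketch: a hypothetical max_concurrent_creates limit on the volume
# manager, modeled loosely on Nova's max_concurrent_builds. Requests
# beyond the limit block on the semaphore rather than erroring out.
import threading
import time

MAX_CONCURRENT_CREATES = 3  # hypothetical cinder-volume option

class VolumeManager:
    def __init__(self, limit):
        self._create_sem = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.completed = []

    def create_volume(self, volume_id):
        with self._create_sem:      # waits instead of failing
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            time.sleep(0.02)        # simulate image copy / backend work
            with self._lock:
                self.active -= 1
                self.completed.append(volume_id)

mgr = VolumeManager(MAX_CONCURRENT_CREATES)
threads = [threading.Thread(target=mgr.create_volume, args=(i,))
           for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mgr.peak, len(mgr.completed))
```

All 12 requests complete, but no more than 3 run at once; the rest simply queue up on the semaphore. (Note this blocking is also what the later replies in the thread push back on, since a blocked waiter ties up a worker and the semaphore does not preserve arrival order.)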