Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-24 Thread Michał Dulko


On 05/24/2016 04:38 PM, Gorka Eguileor wrote:
> On 23/05, Ivan Kolodyazhny wrote:
>> Hi developers and operators,
>> I would like to get your feedback on my idea before I start work on a
>> spec.
>>
>> In Nova, we've got the max_concurrent_builds option [1] to set the
>> 'Maximum number of instance builds to run concurrently' per compute
>> node. There is no equivalent in Cinder.
> Hi,
>
> First I want to say that I think this is a good idea because I know this
> message will get diluted once I start with my mumbling.  ;-)
>
> The first thing we should make controllable is the number of workers
> per service: we currently only allow setting it for the API nodes, and
> all other nodes use a default of 1000.  I posted a patch [1] to allow
> this, and it's been sitting there for the last 3 months.  :'-(
>
> As I see it, not all of the mentioned problems are equal; the main
> distinction comes from Cinder being not only in the control path but
> also in the data path. As a result, some of the issues are
> backend-specific limitations that I believe should be addressed
> differently in the spec.
>
> For operations where Cinder is in the control path we should be
> limiting/queuing operations in the Cinder core code (for example the
> manager), whereas when the limitation only applies to some drivers it
> should be addressed by the drivers themselves.  The spec should still
> provide a clear mechanism/pattern to solve it in the drivers, so that
> all drivers can use a similar pattern; that consistency makes the code
> easier to review and maintain.
>
> The queuing should preserve the order of arrival of operations, which
> file locks from Oslo concurrency and Tooz don't do.

I would be seriously opposed to queuing done inside Cinder code. It
makes draining a service harder and increases the impact of a single
service failure. We already have a queuing system: whatever you're
running under oslo.messaging (mostly RabbitMQ). Making the number of
RPC workers configurable for each service sounds like the best approach
to me.
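
For illustration, concurrency at this level is bounded by the RPC
server's executor pool, so the idea would look roughly like the sketch
below (this assumes the eventlet executor; executor_thread_pool_size is
an oslo.messaging option, but its name and default may differ between
releases, so verify against yours):

    # Sketch only: bounding per-service concurrency via the RPC
    # executor pool rather than queuing inside Cinder code.
    import oslo_messaging as messaging
    from oslo_config import cfg

    CONF = cfg.CONF
    transport = messaging.get_transport(CONF)
    target = messaging.Target(topic='cinder-volume', server='host1')
    # The eventlet executor processes at most
    # CONF.executor_thread_pool_size requests at a time; excess
    # messages simply wait in RabbitMQ until a worker frees up.
    server = messaging.get_rpc_server(transport, target, endpoints=[],
                                      executor='eventlet')
    server.start()

(endpoints elided; a real service would pass its manager there.)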

>> Why do we need it for Cinder? IMO, it could help us address the
>> following issues:
>>
>>    - Creation of N volumes at the same time significantly increases
>>    resource usage by the cinder-volume service. The image caching
>>    feature [2] could help us a bit when we create a volume from an
>>    image, but we still have to upload N images to the volume backend
>>    at the same time.
> This is an example where we are in the data path.
>
>>    - Deletion of N volumes in parallel. Usually it's not a very hard
>>    task for Cinder, but if you have to delete 100+ volumes at once,
>>    you can hit various issues with DB connections, CPU and memory
>>    usage. In the case of LVM, it may also use the 'dd' command to
>>    clean up volumes.
> This is a case where it is a backend limitation and should be handled by
> the drivers.
>
> I know some people say that deletion and attaching have problems when
> a lot of them are requested of the c-vol nodes and that Cinder cannot
> handle the workload properly, but in my experience these cases are
> always due to suboptimal Cinder configuration, like a DB connection
> pool that is too small, which makes operations fight for a DB
> connection and creates big delays in completing operations.
>
>>    - It will provide some kind of load balancing in HA mode: if one
>>    cinder-volume process is busy with current operations, it will not
>>    pick up messages from RabbitMQ, and another cinder-volume service
>>    will do so.
> I don't understand what you mean by this.  Do you mean that the Cinder
> service will stop listening to the message queue when it reaches a
> certain workload of "heavy" operations?  Then wouldn't it also stop
> processing "light" operations?
>
>>    - From the user's perspective, it is better to create/delete N
>>    volumes a bit more slowly than to fail after X volumes have been
>>    created/deleted.
> I agree, it's better not to fail.  :-)
>
> Cheers,
> Gorka.
>
>>
>> [1]
>> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
>> [2]
>> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>>
>> Regards,
>> Ivan Kolodyazhny,
>> http://blog.e0ne.info/



Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-24 Thread Gorka Eguileor
On 23/05, Ivan Kolodyazhny wrote:
> Hi developers and operators,
> I would like to get your feedback on my idea before I start work on a
> spec.
> 
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.

Hi,

First I want to say that I think this is a good idea because I know this
message will get diluted once I start with my mumbling.  ;-)

The first thing we should make controllable is the number of workers
per service: we currently only allow setting it for the API nodes, and
all other nodes use a default of 1000.  I posted a patch [1] to allow
this, and it's been sitting there for the last 3 months.  :'-(

As I see it, not all of the mentioned problems are equal; the main
distinction comes from Cinder being not only in the control path but
also in the data path. As a result, some of the issues are
backend-specific limitations that I believe should be addressed
differently in the spec.

For operations where Cinder is in the control path we should be
limiting/queuing operations in the Cinder core code (for example the
manager), whereas when the limitation only applies to some drivers it
should be addressed by the drivers themselves.  The spec should still
provide a clear mechanism/pattern to solve it in the drivers, so that
all drivers can use a similar pattern; that consistency makes the code
easier to review and maintain.

The queuing should preserve the order of arrival of operations, which
file locks from Oslo concurrency and Tooz don't do.
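
For illustration, a minimal ordering-preserving limiter (a sketch only,
not a proposed implementation; all names here are placeholders) could
look like this:

    # Grant at most max_concurrent slots, strictly in arrival order --
    # the property that plain file locks don't give us.
    import collections
    import threading

    class FifoLimiter(object):
        def __init__(self, max_concurrent):
            self._lock = threading.Lock()
            self._waiters = collections.deque()
            self._free = max_concurrent

        def acquire(self):
            with self._lock:
                if self._free > 0 and not self._waiters:
                    self._free -= 1
                    return
                event = threading.Event()
                self._waiters.append(event)
            event.wait()  # block until a slot is handed to us

        def release(self):
            with self._lock:
                if self._waiters:
                    # Hand the slot directly to the oldest waiter, so
                    # arrival order is preserved.
                    self._waiters.popleft().set()
                else:
                    self._free += 1

A caller would wrap each "heavy" operation in acquire()/release(); in
an eventlet-based service the threading primitives are monkey-patched,
so the same shape applies there.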

> 
> Why do we need it for Cinder? IMO, it could help us address the
> following issues:
> 
>    - Creation of N volumes at the same time significantly increases
>    resource usage by the cinder-volume service. The image caching
>    feature [2] could help us a bit when we create a volume from an
>    image, but we still have to upload N images to the volume backend
>    at the same time.

This is an example where we are in the data path.

>    - Deletion of N volumes in parallel. Usually it's not a very hard
>    task for Cinder, but if you have to delete 100+ volumes at once,
>    you can hit various issues with DB connections, CPU and memory
>    usage. In the case of LVM, it may also use the 'dd' command to
>    clean up volumes.

This is a case where it is a backend limitation and should be handled by
the drivers.

I know some people say that deletion and attaching have problems when a
lot of them are requested of the c-vol nodes and that Cinder cannot
handle the workload properly, but in my experience these cases are
always due to suboptimal Cinder configuration, like a DB connection
pool that is too small, which makes operations fight for a DB
connection and creates big delays in completing operations.

>    - It will provide some kind of load balancing in HA mode: if one
>    cinder-volume process is busy with current operations, it will not
>    pick up messages from RabbitMQ, and another cinder-volume service
>    will do so.

I don't understand what you mean by this.  Do you mean that the Cinder
service will stop listening to the message queue when it reaches a
certain workload of "heavy" operations?  Then wouldn't it also stop
processing "light" operations?

>    - From the user's perspective, it is better to create/delete N
>    volumes a bit more slowly than to fail after X volumes have been
>    created/deleted.

I agree, it's better not to fail.  :-)

Cheers,
Gorka.

> 
> 
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
> 
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/





Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-24 Thread Duncan Thomas
On 24 May 2016 at 05:46, John Griffith  wrote:

>
> Just curious about a couple of things: Is this attempting to solve a
> problem in the actual Cinder Volume Service, or is this trying to
> solve problems with backends that can't keep up and deliver resources
> under heavy load?
>

I would posit that no backend can cope with infinite load, and with
things like A/A c-vol on the way, Cinder is likely to become efficient
enough that it will start stressing more backends. It is certainly
worth thinking about.

We have more than enough backend technologies with different but
entirely reasonable metadata performance limitations, and several
pieces of code outside the backends' control (for example FC zoning and
iSCSI multipath) seem to have clear scalability issues.

I think I share the worry that putting limits everywhere becomes a
band-aid that avoids fixing deeper problems, whether in Cinder or in
the backends themselves.


> I get the copy-image-to-volume case; that's a special case that
> certainly does impact Cinder services and the Cinder node itself, but
> there's already throttling going on there, at least in terms of the IO
> allowed.
>

Which is probably not the behaviour we want: queuing generally gives a
better user experience than fair sharing, since beyond a certain point
*nothing* gets completed in a reasonable amount of time, even under
only moderate load.
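
For reference, the throttle in question is (I believe) the cgroup-based
copy throttle in cinder.conf; something like the following, though
check the exact option against your release:

    [DEFAULT]
    # Bytes/second allowed per volume copy operation; 0 disables it.
    volume_copy_bps_limit = 104857600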

It also seems to be a very common thing for customers to try to boot 300
instances from volume as an early smoke test of a new cloud deployment.
I've no idea why, but I've seen it many times, and others have reported the
same thing. While I'm not entirely convinced it is a reasonable test, we
should probably make sure that the usual behaviour for this is not horrible
breakage. The image cache, if turned on, certainly helps massively with
this, but I think some form of queuing is a good thing for both image cache
work and probably backups too eventually.


> Also, I'm curious... would the existing API Rate Limit configuration
> achieve the same sort of thing you want to do here?  Granted it's not
> selective, but maybe it's worth mentioning.
>

Certainly worth mentioning, since I'm not sure how many people are
aware it exists. My experience of it was that it was too limited to be
actually useful: it only rate-limits a single process, and we've
usually got more than enough API workers across multiple nodes that
very significant loads are possible before tripping any reasonable
per-process rate limit.
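
For anyone who hasn't tried it, it is wired up through the API paste
pipeline, roughly like the following (from memory, so treat the exact
factory path as approximate and verify against your release):

    [filter:ratelimit]
    paste.filter_factory = cinder.api.v2.limits:RateLimitingMiddleware.factory

and then adding 'ratelimit' to the desired pipeline ahead of the API
application.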



> If we did do something like this I would like to see it implemented
> as a driver config; but that wouldn't help if the problem lies in the
> Rabbit or RPC space.  That brings me back to wondering exactly where
> we want to solve problems, and exactly which ones.  If delete is
> causing problems like you describe, I'd suspect we have an issue in
> our DB code (too many calls to start with) and that we've got some
> overhead elsewhere that should be eradicated.  Delete is a super
> simple operation on the Cinder side of things (and on most backends),
> so I'm a bit freaked out thinking that it's taxing resources heavily.
>

I agree we should definitely do more analysis of where the breakage
occurs before adding many limits or queues. The image copy path is an
easy-to-analyse first case: iostat can tell you exactly where the
problem is.

Using the fake backend and a large number of API workers / nodes with a
pathological load trivially finds breakages currently, though exactly
where the issues are depends on which code version you're running. The
compare-and-update changes (aka the race avoidance patches) have
removed a bunch of these, but they seem to have led to a significant
increase in DB load, which makes it easier to hit DB timeouts and other
issues.

As for delete being resource heavy, our reference driver provides a
pathological example with the secure delete code. Now that we've got a high
degree of confidence in the LVM thin code (specifically, I'm not aware of
any instances where it is worse than the LVM-thick code and I don't see any
open bugs that disagree), is it time to dump the LVM-thick support
completely?
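
(For context, the secure delete path is essentially an expensive dd
over the whole volume; the shape is something like the line below,
with the exact flags and device path depending on configuration:

    dd if=/dev/zero of=/dev/mapper/cinder--volumes-<volume> bs=1M \
        count=<size_in_mb> oflag=direct

which is why it is such a pathological case for mass deletes.)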


-- 
Duncan Thomas


Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-23 Thread John Griffith
On Mon, May 23, 2016 at 8:32 AM, Ivan Kolodyazhny  wrote:

> Hi developers and operators,
> I would like to get your feedback on my idea before I start work on a
> spec.
>
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.
>
> Why do we need it for Cinder? IMO, it could help us address the
> following issues:
>
>    - Creation of N volumes at the same time significantly increases
>    resource usage by the cinder-volume service. The image caching
>    feature [2] could help us a bit when we create a volume from an
>    image, but we still have to upload N images to the volume backend
>    at the same time.
>    - Deletion of N volumes in parallel. Usually it's not a very hard
>    task for Cinder, but if you have to delete 100+ volumes at once,
>    you can hit various issues with DB connections, CPU and memory
>    usage. In the case of LVM, it may also use the 'dd' command to
>    clean up volumes.
>    - It will provide some kind of load balancing in HA mode: if one
>    cinder-volume process is busy with current operations, it will not
>    pick up messages from RabbitMQ, and another cinder-volume service
>    will do so.
>    - From the user's perspective, it is better to create/delete N
>    volumes a bit more slowly than to fail after X volumes have been
>    created/deleted.
>
>
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/
>
Just curious about a couple of things: Is this attempting to solve a
problem in the actual Cinder Volume Service, or is this trying to solve
problems with backends that can't keep up and deliver resources under
heavy load?  I get the copy-image-to-volume case; that's a special case
that certainly does impact Cinder services and the Cinder node itself,
but there's already throttling going on there, at least in terms of the
IO allowed.

Also, I'm curious... would the existing API Rate Limit configuration
achieve the same sort of thing you want to do here?  Granted it's not
selective, but maybe it's worth mentioning.

If we did do something like this I would like to see it implemented as
a driver config; but that wouldn't help if the problem lies in the
Rabbit or RPC space.  That brings me back to wondering exactly where we
want to solve problems, and exactly which ones.  If delete is causing
problems like you describe, I'd suspect we have an issue in our DB code
(too many calls to start with) and that we've got some overhead
elsewhere that should be eradicated.  Delete is a super simple
operation on the Cinder side of things (and on most backends), so I'm a
bit freaked out thinking that it's taxing resources heavily.

Thanks,
John


Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-23 Thread Sean McGinnis
On Mon, May 23, 2016 at 05:32:45PM +0300, Ivan Kolodyazhny wrote:
> Hi developers and operators,
> I would like to get your feedback on my idea before I start work on a
> spec.
> 
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.

I do think it would be worth having some sort of operation throttling.

So are you thinking of something more like a "maximum number of volume
builds to run concurrently"? As you point out below, there are other
operations besides creation that can cause problems when run in bulk.

I do like the idea!

> 
> Why do we need it for Cinder? IMO, it could help us address the
> following issues:
> 
>    - Creation of N volumes at the same time significantly increases
>    resource usage by the cinder-volume service. The image caching
>    feature [2] could help us a bit when we create a volume from an
>    image, but we still have to upload N images to the volume backend
>    at the same time.
>    - Deletion of N volumes in parallel. Usually it's not a very hard
>    task for Cinder, but if you have to delete 100+ volumes at once,
>    you can hit various issues with DB connections, CPU and memory
>    usage. In the case of LVM, it may also use the 'dd' command to
>    clean up volumes.
>    - It will provide some kind of load balancing in HA mode: if one
>    cinder-volume process is busy with current operations, it will not
>    pick up messages from RabbitMQ, and another cinder-volume service
>    will do so.
>    - From the user's perspective, it is better to create/delete N
>    volumes a bit more slowly than to fail after X volumes have been
>    created/deleted.
> 
> 
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
> 
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/





Re: [openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-23 Thread Alex Meade
This sounds like a good idea to me. The message queue doesn't handle
this, since we just read everything off it immediately anyway. I have
seen issues where customers have to write scripts that build 5 volumes,
sleep, then build more until they reach 100+ volumes, just because a
Cinder volume service would otherwise clobber itself.

-Alex

On Mon, May 23, 2016 at 10:32 AM, Ivan Kolodyazhny  wrote:

> Hi developers and operators,
> I would like to get your feedback on my idea before I start work on a
> spec.
>
> In Nova, we've got the max_concurrent_builds option [1] to set the
> 'Maximum number of instance builds to run concurrently' per compute
> node. There is no equivalent in Cinder.
>
> Why do we need it for Cinder? IMO, it could help us address the
> following issues:
>
>    - Creation of N volumes at the same time significantly increases
>    resource usage by the cinder-volume service. The image caching
>    feature [2] could help us a bit when we create a volume from an
>    image, but we still have to upload N images to the volume backend
>    at the same time.
>    - Deletion of N volumes in parallel. Usually it's not a very hard
>    task for Cinder, but if you have to delete 100+ volumes at once,
>    you can hit various issues with DB connections, CPU and memory
>    usage. In the case of LVM, it may also use the 'dd' command to
>    clean up volumes.
>    - It will provide some kind of load balancing in HA mode: if one
>    cinder-volume process is busy with current operations, it will not
>    pick up messages from RabbitMQ, and another cinder-volume service
>    will do so.
>    - From the user's perspective, it is better to create/delete N
>    volumes a bit more slowly than to fail after X volumes have been
>    created/deleted.
>
>
> [1]
> https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
> [2]
> https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
>
> Regards,
> Ivan Kolodyazhny,
> http://blog.e0ne.info/
>


[openstack-dev] [openstack-operators][cinder] max_concurrent_builds in Cinder

2016-05-23 Thread Ivan Kolodyazhny
Hi developers and operators,
I would like to get your feedback on my idea before I start work on a
spec.

In Nova, we've got the max_concurrent_builds option [1] to set the
'Maximum number of instance builds to run concurrently' per compute
node. There is no equivalent in Cinder.
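
For reference, Nova enforces this option with a semaphore around the
build; a minimal sketch of the same pattern for Cinder (the option and
function names here are hypothetical, not actual Nova or Cinder code)
could look like:

    # Sketch: a Nova-style concurrency cap for volume creation.
    import eventlet.semaphore
    from oslo_config import cfg

    opts = [cfg.IntOpt('max_concurrent_creates', default=10,
                       help='Hypothetical: max volume creations to '
                            'run concurrently on this service.')]
    CONF = cfg.CONF
    CONF.register_opts(opts)

    _create_semaphore = eventlet.semaphore.Semaphore(
        CONF.max_concurrent_creates)

    def create_volume(context, volume):
        # Excess requests wait here instead of overloading the node.
        with _create_semaphore:
            _do_create_volume(context, volume)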

Why do we need it for Cinder? IMO, it could help us address the
following issues:

   - Creation of N volumes at the same time significantly increases
   resource usage by the cinder-volume service. The image caching
   feature [2] could help us a bit when we create a volume from an
   image, but we still have to upload N images to the volume backend at
   the same time.
   - Deletion of N volumes in parallel. Usually it's not a very hard
   task for Cinder, but if you have to delete 100+ volumes at once, you
   can hit various issues with DB connections, CPU and memory usage. In
   the case of LVM, it may also use the 'dd' command to clean up
   volumes.
   - It will provide some kind of load balancing in HA mode: if one
   cinder-volume process is busy with current operations, it will not
   pick up messages from RabbitMQ, and another cinder-volume service
   will do so.
   - From the user's perspective, it is better to create/delete N
   volumes a bit more slowly than to fail after X volumes have been
   created/deleted.


[1]
https://github.com/openstack/nova/blob/283da2bbb74d0a131a66703516d16698a05817c7/nova/conf/compute.py#L161-L163
[2]
https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html

Regards,
Ivan Kolodyazhny,
http://blog.e0ne.info/