Re: [openstack-dev] [Neutron] Quota enforcement
On 06/16/2015 11:58 PM, Carl Baldwin wrote:
> On Tue, Jun 16, 2015 at 5:17 PM, Kevin Benton wrote:
> > There seems to be confusion on what causes deadlocks. Can one of you explain to me how an optimistic locking strategy (a.k.a. compare-and-swap) results in deadlocks?
> >
> > Take the following example where two workers want to update a record:
> >
> > Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
> > Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"
> >
> > Then each worker checks the count of rows affected by the query. The one that modified 1 gets to proceed, the one that modified 0 must retry.
>
> Here's my understanding: In a Galera cluster, if the two are run in parallel on different masters, then the second one gets a write certification failure after believing that it had succeeded *and* reading that 1 row was modified. The transaction -- when it was all prepared for commit -- is aborted because the server finds out from the other masters that it doesn't really work. This failure is manifested as a deadlock error from the server that lost. The code must catch this "deadlock" error and retry the entire thing.

Yes, Carl, you are correct.

> I just learned about Mike Bayer's DBFacade from this thread which will apparently make the db behave as an active/passive for writes which should clear this up. This is new information to me.

The two things are actually unrelated. You can think of the DBFacade work -- specifically the @reader and @writer decorators -- as a slicker version of the "use_slave=True" keyword arguments that many DB API functions in Nova have, which send SQL SELECT statements that can tolerate some slave lag to a slave DB node. In Galera, however, there are no master and slave nodes. They are all "masters", because they all represent exactly the same data on disk, since Galera uses synchronous replication [1].
So the @writer and @reader decorators of DBFacade are not actually going to be useful for separating reads and writes to Galera nodes in the same way that that functionality is useful in traditional MySQL master/slave replication setups. Best, -jay [1] Technically, it's not synchronous, which implies some sort of distributed locking is used to protect the order of writes, and Galera does not do that. But, for all intents and purposes, the behaviour of the replication is synchronous. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
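[Editorial note: the compare-and-swap pattern Kevin describes above, including the rows-affected check, can be sketched as follows. This is illustrative only; the `items` table mirrors the thread's example, SQLite stands in for the real database, and on Galera the losing worker would additionally need to catch the deadlock error Carl describes.]

```python
# Sketch of the optimistic compare-and-swap discussed in the thread.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")  # stand-in for the real DB
meta = sa.MetaData()
items = sa.Table(
    "items", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("value", sa.Integer),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(items.insert().values(id=1, value=10))

def cas_update(conn, item_id, old, new):
    """UPDATE ... WHERE value = old; succeed only if exactly 1 row changed."""
    result = conn.execute(
        items.update()
        .where(sa.and_(items.c.id == item_id, items.c.value == old))
        .values(value=new)
    )
    return result.rowcount == 1  # 0 rows means somebody else won; retry

with engine.begin() as conn:
    won = cas_update(conn, 1, old=10, new=11)   # first writer wins
    lost = cas_update(conn, 1, old=10, new=12)  # stale predicate, loses
print(won, lost)  # True False
```

The second call fails purely through the rows-affected count, with no lock held, which is the "non-locking" behaviour the thread contrasts with SELECT ... FOR UPDATE.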
Re: [openstack-dev] [Neutron] Quota enforcement
Ok. So if I understand it correctly, every update operation we do could result in a deadlock then? Or is it just the ones with "where" criteria that become invalid?

On Tue, Jun 16, 2015 at 8:58 PM, Carl Baldwin wrote:
> On Tue, Jun 16, 2015 at 5:17 PM, Kevin Benton wrote:
> > There seems to be confusion on what causes deadlocks. Can one of you explain to me how an optimistic locking strategy (a.k.a. compare-and-swap) results in deadlocks?
> >
> > Take the following example where two workers want to update a record:
> >
> > Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
> > Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"
> >
> > Then each worker checks the count of rows affected by the query. The one that modified 1 gets to proceed, the one that modified 0 must retry.
>
> Here's my understanding: In a Galera cluster, if the two are run in parallel on different masters, then the second one gets a write certification failure after believing that it had succeeded *and* reading that 1 row was modified. The transaction -- when it was all prepared for commit -- is aborted because the server finds out from the other masters that it doesn't really work. This failure is manifested as a deadlock error from the server that lost. The code must catch this "deadlock" error and retry the entire thing.
>
> I just learned about Mike Bayer's DBFacade from this thread which will apparently make the db behave as an active/passive for writes which should clear this up. This is new information to me.
>
> I hope my understanding is sound and that it makes sense.
>
> Carl

--
Kevin Benton
Re: [openstack-dev] [Neutron] Quota enforcement
On Tue, Jun 16, 2015 at 5:17 PM, Kevin Benton wrote:
> There seems to be confusion on what causes deadlocks. Can one of you explain to me how an optimistic locking strategy (a.k.a. compare-and-swap) results in deadlocks?
>
> Take the following example where two workers want to update a record:
>
> Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
> Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"
>
> Then each worker checks the count of rows affected by the query. The one that modified 1 gets to proceed, the one that modified 0 must retry.

Here's my understanding: In a Galera cluster, if the two are run in parallel on different masters, then the second one gets a write certification failure after believing that it had succeeded *and* reading that 1 row was modified. The transaction -- when it was all prepared for commit -- is aborted because the server finds out from the other masters that it doesn't really work. This failure is manifested as a deadlock error from the server that lost. The code must catch this "deadlock" error and retry the entire thing.

I just learned about Mike Bayer's DBFacade from this thread which will apparently make the db behave as an active/passive for writes which should clear this up. This is new information to me.

I hope my understanding is sound and that it makes sense.

Carl
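[Editorial note: the "catch this 'deadlock' error and retry the entire thing" step is what oslo.db automates with its `wrap_db_retry` decorator (`retry_on_deadlock=True`). A dependency-free sketch of the same idea follows; the `DBDeadlock` class here is a stand-in for `oslo_db.exception.DBDeadlock`, and `update_quota_usage` is an invented example function.]

```python
# Retry-the-whole-transaction pattern for Galera certification failures.
import functools
import random
import time

class DBDeadlock(Exception):
    """Stand-in for oslo_db.exception.DBDeadlock (certification failure)."""

def retry_on_deadlock(max_retries=5, base_delay=0.0):
    """Re-run the whole transaction when the server reports a deadlock."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return fn(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise
                    # Jittered backoff so the losers don't collide again.
                    time.sleep(base_delay * random.random())
        return wrapper
    return decorator

attempts = []

@retry_on_deadlock(max_retries=3)
def update_quota_usage():
    attempts.append(1)
    if len(attempts) < 3:      # first two tries "lose" certification
        raise DBDeadlock()
    return "committed"

print(update_quota_usage(), len(attempts))  # committed 3
```

The key point from Carl's explanation is that the retry must wrap the *entire* transaction, not just the failing statement, since the whole write set was rejected.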
Re: [openstack-dev] [Neutron] Quota enforcement
There seems to be confusion on what causes deadlocks. Can one of you explain to me how an optimistic locking strategy (a.k.a. compare-and-swap) results in deadlocks?

Take the following example where two workers want to update a record:

Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"

Then each worker checks the count of rows affected by the query. The one that modified 1 gets to proceed, the one that modified 0 must retry.

Do those statements also risk throwing deadlock exceptions? If so, why? I haven't seen a clear article explaining deadlock conditions not related to "FOR UPDATE".

On Tue, Jun 16, 2015 at 4:01 PM, Carl Baldwin wrote:
> On Tue, Jun 16, 2015 at 2:18 PM, Salvatore Orlando wrote:
> > But zzzeek (Mike Bayer) is coming to our help; as a part of his DBFacade work, we should be able to treat an active/active cluster as active/passive for writes, and active/active for reads. This means that the write set certification issue just won't show up, and the benefits of active/active clusters will still be attained for most operations (I don't think there's any doubt that SELECT operations represent the majority of all DB statements).
>
> Okay, so we stop worrying about the write certification failures? Lock for update would work as expected? That would certainly simplify the Galera concern. Maybe everyone already knew this and I have just been behind on the latest news again.
>
> > DBDeadlocks without multiple workers also suggest we should look closely at what eventlet is doing before placing the blame on pymysql. I don't think that the switch to pymysql is changing the behaviour of the database interface; I think it's changing the way in which neutron interacts with the database, thus unveiling concurrency issues that we did not spot before as we were relying on a sort of implicit locking triggered by the fact that some parts of Mysql-Python were implemented in C.
>
> ++
>
> Carl

--
Kevin Benton
Re: [openstack-dev] [Neutron] Quota enforcement
On Tue, Jun 16, 2015 at 2:18 PM, Salvatore Orlando wrote:
> But zzzeek (Mike Bayer) is coming to our help; as a part of his DBFacade work, we should be able to treat an active/active cluster as active/passive for writes, and active/active for reads. This means that the write set certification issue just won't show up, and the benefits of active/active clusters will still be attained for most operations (I don't think there's any doubt that SELECT operations represent the majority of all DB statements).

Okay, so we stop worrying about the write certification failures? Lock for update would work as expected? That would certainly simplify the Galera concern. Maybe everyone already knew this and I have just been behind on the latest news again.

> DBDeadlocks without multiple workers also suggest we should look closely at what eventlet is doing before placing the blame on pymysql. I don't think that the switch to pymysql is changing the behaviour of the database interface; I think it's changing the way in which neutron interacts with the database, thus unveiling concurrency issues that we did not spot before as we were relying on a sort of implicit locking triggered by the fact that some parts of Mysql-Python were implemented in C.

++

Carl
Re: [openstack-dev] [Neutron] Quota enforcement
Some more comments inline.

Salvatore

On 16 June 2015 at 19:00, Carl Baldwin wrote:
> On Tue, Jun 16, 2015 at 12:33 AM, Kevin Benton wrote:
> >> Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.
> >
> > I was thinking about this in the context of a lot of the fixes we have for other concurrency issues with the database. There are several exception handlers that aren't exercised in normal functional, tempest, and API tests because they require a very specific order of events between workers.
> >
> > I wonder if we could write a small shim DB driver that wraps the python one for use in tests that just makes a desired set of queries take a long time or fail in particular ways? That wouldn't require changes to the neutron code, but it might not give us the right granularity of control.
>
> Might be worth a look.

It's a solution for pretty much mocking out the DB interactions. This would work for fault injection on most neutron-server scenarios, both for RESTful and RPC interfaces, but we'll need something else to "mock" interactions with the data plane that are performed by agents. I think we already have a mock for the AMQP bus on which we shall just install hooks for injecting faults.

> >> Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations.
> >
> > I thought these were effectively broken in Galera clusters. Is that not correct?
>
> As I understand it, if two writes to two different masters end up violating some db-level constraint then the operation will cause a failure regardless of whether there is a lock.
>
> Basically, on Galera, instead of waiting for the lock, each will proceed with the transaction. Finally, on commit, a write certification will double check constraints with the rest of the cluster (with a write certification). It is at this point where Galera will fail one of them as a deadlock for violating the constraint. Hence the need to retry. To me, non-locking just means that you embrace the fact that the lock won't work and you don't bother to apply it in the first place.

This is correct. DB-level locks are broken in Galera. As Carl says, write sets are sent out for certification after a transaction is committed. So the write intent lock, or even primary key constraint violations, cannot be verified before committing the transaction. As a result you incur a write set certification failure, which is notably more expensive than an instance-level rollback, and manifests as a DBDeadlock exception to the OpenStack service.

Retrying a transaction is also a way of embracing this behaviour... you just accept the idea of having to react to write set certification failures. Non-locking approaches instead aim at avoiding write set certification failures. The downside is that, especially in high concurrency scenarios, the operation is retried many times, and this might become even more expensive than dealing with the write set certification failure.

But zzzeek (Mike Bayer) is coming to our help; as a part of his DBFacade work, we should be able to treat an active/active cluster as active/passive for writes, and active/active for reads. This means that the write set certification issue just won't show up, and the benefits of active/active clusters will still be attained for most operations (I don't think there's any doubt that SELECT operations represent the majority of all DB statements).

> If my understanding is incorrect, please set me straight.

You're already straight enough ;)

> > If you do go that route, I think you will have to contend with DBDeadlock errors when we switch to the new SQL driver anyway. From what I've observed, it seems that if someone is holding a lock on a table and you try to grab it, pymysql immediately throws a deadlock exception.
>
> I'm not familiar with pymysql to know if this is true or not. But, I'm sure that it is possible not to detect the lock at all on galera. Someone else will have to chime in to set me straight on the details.

DBDeadlocks without multiple workers also suggest we should look closely at what eventlet is doing before placing the blame on pymysql. I don't think that the switch to pymysql is changing the behaviour of the database interface; I think it's changing the way in which neutron interacts with the database, thus unveiling concurrency issues that we did not spot before as we were relying on a sort of implicit locking triggered by the fact that some parts of Mysql-Python were implemented in C.

> Carl
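[Editorial note: Kevin's shim-driver idea can also be approximated one layer up, with SQLAlchemy's engine events, which lets a test stall or count a chosen set of statements without modifying Neutron code. A minimal sketch, with an invented `quotas` table and an invented matching rule; `before_cursor_execute` is a real SQLAlchemy hook.]

```python
# Fault/delay injection for tests via a SQLAlchemy engine event listener.
import time

import sqlalchemy as sa
from sqlalchemy import event

engine = sa.create_engine("sqlite://")
slow_patterns = ["UPDATE quotas"]   # statements the test wants to stall
injected = []

@event.listens_for(engine, "before_cursor_execute")
def maybe_delay(conn, cursor, statement, parameters, context, executemany):
    if any(p in statement for p in slow_patterns):
        injected.append(statement)
        time.sleep(0.05)   # widen the race window for the worker under test

with engine.connect() as conn:
    conn.execute(sa.text("CREATE TABLE quotas (used INTEGER)"))
    conn.execute(sa.text("INSERT INTO quotas VALUES (0)"))   # unaffected
    conn.execute(sa.text("UPDATE quotas SET used = 1"))      # delayed
print(len(injected))
```

Because the hook lives on the engine rather than in the driver, the granularity-of-control concern Kevin raises is partly addressed: the test can match on statement text, parameters, or execution context.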
Re: [openstack-dev] [Neutron] Quota enforcement
On 16 June 2015 at 18:49, Carl Baldwin wrote:
> On Thu, Jun 11, 2015 at 2:45 PM, Salvatore Orlando wrote:
> > I have been then following a different approach. A set of patches, including a devref one [2], is up for review [3]. This hardly completes the job: more work is required on the testing side, both as unit and functional tests.
> >
> > As for the spec, since I honestly would like to spare myself the hassle of rewriting it, I would kindly ask our glorious drivers team if they're ok with me submitting a spec in the shorter format approved for Liberty without going through the RFE process, as the spec is however in the Kilo backlog.
>
> It took me a second read through to realize that you're talking to me among the drivers team. Personally, I'm okay with this and our currently documented policy seems to allow for this until Liberty-1.

Great!

> I just hope that this isn't an indication that we're requiring too much in this new RFE process and scaring potential filers away. I'm trying to learn how to write good RFEs, so let me give it a shot:
>
> Summary: "Need robust quota enforcement in Neutron."
>
> Further Information: "Neutron can allow exceeding the quota in certain cases. Some investigation revealed that quotas in Neutron are subject to a race where parallel requests can each check quota and find there is just enough left to fulfill its individual request. Each request proceeds to fulfillment with no more regard to the quota. When all of the requests are eventually fulfilled, we find that they have exceeded the quota."
>
> Given my current knowledge of the RFE process, that is what I would file as a bug in launchpad and tag it with 'rfe.'

The RFE process is fine and relatively simple. I was just luring somebody into giving me the exact text to put in it!

Jokes apart, I was suggesting this because, since it was a "backlog" spec, it was already assumed that it was something we wanted to have for Neutron, and we could thus skip the RFE approval step.

> > For testing I wonder what strategy you would advise for implementing functional tests. I could do some black-box testing and verify quota limits are correctly enforced. However, I would also like to go a bit white-box and also verify that reservation entries are created and removed as appropriate when a reservation is committed or cancelled.
> > Finally it would be awesome if I was able to run in the gate functional tests on multi-worker servers, and inject delays or faults to verify the system behaves correctly when it comes to quota enforcement.
>
> Full black box testing would be impossible to achieve without multiple workers, right? We've proposed adding multiple worker processes to the gate a couple of times if I recall including a recent one to .

Yeah, but Neutron was not as stable with multiple workers, and we had to revert it (I think I did the revert).

> Fixing the failures has not yet been seen as a priority.

I wonder if this is because developers are too busy bikeshedding or chasing unicorns, or because the issues we saw are mostly due to the way we run tests in the gate and are not found by operators in real deployments (another option is that operators are too afraid of neutron's unpredictability and they do not even try turning on multiple workers).

> I agree that some whitebox testing should be added. It may sound a bit double-entry to some but I don't mind, especially given the challenges around black box testing. Maybe Assaf can chime in here and set us straight.

I want white-box testing. I think it's important. Unit tests to an extent do this, but they don't test the whole functionality. On the other hand black-box testing tests the functionality, but it does not tell you whether the system is actually behaving as you expect. If it's not, it means you have a fault. And that fault will eventually emerge as a failure. So we need this kind of testing.

However, I need hooks in Neutron in order to achieve this. Like a sqlalchemy event listener that informs me of completed transactions, for instance. Or hooks to perform fault injection - like adding a delay, or altering the return value of a function. It would be good for me to know whether this is in the testing roadmap for Liberty.

> > Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.
>
> Dunno.

> > Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations. I can move to a non-locking algorithm, Jay proposed one for nova for Kilo, and I can just implement that one, but first I would like to be convinced with a decent proof (or sort of) that the extra cost deriving from collision among workers is overshadowed by the cost for having to handle a write-set certification failure
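[Editorial note: the "sqlalchemy event listener that informs me of completed transactions" Salvatore mentions could be as small as this. The `reservations` table is invented for illustration; the `commit` listener is a real SQLAlchemy ConnectionEvents hook.]

```python
# White-box test hook: observe committed transactions on an engine.
import sqlalchemy as sa
from sqlalchemy import event

engine = sa.create_engine("sqlite://")
committed = []   # the white-box test asserts against this

@event.listens_for(engine, "commit")
def on_commit(conn):
    # Fires once for each transaction committed on this engine.
    committed.append(True)

with engine.begin() as conn:   # one begin/commit pair -> one event
    conn.execute(sa.text("CREATE TABLE reservations (amount INTEGER)"))
    conn.execute(sa.text("INSERT INTO reservations VALUES (5)"))
print(len(committed))
```

A functional test could use such a listener to assert that a reservation was committed (or rolled back) exactly when the quota engine claims it was, without reaching into Neutron internals.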
Re: [openstack-dev] [Neutron] Quota enforcement
On Tue, Jun 16, 2015 at 12:33 AM, Kevin Benton wrote:
> >> Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.
>
> I was thinking about this in the context of a lot of the fixes we have for other concurrency issues with the database. There are several exception handlers that aren't exercised in normal functional, tempest, and API tests because they require a very specific order of events between workers.
>
> I wonder if we could write a small shim DB driver that wraps the python one for use in tests that just makes a desired set of queries take a long time or fail in particular ways? That wouldn't require changes to the neutron code, but it might not give us the right granularity of control.

Might be worth a look.

> >> Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations.
>
> I thought these were effectively broken in Galera clusters. Is that not correct?

As I understand it, if two writes to two different masters end up violating some db-level constraint then the operation will cause a failure regardless of whether there is a lock.

Basically, on Galera, instead of waiting for the lock, each will proceed with the transaction. Finally, on commit, a write certification will double check constraints with the rest of the cluster (with a write certification). It is at this point where Galera will fail one of them as a deadlock for violating the constraint. Hence the need to retry. To me, non-locking just means that you embrace the fact that the lock won't work and you don't bother to apply it in the first place.

If my understanding is incorrect, please set me straight.

> If you do go that route, I think you will have to contend with DBDeadlock errors when we switch to the new SQL driver anyway. From what I've observed, it seems that if someone is holding a lock on a table and you try to grab it, pymysql immediately throws a deadlock exception.

I'm not familiar with pymysql to know if this is true or not. But, I'm sure that it is possible not to detect the lock at all on galera. Someone else will have to chime in to set me straight on the details.

Carl
Re: [openstack-dev] [Neutron] Quota enforcement
On Thu, Jun 11, 2015 at 2:45 PM, Salvatore Orlando wrote:
> I have been then following a different approach. A set of patches, including a devref one [2], is up for review [3]. This hardly completes the job: more work is required on the testing side, both as unit and functional tests.
>
> As for the spec, since I honestly would like to spare myself the hassle of rewriting it, I would kindly ask our glorious drivers team if they're ok with me submitting a spec in the shorter format approved for Liberty without going through the RFE process, as the spec is however in the Kilo backlog.

It took me a second read through to realize that you're talking to me among the drivers team. Personally, I'm okay with this and our currently documented policy seems to allow for this until Liberty-1.

I just hope that this isn't an indication that we're requiring too much in this new RFE process and scaring potential filers away. I'm trying to learn how to write good RFEs, so let me give it a shot:

Summary: "Need robust quota enforcement in Neutron."

Further Information: "Neutron can allow exceeding the quota in certain cases. Some investigation revealed that quotas in Neutron are subject to a race where parallel requests can each check quota and find there is just enough left to fulfill its individual request. Each request proceeds to fulfillment with no more regard to the quota. When all of the requests are eventually fulfilled, we find that they have exceeded the quota."

Given my current knowledge of the RFE process, that is what I would file as a bug in launchpad and tag it with 'rfe.'

> For testing I wonder what strategy you would advise for implementing functional tests. I could do some black-box testing and verify quota limits are correctly enforced. However, I would also like to go a bit white-box and also verify that reservation entries are created and removed as appropriate when a reservation is committed or cancelled.
> Finally it would be awesome if I was able to run in the gate functional tests on multi-worker servers, and inject delays or faults to verify the system behaves correctly when it comes to quota enforcement.

Full black box testing would be impossible to achieve without multiple workers, right? We've proposed adding multiple worker processes to the gate a couple of times if I recall including a recent one to . Fixing the failures has not yet been seen as a priority.

I agree that some whitebox testing should be added. It may sound a bit double-entry to some but I don't mind, especially given the challenges around black box testing. Maybe Assaf can chime in here and set us straight.

> Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.

Dunno.

> Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations. I can move to a non-locking algorithm, Jay proposed one for nova for Kilo, and I can just implement that one, but first I would like to be convinced with a decent proof (or sort of) that the extra cost deriving from collision among workers is overshadowed by the cost for having to handle a write-set certification failure and retry the operation.

Do you have a reference describing the algorithm Jay proposed?

> Please advise.
>
> Regards,
> Salvatore
>
> [1] http://specs.openstack.org/openstack/neutron-specs/specs/kilo-backlog/better-quotas.html
> [2] https://review.openstack.org/#/c/190798/
> [3] https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/better-quotas,n,z
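[Editorial note: the non-locking reservation algorithm under discussion is essentially the thread's compare-and-swap example applied to a quota usage row: read the current usage, then update it only if it is still unchanged, retrying on a lost race. A rough sketch with an invented schema follows; this is an illustration of the general technique, not Jay's actual Nova patch.]

```python
# Non-locking quota reservation sketch: compare-and-swap on the usage row.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
usage = sa.Table(
    "quota_usage", meta,
    sa.Column("tenant", sa.String, primary_key=True),
    sa.Column("used", sa.Integer),
)
meta.create_all(engine)
with engine.begin() as conn:
    conn.execute(usage.insert().values(tenant="t1", used=8))

def reserve(conn, tenant, amount, limit):
    """Try once; the caller retries on False (lost race), per the thread."""
    current = conn.execute(
        sa.select(usage.c.used).where(usage.c.tenant == tenant)
    ).scalar_one()
    if current + amount > limit:
        raise RuntimeError("quota exceeded")
    res = conn.execute(
        usage.update()
        .where(sa.and_(usage.c.tenant == tenant, usage.c.used == current))
        .values(used=current + amount)
    )
    return res.rowcount == 1   # 0 rows: another worker moved `used`; retry

with engine.begin() as conn:
    print(reserve(conn, "t1", 2, limit=10))  # True (8 + 2 <= 10)
```

No row lock is ever requested, so there is nothing for Galera to certify away beyond the UPDATE itself; the cost moves into the retry loop, which is exactly the trade-off Salvatore asks to see quantified.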
Re: [openstack-dev] [Neutron] Quota enforcement
> I would kindly ask our glorious drivers team if they're ok with me submitting a spec in the shorter format approved for Liberty without going through the RFE process, as the spec is however in the Kilo backlog.

+1!

> Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.

I was thinking about this in the context of a lot of the fixes we have for other concurrency issues with the database. There are several exception handlers that aren't exercised in normal functional, tempest, and API tests because they require a very specific order of events between workers.

I wonder if we could write a small shim DB driver that wraps the python one for use in tests that just makes a desired set of queries take a long time or fail in particular ways? That wouldn't require changes to the neutron code, but it might not give us the right granularity of control.

> Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations.

I thought these were effectively broken in Galera clusters. Is that not correct?

If you do go that route, I think you will have to contend with DBDeadlock errors when we switch to the new SQL driver anyway. From what I've observed, it seems that if someone is holding a lock on a table and you try to grab it, pymysql immediately throws a deadlock exception.

Cheers,
Kevin Benton

On Thu, Jun 11, 2015 at 1:45 PM, Salvatore Orlando wrote:
> Aloha!
>
> As you know I pushed spec [1] during the Kilo lifecycle, but given the lazy procrastinator that I am, I did not manage to complete it in time for the release.
>
> This actually gave me a chance to realise that the spec that I pushed and had approved did not make a lot of sense. Even worse, there were some false claims especially when it comes to active-active DB clusters such as mysql galera.
>
> Thankfully nobody bothered to look at that - possibly because it renders horribly in HTML - and that spared me a public shaming.
>
> I have been then following a different approach. A set of patches, including a devref one [2], is up for review [3]. This hardly completes the job: more work is required on the testing side, both as unit and functional tests.
>
> As for the spec, since I honestly would like to spare myself the hassle of rewriting it, I would kindly ask our glorious drivers team if they're ok with me submitting a spec in the shorter format approved for Liberty without going through the RFE process, as the spec is however in the Kilo backlog.
>
> For testing I wonder what strategy you would advise for implementing functional tests. I could do some black-box testing and verify quota limits are correctly enforced. However, I would also like to go a bit white-box and also verify that reservation entries are created and removed as appropriate when a reservation is committed or cancelled.
> Finally it would be awesome if I was able to run in the gate functional tests on multi-worker servers, and inject delays or faults to verify the system behaves correctly when it comes to quota enforcement.
>
> Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.
>
> Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations. I can move to a non-locking algorithm, Jay proposed one for nova for Kilo, and I can just implement that one, but first I would like to be convinced with a decent proof (or sort of) that the extra cost deriving from collision among workers is overshadowed by the cost for having to handle a write-set certification failure and retry the operation.
>
> Please advise.
>
> Regards,
> Salvatore
>
> [1] http://specs.openstack.org/openstack/neutron-specs/specs/kilo-backlog/better-quotas.html
> [2] https://review.openstack.org/#/c/190798/
> [3] https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/better-quotas,n,z

--
Kevin Benton
[openstack-dev] [Neutron] Quota enforcement
Aloha!

As you know I pushed spec [1] during the Kilo lifecycle, but given the lazy procrastinator that I am, I did not manage to complete it in time for the release.

This actually gave me a chance to realise that the spec that I pushed and had approved did not make a lot of sense. Even worse, there were some false claims especially when it comes to active-active DB clusters such as mysql galera.

Thankfully nobody bothered to look at that - possibly because it renders horribly in HTML - and that spared me a public shaming.

I have been then following a different approach. A set of patches, including a devref one [2], is up for review [3]. This hardly completes the job: more work is required on the testing side, both as unit and functional tests.

As for the spec, since I honestly would like to spare myself the hassle of rewriting it, I would kindly ask our glorious drivers team if they're ok with me submitting a spec in the shorter format approved for Liberty without going through the RFE process, as the spec is however in the Kilo backlog.

For testing I wonder what strategy you would advise for implementing functional tests. I could do some black-box testing and verify quota limits are correctly enforced. However, I would also like to go a bit white-box and also verify that reservation entries are created and removed as appropriate when a reservation is committed or cancelled.

Finally it would be awesome if I was able to run in the gate functional tests on multi-worker servers, and inject delays or faults to verify the system behaves correctly when it comes to quota enforcement.

Do these kinds of test even make sense? And are they feasible at all? I doubt we have any framework for injecting anything in neutron code under test.

Finally, please note I am using DB-level locks rather than non-locking algorithms for making reservations. I can move to a non-locking algorithm, Jay proposed one for nova for Kilo, and I can just implement that one, but first I would like to be convinced with a decent proof (or sort of) that the extra cost deriving from collision among workers is overshadowed by the cost for having to handle a write-set certification failure and retry the operation.

Please advise.

Regards,
Salvatore

[1] http://specs.openstack.org/openstack/neutron-specs/specs/kilo-backlog/better-quotas.html
[2] https://review.openstack.org/#/c/190798/
[3] https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/better-quotas,n,z
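[Editorial note: for contrast with the non-locking alternative, the DB-level-lock approach Salvatore describes pins the usage row with SELECT ... FOR UPDATE for the duration of the transaction. A rough sketch with an invented schema follows; note this is exactly the pattern the thread says Galera cannot honor across nodes, since the lock is only enforced locally and conflicts surface later as certification failures. SQLite below simply ignores FOR UPDATE.]

```python
# Pessimistic quota reservation sketch: SELECT ... FOR UPDATE on the usage row.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
usage = sa.Table(
    "quota_usage", meta,
    sa.Column("tenant", sa.String, primary_key=True),
    sa.Column("used", sa.Integer),
)
meta.create_all(engine)
with engine.begin() as conn:
    conn.execute(usage.insert().values(tenant="t1", used=8))

def reserve_locked(conn, tenant, amount, limit):
    # On a single MySQL master this blocks rival writers until commit;
    # on Galera the write intent lock is local to one node only.
    current = conn.execute(
        sa.select(usage.c.used)
        .where(usage.c.tenant == tenant)
        .with_for_update()
    ).scalar_one()
    if current + amount > limit:
        raise RuntimeError("quota exceeded")
    conn.execute(
        usage.update()
        .where(usage.c.tenant == tenant)
        .values(used=current + amount)
    )
    return current + amount

with engine.begin() as conn:
    print(reserve_locked(conn, "t1", 2, limit=10))  # 10
```

The check-then-update pair is safe here only because the row stays locked between the two statements, which is precisely the guarantee the thread says evaporates on a multi-master Galera cluster.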