Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-17 Thread Jay Pipes

On 06/16/2015 11:58 PM, Carl Baldwin wrote:

On Tue, Jun 16, 2015 at 5:17 PM, Kevin Benton  wrote:

There seems to be confusion on what causes deadlocks. Can one of you explain
to me how an optimistic locking strategy (a.k.a. compare-and-swap)  results
in deadlocks?

Take the following example where two workers want to update a record:

Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"

Then each worker checks the count of rows affected by the query. The one
that modified 1 gets to proceed, the one that modified 0 must retry.


Here's my understanding:  In a Galera cluster, if the two are run in
parallel on different masters, then the second one gets a write
certification failure after believing that it had succeeded *and*
reading that 1 row was modified.  The transaction -- when it was all
prepared for commit -- is aborted because the server finds out from
the other masters that it doesn't really work.  This failure is
manifested as a deadlock error from the server that lost.  The code
must catch this "deadlock" error and retry the entire thing.


Yes, Carl, you are correct.


I just learned about Mike Bayer's DBFacade from this thread, which will
apparently make the db behave as active/passive for writes; that should
clear this up.  This is new information to me.


The two things are actually unrelated. You can think of the DBFacade 
work -- specifically the @reader and @writer decorators -- as a slicker 
version of the "use_slave=True" keyword arguments that many DB API 
functions in Nova have, which send SQL SELECT statements that can 
tolerate some slave lag to a slave DB node.
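For readers unfamiliar with the decorators: this work eventually shipped in
oslo.db as "enginefacade". A rough sketch of the usage, with a hypothetical
Item model, and with the details treated as illustrative rather than a
stable API description:

    from oslo_db.sqlalchemy import enginefacade

    # The context class must be made transaction-aware first.
    @enginefacade.transaction_context_provider
    class Context(object):
        pass

    @enginefacade.reader
    def get_item(context, item_id):
        # Runs inside a read-oriented transaction/session.
        return context.session.query(Item).get(item_id)

    @enginefacade.writer
    def set_item_value(context, item_id, value):
        # Runs inside a write-oriented transaction/session.
        context.session.query(Item).filter_by(id=item_id).\
            update({'value': value})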


In Galera, however, there are no master and slave nodes. They are all 
"masters", because they all represent exactly the same data on disk, 
since Galera uses synchronous replication [1]. So the @writer and 
@reader decorators of DBFacade are not actually going to be useful for 
separating reads and writes to Galera nodes in the same way that that 
functionality is useful in traditional MySQL master/slave replication 
setups.


Best,
-jay

[1] Technically, it's not synchronous: true synchronous replication implies 
some sort of distributed locking is used to protect the order of writes, and 
Galera does not do that. But, for all intents and purposes, the behaviour of 
the replication is synchronous.




Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-17 Thread Kevin Benton
Ok. So if I understand it correctly, every update operation we do could
result in a deadlock then? Or is it just the ones whose "where" criteria
became invalid?

On Tue, Jun 16, 2015 at 8:58 PM, Carl Baldwin  wrote:

> On Tue, Jun 16, 2015 at 5:17 PM, Kevin Benton  wrote:
> > There seems to be confusion on what causes deadlocks. Can one of you
> explain
> > to me how an optimistic locking strategy (a.k.a. compare-and-swap)
> results
> > in deadlocks?
> >
> > Take the following example where two workers want to update a record:
> >
> > Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
> > Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"
> >
> > Then each worker checks the count of rows affected by the query. The one
> > that modified 1 gets to proceed, the one that modified 0 must retry.
>
> Here's my understanding:  In a Galera cluster, if the two are run in
> parallel on different masters, then the second one gets a write
> certification failure after believing that it had succeeded *and*
> reading that 1 row was modified.  The transaction -- when it was all
> prepared for commit -- is aborted because the server finds out from
> the other masters that it doesn't really work.  This failure is
> manifested as a deadlock error from the server that lost.  The code
> must catch this "deadlock" error and retry the entire thing.
>
> I just learned about Mike Bayer's DBFacade from this thread, which will
> apparently make the db behave as active/passive for writes; that should
> clear this up.  This is new information to me.
>
> I hope my understanding is sound and that it makes sense.
>
> Carl
>



-- 
Kevin Benton


Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Carl Baldwin
On Tue, Jun 16, 2015 at 5:17 PM, Kevin Benton  wrote:
> There seems to be confusion on what causes deadlocks. Can one of you explain
> to me how an optimistic locking strategy (a.k.a. compare-and-swap)  results
> in deadlocks?
>
> Take the following example where two workers want to update a record:
>
> Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
> Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"
>
> Then each worker checks the count of rows affected by the query. The one
> that modified 1 gets to proceed, the one that modified 0 must retry.

Here's my understanding:  In a Galera cluster, if the two are run in
parallel on different masters, then the second one gets a write
certification failure after believing that it had succeeded *and*
reading that 1 row was modified.  The transaction -- when it was all
prepared for commit -- is aborted because the server finds out from
the other masters that it doesn't really work.  This failure is
manifested as a deadlock error from the server that lost.  The code
must catch this "deadlock" error and retry the entire thing.
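To make the pattern concrete, here is a minimal sketch of the
compare-and-swap update with the Galera-style deadlock retry, assuming a
SQLAlchemy session and a hypothetical Item model; the names and retry
budget are illustrative, not Neutron code:

    from oslo_db import exception as db_exc

    def compare_and_swap(session, item_id, old_value, new_value, retries=5):
        # Optimistic update: no FOR UPDATE lock is taken. We retry both
        # when the WHERE clause matches nothing (another worker won) and
        # when Galera's write-set certification surfaces as DBDeadlock.
        for _ in range(retries):
            try:
                with session.begin():
                    matched = session.query(Item).\
                        filter_by(id=item_id, value=old_value).\
                        update({'value': new_value})
                if matched == 1:
                    return True          # our UPDATE won; proceed
                # matched == 0: re-read the current value and try again
                old_value = session.query(Item).get(item_id).value
            except db_exc.DBDeadlock:
                continue                 # certification failure; retry all
        return False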

I just learned about Mike Bayer's DBFacade from this thread, which will
apparently make the db behave as active/passive for writes; that should
clear this up.  This is new information to me.

I hope my understanding is sound and that it makes sense.

Carl



Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Kevin Benton
There seems to be confusion on what causes deadlocks. Can one of you
explain to me how an optimistic locking strategy (a.k.a.
compare-and-swap)  results in deadlocks?

Take the following example where two workers want to update a record:

Worker1: "UPDATE items set value=newvalue1 where value=oldvalue"
Worker2: "UPDATE items set value=newvalue2 where value=oldvalue"

Then each worker checks the count of rows affected by the query. The one
that modified 1 gets to proceed, the one that modified 0 must retry.

Do those statements also risk throwing deadlock exceptions? If so, why? I
haven't seen a clear article explaining deadlock conditions not related to
"FOR UPDATE".



On Tue, Jun 16, 2015 at 4:01 PM, Carl Baldwin  wrote:

> On Tue, Jun 16, 2015 at 2:18 PM, Salvatore Orlando 
> wrote:
> > But zzzeek (Mike Bayer) is coming to our aid; as part of his DBFacade
> > work, we should be able to treat an active/active cluster as
> > active/passive for writes, and active/active for reads. This means that
> > the write set certification issue just won't show up, and the benefits
> > of active/active clusters will still be attained for most operations (I
> > don't think there's any doubt that SELECT operations represent the
> > majority of all DB statements).
>
> Okay, so we stop worrying about the write certification failures?
> Lock for update would work as expected?  That would certainly simplify
> the Galera concern.  Maybe everyone already knew this and I have just
> been behind on the latest news again.
>
> > DBDeadlocks without multiple workers also suggest we should look
> > closely at what eventlet is doing before placing the blame on pymysql.
> > I don't think that the switch to pymysql is changing the behaviour of
> > the database interface; I think it's changing the way in which neutron
> > interacts with the database, thus unveiling concurrency issues that we
> > did not spot before as we were relying on a sort of implicit locking
> > triggered by the fact that some parts of Mysql-Python were implemented
> > in C.
>
> ++
>
> Carl
>



-- 
Kevin Benton


Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Carl Baldwin
On Tue, Jun 16, 2015 at 2:18 PM, Salvatore Orlando  wrote:
> But zzzeek (Mike Bayer) is coming to our aid; as part of his DBFacade
> work, we should be able to treat an active/active cluster as active/passive
> for writes, and active/active for reads. This means that the write set
> certification issue just won't show up, and the benefits of active/active
> clusters will still be attained for most operations (I don't think there's
> any doubt that SELECT operations represent the majority of all DB
> statements).

Okay, so we stop worrying about the write certification failures?
Lock for update would work as expected?  That would certainly simplify
the Galera concern.  Maybe everyone already knew this and I have just
been behind on the latest news again.

> DBDeadlocks without multiple workers also suggest we should look closely at
> what eventlet is doing before placing the blame on pymysql. I don't think
> that the switch to pymysql is changing the behaviour of the database
> interface; I think it's changing the way in which neutron interacts with the
> database, thus unveiling concurrency issues that we did not spot before as we
> were relying on a sort of implicit locking triggered by the fact that some
> parts of Mysql-Python were implemented in C.

++

Carl



Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Salvatore Orlando
Some more comments inline.

Salvatore

On 16 June 2015 at 19:00, Carl Baldwin  wrote:

> On Tue, Jun 16, 2015 at 12:33 AM, Kevin Benton  wrote:
> >>Do these kinds of test even make sense? And are they feasible at all? I
> >> doubt we have any framework for injecting anything in neutron code under
> >> test.
> >
> > I was thinking about this in the context of a lot of the fixes we have
> for
> > other concurrency issues with the database. There are several exception
> > handlers that aren't exercised in normal functional, tempest, and API
> tests
> > because they require a very specific order of events between workers.
> >
> > I wonder if we could write a small shim DB driver that wraps the python
> one
> > for use in tests that just makes a desired set of queries take a long
> time
> > or fail in particular ways? That wouldn't require changes to the neutron
> > code, but it might not give us the right granularity of control.
>
> Might be worth a look.
>

It's a solution for pretty much mocking out the DB interactions. This would
work for fault injection in most neutron-server scenarios, both for the
RESTful and RPC interfaces, but we'll need something else to "mock"
interactions with the data plane that are performed by agents. I think we
already have a mock for the AMQP bus on which we shall just install hooks
for injecting faults.


> >>Finally, please note I am using DB-level locks rather than non-locking
> >> algorithms for making reservations.
> >
> > I thought these were effectively broken in Galera clusters. Is that not
> > correct?
>
> As I understand it, if two writes to two different masters end up
> violating some db-level constraint, then the operation will cause a
> failure regardless of whether there is a lock.
>


> Basically, on Galera, instead of waiting for the lock, each will
> proceed with the transaction.  Finally, on commit, a write
> certification will double-check constraints with the rest of the
> cluster.  It is at this point that Galera will fail one of them
> with a deadlock error for violating the constraint.  Hence the need
> to retry.  To me, non-locking just means that you embrace the fact
> that the lock won't work and you don't bother to apply it in the
> first place.
>

This is correct.

Db-level locks are broken in Galera. As Carl says, write sets are sent out
for certification after a transaction is committed.
So the write intent lock, or even primary key constraint violations, cannot
be verified before committing the transaction.
As a result you incur a write set certification failure, which is notably
more expensive than an instance-level rollback, and which manifests as a
DBDeadlock exception to the OpenStack service.

Retrying a transaction is also a way of embracing this behaviour... you
just accept the idea of having to incur write set certification failures.
Non-locking approaches instead aim at avoiding write set certification
failures. The downside is that, especially in high-concurrency scenarios,
the operation is retried many times, and this might become even more
expensive than dealing with the write set certification failure.

But zzzeek (Mike Bayer) is coming to our aid; as part of his DBFacade
work, we should be able to treat an active/active cluster as active/passive
for writes, and active/active for reads. This means that the write set
certification issue just won't show up, and the benefits of active/active
clusters will still be attained for most operations (I don't think there's
any doubt that SELECT operations represent the majority of all DB
statements).


> If my understanding is incorrect, please set me straight.
>

You're already straight enough ;)


>
> > If you do go that route, I think you will have to contend with
> > DBDeadlock errors when we switch to the new SQL driver anyway. From
> > what I've observed, it seems that if someone is holding a lock on a
> > table and you try to grab it, pymysql immediately throws a deadlock
> > exception.
>

> I'm not familiar with pymysql to know if this is true or not.  But,
> I'm sure that it is possible not to detect the lock at all on galera.
> Someone else will have to chime in to set me straight on the details.
>

DBDeadlocks without multiple workers also suggest we should look closely at
what eventlet is doing before placing the blame on pymysql. I don't think
that the switch to pymysql is changing the behaviour of the database
interface; I think it's changing the way in which neutron interacts with the
database, thus unveiling concurrency issues that we did not spot before as
we were relying on a sort of implicit locking triggered by the fact that
some parts of Mysql-Python were implemented in C.
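The "implicit locking" point is easy to demonstrate: a DB call that never
yields to the eventlet hub serializes greenthreads by accident, while a
pure-Python driver yields on every socket operation. A toy illustration
(eventlet.sleep() standing in for pymysql's socket I/O; not Neutron code):

    import eventlet
    eventlet.monkey_patch()

    def worker(name):
        print(name, "starting transaction")
        # A pure-Python driver yields here on socket I/O, so the other
        # greenthread can interleave mid-transaction. A C driver call
        # would block the whole process instead, hiding the race.
        eventlet.sleep(0.1)
        print(name, "committing")

    pool = eventlet.GreenPool()
    pool.spawn(worker, "w1")
    pool.spawn(worker, "w2")
    pool.waitall()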


>
> Carl
>

Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Salvatore Orlando
On 16 June 2015 at 18:49, Carl Baldwin  wrote:

> On Thu, Jun 11, 2015 at 2:45 PM, Salvatore Orlando 
> wrote:
> > I have then been following a different approach, and a set of patches,
> > including a devref one [2], is up for review [3]. This hardly completes
> > the job: more work is required on the testing side, both as unit and
> > functional tests.
> >
> > As for the spec, since I honestly would like to spare myself the hassle
> > of rewriting it, I would kindly ask our glorious drivers team if they're
> > ok with me submitting a spec in the shorter format approved for Liberty
> > without going through the RFE process, as the spec is, after all, in the
> > Kilo backlog.
>
> It took me a second read through to realize that you're talking to me
> among the drivers team.  Personally, I'm okay with this and our
> currently documented policy seems to allow for this until Liberty-1.
>

Great!


>
> I just hope that this isn't an indication that we're requiring too
> much in this new RFE process and scaring potential filers away.  I'm
> trying to learn how to write good RFEs, so let me give it a shot:
>
>   Summary:  "Need robust quota enforcement in Neutron."
>
>   Further Information:  "Neutron can allow exceeding the quota in
> certain cases.  Some investigation revealed that quotas in Neutron are
> subject to a race where parallel requests can each check quota and
> find there is just enough left to fulfill its individual request.
> Each request then proceeds to fulfillment with no further regard for the
> quota.
> When all of the requests are eventually fulfilled, we find that they
> have exceeded the quota."
>
> Given my current knowledge of the RFE process, that is what I would
> file as a bug in launchpad and tag it with 'rfe.'
>

The RFE process is fine and relatively simple. I was just luring somebody
into giving me the exact text to put in it!
Jokes apart, I was suggesting this because, since it was a "backlog" spec,
it was already assumed to be something we wanted to have for Neutron, and
thus it could skip the RFE approval step.


> > For testing, I wonder what strategy you would advise for implementing
> > functional tests. I could do some black-box testing, verifying that
> > quota limits are correctly enforced. However, I would also like to go a
> > bit white-box and also verify that reservation entries are created and
> > removed as appropriate when a reservation is committed or cancelled.
> > Finally, it would be awesome if I was able to run functional tests in
> > the gate on multi-worker servers, and inject delays or faults to verify
> > the system behaves correctly when it comes to quota enforcement.
>
> Full black-box testing would be impossible to achieve without multiple
> workers, right?  We've proposed adding multiple worker processes to
> the gate a couple of times if I recall, including a recent one to .
>

Yeah, but Neutron was not as stable with multiple workers, and we had to
revert it (I think I did the revert).


> Fixing the failures has not yet been seen as a priority.
>

I wonder if this is because developers are too busy bikeshedding or chasing
unicorns, or because the issues we saw are mostly due to the way we run
tests in the gate and are not found by operators in real deployments.
(Another option is that operators are too afraid of neutron's
unpredictability and do not even try turning on multiple workers.)


> I agree that some whitebox testing should be added.  It may sound a
> bit double-entry to some but I don't mind, especially given the
> challenges around black-box testing.  Maybe Assaf can chime in here
> and set us straight.
>

I want white-box testing. I think it's important. Unit tests do this to an
extent, but they don't test the whole functionality. On the other hand,
black-box testing tests the functionality, but it does not tell you whether
the system is actually behaving as you expect. If it's not, it means you
have a fault. And that fault will eventually emerge as a failure. So we
need this kind of testing. However, I need hooks in Neutron in order to
achieve this. Like a sqlalchemy event listener that informs me of completed
transactions, for instance. Or hooks to perform fault injection - like
adding a delay, or altering the return value of a function. It would be
good for me to know whether this is in the testing roadmap for Liberty.
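For what it's worth, SQLAlchemy's event API already gets part of the way
there. A sketch of the two hooks mentioned above, with hypothetical names
and an assumed engine object rather than an existing Neutron fixture:

    import time

    from sqlalchemy import event
    from sqlalchemy.orm import Session

    committed_sessions = []

    # White-box hook: be told about every completed transaction, so a
    # test can assert reservations were created and then cleaned up.
    @event.listens_for(Session, "after_commit")
    def _track_commit(session):
        committed_sessions.append(session)

    # Fault injection: widen the race window for statements a test cares
    # about by delaying them before they reach the server.
    @event.listens_for(engine, "before_cursor_execute")
    def _inject_delay(conn, cursor, statement, parameters, context,
                      executemany):
        if "reservations" in statement:
            time.sleep(0.5)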


>
> > Do these kinds of test even make sense? And are they feasible at all? I
> > doubt we have any framework for injecting anything in neutron code under
> > test.
>
> Dunno.


> > Finally, please note I am using DB-level locks rather than non-locking
> > algorithms for making reservations. I can move to a non-locking
> > algorithm; Jay proposed one for nova for Kilo, and I can just implement
> > that one, but first I would like to be convinced by a decent proof (or
> > something like one) that the extra cost deriving from collisions among
> > workers is overshadowed by the cost of having to handle a write-set
> > certification failure 

Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Carl Baldwin
On Tue, Jun 16, 2015 at 12:33 AM, Kevin Benton  wrote:
>>Do these kinds of test even make sense? And are they feasible at all? I
>> doubt we have any framework for injecting anything in neutron code under
>> test.
>
> I was thinking about this in the context of a lot of the fixes we have for
> other concurrency issues with the database. There are several exception
> handlers that aren't exercised in normal functional, tempest, and API tests
> because they require a very specific order of events between workers.
>
> I wonder if we could write a small shim DB driver that wraps the python one
> for use in tests that just makes a desired set of queries take a long time
> or fail in particular ways? That wouldn't require changes to the neutron
> code, but it might not give us the right granularity of control.

Might be worth a look.

>>Finally, please note I am using DB-level locks rather than non-locking
>> algorithms for making reservations.
>
> I thought these were effectively broken in Galera clusters. Is that not
> correct?

As I understand it, if two writes to two different masters end up
violating some db-level constraint, then the operation will cause a
failure regardless of whether there is a lock.

Basically, on Galera, instead of waiting for the lock, each will
proceed with the transaction.  Finally, on commit, a write
certification will double-check constraints with the rest of the
cluster.  It is at this point that Galera will fail one of them
with a deadlock error for violating the constraint.  Hence the need
to retry.  To me, non-locking just means that you embrace the fact
that the lock won't work and you don't bother to apply it in the
first place.

If my understanding is incorrect, please set me straight.

> If you do go that route, I think you will have to contend with DBDeadlock
> errors when we switch to the new SQL driver anyway. From what I've observed,
> it seems that if someone is holding a lock on a table and you try to grab
> it, pymysql immediately throws a deadlock exception.

I'm not familiar with pymysql to know if this is true or not.  But,
I'm sure that it is possible not to detect the lock at all on galera.
Someone else will have to chime in to set me straight on the details.

Carl



Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-16 Thread Carl Baldwin
On Thu, Jun 11, 2015 at 2:45 PM, Salvatore Orlando  wrote:
> I have then been following a different approach, and a set of patches,
> including a devref one [2], is up for review [3]. This hardly completes the
> job: more work is required on the testing side, both as unit and functional
> tests.
>
> As for the spec, since I honestly would like to spare myself the hassle of
> rewriting it, I would kindly ask our glorious drivers team if they're ok
> with me submitting a spec in the shorter format approved for Liberty without
> going through the RFE process, as the spec is, after all, in the Kilo backlog.

It took me a second read through to realize that you're talking to me
among the drivers team.  Personally, I'm okay with this and our
currently documented policy seems to allow for this until Liberty-1.

I just hope that this isn't an indication that we're requiring too
much in this new RFE process and scaring potential filers away.  I'm
trying to learn how to write good RFEs, so let me give it a shot:

  Summary:  "Need robust quota enforcement in Neutron."

  Further Information:  "Neutron can allow exceeding the quota in
certain cases.  Some investigation revealed that quotas in Neutron are
subject to a race where parallel requests can each check quota and
find there is just enough left to fulfill its individual request.
Each request then proceeds to fulfillment with no further regard for the
quota.
When all of the requests are eventually fulfilled, we find that they
have exceeded the quota."

Given my current knowledge of the RFE process, that is what I would
file as a bug in launchpad and tag it with 'rfe.'
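To illustrate the race and the DB-level-lock fix Salvatore mentions, here
is a schematic sketch with hypothetical Port/Quota models and an OverQuota
exception; per the Galera discussion above, note that SELECT ... FOR UPDATE
only serializes the check on a single master:

    # The racy pattern: check and consume are separate steps, so two
    # parallel requests can both pass the check.
    def create_port_racy(session, tenant_id, limit):
        used = session.query(Port).filter_by(tenant_id=tenant_id).count()
        if used >= limit:
            raise OverQuota()
        session.add(Port(tenant_id=tenant_id))
        session.commit()

    # DB-level locking: the quota row is locked, so the count-and-insert
    # happens under mutual exclusion (on a single-master database).
    def create_port_locked(session, tenant_id):
        with session.begin():
            quota = session.query(Quota).\
                filter_by(tenant_id=tenant_id).\
                with_for_update().one()       # SELECT ... FOR UPDATE
            used = session.query(Port).\
                filter_by(tenant_id=tenant_id).count()
            if used >= quota.limit:
                raise OverQuota()
            session.add(Port(tenant_id=tenant_id))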

> For testing, I wonder what strategy you would advise for implementing
> functional tests. I could do some black-box testing, verifying that quota
> limits are correctly enforced. However, I would also like to go a bit
> white-box and also verify that reservation entries are created and removed
> as appropriate when a reservation is committed or cancelled.
> Finally, it would be awesome if I was able to run functional tests in the
> gate on multi-worker servers, and inject delays or faults to verify the
> system behaves correctly when it comes to quota enforcement.

Full black-box testing would be impossible to achieve without multiple
workers, right?  We've proposed adding multiple worker processes to
the gate a couple of times if I recall, including a recent one to .
Fixing the failures has not yet been seen as a priority.

I agree that some whitebox testing should be added.  It may sound a
bit double-entry to some but I don't mind, especially given the
challenges around black-box testing.  Maybe Assaf can chime in here
and set us straight.

> Do these kinds of test even make sense? And are they feasible at all? I
> doubt we have any framework for injecting anything in neutron code under
> test.

Dunno.

> Finally, please note I am using DB-level locks rather than non-locking
> algorithms for making reservations. I can move to a non-locking algorithm;
> Jay proposed one for nova for Kilo, and I can just implement that one, but
> first I would like to be convinced by a decent proof (or something like
> one) that the extra cost deriving from collisions among workers is
> overshadowed by the cost of having to handle a write-set certification
> failure and retry the operation.

Do you have a reference describing the algorithm Jay proposed?

> Please advise.
>
> Regards,
> Salvatore
>
> [1]
> http://specs.openstack.org/openstack/neutron-specs/specs/kilo-backlog/better-quotas.html
> [2] https://review.openstack.org/#/c/190798/
> [3]
> https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/better-quotas,n,z
>



Re: [openstack-dev] [Neutron] Quota enforcement

2015-06-15 Thread Kevin Benton
>I would kindly ask our glorious drivers team if they're ok with me
submitting a spec in the shorter format approved for Liberty without going
through the RFE process, as the spec is, after all, in the Kilo backlog.

+1!

>Do these kinds of test even make sense? And are they feasible at all? I
doubt we have any framework for injecting anything in neutron code under
test.

I was thinking about this in the context of a lot of the fixes we have for
other concurrency issues with the database. There are several exception
handlers that aren't exercised in normal functional, tempest, and API tests
because they require a very specific order of events between workers.

I wonder if we could write a small shim DB driver that wraps the python one
for use in tests that just makes a desired set of queries take a long time
or fail in particular ways? That wouldn't require changes to the neutron
code, but it might not give us the right granularity of control.
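A minimal sketch of what such a shim could look like, wrapping pymysql at
the DBAPI level; the rule matching, module layout, and how the shim gets
handed to SQLAlchemy are all assumptions, not an existing driver:

    import re
    import time

    import pymysql

    # Re-export what SQLAlchemy expects from a DBAPI module.
    paramstyle = pymysql.paramstyle
    Error = pymysql.Error

    # Tests install (regex, action) rules; an action can sleep or raise.
    RULES = []

    class ShimCursor(object):
        def __init__(self, real):
            self._real = real

        def execute(self, sql, args=None):
            for pattern, action in RULES:
                if re.search(pattern, sql):
                    action(sql)              # e.g. lambda s: time.sleep(2)
            return self._real.execute(sql, args)

        def __getattr__(self, name):
            return getattr(self._real, name)  # delegate everything else

    class ShimConnection(object):
        def __init__(self, real):
            self._real = real

        def cursor(self, *args, **kwargs):
            return ShimCursor(self._real.cursor(*args, **kwargs))

        def __getattr__(self, name):
            return getattr(self._real, name)

    def connect(*args, **kwargs):
        return ShimConnection(pymysql.connect(*args, **kwargs))

A test could then point SQLAlchemy at this module instead of pymysql
itself (e.g. via create_engine's module argument).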

>Finally, please note I am using DB-level locks rather than non-locking
algorithms for making reservations.

I thought these were effectively broken in Galera clusters. Is that not
correct?

If you do go that route, I think you will have to contend with DBDeadlock
errors when we switch to the new SQL driver anyway. From what I've
observed, it seems that if someone is holding a lock on a table and you try
to grab it, pymysql immediately throws a deadlock exception.

Cheers,
Kevin Benton

On Thu, Jun 11, 2015 at 1:45 PM, Salvatore Orlando 
wrote:

> Aloha!
>
> As you know I pushed spec [1] during the Kilo lifecycle, but given the
> lazy procrastinator that I am, I did not manage to complete it in time for
> the release.
>
> This actually gave me a chance to realise that the spec that I pushed and
> had approved did not make a lot of sense. Even worse, there were some false
> claims especially when it comes to active-active DB clusters such as mysql
> galera.
>
> Thankfully nobody bothered to look at that - possibly because it renders
> horribly in HTML - and that spared me a public shaming.
>
> I have then been following a different approach, and a set of patches,
> including a devref one [2], is up for review [3]. This hardly completes the
> job: more work is required on the testing side, both as unit and functional
> tests.
>
> As for the spec, since I honestly would like to spare myself the hassle of
> rewriting it, I would kindly ask our glorious drivers team if they're ok
> with me submitting a spec in the shorter format approved for Liberty
> without going through the RFE process, as the spec is, after all, in the
> Kilo backlog.
>
> For testing, I wonder what strategy you would advise for implementing
> functional tests. I could do some black-box testing, verifying that quota
> limits are correctly enforced. However, I would also like to go a bit
> white-box and also verify that reservation entries are created and removed
> as appropriate when a reservation is committed or cancelled.
> Finally, it would be awesome if I was able to run functional tests in the
> gate on multi-worker servers, and inject delays or faults to verify the
> system behaves correctly when it comes to quota enforcement.
>
> Do these kinds of test even make sense? And are they feasible at all? I
> doubt we have any framework for injecting anything in neutron code under
> test.
>
> Finally, please note I am using DB-level locks rather than non-locking
> algorithms for making reservations. I can move to a non-locking algorithm;
> Jay proposed one for nova for Kilo, and I can just implement that one, but
> first I would like to be convinced by a decent proof (or something like
> one) that the extra cost deriving from collisions among workers is
> overshadowed by the cost of having to handle a write-set certification
> failure and retry the operation.
>
> Please advise.
>
> Regards,
> Salvatore
>
> [1]
> http://specs.openstack.org/openstack/neutron-specs/specs/kilo-backlog/better-quotas.html
> [2] https://review.openstack.org/#/c/190798/
> [3]
> https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/better-quotas,n,z
>


-- 
Kevin Benton


[openstack-dev] [Neutron] Quota enforcement

2015-06-11 Thread Salvatore Orlando
Aloha!

As you know I pushed spec [1] during the Kilo lifecycle, but given the lazy
procrastinator that I am, I did not manage to complete it in time for the
release.

This actually gave me a chance to realise that the spec that I pushed and
had approved did not make a lot of sense. Even worse, there were some false
claims especially when it comes to active-active DB clusters such as mysql
galera.

Thankfully nobody bothered to look at that - possibly because it renders
horribly in HTML - and that spared me a public shaming.

I have then been following a different approach, and a set of patches,
including a devref one [2], is up for review [3]. This hardly completes the
job: more work is required on the testing side, both as unit and functional
tests.

As for the spec, since I honestly would like to spare myself the hassle of
rewriting it, I would kindly ask our glorious drivers team if they're ok
with me submitting a spec in the shorter format approved for Liberty
without going through the RFE process, as the spec is, after all, in the
Kilo backlog.

For testing, I wonder what strategy you would advise for implementing
functional tests. I could do some black-box testing, verifying that quota
limits are correctly enforced. However, I would also like to go a bit
white-box and also verify that reservation entries are created and removed
as appropriate when a reservation is committed or cancelled.
Finally, it would be awesome if I was able to run functional tests in the
gate on multi-worker servers, and inject delays or faults to verify the
system behaves correctly when it comes to quota enforcement.

Do these kinds of test even make sense? And are they feasible at all? I
doubt we have any framework for injecting anything in neutron code under
test.

Finally, please note I am using DB-level locks rather than non-locking
algorithms for making reservations. I can move to a non-locking algorithm;
Jay proposed one for nova for Kilo, and I can just implement that one, but
first I would like to be convinced by a decent proof (or something like
one) that the extra cost deriving from collisions among workers is
overshadowed by the cost of having to handle a write-set certification
failure and retry the operation.

Please advise.

Regards,
Salvatore

[1]
http://specs.openstack.org/openstack/neutron-specs/specs/kilo-backlog/better-quotas.html
[2] https://review.openstack.org/#/c/190798/
[3]
https://review.openstack.org/#/q/project:openstack/neutron+branch:master+topic:bp/better-quotas,n,z