Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-21 Thread Eugene Nikanorov
Comments inline:

On Thu, Nov 20, 2014 at 4:34 AM, Jay Pipes jaypi...@gmail.com wrote:



 So while the SELECTs may return different data on successive calls when
 you use the READ COMMITTED isolation level, the UPDATE statements will
 continue to return 0 rows affected **if they attempt to change rows that
 have been changed since the start of the transaction**

 The reason that changing the isolation level to READ COMMITTED appears to
 work for the code in question:

 https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98

 is likely because the SELECT ... LIMIT 1 query is returning a different
 row on successive attempts (though since there is no ORDER BY on the query,
 the returned row of the query is entirely unpredictable (line 112)). Since
 data from that returned row is used in the UPDATE statement (lines 118 and
 124), *different* rows are actually being changed by successive UPDATE
 statements.


Not really, we're updating the same row we've selected. It's ensured by
'raw_segment', which actually contains the 'gre_id' (or similar) attribute.
So within each iteration we're working with the same row we selected, but
across iterations READ COMMITTED allows us to see different data and hence
work with a different row.
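
(A rough sketch of the pattern, with model and column names assumed for
illustration rather than copied from helpers.py:)

    # SELECT one free segment; LIMIT 1 with no ORDER BY, as in helpers.py
    alloc = session.query(GreAllocation).filter_by(allocated=False).first()
    raw_segment = {'gre_id': alloc.gre_id}   # pins the selected row
    # the UPDATE filters on raw_segment, so it targets exactly the row
    # we selected, not an arbitrary unallocated row
    count = (session.query(GreAllocation)
             .filter_by(allocated=False, **raw_segment)
             .update({'allocated': True}))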



 What this means is that for this *very particular* case, setting the
 transaction isolation level to READ COMMITTED will work presumably most of
 the time on MySQL, but it's not an appropriate solution for the generalized
 problem domain of the SELECT FOR UPDATE. If you need to issue a SELECT and
 an UPDATE in a retry loop, and you are attempting to update the same row or
 rows (for instance, in the quota reservation or resource allocation
 scenarios), this solution will not work, even with READ COMMITTED. This is
 why I say it's not really appropriate, and a better general solution is to
 use separate transactions for each loop in the retry mechanic.


By saying 'this solution will not work', which issues exactly do you mean?
Btw, I agree on using a separate transaction for each loop; the problem is
that the transaction is usually not 'local' to the method where the retry
loop resides.
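
To illustrate the nesting (a rough sketch; the method names are made up,
not actual neutron code):

    def create_network(context):
        # plugin entry point: opens the outer transaction
        with context.session.begin(subtransactions=True):
            # ... other DB work under the same transaction ...
            segment = allocate_segment(context.session)
            # the retry loop lives inside allocate_segment(); calling
            # begin(subtransactions=True) there merely joins this outer
            # transaction, so under REPEATABLE READ the snapshot taken
            # at the first query is never refreshed between retries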



 The issue is about doing the retry within a single transaction. That's not
 what I recommend doing. I recommend instead doing short separate
 transactions instead of long-lived, multi-statement transactions and
 relying on the behaviour of the DB's isolation level (default or otherwise)
 to solve the problem of reading changes to a record that you intend to
 update.


 instead of long-lived, multi-statement transactions - that's really what
would require quite a large code redesign.
So far, finding a way to bring the retry logic higher up the stack, above the
nested transactions, seems more appropriate.

Thanks,
Eugene.


 Cheers,
 -jay

 Also, thanks Clint for the clarification about the example scenario described
 by Mike Bayer.
 Initially the issue was discovered with concurrent tests on a multi-master
 environment with galera as a DB backend.

 Thanks,
 Eugene

 On Thu, Nov 20, 2014 at 12:20 AM, Mike Bayer mba...@redhat.com wrote:


  On Nov 19, 2014, at 3:47 PM, Ryan Moats rmo...@us.ibm.com wrote:

 
 BTW, I view your examples from oslo as helping make my argument for
 me (and I don't think that was your intent :) )


 I disagree with that as IMHO the differences in producing MM in the
 app layer against arbitrary backends (Postgresql vs. DB2 vs. MariaDB
 vs. ???)  will incur a lot more “bifurcation” than a system that
 targets only a handful of existing MM solutions.  The example I
 referred to in oslo.db is dealing with distinct, non MM backends.
 That level of DB-specific code and more is a given if we are
 building a MM system against multiple backends generically.

 It’s not possible to say which approach would be better or worse at
 the level of “how much database specific application logic do we
 need”, though in my experience, no matter what one is trying to do,
 the answer is always, “tons”; we’re dealing not just with databases
 but also Python drivers that have a vast amount of differences in
 behaviors, at every level.  On top of all of that, hand-rolled MM
 adds just that much more application code to be developed and
 maintained, which also claims it will do a better job than mature
 (ish?) database systems designed to do the same job against a
 specific backend.




   My reason for asking this question here is that if the community
   wants to consider #2, then these problems are the place to start
   crafting that solution - if we solve the conflicts inherent with the
   two concurrent thread scenarios, then I think we will find that
   we've solved the multi-master problem essentially for free.
 
  Maybe I’m missing something, if we learn how to write out a row such
  that a concurrent transaction …

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-21 Thread Jay Pipes
Eugene, I just spent about an hour playing around with an example 
scenario in both MySQL and PostgreSQL using READ COMMITTED and 
REPEATABLE READ isolation levels. The scenario I used had a single row 
being updated, with a loop and a check on rows affected.


*You are 100% correct that setting the transaction isolation level to 
READ COMMITTED works in the retry loop*.


I stand corrected, and humbled :) Please accept my apologies.

One thing I did note, though, is that changing the isolation level of an 
*already-started transaction* does not change the current transaction's 
isolation level -- the new isolation level only takes effect once the 
previously started transaction is committed or rolled back. So, on line 
107 in your proposed patch here:


https://review.openstack.org/#/c/129288/5/neutron/plugins/ml2/drivers/helpers.py

From what I could find out in my research, the setting of the isolation 
level needs to be done *outside* of the session.begin() call, otherwise 
the isolation level will not take effect until that transaction is 
committed or rolled back. Of course, if SQLAlchemy is doing some 
auto-commit or something in the session, then you may not see this 
effect, but I certainly was able to see this in my testing in mysql 
client sessions... so I'm a little perplexed as to how your code works 
on already-started transactions. The documentation on the MySQL site 
backs up what I observed:


http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html

...the statement sets the default transaction level for all subsequent 
transactions performed within the current session.
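
A minimal sketch of what I mean, using SQLAlchemy's per-connection
execution_options (the engine URL and surrounding plumbing are assumed):

    from sqlalchemy import create_engine

    engine = create_engine('mysql://user:pass@localhost/neutron')  # assumed
    conn = engine.connect()
    # set the isolation level on the connection *before* the transaction
    # begins, so it actually applies to the transaction we're about to run
    conn = conn.execution_options(isolation_level='READ COMMITTED')
    trans = conn.begin()
    # ... the SELECT/UPDATE retry logic here runs under READ COMMITTED ...
    trans.commit()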


All the best, and thanks for the informative lesson of the week!
-jay

On 11/21/2014 03:24 AM, Eugene Nikanorov wrote:

Comments inline:

On Thu, Nov 20, 2014 at 4:34 AM, Jay Pipes jaypi...@gmail.com wrote:



So while the SELECTs may return different data on successive calls
when you use the READ COMMITTED isolation level, the UPDATE
statements will continue to return 0 rows affected **if they attempt
to change rows that have been changed since the start of the
transaction**

The reason that changing the isolation level to READ COMMITTED
appears to work for the code in question:


https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98

is likely because the SELECT ... LIMIT 1 query is returning a
different row on successive attempts (though since there is no ORDER
BY on the query, the returned row of the query is entirely
unpredictable (line 112)). Since data from that returned row is used
in the UPDATE statement (lines 118 and 124), *different* rows are
actually being changed by successive UPDATE statements.


Not really, we're updating the same row we've selected. It's ensured by
'raw_segment', which actually contains the 'gre_id' (or similar) attribute.
So within each iteration we're working with the same row we selected, but
across iterations READ COMMITTED allows us to see different data and hence
work with a different row.


What this means is that for this *very particular* case, setting the
transaction isolation level to READ COMMITTED will work presumably
most of the time on MySQL, but it's not an appropriate solution for
the generalized problem domain of the SELECT FOR UPDATE. If you need
to issue a SELECT and an UPDATE in a retry loop, and you are
attempting to update the same row or rows (for instance, in the
quota reservation or resource allocation scenarios), this solution
will not work, even with READ COMMITTED. This is why I say it's not
really appropriate, and a better general solution is to use separate
transactions for each loop in the retry mechanic.

By saying 'this solution will not work', which issues exactly do you mean?
Btw, I agree on using a separate transaction for each loop; the problem is
that the transaction is usually not 'local' to the method where the retry
loop resides.



The issue is about doing the retry within a single transaction.
That's not what I recommend doing. I recommend instead doing short
separate transactions instead of long-lived, multi-statement
transactions and relying on the behaviour of the DB's isolation
level (default or otherwise) to solve the problem of reading
changes to a record that you intend to update.

 instead of long-lived, multi-statement transactions - that's really
what would require quite a large code redesign.
So far, finding a way to bring the retry logic higher up the stack, above
the nested transactions, seems more appropriate.

Thanks,
Eugene.


Cheers,
-jay

Also, thanks Clint for the clarification about the example scenario
described by Mike Bayer.
Initially the issue was discovered with concurrent tests on a multi-master
environment with galera as …

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-21 Thread Mike Bayer
regardless, I’d still very much prefer isolation being set here in an explicit 
fashion and on a per-transaction basis, rather than across the board, if a 
method is known to require it.   The advantage of this is that the isolation 
level in general can be changed without any risk of the individual method’s 
isolation level changing, because it is explicit.  It also serves as a document 
as to the intricacies of a certain method.

I’d favor adding markers for isolation level into my decorators outlined in 
https://review.openstack.org/#/c/125181/.   The system will be tailored in such 
a way that it is settable on a per-backend basis, or otherwise in some agnostic 
fashion, either:


@sql.writer(mysql_isolation='READ COMMITTED')
def some_api_method(context):

or:

@sql.writer(effective_isolation='READ COMMITTED')
def some_api_method(context):

where in the latter case, steps will be taken behind the scenes on a 
per-dialect basis to ensure the effect is present as much as possible, else (if 
say using SQLite) a warning would be emitted.

I believe if we can set this up in a visible and explicit way it will be a lot 
more robust for those methods which contain retry logic that is sensitive to 
isolation level.
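
For concreteness, a rough sketch of what such a decorator might look like
(the names below are hypothetical, not the actual API from that review):

    import functools

    def writer(mysql_isolation=None):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(context, *args, **kwargs):
                conn = get_engine().connect()     # assumed helper
                if mysql_isolation and conn.dialect.name == 'mysql':
                    # applied per-connection, before the transaction
                    # begins, so it takes effect for this transaction only
                    conn = conn.execution_options(
                        isolation_level=mysql_isolation)
                with conn.begin():
                    context.connection = conn     # assumed convention
                    return func(context, *args, **kwargs)
            return wrapper
        return decorator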




 On Nov 21, 2014, at 2:35 PM, Jay Pipes jaypi...@gmail.com wrote:
 
 Eugene, I just spent about an hour playing around with an example scenario in 
 both MySQL and PostgreSQL using READ COMMITTED and REPEATABLE READ isolation 
 levels. The scenario I used had a single row being updated, with a loop and a 
 check on rows affected.
 
 *You are 100% correct that setting the transaction isolation level to READ 
 COMMITTED works in the retry loop*.
 
 I stand corrected, and humbled :) Please accept my apologies.
 
 One thing I did note, though, is that changing the isolation level of an 
 *already-started transaction* does not change the current transaction's 
 isolation level -- the new isolation level only takes effect once the 
 previously started transaction is committed or rolled back. So, on line 107 
 in your proposed patch here:
 
 https://review.openstack.org/#/c/129288/5/neutron/plugins/ml2/drivers/helpers.py
 
 From what I could find out in my research, the setting of the isolation level 
 needs to be done *outside* of the session.begin() call, otherwise the 
 isolation level will not take effect until that transaction is committed or 
 rolled back. Of course, if SQLAlchemy is doing some auto-commit or something 
 in the session, then you may not see this effect, but I certainly was able to 
 see this in my testing in mysql client sessions... so I'm a little perplexed 
 as to how your code works on already-started transactions. The documentation 
 on the MySQL site backs up what I observed:
 
 http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
 
 ...the statement sets the default transaction level for all subsequent 
 transactions performed within the current session.
 
 All the best, and thanks for the informative lesson of the week!
 -jay
 
 On 11/21/2014 03:24 AM, Eugene Nikanorov wrote:
 Comments inline:
 
 On Thu, Nov 20, 2014 at 4:34 AM, Jay Pipes jaypi...@gmail.com wrote:
 
 
 
So while the SELECTs may return different data on successive calls
when you use the READ COMMITTED isolation level, the UPDATE
statements will continue to return 0 rows affected **if they attempt
to change rows that have been changed since the start of the
transaction**
 
The reason that changing the isolation level to READ COMMITTED
appears to work for the code in question:
 

 https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98
 
is likely because the SELECT ... LIMIT 1 query is returning a
different row on successive attempts (though since there is no ORDER
BY on the query, the returned row of the query is entirely
unpredictable (line 112)). Since data from that returned row is used
in the UPDATE statement (lines 118 and 124), *different* rows are
actually being changed by successive UPDATE statements.
 
 
 Not really, we're updating the same row we've selected. It's ensured by
 'raw_segment', which actually contains the 'gre_id' (or similar) attribute.
 So within each iteration we're working with the same row we selected, but
 across iterations READ COMMITTED allows us to see different data and hence
 work with a different row.
 
 
What this means is that for this *very particular* case, setting the
transaction isolation level to READ COMMITTED will work presumably
most of the time on MySQL, but it's not an appropriate solution for
the generalized problem domain of the SELECT FOR UPDATE. If you need
to issue a SELECT and an UPDATE in a retry loop, and you are
attempting to update the same row or rows (for instance, in the
quota reservation or resource …

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-21 Thread Eugene Nikanorov


 *You are 100% correct that setting the transaction isolation level to READ
 COMMITTED works in the retry loop*.

 I stand corrected, and humbled :) Please accept my apologies.

Thanks for letting me know :)



 One thing I did note, though, is that changing the isolation level of an
 *already-started transaction* does not change the current transaction's
 isolation level -- the new isolation level only takes effect once the
 previously started transaction is committed or rolled back. So, on line 107
 in your proposed patch here:

 https://review.openstack.org/#/c/129288/5/neutron/plugins/ml2/drivers/helpers.py

 From what I could find out in my research, the setting of the isolation
 level needs to be done *outside* of the session.begin() call, otherwise the
 isolation level will not take effect until that transaction is committed or
 rolled back.


You're right. Apparently I misread the sqlalchemy docs at some point, and now
I see I understood them incorrectly.
Also, from some sqlalchemy code inspection I thought that the default engine
isolation level is restored after the current transaction is committed, but
that's not the case either.



 Of course, if SQLAlchemy is doing some auto-commit or something in the
 session, then you may not see this affect, but I certainly was able to see
 this in my testing in mysql client sessions... so I'm a little perplexed as
 to how your code works on already-started transactions. The documentation
 on the MySQL site backs up what I observed:

 http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html

 ...the statement sets the default transaction level for all subsequent
 transactions performed within the current session.


That basically means that my patch effectively changes the isolation level of
every connection that happens to be used for the session.


 All the best, and thanks for the informative lesson of the week!


Thank you for getting to the very bottom of it! :)

Eugene.

-jay




Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

 On Nov 18, 2014, at 1:38 PM, Eugene Nikanorov enikano...@mirantis.com wrote:
 
 Hi neutron folks,
 
 There is an ongoing effort to refactor some neutron DB logic to be compatible 
 with galera/mysql which doesn't support locking (with_lockmode('update')).
 
 Some code paths that used locking in the past were rewritten to retry the 
 operation if they detect that an object was modified concurrently.
 The problem here is that all DB operations (CRUD) are performed in the scope 
 of some transaction, so that complex operations are executed in an atomic 
 manner.
 For mysql the default transaction isolation level is 'REPEATABLE READ', which 
 means that once the code issues a query within a transaction, that query will 
 return the same result for the rest of the transaction (e.g. the snapshot is 
 taken by the DB during the first query and then reused for the same query).
 In other words, the retry logic like the following will not work:
 
 def allocate_obj():
     with session.begin(subtrans=True):
         for i in xrange(n_retries):
             obj = session.query(Model).filter_by(**filters).first()
             count = session.query(Model).filter_by(
                 id=obj.id).update({'allocated': True})
             if count:
                 return obj
 
 since methods like allocate_obj() are usually called from within another 
 transaction, we can't simply put the transaction under the 'for' loop to fix 
 the issue.

has this been confirmed?  the point of systems like repeatable read is not just 
that you read the “old” data, it’s also to ensure that updates to that data 
either proceed or fail explicitly; locking is also used to prevent concurrent 
access that can’t be reconciled.  A lower isolation removes these advantages.  

I ran a simple test in two MySQL sessions as follows:

session 1:

mysql> create table some_table(data integer) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

mysql> insert into some_table(data) values (1);
Query OK, 1 row affected (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select data from some_table;
+------+
| data |
+------+
|    1 |
+------+
1 row in set (0.00 sec)


session 2:

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> update some_table set data=2 where data=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

then back in session 1, I ran:

mysql> update some_table set data=3 where data=1;

this query blocked;  that’s because session 2 has placed a write lock on the 
table.  this is the effect of repeatable read isolation.

while it blocked, I went to session 2 and committed the in-progress transaction:

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

then session 1 unblocked, and it reported, correctly, that zero rows were 
affected:

Query OK, 0 rows affected (7.29 sec)
Rows matched: 0  Changed: 0  Warnings: 0

the update had not taken place, as was stated by “Rows matched: 0”:

mysql> select * from some_table;
+------+
| data |
+------+
|    1 |
+------+
1 row in set (0.00 sec)

the code in question would do a retry at this point; it is checking the number 
of rows matched, and that number is accurate.

if our code did *not* block at the point of our UPDATE, then it would have 
proceeded, and the other transaction would have overwritten what we just did, 
when it committed.   I don’t know that read committed is necessarily any better 
here.
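
(For anyone who wants to replay this from Python rather than two mysql
shells, here's a rough equivalent using two SQLAlchemy connections; the
connection URL is assumed, and the table is the one created above:)

    import threading
    from sqlalchemy import create_engine

    engine = create_engine('mysql://scott:tiger@localhost/test')  # assumed

    c1, c2 = engine.connect(), engine.connect()
    t1, t2 = c1.begin(), c2.begin()

    c1.execute("select data from some_table").fetchall()  # snapshot taken
    c2.execute("update some_table set data=2 where data=1")

    def session1_update():
        # blocks on session 2's lock until t2 commits, then matches 0 rows
        result = c1.execute("update some_table set data=3 where data=1")
        print("Rows matched: %d" % result.rowcount)

    worker = threading.Thread(target=session1_update)
    worker.start()
    t2.commit()   # releases the lock; session 1's UPDATE then completes
    worker.join()
    t1.commit()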

now perhaps, with Galera, none of this works correctly.  That would be a 
different issue in which case sure, we should use whatever isolation is 
recommended for Galera.  But I’d want to potentially peg it to the fact that 
Galera is in use, or not.

would love also to hear from Jay Pipes on this since he literally wrote the 
book on MySQL ! :)




Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Ryan Moats
I was waiting for this because I think I may have a slightly different (and
outside of the box) view on how to approach a solution to this.

Conceptually (at least in my mind) there isn't a whole lot of difference
between how the example below (i.e. updates from two concurrent threads) is
handled
and how/if neutron wants to support a multi-master database scenario (which
in turn lurks in the background when one starts thinking/talking about
multi-region support).

If neutron wants to eventually support multi-master database scenarios, I
see two ways to go about it:

1) Defer multi-master support to the database itself.
2) Take responsibility for managing the conflict resolution inherent in
multi-master scenarios itself.

The first approach is certainly simpler in the near term, but it has the
down side of restricting the choice of databases to those that have solved
multi-master and further, may lead to code bifurcation based on possibly
different solutions to the conflict resolution scenarios inherent in
multi-master.

The second approach is certainly more complex as neutron assumes more
responsibility for its own actions, but it has the advantage that (if done
right) would be transparent to the underlying databases (with all that
implies)

My reason for asking this question here is that if the community wants to
consider #2, then these problems are the place to start crafting that
solution - if we solve the conflicts inherent with the two concurrent
thread scenarios, then I think we will find that we've solved the
multi-master problem essentially for free.

Ryan Moats

Mike Bayer mba...@redhat.com wrote on 11/19/2014 12:05:35 PM:

 From: Mike Bayer mba...@redhat.com
 To: OpenStack Development Mailing List (not for usage questions)
 openstack-dev@lists.openstack.org
 Date: 11/19/2014 12:05 PM
 Subject: Re: [openstack-dev] [Neutron] DB: transaction isolation and
 related questions

 On Nov 18, 2014, at 1:38 PM, Eugene Nikanorov enikano...@mirantis.com
wrote:

 Hi neutron folks,

 There is an ongoing effort to refactor some neutron DB logic to be
 compatible with galera/mysql which doesn't support locking
 (with_lockmode('update')).

 Some code paths that used locking in the past were rewritten to
 retry the operation if they detect that an object was modified
concurrently.
 The problem here is that all DB operations (CRUD) are performed in
 the scope of some transaction, so that complex operations are
 executed in an atomic manner.
 For mysql the default transaction isolation level is 'REPEATABLE
 READ', which means that once the code issues a query within a
 transaction, that query will return the same result for the rest of
 the transaction (e.g. the snapshot is taken by the DB during the
 first query and then reused for the same query).
 In other words, the retry logic like the following will not work:

 def allocate_obj():
     with session.begin(subtrans=True):
         for i in xrange(n_retries):
             obj = session.query(Model).filter_by(**filters).first()
             count = session.query(Model).filter_by(
                 id=obj.id).update({'allocated': True})
             if count:
                 return obj

 since methods like allocate_obj() are usually called from within
 another transaction, we can't simply put the transaction under the
 'for' loop to fix the issue.

 has this been confirmed?  the point of systems like repeatable read
 is not just that you read the “old” data, it’s also to ensure that
 updates to that data either proceed or fail explicitly; locking is
 also used to prevent concurrent access that can’t be reconciled.  A
 lower isolation removes these advantages.

 I ran a simple test in two MySQL sessions as follows:

 session 1:

 mysql> create table some_table(data integer) engine=innodb;
 Query OK, 0 rows affected (0.01 sec)

 mysql> insert into some_table(data) values (1);
 Query OK, 1 row affected (0.00 sec)

 mysql> begin;
 Query OK, 0 rows affected (0.00 sec)

 mysql> select data from some_table;
 +------+
 | data |
 +------+
 |    1 |
 +------+
 1 row in set (0.00 sec)

 session 2:

 mysql> begin;
 Query OK, 0 rows affected (0.00 sec)

 mysql> update some_table set data=2 where data=1;
 Query OK, 1 row affected (0.00 sec)
 Rows matched: 1  Changed: 1  Warnings: 0

 then back in session 1, I ran:

 mysql> update some_table set data=3 where data=1;

 this query blocked;  that’s because session 2 has placed a write
 lock on the table.  this is the effect of repeatable read isolation.

 while it blocked, I went to session 2 and committed the in-progress
 transaction:

 mysql> commit;
 Query OK, 0 rows affected (0.00 sec)

 then session 1 unblocked, and it reported, correctly, that zero rows
 were affected:

 Query OK, 0 rows affected (7.29 sec)
 Rows matched: 0  Changed: 0  Warnings: 0

 the update had not taken place, as was stated by “Rows matched: 0”:

 mysql> select * from some_table;
 +------+
 | data |
 +------+
 |    1 |
 +------+
 1 row in set (0.00 sec)

 the code in question would do a retry …

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Jay Pipes
Hi Eugene, please see comments inline. But, bottom line: setting 
the transaction isolation level to READ COMMITTED should be avoided.


On 11/18/2014 01:38 PM, Eugene Nikanorov wrote:

Hi neutron folks,

There is an ongoing effort to refactor some neutron DB logic to be
compatible with galera/mysql which doesn't support locking
(with_lockmode('update')).

Some code paths that used locking in the past were rewritten to retry
the operation if they detect that an object was modified concurrently.
The problem here is that all DB operations (CRUD) are performed in the
scope of some transaction, so that complex operations are executed
in an atomic manner.


Yes. The root of the problem in Neutron is that the session object is 
passed through all of the various plugin methods and the 
session.begin(subtransactions=True) is used all over the place, when in 
reality many things should not need to be done in long-lived 
transactional containers.



For mysql the default transaction isolation level is 'REPEATABLE READ',
which means that once the code issues a query within a transaction, that
query will return the same result for the rest of the transaction (e.g.
the snapshot is taken by the DB during the first query and then reused
for the same query).


Correct.

However note that the default isolation level in PostgreSQL is READ 
COMMITTED, though it is important to point out that PostgreSQL's READ 
COMMITTED isolation level does *NOT* allow one session to see changes 
committed during query execution by concurrent transactions.


It is a common misunderstanding that MySQL's READ COMMITTED isolation 
level is the same as PostgreSQL's READ COMMITTED isolation level. It is 
not. PostgreSQL's READ COMMITTED isolation level is actually most 
closely similar to MySQL's REPEATABLE READ isolation level.


I bring this up because the proposed solution of setting the isolation 
level to READ COMMITTED will not work like you think it will on 
PostgreSQL. Regardless, please see below as to why setting the isolation 
level to READ COMMITTED is not the appropriate solution to this problem 
anyway...



In other words, the retry logic like the following will not work:

def allocate_obj():
    with session.begin(subtrans=True):
        for i in xrange(n_retries):
            obj = session.query(Model).filter_by(**filters).first()
            count = session.query(Model).filter_by(
                id=obj.id).update({'allocated': True})
            if count:
                return obj

since methods like allocate_obj() are usually called from within another
transaction, we can't simply put the transaction under the 'for' loop to fix
the issue.


Exactly. The above code, from here:

https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98

has no chance of working at all under the existing default isolation 
levels for either MySQL or PostgreSQL. If another session updates the 
same row in between the time the first session began and the UPDATE 
statement in the first session starts, then the first session will 
return 0 rows affected. It will continue to return 0 rows affected for 
each loop, as long as the same transaction/session is still in effect, 
which in the code above, is the case.



The particular issue here is
https://bugs.launchpad.net/neutron/+bug/1382064 with the proposed fix:
https://review.openstack.org/#/c/129288

So far the solution proven by tests is to change transaction isolation
level for mysql to be 'READ COMMITTED'.
The patch suggests changing the level for the particular transaction where
the issue occurs (per sqlalchemy, it will be reverted to the engine default
once the transaction is committed).
This isolation level allows the code above to see different result in
each iteration.


Not for PostgreSQL, see above. You would need to set the level to READ 
*UNCOMMITTED* to get that behaviour for PostgreSQL, and setting to READ 
UNCOMMITTED is opening up the code to a variety of other issues and 
should be avoided.



At the same time, any code that relies on a repeated query under the same
transaction giving the same result may potentially break.

So the question is: what do you think about changing the default
isolation level to READ COMMITTED for mysql project-wise?
It is already so for postgres, however we don't have much concurrent
test coverage to guarantee that it's safe to move to a weaker isolation
level.


PostgreSQL READ COMMITTED is the same as MySQL's REPEATABLE READ. :) So, 
no, it doesn't work for PostgreSQL either.


The design of the Neutron plugin code's interaction with the SQLAlchemy 
session object is the main problem here. Instead of doing all of this 
within a single transactional container, the code should instead be 
changed to perform the SELECT statements in separate transactions/sessions.


That means not using the session parameter supplied to the 
neutron.plugins.ml2.drivers.helpers.TypeDriverHelper.allocate_partially_specified_segment() 
method, and instead performing the SQL statements in separate transactions.
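
Something along these lines (a rough sketch; the session factory and model
names are assumed, not the actual neutron code):

    def allocate_obj(filters):
        for i in xrange(n_retries):
            session = db_api.get_session()   # fresh session per attempt
            with session.begin():
                obj = session.query(Model).filter_by(**filters).first()
                if obj is None:
                    return None              # nothing left to allocate
                count = (session.query(Model)
                         .filter_by(id=obj.id, allocated=False)
                         .update({'allocated': True}))
            if count:
                return obj                   # compare-and-swap succeeded
            # otherwise another session grabbed the row; re-SELECT and retry

Each attempt starts a new transaction, so even under REPEATABLE READ the
SELECT sees fresh data.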

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

 On Nov 19, 2014, at 1:49 PM, Ryan Moats rmo...@us.ibm.com wrote:
 
 I was waiting for this because I think I may have a slightly different (and 
 outside of the box) view on how to approach a solution to this.
 
 Conceptually (at least in my mind) there isn't a whole lot of difference 
 between how the example below (i.e. updates from two concurrent threads) is 
 handled
 and how/if neutron wants to support a multi-master database scenario (which 
 in turn lurks in the background when one starts thinking/talking about 
 multi-region support).
 
 If neutron wants to eventually support multi-master database scenarios, I see 
 two ways to go about it:
 
 1) Defer multi-master support to the database itself.
 2) Take responsibility for managing the conflict resolution inherent in 
 multi-master scenarios itself.
 
 The first approach is certainly simpler in the near term, but it has the down 
 side of restricting the choice of databases to those that have solved 
 multi-master and further, may lead to code bifurcation based on possibly 
 different solutions to the conflict resolution scenarios inherent in 
 multi-master.
 
 The second approach is certainly more complex as neutron assumes more 
 responsibility for its own actions, but it has the advantage that (if done 
 right) would be transparent to the underlying databases (with all that 
 implies)
 
multi-master is a very advanced use case so I don’t see why it would be 
unreasonable to require a multi-master vendor database.   Reinventing a complex 
system like that in the application layer is an unnecessary reinvention.

As far as working across different conflict resolution scenarios, while there 
may be differences across backends, these differences will be much less 
significant compared to the differences against non-clustered backends in which 
we are inventing our own multi-master solution.   I doubt a home rolled 
solution would insulate us at all from “code bifurcation” as this is already a 
fact of life in targeting different backends even without any implication of 
clustering.   Even with simple things like transaction isolation, we see that 
different databases have different behavior, and if you look at the logic in 
oslo.db inside of 
https://github.com/openstack/oslo.db/blob/master/oslo/db/sqlalchemy/exc_filters.py
you can see an example of just how complex it is to just do the most 
rudimentary task of organizing exceptions into errors that mean the same thing.


 My reason for asking this question here is that if the community wants to 
 consider #2, then these problems are the place to start crafting that 
 solution - if we solve the conflicts inherent with the two concurrent 
 thread scenarios, then I think we will find that we've solved the 
 multi-master problem essentially for free.
 

Maybe I’m missing something, if we learn how to write out a row such that a 
concurrent transaction against the same row doesn’t throw us off, where is the 
part where that data is replicated to databases running concurrently on other 
IP numbers in a way that is atomic coming out of that effort “for free”?   A 
home-rolled “multi master” scenario would have to start with a system that has 
multiple create_engine() calls, since we need to communicate directly to 
multiple database servers. From there it gets really crazy.  Where’s all that ?




 
 Ryan Moats
 
 Mike Bayer mba...@redhat.com wrote on 11/19/2014 12:05:35 PM:
 
  From: Mike Bayer mba...@redhat.com
  To: OpenStack Development Mailing List (not for usage questions) 
  openstack-dev@lists.openstack.org
  Date: 11/19/2014 12:05 PM
  Subject: Re: [openstack-dev] [Neutron] DB: transaction isolation and
  related questions
  
  On Nov 18, 2014, at 1:38 PM, Eugene Nikanorov enikano...@mirantis.com 
  wrote:
  
  Hi neutron folks,
  
  There is an ongoing effort to refactor some neutron DB logic to be 
  compatible with galera/mysql which doesn't support locking 
  (with_lockmode('update')).
  
  Some code paths that used locking in the past were rewritten to 
  retry the operation if they detect that an object was modified concurrently.
  The problem here is that all DB operations (CRUD) are performed in 
  the scope of some transaction, so that complex operations are 
  executed in an atomic manner.
  For mysql the default transaction isolation level is 'REPEATABLE 
  READ', which means that once the code issues a query within a 
  transaction, that query will return the same result for the rest of 
  the transaction (e.g. the snapshot is taken by the DB during the 
  first query and then reused for the same query).
  In other words, the retry logic like the following will not work:

  def allocate_obj():
      with session.begin(subtrans=True):
          for i in xrange(n_retries):
              obj = session.query(Model).filter_by(**filters).first()
              count = session.query(Model).filter_by(id=obj.id …

Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

 On Nov 19, 2014, at 2:58 PM, Jay Pipes jaypi...@gmail.com wrote:
 
 
 In other words, the retry logic like the following will not work:
 
 def allocate_obj():
     with session.begin(subtrans=True):
         for i in xrange(n_retries):
             obj = session.query(Model).filter_by(**filters).first()
             count = session.query(Model).filter_by(
                 id=obj.id).update({'allocated': True})
             if count:
                 return obj
 
 since methods like allocate_obj() are usually called from within another
 transaction, we can't simply put the transaction under the 'for' loop to fix
 the issue.
 
 Exactly. The above code, from here:
 
 https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98
 
 has no chance of working at all under the existing default isolation levels 
 for either MySQL or PostgreSQL. If another session updates the same row in 
 between the time the first session began and the UPDATE statement in the 
 first session starts, then the first session will return 0 rows affected. It 
 will continue to return 0 rows affected for each loop, as long as the same 
 transaction/session is still in effect, which in the code above, is the case.

oh, because it stays a zero, right.  yeah I didn’t understand that that was the 
failure case before.  should have just pinged you on IRC to answer the question 
without me wasting everyone’s time! :)

 
 The design of the Neutron plugin code's interaction with the SQLAlchemy 
 session object is the main problem here. Instead of doing all of this within 
 a single transactional container, the code should instead be changed to 
 perform the SELECT statements in separate transactions/sessions.
 
 That means not using the session parameter supplied to the 
 neutron.plugins.ml2.drivers.helpers.TypeDriverHelper.allocate_partially_specified_segment()
  method, and instead performing the SQL statements in separate transactions.
 
 Mike Bayer's EngineFacade blueprint work should hopefully unclutter the 
 current passing of a session object everywhere, but until that hits, it 
 should be easy enough to simply ensure that you don't use the same session 
 object over and over again, instead of changing the isolation level.

OK but EngineFacade was all about unifying broken-up transactions into one big 
transaction.   I’ve never been partial to the “retry something inside of a 
transaction” approach; I usually prefer to have the API method raise and retry 
its whole series of operations all over again.  How do you propose to reconcile 
EngineFacade’s transaction-unifying behavior with 
separate-transaction-per-SELECT (and wouldn’t that need to include the UPDATE 
as well?)  Did you see it as having the “one main transaction” with separate 
“ad-hoc, out of band” transactions as needed?
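
(For concreteness, the “retry the whole series of operations” pattern I
mean looks roughly like this; the decorator and exception names are
invented for illustration:)

    import functools

    class DBConflictError(Exception):
        # stand-in for whatever conflict/deadlock exception the DB
        # layer surfaces (e.g. a deadlock error raised by oslo.db)
        pass

    def retry_db_conflicts(n_retries=10):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in xrange(n_retries):
                    try:
                        # the API method opens and closes its own
                        # transaction; on conflict we re-run all of it
                        return func(*args, **kwargs)
                    except DBConflictError:
                        if attempt == n_retries - 1:
                            raise
            return wrapper
        return decorator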




 
 All the best,
 -jay
 
 Your feedback is appreciated.
 
 Thanks,
 Eugene.
 
 
 


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Ian Wells
On 19 November 2014 11:58, Jay Pipes jaypi...@gmail.com wrote:

 Some code paths that used locking in the past were rewritten to retry
 the operation if they detect that an object was modified concurrently.
 The problem here is that all DB operations (CRUD) are performed in the
 scope of some transaction, so that complex operations are executed
 in an atomic manner.


 Yes. The root of the problem in Neutron is that the session object is
 passed through all of the various plugin methods and the
 session.begin(subtransactions=True) is used all over the place, when in
 reality many things should not need to be done in long-lived transactional
 containers.


I think the issue is one of design, and it's possible what we discussed at
the summit may address some of this.

At the moment, Neutron's a bit confused about what it is.  Some plugins
treat a call to Neutron as the period of time in which an action should be
completed - the 'atomicity' thing.  This is not really compatible with a
distributed system and it's certainly not compatible with the principle of
eventual consistency that OpenStack is supposed to follow.  Some plugins
treat the call as a change to desired networking state, and the action on
the network is performed asynchronously to bring the network state into
alignment with the state of the database.  (Many plugins do a bit of both.)

When you have a plugin that's decided to be synchronous, then there are
cases where the DB lock is held for a technically indefinite period of
time.  This is basically broken.

What we said at the summit is that we should move to an entirely async
model for the API, which in turn gets us to the 'desired state' model for
the DB.  DB writes would take one of two forms:

- An API call has requested that the data be updated, which it can do
immediately - the DB transaction takes as long as it takes to write the DB
consistently, and can hold locks on referenced rows to maintain consistency
providing the whole operation remains brief
- A network change has completed and the plugin wants to update an object's
state - again, the DB transaction contains only DB ops and nothing else and
should be quick.

Now, if we moved to that model, DB locks would be very very brief for the
sort of queries we'd need to do.  Setting aside the joys of Galera (and I
believe we identified that using one Galera node and doing all writes
through it worked just fine, though we could probably distribute read-only
transactions across all of them in the future), would there be any need for
transaction retries in that scenario?  I would have thought that DB locking
would be just fine as long as there was nothing but DB operations for the
period a transaction was open, and thus significantly changing the DB
lock/retry model now is a waste of time because it's a problem that will go
away.

Does that theory hold water?

-- 
Ian.


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Ryan Moats



Ian Wells ijw.ubu...@cack.org.uk wrote on 11/19/2014 02:33:40 PM:

[snip]

 When you have a plugin that's decided to be synchronous, then there
 are cases where the DB lock is held for a technically indefinite
 period of time.  This is basically broken.

A big +1 to this statement

Ryan Moats


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Ryan Moats


 Mike Bayer mba...@redhat.com wrote on 11/19/2014 02:10:18 PM:

  From: Mike Bayer mba...@redhat.com
  To: OpenStack Development Mailing List (not for usage questions)
  openstack-dev@lists.openstack.org
  Date: 11/19/2014 02:11 PM
  Subject: Re: [openstack-dev] [Neutron] DB: transaction isolation and
  related questions
 
  On Nov 19, 2014, at 1:49 PM, Ryan Moats rmo...@us.ibm.com wrote:
 
  I was waiting for this because I think I may have a slightly
  different (and outside of the box) view on how to approach a solution
to this.
 
  Conceptually (at least in my mind) there isn't a whole lot of
  difference between how the example below (i.e. updates from two
  concurrent threads) is handled
  and how/if neutron wants to support a multi-master database scenario
  (which in turn lurks in the background when one starts thinking/
  talking about multi-region support).
 
  If neutron wants to eventually support multi-master database
  scenarios, I see two ways to go about it:
 
  1) Defer multi-master support to the database itself.
  2) Take responsibility for managing the conflict resolution inherent
  in multi-master scenarios itself.
 
  The first approach is certainly simpler in the near term, but it has
  the down side of restricting the choice of databases to those that
  have solved multi-master and further, may lead to code bifurcation
  based on possibly different solutions to the conflict resolution
  scenarios inherent in multi-master.
  The second approach is certainly more complex as neutron assumes
  more responsibility for its own actions, but it has the advantage
  that (if done right) would be transparent to the underlying
  databases (with all that implies)

 multi-master is a very advanced use case so I don’t see why it would
 be unreasonable to require a multi-master vendor database.
 Reinventing a complex system like that in the application layer is
 an unnecessary reinvention.

 As far as working across different conflict resolution scenarios,
 while there may be differences across backends, these differences
 will be much less significant compared to the differences against
 non-clustered backends in which we are inventing our own multi-
 master solution.   I doubt a home rolled solution would insulate us
 at all from “code bifurcation” as this is already a fact of life in
 targeting different backends even without any implication of
 clustering.   Even with simple things like transaction isolation, we
 see that different databases have different behavior, and if you
 look at the logic in oslo.db inside of
 https://github.com/openstack/oslo.db/blob/master/oslo/db/sqlalchemy/exc_filters.py
 you can see an example of just how complex it is to just do the most
 rudimentary task of organizing exceptions into errors that mean the same thing.

I didn't say it was unreasonable; I only pointed out that there is an
alternative for consideration.

BTW, I view your examples from oslo as helping make my argument for
me (and I don't think that was your intent :) )

  My reason for asking this question here is that if the community
  wants to consider #2, then these problems are the place to start
  crafting that solution - if we solve the conflicts inherent with the
  two concurrent thread scenarios, then I think we will find that
  we've solved the multi-master problem essentially for free.

 Maybe I’m missing something, if we learn how to write out a row such
 that a concurrent transaction against the same row doesn’t throw us
 off, where is the part where that data is replicated to databases
 running concurrently on other IP numbers in a way that is atomic
 coming out of that effort “for free”?   A home-rolled “multi master”
 scenario would have to start with a system that has multiple
 create_engine() calls, since we need to communicate directly to
 multiple database servers. From there it gets really crazy.  Where’s all
that?

Boiled down, what you are talking about here w.r.t. concurrent
transactions is really conflict resolution, which is the hardest
part of implementing multi-master (as a side note, using locking in
this case is the equivalent of option #1).

All I wished to point out is that there are other ways to solve the
conflict resolution that could then be leveraged into a multi-master
scenario.

As for the parts that I glossed over, once conflict resolution is
separated out, replication turns into a much simpler problem with
well understood patterns and so I view that part as coming
for free.

Ryan


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Clint Byrum
Excerpts from Mike Bayer's message of 2014-11-19 10:05:35 -0800:
 
  On Nov 18, 2014, at 1:38 PM, Eugene Nikanorov enikano...@mirantis.com 
  wrote:
  
  Hi neutron folks,
  
  There is an ongoing effort to refactor some neutron DB logic to be 
  compatible with galera/mysql which doesn't support locking 
  (with_lockmode('update')).
  
  Some code paths that used locking in the past were rewritten to retry the 
  operation if they detect that an object was modified concurrently.
  The problem here is that all DB operations (CRUD) are performed in the 
  scope of some transaction, so that complex operations are executed in 
  an atomic manner.
  For mysql the default transaction isolation level is 'REPEATABLE READ', 
  which means that once the code issues a query within a transaction, that 
  query will return the same result for the rest of the transaction (e.g. 
  the snapshot is taken by the DB during the first query and then reused 
  for the same query).
  In other words, the retry logic like the following will not work:
  
  def allocate_obj():
      with session.begin(subtrans=True):
          for i in xrange(n_retries):
              obj = session.query(Model).filter_by(**filters).first()
              count = session.query(Model).filter_by(
                  id=obj.id).update({'allocated': True})
              if count:
                  return obj
  
  since methods like allocate_obj() are usually called from within another 
  transaction, we can't simply put the transaction under the 'for' loop to 
  fix the issue.
 
 has this been confirmed?  the point of systems like repeatable read is not 
 just that you read the “old” data, it’s also to ensure that updates to that 
 data either proceed or fail explicitly; locking is also used to prevent 
 concurrent access that can’t be reconciled.  A lower isolation removes these 
 advantages.  
 

Yes this is confirmed and fails reliably on Galera based systems.

 I ran a simple test in two MySQL sessions as follows:
 
 session 1:
 
 mysql> create table some_table(data integer) engine=innodb;
 Query OK, 0 rows affected (0.01 sec)
 
 mysql> insert into some_table(data) values (1);
 Query OK, 1 row affected (0.00 sec)
 
 mysql> begin;
 Query OK, 0 rows affected (0.00 sec)
 
 mysql> select data from some_table;
 +------+
 | data |
 +------+
 |    1 |
 +------+
 1 row in set (0.00 sec)
 
 
 session 2:
 
 mysql> begin;
 Query OK, 0 rows affected (0.00 sec)
 
 mysql> update some_table set data=2 where data=1;
 Query OK, 1 row affected (0.00 sec)
 Rows matched: 1  Changed: 1  Warnings: 0
 
 then back in session 1, I ran:
 
 mysql> update some_table set data=3 where data=1;
 
 this query blocked;  that’s because session 2 has placed a write lock on the 
 table.  this is the effect of repeatable read isolation.

With Galera this session might happen on another node. There is no
distributed lock, so this would not block...

 
 while it blocked, I went to session 2 and committed the in-progress 
 transaction:
 
 mysql> commit;
 Query OK, 0 rows affected (0.00 sec)
 
 then session 1 unblocked, and it reported, correctly, that zero rows were 
 affected:
 
 Query OK, 0 rows affected (7.29 sec)
 Rows matched: 0  Changed: 0  Warnings: 0
 
 the update had not taken place, as was stated by “Rows matched: 0”:
 
 mysql> select * from some_table;
 +------+
 | data |
 +------+
 |    1 |
 +------+
 1 row in set (0.00 sec)
 
 the code in question would do a retry at this point; it is checking the 
 number of rows matched, and that number is accurate.
 
 if our code did *not* block at the point of our UPDATE, then it would have 
 proceeded, and the other transaction would have overwritten what we just did, 
 when it committed.   I don’t know that read committed is necessarily any 
 better here.
 
 now perhaps, with Galera, none of this works correctly.  That would be a 
 different issue in which case sure, we should use whatever isolation is 
 recommended for Galera.  But I’d want to potentially peg it to the fact that 
 Galera is in use, or not.
 
 would love also to hear from Jay Pipes on this since he literally wrote the 
 book on MySQL ! :)

What you missed is that with Galera the commit that happened last would
be rolled back. This is a reality in many scenarios on SQL databases and
should be handled _regardless_ of Galera. It is a valid way to handle
deadlocks on single node DBs as well (pgsql will do this sometimes).

One simply cannot rely on multi-statement transactions to always succeed.



Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

 On Nov 19, 2014, at 3:47 PM, Ryan Moats rmo...@us.ibm.com wrote:
 
  
 BTW, I view your examples from oslo as helping make my argument for
 me (and I don't think that was your intent :) )
 

I disagree with that as IMHO the differences in producing MM in the app layer 
against arbitrary backends (Postgresql vs. DB2 vs. MariaDB vs. ???)  will incur 
a lot more “bifurcation” than a system that targets only a handful of existing 
MM solutions.  The example I referred to in oslo.db is dealing with distinct, 
non MM backends.   That level of DB-specific code and more is a given if we are 
building a MM system against multiple backends generically.

It’s not possible to say which approach would be better or worse at the level 
of “how much database specific application logic do we need”, though in my 
experience, no matter what one is trying to do, the answer is always, “tons”; 
we’re dealing not just with databases but also Python drivers that have a vast 
amount of differences in behaviors, at every level.  On top of all of that, 
hand-rolled MM adds just that much more application code to be developed and 
maintained, which also claims it will do a better job than mature (ish?) 
database systems designed to do the same job against a specific backend.



 
   My reason for asking this question here is that if the community 
   wants to consider #2, then these problems are the place to start 
   crafting that solution - if we solve the conflicts inherent with the
   two concurrent thread scenarios, then I think we will find that 
   we've solved the multi-master problem essentially for free.
   
  Maybe I’m missing something, if we learn how to write out a row such
  that a concurrent transaction against the same row doesn’t throw us 
  off, where is the part where that data is replicated to databases 
  running concurrently on other IP numbers in a way that is atomic 
  coming out of that effort “for free”?   A home-rolled “multi master” 
  scenario would have to start with a system that has multiple 
  create_engine() calls, since we need to communicate directly to 
  multiple database servers. From there it gets really crazy.  Where’s all 
  that?
 
 Boiled down, what you are talking about here w.r.t. concurrent
 transactions is really conflict resolution, which is the hardest
 part of implementing multi-master (as a side note, using locking in
 this case is the equivalent of option #1).  
 
 All I wished to point out is that there are other ways to solve the
 conflict resolution that could then be leveraged into a multi-master
 scenario.
 
 As for the parts that I glossed over, once conflict resolution is
 separated out, replication turns into a much simpler problem with
 well understood patterns and so I view that part as coming
 for free.
 
 Ryan
 


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Eugene Nikanorov
Wow, lots of feedback in a matter of hours.

First of all, reading postgres docs I see that READ COMMITTED is the same
as for mysql, so it should address the issue we're discussing:

*Read Committed* is the default isolation level in PostgreSQL. When a
transaction uses this isolation level, a SELECT query (without a FOR
UPDATE/SHARE clause) *sees only data committed before the query began (not
before TX began - Eugene)*; it never sees either uncommitted data or
changes committed during query execution by concurrent transactions. In
effect, a SELECT query sees a snapshot of the database as of the instant
the query begins to run. However, SELECT does see the effects of previous
updates executed within its own transaction, even though they are not yet
committed. *Also note that two successive SELECT commands can see
different data, even though they are within a single transaction, if other
transactions commit changes during execution of the first SELECT.*
http://www.postgresql.org/docs/8.4/static/transaction-iso.html

So, in my opinion, unless neutron code has parts that rely on the 'repeatable
read' transaction isolation level (and I believe such code is possible; I
haven't inspected it closely yet), switching to READ COMMITTED is fine for
mysql.
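
If switching the level engine-wide is too coarse, SQLAlchemy also lets us
scope it to a single connection; a sketch, assuming 'engine' is a mysql
engine:

    # scope READ COMMITTED to one connection rather than the whole engine
    with engine.connect().execution_options(
            isolation_level="READ COMMITTED") as conn:
        with conn.begin():
            ...  # the retry loop's SELECT/UPDATE statements go here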

On the multi-master scenario: it is not really an advanced use case. It is
basic; we need to treat it as such and build the architecture with respect
to this fact.
The retry approach fits well here, however it either requires a proper
isolation level or a redesign of the whole DB access layer.

Also, thanks Clint for clarification about example scenario described by Mike
Bayer.
Initially the issue was discovered with concurrent tests on multi master
environment with galera as a DB backend.

Thanks,
Eugene

On Thu, Nov 20, 2014 at 12:20 AM, Mike Bayer mba...@redhat.com wrote:


 On Nov 19, 2014, at 3:47 PM, Ryan Moats rmo...@us.ibm.com wrote:

 
 BTW, I view your examples from oslo as helping make my argument for
 me (and I don't think that was your intent :) )


 I disagree with that as IMHO the differences in producing MM in the app
 layer against arbitrary backends (Postgresql vs. DB2 vs. MariaDB vs. ???)
  will incur a lot more “bifurcation” than a system that targets only a
 handful of existing MM solutions.  The example I referred to in oslo.db is
 dealing with distinct, non MM backends.   That level of DB-specific code
 and more is a given if we are building a MM system against multiple
 backends generically.

 It’s not possible to say which approach would be better or worse at the
 level of “how much database specific application logic do we need”, though
 in my experience, no matter what one is trying to do, the answer is always,
 “tons”; we’re dealing not just with databases but also Python drivers that
 have a vast amount of differences in behaviors, at every level.  On top
 of all of that, hand-rolled MM adds just that much more application code to
 be developed and maintained, while claiming it will do a better job than
 mature (ish?) database systems designed to do the same job against a
 specific backend.




   My reason for asking this question here is that if the community
   wants to consider #2, then these problems are the place to start
   crafting that solution - if we solve the conflicts inherent with the
   two concurrent thread scenarios, then I think we will find that
   we've solved the multi-master problem essentially “for free”.
 
  Maybe I’m missing something: if we learn how to write out a row such
  that a concurrent transaction against the same row doesn’t throw us
  off, where is the part where that data gets replicated, atomically, to
  databases running concurrently on other IP addresses, and how does that
  come out of the effort “for free”?   A home-rolled “multi master”
  scenario would have to start with a system that has multiple
  create_engine() calls, since we need to communicate directly with
  multiple database servers. From there it gets really crazy.  Where’s all
  that?

 Boiled down, what you are talking about here w.r.t. concurrent
 transactions is really conflict resolution, which is the hardest
 part of implementing multi-master (as a side note, using locking in
 this case is the equivalent of option #1).

  All I wished to point out is that there are other ways to solve the
  conflict-resolution problem that could then be leveraged into a
  multi-master scenario.

  As for the parts that I glossed over: once conflict resolution is
  separated out, replication turns into a much simpler problem with
  well-understood patterns, and so I view that part as coming
  for free.

 Ryan



 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Mike Bayer

 On Nov 19, 2014, at 4:14 PM, Clint Byrum cl...@fewbar.com wrote:
 
 
 One simply cannot rely on multi-statement transactions to always succeed.

Agreed, but the thing you want is that the transaction either succeeds or 
explicitly fails, the latter hopefully in such a way that a retry can be added 
which has a chance at succeeding, if needed.  We have transaction replay logic 
in place in nova, for example, based on known failure conditions like 
concurrency exceptions, and this replay logic works because it starts a new 
transaction.   In this specific case, since it’s looping within a transaction 
where the data won’t change, it’ll never succeed, and the retry mechanism is 
useless.   But the isolation mode change won’t really help here, as pointed 
out by Jay; discrete transactions have to be used instead.
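
Not the actual nova/oslo.db replay helper — just a sketch of its shape, with
made-up names, to show why the new transaction per attempt matters:

    import time

    from sqlalchemy.exc import OperationalError  # mysql deadlocks surface here

    def run_in_new_transaction(engine, work, attempts=3):
        # Re-run the whole unit of work in a brand-new transaction on a
        # concurrency failure; retrying inside one transaction would keep
        # seeing the same snapshot and never succeed.
        for attempt in range(attempts):
            try:
                with engine.begin() as conn:  # fresh transaction per attempt
                    return work(conn)
            except OperationalError:
                if attempt == attempts - 1:
                    raise
                time.sleep(0.05 * (attempt + 1))  # crude backoff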


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Eugene Nikanorov
 But the isolation mode change won’t really help here as pointed out by
Jay; discrete transactions have to be used instead.
I still think it will, per the postgres documentation (which might look
confusing, but still...).
It actually helps for mysql; that was confirmed. For postgres it appears to
be the same.

Thanks,
Eugene.

On Thu, Nov 20, 2014 at 12:56 AM, Mike Bayer mba...@redhat.com wrote:


  On Nov 19, 2014, at 4:14 PM, Clint Byrum cl...@fewbar.com wrote:
 
 
  One simply cannot rely on multi-statement transactions to always succeed.

 Agreed, but the thing you want is that the transaction either succeeds or
 explicitly fails, the latter hopefully in such a way that a retry can be
 added which has a chance at succeeding, if needed.  We have transaction
 replay logic in place in nova, for example, based on known failure conditions
 like concurrency exceptions, and this replay logic works because it starts
 a new transaction.   In this specific case, since it’s looping within a
 transaction where the data won’t change, it’ll never succeed, and the retry
 mechanism is useless.   But the isolation mode change won’t really help
 here, as pointed out by Jay; discrete transactions have to be used instead.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Jay Pipes

On 11/19/2014 04:27 PM, Eugene Nikanorov wrote:

Wow, lots of feedback in a matter of hours.

First of all, reading postgres docs I see that READ COMMITTED is the
same as for mysql, so it should address the issue we're discussing:

/Read Committed/ is the default isolation level in PostgreSQL. When a
transaction uses this isolation level, a SELECT query (without a FOR
UPDATE/SHARE clause) *sees only data committed before the query began
(not before TX began - Eugene)*; it never sees either uncommitted data
or changes committed during query execution by concurrent transactions.
In effect, a SELECT query sees a snapshot of the database as of the
instant the query begins to run. However, SELECT does see the effects of
previous updates executed within its own transaction, even though they
are not yet committed. *Also note that two successive **SELECT commands
can see different data, even though they are within a single
transaction, if other transactions commit changes during execution of
the first SELECT. *
http://www.postgresql.org/docs/8.4/static/transaction-iso.html


So while the SELECTs may return different data on successive calls when 
you use the READ COMMITTED isolation level, the UPDATE statements will 
continue to return 0 rows affected **if they attempt to change rows that 
have been changed since the start of the transaction**


The reason that changing the isolation level to READ COMMITTED appears 
to work for the code in question:


https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/helpers.py#L98

is likely because the SELECT ... LIMIT 1 query is returning a different 
row on successive attempts (and since there is no ORDER BY on the 
query, the returned row is entirely unpredictable; see line 112). Since 
data from that returned row is used in the UPDATE statement (lines 118 
and 124), *different* rows are actually being changed by successive 
UPDATE statements.
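
Schematically, the pattern in question looks something like this (a simplified
sketch; the table and column names only approximate the real ml2 schema):

    from sqlalchemy import text

    def try_allocate(conn):
        # No ORDER BY, so under READ COMMITTED each retry may pick up a
        # different unallocated row committed by concurrent transactions.
        row = conn.execute(text(
            "SELECT gre_id FROM ml2_gre_allocations "
            "WHERE allocated = 0 LIMIT 1")).fetchone()
        if row is None:
            return None  # no free segments at all
        result = conn.execute(text(
            "UPDATE ml2_gre_allocations SET allocated = 1 "
            "WHERE gre_id = :gid AND allocated = 0"), {"gid": row.gre_id})
        # rowcount == 0 means a concurrent transaction grabbed this row first
        return row.gre_id if result.rowcount else None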


What this means is that for this *very particular* case, setting the 
transaction isolation level to READ COMMITTED will presumably work most 
of the time on MySQL, but it's not an appropriate solution for the 
generalized problem domain of the SELECT FOR UPDATE. If you need to 
issue a SELECT and an UPDATE in a retry loop, and you are attempting to 
update the same row or rows (for instance, in the quota reservation or 
resource allocation scenarios), this solution will not work, even with 
READ COMMITTED. This is why I say it's not really appropriate, and a 
better general solution is to use separate transactions for each loop in 
the retry mechanic.
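
For the same-row case, the separate-transactions version looks roughly like
the following (an illustrative sketch; the quota_usage table is made up):

    from sqlalchemy import text

    def reserve(engine, project_id, amount, attempts=5):
        for _ in range(attempts):
            # One short transaction per attempt: the SELECT always sees the
            # latest committed value, so the compare-and-swap UPDATE can win.
            with engine.begin() as conn:
                used = conn.execute(text(
                    "SELECT used FROM quota_usage WHERE project_id = :p"),
                    {"p": project_id}).scalar()
                if used is None:
                    return False  # unknown project: nothing to reserve against
                updated = conn.execute(text(
                    "UPDATE quota_usage SET used = :new "
                    "WHERE project_id = :p AND used = :old"),
                    {"new": used + amount, "p": project_id, "old": used})
                if updated.rowcount:
                    return True  # our read was still current
            # lost the race; loop again with a brand-new transaction
        return False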



So, in my opinion, unless neutron code has parts that rely on
'repeatable read' transaction isolation level (and I believe such code
is possible; I haven't inspected it closely yet), switching to READ COMMITTED
is fine for mysql.


This will introduce more problems than you think, I believe. A better 
strategy is to simply use separate transactions for each loop 
iteration's queries.



On the multi-master scenario: it is not really an advanced use case. It is
basic; we need to treat it as such and build the architecture with respect
to this fact.
The retry approach fits well here, however it either requires a proper
isolation level or a redesign of the whole DB access layer.


It's not about the retry approach. I don't think anyone is saying that a 
retry approach is not a good idea. I've been a proponent of the retry 
approach to get around issues with SELECT FOR UPDATE ever since I 
brought up the issue to the mailing list about 7 months ago. :)


The issue is about doing the retry within a single transaction. That's 
not what I recommend doing. I recommend instead doing short separate 
transactions instead of long-lived, multi-statement transactions and 
relying on the behaviour of the DB's isolation level (default or 
otherwise) to solve the problem of reading changes to a record that 
you intend to update.


Cheers,
-jay


Also, thanks Clint for clarification about example scenario described by
Mike Bayer.
Initially the issue was discovered with concurrent tests on multi master
environment with galera as a DB backend.

Thanks,
Eugene

On Thu, Nov 20, 2014 at 12:20 AM, Mike Bayer mba...@redhat.com
mailto:mba...@redhat.com wrote:



On Nov 19, 2014, at 3:47 PM, Ryan Moats rmo...@us.ibm.com
mailto:rmo...@us.ibm.com wrote:


BTW, I view your examples from oslo as helping make my argument for
me (and I don't think that was your intent :) )



I disagree with that as IMHO the differences in producing MM in the
app layer against arbitrary backends (Postgresql vs. DB2 vs. MariaDB
vs. ???)  will incur a lot more “bifurcation” than a system that
targets only a handful of existing MM solutions.  The example I
referred to in oslo.db is dealing with distinct, non MM backends.
That level of DB-specific code and more is a given if we are
building a MM system against multiple backends generically.

It’s not possible to